From: "Bart C. Wise"
Newsgroups: gmane.comp.tex.context
Subject: Re: PDF Meta Tags
Date: Tue, 20 Jan 2009 09:00:45 -0700
Message-ID: <200901200900.45831.bntgcontext@wiseguysweb.com>
In-Reply-To: <20090120151904.GO22175@phare.normalesup.org>
To: ntg-context@ntg.nl

On Tuesday 20 January 2009 08:19:04 am Arthur Reutenauer wrote:
> > But he did say that his
> > printing shop wants the ability to download just the header information
> > from a PDF rather than the whole PDF file, which may be up to 80 MB.
>
> OK. That's not Tagged PDF. Tagged PDF's main features focus on
> accessibility, adding information for the visually impaired (you can, for
> example, tag some text as part of the page header, as opposed to the
> page body: an application that reads the document out loud would know
> not to read that part). It also allows better archiving (the PDF/A
> standard). These are all concerns quite distinct from the needs of
> publishers.
>
> I'm just learning about XMP (Extensible Metadata Platform), which Luigi
> mentioned, but it doesn't really look like it contains the information
> you mention (although you can apparently add all sorts of metadata,
> including images).

>

> Actually, the kind of information the printing shop asks for is
> available in any PDF file in a straightforward way: the very format has
> been designed so that all the PDF objects can be accessed directly with
> extreme efficiency (there is a cross-reference table with the byte
> offsets of every object inside the file). Individual pages are objects
> in a PDF file; they contain references to the resources needed to render
> them (fonts, images, etc.), so the basic functionality to render each
> page individually is already present in the format. And it's been there
> from day one -- which is, by the way, the reason why the insides of a
> PDF file look so undecipherable to the human eye: it's designed to be
> efficient to process automatically, not to be read by a programmer. By
> contrast, an XML-based format would be (somewhat) more human-friendly,
> but much slower to parse.
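
For reference, here is a rough sketch of the lookup described above. It is only an illustration under simplifying assumptions: the tiny PDF is built in memory by a made-up helper (build_tiny_pdf), and the reader (read_xref, also a made-up name) handles only a classic cross-reference table with a single subsection, not the cross-reference streams or incremental updates that real files may use.

```python
def build_tiny_pdf() -> bytes:
    """Assemble a minimal PDF in memory, recording each object's byte offset."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # where "N 0 obj" starts
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 is always the free-list head
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # fixed-width 20-byte entries
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


def read_xref(pdf: bytes) -> dict[int, int]:
    """Map object number -> byte offset (classic table, one subsection only)."""
    idx = pdf.rfind(b"startxref")  # the pointer lives at the very end of the file
    if idx == -1:
        raise ValueError("no startxref keyword found")
    pos = int(pdf[idx + len(b"startxref"):].split()[0])
    lines = pdf[pos:].split(b"\n")
    if lines[0].strip() != b"xref":
        raise ValueError("not a classic xref table")
    first, count = map(int, lines[1].split())
    table = {}
    for i in range(count):
        offset, _generation, kind = lines[2 + i].split()
        if kind == b"n":  # "n" marks an object in use, "f" a free entry
            table[first + i] = int(offset)
    return table


def read_object(pdf: bytes, table: dict[int, int], num: int) -> bytes:
    """Jump straight to one object without scanning the rest of the file."""
    start = table[num]
    end = pdf.index(b"endobj", start) + len(b"endobj")
    return pdf[start:end]


pdf = build_tiny_pdf()
table = read_xref(pdf)
page = read_object(pdf, table, 3)
print(table)
print(page.decode("ascii"))
```

The point is that read_object seeks directly to the page object once the table is known, which is why a reader never has to parse the file from top to bottom.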

>

> There's a variation on this basic feature: if you look at a PDF file
> over the Internet, the cross-reference table isn't conveniently located
> because it is at the very end of the file; so you need to download the
> entire file before your PDF viewer can start displaying it (I think the
> argument behind that design decision was that a PDF-producing
> application only knows the entire list of objects at the end of the
> first pass, and can thus output the whole file sequentially in a single
> pass. Of course that clashes directly with the needs of PDF-consuming
> applications). To circumvent this, Adobe devised a special type of
> object that contains the same information as the cross-reference table,
> which you can put at the very beginning of the file, together with the
> material needed to render the first pages. This is Linearized PDF
> (sometimes, confusingly enough, called "optimized" PDF). It's rather
> unlikely that it'd be what your printer wants (I suppose the file is
> already available on disk somewhere), but in any case, Ghostscript can
> produce it with the utility pdfopt. ConTeXt isn't able to produce it;
> it has been ruled to be beyond the scope of pdfTeX and LuaTeX.
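
As a rough way to tell whether a file has been linearized, one can look at its first kilobyte: as far as I understand the PDF specification, the linearization parameter dictionary (the one carrying the /Linearized key, plus /L for the total file length) has to sit entirely within the first 1024 bytes. The sketch below is a heuristic only; the function names and the sample bytes are made up for illustration, and it does not validate the hint tables or anything else.

```python
import re


def is_linearized(head: bytes) -> bool:
    """Heuristic: a linearized file announces itself in the first 1024 bytes."""
    return b"/Linearized" in head[:1024]


def declared_length(head: bytes):
    """Return the file length declared by /L, or None if it is absent."""
    match = re.search(rb"/L\s+(\d+)", head[:1024])
    return int(match.group(1)) if match else None


# Made-up sample headers; a real check would read the first 1024 bytes of a file.
linearized_head = (
    b"%PDF-1.4\n1 0 obj\n"
    b"<< /Linearized 1 /L 54321 /O 3 /E 2048 /N 1 /T 54000 /H [ 500 120 ] >>\n"
    b"endobj\n"
)
plain_head = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"

print(is_linearized(linearized_head), declared_length(linearized_head))
print(is_linearized(plain_head), declared_length(plain_head))
```

Comparing the declared /L against the actual size on disk is a cheap extra sanity check, since a file that was modified after linearization will usually no longer match.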

>

> > When I get specific information from the printing shop, I'll pass it
> > along.
>
> I'm interested, too.
>
> > But needless to say, I'm very concerned. If tagged pdf support is not
> > available in ConTeXt/LuaTeX, I feel that difficulties are either here
> > now, or at best, looming on the horizon.
>
> Why? There's progress made every day. Tagged PDF is indeed a problem
> for the moment, but it's clearly not the feature your printer asks for,
> and as a rule, you can be sure that if some functionality is essential
> to publishers, it will be added quickly to ConTeXt :-)

>

Thanks again to all for the responses. The information has been very enlightening.

I have sent an "optimized" (badly named, I know) PDF file off to the publisher and I'm waiting for his response. From all indications on this thread, I'm somewhat optimistic that it will solve the problem. I'll let you know what I hear back.

Thanks so much again,

Bart

___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : https://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________