From: Arthur Reutenauer <arthur.reutenauer@normalesup.org>
To: Mailing list for ConTeXt users <ntg-context@ntg.nl>
Subject: Re: PDF Meta Tags
Date: Tue, 20 Jan 2009 16:19:04 +0100
Message-ID: <20090120151904.GO22175@phare.normalesup.org>
In-Reply-To: <200901200651.36129.bntgcontext@wiseguysweb.com>

>                                                 But he did say that his 
> printing shop wants the ability to download just the header information from a 
> pdf rather than the whole pdf file which may be up to 80 mbytes.

  OK.  That's not Tagged PDF.  Tagged PDF's main features focus on
accessibility, adding information for the visually impaired (you can, for
example, tag some text as part of the page header, as opposed to the
page body: an application that reads the document out loud would know
not to read that part).  It also allows better archiving (the PDF/A
standard).  All of these concerns are quite distinct from the needs of
publishers.

  I'm just learning about XMP (Extensible Metadata Platform), which Luigi
mentioned, but it doesn't really look like it contains the information
you mention (although you can apparently add all sorts of metadata,
including images).

  Actually, the kind of information the printing shop asks for is
available in any PDF file in a straightforward way: the very format has
been designed so that all the PDF objects can be accessed directly with
extreme efficiency (there is a cross-reference table with the byte
offsets to every object inside the file).  Individual pages are objects
in a PDF file; they contain references to the resources needed to render
them (fonts, images, etc.), so the basic functionality to render each
page individually is already present in the format.  And it's been there
from day one -- which is, by the way, the reason why the insides of a
PDF file look so undecipherable to the human eye: it's designed to be
efficient to process automatically, not to be read by a programmer.  By
contrast, an XML-based format would be (somewhat) more human-friendly,
but much slower to parse.
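
  To make the direct-access idea concrete, here is a minimal Python
sketch of how a reader could look up one object's byte offset.  It
assumes a classic (non-stream) cross-reference table with a single
subsection, which is a simplification; the function names are made up
for illustration and are not part of any library.

```python
import re

def find_startxref(pdf_bytes: bytes, tail_size: int = 1024) -> int:
    """Return the byte offset of the cross-reference section.

    The offset is stored after the "startxref" keyword near the end of
    the file, so only the tail needs to be inspected.
    """
    tail = pdf_bytes[-tail_size:]
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)
    if not matches:
        raise ValueError("no startxref found in the file tail")
    # Take the last match: incremental updates append newer trailers.
    return int(matches[-1])

def object_offset(pdf_bytes: bytes, obj_num: int) -> int:
    """Return the byte offset of object obj_num.

    Assumption: a classic table with one subsection. Every entry is
    exactly 20 bytes long, so entry n is found by plain arithmetic.
    """
    xref = find_startxref(pdf_bytes)
    header = re.match(rb"xref\r?\n(\d+) (\d+) ?\r?\n", pdf_bytes[xref:xref + 64])
    if header is None:
        raise ValueError("not a classic cross-reference table")
    first, count = int(header.group(1)), int(header.group(2))
    if not first <= obj_num < first + count:
        raise KeyError(obj_num)
    start = xref + header.end() + (obj_num - first) * 20
    entry = pdf_bytes[start:start + 20]
    if entry[17:18] != b"n":  # "n" marks an in-use object, "f" a free one
        raise KeyError(obj_num)
    return int(entry[:10])
```

  Nothing here scans the file: the tail gives the table's position, and
the fixed-width entries give each object's position.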

  There's a variation on this basic feature: if you look at a PDF file
over the Internet, the cross-reference table isn't conveniently located
because it is at the very end of the file; so you need to download the
entire file before your PDF viewer can start displaying it.  (I think the
argument behind that design decision was that a PDF-producing
application only knows the complete list of objects once it has written
them all, so putting the table last lets it output the whole file
sequentially in a single pass.  Of course that clashes directly with the
needs of PDF-consuming applications.)  To circumvent this, Adobe devised
a special type of
object that contains the same information as the cross-reference table,
which you can put at the very beginning of the file, together with the
material needed to render the first pages.  This is Linearized PDF
(sometimes, confusingly enough, called "optimized" PDF).  It's rather
unlikely that it'd be what your printer wants (I suppose the file is
already available on disk somewhere), but in any case, Ghostscript can
produce it with the utility pdfopt.  ConTeXt isn't able to produce it;
it has been decided that it is beyond the scope of pdfTeX and LuaTeX.
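
  If you want to check whether a given file is already linearized, a
rough heuristic is enough.  The sketch below relies on the PDF
specification's requirement that the linearization parameter dictionary
(the one carrying the /Linearized key) sits within the first 1024 bytes
of the file.  The function name is hypothetical, and a plain substring
test like this is only a hint, not a validation.

```python
def looks_linearized(pdf_bytes: bytes) -> bool:
    """Heuristic: report whether the file head declares linearization.

    Only the first 1024 bytes are examined, so this works on a partial
    download too. It does not verify that the hint tables are valid.
    """
    return b"/Linearized" in pdf_bytes[:1024]
```

  Since only the head of the file is read, the test can be run on the
first kilobyte fetched over the network.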

> When I get specific information from the printing shop, I'll pass it along.

  I'm interested, too.

> But needless to say, I'm very concerned.  If tagged pdf support is not 
> available in ConTeXt/LuaTeX, I feel that difficulties are either here now, or at 
> best, looming on the horizon.

  Why?  There's progress made every day.  Tagged PDF is indeed a problem
for the moment, but it's clearly not the feature your printer asks for,
and as a rule, you can be sure that if some functionality is essential
to publishers, it will be added quickly to ConTeXt :-)

	Arthur