From mboxrd@z Thu Jan 1 00:00:00 1970 X-Msuck: nntp://news.gmane.io/gmane.comp.tex.context/91775 Path: news.gmane.org!not-for-mail From: luigi scarso Newsgroups: gmane.comp.tex.context Subject: Re: Accessibility and Tagged PDFs: Bugs and Feature Requests Date: Tue, 30 Jun 2015 10:32:29 +0200 Message-ID: References: Reply-To: mailing list for ConTeXt users NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1052876322==" X-Trace: ger.gmane.org 1435653247 21880 80.91.229.3 (30 Jun 2015 08:34:07 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Tue, 30 Jun 2015 08:34:07 +0000 (UTC) To: mailing list for ConTeXt users Original-X-From: ntg-context-bounces@ntg.nl Tue Jun 30 10:33:57 2015 Return-path: Envelope-to: gctc-ntg-context-518@m.gmane.org Original-Received: from balder.ntg.nl ([5.39.185.229]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1Z9qzL-0004yi-N8 for gctc-ntg-context-518@m.gmane.org; Tue, 30 Jun 2015 10:33:55 +0200 Original-Received: from localhost (localhost [127.0.0.1]) by balder.ntg.nl (Postfix) with ESMTP id AE3DA10232 for ; Tue, 30 Jun 2015 10:33:54 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at balder.ntg.nl Original-Received: from balder.ntg.nl ([127.0.0.1]) by localhost (balder.ntg.nl [127.0.0.1]) (amavisd-new, port 10024) with LMTP id CA1B_TNHUc1t for ; Tue, 30 Jun 2015 10:33:53 +0200 (CEST) Original-Received: from balder.ntg.nl (localhost [IPv6:::1]) by balder.ntg.nl (Postfix) with ESMTP id 8E25B10248 for ; Tue, 30 Jun 2015 10:33:14 +0200 (CEST) Original-Received: from localhost (localhost [127.0.0.1]) by balder.ntg.nl (Postfix) with ESMTP id DFFC510200 for ; Tue, 30 Jun 2015 10:33:10 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at balder.ntg.nl Original-Received: from balder.ntg.nl ([127.0.0.1]) by localhost (balder.ntg.nl [127.0.0.1]) (amavisd-new, port 10024) with LMTP id G1rJFwxzTnOB for ; Tue, 30 Jun 2015 10:33:09 +0200 (CEST) Original-Received: from 
filter3-utr.mf.surf.net (filter3-utr.mf.surf.net [195.169.124.154]) by balder.ntg.nl (Postfix) with ESMTP id 628F7101F7 for ; Tue, 30 Jun 2015 10:33:09 +0200 (CEST) Original-Received: from mail-wi0-x234.google.com (mail-wi0-x234.google.com [IPv6:2a00:1450:400c:c05::234]) by filter3-utr.mf.surf.net (8.14.3/8.14.3/Debian-9.4) with ESMTP id t5U8WU8Y024147 (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=NOT) for ; Tue, 30 Jun 2015 10:32:42 +0200 Original-Received: by wiwl6 with SMTP id l6so123818148wiw.0 for ; Tue, 30 Jun 2015 01:32:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=TYXBbV/cbX+bfWZDZxue3Po+lLrrvypV1aCWbMWtnVE=; b=UbxPdCwm/VI2JBFa29FORwLwSLeve9zMJYK9JNJYS+u2CpMOgC42iqGZJjUT3/iizL K9B7tbTBq+OM3fFa0Eu0KZjSTgphA2YIjBVV2fIWnibgXLeVIqI4LCnbBLMzZNLfLIiE qwsElXE/97IC78PZ7s5IYkRvAwXWSECc7IT3EDJZPBrFf83VIqDI/n4wiMSykwqooeep SV8IhJFDYcjIuXoHvD/Luwkfxl8TxjpYCaQBQBo4MuEDFNZhuGK6x6f+LA/Cnee0uPw4 rGfiAHrLlkWlV4tpMk0yLIzNMxodttdHzLWJpcsrL319XQtqQk04z05OBqHVtHhjA5e4 +0Qg== X-Received: by 10.180.95.10 with SMTP id dg10mr30538224wib.41.1435653149861; Tue, 30 Jun 2015 01:32:29 -0700 (PDT) Original-Received: by 10.194.200.106 with HTTP; Tue, 30 Jun 2015 01:32:29 -0700 (PDT) In-Reply-To: X-Bayes-Prob: 0.5 (Score 0, tokens from: ntg-context@ntg.nl, base:default, @@RPTN) X-CanIt-Geo: ip=2a00:1450:400c:c05::234; country=BE; region=Brussels Capital; city=Brussels; latitude=50.8466; longitude=4.3528; http://maps.google.com/maps?q=50.8466,4.3528&z=6 X-CanItPRO-Stream: uu:ntg-context@ntg.nl (inherits from uu:default, base:default) X-Canit-Stats-ID: 08OKwwAvf - 8ad0bcd5525b - 20150630 (trained as not-spam) Received-SPF: pass (filter3-utr.mf.surf.net: domain of luigi.scarso@gmail.com designates 2a00:1450:400c:c05::234 as permitted sender) receiver=filter3-utr.mf.surf.net; client-ip=2a00:1450:400c:c05::234; envelope-from=; 
helo=mail-wi0-x234.google.com; identity=mailfrom X-Scanned-By: CanIt (www . roaringpenguin . com) X-BeenThere: ntg-context@ntg.nl X-Mailman-Version: 2.1.16 Precedence: list List-Id: mailing list for ConTeXt users List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ntg-context-bounces@ntg.nl Original-Sender: "ntg-context" Xref: news.gmane.org gmane.comp.tex.context:91775 Archived-At:
--===============1052876322==
Content-Type: multipart/alternative; boundary=f46d04447f8304ad1e0519b80c09

--f46d04447f8304ad1e0519b80c09
Content-Type: text/plain; charset=UTF-8

On Sun, Jun 28, 2015 at 12:59 PM, Dr. Dominik Klein <Dominik.Klein@outlook.com> wrote:

> ConTeXt is the only TeX-based system that allows proper tagging of a PDF.
> Tagged PDFs are one major requirement for accessibility.
>
> Indeed, in several large organizations/universities, accessibility is
> mandated by law, and this is a major obstacle for using TeX. In practice
> compliance is often assessed with Acrobat Pro's accessibility checker.
>
> ConTeXt produces a nice tag structure, but there are some minor issues
> that prevent compliance with [1], and hence Acrobat Pro complains during the
> check. The main issues are:
>
> 1.) Elements that are not contained in the structure tree are not marked
> as an artifact. Consider this example:
>
> -------------------------------
> \setuptagging[state=start]
>
> \setuppagenumbering
> [location=,
>  alternative=doublesided]
>
> \setupheadertexts
>   [{Chapter~\getmarking[chapternumber]\hskip1em\getmarking[chapter]}]
>   [{Header Right}]
>   [{Header Left}]
>   [{Chapter~\getmarking[chapternumber]\hskip1em\getmarking[chapter]}]
>
> \setupfootertexts
>   [Organization Name]
>   [pagenumber]
>   [pagenumber]
>   [Organization Name]
>
> \starttext
> \startfrontmatter
> something
> \stopfrontmatter
>
> \startbodymatter
> some more text here
> \stopbodymatter
> \stoptext
> -------------------------------
>
> Header, footer, pagenumber etc.
> will not be included in the tag structure. Of course this makes perfect
> sense and is correct; however, according to Section 14.8.2.2.2 of [1],
> content that is not in the structure tree should be marked as an
> artifact, i.e.
>
> /Artifact
>   BMC
>   ..
>   EMC
>
> or, in a more advanced way, with /Artifact PropertyList, where the type of
> artifact can be defined. It would be nice if those elements that are not
> included in the tag tree were marked as artifacts by default. The same
> holds for \startelement[ignore] when one wants to explicitly remove
> something from the structure tree.
>
> 2.) Images without alternate text:
> According to Section 14.9.3 of [1], alternate descriptions in
> human-readable text should be provided for images. It would be really
> helpful if these could be defined in the source TeX file and then
> automatically added when creating the object in the structure tree.
> I.e. it would be nice to have something like:
>
> \placefigure[top][Image Reference]{Caption}{
> \externalfigure[cow.pdf][width=10cm][alternate text = "This image shows a beautiful cow."]
> }
>
> The same holds for formulas: whereas the MathML-like tagging of ConTeXt is
> very advanced, sometimes it might still be helpful to supply a textual
> description (alt-text ="The definition of the Pythagorean theorem: a^2 +
> b^2 = c^2")
>
> 3.) Tag names of the resulting tag structure:
> Section 14.8.4 of [1] defines standard structure types, such as <H>, <P>,
> <Sect> etc. ConTeXt creates a tag tree that uses names directly
> representing the structure names of the ConTeXt language, such as
> <sectiontitle>. These should however be mapped to something standard, such
> as <H>. Interestingly, these mappings seem to have been considered in
> strc-tag.mkiv, but I was unable to generate such a tagged PDF. Editing or
> commenting out things in strc-tag.mkiv didn't work for me. It would
> be nice if there was a switch somewhere, e.g.
> \setuptagging[state=start,tagnames=pdf17] - or maybe I overlooked something?
>
> 4.) Acrobat Pro always complains that the language for the whole document
> is not set.
>
> 5.) Tables
> The generated structure looks something like this:
>
> <table>
>  <tablerow>
>    <tablecell>
>    ...
>  <tablerow>
>    <tablecell>
>  ...
>
> Here, not only are the tag names non-compliant; the tag structure
> should also distinguish between the table header (THead) and the table
> rows (TBody), cf. Section 14.8.4.3.1 of [1]. A simple heuristic would be
> to always put the first line into THead tags, and the rest of the table
> into TBody.
>
> 6.) It would be nice if a flat tag structure could be created optionally.
> This is not a required feature according to [1], and in fact a properly
> nested structure is surely preferable for the final output; for debugging
> or checking during document creation, however, a flat structure tree
> is sometimes easier to browse through.
>
> All in all, these seem to be the only issues that prevent accessible PDF
> documents with ConTeXt. For those within an organization where
> accessibility is legally required for all publications, compliance with at
> least Acrobat Pro's checks is a huge issue. I do not know how difficult
> these things are to implement in ConTeXt (personally I am just lost in the
> code), but looking at e.g. tex.stackexchange
> for questions related to accessibility, this is indeed a major obstacle for
> several people.
>
> cheers
>
> - Dominik
>
>
> [1] ISO 32000-1:2008, available at
> http://www.adobe.com/devnet/pdf/pdf_reference.html
>
> ___________________________________________________________________________________
>

Thank you for the report.
It would be nice to have a PDF made by ConTeXt using \nopdfcompression
that has all these issues, together with the report emitted by Acrobat.
Last time I checked a PDF/A-1a made by ConTeXt it was all OK, but that was some time ago and maybe not
all the features were tested deeply.

--
luigi

--f46d04447f8304ad1e0519b80c09
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable


On Sun, Jun 28, 2015 at 12:59 PM, Dr. Dominik Klein <Dominik.Klein@outlook.com> wrote:
ConTeXt is the only TeX-based system that allows proper tagging of a PDF. Tagged PDFs are one major requirement for accessibility.

Indeed, in several large organizations/universities, accessibility is mandated by law, and this is a major obstacle for using TeX. In practice compliance is often assessed with Acrobat Pro's accessibility checker.

ConTeXt produces a nice tag structure, but there are some minor issues that prevent compliance with [1], and hence Acrobat Pro complains during the check. The main issues are:

1.) Elements that are not contained in the structure tree are not marked as an artifact. Consider this example:

-------------------------------
\setuptagging[state=start]

\setuppagenumbering
[location=,
 alternative=doublesided]

\setupheadertexts
  [{Chapter~\getmarking[chapternumber]\hskip1em\getmarking[chapter]}]
  [{Header Right}]
  [{Header Left}]
  [{Chapter~\getmarking[chapternumber]\hskip1em\getmarking[chapter]}]

\setupfootertexts
  [Organization Name]
  [pagenumber]
  [pagenumber]
  [Organization Name]

\starttext
\startfrontmatter
something
\stopfrontmatter

\startbodymatter
some more text here
\stopbodymatter
\stoptext
-------------------------------

Header, footer, pagenumber etc. will not be included in the tag structure. Of course this makes perfect sense and is correct; however, according to Section 14.8.2.2.2 of [1], content that is not in the structure tree should be marked as an artifact, i.e.

/Artifact
  BMC
  ..
  EMC

or, in a more advanced way, with /Artifact PropertyList, where the type of artifact can be defined. It would be nice if those elements that are not included in the tag tree were marked as artifacts by default. The same holds for \startelement[ignore] when one wants to explicitly remove something from the structure tree.

2.) Images without alternate text:
According to Section 14.9.3 of [1], alternate descriptions in human-readable text should be provided for images. It would be really helpful if these could be defined in the source TeX file and then automatically added when creating the object in the structure tree. I.e. it would be nice to have something like:

\placefigure[top][Image Reference]{Caption}{
\externalfigure[cow.pdf][width=10cm][alternate text = "This image shows a beautiful cow."]
}

The same holds for formulas: whereas the MathML-like tagging of ConTeXt is very advanced, sometimes it might still be helpful to supply a textual description (alt-text ="The definition of the Pythagorean theorem: a^2 + b^2 = c^2")

3.) Tag names of the resulting tag structure:
Section 14.8.4 of [1] defines standard structure types, such as <H>, <P>, <Sect> etc. ConTeXt creates a tag tree that uses names directly representing the structure names of the ConTeXt language, such as <sectiontitle>. These should however be mapped to something standard, such as <H>. Interestingly, these mappings seem to have been considered in strc-tag.mkiv, but I was unable to generate such a tagged PDF. Editing or commenting out things in strc-tag.mkiv didn't work for me. It would be nice if there was a switch somewhere, e.g. \setuptagging[state=start,tagnames=pdf17] - or maybe I overlooked something?

4.) Acrobat Pro always complains that the language for the whole document is not set.

5.) Tables
The generated structure looks something like this:
<table>
 <tablerow>
   <tablecell>
   ...
 <tablerow>
   <tablecell>
 ...

Here, not only are the tag names non-compliant; the tag structure should also distinguish between the table header (THead) and the table rows (TBody), cf. Section 14.8.4.3.1 of [1]. A simple heuristic would be to always put the first line into THead tags, and the rest of the table into TBody.

6.) It would be nice if a flat tag structure could be created optionally. This is not a required feature according to [1], and in fact a properly nested structure is surely preferable for the final output; for debugging or checking during document creation, however, a flat structure tree is sometimes easier to browse through.

All in all, these seem to be the only issues that prevent accessible PDF documents with ConTeXt. For those within an organization where accessibility is legally required for all publications, compliance with at least Acrobat Pro's checks is a huge issue. I do not know how difficult these things are to implement in ConTeXt (personally I am just lost in the code), but looking at e.g. tex.stackexchange for questions related to accessibility, this is indeed a major obstacle for several people.

cheers

- Dominik


[1] ISO 32000-1:2008, available at
http://www.adobe.com/devnet/pdf/pdf_reference.html
___________________________________________________________________________________



Thank you for the report.
It would be nice to have a PDF made by ConTeXt using \nopdfcompression
that has all these issues, together with the report emitted by Acrobat.
Last time I checked a PDF/A-1a made by ConTeXt it was all OK, but that was some time ago and maybe not
all the features were tested deeply.
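
For instance, a minimal test file could look like the sketch below. It only puts \nopdfcompression in front of a shortened version of the example from the report, so that the page content streams stay readable in a text editor. The chapter title, the texts and the file name test.tex are placeholders, not something taken from a real document.

-------------------------------
% keep the PDF streams uncompressed so they can be inspected by eye
\nopdfcompression

% enable tagging, as in the report
\setuptagging[state=start]

% simple header and footer: the content that should end up as artifacts
\setupheadertexts[Header Left][Header Right]
\setupfootertexts[Organization Name][pagenumber]

\starttext
\startchapter[title={Test}]
some more text here
\stopchapter
\stoptext
-------------------------------

Running "context test.tex" on such a file should give a test.pdf in which one can check whether header and footer are wrapped in /Artifact BMC ... EMC, and the same PDF can then be passed to the Acrobat checker.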

--
luigi
--f46d04447f8304ad1e0519b80c09-- --===============1052876322== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19f X19fX19fX19fX19fX19fX19fX19fX19fX18KSWYgeW91ciBxdWVzdGlvbiBpcyBvZiBpbnRlcmVz dCB0byBvdGhlcnMgYXMgd2VsbCwgcGxlYXNlIGFkZCBhbiBlbnRyeSB0byB0aGUgV2lraSEKCm1h aWxsaXN0IDogbnRnLWNvbnRleHRAbnRnLm5sIC8gaHR0cDovL3d3dy5udGcubmwvbWFpbG1hbi9s aXN0aW5mby9udGctY29udGV4dAp3ZWJwYWdlICA6IGh0dHA6Ly93d3cucHJhZ21hLWFkZS5ubCAv IGh0dHA6Ly90ZXguYWFuaGV0Lm5ldAphcmNoaXZlICA6IGh0dHA6Ly9mb3VuZHJ5LnN1cGVsZWMu ZnIvcHJvamVjdHMvY29udGV4dHJldi8Kd2lraSAgICAgOiBodHRwOi8vY29udGV4dGdhcmRlbi5u ZXQKX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19f X19fX19fX19fX19fX19fX19fX19fX19fX19fX18= --===============1052876322==--