Re: unicode in bookmarks

ntg-context - mailing list for ConTeXt users

* Re: unicode in bookmarks
From: Hans Hagen @ 2000-11-27 10:20 UTC
  Cc: ntg-context

At 04:07 AM 11/27/00 +0100, Petr Ferdus wrote:
>Hi 
>Does someone know if it is possible to put Unicode characters into
>bookmarks generated by ConTeXt? Or, more generally, whether the output of the
>routines producing bookmark strings could be made Unicode-encoded? I would
>like to introduce some accented (Czech) characters there. It seems
>to be possible only when they are entered in Unicode.

I must admit that I never looked into it, but it should not be that hard to
implement; it probably involves some parsing and mapping.

Which characters are problematic? \'y and the like are already handled by
the pdfdoc encoding.
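As a rough answer to "which characters are problematic" for Czech, the Python sketch below (not ConTeXt code) sorts the accented letters into those PDFDocEncoding can hold and those that would need a Unicode bookmark string. The coverage rule is an assumption based on our reading of the PDFDocEncoding table in the PDF Reference: Latin-1 in 0xA1-0xFF except 0xAD, plus Š š Ž ž among the extra characters in 0x80-0x9F. It is not taken from ConTeXt's own pdfdoc handling.

```python
# Assumption (our reading of the PDF Reference's PDFDocEncoding table):
# printable ASCII and Latin-1 0xA1-0xFF (except 0xAD) are covered, and the
# 0x80-0x9F block adds, among other glyphs, S/s/Z/z with caron. Only the
# extras that matter for Czech are modelled here.

CZECH_ACCENTED = "ÁČĎÉĚÍŇÓŘŠŤÚŮÝŽáčďéěíňóřšťúůýž"

# Caron letters that PDFDocEncoding places in its 0x80-0x9F block.
PDFDOC_CARON_EXTRAS = set("ŠšŽž")


def in_pdfdoc(ch: str) -> bool:
    """True if ch can be written in PDFDocEncoding (under the assumption above)."""
    code = ord(ch)
    if 0x20 <= code <= 0x7E:                    # printable ASCII
        return True
    if 0xA1 <= code <= 0xFF and code != 0xAD:   # Latin-1 upper half
        return True
    return ch in PDFDOC_CARON_EXTRAS


def needs_unicode(text: str) -> str:
    """Characters of text that would force a UTF-16 (Unicode) bookmark string."""
    return "".join(ch for ch in text if not in_pdfdoc(ch))


print(needs_unicode(CZECH_ACCENTED))  # ČĎĚŇŘŤŮčďěňřťů
```

If the assumption holds, the letters left over are those with a caron other than š and ž, plus ů. This is consistent with \'y already working.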

Hans 
-------------------------------------------------------------------------
                                                  Hans Hagen | PRAGMA ADE
                      Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
 tel: +31 (0)38 477 53 69 | fax: +31 (0)38 477 53 74 | www.pragma-ade.com
-------------------------------------------------------------------------



* Re: unicode in bookmarks
From: Hans Hagen @ 2000-11-27 15:28 UTC
  Cc: ntg-context

[-- Attachment #1: Type: text/plain, Size: 1034 bytes --]

At 04:07 AM 11/27/00 +0100, Petr Ferdus wrote:
>Hi 
>Does someone know if it is possible to put Unicode characters into
>bookmarks generated by ConTeXt? Or, more generally, whether the output of the
>routines producing bookmark strings could be made Unicode-encoded? I would
>like to introduce some accented (Czech) characters there. It seems
>to be possible only when they are entered in Unicode.

You can have Unicode indeed; see the attached file. But the bad news is that
if you want this kind of support right now, you and/or others will have to
collect the data that should go into the pdu encoding vector, since I don't
have complete Unicode tables. The values should be given in decimal [more
efficient in TeX] or octal [slightly faster but ugly]. Then, of course, some
testing has to be done.
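Collecting that data could be partly automated. The Python sketch below is not ConTeXt code and not part of the attachment. It derives pdu entries from Unicode's canonical decompositions and writes each code point as decimal high and low bytes, in the \uchar{0}{193} style used in the attached file. The accent tokens "v" (caron) and "r" (ring) and the helper names pdu_entry and pdu_vector are hypothetical, and they have not been checked against ConTeXt's \defineaccent. Letters that do not decompose into one base letter plus one combining mark are not handled.

```python
import unicodedata

# Hypothetical accent tokens for \defineaccent. The first four match the
# attached file; "v" (caron) and "r" (ring) are assumptions.
ACCENT_TOKENS = {
    "\u0301": "'",  # combining acute accent
    "\u0300": "`",  # combining grave accent
    "\u0302": "^",  # combining circumflex accent
    "\u0303": "~",  # combining tilde
    "\u030C": "v",  # combining caron (hacek)
    "\u030A": "r",  # combining ring above
}


def pdu_entry(ch: str) -> str:
    """One \\defineaccent line; the code point is split into decimal
    high and low bytes, as in \\uchar{0}{193} for A with acute."""
    base, mark = unicodedata.normalize("NFD", ch)  # e.g. "Á" -> "A" + U+0301
    hi, lo = divmod(ord(ch), 256)
    return f"\\defineaccent {ACCENT_TOKENS[mark]} {base} {{\\uchar{{{hi}}}{{{lo}}}}}"


def pdu_vector(letters: str) -> str:
    """Wrap the entries for all given letters in a pdu encoding block."""
    lines = ["\\startencoding[pdu]"]
    lines += [pdu_entry(ch) for ch in letters]
    lines.append("\\stopencoding")
    return "\n".join(lines)


print(pdu_vector("ÁÀřů"))
```

Splitting into high and low bytes mirrors the two arguments of \uchar, which the attachment turns into two octal escapes.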

In the attached file, conversion to Unicode is more or less implemented [some
dirty code tricks, so don't ask me to explain them], and I'm still not sure
whether the more efficient \000a form for plain ASCII is really valid [the
PDF documentation is rather minimal on this].
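The byte-level result of the attached macros can be modelled outside TeX. Below is a minimal Python sketch; pdf_text_string is a hypothetical name, and the function is not part of ConTeXt. It prefixes the byte order mark FE FF (octal \376\377, as \PDFunicodetrigger does) and writes each UTF-16BE byte as a three-digit octal escape. The optional compact form writes ASCII as \000 followed by the character itself. As far as we read the PDF Reference, a backslash followed by one to three octal digits is an octal escape, so after \000 the escape is complete and the next character is taken literally. On that reading, \000a decodes to the bytes 00 61. Parentheses and backslash are still escaped in the sketch, because they are special in literal strings.

```python
def pdf_text_string(text: str, compact_ascii: bool = False) -> str:
    """Body of a PDF literal string (without the surrounding parentheses)
    holding text as UTF-16BE with a leading FE FF byte order mark.

    Each byte becomes a three-digit octal escape. With compact_ascii=True,
    printable ASCII is written as \\000 followed by the character itself
    (the "\\000a" form); "(", ")" and "\\" are still fully escaped.
    """
    out = ["\\376\\377"]  # FE FF: marks the string as UTF-16BE
    for ch in text:
        if compact_ascii and 0x20 <= ord(ch) <= 0x7E and ch not in "()\\":
            out.append("\\000" + ch)  # high byte 00, low byte literal
            continue
        for byte in ch.encode("utf-16-be"):
            out.append("\\" + format(byte, "03o"))
    return "".join(out)


print(pdf_text_string("Ař"))                      # \376\377\000\101\001\131
print(pdf_text_string("Ař", compact_ascii=True))  # \376\377\000A\001\131
```

The compact form saves three characters per ASCII letter, which is the efficiency gain Hans refers to.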

Hans  

[-- Attachment #2: unicode.pdf --]
[-- Type: application/pdf, Size: 4113 bytes --]

[-- Attachment #3: Unicode.tex --]
[-- Type: text/plain, Size: 4023 bytes --]

% output=pdftex

\def\octnumber#1%
  {\ifcase#1
     000\or 001\or 002\or 003\or 004\or 005\or 006\or 007\or 
     010\or 011\or 012\or 013\or 014\or 015\or 016\or 017\or 
     020\or 021\or 022\or 023\or 024\or 025\or 026\or 027\or 
     030\or 031\or 032\or 033\or 034\or 035\or 036\or 037\or 
     040\or 041\or 042\or 043\or 044\or 045\or 046\or 047\or 
     050\or 051\or 052\or 053\or 054\or 055\or 056\or 057\or 
     060\or 061\or 062\or 063\or 064\or 065\or 066\or 067\or 
     070\or 071\or 072\or 073\or 074\or 075\or 076\or 077\or 
     100\or 101\or 102\or 103\or 104\or 105\or 106\or 107\or
     110\or 111\or 112\or 113\or 114\or 115\or 116\or 117\or 
     120\or 121\or 122\or 123\or 124\or 125\or 126\or 127\or 
     130\or 131\or 132\or 133\or 134\or 135\or 136\or 137\or 
     140\or 141\or 142\or 143\or 144\or 145\or 146\or 147\or 
     150\or 151\or 152\or 153\or 154\or 155\or 156\or 157\or 
     160\or 161\or 162\or 163\or 164\or 165\or 166\or 167\or 
     170\or 171\or 172\or 173\or 174\or 175\or 176\or 177\or
     200\or 201\or 202\or 203\or 204\or 205\or 206\or 207\or 
     210\or 211\or 212\or 213\or 214\or 215\or 216\or 217\or 
     220\or 221\or 222\or 223\or 224\or 225\or 226\or 227\or 
     230\or 231\or 232\or 233\or 234\or 235\or 236\or 237\or 
     240\or 241\or 242\or 243\or 244\or 245\or 246\or 247\or 
     250\or 251\or 252\or 253\or 254\or 255\or 256\or 257\or 
     260\or 261\or 262\or 263\or 264\or 265\or 266\or 267\or 
     270\or 271\or 272\or 273\or 274\or 275\or 276\or 277\or 
     300\or 301\or 302\or 303\or 304\or 305\or 306\or 307\or 
     310\or 311\or 312\or 313\or 314\or 315\or 316\or 317\or 
     320\or 321\or 322\or 323\or 324\or 325\or 326\or 327\or 
     330\or 331\or 332\or 333\or 334\or 335\or 336\or 337\or 
     340\or 341\or 342\or 343\or 344\or 345\or 346\or 347\or 
     350\or 351\or 352\or 353\or 354\or 355\or 356\or 357\or 
     360\or 361\or 362\or 363\or 364\or 365\or 366\or 367\or 
     370\or 371\or 372\or 373\or 374\or 375\or 376\or 377\fi}

\startencoding[pdu]

\defineaccent ' A {\uchar{0}{193}}   % or {\octuchar{000}{301}}
\defineaccent ` A {\uchar{0}{192}}   % or {\octuchar{000}{300}}
\defineaccent ^ A {\uchar{0}{194}}   % or {\octuchar{000}{302}}
\defineaccent ~ A {\uchar{0}{195}}   % or {\octuchar{000}{303}}

\stopencoding

\unprotect 

\edef\PDFoctuchar#1#2%
  {\expandafter\firstoftwoarguments\string\\#1%
   \expandafter\firstoftwoarguments\string\\#2}

\def\PDFdecuchar#1#2%
  {\PDFoctuchar{\octnumber{#1}}{\octnumber{#2}}}

%def\PDFunicodetrigger{\PDFoctuchar{376}{377}} % fe ff signals unicode
\def\PDFunicodetrigger{\PDFdecuchar{254}{255}} % fe ff signals unicode

\bgroup
\catcode`!=\@@escape
\catcode92=\@@other
!gdef!dodopdfuni#1#2!fi!fi!fi{!fi!fi!fi!dopdfuni#1}

!gdef!dopdfuni#1#2#3#4#5%
  {!ifx#1!empty
     % done 
   !else!ifx#2!empty
    %!string!000#1% more efficient, but ok?
     !string!000\!octnumber{!ifnum`#1=1 32!else`#1!fi}%
   !else!ifx#1\%
     #1#2#3#4%
     !dodopdfuni{#5}%
   !else
    %!string!000#1% more efficient, but ok?
     !string!000\!octnumber{!ifnum`#1=1 32!else`#1!fi}%
     !dodopdfuni{#2#3#4#5}%
   !fi!fi!fi}
!egroup

\bgroup
\catcode`\^^M=\@@active
\gdef\enablePDFunicrlf%
  {\def\\{\PDFdecuchar{0}{13}}%
   \def\par{\\\\}%
   \catcode`\^^M=\@@active%
   \let^^M=\\}
\egroup

\def\enablePDFuniencoding%
  {\reducetocoding[pdu]\simplifycommands}

\long\def\sanitizePDFuniencoding#1\to#2%
  {\let\octuchar\PDFoctuchar 
   \let\decuchar\PDFdecuchar 
   \let\uchar   \PDFdecuchar 
   \enablePDFunicrlf
   \enablePDFuniencoding
   \edef#2{\PDFunicodetrigger#1}%
   \lccode` =1
   \lowercasestring#2\to#2% freeze spaces
  %\show#2%
   \edef#2{\expandafter\dopdfuni#2\empty\empty\empty\empty\empty\empty\empty}%
  %\lccode1=32
  %\lowercasestring#2\to#2%
  %\show#2%
   }

\protect 

\let\sanitizePDFdocencoding\sanitizePDFuniencoding % \useencoding[pdu]

\setupbodyfont[pos]
\setupinteraction[state=start]

\starttext 

\startcomment 
\'A \`A \^A \~A B C D E \TeX 
\'A \`A \^A \~A B C D E \TeX 
\'A \`A \^A \~A B C D E \TeX 
\stopcomment 

\input tufte 

\stoptext



* Re: unicode in bookmarks
From: Petr Ferdus @ 2000-11-28 10:53 UTC
  Cc: ntg-context

> In the attached file, conversion to Unicode is more or less implemented [some
> dirty code tricks, so don't ask me to explain them], and I'm still not sure
> whether the more efficient \000a form for plain ASCII is really valid [the
> PDF documentation is rather minimal on this].

Oops. By bookmarks I meant "outlines", not comments or
notes. Text in those isn't that vital to me at the moment, but thanks
anyway. I have played with your example a bit; some results, source
files and comments are at:
http://vertigo.fme.vutbr.cz/~pferdus/context/test03_unicode.pdf

What I need is text in outlines (bookmarks) in Czech
(besides searchable PDF files, I mean searchable including
accented letters). Would it be possible to create them using ConTeXt's
powerful commands?
Creating "searchable" PDF files with pdfTeX is already possible. I was
partially successful in generating them with the help of ConTeXt (currently,
a font that should be searchable must be "manipulated" after its use in
the PDF, and I was only able to "manipulate" the fonts I introduced myself;
I was not able to "globally manipulate" all the fonts ConTeXt uses)
(more details about font "manipulation" are in the file test03_unicode.pdf).
Perhaps this could be done more generally.

Thanks for your input.

Peter


* Re: unicode in bookmarks
From: Hans Hagen @ 2000-11-28 11:21 UTC
  Cc: ntg-context

At 11:53 AM 11/28/00 +0100, Petr Ferdus wrote:
>> In the attached file, conversion to Unicode is more or less implemented [some
>> dirty code tricks, so don't ask me to explain them], and I'm still not sure
>> whether the more efficient \000a form for plain ASCII is really valid [the
>> PDF documentation is rather minimal on this].
>
>Oops. By bookmarks I meant "outlines", not comments or
>notes. Text in those isn't that vital to me at the moment, but thanks
>anyway. I have played with your example a bit; some results, source
>files and comments are at:
>http://vertigo.fme.vutbr.cz/~pferdus/context/test03_unicode.pdf
>
>What I need is text in outlines (bookmarks) in Czech
>(besides searchable PDF files, I mean searchable including
>accented letters). Would it be possible to create them using ConTeXt's
>powerful commands?
>Creating "searchable" PDF files with pdfTeX is already possible. I was

Interesting. The last time I played with encoding vectors [actually, for
forms the framework is already there] and forced them into the file,
Acrobat failed hopelessly. [There are some low-level PDF commands that
can be used to force glyphs into the file, too.]

>partially successful in generating them with the help of ConTeXt (currently,
>a font that should be searchable must be "manipulated" after its use in
>the PDF, and I was only able to "manipulate" the fonts I introduced myself;
>I was not able to "globally manipulate" all the fonts ConTeXt uses)
>(more details about font "manipulation" are in the file test03_unicode.pdf).
>Perhaps this could be done more generally.
>
>Thanks for your input.

Bookmarks, comments, etc. all use pdfdoc encoding and/or Unicode, and I
found comments easier to test. So once we've solved the problem for
one, we've solved it for all. This solution is supposed to be general :-)

What is needed is an encoding file with accented characters and commands. Of
course, there is always \bookmark to override a title.

I'll have a look at your file. 

Hans

* Re: unicode in bookmarks / unicode searching support
From: Hans Hagen @ 2000-11-28 12:08 UTC
  Cc: ntg-context

[-- Attachment #1: Type: text/plain, Size: 560 bytes --]

At 11:53 AM 11/28/00 +0100, Petr Ferdus wrote:

>Perhaps this could be done more generally.

Well, it depends on how dependent this is on an encoding vector. I think
that in that case we need a specific Unicode definition section per encoding
vector. I did some tests and I think it can be implemented without
noticeable overhead. You may test the attached file [just include it at the top
of your document], and don't include the objects manually. This test hack may also be
of interest to Polish etc. users, I think.
Are there more such predefined lists of Unicode mappings?
Hans
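The thread does not mention this, but one readily available mapping list, assuming a Python installation, is the built-in cp1250 codec. The test file later in the thread uses textranslate=cp1250cs, so the input appears to be in that code page. The sketch below (cp1250_to_unicode is a hypothetical helper) pairs each defined byte 0x80-0xFF with its Unicode code point and splits the code point into the decimal high and low bytes that a \uchar-style entry would need.

```python
def cp1250_to_unicode() -> dict[int, int]:
    """Map every defined byte 0x80-0xFF of Windows code page 1250 to its
    Unicode code point, using Python's built-in cp1250 codec."""
    table = {}
    for byte in range(0x80, 0x100):
        try:
            table[byte] = ord(bytes([byte]).decode("cp1250"))
        except UnicodeDecodeError:
            pass  # the byte has no assigned character in cp1250
    return table


table = cp1250_to_unicode()
for byte in (0x8A, 0x9A, 0xD8, 0xF8):  # Š, š, Ř, ř in cp1250
    hi, lo = divmod(table[byte], 256)
    print(f"{byte:#04x} -> U+{table[byte]:04X} -> {{{hi}}}{{{lo}}}")
```

Undefined bytes are skipped rather than mapped to a replacement character, so missing entries stay visible.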

[-- Attachment #2: unitest.zip --]
[-- Type: application/zip, Size: 4561 bytes --]


* Re: unicode in bookmarks / unicode searching support
From: Petr Ferdus @ 2000-11-28 16:04 UTC
  Cc: ntg-context

On Tue, 28 Nov 2000, Hans Hagen wrote:
> Well, it depends on how dependent this is on an encoding vector. I think
> that in that case we need a specific Unicode definition section per encoding
> vector. I did some tests and I think it can be implemented without
> noticeable overhead. You may test the attached file [just include it at the top
> of your document], and don't include the objects manually. This test hack may also be
> of interest to Polish etc. users, I think.
Unfortunately, including unitest.tex did not help. In the resulting file
(test04contex.pdf), only the accented characters scaron, zcaron, yacute,
aacute, iacute, eacute, oacute and uacute, together with their uppercase
counterparts, are searchable. The file generated by the older "method"
(test04plain.pdf) has all accented characters searchable/copyable.
All files are located at:
http://vertigo.fme.vutbr.cz/~pferdus/context/
What results did you get when including unitest.tex?

> Are there more such predefined lists of Unicode mappings?
I am not aware of others. Perhaps Ondrej Koala Vacha <koala@fi.muni.cz>
might know more about it.

Thanks.
Peter



* Re: unicode in bookmarks / unicode searching support
From: Hans Hagen @ 2000-11-28 17:34 UTC
  Cc: ntg-context

At 05:04 PM 11/28/00 +0100, Petr Ferdus wrote:

>What results did you get when including unitest.tex?

Your fonts are preloaded at \everyjob time, so you need to invoke a font again
(after loading the file) to make sure that the object is written.

% output=pdftex textranslate=cp1250cs

\starttext

\input unitest.tex

\resetfontdefinitionfile[csr] \setupbodyfont[csr] % rather safe; \rm or
% \ss\rm would also have worked, I suppose

some unreadable czech code not handled by my mailer

\stoptext

Later we will work out a more convenient way. 

Hans

* Re: unicode in bookmarks / unicode searching support
From: Petr Ferdus @ 2000-11-28 22:46 UTC
  Cc: ntg-context

On Tue, 28 Nov 2000, Hans Hagen wrote:
> Your fonts are preloaded at \everyjob time, so you need to invoke a font again
> (after loading the file) to make sure that the object is written.
> 
> % output=pdftex textranslate=cp1250cs
> 
> \starttext
> 
> \input unitest.tex
> 
> \resetfontdefinitionfile[csr] \setupbodyfont[csr] % rather safe; \rm or
> % \ss\rm would also have worked, I suppose
> 
> some unreadable czech code not handled by my mailer
> 
> \stoptext
Sorry, but even with this fragment of code I wasn't able to generate
"searchable" results. It might be due to my ConTeXt setup; I can't compare with
another installation. Moreover, \resetfontdefinitionfile didn't accept the [csr]
parameter and was "eating" one character, as is clear from the
http://vertigo.fme.vutbr.cz/~pferdus/context/test05.pdf
example, which has check numbers in it. Using \resetfontdefinitionfile without a
parameter or playing with \rm or \ss\rm did not help either.

> Later we will work out a more convenient way. 
Looking forward to it.

Thanks,

Peter



* Re: unicode in bookmarks / unicode searching support
From: Hans Hagen @ 2000-11-29  8:54 UTC
  Cc: ntg-context

At 11:46 PM 11/28/00 +0100, Petr Ferdus wrote:
>On Tue, 28 Nov 2000, Hans Hagen wrote:
>> Your fonts are preloaded at \everyjob time, so you need to invoke a font again
>> (after loading the file) to make sure that the object is written.
>> 
>> % output=pdftex textranslate=cp1250cs
>> 
>> \starttext
>> 
>> \input unitest.tex
>> 
>> \resetfontdefinitionfile[csr] \setupbodyfont[csr] % rather safe; \rm or
>> % \ss\rm would also have worked, I suppose
>> 
>> some unreadable czech code not handled by my mailer
>> 
>> \stoptext
>Sorry, but even with this fragment of code I wasn't able to generate
>"searchable" results. It might be due to my ConTeXt setup; I can't compare with
>another installation. Moreover, \resetfontdefinitionfile didn't accept the [csr]

Looks like you are running an old ConTeXt; say \resetfontdefinitionfile{csr}
then. It was uninterfaced until recently.

[BTW, from now on I will make reloading the default and add a switch to
inhibit it; computers are fast these days.]

Hans
