unicode in bookmarks

ntg-context - mailing list for ConTeXt users

* unicode in bookmarks
From: Petr Ferdus @ 2000-11-27  3:07 UTC


Hi 
does anyone know whether it is possible to put Unicode characters into
bookmarks generated by ConTeXt? More generally, could the output of the
routines producing bookmark strings be made Unicode-encoded? I would
like to use some accented (Czech) characters there. It seems
to be possible only when they are entered in Unicode.

Thanks,
Peter


* Re: unicode in bookmarks
From: Petr Ferdus @ 2000-12-05 10:58 UTC
  Cc: ntg-context

On Fri, 1 Dec 2000, Hans Hagen wrote:

> For that, the Unicode list should be extended, since I defined only a few
> chars.

I used the rcaron letter to see what is going on, and it behaves partly in a way
I don't understand.
It seems to write a properly encoded octal representation of the rcaron
character to both annotation and bookmark objects, regardless of how I
"defineaccent-ed" it. But in both cases, the expected rcaron didn't show up in
the bookmark or annotation texts (there are dots instead).

Secondly, when used as regular text in the PDF, the two definitions of rcaron
produced different results (\'r showed as racute, \v{r} showed as rcaron).

The first question is what prevents a presumably well written/encoded
character from showing properly in a bookmark/annotation. The second is
how to efficiently write a bookmark/annotation string, preferably as one
input character for each output character, thus avoiding the \'r or \v{r} way of
writing them.
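The "one character per typed character" wish amounts to decoding the 8-bit input first and only then producing the UTF-16 string. A minimal Python sketch of that idea, outside TeX; it assumes the source is typed in ISO 8859-2 (Latin-2), where byte 0xF8 is rcaron, and the helper name latin2_title_to_pdf is hypothetical, not anything ConTeXt provides.

```python
"""Hypothetical sketch: Latin-2 title bytes -> PDF Unicode text string.

Assumption: the TeX source is typed in ISO 8859-2, so each input byte is one
letter (0xF8 is r-caron). This illustrates the idea; it is not ConTeXt code.
"""


def latin2_title_to_pdf(raw: bytes) -> str:
    """Return the body of a PDF literal string holding the title as UTF-16BE."""
    text = raw.decode("iso8859_2")   # one input byte -> one Unicode letter
    data = text.encode("utf-16-be")  # two bytes per letter, no byte order mark
    # FE FF (octal 376 377) flags the string as Unicode; every byte is then
    # written as a three-digit octal escape, which is valid in any PDF string.
    return "\\376\\377" + "".join("\\%03o" % byte for byte in data)


if __name__ == "__main__":
    # b"\xf8eka" is the Czech word "reka" (with r-caron) typed in ISO 8859-2.
    print(latin2_title_to_pdf(b"\xf8eka"))  # \376\377\001\131\000\145\000\153\000\141
```

Because every byte is escaped, parentheses and backslashes in a title need no special treatment.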

Thank you for your patience.

Peter

source:
http://vertigo.fme.vutbr.cz/~pferdus/context/test09a.tex

pdf output:
http://vertigo.fme.vutbr.cz/~pferdus/context/test09a.pdf

* Re: unicode in bookmarks
From: Hans Hagen @ 2000-12-01 17:33 UTC
  Cc: ntg-context

At 05:52 PM 12/1/00 +0100, Petr Ferdus wrote:
>On Wed, 29 Nov 2000, Hans Hagen wrote:
>
>results. (Perhaps now is the time to extend and apply the code
>from your first reply, with the note example, to process the string fed to
>the bookmark.) Is that correct? Or did unitable change the situation somehow?

For that, the Unicode list should be extended, since I defined only a few
chars.

Hans 
-------------------------------------------------------------------------
                                                  Hans Hagen | PRAGMA ADE
                      Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
 tel: +31 (0)38 477 53 69 | fax: +31 (0)38 477 53 74 | www.pragma-ade.com
-------------------------------------------------------------------------


* Re: unicode in bookmarks
From: Hans Hagen @ 2000-11-28 11:21 UTC
  Cc: ntg-context

At 11:53 AM 11/28/00 +0100, Petr Ferdus wrote:
>> In the attached file, conversion to Unicode is kind of implemented [some
>> dirty code tricks, so don't ask me to explain them] and I'm still not sure if
>> the more efficient \000a kind of plain ASCII is really valid [the PDF
>> documentation is rather minimal].
>
>Oops. Talking about bookmarks, I was thinking of "outlines", not comments or
>notes. Text in those isn't that vital to me at the moment, but thanks
>anyway. I have played with your example a bit; some results, source
>files and comments are at:
>http://vertigo.fme.vutbr.cz/~pferdus/context/test03_unicode.pdf
>
>What I need is to have text in outlines (bookmarks) in Czech
>(besides having searchable PDF files, I mean searchable including
>accented letters). Would it be possible to create them using ConTeXt's
>potent commands?
>Creating "searchable" PDF files with pdfTeX is already possible. I was

Interesting. The last time I played with encoding vectors [actually, for
forms the framework is already there] and forcing them into the file,
Acrobat failed hopelessly. [There are some low-level PDF commands that
can be used to force glyphs into the file too.]

>partially successful in generating them with the help of ConTeXt (currently,
>a font that should be searchable must be "manipulated" after its use in the
>PDF, and I was able to "manipulate" only the fonts I had introduced;
>I was not able to "globally manipulate" all the fonts ConTeXt uses)
>(more details about font "manipulation" are in the file test03_unicode.pdf).
>Perhaps this could be done more generally.
>
>Thanks for your input.

The bookmarks, comments, etc. all use pdfdoc encoding and/or Unicode, and I
found comments easier to test. So once we've solved the problem for
one, we've solved it for all. This solution is supposed to be general :-)

What is needed is an encoding file with accented chars and commands. Of
course, there is always \bookmark to override a title.

I'll have a look at your file. 

Hans

* Re: unicode in bookmarks
From: Petr Ferdus @ 2000-11-28 10:53 UTC
  Cc: ntg-context

> In the attached file, conversion to Unicode is kind of implemented [some
> dirty code tricks, so don't ask me to explain them] and I'm still not sure if
> the more efficient \000a kind of plain ASCII is really valid [the PDF
> documentation is rather minimal].

Oops. Talking about bookmarks, I was thinking of "outlines", not comments or
notes. Text in those isn't that vital to me at the moment, but thanks
anyway. I have played with your example a bit; some results, source
files and comments are at:
http://vertigo.fme.vutbr.cz/~pferdus/context/test03_unicode.pdf

What I need is to have text in outlines (bookmarks) in Czech
(besides having searchable PDF files, I mean searchable including
accented letters). Would it be possible to create them using ConTeXt's
potent commands?
Creating "searchable" PDF files with pdfTeX is already possible. I was
partially successful in generating them with the help of ConTeXt (currently,
a font that should be searchable must be "manipulated" after its use in the
PDF, and I was able to "manipulate" only the fonts I had introduced;
I was not able to "globally manipulate" all the fonts ConTeXt uses)
(more details about font "manipulation" are in the file test03_unicode.pdf).
Perhaps this could be done more generally.

Thanks for your input.

Peter

* Re: unicode in bookmarks
From: Hans Hagen @ 2000-11-27 15:28 UTC
  Cc: ntg-context

[-- Attachment #1: Type: text/plain, Size: 1034 bytes --]

At 04:07 AM 11/27/00 +0100, Petr Ferdus wrote:
>Hi 
>does anyone know whether it is possible to put Unicode characters into
>bookmarks generated by ConTeXt? More generally, could the output of the
>routines producing bookmark strings be made Unicode-encoded? I would
>like to use some accented (Czech) characters there. It seems
>to be possible only when they are entered in Unicode.

You can have Unicode indeed; see the attached file. But the bad news is that
if you want this kind of support right now, you and/or others will have to
collect the data that should go into the pdu encoding vector, since I don't
have complete Unicode tables. The values should be decimal [more efficient in
TeX] or octal [slightly faster but ugly]. Then, of course, some testing has
to be done.
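The data asked for here is, per letter, the high and low byte of its UTF-16 code unit, written in decimal for \uchar and in octal for \octuchar, as in the attached file's line \defineaccent ' A {\uchar{0}{193}}. Such lines could be generated rather than typed by hand. A minimal Python sketch; the helper accent_line is hypothetical, and it assumes that v names the caron accent (as in \v{r}) and that the output format of the attached file is the one wanted.

```python
"""Hypothetical generator for [pdu] encoding-vector lines (illustration only)."""
import unicodedata

# Combining marks for the accent commands \' (acute) and \v (caron); assumed mapping.
COMBINING = {"'": "\u0301", "v": "\u030c"}


def accent_line(accent: str, base: str) -> str:
    """Build one \\defineaccent line with decimal and octal byte pairs."""
    letter = unicodedata.normalize("NFC", base + COMBINING[accent])
    if len(letter) != 1 or ord(letter) > 0xFFFF:
        raise ValueError(f"no single BMP letter for \\{accent}{{{base}}}")
    high, low = divmod(ord(letter), 256)  # the two bytes of the UTF-16 code unit
    return (f"\\defineaccent {accent} {base} {{\\uchar{{{high}}}{{{low}}}}}"
            f"   % or {{\\octuchar{{{high:03o}}}{{{low:03o}}}}}")


if __name__ == "__main__":
    # Czech lowercase letters with caron and acute; uppercase bases work the same way.
    for accent, bases in (("v", "cdenrstz"), ("'", "aeiouy")):
        for base in bases:
            print(accent_line(accent, base))
```

Calling accent_line("'", "A") reproduces the attached file's \defineaccent ' A line, including its octal comment, and accent_line("v", "r") gives \uchar{1}{89} for rcaron (U+0159). Letters without a precomposed form raise an error instead of producing a wrong entry.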

In the attached file, conversion to Unicode is kind of implemented [some
dirty code tricks, so don't ask me to explain them] and I'm still not sure if
the more efficient \000a kind of plain ASCII is really valid [the PDF
documentation is rather minimal].
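To make the \000a question concrete: after the FE FF marker a PDF text string holds UTF-16BE bytes, and in a literal string an unescaped character stands for its own byte, so \000a and \000\141 should denote the same two bytes (00 61). The Python sketch below is a hypothetical illustration, not the TeX macro in the attachment; it writes a title both ways and decodes the results so they can be compared. The function names are invented for this example.

```python
"""Hypothetical sketch of a PDF Unicode text string (not ConTeXt's macro)."""

BOM = "\\376\\377"  # bytes FE FF: the string is UTF-16BE (cf. \PDFunicodetrigger)


def octal(byte: int) -> str:
    """Three-digit octal escape for one byte, e.g. 89 -> \\131."""
    return "\\%03o" % byte


def pdf_utf16_literal(text: str, compact: bool = False) -> str:
    """Body of a PDF literal string holding text as UTF-16BE.

    compact=False escapes every byte ("\\000\\141" for "a"); compact=True
    writes printable ASCII after a zero high byte as itself ("\\000a").
    """
    data = text.encode("utf-16-be")
    out = [BOM]
    for i in range(0, len(data), 2):
        high, low = data[i], data[i + 1]
        out.append(octal(high))
        if compact and high == 0 and 0x20 <= low < 0x7F and chr(low) not in "()\\":
            out.append(chr(low))  # parentheses and backslash still get escaped
        else:
            out.append(octal(low))
    return "".join(out)


def decode_pdf_literal(body: str) -> bytes:
    """Bytes of a literal-string body that uses only \\ddd escapes and plain ASCII."""
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] == "\\":
            out.append(int(body[i + 1:i + 4], 8))  # exactly three octal digits
            i += 4
        else:
            out.append(ord(body[i]))
            i += 1
    return bytes(out)


if __name__ == "__main__":
    full = pdf_utf16_literal("\u0159a")  # r-caron followed by "a"
    short = pdf_utf16_literal("\u0159a", compact=True)
    print(full)   # \376\377\001\131\000\141
    print(short)  # \376\377\001\131\000a
    print(decode_pdf_literal(full) == decode_pdf_literal(short))  # True
```

Both forms decode to the same bytes; the compact one only saves characters in the file, which is the efficiency being weighed above.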

Hans  

[-- Attachment #2: unicode.pdf --]
[-- Type: application/pdf, Size: 4113 bytes --]

[-- Attachment #3: Unicode.tex --]
[-- Type: text/plain, Size: 4023 bytes --]

% output=pdftex

\def\octnumber#1%
  {\ifcase#1
     000\or 001\or 002\or 003\or 004\or 005\or 006\or 007\or 
     010\or 011\or 012\or 013\or 014\or 015\or 016\or 017\or 
     020\or 021\or 022\or 023\or 024\or 025\or 026\or 027\or 
     030\or 031\or 032\or 033\or 034\or 035\or 036\or 037\or 
     040\or 041\or 042\or 043\or 044\or 045\or 046\or 047\or 
     050\or 051\or 052\or 053\or 054\or 055\or 056\or 057\or 
     060\or 061\or 062\or 063\or 064\or 065\or 066\or 067\or 
     070\or 071\or 072\or 073\or 074\or 075\or 076\or 077\or 
     100\or 101\or 102\or 103\or 104\or 105\or 106\or 107\or
     110\or 111\or 112\or 113\or 114\or 115\or 116\or 117\or 
     120\or 121\or 122\or 123\or 124\or 125\or 126\or 127\or 
     130\or 131\or 132\or 133\or 134\or 135\or 136\or 137\or 
     140\or 141\or 142\or 143\or 144\or 145\or 146\or 147\or 
     150\or 151\or 152\or 153\or 154\or 155\or 156\or 157\or 
     160\or 161\or 162\or 163\or 164\or 165\or 166\or 167\or 
     170\or 171\or 172\or 173\or 174\or 175\or 176\or 177\or
     200\or 201\or 202\or 203\or 204\or 205\or 206\or 207\or 
     210\or 211\or 212\or 213\or 214\or 215\or 216\or 217\or 
     220\or 221\or 222\or 223\or 224\or 225\or 226\or 227\or 
     230\or 231\or 232\or 233\or 234\or 235\or 236\or 237\or 
     240\or 241\or 242\or 243\or 244\or 245\or 246\or 247\or 
     250\or 251\or 252\or 253\or 254\or 255\or 256\or 257\or 
     260\or 261\or 262\or 263\or 264\or 265\or 266\or 267\or 
     270\or 271\or 272\or 273\or 274\or 275\or 276\or 277\or 
     300\or 301\or 302\or 303\or 304\or 305\or 306\or 307\or 
     310\or 311\or 312\or 313\or 314\or 315\or 316\or 317\or 
     320\or 321\or 322\or 323\or 324\or 325\or 326\or 327\or 
     330\or 331\or 332\or 333\or 334\or 335\or 336\or 337\or 
     340\or 341\or 342\or 343\or 344\or 345\or 346\or 347\or 
     350\or 351\or 352\or 353\or 354\or 355\or 356\or 357\or 
     360\or 361\or 362\or 363\or 364\or 365\or 366\or 367\or 
     370\or 371\or 372\or 373\or 374\or 375\or 376\or 377\fi}

\startencoding[pdu]

\defineaccent ' A {\uchar{0}{193}}   % or {\octuchar{000}{301}}
\defineaccent ` A {\uchar{0}{192}}
\defineaccent ^ A {\uchar{0}{194}}
\defineaccent ~ A {\uchar{0}{195}}

\stopencoding

\unprotect 

\edef\PDFoctuchar#1#2%
  {\expandafter\firstoftwoarguments\string\\#1%
   \expandafter\firstoftwoarguments\string\\#2}

\def\PDFdecuchar#1#2%
  {\PDFoctuchar{\octnumber{#1}}{\octnumber{#2}}}

%def\PDFunicodetrigger{\PDFoctuchar{376}{377}} % fe ff signals unicode
\def\PDFunicodetrigger{\PDFdecuchar{254}{255}} % fe ff signals unicode

\bgroup
\catcode`!=\@@escape
\catcode92=\@@other
!gdef!dodopdfuni#1#2!fi!fi!fi{!fi!fi!fi!dopdfuni#1}

!gdef!dopdfuni#1#2#3#4#5%
  {!ifx#1!empty
     % done 
   !else!ifx#2!empty
    %!string!000#1% more efficient, but ok?
     !string!000\!octnumber{!ifnum`#1=1 32!else`#1!fi}%
   !else!ifx#1\%
     #1#2#3#4%
     !dodopdfuni{#5}%
   !else
    %!string!000#1% more efficient, but ok?
     !string!000\!octnumber{!ifnum`#1=1 32!else`#1!fi}%
     !dodopdfuni{#2#3#4#5}%
   !fi!fi!fi}
!egroup

\bgroup
\catcode`\^^M=\@@active
\gdef\enablePDFunicrlf%
  {\def\\{\PDFdecuchar{0}{13}}%
   \def\par{\\\\}%
   \catcode`\^^M=\@@active%
   \let^^M=\\}
\egroup

\def\enablePDFuniencoding%
  {\reducetocoding[pdu]\simplifycommands}

\long\def\sanitizePDFuniencoding#1\to#2%
  {\let\octuchar\PDFoctuchar 
   \let\decuchar\PDFdecuchar 
   \let\uchar   \PDFdecuchar 
   \enablePDFunicrlf
   \enablePDFuniencoding
   \edef#2{\PDFunicodetrigger#1}%
   \lccode` =1
   \lowercasestring#2\to#2% freeze spaces
  %\show#2%
   \edef#2{\expandafter\dopdfuni#2\empty\empty\empty\empty\empty\empty\empty}%
  %\lccode1=32
  %\lowercasestring#2\to#2%
  %\show#2%
   }

\protect 

\let\sanitizePDFdocencoding\sanitizePDFuniencoding % \useencoding[pdu]

\setupbodyfont[pos]
\setupinteraction[state=start]

\starttext 

\startcomment 
\'A \`A \^A \~A B C D E \TeX 
\'A \`A \^A \~A B C D E \TeX 
\'A \`A \^A \~A B C D E \TeX 
\stopcomment 

\input tufte 

\stoptext

* Re: unicode in bookmarks
From: Hans Hagen @ 2000-11-27 10:20 UTC
  Cc: ntg-context

At 04:07 AM 11/27/00 +0100, Petr Ferdus wrote:
>Hi 
>does anyone know whether it is possible to put Unicode characters into
>bookmarks generated by ConTeXt? More generally, could the output of the
>routines producing bookmark strings be made Unicode-encoded? I would
>like to use some accented (Czech) characters there. It seems
>to be possible only when they are entered in Unicode.

I must admit that I never looked into it, but it should not be that hard to
implement; it probably involves some parsing and mapping.

What characters are problematic? \'y and the like are already handled by
the pdfdoc encoding.
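Which Czech letters are actually problematic can be checked mechanically. The sketch below approximates the letters of PDFDocEncoding as ASCII, the Latin-1 range U+00A1 to U+00FF, and the few extra letters the PDFDocEncoding table adds (Ł, Œ, Š, Ÿ, Ž, ı, ł, œ, š, ž). That approximation, and the helper names, are assumptions of this illustration rather than a full implementation of the encoding.

```python
"""Rough check of which letters fit PDFDocEncoding (an approximation only)."""

# Letters PDFDocEncoding has outside its Latin-1 part (assumed list).
PDFDOC_EXTRA_LETTERS = set("ŁŒŠŸŽıłœšž")

CZECH_ACCENTED = "áčďéěíňóřšťúůýž"


def in_pdfdoc(ch: str) -> bool:
    """True if ch can presumably be written in a pdfdoc-encoded string."""
    code = ord(ch)
    return 0x20 <= code < 0x7F or 0xA1 <= code <= 0xFF or ch in PDFDOC_EXTRA_LETTERS


def needs_unicode(text: str) -> str:
    """Characters of text that would force a UTF-16 (FE FF) string instead."""
    return "".join(ch for ch in text if not in_pdfdoc(ch))


if __name__ == "__main__":
    print(needs_unicode(CZECH_ACCENTED))  # čďěňřťů
```

Under this approximation, the acute letters (including \'y) and š, ž fit the pdfdoc encoding, while rcaron and the other caron and ring letters need the Unicode route.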

Hans 