* Re: unicode in bookmarks
From: Hans Hagen @ 2000-11-27 10:20 UTC
  Cc: ntg-context

At 04:07 AM 11/27/00 +0100, Petr Ferdus wrote:
>Hi
>does someone know, if it is possible to put unicode characters to
>bookmarks generated by Context? Or more generally if the output of
>routines producing bookmark strings could be made unicode encoded. I would
>like to introduce some accented (Czech) characters there. It seems
>to be possible only while they are entered in unicode.

I must admit that I never looked into it, but it should not be that hard to
implement; it probably involves some parsing and mapping. What characters
are problematic? \'y and the like are already handled by the pdfdoc encoding.

Hans

-------------------------------------------------------------------------
                                  Hans Hagen | PRAGMA ADE
              Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
 tel: +31 (0)38 477 53 69 | fax: +31 (0)38 477 53 74 | www.pragma-ade.com
-------------------------------------------------------------------------
* Re: unicode in bookmarks
From: Hans Hagen @ 2000-11-27 15:28 UTC
  Cc: ntg-context

[-- Attachment #1: Type: text/plain, Size: 1034 bytes --]

At 04:07 AM 11/27/00 +0100, Petr Ferdus wrote:
>Hi
>does someone know, if it is possible to put unicode characters to
>bookmarks generated by Context? Or more generally if the output of
>routines producing bookmark strings could be made unicode encoded. I would
>like to introduce some accented (Czech) characters there. It seems
>to be possible only while they are entered in unicode.

You can indeed have unicode, see the attached file. The bad news is that if
you want this kind of support right now, you and/or others will have to
collect the data that should go into the pdu encoding vector, since I don't
have complete unicode tables. The values should be decimal [more efficient
in tex] or octal [slightly faster but ugly]. Then of course some testing has
to be done.

In the attached file, conversion to unicode is kind of implemented [some
dirty code tricks, so don't ask me to explain it], and I'm still not sure
whether the more efficient \000a form for normal ascii is really valid [the
pdf documentation is rather minimal].
Hans

[-- Attachment #2: unicode.pdf --]
[-- Type: application/pdf, Size: 4113 bytes --]

[-- Attachment #3: Unicode.tex --]
[-- Type: text/plain, Size: 4023 bytes --]

% output=pdftex

\def\octnumber#1%
  {\ifcase#1
     000\or 001\or 002\or 003\or 004\or 005\or 006\or 007\or
     010\or 011\or 012\or 013\or 014\or 015\or 016\or 017\or
     020\or 021\or 022\or 023\or 024\or 025\or 026\or 027\or
     030\or 031\or 032\or 033\or 034\or 035\or 036\or 037\or
     040\or 041\or 042\or 043\or 044\or 045\or 046\or 047\or
     050\or 051\or 052\or 053\or 054\or 055\or 056\or 057\or
     060\or 061\or 062\or 063\or 064\or 065\or 066\or 067\or
     070\or 071\or 072\or 073\or 074\or 075\or 076\or 077\or
     100\or 101\or 102\or 103\or 104\or 105\or 106\or 107\or
     110\or 111\or 112\or 113\or 114\or 115\or 116\or 117\or
     120\or 121\or 122\or 123\or 124\or 125\or 126\or 127\or
     130\or 131\or 132\or 133\or 134\or 135\or 136\or 137\or
     140\or 141\or 142\or 143\or 144\or 145\or 146\or 147\or
     150\or 151\or 152\or 153\or 154\or 155\or 156\or 157\or
     160\or 161\or 162\or 163\or 164\or 165\or 166\or 167\or
     170\or 171\or 172\or 173\or 174\or 175\or 176\or 177\or
     200\or 201\or 202\or 203\or 204\or 205\or 206\or 207\or
     210\or 211\or 212\or 213\or 214\or 215\or 216\or 217\or
     220\or 221\or 222\or 223\or 224\or 225\or 226\or 227\or
     230\or 231\or 232\or 233\or 234\or 235\or 236\or 237\or
     240\or 241\or 242\or 243\or 244\or 245\or 246\or 247\or
     250\or 251\or 252\or 253\or 254\or 255\or 256\or 257\or
     260\or 261\or 262\or 263\or 264\or 265\or 266\or 267\or
     270\or 271\or 272\or 273\or 274\or 275\or 276\or 277\or
     300\or 301\or 302\or 303\or 304\or 305\or 306\or 307\or
     310\or 311\or 312\or 313\or 314\or 315\or 316\or 317\or
     320\or 321\or 322\or 323\or 324\or 325\or 326\or 327\or
     330\or 331\or 332\or 333\or 334\or 335\or 336\or 337\or
     340\or 341\or 342\or 343\or 344\or 345\or 346\or 347\or
     350\or 351\or 352\or 353\or 354\or 355\or 356\or 357\or
     360\or 361\or 362\or 363\or 364\or 365\or 366\or 367\or
     370\or 371\or 372\or 373\or 374\or 375\or 376\or
     377\fi}

\startencoding[pdu]

\defineaccent ' A {\uchar{0}{193}} % Aacute, U+00C1; or {\octuchar{000}{301}}
\defineaccent ` A {\uchar{0}{192}} % Agrave, U+00C0
\defineaccent ^ A {\uchar{0}{194}} % Acircumflex, U+00C2
\defineaccent ~ A {\uchar{0}{195}} % Atilde, U+00C3

\stopencoding

\unprotect

\edef\PDFoctuchar#1#2%
  {\expandafter\firstoftwoarguments\string\\#1%
   \expandafter\firstoftwoarguments\string\\#2}

\def\PDFdecuchar#1#2%
  {\PDFoctuchar{\octnumber{#1}}{\octnumber{#2}}}

%def\PDFunicodetrigger{\PDFoctuchar{376}{377}} % fe ff signals unicode
\def\PDFunicodetrigger{\PDFdecuchar{254}{255}} % fe ff signals unicode

\bgroup
\catcode`!=\@@escape
\catcode92=\@@other

!gdef!dodopdfuni#1#2!fi!fi!fi{!fi!fi!fi!dopdfuni#1}

!gdef!dopdfuni#1#2#3#4#5%
  {!ifx#1!empty
     % done
   !else!ifx#2!empty
     %!string!000#1% more efficient, but ok?
     !string!000\!octnumber{!ifnum`#1=1 32!else`#1!fi}%
   !else!ifx#1\%
     #1#2#3#4%
     !dodopdfuni{#5}%
   !else
     %!string!000#1% more efficient, but ok?
     !string!000\!octnumber{!ifnum`#1=1 32!else`#1!fi}%
     !dodopdfuni{#2#3#4#5}%
   !fi!fi!fi}

!egroup

\bgroup
\catcode`\^^M=\@@active
\gdef\enablePDFunicrlf%
  {\def\\{\PDFdecuchar{0}{13}}%
   \def\par{\\\\}%
   \catcode`\^^M=\@@active%
   \let^^M=\\}
\egroup

\def\enablePDFuniencoding%
  {\reducetocoding[pdu]\simplifycommands}

\long\def\sanitizePDFuniencoding#1\to#2%
  {\let\octuchar\PDFoctuchar
   \let\decuchar\PDFdecuchar
   \let\uchar \PDFdecuchar
   \enablePDFunicrlf
   \enablePDFuniencoding
   \edef#2{\PDFunicodetrigger#1}%
   \lccode` =1 \lowercasestring#2\to#2% freeze spaces
   %\show#2%
   \edef#2{\expandafter\dopdfuni#2\empty\empty\empty\empty\empty\empty\empty}%
   %\lccode1=32
   %\lowercasestring#2\to#2%
   %\show#2%
   }

\protect

\let\sanitizePDFdocencoding\sanitizePDFuniencoding

% \useencoding[pdu]

\setupbodyfont[pos]

\setupinteraction[state=start]

\starttext

\startcomment
\'A \`A \^A \~A B C D E \TeX
\'A \`A \^A \~A B C D E \TeX
\'A \`A \^A \~A B C D E \TeX
\stopcomment

\input tufte

\stoptext
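As a reading aid for the macros in the message above: a PDF text string that is not in PDFDocEncoding begins with the two bytes FE FF (decimal 254 255, octal 376 377) and continues in UTF-16BE, and inside a literal string any byte may be written as a three-digit octal escape. The Python sketch below is not part of the original attachment; the function name `pdf_unicode_string` is made up for illustration, and it only shows the byte-level form the TeX code appears to be aiming for. It writes every byte as an octal escape, which sidesteps the open question of whether the shorter \000a form is valid.

```python
def pdf_unicode_string(text: str) -> str:
    """Return the body of a PDF literal string for `text`.

    The result is the FE FF marker followed by the UTF-16BE bytes of
    `text`, with every byte written as a three-digit octal escape.
    (Hypothetical helper, for illustration only.)
    """
    data = b"\xfe\xff" + text.encode("utf-16-be")
    return "".join("\\%03o" % byte for byte in data)


if __name__ == "__main__":
    # Aacute is U+00C1: high byte 0, low byte 193 (octal 301).
    print(pdf_unicode_string("\u00c1"))  # \376\377\000\301
    # ccaron is U+010D: high byte 1, low byte 13 (octal 015).
    print(pdf_unicode_string("\u010d"))  # \376\377\001\015
```

The first output corresponds to \PDFunicodetrigger followed by \uchar{0}{193} in the test file: the trigger contributes \376\377 and the character contributes \000\301.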
* Re: unicode in bookmarks
From: Petr Ferdus @ 2000-11-28 10:53 UTC
  Cc: ntg-context

> In the attached file, conversion to unicode is kind of implemented [some
> dirty code tricks, so don't ask me to explain it] and i'm still not sure if
> the more efficient \000a kind of normal ascii is really valid. [rather
> minimal, the pdf documentation].

Oops. Talking about bookmarks I thought of "outlines", not comments or
notes. Text in those isn't that vital to me at the moment but thanks
anyway. I have played with your example a bit; some results, source
files and comments are at:
http://vertigo.fme.vutbr.cz/~pferdus/context/test03_unicode.pdf

What I need is to have text in outlines (bookmarks) in Czech
(besides having searchable pdf files; I mean searchable including
accented letters). Would that be possible, while using Context's
potent commands, to create them?
Creating "searchable" pdf files with pdftex is already possible. I was
partially successful in their generation with help of Context (currently,
a font which should be searchable must be "manipulated" after its use in
pdf, and I was able to "manipulate" only those fonts I have introduced.
I was not able to "globally manipulate" all fonts Context uses)
(more details about font "manipulation" are in file test03_unicode.pdf).
Perhaps this could be done more generally.

Thanks for your input.

Peter
* Re: unicode in bookmarks
From: Hans Hagen @ 2000-11-28 11:21 UTC
  Cc: ntg-context

At 11:53 AM 11/28/00 +0100, Petr Ferdus wrote:
>> In the attached file, conversion to unicode is kind of implemented [some
>> dirty code tricks, so don't ask me to explain it] and i'm still not sure if
>> the more efficient \000a kind of normal ascii is really valid. [rather
>> minimal, the pdf documentation].
>
>Oops. Talking about bookmarks I thought of "outlines", not comments or
>notes. Text in those isn't that vital to me at the moment but thanks
>anyway. I have played with your example a bit; some results, source
>files and comments are at:
>http://vertigo.fme.vutbr.cz/~pferdus/context/test03_unicode.pdf
>
>What I need is to have text in outlines (bookmarks) in Czech
>(besides having searchable pdf files; I mean searchable including
>accented letters). Would that be possible, while using Context's
>potent commands, to create them?
>Creating "searchable" pdf files with pdftex is already possible. I was

Interesting. The last time I played with encoding vectors [actually for
forms the framework is already there] and forcing them into the file,
Acrobat failed hopelessly. [There are some low level pdf commands that can
be used to force glyphs into the file too.]

>partially successful in their generation with help of Context (currently,
>a font which should be searchable must be "manipulated" after its use in
>pdf, and I was able to "manipulate" only those fonts I have introduced.
>I was not able to "globally manipulate" all fonts Context uses)
>(more details about font "manipulation" are in file test03_unicode.pdf).
>Perhaps this could be done more generally.
>
>Thanks for your input.

The bookmarks, comments, etc. all use pdfdoc encoding and/or unicode, and I
found comments easier to test. So, once we've solved the problem for one,
we've solved it for all. This solution is supposed to be general -)

What is needed is an encoding file with accented chars and commands. Of
course there is always \bookmark to overload a title.

I'll have a look at your file.

Hans
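The "pdfdoc encoding and/or unicode" choice can be pictured per string. As far as I read the PDF reference, PDFDocEncoding agrees with ISO Latin-1 for printable ASCII and for the codes 0xA1 to 0xFF (0xAD being unassigned), which is why \'y needs no unicode while a letter such as ccaron (U+010D) does. The Python sketch below makes that decision conservatively: it treats only that subset as safe for pdfdoc, so a few characters PDFDocEncoding does cover elsewhere (scaron, for instance) are sent to unicode as well. Both function names are hypothetical and not part of ConTeXt.

```python
def fits_pdfdoc_conservatively(text: str) -> bool:
    """True if every character is printable ASCII or in U+00A1..U+00FF.

    U+00AD is excluded because that slot is assumed to be unassigned in
    PDFDocEncoding. This is a deliberately narrow subset, not the full
    PDFDocEncoding table.
    """
    for ch in text:
        cp = ord(ch)
        ascii_ok = 0x20 <= cp <= 0x7E
        latin_ok = 0xA1 <= cp <= 0xFF and cp != 0xAD
        if not (ascii_ok or latin_ok):
            return False
    return True


def choose_bookmark_encoding(text: str) -> str:
    """Pick 'pdfdoc' when the conservative subset suffices, else 'unicode'."""
    return "pdfdoc" if fits_pdfdoc_conservatively(text) else "unicode"


if __name__ == "__main__":
    print(choose_bookmark_encoding("Title"))    # plain ASCII
    print(choose_bookmark_encoding("\u00fd"))   # yacute
    print(choose_bookmark_encoding("\u010d"))   # ccaron
```

A string that falls through to "unicode" would then need the FE FF marker and UTF-16BE form rather than single-byte codes.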
* Re: unicode in bookmarks / unicode searching support
From: Hans Hagen @ 2000-11-28 12:08 UTC
  Cc: ntg-context

[-- Attachment #1: Type: text/plain, Size: 560 bytes --]

At 11:53 AM 11/28/00 +0100, Petr Ferdus wrote:

>Perhaps this could be done more generally.

Well, it depends on how dependent this is on an encoding vector. I think
that in that case we need some specific unicode def section per encoding
vector. I did some tests and I think it can be implemented without
noticeable overhead. You may test the attached file [just include it on top
of your doc] and don't include obj's manually. This test hack may also be
of interest to Polish etc. users, I think.

Are there more such predefined lists of unicode mappings?

Hans

[-- Attachment #2: unitest.zip --]
[-- Type: application/zip, Size: 4561 bytes --]
* Re: unicode in bookmarks / unicode searching support
From: Petr Ferdus @ 2000-11-28 16:04 UTC
  Cc: ntg-context

On Tue, 28 Nov 2000, Hans Hagen wrote:

> Well, it depends on how dependent this is on an encoding vector. I think
> that in that case we need some specific unicode def section per encoding
> vector. I did some tests and I think it can be implemented without
> noticeable overhead. You may test the attached file [just include it on top
> of your doc] and don't include obj's manually. This test hack may also be
> of interest to Polish etc. users, I think.

Unfortunately, including unitest.tex did not help. In the resulting file
(test04contex.pdf) only the accented characters scaron, zcaron, yacute,
aacute, iacute, eacute, oacute and uacute are searchable, together with
their uppercase relatives. The file generated by the older "method"
(test04plain.pdf) has all accented characters searchable/copyable.
All files are located at:
http://vertigo.fme.vutbr.cz/~pferdus/context/

Which results did you get with inclusion of unitest.tex?

> Are there more such predefined lists of unicode mappings?

I am not aware of others. Perhaps Ondrej Koala Vacha <koala@fi.muni.cz>
might know more about it.

Thanks.

Peter
* Re: unicode in bookmarks / unicode searching support
From: Hans Hagen @ 2000-11-28 17:34 UTC
  Cc: ntg-context

At 05:04 PM 11/28/00 +0100, Petr Ferdus wrote:

>Which results did you get with inclusion of unitest.tex?

Your fonts are preloaded at \everyjob time, so you need to invoke a font to
make sure that the object is written.

% output=pdftex textranslate=cp1250cs

\starttext

\input unitest.tex

\resetfontdefinitionfile[csr] \setupbodyfont[csr] % rather safe, \rm or
                                                  % \ss\rm would also have worked i suppose

some unreadable czech code not handled by my mailer

\stoptext

Later we will work out a more convenient way.

Hans
* Re: unicode in bookmarks / unicode searching support
From: Petr Ferdus @ 2000-11-28 22:46 UTC
  Cc: ntg-context

On Tue, 28 Nov 2000, Hans Hagen wrote:

> Your fonts are preloaded at \everyjob time, so you need to invoke a font to
> make sure that the object is written.
>
> % output=pdftex textranslate=cp1250cs
>
> \starttext
>
> \input unitest.tex
>
> \resetfontdefinitionfile[csr] \setupbodyfont[csr] % rather safe, \rm or
> % \ss\rm would also have worked i suppose
>
> some unreadable czech code not handled by my mailer
>
> \stoptext

Sorry, but even with this fragment of code I wasn't able to generate
"searchable" results. It might be due to my Context setup, I can't compare with
some other installation, but moreover \resetfontdefinitionfile hasn't accepted [csr]
as a parameter and was "eating" one character, as is clear from the
http://vertigo.fme.vutbr.cz/~pferdus/context/test05.pdf
example, with check numbers in it. Using \resetfontdefinitionfile without
a parameter or playing with \rm or \ss\rm did not help either.

> Later we will work out a more convenient way.

Looking forward to it.

Thanks,
Peter
* Re: unicode in bookmarks / unicode searching support
From: Hans Hagen @ 2000-11-29 8:54 UTC
  Cc: ntg-context

At 11:46 PM 11/28/00 +0100, Petr Ferdus wrote:
>On Tue, 28 Nov 2000, Hans Hagen wrote:

>> Your fonts are preloaded at \everyjob time, so you need to invoke a font to
>> make sure that the object is written.
>>
>> % output=pdftex textranslate=cp1250cs
>>
>> \starttext
>>
>> \input unitest.tex
>>
>> \resetfontdefinitionfile[csr] \setupbodyfont[csr] % rather safe, \rm or
>> % \ss\rm would also have worked i suppose
>>
>> some unreadable czech code not handled by my mailer
>>
>> \stoptext

>Sorry, but even with this fragment of code I wasn't able to generate
>"searchable" results. It might be due to my Context setup, I can't compare with
>some other installation, but moreover \resetfontdefinitionfile hasn't accepted [csr]

Looks like you run an old Context. In that case, say

\resetfontdefinitionfile{csr}

It was uninterfaced until recently. [Btw, from now on I will make reloading
the default and add a switch to inhibit it; computers are fast today.]

Hans
Thread overview: 9+ messages (end of thread, newest: 2000-11-29 8:54 UTC)

     [not found] <Pine.BSF.3.96.1001127035447.9316A-100000@vertigo.fme.vutbr.cz>
2000-11-27 10:20 ` unicode in bookmarks Hans Hagen
2000-11-27 15:28 ` Hans Hagen
2000-11-28 10:53   ` Petr Ferdus
     [not found]   ` <Pine.BSF.3.96.1001128111339.22864A-100000@vertigo.fme.vutbr.cz>
2000-11-28 11:21     ` Hans Hagen
2000-11-28 12:08     ` unicode in bookmarks / unicode searching support Hans Hagen
2000-11-28 16:04       ` Petr Ferdus
     [not found]       ` <Pine.BSF.3.96.1001128164649.28022A-100000@vertigo.fme.vutbr.cz>
2000-11-28 17:34         ` Hans Hagen
2000-11-28 22:46           ` Petr Ferdus
     [not found]           ` <Pine.BSF.3.96.1001128223054.9364A-100000@vertigo.fme.vutbr.cz>
2000-11-29  8:54             ` Hans Hagen