ntg-context - mailing list for ConTeXt users
* Basic question on Unicode and ConTeXt
@ 2005-07-14  9:30 Steffen Wolfrum
  2005-07-14 10:29 ` Henning Hraban Ramm
  0 siblings, 1 reply; 31+ messages in thread
From: Steffen Wolfrum @ 2005-07-14  9:30 UTC (permalink / raw)
  Cc: ntg-context

Hi,

Now and then I have seen threads on this list dealing with specific
problems of using various languages with utf-8 input in ConTeXt
(processing with pdftex, NOT xetex).

I know there is \enableregime[utf],
but what else do I need so that the output matches my utf-8 input?

Could someone maybe give a short and usable how-to for some common examples:
Greek
Russian
an East European language
and an Asian language?

(If the platform makes a difference, I'd be interested in OSX.)

Thank you,

Steffen


* Re: Basic question on Unicode and ConTeXt
  2005-07-14  9:30 Basic question on Unicode and ConTeXt Steffen Wolfrum
@ 2005-07-14 10:29 ` Henning Hraban Ramm
  2005-07-14 19:13   ` Steffen Wolfrum
  2005-07-15 18:43   ` Mojca Miklavec
  0 siblings, 2 replies; 31+ messages in thread
From: Henning Hraban Ramm @ 2005-07-14 10:29 UTC (permalink / raw)


On 2005-07-14 at 11:30, Steffen Wolfrum wrote:

> I know there is \enableregime[utf]
> but what else I needed that the output equals my utf-8 input?
>
> Could some maybe give a short and usable How-To on common examples:
> Greek
> Russian
> an East European language
> and an Asian language?

You did read http://contextgarden.net/Encodings_and_Regimes and the
linked pages, didn't you?
If you learn anything new, please add it to the wiki!


Greetings from Hraban!
---
http://www.fiee.net/texnique/
http://contextgarden.net


* Re: Basic question on Unicode and ConTeXt
  2005-07-14 10:29 ` Henning Hraban Ramm
@ 2005-07-14 19:13   ` Steffen Wolfrum
  2005-07-14 19:32     ` VnPenguin
                       ` (2 more replies)
  2005-07-15 18:43   ` Mojca Miklavec
  1 sibling, 3 replies; 31+ messages in thread
From: Steffen Wolfrum @ 2005-07-14 19:13 UTC (permalink / raw)


Hi Henning,


Quoting Henning Hraban Ramm <hraban@fiee.net>:

> Am 2005-07-14 um 11:30 schrieb Steffen Wolfrum:
>
> > I know there is \enableregime[utf]
> > but what else I needed that the output equals my utf-8 input?
> >
> > Could some maybe give a short and usable How-To on common examples:
> > Greek
> > Russian
> > an East European language
> > and an Asian language?
>
> You did read http://contextgarden.net/Encodings_and_Regimes and
> linked pages, did you?
> If you learn anything new, please add it to the wiki!


Well, yes: I wasn't interested in e.g. VISCII, but I did read the info for UTF.
But since you wrote "linked pages", I became curious and looked those pages up
as well. Indeed, there is more.

But why is the Vietnamese example with
\enableregime[utf]
\setupencoding[default=t5]
linked under
vis = viscii	VISCII	Vietnamese
and not accessible from
utf	UTF-8	Unicode ? (Same for Cyrillic.)

Is this just a wrong link, or does it show that I haven't understood the
relationship between regimes and encodings?

Shouldn't all UTF-relevant examples be listed under UTF?


So, sorry for starting this irrelevant thread,

Steffen


* Re: Basic question on Unicode and ConTeXt
  2005-07-14 19:13   ` Steffen Wolfrum
@ 2005-07-14 19:32     ` VnPenguin
  2005-07-15  5:16     ` Radhelorn
  2005-07-15  9:09     ` Henning Hraban Ramm
  2 siblings, 0 replies; 31+ messages in thread
From: VnPenguin @ 2005-07-14 19:32 UTC (permalink / raw)


On 7/14/05, Steffen Wolfrum <context@st.estfiles.de> wrote:
> 
> 
> Well, yes, I wasn't interested in e.g. VISCII, but I read the info for UTF.
> But as you wrote "linked pages" I became more curious and looked up also those
> pages. Indeed, there is more:
> 
> But, why is the Vietnamese example with
> \enableregime[utf]
> \setupencoding[default=t5
> linked under
> vis = viscii    VISCII  Vietnamesevis = viscii  VISCII  Vietnamese
> and not accessable with
> utf     UTF-8   Unicode ? (Same for cyrillic)

Sorry, I cannot understand your question.

Vietnamese can be used in TeX/LaTeX and ConTeXt with different input
encodings: TCVN, VISCII, VPS or UTF-8.

I'm currently using UTF-8 input for ConTeXt without problems. I have not
yet tested other input encodings, but they cause no problems with
TeX/LaTeX, so they should be OK with ConTeXt too, or am I wrong?

-- 
http://vnoss.org
Vietnamese Open Source Software Community


* Re: Basic question on Unicode and ConTeXt
  2005-07-14 19:13   ` Steffen Wolfrum
  2005-07-14 19:32     ` VnPenguin
@ 2005-07-15  5:16     ` Radhelorn
  2005-07-15  9:09     ` Henning Hraban Ramm
  2 siblings, 0 replies; 31+ messages in thread
From: Radhelorn @ 2005-07-15  5:16 UTC (permalink / raw)


Steffen Wolfrum wrote:
>>>I know there is \enableregime[utf]
>>>but what else I needed that the output equals my utf-8 input?
>>>
>>>Could some maybe give a short and usable How-To on common examples:
>>>Greek
>>>Russian
>>>an East European language
>>>and an Asian language?
>>
>>You did read http://contextgarden.net/Encodings_and_Regimes and
>>linked pages, did you?
>>If you learn anything new, please add it to the wiki!
> 
> Well, yes, I wasn't interested in e.g. VISCII, but I read the info for UTF.
> But as you wrote "linked pages" I became more curious and looked up also those
> pages. Indeed, there is more:
> 
> But, why is the Vietnamese example with
> \enableregime[utf]
> \setupencoding[default=t5
> linked under
> vis = viscii	VISCII	Vietnamesevis = viscii	VISCII	Vietnamese
> and not accessable with
> utf	UTF-8	Unicode ? (Same for cyrillic)
> 
> Is this just a wrong link, or does this show that I don't have understood the
> realationship between regimes and encoding?
> 
> Shouldn't all UTF relevant examples be listed under UTF?
> 
> 

\enableregime is not enough. You also need to set up the font encoding and
an appropriate body font. For these, see type-enc, type-pre and similar files.

Example for cyrillic:

\enableregime [utf]
\setupencoding [default=t2a]
\usetypescript [modern-base] [\defaultencoding]
\setupbodyfont [modern]

\starttext
Тест.
\stoptext
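
Following the same pattern, a Central European language such as Czech
might be set up as below. This is an untested sketch: it assumes that the
ec font encoding covers the accented letters used and that the modern-base
typescript is also available for ec. Only the encoding name differs from
the Cyrillic example.

```tex
% Assumption: ec is a suitable font encoding for Czech in this setup.
\enableregime [utf]
\setupencoding [default=ec]
\usetypescript [modern-base] [\defaultencoding]
\setupbodyfont [modern]

\starttext
Příliš žluťoučký kůň úpěl ďábelské ódy.
\stoptext
```

If a glyph comes out missing, the font encoding (not the regime) is the
first thing to check.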


-- 
Radhelorn <radhelorn@mail.ru>


* Re: Basic question on Unicode and ConTeXt
  2005-07-14 19:13   ` Steffen Wolfrum
  2005-07-14 19:32     ` VnPenguin
  2005-07-15  5:16     ` Radhelorn
@ 2005-07-15  9:09     ` Henning Hraban Ramm
  2 siblings, 0 replies; 31+ messages in thread
From: Henning Hraban Ramm @ 2005-07-15  9:09 UTC (permalink / raw)


On 2005-07-14 at 21:13, Steffen Wolfrum wrote:

> But, why is the Vietnamese example with
> \enableregime[utf]
> linked under
> vis = viscii    VISCII    Vietnamesevis = viscii    VISCII     
> Vietnamese
> and not accessable with
> utf    UTF-8    Unicode ? (Same for cyrillic)
>
> Is this just a wrong link, or does this show that I don't have  
> understood the
> realationship between regimes and encoding?
> Shouldn't all UTF relevant examples be listed under UTF?

All examples are (or could be) relevant for UTF-8, because (nearly)
everything can be written in Unicode.

VISCII is one possible encoding for Vietnamese (and only for Vietnamese),
so I found it rather logical to link from there to the Vietnamese page, even
though the Vietnamese example uses UTF-8, which is probably more modern, just
as a lot of other encodings are probably obsolete or deprecated.

So, even if the Vietnamese example could be considered a general UTF-8 example,
it shows how one can (and perhaps should) typeset Vietnamese.

So I guess the only error is the missing link from UTF-8 to Vietnamese
(and Cyrillic). Feel free to add it yourself.



Greetings from Hraban!
---
http://www.fiee.net/texnique/
http://contextgarden.net


* Re: Basic question on Unicode and ConTeXt
  2005-07-14 10:29 ` Henning Hraban Ramm
  2005-07-14 19:13   ` Steffen Wolfrum
@ 2005-07-15 18:43   ` Mojca Miklavec
  2005-07-15 18:59     ` hungarumlaut (was: Basic question on Unicode) Henning Hraban Ramm
                       ` (3 more replies)
  1 sibling, 4 replies; 31+ messages in thread
From: Mojca Miklavec @ 2005-07-15 18:43 UTC (permalink / raw)



Henning Hraban Ramm wrote:
> You did read http://contextgarden.net/Encodings_and_Regimes and
> linked pages, did you?
> If you learn anything new, please add it to the wiki!

Thank you! It was probably me who copy-pasted some of the material
there from some thread, but when I looked at it once again, I learnt
something new. A while ago I asked how to typeset files in the
windows-1250 encoding (\usepackage[cp1250]{inputenc} in LaTeX). I got
an answer (a temporary solution with csr fonts), but it was not
a satisfying one.

I'm now attaching a file that adds support for windows-1250-encoded files.
One character is missing (I don't know what to write for the non-breaking
space), and the file has not been extensively tested or proofread for typos.
So if someone could take a look at it, I'd be glad.

Does anyone have a script to test the encoding (one that would produce
a matrix of (almost) 266 characters)?

regi-lat.tex is interesting, made just for typesetting Croatian :)
Perhaps I can add some stuff there too.

\defineactivetoken đ {\pseudoencodeddj}
\defineactivetoken Ð {\pseudoencodedDJ}

This should be \dstroke and \Dstroke.

Where did the "hungarumlaut" characters get their name from? Wouldn't it
be better to have "doubleacute" (as in the Unicode standard)? We also
don't name the characters "germanumlaut" but "diaeresis" instead.

Mojca

[-- Attachment #2: regi-cp1250.tex --]
[-- Type: application/x-tex, Size: 11143 bytes --]

[-- Attachment #3: Type: text/plain, Size: 139 bytes --]

_______________________________________________
ntg-context mailing list
ntg-context@ntg.nl
http://www.ntg.nl/mailman/listinfo/ntg-context


* Re: hungarumlaut (was: Basic question on Unicode)
  2005-07-15 18:43   ` Mojca Miklavec
@ 2005-07-15 18:59     ` Henning Hraban Ramm
  2005-07-15 21:13     ` ISO/windows encodings (was: Basic question on Unicode ...) Mojca Miklavec
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 31+ messages in thread
From: Henning Hraban Ramm @ 2005-07-15 18:59 UTC (permalink / raw)




On 2005-07-15 at 20:43, Mojca Miklavec wrote:
> Where did the "hungarumlaut" characters get the name from? Woudn't it
> be better to have "doubleaccute" (as in UNICODE standard). We also
> don't name the characters "germanumlaut" but "diaeresis" instead.

AFAIK the name is PostScript standard - Adobe used some strange names...




Greetings from Hraban!
---
http://www.fiee.net/texnique/
http://contextgarden.net



* ISO/windows encodings (was: Basic question on Unicode ...)
  2005-07-15 18:43   ` Mojca Miklavec
  2005-07-15 18:59     ` hungarumlaut (was: Basic question on Unicode) Henning Hraban Ramm
@ 2005-07-15 21:13     ` Mojca Miklavec
  2005-07-17 23:38       ` ISO/windows encodings Hans Hagen
  2005-07-17 20:01     ` Basic question on Unicode and ConTeXt Hans Hagen
  2005-07-17 20:37     ` Hans Hagen
  3 siblings, 1 reply; 31+ messages in thread
From: Mojca Miklavec @ 2005-07-15 21:13 UTC (permalink / raw)


(Sorry, I should have opened a separate thread earlier.)

I have another couple of questions about regime support.

How can synonyms for regimes be defined, so that
\enableregime[windows-1250] would have the same effect as
\enableregime[win-1250] or \enableregime[cp1250], and
\enableregime[utf8] the same effect as \enableregime[utf]?

I don't want to be discriminating, but \enableregime[windows] is like
writing \enableregime[latin] ("il" in ConTeXt, I think) and expecting
the whole world to understand that you mean latin1. In my opinion it
should be left there (for backward compatibility if for nothing else),
but deprecated and given an unambiguous name like "windows-1252",
"windows1252", "win-1252", "win1252", "cp1252" or "windows-western".

> Does anyone have any script to test the encoding (which would produce
> a matrix of (almost) 266 characters)?

(Seems like I should have studied for my math exam tomorrow instead of
writing this.) I meant to ask whether someone has a nice macro or script to
produce a table of the 256 (not 266!) characters (minus the non-printable
ones), maybe together with the corresponding names (only if they can
still be extracted). It should look either like a usual ASCII table
(perhaps with boxes around the cells, as in TeX font tables) or simply have
one character per line with its decimal and hex number.

This is more or less in order to be able to test whether the regi-* files are OK.

I prepared the file by hand, but now that I know where to look, and
after seeing the http://czyborra.com/charsets/iso8859.html page, I
think it shouldn't be a problem to prepare support for all those usual
and unusual encodings at once (it only takes a clever script and a
manually prepared mapping from Unicode to ConTeXt names).

Unicode is great, but not everyone uses it (even vim behaves in a rather
system-dependent way and cannot always be used for Unicode
out of the box).

(I also forgot to thank Patrick for explaining some things about regimes to me.)

Mojca


* Re: Basic question on Unicode and ConTeXt
  2005-07-15 18:43   ` Mojca Miklavec
  2005-07-15 18:59     ` hungarumlaut (was: Basic question on Unicode) Henning Hraban Ramm
  2005-07-15 21:13     ` ISO/windows encodings (was: Basic question on Unicode ...) Mojca Miklavec
@ 2005-07-17 20:01     ` Hans Hagen
  2005-07-18  5:50       ` VnPenguin
  2005-07-18 20:26       ` Mojca Miklavec
  2005-07-17 20:37     ` Hans Hagen
  3 siblings, 2 replies; 31+ messages in thread
From: Hans Hagen @ 2005-07-17 20:01 UTC (permalink / raw)


Mojca Miklavec wrote:

>regi-lat.tex is interesting, made just for typesetting Croatian :)
>Perhaps I can add some stuff there too.
>
>\defineactivetoken đ {\pseudoencodeddj}
>\defineactivetoken Ð {\pseudoencodedDJ}
>
>This should be \dstroke and \Dstroke.
>  
>
ok, changed

>Where did the "hungarumlaut" characters get the name from? Woudn't it
>be better to have "doubleaccute" (as in UNICODE standard). We also
>don't name the characters "germanumlaut" but "diaeresis" instead.
>  
>
the names probably come from postscript 

btw, there is a difference between umlaut and diaeresis (height) 

Hans 


-----------------------------------------------------------------
                                          Hans Hagen | PRAGMA ADE
              Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
     tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
                                             | www.pragma-pod.nl
-----------------------------------------------------------------


* Re: Basic question on Unicode and ConTeXt
  2005-07-15 18:43   ` Mojca Miklavec
                       ` (2 preceding siblings ...)
  2005-07-17 20:01     ` Basic question on Unicode and ConTeXt Hans Hagen
@ 2005-07-17 20:37     ` Hans Hagen
  2005-07-17 21:51       ` Henning Hraban Ramm
  3 siblings, 1 reply; 31+ messages in thread
From: Hans Hagen @ 2005-07-17 20:37 UTC (permalink / raw)


Mojca Miklavec wrote:

>I'm now attaching a file for support for windows-1250-encoded files.
>One character is missing (I don't know what to write for non-breaking
>space) and it's not extensively tested or proved for typos. So if
>someone can drop an eye on it, I'll be glad.
>  
>
maybe a better name is regi-ce or just regi-1250

>Does anyone have any script to test the encoding (which would produce
>a matrix of (almost) 266 characters)?
>  
>
there are

\showcharacters
\showaccents

it all depends on the combination of input regime and font encoding 

Hans 



* Re: Basic question on Unicode and ConTeXt
  2005-07-17 20:37     ` Hans Hagen
@ 2005-07-17 21:51       ` Henning Hraban Ramm
  2005-07-17 22:36         ` Hans Hagen
  0 siblings, 1 reply; 31+ messages in thread
From: Henning Hraban Ramm @ 2005-07-17 21:51 UTC (permalink / raw)


On 2005-07-17 at 22:37, Hans Hagen wrote:

> there are
>
> \showcharacters
> \showaccents

BTW I finally created the wiki page "Visual Debugging" for all the  
\show... commands; I guess there are even more than I listed there,  
and some descriptions are still missing (had no time to try them all).


Greetings from Hraban!
---
http://www.fiee.net/texnique/
http://contextgarden.net


* Re: Basic question on Unicode and ConTeXt
  2005-07-17 21:51       ` Henning Hraban Ramm
@ 2005-07-17 22:36         ` Hans Hagen
  2005-07-18 16:18           ` Visual Debugging (was: Basic question) Henning Hraban Ramm
  0 siblings, 1 reply; 31+ messages in thread
From: Hans Hagen @ 2005-07-17 22:36 UTC (permalink / raw)


Henning Hraban Ramm wrote:

> Am 2005-07-17 um 22:37 schrieb Hans Hagen:
>
>> there are
>>
>> \showcharacters
>> \showaccents
>
>
> BTW I finally created the wiki page "Visual Debugging" for all the  
> \show... commands; I guess there are even more than I listed there,  
> and some descriptions are still missing (had no time to try them all).

thanks 

(\trace... is also handy) 

Hans 



* Re: ISO/windows encodings
  2005-07-15 21:13     ` ISO/windows encodings (was: Basic question on Unicode ...) Mojca Miklavec
@ 2005-07-17 23:38       ` Hans Hagen
  0 siblings, 0 replies; 31+ messages in thread
From: Hans Hagen @ 2005-07-17 23:38 UTC (permalink / raw)


Mojca Miklavec wrote:

>I have another couple of questions about regimes support.
>
>How can synonyms for regimes be defined, so that
>\enableregime[windows-1250] would have the same effect as
>\enableregime[win-1250] or \enableregime[cp1250]? And
>\enableregime[utf8] the same effect as \enableregime[utf].
>
>I don't won't to be discriminating, but \enableregime[windows] is like
>writing \enableregime[latin] ("il" in ConTeXt I think) and expecting
>the whole world to understand that you mean latin1. In my opinion it
>should be left there (for backward compatibility if for nothing else),
>but deprecated and given an unambigious name like "windows-1252",
>"windows1252", "win-1252", "win1252", "cp1252" or "windows-western".
>  
>
I'll send you a few lines of code to test

Hans



* Re: Basic question on Unicode and ConTeXt
  2005-07-17 20:01     ` Basic question on Unicode and ConTeXt Hans Hagen
@ 2005-07-18  5:50       ` VnPenguin
  2005-07-18 20:26       ` Mojca Miklavec
  1 sibling, 0 replies; 31+ messages in thread
From: VnPenguin @ 2005-07-18  5:50 UTC (permalink / raw)



On 7/17/05, Hans Hagen <pragma@wxs.nl> wrote:
> Mojca Miklavec wrote:
> 
> >regi-lat.tex is interesting, made just for typesetting Croatian :)
> >Perhaps I can add some stuff there too.
> >
> >\defineactivetoken đ {\pseudoencodeddj}
> >\defineactivetoken Ð {\pseudoencodedDJ}
> >
> >This should be \dstroke and \Dstroke.
> >
> >
> ok, changed
> 

Yes, exactly these two glyphs, \dstroke and \Dstroke, also exist in Vietnamese :)

Cheers,
-- 
http://vnoss.org
Vietnamese Open Source Software Community


* Re: Visual Debugging (was: Basic question)
  2005-07-17 22:36         ` Hans Hagen
@ 2005-07-18 16:18           ` Henning Hraban Ramm
  2005-07-18 20:44             ` Brooks Moses
  0 siblings, 1 reply; 31+ messages in thread
From: Henning Hraban Ramm @ 2005-07-18 16:18 UTC (permalink / raw)



On 2005-07-18 at 00:36, Hans Hagen wrote:
>>> \showcharacters
>>> \showaccents
>> BTW I finally created the wiki page "Visual Debugging" for all  
>> the  \show... commands; I guess there are even more than I listed  
>> there,  and some descriptions are still missing (had no time to  
>> try them all).
> (\trace... is also handy)

Hm, there's no trace in texshow, but a lot of trace...true in the
sources; hopefully I caught them all on
http://contextgarden.net/Visual_Debugging
I found some \tracing in the jEdit XML, but nowhere in the sources,
and a few isolated \traced... (with d).


Greetings from Hraban!
---
http://www.fiee.net/texnique/
http://contextgarden.net


* Re: Basic question on Unicode and ConTeXt
  2005-07-17 20:01     ` Basic question on Unicode and ConTeXt Hans Hagen
  2005-07-18  5:50       ` VnPenguin
@ 2005-07-18 20:26       ` Mojca Miklavec
  2005-07-18 21:46         ` Hans Hagen
  2005-07-18 21:54         ` Hans Hagen
  1 sibling, 2 replies; 31+ messages in thread
From: Mojca Miklavec @ 2005-07-18 20:26 UTC (permalink / raw)



Hans Hagen wrote:
> Mojca Miklavec wrote:
> 
> >regi-lat.tex is interesting, made just for typesetting Croatian :)
> >Perhaps I can add some stuff there too.
> >
> >\defineactivetoken đ {\pseudoencodeddj}
> >\defineactivetoken Ð {\pseudoencodedDJ}
> >
> >This should be \dstroke and \Dstroke.
> >
> ok, changed

Thank you.

\Dstroke has some "problems" anyway, at least in cmr (lmr?). The
stroke should be on the left, but it is on the right. I thought it was
just because \tt doesn't have that glyph, but the roman version is
also rendered extremely badly.

> >Where did the "hungarumlaut" characters get the name from?
> >
> the names probably come from postscript

Thanks, I looked into some .afm files and they were actually there.

> btw, there is a differnece between umlaut and diaeresis (height)

So what is the proper way of writing 'ä' (a umlaut) then?

> can't you make it into a
> 
> \defineactivetoken 128 {\texteuro} % € 20AC EURO SIGN
> 
> kind of table?

Good idea indeed, it looks much nicer this way.
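
For illustration, a few more rows in the same table style might look like
this. The slot numbers and Unicode names follow the Windows-1250 code page;
the ConTeXt glyph names (\Scaron, \zcaron and so on) are assumed to be the
usual named glyphs and are not copied from the regi-cp1250.tex attachment.

```tex
% slot (decimal), ConTeXt glyph name, then character, code point and Unicode name
\defineactivetoken 128 {\texteuro} % € 20AC EURO SIGN
\defineactivetoken 138 {\Scaron}   % Š 0160 LATIN CAPITAL LETTER S WITH CARON
\defineactivetoken 142 {\Zcaron}   % Ž 017D LATIN CAPITAL LETTER Z WITH CARON
\defineactivetoken 154 {\scaron}   % š 0161 LATIN SMALL LETTER S WITH CARON
\defineactivetoken 158 {\zcaron}   % ž 017E LATIN SMALL LETTER Z WITH CARON
\defineactivetoken 200 {\Ccaron}   % Č 010C LATIN CAPITAL LETTER C WITH CARON
\defineactivetoken 208 {\Dstroke}  % Đ 0110 LATIN CAPITAL LETTER D WITH STROKE
\defineactivetoken 232 {\ccaron}   % č 010D LATIN SMALL LETTER C WITH CARON
\defineactivetoken 240 {\dstroke}  % đ 0111 LATIN SMALL LETTER D WITH STROKE
```

Keeping the code point and Unicode name in the comment makes each row easy
to check against a published code page table.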

> maybe a better name is regi-ce or just regi-1250

regi-ce is a bad name, as there are four Central European encodings
(IBM-852, ISO-8859-2, MacCE and Windows-1250) plus Croatian. 1250
alone is probably OK, but then there's no hint in the file name about which
family of encodings is meant (windows/ibm/iso/mac ...).


I tested the code for regime synonyms and it looks OK. Thanks for
looking into my request :)

> (concerning eregi-* files: you can define filesynonyms so we need a list of filesynonyms and regimesynonyms)

What do you mean by writing file synonyms? Where would it be used?

For Unicode regimes, this is probably a useful (more or less complete) set.

\defineregimesynonym[utf8][utf]
\defineregimesynonym[utf 8][utf]
\defineregimesynonym[utf-8][utf]
\defineregimesynonym[unicode][utf]

(Btw, I tried all the four before I got the answer on the mailing list
that I should use 'utf' instead.)
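
With such a synonym in place, a document could select the regime by the
more familiar name. A minimal sketch, assuming a ConTeXt version that
already provides \defineregimesynonym (so far it is only test code) and
that the first argument is the new name and the second the existing regime,
as in the list above:

```tex
\defineregimesynonym [utf-8] [utf]  % new name, then the existing regime

\enableregime [utf-8]               % intended to act like \enableregime[utf]

\starttext
Test: äöü.
\stoptext
```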

For the rest of the regimes I have to take a look first, so that I
don't say anything wrong. There has to be only one clear scheme.

> there are
> 
> \showcharacters
> \showaccents

Thank you. The commands only kind of worked here. They produced
the table that I wanted (and quite a bit of trash as well), but they
complained a lot.

Thanks for the contribution to Visual Debugging, Hraban!


What's the proper name for the non-breaking space, '~', to put in a regi-* file?

Mojca


* Re: Visual Debugging (was: Basic question)
  2005-07-18 16:18           ` Visual Debugging (was: Basic question) Henning Hraban Ramm
@ 2005-07-18 20:44             ` Brooks Moses
  2005-07-18 21:41               ` Visual Debugging Hans Hagen
  0 siblings, 1 reply; 31+ messages in thread
From: Brooks Moses @ 2005-07-18 20:44 UTC (permalink / raw)


At 09:18 AM 7/18/2005, you wrote:

>Am 2005-07-18 um 00:36 schrieb Hans Hagen:
>>>>\showcharacters
>>>>\showaccents
>>>BTW I finally created the wiki page "Visual Debugging" for all
>>>the  \show... commands; I guess there are even more than I listed
>>>there,  and some descriptions are still missing (had no time to
>>>try them all).
>>(\trace... is also handy)
>
>Hm, there's no trace in texshow, but a lot of trace...true in the
>sources; hopefully I catched them all on http://contextgarden.net/ 
>Visual_Debugging
>I found some \tracing in the jEdit xml, but nowhere in the sources,
>and some single \traced... (with d).

I had also put some lists of all the \show... and \trace... commands I 
found in the sources on the Discussion page for Visual_Debugging; it may be 
useful to go through and compare our lists.

Note that the \trace... commands seem to be defined with \newif; thus, they 
come in "\iftrace...", "\trace...true", and "\trace...false" trios.
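
For readers unfamiliar with \newif, this is how one such trio comes about.
The name \iftracedemo is hypothetical and used only for illustration; it is
not one of ConTeXt's actual tracing switches.

```tex
% \newif creates three macros at once:
%   \iftracedemo, \tracedemotrue and \tracedemofalse.
% A new conditional starts out false.
\newif\iftracedemo

\tracedemotrue       % switch it on; \tracedemofalse switches it off again

\iftracedemo
  \message{tracing is on}
\else
  \message{tracing is off}
\fi
```

This is why searching the sources for "trace...true" finds the switches
even where no \trace... command is documented.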

- Brooks


* Re: Visual Debugging
  2005-07-18 20:44             ` Brooks Moses
@ 2005-07-18 21:41               ` Hans Hagen
  0 siblings, 0 replies; 31+ messages in thread
From: Hans Hagen @ 2005-07-18 21:41 UTC (permalink / raw)


Brooks Moses wrote:

> I had also put some lists of all the \show... and \trace... commands I 
> found in the sources on the Discussion page for Visual_Debugging; it 
> may be useful to go through and compare our lists.
>
> Note that the \trace... commands seem to be defined with \newif; thus, 
> they come in "\iftrace...", "\trace...true", and "\trace...false" trios.

when the list is complete, we can see whether some more consistency is needed 

Hans 
 


* Re: Basic question on Unicode and ConTeXt
  2005-07-18 20:26       ` Mojca Miklavec
@ 2005-07-18 21:46         ` Hans Hagen
  2005-07-18 21:54         ` Hans Hagen
  1 sibling, 0 replies; 31+ messages in thread
From: Hans Hagen @ 2005-07-18 21:46 UTC (permalink / raw)


Mojca Miklavec wrote:

>\Dstroke has some "problems" anyway, at least in cmr (lmr?). The
>stroke should be on the left, but it is on the right. I thought it was
>just because \tt don't have that glyph, but also the roman version is
>rendered extremely bad.
>  
>
in case of doubt, you can discuss this with Boguslaw Jackowski (jacko),
who is in charge of latin modern; it should be ok in latin roman

>So what is the proper way of writing 'ä' (a umlaut) then?
>  
>
in german mode, "u will produce it (tricky, since there is no hyphenation
then)

latin modern did have them, and there is a special encoding vector in the
context distribution (waiting for those umlauts to show up again)

Hans


* Re: Basic question on Unicode and ConTeXt
  2005-07-18 20:26       ` Mojca Miklavec
  2005-07-18 21:46         ` Hans Hagen
@ 2005-07-18 21:54         ` Hans Hagen
  2005-07-18 23:11           ` Mojca Miklavec
  1 sibling, 1 reply; 31+ messages in thread
From: Hans Hagen @ 2005-07-18 21:54 UTC (permalink / raw)


Mojca Miklavec wrote:

>>maybe a better name is regi-ce or just regi-1250
>>    
>>
>
>regi-ce is a bad name as there are 4 central european encodings
>(IBM-853, ISO-8859-2, MacCE and Windows-1250) plus Croatian. 1250
>alone is probably OK, but there's no hint in file name about which
>encoding is meant (windows/ibm/iso/mac ...).
>
>
>I tested the code for regime synonyms and it looks OK. Thanks for
>investingating my request :)
>  
>
ok, i'll add it to enco-ini then

>>(concerning eregi-* files: you can define filesynonyms so we need a list of filesynonyms and regimesynonyms)
>>    
>>
>
>What do you mean by writing file synonyms? Where would it be used?
>  
>

\definefilesynonym  [mojka]  [mojca]
\definefilesynonym  [moika]  [mojca]
\definefilesynonym  [moica]  [mojca]

>For unicode regimes, this is probably an useful (more or less complete) set.
>
>\defineregimesynonym[utf8][utf]
>\defineregimesynonym[utf 8][utf]
>  
>
the spacy one does not make much sense

>\defineregimesynonym[utf-8][utf]
>\defineregimesynonym[unicode][utf]
>  
>
not sure about this one

>(Btw, I tried all the four before I got the answer on the mailing list
>that I should use 'utf' instead.)
>
>For the rest of the regimes I have to take a look first, so that I
>don't say anything wrong. There has to be only one clear scheme.
>  
>
indeed, i'll wait patiently for your complete list of synonyms

>>there are
>>
>>\showcharacters
>>\showaccents
>>    
>>
>
>Thank you. The commands were only kind-of-working here. They produced
>the table that I wanted (and quite some trash as well), but they were
>complaining a lot.
>
>Thanks for the contribution into Visual debugging, Hraban!
>
>
>What's the proper name for nonbreaking space, '~', to be put in regi-* file?
>  
>
how about \nonbreakablespace
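
In the table style discussed earlier, the missing entry would then read as
follows. This is a sketch: slot 160 (hex A0) is the no-break space in
Windows-1250, and \nonbreakablespace is simply the name suggested here, not
a tested definition.

```tex
\defineactivetoken 160 {\nonbreakablespace} % 00A0 NO-BREAK SPACE
```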

Hans


* Re: Basic question on Unicode and ConTeXt
  2005-07-18 21:54         ` Hans Hagen
@ 2005-07-18 23:11           ` Mojca Miklavec
  2005-07-19  8:06             ` Hans Hagen
  0 siblings, 1 reply; 31+ messages in thread
From: Mojca Miklavec @ 2005-07-18 23:11 UTC (permalink / raw)


Hans Hagen wrote:
> Mojca Miklavec wrote:
> 
> >>(concerning eregi-* files: you can define filesynonyms so we need a list of filesynonyms and regimesynonyms)
> >>
> >
> >What do you mean by writing file synonyms? Where would it be used?
> 
> \definefilesynonym  [mojka]  [mojca]
> \definefilesynonym  [moika]  [mojca]
> \definefilesynonym  [moica]  [mojca]

OK, if you are provoking me, I'll strike back:
None of the definitions above are allowed because they don't warn the
user if he's using the wrong name. They should throw an error instead.
The only proper way would be to define something like

\setuplabeltext[\s!en][\v!pronouncemyname=moitsa]
\setuplabeltext[\s!de][\v!pronouncemyname=mojza]
\setuplabeltext[\s!ru][\v!pronouncemyname=мойца]
...

> >For unicode regimes, this is probably an useful (more or less complete) set.
> >
> >\defineregimesynonym[utf8][utf]
> >\defineregimesynonym[utf 8][utf]
> >
> >
> the spacy one does not make much sense
> 
> >\defineregimesynonym[utf-8][utf]
> >\defineregimesynonym[unicode][utf]
> >
> >
> not sure about this one

Me neither, but "utf" alone is just as doubtful as this one. However,
leaving utf-8 and utf8 only is OK.

> >(Btw, I tried all the four before I got the answer on the mailing list
> >that I should use 'utf' instead.)
> >
> >For the rest of the regimes I have to take a look first, so that I
> >don't say anything wrong. There has to be only one clear scheme.
> >
> indeed, i'll wait patiently for your complete list of synonyms

OK. I'll prepare \defineregimesynonym-s proposals, but I still don't
know what the file synonyms should be used for in this context. The
user probably doesn't need to care about file names?

> >What's the proper name for nonbreaking space, '~', to be put in regi-* file?
> >
> how about \nonbreakablespace

Thanks. There was no such glyph in \showcharacters -)

(PS: I'm sorry for accusing the innocent commands \showcharacters
and \showaccents of malfunctioning. I accidentally placed them
after an \obeylines command as I was debugging some files. They
couldn't have worked there anyway.)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

I wanted to post this in another thread, but it probably still fits on
this place:

The regi-* files currently map characters from individual encodings
directly into \TeXcommands. But unicode is already supported in
ConTeXt and the mappings from single file encodings into unicode are
pretty well defined (perhaps there are some exceptions?) and can be
obtained elsewhere on the internet. On the other hand, mapping from
unicode to \TeXcommands is much less straightforward and sometimes
subjective.

I noticed some comments in regi-* files like
  % \texttrademark changed to \trademark
or
  % \dots changed to \textellipsis

Whoever makes changes like that probably makes them in only one
file; the rest remain as they are (and probably become deprecated, if
not nonfunctional, one day).

On the other hand, there are around ten different Cyrillic encodings
(mostly they are already supported by ConTeXt, but anyway) and many
other encodings in other languages as well. This means that the same
Cyrillic letter has to be assigned a name in ten files (regimes),
possibly manually.

So why not map the characters to unicode first and define the
mapping from unicode to \TeXcommand only once? regi-* files (at least
in the meaning they have now) could be prepared automatically by a
script, less error-prone and without the need to say "Some more
definitions will be added later."
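
A minimal Ruby sketch of that idea, under stated assumptions: the command names in NAMES are placeholders and have not been checked against ConTeXt, and Ruby's built-in transcoding tables stand in for the byte-to-unicode mappings. Each letter is named once, and the per-encoding tables are derived from that single list.

```ruby
# Hypothetical name table: Unicode code point => command name.
# The names are placeholders for illustration only.
NAMES = {
  0x0430 => "\\cyrillica", # CYRILLIC SMALL LETTER A
  0x0431 => "\\cyrillicb", # CYRILLIC SMALL LETTER BE
}

# Three 8-bit Cyrillic encodings that Ruby can transcode natively.
ENCODINGS = ["KOI8-R", "Windows-1251", "ISO-8859-5"]

# For one encoding, return { byte => command } for every named
# code point that the encoding can represent.
def regime_table(encoding, names)
  names.each_with_object({}) do |(codepoint, command), table|
    begin
      byte = [codepoint].pack("U").encode(encoding).bytes.first
      table[byte] = command
    rescue Encoding::UndefinedConversionError
      # this encoding has no slot for the character, so skip it
    end
  end
end

TABLES = ENCODINGS.map { |enc| [enc, regime_table(enc, NAMES)] }.to_h

TABLES.each do |enc, table|
  table.sort.each do |byte, command|
    puts format("%-12s %3d %s", enc, byte, command)
  end
end
```

The same letter lands on a different byte in each encoding, but its name is written down only once.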


Is it possible to switch the regimes in the middle of the document
(like it is possible to switch the languages)? An example usage would
be if some input documents (plain text, some older TeX files or
database entries) are written in some other encoding than the main
stream.
(Possibly switching in such a way that no leftovers remain after the
old encoding is replaced by a new one.)

Mojca

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Basic question on Unicode and ConTeXt
  2005-07-18 23:11           ` Mojca Miklavec
@ 2005-07-19  8:06             ` Hans Hagen
  2005-07-20 20:35               ` Christopher Creutzig
  0 siblings, 1 reply; 31+ messages in thread
From: Hans Hagen @ 2005-07-19  8:06 UTC (permalink / raw)


Mojca Miklavec wrote:

>Hans Hagen wrote:
>  
>
>>Mojca Miklavec wrote:
>>
>>    
>>
>>>>(concerning eregi-* files: you can define filesynonyms so we need a list of filesynonyms and regimesynonyms)
>>>>
>>>>        
>>>>
>>>What do you mean by writing file synonyms? Where would it be used?
>>>      
>>>
>>\definefilesynonym  [mojka]  [mojca]
>>\definefilesynonym  [moika]  [mojca]
>>\definefilesynonym  [moica]  [mojca]
>>    
>>
>
>Ok, if you are provocating, I'll strike back:
>None of the definitions above are allowed because they don't warn the
>user if he's using the wrong name. They should throw an error instead.
>The only proper way would be to define something like
>
>\setuplabeltext[\s!en][\v!pronouncemyname=moitsa]
>\setuplabeltext[\s!de][\v!pronouncemyname=mojza]
>\setuplabeltext[\s!ru][\v!pronouncemyname=мойца]
>...
>  
>
so how about using:

\translate[en=moitsa,de=mojza,ru=мойца]

then -)

>OK. I'll prepare \defineregimesynonym-s proposals, but I still don't
>know what the file synonyms should be used for in this context. The
>user probably doesn't need to care about file names?
>  
>
depends on whether you want to preload all those vectors (takes quite some 
memory, although i may find a way around that, maybe delayed loading)

>So why not mapping the characters to unicode first and defining the
>mapping from unicode to \TeXcommand only once? regi-* files (at least
>in the meaning they have now) could be prepared automatically by a
>script, less error-prone and without the need to say "Some more
>definitions will be added later."
>  
>
you mean ...

\defineactivetoken 123 {\uchar{...}{...}}

it is an option but it's much slower and takes much more memory

\uchar{2}{33} takes 1 hash pointer and 7 char slots (so probably 8 mem 
locations) while \eacute takes one mem location

>Is it possible to switch the regimes in the middle of the document
>(like it is possible to switch the languages)? An example usage would
>be if some input documents (plain text, some older TeX files or
>database entries) are written in some other encoding than the main
>stream.
>(Possibly switching in such a way that no leftovers remain after the
>old encoding is replaced by a new one.)
>  
>
switching is possible but in that case  you probably want to set toc/index/etc expansion to yes 

Hans 


-----------------------------------------------------------------
                                          Hans Hagen | PRAGMA ADE
              Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
     tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
                                             | www.pragma-pod.nl
-----------------------------------------------------------------

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Basic question on Unicode and ConTeXt
  2005-07-19  8:06             ` Hans Hagen
@ 2005-07-20 20:35               ` Christopher Creutzig
  2005-07-21  0:52                 ` Mojca Miklavec
  0 siblings, 1 reply; 31+ messages in thread
From: Christopher Creutzig @ 2005-07-20 20:35 UTC (permalink / raw)


Hans Hagen wrote:

>> So why not mapping the characters to unicode first and defining the
>> mapping from unicode to \TeXcommand only once? regi-* files (at least
>> in the meaning they have now) could be prepared automatically by a
>> script, less error-prone and without the need to say "Some more
>> definitions will be added later."
>>  
>>
> you mean ...
> 
> \defineactivetoken 123 {\uchar{...}{...}}
> 
> it is an option but it's much slower and take much more memory

  I may be wrong, of course, but I think Mojca proposed something 
different (and something that should be really easy to implement):  Have 
the unicode vectors stored in a format easily parsed by an external ruby 
script and create the regi-* files from that, using the conversion 
tables provided by your operating system or iconv or wherever ruby gets 
them from.


regards,
	Christopher

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Basic question on Unicode and ConTeXt
  2005-07-20 20:35               ` Christopher Creutzig
@ 2005-07-21  0:52                 ` Mojca Miklavec
  2005-07-22 11:30                   ` Christopher Creutzig
  0 siblings, 1 reply; 31+ messages in thread
From: Mojca Miklavec @ 2005-07-21  0:52 UTC (permalink / raw)


Christopher Creutzig wrote:
> Hans Hagen wrote:
> >> So why not mapping the characters to unicode first and defining the
> >> mapping from unicode to \TeXcommand only once? regi-* files (at least
> >> in the meaning they have now) could be prepared automatically by a
> >> script, less error-prone and without the need to say "Some more
> >> definitions will be added later."
> >>
> > you mean ...
> >
> > \defineactivetoken 123 {\uchar{...}{...}}
> >
> > it is an option but it's much slower and take much more memory
> 
>   I may be wrong, of course, but I think Mojca proposed something
> different (and something that should be really easy to implement):  Have
> the unicode vectors stored in a format easily parsed by an external ruby
> script and create the regi-* files from that, using the conversion
> tables provided by your operating system or iconv or wherever ruby gets
> them from.

Yes, I had something different in mind.

A1.) prepare the files to be used as a source of transformation from
"any" character set to utf and prepare a list of synonyms for
encodings

(example: a file that says that in ISO-8859-2, character 0xA3
represents a unicode character 0x0141 (lstroke): for every character,
for every Mac/Windows/iso/[...] encoding that we want to support)

A2.) write a script which automatically generates regi-* files from
those files, but regi-* files would contain only the mapping to
unicode number

(example:
\startregime[iso-8859-2]
...
\somecommandtomapacharactertounicode {163}{1}{65} % lstroke
...
\stopregime)

A3.) prepare a huge file with mapping from unicode numbers to ConTeXt commands

(example:
...
\somecommandtomapfromunicodetocontext {1}{65}{\lstroke}
...)

A4.) ... I don't mind what ConTeXt does with this \lstroke afterwards,
but it seems it is already clever enough to produce the (proper) glyph
at the end
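
Steps A1 and A2 could be sketched in Ruby roughly like this. It assumes the source files use a three-column layout (byte, unicode value, "#" plus a name) similar to the vendor mapping tables on unicode.org; the SAMPLE lines are typed in by hand for illustration, and \somecommandtomapacharactertounicode is only the placeholder name used above, not an existing command.

```ruby
# A few sample lines in the assumed layout:
# byte, Unicode code point, "#" followed by the character name.
SAMPLE = <<~TXT
  # comment lines and blank lines are skipped
  0xA3  0x0141  # LATIN CAPITAL LETTER L WITH STROKE
  0xB3  0x0142  # LATIN SMALL LETTER L WITH STROKE
  0xE9  0x00E9  # LATIN SMALL LETTER E WITH ACUTE
TXT

# Turn the mapping text into one regime line per character.
# The code point is split into a high and a low byte, so this
# only covers code points up to 0xFFFF.
def regime_lines(mapping_text)
  mapping_text.each_line.map do |line|
    m = line.strip.match(/\A0x(\h+)\s+0x(\h+)\s*(?:#\s*(.*))?\z/)
    next unless m
    byte = m[1].hex
    high, low = m[2].hex.divmod(256)
    "\\somecommandtomapacharactertounicode {#{byte}}{#{high}}{#{low}} % #{m[3]}"
  end.compact
end

puts "\\startregime[iso-8859-2]"
puts regime_lines(SAMPLE)
puts "\\stopregime"
```

For byte 0xA3 this yields the {163}{1}{65} triple from the example in A2.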

What should ConTeXt do with that?
B1.) The file under A3 should be processed at the beginning. As it may
become really huge, exotic definitions should be only preloaded if
asked for (\usemodule[korean]), while there is probably no harm if
(accented) latin, greek, cyrillic and punctuation (TM, copyright, ..)
are preloaded by default

B2.) Once the \enableregime[iso-8859-2] or any other regime is
requested, the file with the corresponding regime definitions is
processed. However, as \somecommandtomapacharactertounicode
{163}{1}{65} is processed, the character '163' is not stored as
\uchar{1}{65}, but as \lstroke. '\somecommandtomapacharactertounicode'
would first look up which ConTeXt command is saved under
\uchar{1}{65} and call
\defineactivetoken 163 {\lstroke} as a result.

I don't know the details of the ConTeXt internal stuff, but I think
(hope) that it should be possible to do it this way. B1 (preloading
mapping from unicode to tex commands) is probably the only "hungry"
step in the whole story.

I think that it doesn't make any sense to ask the user to "\input
regi-whatever". \enableregime and some additional definitions should
be clever enough to find out which file to process in order to enable
the proper regime.

%%%%%%%%%%%%%%%%%%%%%

Christopher's idea is actually yet another alternative, which combines
the steps A2 and A3. If the mapping unicode->ConTeXt is in some
easy-to-parse format, there's actually no additional effort if the
script writes directly the ConTeXt commands instead of unicode numbers
into regi-* files, so that B2 has some less work to do. As long as it
is guaranteed that nobody will change these files manually, this is
OK. The only drawback is that if someone notices that "\textellipsis"
is more suitable than "\dots", the script has to be changed and the
files have to be generated once more. If the character is mapped to
(0x2026 HORIZONTAL ELLIPSIS) instead, only one line in the file with
unicode->ConTeXt mapping (A3) has to be changed.
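
To illustrate the one-line change, here is a small Ruby sketch. The regime keys and the generate helper are made up for this example, and the output format simply imitates the \defineactivetoken line quoted earlier in the thread. The byte tables stay untouched; only the entry for 0x2026 in the unicode->ConTeXt table is edited before regenerating.

```ruby
# Stage 1 (stable): byte => code point, per regime. Only the
# ellipsis is listed to keep the sketch short.
REGIMES = {
  "windows-1252" => { 0x85 => 0x2026 },
  "macroman"     => { 0xC9 => 0x2026 },
}

# Stage 2 is passed in as `commands`: code point => command name.
# Returns { regime name => array of generated lines }.
def generate(regimes, commands)
  regimes.map do |name, bytes|
    body = bytes.map do |byte, codepoint|
      "\\defineactivetoken #{byte} {#{commands.fetch(codepoint)}}"
    end
    [name, body]
  end.to_h
end

before = generate(REGIMES, { 0x2026 => "\\dots" })
after  = generate(REGIMES, { 0x2026 => "\\textellipsis" })

puts before.values.flatten
puts after.values.flatten
```

Both regimes pick up the new command after a single edit in the second-stage table.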

If B2 cannot work as described, Christopher's proposal would be
the only proper way to go.

%%%%%%%%%%%%%%%%%%%%%

I wanted to test \showcharacters on the live.contextgarden.net (as
Hans suggested that my map files are probably not OK), but it didn't
compile there. (I hope it's not because of my buggy contributions in
the last few days.)

Is there any tool or macro to visualize all the glyphs available in a
font? \showcharacters (if it works) shows only the glyphs that ConTeXt
is aware of. What about the rest?

Mojca

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Basic question on Unicode and ConTeXt
  2005-07-21  0:52                 ` Mojca Miklavec
@ 2005-07-22 11:30                   ` Christopher Creutzig
  2005-07-22 12:05                     ` Hans Hagen
  2005-07-22 22:20                     ` Mojca Miklavec
  0 siblings, 2 replies; 31+ messages in thread
From: Christopher Creutzig @ 2005-07-22 11:30 UTC (permalink / raw)


Mojca Miklavec wrote:

> A1.) prepare the files to be used as a source of transformation from
> "any" character set to utf and prepare a list of synonyms for
> encodings

  From my point of view, that should only be a fallback.  We already have 
Iconv in ruby and can, if we know that ISO-8859-2 is a single byte 
coding system, simply say

conv = Iconv.new("UTF-16", "ISO-8859-2")
256.times { |i| puts lookup[conv.iconv("%c" % i)] }  # 256, so that byte 0xFF is included

to get the whole list, assuming we've filled the lookup hash first.
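
For readers on a current Ruby: the Iconv class was removed from the standard library in Ruby 2.0, so the two lines above will not run there as written. A roughly equivalent sketch with String#encode follows. The LOOKUP hash is a one-entry stand-in for the real lookup table and is keyed by code point, which is an assumption about how that hash would be organized.

```ruby
# Build an array indexed by byte value (0..255) that holds the
# Unicode code point for each byte of a single-byte encoding.
# Bytes that cannot be converted are stored as nil.
def codepoints_for(encoding)
  (0..255).map do |byte|
    begin
      [byte].pack("C").force_encoding(encoding).encode("UTF-8").ord
    rescue EncodingError
      nil
    end
  end
end

# Stand-in for the lookup hash: code point => command.
LOOKUP = { 0x00E9 => "\\eacute" }

codepoints_for("ISO-8859-2").each_with_index do |codepoint, byte|
  command = LOOKUP[codepoint]
  puts "#{byte} #{command}" if command
end
```

Iterating over 0..255 covers all 256 byte values, including 0xFF.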


  As you've said, I'd combine steps A2 and A3, to make ConTeXt run 
faster.  If you want, for whatever reason, to use \textellipsis for an 
ellipsis (it just looks horribly wrong to me) instead of \dots, you'd 
need to invoke the ruby script which generates the regi-* files.

  The whole thing should not require any change at all to ConTeXt 
itself, since the regi-* files could look exactly as they do now, just 
being generated automatically.  (For the multibyte encodings, the whole 
thing gets much more tricky.)


Christopher

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Basic question on Unicode and ConTeXt
  2005-07-22 11:30                   ` Christopher Creutzig
@ 2005-07-22 12:05                     ` Hans Hagen
  2005-07-22 22:20                     ` Mojca Miklavec
  1 sibling, 0 replies; 31+ messages in thread
From: Hans Hagen @ 2005-07-22 12:05 UTC (permalink / raw)


Christopher Creutzig wrote:

> conv = Iconv.new("UTF-16", "ISO-8859-2")
> 256.times { |i| puts lookup[conv.iconv("%c" % i)] }
>
> to get the whole list, assuming we've filled the lookup hash first.

an alternative is to use the tcx files but that is kind of messy

so we need a utf-8 hash (can be loaded from unic-* files)

Hans

 

-----------------------------------------------------------------
                                          Hans Hagen | PRAGMA ADE
              Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
     tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
                                             | www.pragma-pod.nl
-----------------------------------------------------------------

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Basic question on Unicode and ConTeXt
  2005-07-22 11:30                   ` Christopher Creutzig
  2005-07-22 12:05                     ` Hans Hagen
@ 2005-07-22 22:20                     ` Mojca Miklavec
  2005-07-25 15:58                       ` Henning Hraban Ramm
  2005-07-25 23:49                       ` Hans Hagen
  1 sibling, 2 replies; 31+ messages in thread
From: Mojca Miklavec @ 2005-07-22 22:20 UTC (permalink / raw)


Christopher Creutzig wrote:
> We already have
> Iconv in ruby and can, if we know that ISO-8859-2 is a single byte
> coding system, simply say
> 
> conv = Iconv.new("UTF-16", "ISO-8859-2")
> 256.times { |i| puts lookup[conv.iconv("%c" % i)] }
> 
> to get the whole list, assuming we've filled the lookup hash first.

Great!

Sorry for all my philosophising! I don't know ruby (yet) and I didn't
even think about this possibility. My last idea was to parse and
combine the data on http://www.unicode.org/Public/MAPPINGS/VENDORS/, 
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt and
http://partners.adobe.com/public/developer/en/opentype/aglfn13.txt,
but your idea is a hundred times faster and better! Thanks a lot!

> As you've said, I'd combine steps A2 and A3, to make ConTeXt run faster.

That's OK for me. If there's a simple internal ruby tool (called every
time when unicode->tex mapping changes or some more encoding support
is added) instead of one-time-script, there should be no problem to do
that directly.

> If you want, for whatever reason, to use \textellipsis for an
> ellipsis (it just looks horribly wrong to me) instead of \dots, you'd
> need to invoke the ruby script which generates the regi-* files.

I just wanted to give an example that changes are sometimes needed and
that it is difficult to trace all the places where they should have
been made. Sorry, this example wasn't very illustrative, I don't even
know what \textellipsis stands for, I just saw some comments about
changes made in regi-* files or some discrepancies.

>   The whole thing should not require any change at all to ConTeXt
> itself, since the regi-* files could look exactly as they do now, just
> being generated automatically.  (For the multibyte encodings, the whole
> thing gets much more tricky.)

I noticed (perhaps I'm wrong) that TeX community support for cyrillic
may be better than that in unicode and in the available old 8bit
encodings. ConTeXt is also already supporting those strange regimes
(ctt, dbk, mls, mnk, mos, ncc, ...) that I was unable to find anywhere
else. In this case one should also be careful in order not to spoil
this already available feature.

I'm still slightly confused by the encoding files (texnansi, ec,...,
in one case iso-8859-7 is used). Does it mean that it is impossible
(or at least very complex or slow) to access more than 256 characters
from a single font at once?
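
A small Ruby sketch of what a 256-slot limit means in practice. It uses 8-bit input encodings as a stand-in for a 256-slot font vector, which is only an analogy (texnansi and ec are font encodings, and Ruby does not know them), and fits? is a helper made up for this example.

```ruby
# True if every character of `text` has a slot in the given
# 8-bit encoding, false otherwise.
def fits?(text, encoding)
  text.encode(encoding)
  true
rescue Encoding::UndefinedConversionError
  false
end

polish = "Łódź"
greek  = "λόγος"

["ISO-8859-2", "ISO-8859-7"].each do |enc|
  puts "#{enc}: polish=#{fits?(polish, enc)} greek=#{fits?(greek, enc)}"
end
```

No single 8-bit vector in this sketch holds both words, so text mixing them has to switch between vectors.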

Mojca

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Basic question on Unicode and ConTeXt
  2005-07-22 22:20                     ` Mojca Miklavec
@ 2005-07-25 15:58                       ` Henning Hraban Ramm
  2005-07-25 23:49                       ` Hans Hagen
  1 sibling, 0 replies; 31+ messages in thread
From: Henning Hraban Ramm @ 2005-07-25 15:58 UTC (permalink / raw)



Am 2005-07-23 um 00:20 schrieb Mojca Miklavec:
> I'm still slighlty confused by the encoding files (texnansi, ec,...,
> in one case iso-8859-7 is used). Does it mean that it is impossible
> (or at least very complex or slow) to access more than 256 characters
> from a single font at once?

TeX as an old 8bit system isn't able to handle more than 256 chars  
per font.
Only more modern siblings (like Omega/Aleph) are able to handle  
"Unicode size" fonts by themselves.


Grüßlis vom Hraban!
---
http://www.fiee.net/texnique/
http://contextgarden.net

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Basic question on Unicode and ConTeXt
  2005-07-22 22:20                     ` Mojca Miklavec
  2005-07-25 15:58                       ` Henning Hraban Ramm
@ 2005-07-25 23:49                       ` Hans Hagen
  1 sibling, 0 replies; 31+ messages in thread
From: Hans Hagen @ 2005-07-25 23:49 UTC (permalink / raw)


Mojca Miklavec wrote:

>I'm still slighlty confused by the encoding files (texnansi, ec,...,
>in one case iso-8859-7 is used). Does it mean that it is impossible
>(or at least very complex or slow) to access more than 256 characters
>from a single font at once?
>  
>
indeed and since it's related to hyphenation ... 

but some day pdftex will be 32 bit and open type so ... 

Hans 



-----------------------------------------------------------------
                                          Hans Hagen | PRAGMA ADE
              Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
     tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
                                             | www.pragma-pod.nl
-----------------------------------------------------------------

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Visual debugging
@ 2011-10-01 15:44 Otso Helenius
  0 siblings, 0 replies; 31+ messages in thread
From: Otso Helenius @ 2011-10-01 15:44 UTC (permalink / raw)
  To: ntg-context

Hi,

I'm in need of more visual debugging aids. I tried the supp-vis 
module, but it did not give enough cues.
I guess the ruledhbox is bound by ascender and descender height 
vertically, but I'd like to see also the
cap height and x-height lines. Is it possible?

Plus, is it possible to display the individual box around each letter 
similar to Knuth's TeXBook page 65?

Best regards,
Otso Helenius
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________


^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2011-10-01 15:44 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-07-14  9:30 Basic question on Unicode and ConTeXt Steffen Wolfrum
2005-07-14 10:29 ` Henning Hraban Ramm
2005-07-14 19:13   ` Steffen Wolfrum
2005-07-14 19:32     ` VnPenguin
2005-07-15  5:16     ` Radhelorn
2005-07-15  9:09     ` Henning Hraban Ramm
2005-07-15 18:43   ` Mojca Miklavec
2005-07-15 18:59     ` hungarumlaut (was: Basic question on Unicode) Henning Hraban Ramm
2005-07-15 21:13     ` ISO/windows encodings (was: Basic question on Unicode ...) Mojca Miklavec
2005-07-17 23:38       ` ISO/windows encodings Hans Hagen
2005-07-17 20:01     ` Basic question on Unicode and ConTeXt Hans Hagen
2005-07-18  5:50       ` VnPenguin
2005-07-18 20:26       ` Mojca Miklavec
2005-07-18 21:46         ` Hans Hagen
2005-07-18 21:54         ` Hans Hagen
2005-07-18 23:11           ` Mojca Miklavec
2005-07-19  8:06             ` Hans Hagen
2005-07-20 20:35               ` Christopher Creutzig
2005-07-21  0:52                 ` Mojca Miklavec
2005-07-22 11:30                   ` Christopher Creutzig
2005-07-22 12:05                     ` Hans Hagen
2005-07-22 22:20                     ` Mojca Miklavec
2005-07-25 15:58                       ` Henning Hraban Ramm
2005-07-25 23:49                       ` Hans Hagen
2005-07-17 20:37     ` Hans Hagen
2005-07-17 21:51       ` Henning Hraban Ramm
2005-07-17 22:36         ` Hans Hagen
2005-07-18 16:18           ` Visual Debugging (was: Basic question) Henning Hraban Ramm
2005-07-18 20:44             ` Brooks Moses
2005-07-18 21:41               ` Visual Debugging Hans Hagen
2011-10-01 15:44 Visual debugging Otso Helenius
