ntg-context - mailing list for ConTeXt users
* Plea for unicode help
@ 2011-05-05 11:32 Oliver Buerschaper
  2011-05-05 11:38 ` Taco Hoekwater
  0 siblings, 1 reply; 5+ messages in thread
From: Oliver Buerschaper @ 2011-05-05 11:32 UTC (permalink / raw)
  To: mailing list for ConTeXt users

Hi there,

I'm experiencing a very strange error related to unicode at the moment and I can't pin down the problem for the life of me…

The situation: I'm using context mkvi (2011.04.20 16:23) on Mac OS 10.6.7 (with the latest font patches in particular) and TeXShop 2.41. In my tex source file I have unicode letters with accents (like "clarín") and the file's encoding is set to UTF-8. It still looks like valid UTF-8 when opened with a different text editor, say SubEthaEdit or vim. The problem is, context compiles this to a PDF in which the accent is missing ("clarin").

Now for the really strange part. When I delete the offending letter and re-enter it, TeXShop shows *no* visual difference between before and after, but the compiled PDF suddenly has the accent again. I seem to remember that the missing-accent issue didn't occur with a different font (Latin Modern rather than Minion Pro in context) for the *same* source. I'll have to check that again, though.

Has anyone seen this before? I wanted to ask up front before I really start digging into the issue… I might have missed something obvious.

Many thanks,
Oliver
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________



* Re: Plea for unicode help
  2011-05-05 11:32 Plea for unicode help Oliver Buerschaper
@ 2011-05-05 11:38 ` Taco Hoekwater
  2011-05-05 13:52   ` Oliver Buerschaper
  0 siblings, 1 reply; 5+ messages in thread
From: Taco Hoekwater @ 2011-05-05 11:38 UTC (permalink / raw)
  To: mailing list for ConTeXt users

On 05/05/2011 01:32 PM, Oliver Buerschaper wrote:
>
> Has anyone seen this before? I wanted to ask up front before I really start digging into the issue… I might have missed something obvious.

Check the hexdump of the file. Chances are that one version has í
directly (as a single precomposed character), and the other has a
combination of <dotlessi><acuteaccent>.
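As a rough illustration of that check, the following Python sketch compares the UTF-8 bytes of the two spellings and lists any combining marks it finds. The function name `find_combining_marks` and the sample strings are made up for this example. It also assumes the decomposed form is a plain i followed by U+0301, which is the canonical Unicode decomposition of í; the file in question may contain a different base letter.

```python
import unicodedata

def find_combining_marks(text):
    """Return (index, base character, mark name) for each combining mark in text."""
    hits = []
    for i, ch in enumerate(text):
        # combining() returns a non-zero class for combining marks such as U+0301
        if unicodedata.combining(ch) != 0:
            base = text[i - 1] if i > 0 else ""
            hits.append((i, base, unicodedata.name(ch, "UNKNOWN")))
    return hits

# Hypothetical sample data: the same word in both spellings
precomposed = "clar\u00edn"   # í as the single code point U+00ED
decomposed = "clari\u0301n"   # i followed by U+0301 COMBINING ACUTE ACCENT

# The two strings look identical on screen but differ at the byte level
print(precomposed.encode("utf-8").hex())
print(decomposed.encode("utf-8").hex())
print(find_combining_marks(decomposed))
```

To scan a real file, one could read it with `open(path, encoding="utf-8").read()` and pass the result to `find_combining_marks`.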

Best wishes,
Taco

* Re: Plea for unicode help
  2011-05-05 11:38 ` Taco Hoekwater
@ 2011-05-05 13:52   ` Oliver Buerschaper
  2011-05-05 14:21     ` Taco Hoekwater
  0 siblings, 1 reply; 5+ messages in thread
From: Oliver Buerschaper @ 2011-05-05 13:52 UTC (permalink / raw)
  To: mailing list for ConTeXt users

>> Has anyone seen this before? I wanted to ask up front before I really start digging into the issue… I might have missed something obvious.
> 
> Check the hexdump of the file. Chances are that one of them has í directly, and one a combination of <dotlessi><acuteaccent>.

Awesome hint… hits the nail on the head! The "faulty" version (i.e. the one not appearing in the PDF with Minion Pro) is <dotlessi><acuteaccent> (where <acuteaccent> appears to translate to CC81 in hex, correct?).

I guess I need to find and replace the accent combination by the direct slot? Can something similar happen for other "foreign" characters (like ß, umlauts, ae, etc.) or is this sort of error only possible with accents?

Oliver

* Re: Plea for unicode help
  2011-05-05 13:52   ` Oliver Buerschaper
@ 2011-05-05 14:21     ` Taco Hoekwater
  2011-05-05 15:12       ` Oliver Buerschaper
  0 siblings, 1 reply; 5+ messages in thread
From: Taco Hoekwater @ 2011-05-05 14:21 UTC (permalink / raw)
  To: mailing list for ConTeXt users

On 05/05/2011 03:52 PM, Oliver Buerschaper wrote:
>>> Has anyone seen this before? I wanted to ask up front before I really start digging into the issue… I might have missed something obvious.
>>
>> Check the hexdump of the file. Chances are that one of them has í directly, and one a combination of <dotlessi><acuteaccent>.
>
> Awesome hint… hits the nail on the head! The "faulty" version (i.e. the one not appearing in the PDF with Minion Pro) is <dotlessi><acuteaccent> (where <acuteaccent> appears to translate to CC81 in hex, correct?).

Yes. A useful site for finding out things like that without having to
do the UTF-8 calculations yourself:

   http://www.decodeunicode.org/en/u+0301/properties

At the top right, it has numerical values for the current character in 
various encodings.
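For reference, the "utf-8 calculation" for U+0301 can be sketched in a few lines of Python. The helper `utf8_two_byte` is a hypothetical name, and it covers only the two-byte range (U+0080 to U+07FF), which is enough for the combining accent discussed here.

```python
def utf8_two_byte(code_point):
    """Encode a code point in the range U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("only the two-byte UTF-8 range is handled in this sketch")
    lead = 0xC0 | (code_point >> 6)      # 110xxxxx: the upper 5 bits
    trail = 0x80 | (code_point & 0x3F)   # 10xxxxxx: the lower 6 bits
    return bytes([lead, trail])

# U+0301 COMBINING ACUTE ACCENT gives the bytes CC 81 seen in the hexdump
print(utf8_two_byte(0x0301).hex())
# Cross-check against Python's built-in encoder
print("\u0301".encode("utf-8").hex())
```

Both lines print `cc81`, matching the value Oliver found in the file.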

> I guess I need to find and replace the accent combination by the direct slot?

That would be wise for now, but I think context should be able to trap 
this automatically (at least in the mode=node case).
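One way to do such a find-and-replace outside ConTeXt is Unicode normalization to NFC, which composes base letter plus combining mark into the precomposed character wherever one exists. The sketch below uses Python's standard `unicodedata` module; the function name `to_precomposed` and the sample string are assumptions for illustration, not anything ConTeXt provides.

```python
import unicodedata

def to_precomposed(text):
    """Return text in NFC form, composing sequences like i + U+0301 into í."""
    return unicodedata.normalize("NFC", text)

# Hypothetical sample: i followed by U+0301 COMBINING ACUTE ACCENT
decomposed = "clari\u0301n"
fixed = to_precomposed(decomposed)

print(len(decomposed), len(fixed))   # the composed string is one code point shorter
print(fixed == "clar\u00edn")        # True: now uses the single code point U+00ED
```

Note that this assumes the base letter is an ordinary i. If the file really contains a dotless i (U+0131) before the accent, NFC would likely leave that pair unchanged, since as far as I can tell Unicode defines no precomposed form for it, and a manual replacement would be needed.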

> Can something similar happen for other "foreign" characters (like ß, umlauts, ae, etc.) or is this sort of error only possible with accents?

IIRC, in principle it can happen with some other characters as well, but 
I do not think that happens often. It is mostly combining accents.
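To see which characters can show up in a decomposed form at all, one can ask the Unicode character database whether a character has a canonical decomposition. The Python sketch below does that for a few of the characters mentioned; `has_canonical_decomposition` is a hypothetical helper name.

```python
import unicodedata

def has_canonical_decomposition(ch):
    """True if ch canonically decomposes (e.g. into base letter + combining mark)."""
    mapping = unicodedata.decomposition(ch)
    # Compatibility mappings are tagged like "<compat> ..."; skip those
    return bool(mapping) and not mapping.startswith("<")

for ch in ["\u00ed", "\u00fc", "\u00df", "\u00e6"]:   # í, ü, ß, æ
    print(ch, has_canonical_decomposition(ch), unicodedata.decomposition(ch))
```

In this check, í and ü have decompositions (base letter plus U+0301 and U+0308 respectively), so they can appear in either form in a file, while ß and æ are single characters with no decomposition.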

Best wishes,
Taco

* Re: Plea for unicode help
  2011-05-05 14:21     ` Taco Hoekwater
@ 2011-05-05 15:12       ` Oliver Buerschaper
  0 siblings, 0 replies; 5+ messages in thread
From: Oliver Buerschaper @ 2011-05-05 15:12 UTC (permalink / raw)
  To: mailing list for ConTeXt users

>> Awesome hint… hits the nail on the head! The "faulty" version (i.e. the one not appearing in the PDF with Minion Pro) is <dotlessi><acuteaccent> (where <acuteaccent> appears to translate to CC81 in hex, correct?).
> 
> Yes. A useful site for finding out things like that without having to do the UTF-8 calculations yourself:
> 
>  http://www.decodeunicode.org/en/u+0301/properties
> 
> At the top right, it has numerical values for the current character in various encodings.

This page looks great. Jotted down for later reading ;-)


>> I guess I need to find and replace the accent combination by the direct slot?
> 
> That would be wise for now, but I think context should be able to trap this automatically (at least in the mode=node case).

Sounds reasonable. By the way, is the direct encoding generally preferred over the combination method (say, by good Unicode practice ;-)? If yes, I certainly wouldn't mind a little warning message if I happen to use the other variant…


>> Can something similar happen for other "foreign" characters (like ß, umlauts, ae, etc.) or is this sort of error only possible with accents?
> 
> IIRC, in principle it can happen with some other characters as well, but I do not think that happens often. It is mostly combining accents.

I see. So umlauts are good candidates to check, too.

Thanks again,
Oliver
