ntg-context - mailing list for ConTeXt users
* decomposed u umlaut
@ 2018-03-20  7:42 Henning Hraban Ramm
  2018-03-22  9:08 ` Mojca Miklavec
  2018-03-22  9:34 ` Ulrike Fischer
  0 siblings, 2 replies; 6+ messages in thread
From: Henning Hraban Ramm @ 2018-03-20  7:42 UTC
  To: mailing list for ConTeXt users

Ahoi,

I’ve one annoying problem with ConTeXt: all üs (small u umlauts) seem to be encoded as decomposed unicode or something like that, at least every ü breaks into u + garbage if I copy some text from a ConTeXt PDF to an app that doesn’t really support Unicode.
All other characters within Latin-1, including the other umlauts, cause no problems; that’s why I think the problem might be in ConTeXt’s font handling.
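
To show what “decomposed” means, here is a minimal Python sketch (Python is chosen only for illustration; nothing in the thread uses it). A precomposed ü is the single code point U+00FC, while its canonical decomposition (NFD) is u followed by U+0308 COMBINING DIAERESIS. Assuming the receiving application can only store Latin-1, as the description above suggests, the composed form fits but the combining mark does not, which is one plausible source of the “u + garbage”.

```python
import unicodedata

def code_points(s: str) -> list[str]:
    """Return the code points of s in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in s]

composed = "\u00fc"                                  # ü as one code point
decomposed = unicodedata.normalize("NFD", composed)  # u + combining diaeresis

print(code_points(composed))    # ['U+00FC']
print(code_points(decomposed))  # ['U+0075', 'U+0308']
print(composed == decomposed)   # False, although both render as ü

# Latin-1 has a slot for the precomposed ü, but none for U+0308:
print(composed.encode("latin-1"))                      # b'\xfc'
print(decomposed.encode("latin-1", errors="replace"))  # b'u?'
```

Recomposing with `unicodedata.normalize("NFC", decomposed)` gives back the single code point U+00FC.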

(Specifically, this concerns the invoice addresses that I copy from PDF into the Deutsche Post postage web shop. The site is quite new, and I can’t understand how a big company can buy such crappy software. I already complained, since there are more problems, but of course I only got a template answer.)

Greetlings, Hraban
---
http://www.fiee.net
http://wiki.contextgarden.net
GPG Key ID 1C9B22FD

___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki     : http://contextgarden.net
___________________________________________________________________________________

* Re: decomposed u umlaut
  2018-03-20  7:42 decomposed u umlaut Henning Hraban Ramm
@ 2018-03-22  9:08 ` Mojca Miklavec
  2018-03-25 20:36   ` Arthur Reutenauer
  2018-03-22  9:34 ` Ulrike Fischer
  1 sibling, 1 reply; 6+ messages in thread
From: Mojca Miklavec @ 2018-03-22  9:08 UTC
  To: mailing list for ConTeXt users

On 20 March 2018 at 08:42, Henning Hraban Ramm wrote:
> Ahoi,
>
> I’ve one annoying problem with ConTeXt: all üs (small u umlauts) seem to be encoded as decomposed unicode or something like that, at least every ü breaks into u + garbage if I copy some text from a ConTeXt PDF to an app that doesn’t really support Unicode.

You are on macOS, right?

In my experience, Apple's technology was usually to blame. Perfectly
valid PDFs with properly accented characters would always end up with
decomposed characters when copied and pasted. Even pdftotext did a
better job.

Mojca

* Re: decomposed u umlaut
  2018-03-20  7:42 decomposed u umlaut Henning Hraban Ramm
  2018-03-22  9:08 ` Mojca Miklavec
@ 2018-03-22  9:34 ` Ulrike Fischer
  2018-03-22 10:37   ` Hans Hagen
  1 sibling, 1 reply; 6+ messages in thread
From: Ulrike Fischer @ 2018-03-22  9:34 UTC
  To: ntg-context

On Tue, 20 Mar 2018 08:42:08 +0100, Henning Hraban Ramm wrote:

> I’ve one annoying problem with ConTeXt: all üs (small u umlauts)
> seem to be encoded as decomposed unicode or something like that,
> at least every ü breaks into u + garbage if I copy some text from
> a ConTeXt PDF to an app that doesn’t really support Unicode.

This can depend on the font. While looking into another question, I just
checked Cambria, and it uses, e.g., char + accent for some of the
umlauts. So concrete code is needed to test this.


-- 
Ulrike Fischer 
http://www.troubleshooting-tex.de/

* Re: decomposed u umlaut
  2018-03-22  9:34 ` Ulrike Fischer
@ 2018-03-22 10:37   ` Hans Hagen
  0 siblings, 0 replies; 6+ messages in thread
From: Hans Hagen @ 2018-03-22 10:37 UTC
  To: news3, mailing list for ConTeXt users

On 3/22/2018 10:34 AM, Ulrike Fischer wrote:
> On Tue, 20 Mar 2018 08:42:08 +0100, Henning Hraban Ramm wrote:
> 
>> I’ve one annoying problem with ConTeXt: all üs (small u umlauts)
>> seem to be encoded as decomposed unicode or something like that,
>> at least every ü breaks into u + garbage if I copy some text from
>> a ConTeXt PDF to an app that doesn’t really support Unicode.
> 
> This can depend on the font. While looking into another question, I just
> checked Cambria, and it uses, e.g., char + accent for some of the
> umlauts. So concrete code is needed to test this.

BTW, the same is true for ligature building (but I have already explained
that many times, so I won't repeat myself).

Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
        tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------

* Re: decomposed u umlaut
  2018-03-22  9:08 ` Mojca Miklavec
@ 2018-03-25 20:36   ` Arthur Reutenauer
  2018-03-25 21:26     ` Henning Hraban Ramm
  0 siblings, 1 reply; 6+ messages in thread
From: Arthur Reutenauer @ 2018-03-25 20:36 UTC
  To: mailing list for ConTeXt users

On Thu, Mar 22, 2018 at 10:08:44AM +0100, Mojca Miklavec wrote:
> On 20 March 2018 at 08:42, Henning Hraban Ramm wrote:
>> I’ve one annoying problem with ConTeXt: all üs (small u umlauts) seem to be encoded as decomposed unicode or something like that, at least every ü breaks into u + garbage if I copy some text from a ConTeXt PDF to an app that doesn’t really support Unicode.
> 
> You are on macOS, right?
> 
> In my experience it was usually Apple's technology to blame.

  I agree with you that Apple’s software has a tendency to decompose
characters, but I wouldn’t blame them for that: it’s perfectly
Unicode-compliant to do so, and by now software should support
combining characters in at least a basic way.  It’s a real problem that
the software from the Deutsche Post isn’t able to handle them correctly.
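
Since recomposition is a lossless Unicode operation, one user-side workaround is to convert text to Normalization Form C (NFC) before pasting it into software that only handles Latin-1. The Python sketch below assumes that Latin-1 is indeed the shop's limit; `latin1_safe` is a hypothetical helper name, and the address string is made-up sample data.

```python
import unicodedata

def latin1_safe(text: str) -> str:
    """Recompose text to NFC and verify that it fits into Latin-1.

    Raises UnicodeEncodeError if a character still has no Latin-1
    equivalent after recomposition (e.g. the euro sign).
    """
    nfc = unicodedata.normalize("NFC", text)
    nfc.encode("latin-1")  # only a check; raises if anything outside Latin-1 remains
    return nfc

# Hypothetical address line as it might arrive from the clipboard, with a decomposed ü.
pasted = "Mu\u0308nchen, Stra\u00dfe"
fixed = latin1_safe(pasted)
print(fixed)                    # München, Straße
print(len(pasted), len(fixed))  # 16 15
```

Characters that have no precomposed form in Latin-1 still raise an error rather than being silently replaced, so nothing is lost without notice.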

	Best,

		Arthur

* Re: decomposed u umlaut
  2018-03-25 20:36   ` Arthur Reutenauer
@ 2018-03-25 21:26     ` Henning Hraban Ramm
  0 siblings, 0 replies; 6+ messages in thread
From: Henning Hraban Ramm @ 2018-03-25 21:26 UTC
  To: mailing list for ConTeXt users

On 2018-03-25 at 22:36, Arthur Reutenauer <arthur.reutenauer@normalesup.org> wrote:

> On Thu, Mar 22, 2018 at 10:08:44AM +0100, Mojca Miklavec wrote:
>> On 20 March 2018 at 08:42, Henning Hraban Ramm wrote:
>>> I’ve one annoying problem with ConTeXt: all üs (small u umlauts) seem to be encoded as decomposed unicode or something like that, at least every ü breaks into u + garbage if I copy some text from a ConTeXt PDF to an app that doesn’t really support Unicode.
>> 
>> You are on macOS, right?
>> In my experience it was usually Apple's technology to blame.
> 
>  I agree with you that Apple’s software has a tendency to decompose
> characters, but I wouldn’t blame them for that: it’s perfectly
> Unicode-compliant to do so, and by now software should support
> combining characters in at least a basic way.  It’s a real problem that
> the software from the Deutsche Post isn’t able to handle them correctly.

While the DP shop should be able to handle more than Latin-1, the problem seems to lie in the viewer or in the combination of viewer and OS:
- It doesn’t depend on the font: I tried Computer Modern and Alegreya (which is known to have some OpenType ligature issues).
- I checked with several viewers: the Adobe apps (Acrobat Pro 9 and Reader DC) decompose only the ü, while my other viewers, including Apple’s Preview, decompose all the umlauts. (I just copied and pasted into a hex editor.)
- It also happens with PDFs from other sources.
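
The hex-editor check can be approximated in a few lines of Python. This sketch assumes the pasted text is examined as UTF-8 (a hex editor may show another encoding, such as UTF-16, depending on how the clipboard content is pasted); `describe` is a hypothetical helper. A precomposed ü shows up as the two bytes c3 bc, a decomposed one as 75 cc 88.

```python
import unicodedata

def describe(text: str) -> str:
    """Hex-dump the UTF-8 bytes of text and list characters with a
    nonzero canonical combining class (such as U+0308)."""
    hex_bytes = text.encode("utf-8").hex(" ")  # the sep argument needs Python 3.8+
    marks = [f"U+{ord(c):04X}" for c in text if unicodedata.combining(c)]
    if marks:
        return f"{hex_bytes}  (combining marks: {', '.join(marks)})"
    return f"{hex_bytes}  (no combining marks)"

# Paste the clipboard contents between the quotes to test them.
print(describe("\u00fc"))   # c3 bc  (no combining marks)
print(describe("u\u0308"))  # 75 cc 88  (combining marks: U+0308)
print(unicodedata.is_normalized("NFC", "u\u0308"))  # False
```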

So it’s not a ConTeXt bug. Sorry for the noise.

Greetlings, Hraban
---
http://www.fiee.net
http://wiki.contextgarden.net
GPG Key ID 1C9B22FD

Thread overview: 6+ messages
2018-03-20  7:42 decomposed u umlaut Henning Hraban Ramm
2018-03-22  9:08 ` Mojca Miklavec
2018-03-25 20:36   ` Arthur Reutenauer
2018-03-25 21:26     ` Henning Hraban Ramm
2018-03-22  9:34 ` Ulrike Fischer
2018-03-22 10:37   ` Hans Hagen
