ntg-context - mailing list for ConTeXt users
* Non-printable Unicode control characters
@ 2008-08-15 19:23 Khaled Hosny
  2008-08-16  7:53 ` Hans Hagen
  0 siblings, 1 reply; 4+ messages in thread
From: Khaled Hosny @ 2008-08-15 19:23 UTC
  To: mailing list for ConTeXt users



Unicode defines many "control characters" that only affect text behaviour
and should not be rendered visually, such as the characters with the
Bidi_Control and Join_Control properties (see
http://www.unicode.org/Public/5.1.0/ucd/PropList.txt and
http://unicode.org/Public/UNIDATA/UCD.html).

Currently, ConTeXt handles ZWJ and ZWNJ, but other characters get
rendered if the font has glyphs for them, or have no effect at all if the
font has no glyphs for them. I think the optimal behaviour is for those
characters to affect text formatting without being rendered visually,
whether or not the font has glyphs for them.
It might also be useful to be able to enable rendering of those
characters manually, for drafts and such.
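
To make the request concrete, here is a minimal sketch in Python (not
ConTeXt or Lua code). It assumes the Bidi_Control and Join_Control code
points as listed in the Unicode 5.1 PropList.txt; the function names
hide_controls and show_controls are made up for illustration. It only
covers visibility: hiding by default, and showing on request for drafts.

```python
# Code points carrying the two properties in PropList.txt (Unicode 5.1).
JOIN_CONTROL = {0x200C: "ZWNJ", 0x200D: "ZWJ"}
BIDI_CONTROL = {
    0x200E: "LRM", 0x200F: "RLM",
    0x202A: "LRE", 0x202B: "RLE", 0x202C: "PDF",
    0x202D: "LRO", 0x202E: "RLO",
}
CONTROLS = {**JOIN_CONTROL, **BIDI_CONTROL}


def hide_controls(text: str) -> str:
    """Drop the control characters (the proposed default for typeset text)."""
    return "".join(ch for ch in text if ord(ch) not in CONTROLS)


def show_controls(text: str) -> str:
    """Replace each control character by a visible tag (for drafts)."""
    return "".join(
        f"<{CONTROLS[ord(ch)]}>" if ord(ch) in CONTROLS else ch for ch in text
    )


sample = "a\u200cb\u202ec\u202c"
print(hide_controls(sample))  # abc
print(show_controls(sample))  # a<ZWNJ>b<RLO>c<PDF>
```

Acting on the characters (joining behaviour, text direction) is a
separate step that this sketch does not attempt.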

Regards,
 Khaled


-- 
 Khaled Hosny
 Arabic localizer and member of Arabeyes.org team


___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : https://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________


* Re: Non-printable Unicode control characters
  2008-08-15 19:23 Non-printable Unicode control characters Khaled Hosny
@ 2008-08-16  7:53 ` Hans Hagen
  2008-08-17  2:37   ` Idris Samawi Hamid ادريس سماوي حامد
  0 siblings, 1 reply; 4+ messages in thread
From: Hans Hagen @ 2008-08-16  7:53 UTC
  To: mailing list for ConTeXt users

Khaled Hosny wrote:
> Unicode has many "control characters" that only control text behaviour
> and shouldn't be rendered visually in the text, such as Bidi_Control and
> Join_Control chars (see
> http://www.unicode.org/Public/5.1.0/ucd/PropList.txt and
> http://unicode.org/Public/UNIDATA/UCD.html)
> 
> Currently, ConTeXt handles ZWJ and ZWNJ, but other characters get
> rendered if the font has glyphs for them or make no effect at all if the
> font has no glyphs for them. I think that the optimum behaviour is to
> make those characters affect text formatting while not visually rendered
> whether the font has glyphs for them or not.
> It might be also useful if we can enable rendering those characters
> manually, for drafts and such.

actually we need:

- ignore them (like in verbatim)
- act upon them

and

- show them (might somehow interfere with other things)
- hide them

if i'm right, when bidi is turned on, those chars get processed and then
discarded from the node list, so more than just zwj and zwnj is already
handled; of course the others need to be handled as well
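
The two pairs of options above can be read as two independent switches.
A rough sketch in Python, purely as a model: the node list is just a list
of characters here, and Policy and process are hypothetical names, not
anything in ConTeXt or LuaTeX.

```python
from dataclasses import dataclass

# ZWNJ, ZWJ, LRM, RLM, LRE, RLE, PDF, LRO, RLO
CONTROLS = {"\u200c", "\u200d", "\u200e", "\u200f",
            "\u202a", "\u202b", "\u202c", "\u202d", "\u202e"}


@dataclass
class Policy:
    act: bool = True    # act upon the characters, or ignore them
    show: bool = False  # keep them in the list (visible), or hide them


def process(nodes, policy):
    """Return (remaining nodes, controls acted upon).

    Each acted-upon control is recorded as (index in the output list, char).
    """
    out = []
    acted = []
    for ch in nodes:
        if ch in CONTROLS:
            if policy.act:
                # A real engine would change joining or direction here.
                acted.append((len(out), ch))
            if not policy.show:
                continue  # discarded from the list after processing
        out.append(ch)
    return out, acted


nodes = list("ab\u200ccd")
print(process(nodes, Policy()))                      # act and hide
print(process(nodes, Policy(act=False, show=True)))  # ignore and show
```

With the default policy the ZWNJ is recorded and then dropped, which
matches the "processed and then discarded" behaviour described above;
the second call leaves the list untouched.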

Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
      tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
                                              | www.pragma-pod.nl
-----------------------------------------------------------------



* Re: Non-printable Unicode control characters
  2008-08-16  7:53 ` Hans Hagen
@ 2008-08-17  2:37   ` Idris Samawi Hamid ادريس سماوي حامد
  2008-08-17 12:46     ` Hans Hagen
  0 siblings, 1 reply; 4+ messages in thread
From: Idris Samawi Hamid ادريس سماوي حامد @ 2008-08-17  2:37 UTC
  To: mailing list for ConTeXt users

On Sat, 16 Aug 2008 01:53:06 -0600, Hans Hagen <pragma@wxs.nl> wrote:

> Khaled Hosny wrote:
>> Unicode has many "control characters" that only control text behaviour
>> and shouldn't be rendered visually in the text, such as Bidi_Control and
>> Join_Control chars (see
>> http://www.unicode.org/Public/5.1.0/ucd/PropList.txt and
>> http://unicode.org/Public/UNIDATA/UCD.html)
>> Currently, ConTeXt handles ZWJ and ZWNJ, but other characters get
>> rendered if the font has glyphs for them or make no effect at all if the
>> font has no glyphs for them. I think that the optimum behaviour is to
>> make those characters affect text formatting while not visually rendered
>> whether the font has glyphs for them or not.
>> It might be also useful if we can enable rendering those characters
>> manually, for drafts and such.
>
> actually we need:
>
> - ignore them (like in verbatim)

Eventually we want to be able to show them in verbatim also (provided the  
font has them).

Indeed, I suggest that -- given an appropriate teletype font -- the  
default for _verbatim text_ should be to _show_ the control chars.

> - act upon them
>
> and
>
> - show them (might somehow interfere with other things)

Showing the control chars in typeset (non-verbatim) text should be
rare; it is more appropriate for verbatim.

> - hide them

I suggest that the default for _typeset text_ should definitely be to  
_hide_ the control chars.

> if i'm right, when bidi is turned on, those chars get processed and then
> discarded from the node list, so some more than zwj and zwnj is handled,

It appears to me that zwj, zwnj, etc. should be invisible in typeset
output, as explained above, but should still be encoded in the output
pdf. Think of pdf text extraction, converting between Arabic and Farsi
typesetting conventions, etc.

> and of course others need to be handled as well

Even lseps and pseps (line and paragraph separators) should be present
in the output pdf (e.g. \par => psep). That would make text extraction
much more useful.
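
As an illustration of why the separators help extraction, here is a small
Python sketch. It assumes lsep and psep mean U+2028 LINE SEPARATOR and
U+2029 PARAGRAPH SEPARATOR; the function names are hypothetical, and
nothing here touches a real pdf.

```python
LSEP = "\u2028"  # LINE SEPARATOR
PSEP = "\u2029"  # PARAGRAPH SEPARATOR


def encode_paragraphs(paragraphs):
    """Join paragraphs (each a list of lines) into one text stream."""
    return PSEP.join(LSEP.join(lines) for lines in paragraphs)


def decode_paragraphs(text):
    """Recover the paragraph and line structure from extracted text."""
    return [par.split(LSEP) for par in text.split(PSEP)]


doc = [["first line", "second line"], ["next paragraph"]]
stream = encode_paragraphs(doc)
print(decode_paragraphs(stream) == doc)  # True
```

The round trip only works as long as the lines themselves contain
neither separator.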

Best wishes
Idris

-- 
Professor Idris Samawi Hamid, Editor-in-Chief
International Journal of Shi`i Studies
Department of Philosophy
Colorado State University
Fort Collins, CO 80523



* Re: Non-printable Unicode control characters
  2008-08-17  2:37   ` Idris Samawi Hamid ادريس سماوي حامد
@ 2008-08-17 12:46     ` Hans Hagen
  0 siblings, 0 replies; 4+ messages in thread
From: Hans Hagen @ 2008-08-17 12:46 UTC
  To: mailing list for ConTeXt users

Idris Samawi Hamid ادريس سماوي حامد wrote:

> It appears to me that zwj and zwnj etc. should be invisible in  
> typeset-text output -- as explained above, but should still be encoded in  
> the output pdf. Think pdf-text extraction, converting between Arabic and  
> Farsi typesetting conventions, etc.

we can do that later (we can use an attribute to keep track of 
preceding/following special thingies and inject them in the output later 
on)
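
A possible reading of the attribute idea, sketched in Python rather than
Lua. Everything here is an assumption for illustration: the Glyph class
and its before/after fields stand in for a node attribute, and only ZWNJ
and ZWJ are tracked to keep it short.

```python
from dataclasses import dataclass

CONTROLS = {"\u200c", "\u200d"}  # ZWNJ, ZWJ


@dataclass
class Glyph:
    char: str
    before: str = ""  # controls that were removed just before this glyph
    after: str = ""   # controls that were removed just after this glyph


def strip_and_tag(text):
    """Build a glyph list without controls, remembering where they were.

    Controls in a text that has no other characters are lost in this sketch.
    """
    glyphs = []
    pending = ""  # controls seen before the first glyph
    for ch in text:
        if ch in CONTROLS:
            if glyphs:
                glyphs[-1].after += ch
            else:
                pending += ch
        else:
            glyphs.append(Glyph(ch, before=pending))
            pending = ""
    return glyphs


def visible_text(glyphs):
    """What gets typeset: the glyphs only."""
    return "".join(g.char for g in glyphs)


def extracted_text(glyphs):
    """What ends up in the output for extraction: controls injected back."""
    return "".join(g.before + g.char + g.after for g in glyphs)


# Persian word with a ZWNJ after the first two letters (meem, Farsi yeh).
word = "\u0645\u06cc\u200c\u062e\u0648\u0627\u0647\u0645"
glyphs = strip_and_tag(word)
print("\u200c" in visible_text(glyphs))  # False
print(extracted_text(glyphs) == word)    # True
```

The point of the design is that the typeset glyph list never contains the
control characters, while the information needed to put them back travels
with the neighbouring glyphs.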

>> and of course others need to be handled as well
> 
> Even lsep's and psep's should be present in the output pdf (eg \par =>  
> psep). Will make text extraction much more useful, etc.

rather useless in pdf; at some point i might add proper structure to the 
pdf output but it has a rather low priority (never needed it)

Hans

