From: Idris Samawi Hamid ادريس سماوي حامد
Newsgroups: gmane.comp.tex.context
Subject: Re: Non-printable Unicode control characters
Date: Sat, 16 Aug 2008 20:37:33 -0600
Organization: Colorado State University
To: "mailing list for ConTeXt users"
References: <20080815192308.GB5667@khaled-laptop> <48A68762.8080809@wxs.nl>
In-Reply-To: <48A68762.8080809@wxs.nl>

On Sat, 16 Aug 2008 01:53:06 -0600, Hans Hagen wrote:

> Khaled Hosny wrote:
>> Unicode has many "control characters" that only control text behaviour
>> and shouldn't be rendered visually in the text, such as Bidi_Control and
>> Join_Control chars (see
>> http://www.unicode.org/Public/5.1.0/ucd/PropList.txt and
>> http://unicode.org/Public/UNIDATA/UCD.html).
>> Currently, ConTeXt handles ZWJ and ZWNJ, but other characters get
>> rendered if the font has glyphs for them, or have no effect at all if the
>> font has no glyphs for them. I think that the optimum behaviour is to
>> make those characters affect text formatting while not being visually
>> rendered, whether the font has glyphs for them or not.
>> It might also be useful if we could enable rendering those characters
>> manually, for drafts and such.
>
> actually we need:
>
> - ignore them (like in verbatim)

Eventually we want to be able to show them in verbatim as well (provided
the font has glyphs for them). Indeed, I suggest that, given an
appropriate teletype font, the default for _verbatim text_ should be to
_show_ the control chars.

> - act upon them
>
> and
>
> - show them (might somehow interfere with other things)

Showing the control chars in typeset (non-verbatim) text should be rare;
it is more appropriate for verbatim.

> - hide them

I suggest that the default for _typeset text_ should definitely be to
_hide_ the control chars.

> if i'm right, when bidi is turned on, those chars get processed and then
> discarded from the node list, so more than just zwj and zwnj is handled,

It appears to me that zwj, zwnj, etc. should be invisible in the
typeset-text output, as explained above, but should still be encoded in
the output PDF. Think of PDF text extraction, converting between Arabic
and Farsi typesetting conventions, etc.

> and of course others need to be handled as well

Even lseps and pseps (U+2028 LINE SEPARATOR and U+2029 PARAGRAPH
SEPARATOR) should be present in the output PDF (e.g. \par => psep). That
would make text extraction much more useful.

Best wishes
Idris
--
Professor Idris Samawi Hamid, Editor-in-Chief
International Journal of Shi`i Studies
Department of Philosophy
Colorado State University
Fort Collins, CO 80523
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : https://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________
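The policy argued for in this thread (hide Bidi_Control and Join_Control characters in typeset text, show them in verbatim, but keep them in the text a PDF reader can extract) can be illustrated with a small Python sketch. This is not ConTeXt's implementation, only an illustration under stated assumptions: the code-point table follows the Unicode 5.1 PropList.txt cited above, while the function `plan_output`, the mode names `"typeset"`/`"verbatim"`, and the bracketed placeholders are hypothetical stand-ins for real font glyphs and real PDF text-layer handling.

```python
"""Hypothetical sketch of a hide/show policy for Unicode control characters.

Assumptions: the code-point sets come from Unicode 5.1 PropList.txt; the
function name, mode names and bracketed placeholders are illustrative only.
"""

# Join_Control and Bidi_Control code points as listed in Unicode 5.1
# PropList.txt. Later Unicode versions add more Bidi_Control characters
# (e.g. U+061C and U+2066..U+2069 in Unicode 6.3).
CONTROL_ABBREVIATIONS = {
    "\u200c": "ZWNJ",  # ZERO WIDTH NON-JOINER (Join_Control)
    "\u200d": "ZWJ",   # ZERO WIDTH JOINER (Join_Control)
    "\u200e": "LRM",   # LEFT-TO-RIGHT MARK (Bidi_Control)
    "\u200f": "RLM",   # RIGHT-TO-LEFT MARK (Bidi_Control)
    "\u202a": "LRE",   # LEFT-TO-RIGHT EMBEDDING (Bidi_Control)
    "\u202b": "RLE",   # RIGHT-TO-LEFT EMBEDDING (Bidi_Control)
    "\u202c": "PDF",   # POP DIRECTIONAL FORMATTING (Bidi_Control)
    "\u202d": "LRO",   # LEFT-TO-RIGHT OVERRIDE (Bidi_Control)
    "\u202e": "RLO",   # RIGHT-TO-LEFT OVERRIDE (Bidi_Control)
}


def plan_output(text: str, mode: str) -> tuple[str, str]:
    """Return (visible, extractable) strings for the given mode.

    mode == "typeset":  control characters produce no visible output (the
                        shaping/bidi step that acts on them is outside
                        this sketch).
    mode == "verbatim": control characters are shown as bracketed
                        abbreviations, standing in for font glyphs.
    In both modes the extractable text keeps every original character,
    so text extraction still sees ZWJ, ZWNJ, etc.
    """
    if mode not in ("typeset", "verbatim"):
        raise ValueError(f"unknown mode: {mode!r}")
    visible = []
    for ch in text:
        abbrev = CONTROL_ABBREVIATIONS.get(ch)
        if abbrev is None:
            visible.append(ch)           # ordinary character: render it
        elif mode == "verbatim":
            visible.append(f"[{abbrev}]")  # make the control char visible
        # typeset mode: the control character contributes no visible output
    return "".join(visible), text
```

For example, `plan_output("a\u200db", "verbatim")` gives the visible string `a[ZWJ]b`, and in `"typeset"` mode the visible string is `ab`; in both cases the extractable string is the unchanged input, which is the point made above about keeping zwj and zwnj in the PDF.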