Gnus development mailing list
* problem with utf-8 encoding in Gnus
@ 2002-11-25 12:27 Ted Zlatanov
  2002-11-25 12:38 ` Roman Belenov
  2002-11-25 15:41 ` Simon Josefsson
  0 siblings, 2 replies; 5+ messages in thread
From: Ted Zlatanov @ 2002-11-25 12:27 UTC


[-- Attachment #1: Type: text/plain, Size: 632 bytes --]

I got two e-mails, shown below.  One is in utf-8, the other in gb2312
encoding.  The utf-8 encoded e-mail contains Roman numerals that do
not show up in Gnus (I see a square box).  The gb2312 e-mail quotes
those numerals and they show up correctly in Gnus.

I use Emacs 21.2 with Mule, and the
-misc-fixed-bold-r-normal--14-130-75-75-c-70-iso10646-1 font.

The messages are also at

http://mail.nl.linux.org/linux-utf8/2002-11/msg00077.html
http://mail.nl.linux.org/linux-utf8/2002-11/msg00078.html

I didn't report this as a Gnus bug because I may be doing something
wrong myself.  Can anyone else reproduce the problem?
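
One way to check whether the two encoded forms denote the same character
is to decode a sample from each message by hand.  If they do, the
difference probably lies in how the decoded character is displayed
rather than in the text itself.  This is only a minimal sketch in
Python, not what Gnus does internally; the helper name decode_part is
made up for illustration, and the two byte sequences are copied from the
first numbered item in each message below.

```python
import quopri
import unicodedata


def decode_part(qp_bytes, charset):
    """Undo quoted-printable, then decode the raw bytes with the declared charset."""
    return quopri.decodestring(qp_bytes).decode(charset)


# "=E2=85=A0" comes from the charset=utf-8 message,
# "=A2=F1" from the charset=gb2312 reply that quotes it.
utf8_numeral = decode_part(b"=E2=85=A0", "utf-8")
gb2312_numeral = decode_part(b"=A2=F1", "gb2312")

for label, ch in (("utf-8", utf8_numeral), ("gb2312", gb2312_numeral)):
    print(label, f"U+{ord(ch):04X}", unicodedata.name(ch))

print("same character:", utf8_numeral == gb2312_numeral)
```

Both samples should decode to U+2160 ROMAN NUMERAL ONE, so the two
messages carry the same character in different encodings.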

Thanks
Ted


[-- Attachment #2.1: Type: text/plain, Size: 70 bytes --]

Subject: Topics

Topics:
   A Rendering Idea
   Re: A Rendering Idea


[-- Attachment #2.2: Type: text/plain, Size: 3490 bytes --]

Date:	Mon, 25 Nov 2002 11:22:57 +0900 (JST)
From:	Gaspar Sinai <gsinai@yudit.org>
To:	linux-utf8@nl.linux.org
Subject: A Rendering Idea
Message-ID: <Pine.LNX.4.44.0211251038280.8617-100000@suse.blue-edge-tech.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Hi,
I have a new rendering schema in mind that I am going to
implement in a future version of Yudit.

Of course it would take less than the estimated 10 years
of development time if someone, or some group would be
excited about it and implement it in a library. This is
the reason why I am writing this down here.

More and more people are using Open Type fonts to render
Unicode text, I use it myself. Still, I think that Open Type
is not the answer to complex Unicode rendering.

The major problems with it are:
=E2=91=A0 Very difficult to test the rendering software. Features
  need to be applied in a certain order. Even one mix-up will
  result in bad rendering, and a very complex test should be
  manually performed to catch the bug.
=E2=91=A1 OpenType tables cannot be shared between two fonts of the
  same time, although similar positioning/substituting needs
  to be performed. This makes the font file unnecessarily big.
=E2=91=A2 OpenType is not an Open Standard.
=E2=91=A3 Rendering is a non-reversible process

The idea is:
=E2=85=A0. Assign codes and hot spots for all possible Glyph components,
  per script, per language system.
=E2=85=A1. Create a generic state machine that can step through the input
  unicode characters, and spit out Glyph components and their relative
 hot spot positions.
=E2=85=A2. Create states and a dynamically loadable state table per script
 per language system.
=E2=85=A3. Create bitmap and vector fonts. The glyph codepoints are
 defined in (=E2=85=A0.) so this will be an easy process. Much easier
 than creating OpenType tables.
=E2=85=A4. Create a generic inverse state machine. The input is
 components and their relative hot spot positions and the
 output is unicode stream.
=E2=85=A5. Create dynamically loadable inverse state tables per script
 per language system.
=E2=85=A6. Use (=E2=85=A1) and pass it to (=E2=85=A4) to see if we get back=
 our stream.
 We can test the rendering engine on the fly this way.
=E2=85=A7. Use (=E2=85=A4.) for OCR (character recognition) software to scan
  text images into Unicode stream.

This is all - I am running out of Roman numerals =E2=98=BA

The merits of such a rendering/font schema would be:
- Fonts do not need to carry extra tables
- Rendering is linear and needs very little processing power.
- It is testable
- It is bitmap-font-friendly
- Once the specs are done, font making is an easy process.

The drawback is:
- Need to fix the states for the script and language system.
  This needs to be done very carefully.

One thing is for sure: one man is not enough to implement all
this=E2=80=A6

Unfortunately I do not really have much time to discuss all
these steps and reasons now, so forgive me if I am not replying
to my mails.

G=CC=B3=C3=A1=CC=B3s=CC=B3p=CC=B3=C3=A1=CC=B3r=CC=B3

=E3=82=AC=E3=83=BC=E3=82=B7=E3=83=A5=E3=83=91=E3=83=BC=E3=83=AB=E3=83=BB=D0=
=93=D0=B0=D1=88=D0=BF=D0=B0=D1=80=E3=83=BB=EA=B0=80=EC=8A=A4=ED=8C=94=E3=83=
=BB=CE=93=CE=B1=CF=83=CF=80=CE=B1=CF=81=E3=83=BB=D7=92=D7=90=D7=A9=D7=A4=D7=
=90=D7=A8
=D7=A2=D7=91=D7=A8=D7=99 10-2*5


--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/



[-- Attachment #2.3: Type: text/plain, Size: 1954 bytes --]

Date:	Mon, 25 Nov 2002 11:35:47 +0100 (CET)
From:	Werner LEMBERG <wl@gnu.org>
To:	linux-utf8@nl.linux.org, gsinai@yudit.org
Subject: Re: A Rendering Idea
Message-Id: <20021125.113547.78472655.wl@gnu.org>
References: <Pine.LNX.4.44.0211251038280.8617-100000@suse.blue-edge-tech.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=gb2312
Content-Transfer-Encoding: quoted-printable

> The idea is:
> =A2=F1. Assign codes and hot spots for all possible Glyph components,
>   per script, per language system.

How will you handle open-ended scripts like Urdu where the number of
ligatures is changing while the language evolves?  For example, I was
told by an Urdu computer scientist that during a visit of Margaret
Thatcher (a former Prime Minister of the United Kingdom) the
newspapers created a new ligature for her name.

> =A2=F2. Create a generic state machine that can step through the input
>   unicode characters, and spit out Glyph components and their
>   relative hot spot positions.

This is far more complicated I fear.  You will need fallback
algorithms for fonts which don't provide some glyphs/ligatures, etc.
Some fonts have e.g. `Amacron' as a single glyph, others compose it
from `A' with a macron accent.

> =A2=F5. Create a generic inverse state machine. The input is
>  components and their relative hot spot positions and the
>  output is unicode stream.

You can do that already by following the Adobe Glyph List (AGL)
algorithm for naming glyphs.

> The merits of such a rendering/font schema would be:
> - It is bitmap-font-friendly

Hmm, the next release of X will probably contain all bitmapped fonts
in SFNT format.  It is straightforward then to provide proper OpenType
tables to do the same processing as with outline glyphs.  Just van
Rossum's freely available TTX compiler/decompiler for OpenType fonts
can help here.


    Werner
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/



end of thread (last message 2002-11-28 22:21 UTC)

Thread overview: 5+ messages
2002-11-25 12:27 problem with utf-8 encoding in Gnus Ted Zlatanov
2002-11-25 12:38 ` Roman Belenov
2002-11-25 15:41 ` Simon Josefsson
2002-11-26 14:37   ` Ted Zlatanov
2002-11-28 22:21     ` Simon Josefsson
