problem with utf-8 encoding in Gnus

Gnus development mailing list

* problem with utf-8 encoding in Gnus
@ 2002-11-25 12:27 Ted Zlatanov
  2002-11-25 12:38 ` Roman Belenov
  2002-11-25 15:41 ` Simon Josefsson
  0 siblings, 2 replies; 5+ messages in thread
From: Ted Zlatanov @ 2002-11-25 12:27 UTC


[-- Attachment #1: Type: text/plain, Size: 632 bytes --]

I got two e-mails, shown below.  One is in utf-8, the other in gb2312
encoding.  The utf-8 encoded e-mail contains Roman numerals that do
not show up in Gnus (I see a square box).  The gb2312 e-mail quotes
those numerals and they show up correctly in Gnus.
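One way to check that both messages really carry the same characters is to decode the bytes by hand. The following is a minimal Python sketch, not Gnus or Emacs code. It assumes the byte sequences `=E2=85=A0` (from the utf-8 message) and `=A2=F1` (from the gb2312 reply) as they appear in the attachments; the helper name `decode_qp` is made up for illustration.

```python
import quopri
import unicodedata

def decode_qp(raw: bytes, charset: str) -> str:
    """Undo quoted-printable transfer encoding, then decode the charset."""
    return quopri.decodestring(raw).decode(charset)

utf8_numeral = decode_qp(b"=E2=85=A0", "utf-8")   # from the utf-8 message
gb_numeral = decode_qp(b"=A2=F1", "gb2312")       # from the gb2312 reply

# Print the code point and Unicode name that each byte sequence decodes to.
for label, ch in (("utf-8", utf8_numeral), ("gb2312", gb_numeral)):
    print(label, f"U+{ord(ch):04X}", unicodedata.name(ch))
```

If both lines report the same code point, the two messages contain the same character, and the difference in display has to come from somewhere after decoding.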

I use Emacs 21.2 with Mule, and the
-misc-fixed-bold-r-normal--14-130-75-75-c-70-iso10646-1 font.

The messages are also at

http://mail.nl.linux.org/linux-utf8/2002-11/msg00077.html
http://mail.nl.linux.org/linux-utf8/2002-11/msg00078.html

I didn't report this as a Gnus bug because it may be something else
I'm doing wrong.  Can anyone see the problem also?

Thanks
Ted


[-- Attachment #2.1: Type: text/plain, Size: 70 bytes --]

Subject: Topics

Topics:
   A Rendering Idea
   Re: A Rendering Idea


[-- Attachment #2.2: Type: text/plain, Size: 3490 bytes --]

Date:	Mon, 25 Nov 2002 11:22:57 +0900 (JST)
From:	Gaspar Sinai <gsinai@yudit.org>
To:	linux-utf8@nl.linux.org
Subject: A Rendering Idea
Message-ID: <Pine.LNX.4.44.0211251038280.8617-100000@suse.blue-edge-tech.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Hi,
I have a new rendering schema in mind, that I am going to
implement in a future version of Yudit.

Of course it would take less than the estimated 10 years
of development time if some person or group got excited
about it and implemented it in a library. That is why I am
writing it down here.

More and more people are using Open Type fonts to render
Unicode text, I use it myself. Still, I think that Open Type
is not the answer to complex Unicode rendering.

The major problems with it are:
=E2=91=A0 Very difficult to test the rendering software. Features,
  need to be applied in a certain order. Even one mix-up will
  result in bad rendering, and a very complex test should be
  manually performed to catch the bug.
=E2=91=A1 OpenType tables can not be shared between two fonts of the
  same time, although similar positioning/substituting  needs
  to be performed. This makes the font file unnecessarily big.
=E2=91=A2 OpenType is not an Open Standard.
=E2=91=A3 Rendering is a non-reversible process

The idea is:
=E2=85=A0. Assign codes and hot spots for all possible Glyph components,
  per script, per language system.
=E2=85=A1. Create a generic state machine that can step through the input
  unicode characters, and spit out Glyph components and their relative
 hot spot positions.
=E2=85=A2. Create states and a dynamically loadable state table per script
 per language system.
=E2=85=A3. Create bitmap and vector fonts. The glyph codepoints are
 defined in (=E2=85=A0.) so this will be an easy process. Much easier
 than creating OpenType tables.
=E2=85=A4. Create a generic inverse state machine. The input is
 components and their relative hot spot positions and the
 output is unicode stream.
=E2=85=A5. Create dynamically loadable inverse state tables per script
 per language system.
=E2=85=A6. Use (=E2=85=A1) and pass it to (=E2=85=A4) to see if we get back=
 our stream.
 We can test the rendering engine on the fly this way.
=E2=85=A7. Use (=E2=85=A4.) for OCR (character recognition) software to scan
  text images into Unicode stream.
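
The round-trip test described in the list above can be sketched in a few lines of Python. This is only a toy under stated assumptions: a plain lookup table stands in for the state machine, and the component names, hot-spot offsets, and function names are invented for illustration. None of it comes from Yudit.

```python
# Toy forward table: Unicode character -> list of (component, hot-spot offset).
# The component names and offsets are made up for illustration.
FORWARD = {
    "A": [("A.base", (0, 0))],
    "\u0100": [("A.base", (0, 0)), ("macron.above", (0, 1))],  # A with macron
}

# The inverse table is derived mechanically from the forward one.
INVERSE = {tuple(components): char for char, components in FORWARD.items()}

def render(text):
    """Forward pass: one component list per input character."""
    return [FORWARD[ch] for ch in text]

def unrender(glyph_runs):
    """Inverse pass: map each component list back to its character."""
    return "".join(INVERSE[tuple(run)] for run in glyph_runs)

def round_trips(text):
    """True if rendering and then un-rendering gives back the input."""
    return unrender(render(text)) == text

print(round_trips("A\u0100A"))
```

A real engine would need context-dependent states rather than a per-character table, but the test itself stays the same: feed the forward output to the inverse and compare with the input.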

This is all - I am running out of Roman numerals =E2=98=BA

The merits of such a rendering/font schema would be:
- Fonts do not need to carry extra tables
- Rendering is linear and needs very little processing power.
- It is testable
- It is bitmap-font-friendly
- Once the specs are done, font making is an easy process.

The drawback is:
- Need to fix the states for the script and language system.
  This needs to be done very carefully.

One thing is for sure: one man is not enough to implement all
this=E2=80=A6

Unfortunately I do not really have much time to discuss all
these steps and reasons now, so forgive me if I am not replying
to my mail.

G=CC=B3=C3=A1=CC=B3s=CC=B3p=CC=B3=C3=A1=CC=B3r=CC=B3

=E3=82=AC=E3=83=BC=E3=82=B7=E3=83=A5=E3=83=91=E3=83=BC=E3=83=AB=E3=83=BB=D0=
=93=D0=B0=D1=88=D0=BF=D0=B0=D1=80=E3=83=BB=EA=B0=80=EC=8A=A4=ED=8C=94=E3=83=
=BB=CE=93=CE=B1=CF=83=CF=80=CE=B1=CF=81=E3=83=BB=D7=92=D7=90=D7=A9=D7=A4=D7=
=90=D7=A8
=D7=A2=D7=91=D7=A8=D7=99 10-2*5


--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/



[-- Attachment #2.3: Type: text/plain, Size: 1954 bytes --]

Date:	Mon, 25 Nov 2002 11:35:47 +0100 (CET)
From:	Werner LEMBERG <wl@gnu.org>
To:	linux-utf8@nl.linux.org, gsinai@yudit.org
Subject: Re: A Rendering Idea
Message-Id: <20021125.113547.78472655.wl@gnu.org>
References: <Pine.LNX.4.44.0211251038280.8617-100000@suse.blue-edge-tech.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=gb2312
Content-Transfer-Encoding: quoted-printable

> The idea is:
> =A2=F1. Assign codes and hot spots for all possible Glyph components,
>   per script, per language system.

How will you handle open-ended scripts like Urdu where the number of
ligatures is changing while the language evolves?  For example, I was
told by an Urdu computer scientist that during a visit of Margaret
Thatcher (a former Prime Minister of the United Kingdom) the
newspapers created a new ligature for her name.

> =A2=F2. Create a generic state machine that can step through the input
>   unicode characters, and spit out Glyph components and their
>   relative hot spot positions.

This is far more complicated I fear.  You will need fallback
algorithms for fonts which don't provide some glyphs/ligatures, etc.
Some fonts have e.g. `Amacron' as a single glyph, others compose it
from `A' with a macron accent.
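
The `Amacron' fallback can be illustrated with a short Python sketch. It assumes that Unicode canonical decomposition (NFD) is an acceptable stand-in for what a font-level fallback would do, and it models each font as a plain set of characters; the function name and both font sets are hypothetical.

```python
import unicodedata

AMACRON = "\u0100"  # LATIN CAPITAL LETTER A WITH MACRON

def glyph_sequence(char, available):
    """Return the code points to draw for `char` given a font's coverage.

    `available` is a set of characters the (hypothetical) font has glyphs for.
    """
    if char in available:
        return [char]
    # Fall back to base letter plus combining marks, if the font has them all.
    decomposed = unicodedata.normalize("NFD", char)
    if all(part in available for part in decomposed):
        return list(decomposed)
    return []  # nothing usable; a renderer would show a placeholder box

font_with_precomposed = {"A", AMACRON}
font_with_accents = {"A", "\u0304"}  # U+0304 COMBINING MACRON

print(glyph_sequence(AMACRON, font_with_precomposed))
print(glyph_sequence(AMACRON, font_with_accents))
```

The first font yields the single precomposed character, the second yields `A' followed by the combining macron, and a font with neither yields an empty list.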

> =A2=F5. Create a generic inverse state machine. The input is
>  components and their relative hot spot positions and the
>  output is unicode stream.

You can do that already by following the Adobe Glyph List (AGL)
algorithm for naming glyphs.

> The merits of such a rendering/font schema would be:
> - It is bitmap-font-friendly

Hmm, the next release of X will probably contain all bitmapped fonts
in SFNT format.  It is straightforward then to provide proper OpenType
tables to do the same processing as with outline glyphs.  Just van
Rossum's freely available TTX compiler/decompiler for OpenType fonts
can help here.


    Werner
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/



* Re: problem with utf-8 encoding in Gnus
  2002-11-25 12:27 problem with utf-8 encoding in Gnus Ted Zlatanov
@ 2002-11-25 12:38 ` Roman Belenov
  2002-11-25 15:41 ` Simon Josefsson
  1 sibling, 0 replies; 5+ messages in thread
From: Roman Belenov @ 2002-11-25 12:38 UTC


Same behaviour here (Emacs 21.2/Windows XP + ognus 0.07)

Ted Zlatanov <tzz@lifelogs.com> writes:

> I got two e-mails, shown below.  One is in utf-8, the other in gb2312
> encoding.  The utf-8 encoded e-mail contains Roman numerals that do
> not show up in Gnus (I see a square box).  The gb2312 e-mail quotes
> those numerals and they show up correctly in Gnus.
>
> I use Emacs 21.2 with Mule, and the
> -misc-fixed-bold-r-normal--14-130-75-75-c-70-iso10646-1 font.
>
> The messages are also at
>
> http://mail.nl.linux.org/linux-utf8/2002-11/msg00077.html
> http://mail.nl.linux.org/linux-utf8/2002-11/msg00078.html
>
> I didn't report this as a Gnus bug because it may be something else
> I'm doing wrong.  Can anyone see the problem also?

(attached messages skipped)

-- 
 							With regards, Roman.





* Re: problem with utf-8 encoding in Gnus
  2002-11-25 12:27 problem with utf-8 encoding in Gnus Ted Zlatanov
  2002-11-25 12:38 ` Roman Belenov
@ 2002-11-25 15:41 ` Simon Josefsson
  2002-11-26 14:37   ` Ted Zlatanov
  1 sibling, 1 reply; 5+ messages in thread
From: Simon Josefsson @ 2002-11-25 15:41 UTC


Ted Zlatanov <tzz@lifelogs.com> writes:

> I got two e-mails, shown below.  One is in utf-8, the other in gb2312
> encoding.  The utf-8 encoded e-mail contains Roman numerals that do
> not show up in Gnus (I see a square box).  The gb2312 e-mail quotes
> those numerals and they show up correctly in Gnus.
>
> I use Emacs 21.2 with Mule, and the
> -misc-fixed-bold-r-normal--14-130-75-75-c-70-iso10646-1 font.
>
> The messages are also at
>
> http://mail.nl.linux.org/linux-utf8/2002-11/msg00077.html
> http://mail.nl.linux.org/linux-utf8/2002-11/msg00078.html
>
> I didn't report this as a Gnus bug because it may be something else
> I'm doing wrong.  Can anyone see the problem also?

Yes.  Your fontset doesn't have the characters.  The following fontset
works here:

-Misc-Fixed-Medium-R-Normal--18-120-100-100-C-90-ISO10646-1
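
Whether a given bitmap font has a glyph for a character can be checked from its BDF source, where each glyph carries an `ENCODING` line with its code point. The following Python sketch is illustrative only: the function names are made up, and `SAMPLE_BDF` is a tiny hand-written stand-in for a real .bdf file with the bitmaps left out.

```python
def bdf_encodings(bdf_text):
    """Collect the code points listed on ENCODING lines of a BDF font."""
    codes = set()
    for line in bdf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "ENCODING":
            code = int(fields[1])
            if code >= 0:  # -1 marks a glyph without a standard encoding
                codes.add(code)
    return codes

def missing_characters(text, codes):
    """Characters of `text` that have no glyph in the font."""
    return sorted({ch for ch in text if ord(ch) not in codes})

# Tiny hand-written stand-in for a real .bdf file (bitmaps omitted).
SAMPLE_BDF = """\
STARTCHAR A
ENCODING 65
ENDCHAR
STARTCHAR uni2160
ENCODING 8544
ENDCHAR
"""

codes = bdf_encodings(SAMPLE_BDF)
print(missing_characters("A\u2160\u2161", codes))
```

For a real font, read the text of the .bdf file and pass it to `bdf_encodings`; any character reported as missing would show up as a box in that font.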





* Re: problem with utf-8 encoding in Gnus
  2002-11-25 15:41 ` Simon Josefsson
@ 2002-11-26 14:37   ` Ted Zlatanov
  2002-11-28 22:21     ` Simon Josefsson
  0 siblings, 1 reply; 5+ messages in thread
From: Ted Zlatanov @ 2002-11-26 14:37 UTC

On Mon, 25 Nov 2002, jas@extundo.com wrote:
> Yes.  Your fontset doesn't have the characters.  The following
> fontset works here:
> 
> -Misc-Fixed-Medium-R-Normal--18-120-100-100-C-90-ISO10646-1

I supposed (wrongly) that the bold variant would have the same
characters as the medium one.  I'm using the ucs-fonts package, and
the bold variant (7x14B.bdf) is TARGET1, which is the minimal UCS
range in the ucs-fonts package.  I guess I'll switch to a TARGET3 font
like the one you mention or 7x14.bdf when I need to view more of the
UCS range.  It seems like all the bold fonts in the ucs-fonts package
are TARGET1.

Unfortunately, I don't like the look of the normal variants at all
(they are too thin) so I'll have to keep switching back and forth.
I'll ask the maintainer of ucs-fonts about the plans to bring bold
variants up to TARGET3.

I wonder if Gnus could intelligently switch between a regular and a
Unicode font as needed.  I know I can do a manual switch with
shift-Mouse1 as an Emacs function and that's fine, but perhaps Gnus
could be smart about showing messages with characters unavailable in
the current fontset.
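
The kind of switching meant here could look roughly like the Python sketch below: keep the preferred font unless the message needs characters it lacks. This is not Gnus or Emacs code; the font names and coverage sets are hypothetical, and real coverage would have to come from the font files.

```python
def pick_font(text, fonts):
    """Return the name of the first font that covers every character of `text`.

    `fonts` is an ordered list of (name, covered_code_points) pairs, most
    preferred first. Returns None if no single font covers the text.
    """
    needed = {ord(ch) for ch in text if not ch.isspace()}
    for name, covered in fonts:
        if needed <= covered:
            return name
    return None

# Hypothetical coverage sets; real ones would come from the font files.
FONTS = [
    ("bold-small-coverage", set(range(0x20, 0x7F))),
    ("medium-wide-coverage", set(range(0x20, 0x7F)) | set(range(0x2160, 0x2170))),
]

print(pick_font("plain ASCII", FONTS))
print(pick_font("\u2160. Assign codes", FONTS))
```

With these sets the bold font is kept for plain ASCII text, and the wider font is chosen only when a message contains a Roman numeral character.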

Thanks
Ted


* Re: problem with utf-8 encoding in Gnus
  2002-11-26 14:37   ` Ted Zlatanov
@ 2002-11-28 22:21     ` Simon Josefsson
  0 siblings, 0 replies; 5+ messages in thread
From: Simon Josefsson @ 2002-11-28 22:21 UTC
  Cc: ding

This came up on the Gnus development list, but I believe it is a
generic Emacs problem, so please move the discussion to
emacs-devel@gnu.org.  The problem is that some fontsets don't
contain all characters, and that users then have to change
fontset. Generally, I think users want the application (Emacs) to find
a better fontset that actually has the character instead, since it
seems no single fontset is applicable in all situations.

Any chance of getting this supported seamlessly in Emacs?  Does the
Unicode Emacs branch address this problem in any way?

Ted Zlatanov <tzz@lifelogs.com> writes:

> On Mon, 25 Nov 2002, jas@extundo.com wrote:
>> Yes.  Your fontset doesn't have the characters.  The following
>> fontset works here:
>> 
>> -Misc-Fixed-Medium-R-Normal--18-120-100-100-C-90-ISO10646-1
>
> I supposed (wrongly) that the bold variant would have the same
> characters as the medium one.  I'm using the ucs-fonts package, and
> the bold variant (7x14B.bdf) is TARGET1, which is the minimal UCS
> range in the ucs-fonts package.  I guess I'll switch to a TARGET3 font
> like the one you mention or 7x14.bdf when I need to view more of the
> UCS range.  It seems like all the bold fonts in the ucs-fonts package
> are TARGET1.
>
> Unfortunately, I don't like the look of the normal variants at all
> (they are too thin) so I'll have to keep switching back and forth.
> I'll ask the maintainer of ucs-fonts about the plans to bring bold
> variants up to TARGET3.
>
> I wonder if Gnus could intelligently switch between a regular and a
> Unicode font as needed.  I know I can do a manual switch with
> shift-Mouse1 as an Emacs function and that's fine, but perhaps Gnus
> could be smart about showing messages with characters unavailable in
> the current fontset.
>
> Thanks
> Ted



