More charset things

Gnus development mailing list

* More charset things
@ 1999-02-03 18:09 Lars Magne Ingebrigtsen
  1999-02-04 14:56 ` Hrvoje Niksic
  0 siblings, 1 reply; 43+ messages in thread
From: Lars Magne Ingebrigtsen @ 1999-02-03 18:09 UTC (permalink / raw)


I've gone through the HELLO files under XEmacs and Emacs, and I'm now
able to post everything there (except Lao).  I had to rewrite some
bits to be able to deal with things that use different MULE charsets,
but the same MIME charset.  The solution was just to do away with most 
MULE charset thingies, and just do MIME charset thingies instead.

Anyway -- body encodings.  The reason I'm not able to post Lao is that
some of the octets in the Lao stream seem to make Emacs and/or the
nntp server choke.  I haven't really done any body encoding things --
if it's text, Gnus posts using 8bit or 7bit.  But there should be a
way to say what MIME charsets should be encoded what way
-- 7bit, 8bit, base64 and qp.  There is a
`rfc2047-charset-encoding-alist', but that says how to encode things
in the headers.  Should I just add an `mm-charset-encoding-alist' for
the bodies?

Yes.  Fix in Pterodactyl Gnus v0.76.
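A minimal sketch of what a per-charset body-encoding table could look like, written in Python rather than Emacs Lisp and using only the standard `base64` and `quopri` modules. The table contents and the `encode_body` helper are hypothetical illustrations, not the actual `mm-charset-encoding-alist` shipped in Pterodactyl Gnus v0.76.

```python
import base64
import quopri

# Hypothetical table in the spirit of the proposed `mm-charset-encoding-alist`:
# MIME charset -> Content-Transfer-Encoding to use for message bodies.
BODY_ENCODINGS = {
    "us-ascii": "7bit",
    "iso-2022-jp": "7bit",            # already 7-bit by design
    "iso-8859-1": "quoted-printable",
    "iso-8859-5": "base64",
}


def encode_body(text, charset):
    """Return (cte, octets) for TEXT encoded in CHARSET.

    Charsets missing from the table fall back to 7bit when the encoded
    octets are pure ASCII, and to base64 otherwise (assumed to be the
    safest choice for servers that choke on raw 8-bit data).
    """
    raw = text.encode(charset)
    cte = BODY_ENCODINGS.get(charset)
    if cte is None:
        cte = "7bit" if raw.isascii() else "base64"
    if cte == "base64":
        return cte, base64.encodebytes(raw)
    if cte == "quoted-printable":
        return cte, quopri.encodestring(raw)
    return cte, raw
```

For example, `encode_body("Здравствуйте!", "iso-8859-5")` picks base64, so no raw 8-bit octets reach the transport.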

And with that, I hereby declare the charset bits of MIME to be
implemented by Gnus.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: More charset things
  1999-02-03 18:09 More charset things Lars Magne Ingebrigtsen
@ 1999-02-04 14:56 ` Hrvoje Niksic
  1999-02-04 17:08   ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 43+ messages in thread
From: Hrvoje Niksic @ 1999-02-04 14:56 UTC (permalink / raw)

Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> And with that, I hereby declare the charset bits of MIME to be
> implemented by Gnus.

Uh-oh.  How can we possibly be compliant when there is no support for
UTF-8?

Also, Gnus still happily sends out 8bit stuff in email headers, losing 
all charset information, even when it receives it.


* Re: More charset things
  1999-02-04 14:56 ` Hrvoje Niksic
@ 1999-02-04 17:08   ` Lars Magne Ingebrigtsen
  1999-02-04 17:21     ` Hrvoje Niksic
  1999-02-07 19:35     ` François Pinard
  0 siblings, 2 replies; 43+ messages in thread
From: Lars Magne Ingebrigtsen @ 1999-02-04 17:08 UTC (permalink / raw)

Hrvoje Niksic <hniksic@srce.hr> writes:

> Uh-oh.  How can we possibly be compliant when there is no support for
> UTF-8?

That's not my table.  :-)  When MULE supports utf-8, Gnus will support
utf-8.  

> Also, Gnus still happily sends out 8bit stuff in email headers, losing 
> all charset information, even when it receives it.

Aarh, yes, I had forgotten that I was going to go over the charset
things in non-MULE XEmacsen.

(By the way -- is it "MULE" or "Mule"?  I'm waffling all over the place
when I write that word.  Perhaps I should start writing it "mUlE"?)

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen


* Re: More charset things
  1999-02-04 17:08   ` Lars Magne Ingebrigtsen
@ 1999-02-04 17:21     ` Hrvoje Niksic
  1999-02-04 17:49       ` Lars Magne Ingebrigtsen
  1999-02-07 19:37       ` François Pinard
  1999-02-07 19:35     ` François Pinard
  1 sibling, 2 replies; 43+ messages in thread
From: Hrvoje Niksic @ 1999-02-04 17:21 UTC (permalink / raw)

Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> Hrvoje Niksic <hniksic@srce.hr> writes:
> 
> > Uh-oh.  How can we possibly be compliant when there is no support for
> > UTF-8?
> 
> That's not my table.  :-) When MULE supports utf-8, Gnus will
> support utf-8.

That is not a nice way of thinking.  MULE is little else than a
Japanese version of Emacs, and it appears that the Japanese are not
interested in Unicode.  So it wasn't implemented.  I'm not sure about
FSF, but for XEmacs, I know of no plans to implement it in the near
future.

> (By the way -- is it "MULE" or "Mule?

The original thing was called MULE.  The XEmacs developers prefer to
call the XEmacs variant `Mule', or `XEmacs/Mule'.  FSF Emacs
maintainers seem to prefer Emacs/MULE.

Unlike the XEmacs/Xemacs issue, no one seems to mind the different
spellings.  So I guess mULE would be just fine.  :-)


* Re: More charset things
  1999-02-04 17:21     ` Hrvoje Niksic
@ 1999-02-04 17:49       ` Lars Magne Ingebrigtsen
  1999-02-05  0:47         ` Stephen J. Turnbull
  1999-02-07 20:43         ` François Pinard
  1999-02-07 19:37       ` François Pinard
  1 sibling, 2 replies; 43+ messages in thread
From: Lars Magne Ingebrigtsen @ 1999-02-04 17:49 UTC (permalink / raw)
  Cc: xemacs-mule

Hrvoje Niksic <hniksic@srce.hr> writes:

> > That's not my table.  :-) When MULE supports utf-8, Gnus will
> > support utf-8.
> 
> That is not a nice way of thinking.

I don't see any other way of thinking.  Grokking utf-8 is way outside
the scope of Gnus -- it has to be an Emacs thing.

> MULE is little else than a Japanese version of Emacs, and it appears
> that the Japanese are not interested in Unicode.  So it wasn't
> implemented.  I'm not sure about FSF, but for XEmacs, I know of no
> plans to implement it in the near future.

A partial implementation of utf-mumble was posted recently somewhere
by someone.  (Could I possibly get any more vague?)  So I'm Cc'ing
this to the xemacs-mule list.

Anyway, I find that I'm strangely fascinated by the idea of an editor
that allows intermingling of text that uses a variety of character
sets.  I have an urge to jump into the matter, but I'm such a charset
novice that I don't really feel qualified.  (Well, I don't have the
time, either, but that's a minor detail.)

I asked before for a likely book that would introduce me to the basic
concepts, and someone (Stephen Turnbull?) told me, but then I forgot.
(At least, I can't find any books on charset issues in my list of
books to buy.)  Could that someone (or someone else) re-recommend the
book(s) that I should buy to get both an introduction and more
in-depth knowledge about charset issues?

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen


* Re: More charset things
  1999-02-04 17:49       ` Lars Magne Ingebrigtsen
@ 1999-02-05  0:47         ` Stephen J. Turnbull
  1999-02-05  2:43           ` Hrvoje Niksic
                             ` (5 more replies)
  1999-02-07 20:43         ` François Pinard
  1 sibling, 6 replies; 43+ messages in thread
From: Stephen J. Turnbull @ 1999-02-05  0:47 UTC (permalink / raw)

>>>>> "Lars" == Lars Magne Ingebrigtsen <larsi@gnus.org> writes:
    Lars> Hrvoje Niksic <hniksic@srce.hr> writes:

    >> MULE is little else than a Japanese version of Emacs, and it
    >> appears that the Japanese are not interested in Unicode.  So it

The MULE development group is nearly entirely Japanese; including the
people implementing Devanagari (for sure) and Arabic and Ethiopic
(IIRC).  Not surprisingly, the tuning (and tuning is absolutely
necessary; the linguists don't know enough about language for charset
guessing and the like to be more than heuristic) is best for Japanese,
and bugs for non-Japanese languages don't get found and fixed quickly.

But MULE is the only truly multilingual platform there is at the
moment, to the best of my knowledge; Unicode doesn't satisfy the needs 
of lots of people, and is not easily extensible without changing the
standard.  MULE is.  MULE is more than a Japanese version of Emacs.

The Japanese are divided on Unicode; some are vehemently opposed,
others are interested.  There don't seem to be any strong advocates,
though.

    >> wasn't implemented.  I'm not sure about FSF, but for XEmacs, I
    >> know of no plans to implement it in the near future.

    Lars> A partial implementation of utf-mumble was posted recently
    Lars> somewhere by someone.  (Could I possible get any more
    Lars> vague?)  So I'm Cc'ing this to the xemacs-mule list.

Morioka-san ported (IIRC) a Lisp-level implementation of UTF-8.  The
attachments were broken on the ML (so Steve never was able to look at
it); I'll restore from the archive the working (I hope) copy I got from
Morioka.  Martin Buchholz believes that since the tables are in Lisp,
the performance impact will be huge.

    Lars> I asked before for a likely book that would introduce me to
    Lars> the basic concepts, and someone (Stephen Turnbull?) told me,
    Lars> but then I forgot.

Prices are vague recollections, in decreasing order of importance for
basic understanding:

Ken Lunde.  Chinese, Japanese, Korean and Vietnamese Information
    Processing.  O'Reilly & Associates.  Probably the most useful single 
    volume, although it doesn't cover single-octet encodings.
ISO.  ISO-2022:  Extension Techniques for Coded Character Sets.  US$75.
Unicode Consortium.  The Unicode Standard, v2.x.  About US$70 from Amazon.
ISO.  ISO-10646:  Universal Multi-octet Character Set Encoding
    Standard.  About US$125.  Don't bother unless you've got extra
    money; the Unicode Standard is much more complete and readable.  All
    ISO-10646 adds is the 4-octet encoding, which is presently
    useless, and it is very likely that any UTF-8 .

I don't know of any textbooks on character set stuff, there must be
some somewhere.  Lunde's book will have a very extensive bibliography.

-- 
University of Tsukuba                Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Institute of Policy and Planning Sciences       Tel/fax: +81 (298) 53-5091
__________________________________________________________________________
__________________________________________________________________________
What are those two straight lines for?  "Free software rules."


* Re: More charset things
  1999-02-05  0:47         ` Stephen J. Turnbull
@ 1999-02-05  2:43           ` Hrvoje Niksic
       [not found]           ` <m3hft163aa.fsf@peorth.gweep.net>
                             ` (4 subsequent siblings)
  5 siblings, 0 replies; 43+ messages in thread
From: Hrvoje Niksic @ 1999-02-05  2:43 UTC (permalink / raw)

"Stephen J. Turnbull" <turnbull@sk.tsukuba.ac.jp> writes:

> The MULE development group is nearly entirely Japanese; including
> the people implementing Devanagari (for sure) and Arabic and
> Ethiopic (IIRC).  Not surprisingly, the tuning (and tuning is
> absolutely necessary; the linguists don't know enough about language
> for charset guessing and the like to be more than heuristic) is best
> for Japanese, and bugs for non-Japanese languages don't get found
> and fixed quickly.

They don't get fixed at all, Stephen.  I don't like to bitch all that
much about the subject, since I could come out with the patches as
well as anybody else (but my disgust at the code is another matter),
only I *have* to correct you when you say that bugs don't get fixed
"quickly".

I have reported a number of latin2-related bugs in XEmacs/Mule, and I
haven't seen a fix for any of them.  I, a latin2 user, am supposed to
be a target audience for Mule, and yet I cannot bring myself to use it
for longer than ten minutes.

If Mule is usable for anyone except the latin1 people and the Japanese
(== majority), I'm happy for them.  But it's not my cup of coffee.
Not yet.

> But MULE is the only truly multilingual platform there is at the
> moment, to the best of my knowledge; Unicode doesn't satisfy the
> needs of lots of people, and is not easily extensible without
> changing the standard.  MULE is.  MULE is more than a Japanese
> version of Emacs.

:-(

>     Lars> A partial implementation of utf-mumble was posted recently
>     Lars> somewhere by someone.  (Could I possible get any more
>     Lars> vague?)  So I'm Cc'ing this to the xemacs-mule list.
> 
> Morioka-san ported (IIRC) a Lisp-level implementation of UTF-8.

Can such a thing even work under XEmacs/Mule?  The design differences
sound as if they make such a thing impossible.


* Re: More charset things
       [not found]           ` <m3hft163aa.fsf@peorth.gweep.net>
@ 1999-02-05 19:06             ` Vladimir Volovich
       [not found]               ` <m3sockqqjx.fsf@peorth.gweep.net>
  0 siblings, 1 reply; 43+ messages in thread
From: Vladimir Volovich @ 1999-02-05 19:06 UTC (permalink / raw)


"Rat" == Stainless Steel Rat writes:

 Rat> MULE does not work at all well in Europe or other parts of the
 Rat> world that use ISO-8859-X 8-bit character sets.

well, in emacs 20.3, mule works quite satisfactorily for cyrillic
encodings (including but not limited to iso-8859-5).

	Best regards, -- Vladimir.



* Re: More charset things
  1999-02-05  0:47         ` Stephen J. Turnbull
  1999-02-05  2:43           ` Hrvoje Niksic
       [not found]           ` <m3hft163aa.fsf@peorth.gweep.net>
@ 1999-02-06  8:17           ` Lars Magne Ingebrigtsen
  1999-02-09 10:27           ` Displayed [ 0: Stephen J. Turnbull ] but it had lots of lines Alf-Ivar Holm
                             ` (2 subsequent siblings)
  5 siblings, 0 replies; 43+ messages in thread
From: Lars Magne Ingebrigtsen @ 1999-02-06  8:17 UTC (permalink / raw)


"Stephen J. Turnbull" <turnbull@sk.tsukuba.ac.jp> writes:

> Prices are vague recollections, in decreasing order of importance for
> basic understanding:

Thanks; I've ordered ISO-2022, and I'm ordering the Lunde and the
Unicode Standard on Monday.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen



* Re: More charset things
       [not found]               ` <m3sockqqjx.fsf@peorth.gweep.net>
@ 1999-02-06 15:55                 ` Lars Magne Ingebrigtsen
       [not found]                   ` <m3lnia5922.fsf@peorth.gweep.net>
  1999-02-08 16:04                   ` Bill White
  0 siblings, 2 replies; 43+ messages in thread
From: Lars Magne Ingebrigtsen @ 1999-02-06 15:55 UTC (permalink / raw)


[-- Attachment #1: Type: text/plain, Size: 180 bytes --]

Stainless Steel Rat <ratinox@peorth.gweep.net> writes:

> Try mixing ISO-8859-5 with ISO-8859-[1-4] sometime and you will see just
> how badly broken MULE really is.

Lét's see... 

[-- Attachment #2: Type: text/plain, Size: 29 bytes --]

Здравствуйте!  And some more 

[-- Attachment #3: Type: text/plain, Size: 132 bytes --]

Latïn-1.  Looks OK to me... 
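For comparison, here is a sketch of the same kind of test message built with Python's standard `email` package (not Gnus): each `text/plain` part declares its own MIME charset, which is how Latin-1 and ISO-8859-5 text can travel side by side without any single charset having to cover both. The helper name `mixed_charset_message` is made up for this example.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def mixed_charset_message(parts):
    """Build a multipart/mixed message with one text/plain part per
    (text, MIME charset) pair, so every part carries its own charset."""
    msg = MIMEMultipart()
    for text, charset in parts:
        msg.attach(MIMEText(text, "plain", charset))
    return msg


msg = mixed_charset_message([
    ("Lét's see...", "iso-8859-1"),
    ("Здравствуйте!  And some more", "iso-8859-5"),
    ("Latïn-1.  Looks OK to me...", "iso-8859-1"),
])
for part in msg.get_payload():
    print(part.get_content_charset(), part["Content-Transfer-Encoding"])
```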

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen


* Re: More charset things
  1999-02-04 17:08   ` Lars Magne Ingebrigtsen
  1999-02-04 17:21     ` Hrvoje Niksic
@ 1999-02-07 19:35     ` François Pinard
  1999-02-08 13:37       ` Simon Josefsson
  1 sibling, 1 reply; 43+ messages in thread
From: François Pinard @ 1999-02-07 19:35 UTC (permalink / raw)
  Cc: handa

Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> (By the way -- is it "MULE" or "Mule?  I'm waffling all over the place
> when I write that word.  Perhaps I should start writing it "mUlE"?)

I documented this somewhere.  Let me see...  OK:

  The spelling @code{Mule} originally stands for @cite{@emph{mul}tilingual
  @emph{e}nhancement to GNU Emacs}; it is the result of a collective
  effort orchestrated by Handa Ken'ichi since 1993.  When @code{Mule} got
  rewritten in the main development stream of GNU Emacs 20, the FSF renamed
  it @code{MULE}, meaning @cite{@emph{mul}tilingual @emph{e}nvironment
  in GNU Emacs}.

I guess that the FSF wanted to more clearly establish who is the boss, by
renaming the thing and changing the capitalization.  By reaction, maybe,
I try to consistently write "Mule", as a tribute to the original effort.

-- 
François Pinard                            mailto:pinard@iro.umontreal.ca
Join the free Translation Project!    http://www.iro.umontreal.ca/~pinard


* Re: More charset things
  1999-02-04 17:21     ` Hrvoje Niksic
  1999-02-04 17:49       ` Lars Magne Ingebrigtsen
@ 1999-02-07 19:37       ` François Pinard
  1999-02-08  0:06         ` Kenichi Handa
  1 sibling, 1 reply; 43+ messages in thread
From: François Pinard @ 1999-02-07 19:37 UTC (permalink / raw)
  Cc: ding, handa

Hrvoje Niksic <hniksic@srce.hr> writes:

> That is not a nice way of thinking.  MULE is little else than a Japanese
> version of Emacs, and it appears that the Japanese are not interested
> in Unicode.  So it wasn't implemented.  I'm not sure about FSF, but for
> XEmacs, I know of no plans to implement it in the near future.

Handa-san is planning to implement Unicode support in Mule, and I presume
UTF-8 will come along with it.

-- 
François Pinard                            mailto:pinard@iro.umontreal.ca
Join the free Translation Project!    http://www.iro.umontreal.ca/~pinard




* Re: More charset things
  1999-02-04 17:49       ` Lars Magne Ingebrigtsen
  1999-02-05  0:47         ` Stephen J. Turnbull
@ 1999-02-07 20:43         ` François Pinard
  1999-02-08  2:09           ` Martin Buchholz
                             ` (3 more replies)
  1 sibling, 4 replies; 43+ messages in thread
From: François Pinard @ 1999-02-07 20:43 UTC (permalink / raw)
  Cc: xemacs-mule

Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> > > That's not my table.  :-) When MULE supports utf-8, Gnus will
> > > support utf-8.

> > That is not a nice way of thinking.

> I don't see any other way of thinking.  Grokking utf-8 is way outside
> the scope of Gnus -- it has to be an Emacs thing.

In a way, UTF-8 or Base64 are coding schemes.  I see no strong reason for
Gnus to be favourable to one without being to the other, except maybe that
Base64 is usable in CTE, while UTF-8 is probably not going to be.

UTF-8 is really simple, by comparison with other things in the field of
charsets, and much simpler than what Gnus already does about the whole
thing.  Lars, I can send you documentation and C code, if you feel like it.
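To illustrate how simple UTF-8 is, here is a sketch of an encoder in Python (not the C code offered above) that turns integer code points into octets. It follows the modern limit of U+10FFFF from RFC 3629; the 1999-era definition (RFC 2279) also allowed 5- and 6-octet sequences, which are omitted here.

```python
def utf8_encode(code_points):
    """Encode an iterable of integer code points as UTF-8 octets.

    Sketch only: surrogate values (U+D800..U+DFFF) are not rejected.
    """
    out = bytearray()
    for cp in code_points:
        if cp < 0x80:                       # 0xxxxxxx
            out.append(cp)
        elif cp < 0x800:                    # 110xxxxx 10xxxxxx
            out += bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
        elif cp < 0x10000:                  # 1110xxxx 10xxxxxx 10xxxxxx
            out += bytes([0xE0 | (cp >> 12),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
        elif cp <= 0x10FFFF:                # 11110xxx + 3 continuation octets
            out += bytes([0xF0 | (cp >> 18),
                          0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
        else:
            raise ValueError(f"code point out of range: {cp:#x}")
    return bytes(out)


print(utf8_encode([0xE7]))  # b'\xc3\xa7' (the "ç" in "François")
```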

> Could that someone (or someone else) re-recommend the book(s) that I
> should buy to get both an introduction and more in-depth knowledge about
> charset issues?

*The* reference, which I have never seen (my librarian says the publisher
is out of stock), is supposed to be the Ken Lunde book, in the ORA series.

  From: Brendan_Murray/DUB/Lotus@lotus.com
  Subject: Re: unicode <-> hex converter (fwd)
  To: pinard@IRO.UMontreal.CA
  Date: 1997-04-11 09:38:40 +01:00

  For information on Asian character sets, try picking up a copy of Ken
  Lunde's text for his next book. It should be on
  ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf. His first book,
  "Understanding Japanese Information Processing" is so good that it has been
  translated to Japanese, and is used over there by many developers (one of
  the guys in our Tokyo office thought Ken Lunde was Japanese - that's how
  good it is!) - if you're doing anything with the Japanese encoding systems,
  I heartily recommend this.

  By the way, you'll find code snippets sprinkled around that part of the FTP
  site, with different encoding transformations.

  Brendan

-- 
François Pinard                            mailto:pinard@iro.umontreal.ca
Join the free Translation Project!    http://www.iro.umontreal.ca/~pinard


* Re: More charset things
       [not found]                   ` <m3lnia5922.fsf@peorth.gweep.net>
@ 1999-02-07 21:02                     ` Hrvoje Niksic
  1999-02-09 15:56                       ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 43+ messages in thread
From: Hrvoje Niksic @ 1999-02-07 21:02 UTC (permalink / raw)

[-- Attachment #1: Type: text/plain; charset=us-ascii, Size: 886 bytes --]

Stainless Steel Rat <ratinox@peorth.gweep.net> writes:

> "Lars" == Lars Magne Ingebrigtsen <larsi@gnus.org> writes:
> 
> Lars> Lét's see... Здравствуйте!  And some more Latïn-1.  Looks OK to me...
> 
> Then you lucked out for some reason.  Many others (here, notably
> Hrvoje), have had numerous problems with it.

Mixing charsets works for me in a Mule buffer, but there are
environmental brain-damages that appear to be incurable for Mule.  For 
instance, it insists that the default 128-255 chars are iso-8859-1,
which is a hard-coded arbitrary value with no hope of ever changing it 
to iso-8859-2.
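A small Python illustration of why a hard-coded default matters: the same octet names a different character depending on which ISO-8859 charset is assumed. The `default_charset` parameter is a hypothetical stand-in for the setting being asked for here, not an actual XEmacs/Mule variable.

```python
def read_octets(data, default_charset="iso-8859-1"):
    """Interpret raw 8-bit text under DEFAULT_CHARSET (hypothetical knob)."""
    return data.decode(default_charset)


octet = b"\xb9"
print(read_octets(octet))                # prints ¹ (U+00B9 SUPERSCRIPT ONE)
print(read_octets(octet, "iso-8859-2"))  # prints š (U+0161 S WITH CARON)
```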

Non-Mule XEmacs can be set up to work with latin2 just fine -- you
simply point it to latin2 fonts, and it works out of the box.  This
strategy works on TTYs too (which is another problem with
XEmacs/Mule).

Thus for me, Mule is useless.  So much for the "internationalization".


* Re: More charset things
  1999-02-07 19:37       ` François Pinard
@ 1999-02-08  0:06         ` Kenichi Handa
  0 siblings, 0 replies; 43+ messages in thread
From: Kenichi Handa @ 1999-02-08  0:06 UTC (permalink / raw)
  Cc: hniksic, ding

=?ISO-8859-1?Q?Fran=E7ois_Pinard?= <pinard@iro.umontreal.ca> writes:
> Hrvoje Niksic <hniksic@srce.hr> writes:
>> That is not a nice way of thinking.  MULE is little else than a Japanese
>> version of Emacs, and it appears that the Japanese are not interested
>> in Unicode.  So it wasn't implemented.  I'm not sure about FSF, but for
>> XEmacs, I know of no plans to implement it in the near future.

> Handa-san is planning to implement Unicode support in Mule, and I presume
> UTF-8 will come along with it.

I myself have not yet started to work on Unicode support.  But, I
heard that mleisher@crl.nmsu.edu had started the work.

---
Ken'ichi HANDA
handa@etl.go.jp



* Re: More charset things
  1999-02-07 20:43         ` François Pinard
@ 1999-02-08  2:09           ` Martin Buchholz
  1999-02-22 15:52             ` François Pinard
  1999-02-08 14:49           ` Robert Bihlmeyer
                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 43+ messages in thread
From: Martin Buchholz @ 1999-02-08  2:09 UTC (permalink / raw)
  Cc: ding, xemacs-mule

>>>>> "F" == ISO-8859-1  <ISO-8859-1> writes:

F> Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

F> *The* reference, which I never seen (my librarian says the editor is out
F> of stock), is supposed to be the Ken Lunde book, in the ORA series.

F>   From: Brendan_Murray/DUB/Lotus@lotus.com
F>   Subject: Re: unicode <-> hex converter (fwd)
F>   To: pinard@IRO.UMontreal.CA
F>   Date: 1997-04-11 09:38:40 +01:00

F>   For information on Asian character sets, try picking up a copy of Ken
F>   Lunde's text for his next book. It should be on
F>   ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf. His first book,

Where've you been?  The second edition is finally out.  CJKV!

F>   "Understanding Japanese Information Processing" is so good that it has been
F>   translated to Japanese, and is used over there by many developers (one of
F>   the guys in our Tokyo office thought Ken Lunde was Japanese - that's how
F>   good it is!) - if you're doing anything with the Japanese encoding systems,
F>   I heartily recommend this.

F>   By the way, you'll find code snippets sprinkled around that part of the FTP
F>   site, with different encoding transformations.

Martin


* Re: More charset things
       [not found]           ` <m37lttydo2.fsf@peorth.gweep.net>
@ 1999-02-08  9:55             ` Kai.Grossjohann
  1999-02-08 15:52             ` François Pinard
                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 43+ messages in thread
From: Kai.Grossjohann @ 1999-02-08  9:55 UTC (permalink / raw)


Stainless Steel Rat <ratinox@peorth.gweep.net> writes:

  > And wow! I just noticed how badly Supercite failed to deal with
  > your mailbox.  Probably because you have 8-bit data in a field
  > that specifically calls for ASCII and only ASCII.  This one isn't
  > a MULE bug, because there is no MULE in my XEmacs.

None of these characters are non-ASCII:

,-----
| =?ISO-8859-1?Q?Fran=E7ois_Pinard?= <pinard@iro.umontreal.ca>
`-----

Maybe SC should be updated to grok this encoding?
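For what it's worth, decoding such an RFC 2047 encoded-word is straightforward. A sketch using Python's standard `email.header.decode_header` (Supercite itself is Emacs Lisp, so this only illustrates the decoding step, not a patch to it; `decode_rfc2047` is a made-up helper name):

```python
from email.header import decode_header


def decode_rfc2047(value):
    """Decode any RFC 2047 encoded-words in a header VALUE to a str."""
    pieces = []
    for chunk, charset in decode_header(value):
        # decode_header yields bytes for decoded runs, but hands back the
        # input str unchanged when it contains no encoded-words at all.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii")
        pieces.append(chunk)
    return "".join(pieces)


print(decode_rfc2047("=?ISO-8859-1?Q?Fran=E7ois_Pinard?= <pinard@iro.umontreal.ca>"))
```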

kai
-- 
I like _both_ kinds of music.



* Re: More charset things
  1999-02-07 19:35     ` François Pinard
@ 1999-02-08 13:37       ` Simon Josefsson
  1999-02-08 23:43         ` Kenichi Handa
  0 siblings, 1 reply; 43+ messages in thread
From: Simon Josefsson @ 1999-02-08 13:37 UTC (permalink / raw)
  Cc: ding, handa

François Pinard <pinard@iro.umontreal.ca> writes:

> > (By the way -- is it "MULE" or "Mule?  I'm waffling all over the place
> > when I write that word.  Perhaps I should start writing it "mUlE"?)
> 
> I documented this somewhere.  Let me see...  OK:
> 
>   The spelling @code{Mule} originally stands for @cite{@emph{mul}tilingual
>   @emph{e}nhancement to GNU Emacs}; it is the result of a collective
>   effort orchestrated by Handa Ken'ichi since 1993.  When @code{Mule} got
>   rewritten in the main development stream of GNU Emacs 20, the FSF renamed
>   it @code{MULE}, meaning @cite{@emph{mul}tilingual @emph{e}nvironment
>   in GNU Emacs}.

Emacs seems a little bit confused about this itself: the menu bar
option is called "Mule" and in it there is "Show all of MULE status".

:-)



* Re: More charset things
  1999-02-07 20:43         ` François Pinard
  1999-02-08  2:09           ` Martin Buchholz
@ 1999-02-08 14:49           ` Robert Bihlmeyer
       [not found]           ` <m37lttydo2.fsf@peorth.gweep.net>
  1999-02-11 10:09           ` Jan Vroonhof
  3 siblings, 0 replies; 43+ messages in thread
From: Robert Bihlmeyer @ 1999-02-08 14:49 UTC (permalink / raw)


Hi,

>>>>> On 07 Feb 1999 15:43:18 -0500
>>>>> François Pinard <pinard@iro.umontreal.ca> said:
      ^^^^^^^^ works here

 FP> In a way, UTF-8 or Base64 are coding schemes. I see no strong
 FP> reason for Gnus to be favourable to one without being to the
 FP> other, except maybe that Base64 is usable in CTE, while UTF-8 is
 FP> probably not going to be.

UTF-7 is used in CTE today.
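To see what UTF-7 looks like next to UTF-8, Python's standard codecs include both `utf-7` (RFC 2152, a mail-safe 7-bit transformation format) and `utf-8`. This sketch only compares the octets each scheme produces for the same name; it says nothing about which MIME header field carries the label.

```python
name = "François"

utf8 = name.encode("utf-8")  # the ç becomes two 8-bit octets
utf7 = name.encode("utf-7")  # the ç is shifted into modified base64

# UTF-7 output stays within 7-bit ASCII; UTF-8 output does not.
print(utf8, all(b < 0x80 for b in utf8))  # b'Fran\xc3\xa7ois' False
print(utf7, all(b < 0x80 for b in utf7))  # b'Fran+AOc-ois' True
```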

	Robbe

-- 
Robert Bihlmeyer	reads: Deutsch, English, MIME, Latin-1, NO SPAM!
<robbe@orcus.priv.at>	<http://stud2.tuwien.ac.at/~e9426626/sig.html>



* Re: More charset things
       [not found]           ` <m37lttydo2.fsf@peorth.gweep.net>
  1999-02-08  9:55             ` Kai.Grossjohann
@ 1999-02-08 15:52             ` François Pinard
       [not found]               ` <m3n22ou09w.fsf@peorth.gweep.net>
                                 ` (2 more replies)
  1999-02-08 17:29             ` Karl Eichwalder
  1999-02-08 22:03             ` James H. Cloos Jr.
  3 siblings, 3 replies; 43+ messages in thread
From: François Pinard @ 1999-02-08 15:52 UTC (permalink / raw)

Stainless Steel Rat <ratinox@peorth.gweep.net> writes:

> base64 is an encoding scheme (comparable to uuencode).  UTF-8 is a
> character set (comparable to ISO-8859-1).  They have nothing in common,
> at least not the way you are thinking of it.

UTF-8 is an encoding scheme, comparable to uuencode.

But it is currently used to encode only one character set, the UCS
(described in Unicode manuals and within ISO 10646).  But theoretically,
it could well be used to encode other things.

Because the UTF-8 encoding scheme is used for only one charset, it is common
to consider that it is a charset itself, but this is a conceptual abuse.
I have nothing against relying on this abuse, which is quite handy, as
long as we do not lose sight of the real thing.  UTF-8 is not a charset,
in the deep nature of things. :-)

That is why Lars could well decide, one of these days, to support UTF-8 as
an encoding (which it really is) on the same level as Base64; it would
moreover be rather fun to implement.  It might be convenient for Gnus to do
so as a contribution to the Unicode effort, without really waiting for Emacs
to do it.  The sad aspect of things is that, for orthogonality reasons, Gnus
should then support UTF-7 as well, and this one, being noticeably uglier
internally, is not as much fun.
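The point that UTF-8 is an encoding scheme rather than a charset shows up clearly in a decoder: it turns octets back into plain integers, and nothing in the bit layout says which character set those integers index. A minimal Python sketch (it checks lead and continuation bits but, for brevity, not overlong forms or surrogates):

```python
def utf8_decode(data):
    """Decode UTF-8 octets into a list of integer code points."""
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:                 # 0xxxxxxx
            extra, cp = 0, lead
        elif 0xC0 <= lead < 0xE0:       # 110xxxxx + 1 continuation octet
            extra, cp = 1, lead & 0x1F
        elif 0xE0 <= lead < 0xF0:       # 1110xxxx + 2 continuation octets
            extra, cp = 2, lead & 0x0F
        elif 0xF0 <= lead < 0xF8:       # 11110xxx + 3 continuation octets
            extra, cp = 3, lead & 0x07
        else:
            raise ValueError(f"bad lead octet {lead:#04x} at offset {i}")
        if i + extra >= len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for k in range(1, extra + 1):
            cont = data[i + k]
            if (cont & 0xC0) != 0x80:   # continuation octets are 10xxxxxx
                raise ValueError(f"bad continuation octet at offset {i + k}")
            cp = (cp << 6) | (cont & 0x3F)
        code_points.append(cp)
        i += extra + 1
    return code_points


print(utf8_decode(b"Fran\xc3\xa7ois"))  # [70, 114, 97, 110, 231, 111, 105, 115]
```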

-- 
François Pinard                            mailto:pinard@iro.umontreal.ca
Join the free Translation Project!    http://www.iro.umontreal.ca/~pinard


* Re: More charset things
  1999-02-06 15:55                 ` Lars Magne Ingebrigtsen
       [not found]                   ` <m3lnia5922.fsf@peorth.gweep.net>
@ 1999-02-08 16:04                   ` Bill White
  1999-02-09 16:04                     ` Lars Magne Ingebrigtsen
  1 sibling, 1 reply; 43+ messages in thread
From: Bill White @ 1999-02-08 16:04 UTC (permalink / raw)


[-- Attachment #1: Type: text/plain, Size: 609 bytes --]

Lars - your Russian text shows up only when I switch fonts to
"standard: 16-dot medium" via the Mule:Set Font/Fontset menu.
Otherwise it's empty boxes. I use

 (set-default-font "-b&h-lucidatypewriter-medium-*-*-*-12-120-*-*-*-*-*-*")

in my .emacs. What font are you using that lets the Russian characters
show up?

bw

In message <m3r9s3edbp.fsf@quimbies.gnus.org>,
Lars Magne Ingebrigtsen <larsi@gnus.org> wrote:

> Stainless Steel Rat <ratinox@peorth.gweep.net> writes:
> 
> > Try mixing ISO-8859-5 with ISO-8859-[1-4] sometime and you will see just
> > how badly broken MULE really is.
> 
> Lét's see... 

[-- Attachment #2: Type: text/plain, Size: 29 bytes --]

Здравствуйте!  And some more 

[-- Attachment #3: Type: text/plain, Size: 208 bytes --]

Latïn-1.  Looks OK to me... 
> 
> -- 
> (domestic pets only, the antidote for overdose, milk.)
>   larsi@gnus.org * Lars Magne Ingebrigtsen

-- 
Bill White . billw@wolfram.com . http://www.wolfram.com/~billw


* Re: More charset things
       [not found]           ` <m37lttydo2.fsf@peorth.gweep.net>
  1999-02-08  9:55             ` Kai.Grossjohann
  1999-02-08 15:52             ` François Pinard
@ 1999-02-08 17:29             ` Karl Eichwalder
  1999-02-08 22:03             ` James H. Cloos Jr.
  3 siblings, 0 replies; 43+ messages in thread
From: Karl Eichwalder @ 1999-02-08 17:29 UTC (permalink / raw)


Stainless Steel Rat <ratinox@peorth.gweep.net> writes:

|   "oP" == ois Pinard <Fran> writes:

|   Probably because you have 8-bit data in a field that specifically 
|   calls for ASCII and only ASCII.

Try to view the raw From line -- it looks good to me (and message knows
how to handle it).

-- 
Karl Eichwalder



* Re: More charset things
       [not found]           ` <m37lttydo2.fsf@peorth.gweep.net>
                               ` (2 preceding siblings ...)
  1999-02-08 17:29             ` Karl Eichwalder
@ 1999-02-08 22:03             ` James H. Cloos Jr.
  1999-02-09  5:29               ` Russ Allbery
  3 siblings, 1 reply; 43+ messages in thread
From: James H. Cloos Jr. @ 1999-02-08 22:03 UTC (permalink / raw)


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

>>>>> "SSR" == Stainless Steel Rat <ratinox@peorth.gweep.net> writes:

SSR> And wow! I just noticed how badly Supercite failed to deal with
SSR> your mailbox.  Probably because you have 8-bit data in a field
SSR> that specifically calls for ASCII and only ASCII.  This one isn't
SSR> a MULE bug, because there is no MULE in my XEmacs.

Odd.  Works for me in GNU Emacs 20.3.1, with supercite.el revision: 3.54.
(Which says it was last modified 1993/09/22 18:58:46, FWIW.)

- -JimC
- -- 
James H. Cloos, Jr.  <http://www.jhcloos.com/cloos/public_key> 1024D/ED7DAEA6 
<cloos@jhcloos.com>     E9E9 F828 61A4 6EA9 0F2B  63E7 997A 9F17 ED7D AEA6
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v0.9.2 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE2v19CmXqfF+19rqYRAgyUAJ9vaBAXMvUmSQojOY2Mag8dZ+e+FwCfYUB2
3rVAdGIxCIE3FGOkaakoJt4=
=vHSW
-----END PGP SIGNATURE-----



* Re: More charset things
       [not found]               ` <m3n22ou09w.fsf@peorth.gweep.net>
@ 1999-02-08 23:19                 ` François Pinard
  0 siblings, 0 replies; 43+ messages in thread
From: François Pinard @ 1999-02-08 23:19 UTC (permalink / raw)

Stainless Steel Rat <ratinox@peorth.gweep.net> writes:

> > UTF-8 is an encoding scheme, comparable to uuencode.

> It is?  Then I'm confused... for some reason I was thinking that UTF-8
> *was* Unicode.

Nowadays, the UCS may be represented internally as UCS-2 or UCS-4, yet
UCS-2 is what is often seen externally.  The latest Unicode, if I
understand things correctly, strongly promotes what was once called
UTF-16, which uses one or two UCS-2 units (16-bit "super-bytes") to
represent over one million characters.  There is also UTF-8, which is
popular (and nice), and UTF-7, which is getting popular (and ugly).
Niceness and ugliness are well hidden in decoders/encoders, so this does
not really matter in practice.  UTF-7 is a MIME-related invention; it
comes from neither Unicode nor ISO.  There are other encodings too, but
they are obsolete enough not to be worth mentioning.
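A rough illustration of the three encodings mentioned above, using Python's standard codecs (Python is used here only for illustration, and the helper name `encode_all` is made up). It shows that a character beyond U+FFFF needs two 16-bit units (a surrogate pair) in UTF-16, and that UTF-7 output stays within 7-bit ASCII:

```python
# Encode the same text with three Unicode encoding schemes, using
# Python's built-in codecs.  UTF-16 is shown big-endian without a BOM.
def encode_all(text):
    """Return {codec name: encoded bytes} for TEXT (hypothetical helper)."""
    return {codec: text.encode(codec)
            for codec in ("utf-8", "utf-16-be", "utf-7")}


for sample in ("\u00e9",        # e-acute, inside the 16-bit range
               "\U0001F600"):   # beyond U+FFFF: a surrogate pair in UTF-16
    for codec, data in encode_all(sample).items():
        print(f"U+{ord(sample):04X} {codec:10} {data!r}")
```

For U+1F600 the UTF-16 bytes are D8 3D DE 00: a high surrogate followed by a low surrogate.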

-- 
François Pinard                            mailto:pinard@iro.umontreal.ca
Join the free Translation Project!    http://www.iro.umontreal.ca/~pinard


* Re: More charset things
  1999-02-08 13:37       ` Simon Josefsson
@ 1999-02-08 23:43         ` Kenichi Handa
  0 siblings, 0 replies; 43+ messages in thread
From: Kenichi Handa @ 1999-02-08 23:43 UTC (permalink / raw)
  Cc: pinard, ding

Simon Josefsson <jas@pdc.kth.se> writes:
> Emacs seem a little bit confused about this itself, the menu bar
> option is called "Mule" and in it there is "Show all of MULE status".

The other titles in the menu bar are all capitalized.  So, I thought
we had better capitalize "MULE" too.

---
Ken'ichi HANDA
handa@etl.go.jp



* Re: More charset things
  1999-02-08 22:03             ` James H. Cloos Jr.
@ 1999-02-09  5:29               ` Russ Allbery
  1999-02-09  7:33                 ` James H. Cloos Jr.
  0 siblings, 1 reply; 43+ messages in thread
From: Russ Allbery @ 1999-02-09  5:29 UTC (permalink / raw)


James H Cloos <cloos@jhcloos.com> writes:
>>>>>> "SSR" == Stainless Steel Rat <ratinox@peorth.gweep.net> writes:

> SSR> And wow! I just noticed how badly Supercite failed to deal with
> SSR> your mailbox.  Probably because you have 8-bit data in a field
> SSR> that specifically calls for ASCII and only ASCII.  This one isn't
> SSR> a MULE bug, because there is no MULE in my XEmacs.

> Odd.  Works for me in GNU Emacs 20.3.1, with supercite.el revision:
> 3.54.  (Which says it was last modified 1993/09/22 18:58:46, FWIW.)

supercite.el has a lot of major annoyances in what it's willing to
recognize as valid characters for names and for e-mail addresses.  I use
the following, which fixes it a little, at least for me:

;; Override sc-get-address with something that's less picky about what it's
;; willing to consider an address (supercite's default truncates the address
;; at the first odd-looking character).
(defun sc-get-address (from author)
  "Get the full email address path from FROM.
AUTHOR is the author's name (which is removed from the address)."
  (let ((eos (length from)))
    (if (string-match (concat "\\(^\\|^\"\\)" (regexp-quote author)
                              "\\(\\s +\\|\"\\s +\\)") from 0)
        (let ((address (substring from (match-end 0) eos)))
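          ;; Guard against an empty remainder (e.g. a From line that is
          ;; just the author's name plus a space), where aref would signal
          ;; args-out-of-range.
          (if (and (> (length address) 1)
                   (= (aref address 0) ?<)
                   (= (aref address (1- (length address))) ?>))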
              (substring address 1 (1- (length address)))
            address))
      (if (string-match
           "[ 	]*<?\\([^ 	(>]+@[^ 	(>]+\\)" from 0)
          (sc-submatch 1 from)
        ""))))

;; Override sc-attribs-extract-namestring so that it will correctly cope
;; with From headers that contain no address (which is becoming more common
;; with munging, even if it's technically illegal).
(defun sc-attribs-extract-namestring (from)
  "Extract the name string from FROM.
This should be the author's full name minus an optional title."
  (let ((namestring
         (or
          ;; If there is a <...> in the name,
          ;; treat everything before that as the full name.
          ;; Even if it contains parens, use the whole thing.
          ;; On the other hand, we do look for quotes in the usual way.
          (and (string-match " *<.*>" from 0)
               (let ((before-angles
                      (sc-name-substring from 0 (match-beginning 0) 0)))
                 (if (string-match "\".*\"" before-angles 0)
                     (sc-name-substring
                      before-angles (match-beginning 0) (match-end 0) 1)
                   before-angles)))
          (sc-name-substring
           from (string-match "(.*)" from 0) (match-end 0) 1)
          (sc-name-substring
           from (string-match "\".*\"" from 0) (match-end 0) 1)
          (sc-name-substring
           from (string-match "\\([-.a-zA-Z0-9_]+\\s *\\)+" from 0)
           (match-end 0) 0)
          (sc-attribs-emailname from))))
    ;; strip off any leading or trailing whitespace
    (if namestring
        (let ((bos 0)
              (eos (1- (length namestring))))
          (while (and (<= bos eos)
                      (memq (aref namestring bos) '(32 ?\t)))
            (setq bos (1+ bos)))
          (while (and (> eos bos)
                      (memq (aref namestring eos) '(32 ?\t)))
            (setq eos (1- eos)))
          (substring namestring bos (1+ eos))))))

-- 
Russ Allbery (rra@stanford.edu)         <URL:http://www.eyrie.org/~eagle/>



* Re: More charset things
  1999-02-09  5:29               ` Russ Allbery
@ 1999-02-09  7:33                 ` James H. Cloos Jr.
  1999-02-10  2:13                   ` Stephen Zander
  0 siblings, 1 reply; 43+ messages in thread
From: James H. Cloos Jr. @ 1999-02-09  7:33 UTC (permalink / raw)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

>>>>>"JHC" == James H Cloos <cloos@jhcloos.com> writes:
>>>>> "SSR" == Stainless Steel Rat <ratinox@peorth.gweep.net> writes:

JHC> Odd.  Works for me in GNU Emacs 20.3.1, with supercite.el
JHC> revision: 3.54.  (Which says it was last modified 1993/09/22
JHC> 18:58:46, FWIW.)

SSR> Hmmm... quite strange.  I'm using 3.55, the version bundled
SSR> with XEmacs 20.4.

You know what it is?

Since I'm running GNU Emacs 20.3.1, I'm running MULE.  As such, the
non-ASCII characters match supercite's regexes.  Since you are not
running MULE, you'll need different regexes for matching handles and
addresses, such as the ones Russ posted.

Or at least that seems like the (most) logical explanation....

- -JimC
- -- 
James H. Cloos, Jr.  <http://www.jhcloos.com/cloos/public_key> 1024D/ED7DAEA6 
<cloos@jhcloos.com>     E9E9 F828 61A4 6EA9 0F2B  63E7 997A 9F17 ED7D AEA6
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v0.9.2 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE2v+SemXqfF+19rqYRAqN2AJ9kZmgcQdFF2NZ67tQ916F3NtOjTQCfezvH
20iAxvQsBiKUQjjsecaZo+k=
=TTK5
-----END PGP SIGNATURE-----


* Re: More charset things
  1999-02-08 15:52             ` François Pinard
       [not found]               ` <m3n22ou09w.fsf@peorth.gweep.net>
@ 1999-02-09  8:05               ` Steinar Bang
  1999-02-14 18:10                 ` UTF-8 (Was: More charset things) Steinar Bang
  1999-02-09 16:03               ` More charset things Lars Magne Ingebrigtsen
  2 siblings, 1 reply; 43+ messages in thread
From: Steinar Bang @ 1999-02-09  8:05 UTC (permalink / raw)


>>>>> François Pinard <pinard@iro.umontreal.ca>:

> That is why Lars could well decide, one of these days, to support
> UTF-8 as an encoding (which it really is) on the same level as
> Base64, and moreover, rather fun to implement.  It might be
> convenient that Gnus do so as a contribution to the Unicode effort,
> without really waiting for Emacs to do it.

But isn't UTF-8 support something that really should be done at the C
level (like base64 is done in newer emacsen)?  Or am I thinking of
UTF-7 here...? (does anyone have some handy online references?)



* Displayed [   0: Stephen J. Turnbull     ] but it had lots of lines
  1999-02-05  0:47         ` Stephen J. Turnbull
                             ` (2 preceding siblings ...)
  1999-02-06  8:17           ` Lars Magne Ingebrigtsen
@ 1999-02-09 10:27           ` Alf-Ivar Holm
  1999-02-09 16:14             ` Lars Magne Ingebrigtsen
  1999-02-09 22:07           ` More charset things Jan Vroonhof
       [not found]           ` <m3hft163aa.fsf@p <byu2wv6xkb.fsf@bolzano.math.ethz.ch>
  5 siblings, 1 reply; 43+ messages in thread
From: Alf-Ivar Holm @ 1999-02-09 10:27 UTC (permalink / raw)


I got this in my summary buffer:

R                      [   0: Stephen J. Turnbull     ] 

but it did have lots of text.  (It's the last message in the References
header, do ^.)

	Affi



* Re: More charset things
  1999-02-07 21:02                     ` Hrvoje Niksic
@ 1999-02-09 15:56                       ` Lars Magne Ingebrigtsen
  1999-02-09 17:21                         ` Hrvoje Niksic
  0 siblings, 1 reply; 43+ messages in thread
From: Lars Magne Ingebrigtsen @ 1999-02-09 15:56 UTC (permalink / raw)


Hrvoje Niksic <hniksic@srce.hr> writes:

> Mixing charsets works for me in a Mule buffer, but there are
> environmental brain-damages that appear to be incurable for Mule.  For 
> instance, it insists that the default 128-255 chars are iso-8859-1,
> which is a hard-coded arbitrary value with no hope of ever changing it 
> to iso-8859-2.

Hm.  In this message, for instance, isn't "Dzień dobry" rendered
correctly for you if you use a Mule XEmacs?  

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen



* Re: More charset things
  1999-02-08 15:52             ` François Pinard
       [not found]               ` <m3n22ou09w.fsf@peorth.gweep.net>
  1999-02-09  8:05               ` Steinar Bang
@ 1999-02-09 16:03               ` Lars Magne Ingebrigtsen
  2 siblings, 0 replies; 43+ messages in thread
From: Lars Magne Ingebrigtsen @ 1999-02-09 16:03 UTC (permalink / raw)


François Pinard <pinard@iro.umontreal.ca> writes:

> UTF-8 is an encoding scheme, comparable to uuencode.
> 
> But it is currently used to encode one and only character set, the UCS
> (described in Unicode manuals and within ISO 10646).  But theoretically,
> it could well be used to encode other things.

From off the top of my head -- we have two things, "encoded character
set" (which one usually just calls "character set" unless there's a
possibility for confusion), and we have "character encoding scheme".
ECS and CES.  Unicode is an ECS and utf-8 is a CES that is only used
for the ECS Unicode.

However -- in a MIME context, we don't care about this.  What we deal
with is "charsets", which is not an ECS or a CES, but a combination
of the two.  Therefore, "charset=utf-8" is correct.  In a MIME
context, utf-8 is not an encoding, it is purely, and always, a
charset, and nothing else.  :-)
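A small sketch of this point, using Python's standard `email` package purely as an illustration (it is not what Gnus uses): `utf-8` appears only as the `charset` parameter of `Content-Type`, while the transfer encoding of the body (base64 here, because that is the `email` package's default choice for utf-8) is recorded in a separate header:

```python
from email.mime.text import MIMEText

# "utf-8" is passed as the MIME charset of a text/plain body.
msg = MIMEText("Dzień dobry", "plain", "utf-8")

print(msg["Content-Type"])               # the charset parameter lives here
print(msg["Content-Transfer-Encoding"])  # the transfer encoding is separate
# Undo the transfer encoding first, then interpret the bytes in the charset.
body = msg.get_payload(decode=True).decode(msg.get_content_charset())
print(body)
```

The same separation is what the body-encoding question at the start of the thread is about: which transfer encoding (7bit, 8bit, base64, qp) to use is a choice made per charset, not part of the charset itself.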

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen



* Re: More charset things
  1999-02-08 16:04                   ` Bill White
@ 1999-02-09 16:04                     ` Lars Magne Ingebrigtsen
  0 siblings, 0 replies; 43+ messages in thread
From: Lars Magne Ingebrigtsen @ 1999-02-09 16:04 UTC (permalink / raw)


Bill White <billw@wolfram.com> writes:

> in my .emacs. What font are you using that lets the Russian characters
> show up?

I'm using the intlfonts package, which contains oodles of fonts. 

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen



* Re: Displayed [   0: Stephen J. Turnbull     ] but it had lots of lines
  1999-02-09 10:27           ` Displayed [ 0: Stephen J. Turnbull ] but it had lots of lines Alf-Ivar Holm
@ 1999-02-09 16:14             ` Lars Magne Ingebrigtsen
  0 siblings, 0 replies; 43+ messages in thread
From: Lars Magne Ingebrigtsen @ 1999-02-09 16:14 UTC (permalink / raw)


Alf-Ivar Holm <affi@osc.no> writes:

> I got this in my summary buffer:
> 
> R                      [   0: Stephen J. Turnbull     ] 
> 
> but it did have lots of text.  (It's the last message in the References
> header, do ^.)

It shows up as

RA [  63: Stephen J. Turnbull ] Re: More charset things

here, using nnml.  What do you use?

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen



* Re: More charset things
  1999-02-09 15:56                       ` Lars Magne Ingebrigtsen
@ 1999-02-09 17:21                         ` Hrvoje Niksic
  1999-02-09 17:31                           ` Alan Shutko
  1999-02-09 17:37                           ` Lars Magne Ingebrigtsen
  0 siblings, 2 replies; 43+ messages in thread
From: Hrvoje Niksic @ 1999-02-09 17:21 UTC (permalink / raw)

Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> Hrvoje Niksic <hniksic@srce.hr> writes:
> 
> > Mixing charsets works for me in a Mule buffer, but there are
> > environmental brain-damages that appear to be incurable for Mule.  For 
> > instance, it insists that the default 128-255 chars are iso-8859-1,
> > which is a hard-coded arbitrary value with no hope of ever changing it 
> > to iso-8859-2.
> 
> Hm.  In this message, for instance, isn't "Dzień dobry" rendered
> correctly for you if you use a Mule XEmacs?

I don't do Mule, but I suspect it renders correctly as long as the
charset parameter is right (and Gnus gets things right).  But that
wasn't the point.

The point is that I cannot explain to XEmacs/Mule that all the 8bit files
I will want to load and save in the near future are latin2, and that
if it encounters chars in the appropriate subset of [128,256) range,
it should treat them as latin2, not latin1.  Currently I have to do
things like `C-u C-x C-f FILENAME RET iso-8859-2 RET'.
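To illustrate why the default for the 128-255 range matters, here is a Python sketch (Python is used only for illustration) that decodes the same bytes once as Latin-2 and once as Latin-1. The byte 0xF1 is ń in ISO-8859-2 but ñ in ISO-8859-1:

```python
# The same 8-bit bytes, as a Latin-2 system would write "Dzień dobry".
raw = "Dzień dobry".encode("iso-8859-2")

print(raw)                        # b'Dzie\xf1 dobry'
print(raw.decode("iso-8859-2"))   # Dzień dobry  (what the author meant)
print(raw.decode("iso-8859-1"))   # Dzieñ dobry  (what a Latin-1 default shows)
```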

Also, I don't want to see the iso2022 (or whatever) coding on my saved 
files, *ever*.  If files have to be saved in a multicharset format, it 
should be implemented as Unicode, so that at least other
(non-Japanese) software has a chance of getting it right.


* Re: More charset things
  1999-02-09 17:21                         ` Hrvoje Niksic
@ 1999-02-09 17:31                           ` Alan Shutko
  1999-02-09 17:37                           ` Lars Magne Ingebrigtsen
  1 sibling, 0 replies; 43+ messages in thread
From: Alan Shutko @ 1999-02-09 17:31 UTC (permalink / raw)
  Cc: ding

>>>>> "H" == Hrvoje Niksic <hniksic@srce.hr> writes:

H> The point is that I cannot explain to XEmacs/Mule that all the 8bit
H> files I will want to load and save in the near future are latin2,
H> and that if it encounters chars in the appropriate subset of
H> [128,256) range, it should treat them as latin2, not latin1.

In Emacs, there's a variable "default-buffer-file-coding-system",
which may do what you want.  Is that variable in XEmacs?

-- 
Alan Shutko <ats@acm.org> - By consent of the corrupted
A woman's place is in the house... and in the Senate.


* Re: More charset things
  1999-02-09 17:21                         ` Hrvoje Niksic
  1999-02-09 17:31                           ` Alan Shutko
@ 1999-02-09 17:37                           ` Lars Magne Ingebrigtsen
  1999-02-09 18:06                             ` Hrvoje Niksic
  1 sibling, 1 reply; 43+ messages in thread
From: Lars Magne Ingebrigtsen @ 1999-02-09 17:37 UTC (permalink / raw)


Hrvoje Niksic <hniksic@srce.hr> writes:

> The point is that I cannot explain to XEmacs/Mule that all the 8bit files
> I will want to load and save in the near future are latin2, and that
> if it encounters chars in the appropriate subset of [128,256) range,
> it should treat them as latin2, not latin1.  Currently I have to do
> things like `C-u C-x C-f FILENAME RET iso-8859-2 RET'.

Huh.  How, er, useless.  I thought that this was what Mule was all
about -- letting you do this automatically?  Isn't
(set-language-environment "Latin-2") (or something) what one is
supposed to do?

> Also, I don't want to see the iso2022 (or whatever) coding on my saved 
> files, *ever*.  If files have to be saved in a multicharset format, it 
> should be implemented as Unicode, so that at least other
> (non-Japanese) software has a chance of getting it right.

Yup.  Someone really needs to implement Unicode for the Emacsen.  :-)

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen



* Re: More charset things
  1999-02-09 17:37                           ` Lars Magne Ingebrigtsen
@ 1999-02-09 18:06                             ` Hrvoje Niksic
  0 siblings, 0 replies; 43+ messages in thread
From: Hrvoje Niksic @ 1999-02-09 18:06 UTC (permalink / raw)


Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> Hrvoje Niksic <hniksic@srce.hr> writes:
> 
> > The point is that I cannot explain to XEmacs/Mule that all the 8bit files
> > I will want to load and save in the near future are latin2, and that
> > if it encounters chars in the appropriate subset of [128,256) range,
> > it should treat them as latin2, not latin1.  Currently I have to do
> > things like `C-u C-x C-f FILENAME RET iso-8859-2 RET'.
> 
> Huh.  How, er, useless.  I thought that this was what Mule was all
> about -- letting you do this automatically?  Isn't
> (set-language-environment "Latin-2") (or something) what one is
> supposed to do?

It didn't work for me in XEmacs/Mule when I tried it.  I asked about
it on the mailing list, and no one was able to instruct me how to do it
right.  So I concluded that it can't be done.



* Re: More charset things
  1999-02-05  0:47         ` Stephen J. Turnbull
                             ` (3 preceding siblings ...)
  1999-02-09 10:27           ` Displayed [ 0: Stephen J. Turnbull ] but it had lots of lines Alf-Ivar Holm
@ 1999-02-09 22:07           ` Jan Vroonhof
       [not found]           ` <m3hft163aa.fsf@p <byu2wv6xkb.fsf@bolzano.math.ethz.ch>
  5 siblings, 0 replies; 43+ messages in thread
From: Jan Vroonhof @ 1999-02-09 22:07 UTC (permalink / raw)
  Cc: xemacs-mule

Hrvoje Niksic <hniksic@srce.hr> writes:

> It didn't work for me in XEmacs/Mule when I tried it.  I asked about
> it on the mailing list, and no one was able to instruct me how to do it
> right.  So I concluded that it can't be done.

I think it can be done, but I think most of the language environments
are just plain wrong. For instance even in a "Croatian" language
environment the 'ctext coding system is preferred over iso-8859-2.
Ctext is a good choice for latin-1 based systems as it is "backwards
compatible" with latin-1.

I did some experimenting and I think you should try

(set-language-environment "Croatian")
(set-coding-category-system 'iso-8-designate 'iso-8859-2)

This makes iso-8859-2 the preferred non-Japanese coding system. I
think you might even prefer setting the coding system priorities to
avoid all the Japanese ones.

For some reason all the code to do this is commented out. Note that
the FSF versions of the language environments do change the coding
priorities, but they are now handled centrally.
Somehow I have the feeling somebody tried to sync the XEmacs files
with the FSF versions but stopped midway.

Jan


* Re: More charset things
       [not found]           ` <m3hft163aa.fsf@p <byu2wv6xkb.fsf@bolzano.math.ethz.ch>
@ 1999-02-09 22:13             ` Hrvoje Niksic
  0 siblings, 0 replies; 43+ messages in thread
From: Hrvoje Niksic @ 1999-02-09 22:13 UTC (permalink / raw)

Jan Vroonhof <vroonhof@math.ethz.ch> writes:

> (set-language-environment "Croatian")
> (set-coding-category-system 'iso-8-designate 'iso-8859-2)

One more thing that baffles me about Mule is that all these things are 
totally undocumented.  It is nearly impossible to just *use* Mule if you
come from a latin2 background.

The above may seem strange coming from a developer, but the fact is,
when I was starting with XEmacs, reading the documentation was a
pleasure.  Mule is a black hole in the Emacs tradition.


* Re: More charset things
  1999-02-09  7:33                 ` James H. Cloos Jr.
@ 1999-02-10  2:13                   ` Stephen Zander
  0 siblings, 0 replies; 43+ messages in thread
From: Stephen Zander @ 1999-02-10  2:13 UTC (permalink / raw)
  Cc: (ding)

>>>>> "James" == James H Cloos <cloos@jhcloos.com> writes:
    James> Since I'm running GNU Emacs 20.3.1, I'm running MULE.  As
    James> such, the non-ASCII characters match supercite's regexes.
    James> Since you are not running MULE, you'll need different
    James> regexes for matching handles and addresses, such as the
    James> ones Russ posted.

Ixnay, that can't be all the story.  I am running XEmacs/MULE &
supercite has exactly the same failure mode for me as that experienced
by Ratinox.

-- 
Stephen
---
It should be illegal to yell "Y2K" in a crowded economy.  :-) -- Larry Wall



* Re: More charset things
  1999-02-07 20:43         ` François Pinard
                             ` (2 preceding siblings ...)
       [not found]           ` <m37lttydo2.fsf@peorth.gweep.net>
@ 1999-02-11 10:09           ` Jan Vroonhof
  3 siblings, 0 replies; 43+ messages in thread
From: Jan Vroonhof @ 1999-02-11 10:09 UTC (permalink / raw)

Stephen Zander <gibreel@pobox.com> writes:

> Ixnay, that can't be all the story.  I am running XEmacs/MULE &
> supercite has exactly the same failure mode for me as that experienced
> by Ratinox.

Same here. Maybe only FSF Mule has this hack/trick/whatever that a-z
matches more than just a-z.[1]

Jan

Footnotes: 
[1]  Of course there is something to be said for this. Non-ASCII is no
longer a nicely ordered set anyway, so you might as well order all the
lower-case international characters somewhere in the a-z range. However,
you then get all kinds of strange questions: does a-o match ö, for instance?

Jan
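A small Python sketch of the regex behaviour discussed in this subthread. Python's `re` is used only for illustration; its range semantics are not a claim about any Emacs variant, and the first pattern is an approximate transcription of the `[-.a-zA-Z0-9_]+` name pattern in the supercite code Russ posted. An ASCII-only class truncates a name at the first non-ASCII letter, a range such as a-o compares code points and so excludes ö, and a Unicode-aware `\w` accepts the whole name:

```python
import re

# Approximate Python transcription of supercite's ASCII-only name pattern.
ascii_name = re.compile(r"(?:[-.a-zA-Z0-9_]+\s*)+")
# Unicode-aware variant: on str patterns, Python 3's \w also matches
# letters such as ç and ö.
unicode_name = re.compile(r"(?:[-.\w]+\s*)+")

name = "François Pinard"
print(ascii_name.match(name).group())     # Fran  (stops at the ç)
print(unicode_name.match(name).group())   # François Pinard

# Ranges compare code points: ö is U+00F6, far above o (U+006F).
print(re.fullmatch("[a-o]", "ö"))         # None
```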


* UTF-8 (Was: More charset things)
  1999-02-09  8:05               ` Steinar Bang
@ 1999-02-14 18:10                 ` Steinar Bang
  0 siblings, 0 replies; 43+ messages in thread
From: Steinar Bang @ 1999-02-14 18:10 UTC (permalink / raw)


>>>>> Steinar Bang <sb@metis.no>:

>>>>> François Pinard <pinard@iro.umontreal.ca>:
>> That is why Lars could well decide, one of these days, to support
>> UTF-8 as an encoding (which it really is) on the same level as
>> Base64, and moreover, rather fun to implement.  It might be
>> convenient that Gnus do so as a contribution to the Unicode effort,
>> without really waiting for Emacs to do it.

One reason to support UTF-8 decoding and encoding, is that
son-of-son-of-1036 (or whatchamacallit)
	http://www.ietf.org/internet-drafts/draft-ietf-usefor-article-01.txt
seems to recommend UTF-8 for both the headers and bodies of news
messages. 

Hm... the way this works would probably be to have a UTF-8 decoding
that would always attempt to decode a news message and then revert to
a locale or newsgroup specific setting if the UTF-8 decoding breaks
down (use of the iso-8859-1 charset in the case of the no.*
hierarchy).

UTF-8 encoding should probably not be made the default for a while yet.
At least it should be made newsgroup hierarchy dependent.
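A minimal sketch of the decoding strategy described above, in Python for illustration. The function names and the fallback table are hypothetical. Only the no.* to iso-8859-1 mapping comes from the message; the pl.* entry and the global default are assumptions. The approach relies on the fact that 8-bit Latin-1 text is rarely valid UTF-8:

```python
# Hypothetical per-hierarchy fallback table.
FALLBACK_CHARSETS = {
    "no.": "iso-8859-1",   # from the no.* example above
    "pl.": "iso-8859-2",   # assumed example entry
}
DEFAULT_FALLBACK = "iso-8859-1"   # assumed global default


def fallback_charset(group):
    """Return the fallback charset for GROUP, using the longest matching prefix."""
    matches = [p for p in FALLBACK_CHARSETS if group.startswith(p)]
    return FALLBACK_CHARSETS[max(matches, key=len)] if matches else DEFAULT_FALLBACK


def decode_article(raw, group):
    """Try UTF-8 first; if the bytes are not valid UTF-8, use the group's fallback.

    Returns (text, charset_used)."""
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        charset = fallback_charset(group)
        return raw.decode(charset, errors="replace"), charset


print(decode_article("blåbær".encode("utf-8"), "no.test"))       # ('blåbær', 'utf-8')
print(decode_article("blåbær".encode("iso-8859-1"), "no.test"))  # ('blåbær', 'iso-8859-1')
```

Pure-ASCII articles always decode as UTF-8, which is harmless because ASCII is a subset of both.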

> But isn't UTF-8 support something that really should be done at the C
> level (like base64 is done in newer emacsen)?  Or am I thinking of
> UTF-7 here...? (does anyone have some handy online references?)

UTF-8 is defined in RFC2279
	ftp://ftp.ntnu.no/pub/rfc/rfc2279.txt

UTF-7 is defined in RFC2152
	ftp://ftp.ntnu.no/pub/rfc/rfc2152.txt

Both would probably be best off with decoding done in C.
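For reference, the bit layout RFC 2279 describes can be sketched in a few lines. The following is an illustrative Python encoder (the function name is made up here), checked against Python's built-in codec. It follows RFC 2279's original 31-bit range with sequences of up to six bytes:

```python
def utf8_encode_codepoint(cp: int) -> bytes:
    """Encode one code point following the RFC 2279 bit layout (illustrative)."""
    if not 0 <= cp < 0x80000000:
        raise ValueError(f"code point outside RFC 2279's 31-bit range: {cp:#x}")
    if cp < 0x80:
        return bytes([cp])                  # 1 byte: 0xxxxxxx (plain ASCII)
    # (sequence length n, first code point that needs more than n bytes)
    for n, limit in ((2, 0x800), (3, 0x10000), (4, 0x200000),
                     (5, 0x4000000), (6, 0x80000000)):
        if cp < limit:
            break
    lead = (0xFF << (8 - n)) & 0xFF         # 110xxxxx, 1110xxxx, ..., 1111110x
    tail = []
    for _ in range(n - 1):
        tail.append(0x80 | (cp & 0x3F))     # continuation bytes: 10xxxxxx
        cp >>= 6
    # Whatever bits remain go into the lead byte.
    return bytes([lead | cp] + tail[::-1])


# Compare with Python's built-in codec for a few characters.
for ch in "Aé€\U0001F600":
    assert utf8_encode_codepoint(ord(ch)) == ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {utf8_encode_codepoint(ord(ch)).hex()}")
```

The later RFC 3629 restricted UTF-8 to U+10FFFF, so the 5- and 6-byte forms are no longer valid, and Python's own codec follows that newer rule.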



* Re: More charset things
  1999-02-08  2:09           ` Martin Buchholz
@ 1999-02-22 15:52             ` François Pinard
  0 siblings, 0 replies; 43+ messages in thread
From: François Pinard @ 1999-02-22 15:52 UTC (permalink / raw)
  Cc: ding, xemacs-mule

[-- Attachment #1: Type: text/plain; charset=us-ascii, Size: 1113 bytes --]

Martin Buchholz <martin@xemacs.org> writes:

> F> *The* reference, which I never seen (my librarian says the editor is out
> F> of stock), is supposed to be the Ken Lunde book, in the ORA series.

> Where've you been?  The second edition is finally out.  CJKV!

Yes, yeah!  I finally got a copy, after having waited for more than a year.
I surely have no time to read it right away, yet at first glance, it looks
like a wonderful book, and I'll surely find many, many answers in there. :-)

Oh, there is a mere mention about Mule, but no documentation.  Do not
buy this book if you are only looking for Mule specificities.  However,
Mule is an integrator for many charsets described in the book.  So, the
book might be useful for anybody interested in Asian charset details,
whether pro-Mule or con-Mule.

P.S. - Of course, if this message were going to any FSF list, which I think
it is not, I would have refrained from commenting on a non-free book. :-)

-- 
François Pinard                            mailto:pinard@iro.umontreal.ca
Join the free Translation Project!    http://www.iro.umontreal.ca/~pinard


end of thread

Thread overview: 43+ messages
1999-02-03 18:09 More charset things Lars Magne Ingebrigtsen
1999-02-04 14:56 ` Hrvoje Niksic
1999-02-04 17:08   ` Lars Magne Ingebrigtsen
1999-02-04 17:21     ` Hrvoje Niksic
1999-02-04 17:49       ` Lars Magne Ingebrigtsen
1999-02-05  0:47         ` Stephen J. Turnbull
1999-02-05  2:43           ` Hrvoje Niksic
     [not found]           ` <m3hft163aa.fsf@peorth.gweep.net>
1999-02-05 19:06             ` Vladimir Volovich
     [not found]               ` <m3sockqqjx.fsf@peorth.gweep.net>
1999-02-06 15:55                 ` Lars Magne Ingebrigtsen
     [not found]                   ` <m3lnia5922.fsf@peorth.gweep.net>
1999-02-07 21:02                     ` Hrvoje Niksic
1999-02-09 15:56                       ` Lars Magne Ingebrigtsen
1999-02-09 17:21                         ` Hrvoje Niksic
1999-02-09 17:31                           ` Alan Shutko
1999-02-09 17:37                           ` Lars Magne Ingebrigtsen
1999-02-09 18:06                             ` Hrvoje Niksic
1999-02-08 16:04                   ` Bill White
1999-02-09 16:04                     ` Lars Magne Ingebrigtsen
1999-02-06  8:17           ` Lars Magne Ingebrigtsen
1999-02-09 10:27           ` Displayed [ 0: Stephen J. Turnbull ] but it had lots of lines Alf-Ivar Holm
1999-02-09 16:14             ` Lars Magne Ingebrigtsen
1999-02-09 22:07           ` More charset things Jan Vroonhof
     [not found]           ` <m3hft163aa.fsf@p <byu2wv6xkb.fsf@bolzano.math.ethz.ch>
1999-02-09 22:13             ` Hrvoje Niksic
1999-02-07 20:43         ` François Pinard
1999-02-08  2:09           ` Martin Buchholz
1999-02-22 15:52             ` François Pinard
1999-02-08 14:49           ` Robert Bihlmeyer
     [not found]           ` <m37lttydo2.fsf@peorth.gweep.net>
1999-02-08  9:55             ` Kai.Grossjohann
1999-02-08 15:52             ` François Pinard
     [not found]               ` <m3n22ou09w.fsf@peorth.gweep.net>
1999-02-08 23:19                 ` François Pinard
1999-02-09  8:05               ` Steinar Bang
1999-02-14 18:10                 ` UTF-8 (Was: More charset things) Steinar Bang
1999-02-09 16:03               ` More charset things Lars Magne Ingebrigtsen
1999-02-08 17:29             ` Karl Eichwalder
1999-02-08 22:03             ` James H. Cloos Jr.
1999-02-09  5:29               ` Russ Allbery
1999-02-09  7:33                 ` James H. Cloos Jr.
1999-02-10  2:13                   ` Stephen Zander
1999-02-11 10:09           ` Jan Vroonhof
1999-02-07 19:37       ` François Pinard
1999-02-08  0:06         ` Kenichi Handa
1999-02-07 19:35     ` François Pinard
1999-02-08 13:37       ` Simon Josefsson
1999-02-08 23:43         ` Kenichi Handa
