Gnus development mailing list
* Where to complain?
@ 2006-01-10  7:31 David Kastrup
  2006-01-10  8:36 ` Katsumi Yamaoka
  0 siblings, 1 reply; 7+ messages in thread
From: David Kastrup @ 2006-01-10  7:31 UTC (permalink / raw)



Hi,

when Gnus is rendering HTML mail parts in a utf-8 language setting,
it replaces all (properly declared) latin-1 characters in the HTML
mail part with spaces.  If one uses K B to select the text
alternative, those characters render fine.

So what software subsystem is involved with the HTML rendering, and
where would one report it?

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum




* Re: Where to complain?
  2006-01-10  7:31 Where to complain? David Kastrup
@ 2006-01-10  8:36 ` Katsumi Yamaoka
  2006-01-10  9:05   ` David Kastrup
  0 siblings, 1 reply; 7+ messages in thread
From: Katsumi Yamaoka @ 2006-01-10  8:36 UTC (permalink / raw)
  Cc: ding

>>>>> In <85u0cc7d4r.fsf@lola.goethe.zz> David Kastrup wrote:

> when Gnus is rendering HTML mail parts in a utf-8 language setting,
> it replaces all (properly declared) latin-1 characters in the HTML
> mail part with spaces.  If one uses K B to select the text
> alternative, those characters render fine.

> So what software subsystem is involved with the HTML rendering, and
> where would one report it?

I don't know what does it, but such a behavior is performed by
the HTML renderer specified in the `mm-text-html-renderer'
variable; `w3m', `links' and `lynx' work fine, AFAIK.




* Re: Where to complain?
  2006-01-10  8:36 ` Katsumi Yamaoka
@ 2006-01-10  9:05   ` David Kastrup
  2006-01-10  9:36     ` Katsumi Yamaoka
  0 siblings, 1 reply; 7+ messages in thread
From: David Kastrup @ 2006-01-10  9:05 UTC (permalink / raw)
  Cc: ding

Katsumi Yamaoka <yamaoka@jpl.org> writes:

>>>>>> In <85u0cc7d4r.fsf@lola.goethe.zz> David Kastrup wrote:
>
>> when Gnus is rendering HTML mail parts in a utf-8 language setting,
>> it replaces all (properly declared) latin-1 characters in the HTML
>> mail part with spaces.  If one uses K B to select the text
>> alternative, those characters render fine.
>
>> So what software subsystem is involved with the HTML rendering, and
>> where would one report it?
>
> I don't know what does it, but such a behavior is performed by
> the HTML renderer specified in the `mm-text-html-renderer'
> variable; `w3m', `links' and `lynx' work fine, AFAIK.

w3m-standalone in my case, which would be the standard setting.  So it
would appear that the character encoding (which is part of the MIME
part declaration) is not passed to w3m or something?

w3m alone used on web pages outside of Emacs appears to work.  Maybe
something is wrong in mm-view.el?  It would appear that the following
is responsible for the stuff:

(defun mm-inline-render-with-stdin (handle post-func cmd &rest args)
  (let ((source (mm-get-part handle)))
    (mm-insert-inline
     handle
     (mm-with-unibyte-buffer
       (insert source)
       (apply 'mm-inline-wash-with-stdin post-func cmd args)
       (buffer-string)))))

Now what happens with the unibyte-buffer?  Perhaps the
mm-inline-wash-with-stdin process filter is not too happy about
inserting w3m output in utf-8 (the current locale) into a unibyte
buffer?  Or buffer-string, which likely happens to be unibyte decoded
into utf-8, then gets interpreted as latin-1, the encoding of the mime
part?  Or w3m tries interpreting the latin-1 encoded input as utf-8?

There seem to be many ways to get this wrong.  When the locale and language
environment and part encoding are all latin-1, stuff works.
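
One of the guesses above (latin-1 bytes being read as utf-8) can be
checked in isolation, without Gnus or w3m.  This is only a minimal
sketch of that single failure mode, not a diagnosis of what
mm-view.el actually does; the function name `my-decode-check' is made
up for the illustration.

```elisp
;; Illustration only: a Latin-1 encoded e-acute is the single byte #xE9,
;; which is not a valid UTF-8 sequence on its own.
(defun my-decode-check ()
  "Return (BYTE-COUNT LATIN1-OK UTF8-OK) for a Latin-1 encoded e-acute."
  (let* ((char-string "\u00e9")
         ;; Encode the character the way a latin-1 MIME part carries it.
         (bytes (encode-coding-string char-string 'iso-8859-1)))
    (list (length bytes)
          ;; Decoding with the declared charset restores the character.
          (string= (decode-coding-string bytes 'iso-8859-1) char-string)
          ;; Decoding the same byte as utf-8 does not.
          (string= (decode-coding-string bytes 'utf-8) char-string))))
```

If the second element is t and the third is nil, the byte only
survives when it is decoded with the charset it was encoded in.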

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum




* Re: Where to complain?
  2006-01-10  9:05   ` David Kastrup
@ 2006-01-10  9:36     ` Katsumi Yamaoka
  2006-01-10 10:10       ` David Kastrup
  0 siblings, 1 reply; 7+ messages in thread
From: Katsumi Yamaoka @ 2006-01-10  9:36 UTC (permalink / raw)
  Cc: ding

>>>>> In <85hd8c78rb.fsf@lola.goethe.zz> David Kastrup wrote:

> Katsumi Yamaoka <yamaoka@jpl.org> writes:

>> I don't know what does it, but such a behavior is performed by
>> the HTML renderer specified in the `mm-text-html-renderer'
>> variable; `w3m', `links' and `lynx' work fine, AFAIK.

> w3m-standalone in my case, which would be the standard setting.  So it
> would appear that the character encoding (which is part of the MIME
> part declaration) is not passed to w3m or something?

> w3m alone used on web pages outside of Emacs appears to work.  Maybe
> something is wrong in mm-view.el?  It would appear that the following
> is responsible for the stuff:

> (defun mm-inline-render-with-stdin (handle post-func cmd &rest args)
>   (let ((source (mm-get-part handle)))
>     (mm-insert-inline
>      handle
>      (mm-with-unibyte-buffer
>        (insert source)
>        (apply 'mm-inline-wash-with-stdin post-func cmd args)
>        (buffer-string)))))

> Now what happens with the unibyte-buffer?  Perhaps the
> mm-inline-wash-with-stdin process filter is not too happy about
> inserting w3m output in utf-8 (the current locale) into a unibyte
> buffer?  Or buffer-string, which likely happens to be unibyte decoded
> into utf-8, then gets interpreted as latin-1, the encoding of the mime
> part?  Or w3m tries interpreting the latin-1 encoded input as utf-8?

I see.  That is just the point.  When w3m is called by way of
emacs-w3m (i.e., mm-text-html-renderer is set to `w3m'),
w3m-input-coding-system and w3m-output-coding-system are used
for exchanging data.

> There seem to be many ways to get this wrong.  When the locale and language
> environment and part encoding are all latin-1, stuff works.

The locale and the language environment are irrelevant, I think.
We may need to make Gnus provide the following functions:

mm-inline-text-html-render-with-w3m-standalone
gnus-article-wash-html-with-w3m-standalone

However, they will be as complicated as emacs-w3m if one wants to
make w3m-standalone work as well as emacs-w3m.  Why don't you use
w3m? :)




* Re: Where to complain?
  2006-01-10  9:36     ` Katsumi Yamaoka
@ 2006-01-10 10:10       ` David Kastrup
  2006-01-10 11:38         ` Katsumi Yamaoka
  0 siblings, 1 reply; 7+ messages in thread
From: David Kastrup @ 2006-01-10 10:10 UTC (permalink / raw)
  Cc: ding

Katsumi Yamaoka <yamaoka@jpl.org> writes:

>>>>>> In <85hd8c78rb.fsf@lola.goethe.zz> David Kastrup wrote:
>
>> Katsumi Yamaoka <yamaoka@jpl.org> writes:
>
>>> I don't know what does it, but such a behavior is performed by
>>> the HTML renderer specified in the `mm-text-html-renderer'
>>> variable; `w3m', `links' and `lynx' work fine, AFAIK.
>
>> w3m-standalone in my case, which would be the standard setting.  So it
>> would appear that the character encoding (which is part of the MIME
>> part declaration) is not passed to w3m or something?
>
>> w3m alone used on web pages outside of Emacs appears to work.  Maybe
>> something is wrong in mm-view.el?  It would appear that the following
>> is responsible for the stuff:
>
>> (defun mm-inline-render-with-stdin (handle post-func cmd &rest args)
>>   (let ((source (mm-get-part handle)))
>>     (mm-insert-inline
>>      handle
>>      (mm-with-unibyte-buffer
>>        (insert source)
>>        (apply 'mm-inline-wash-with-stdin post-func cmd args)
>>        (buffer-string)))))
>
>> Now what happens with the unibyte-buffer?  Perhaps the
>> mm-inline-wash-with-stdin process filter is not too happy about
>> inserting w3m output in utf-8 (the current locale) into a unibyte
>> buffer?  Or buffer-string, which likely happens to be unibyte decoded
>> into utf-8, then gets interpreted as latin-1, the encoding of the mime
>> part?  Or w3m tries interpreting the latin-1 encoded input as utf-8?
>
>> There seem to be many ways to get this wrong.  When the locale and language
>> environment and part encoding are all latin-1, stuff works.
>
> The locale and the language environment are irrelevant, I think.

Well, in an all-latin1 locale this works without losing characters.
But that involved both the locale of Emacs (I have no hard-configured
language-environment) and w3m.

> We may need to make Gnus provide the following functions:
>
> mm-inline-text-html-render-with-w3m-standalone
> gnus-article-wash-html-with-w3m-standalone

I don't see how w3m would differ from any other HTML engine receiving
HTML and outputting suitable for the current locale.

> However, they will be as complicated as emacs-w3m if one wants to
> make w3m-standalone work as well as emacs-w3m.  Why don't you use
> w3m? :)

Because it is not part of Emacs proper.  The default setup of Emacs
should not be broken.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum




* Re: Where to complain?
  2006-01-10 10:10       ` David Kastrup
@ 2006-01-10 11:38         ` Katsumi Yamaoka
  2006-01-11  8:34           ` w3m-standalone (was Re: Where to complain?) Katsumi Yamaoka
  0 siblings, 1 reply; 7+ messages in thread
From: Katsumi Yamaoka @ 2006-01-10 11:38 UTC (permalink / raw)


>>>>> In <858xto75qn.fsf@lola.goethe.zz> David Kastrup wrote:

> Katsumi Yamaoka <yamaoka@jpl.org> writes:

>> The locale and the language environment are irrelevant, I think.

> Well, in an all-latin1 locale this works without losing characters.
> But that involved both the locale of Emacs (I have no hard-configured
> language-environment) and w3m.

[...]

> I don't see how w3m would differ from any other HTML engine receiving
> HTML and outputting suitable for the current locale.

Do those engines handle non-Latin text, e.g., CJK?  Such text
contains not only wide characters but also bytes that might be
mistaken for HTML markup, so an HTML engine cannot render it
properly unless it is told which charset is used.  The w3m
developers decided that charsets should be specified for input and
output as arguments, and that the data fed to w3m should be encoded
in that charset (I'm not a w3m developer, though).  Therefore, the
present w3m-standalone code in Gnus should never work with
non-ASCII text.
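
As a rough sketch of what "charsets specified as arguments" could
look like: w3m builds with multilingual support accept -I (input
charset) and -O (output charset) next to -dump and -T.  The helper
name `my-w3m-standalone-args' is hypothetical and is not the code
currently in Gnus.

```elisp
;; Hypothetical helper, for illustration only: build the argument list
;; for a standalone w3m call that names both charsets explicitly.
(defun my-w3m-standalone-args (input-charset output-charset)
  "Return a w3m argument list that names both charsets explicitly.
INPUT-CHARSET is the charset declared for the HTML part;
OUTPUT-CHARSET is the one Emacs will decode w3m's output with."
  (list "-dump" "-T" "text/html"
        "-I" input-charset
        "-O" output-charset))
```

For a latin-1 part viewed in a utf-8 session one would call
(my-w3m-standalone-args "iso-8859-1" "utf-8") and decode the process
output as utf-8.  Whether -I and -O are available depends on how w3m
was built.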

>> However, they will be as complicated as emacs-w3m if one wants to
>> make w3m-standalone work as well as emacs-w3m.  Why don't you use
>> w3m? :)

> Because it is not part of Emacs proper.  The default setup of Emacs
> should not be broken.

You can instead set mm-text-html-renderer to `links', `lynx',
`html2text' or a Lisp function of your own.  I don't know
whether all the candidates other than w3m for
mm-text-html-renderer are part of Emacs proper, though.
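
For example, switching the renderer is a one-line setting.  This
sketch assumes the `links' program is installed; any of the other
symbols named above can be substituted.

```elisp
;; In ~/.gnus.el or the init file: pick an external renderer by symbol.
(setq mm-text-html-renderer 'links)
```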




* w3m-standalone (was Re: Where to complain?)
  2006-01-10 11:38         ` Katsumi Yamaoka
@ 2006-01-11  8:34           ` Katsumi Yamaoka
  0 siblings, 0 replies; 7+ messages in thread
From: Katsumi Yamaoka @ 2006-01-11  8:34 UTC (permalink / raw)


[-- Attachment #1: Type: text/plain, Size: 387 bytes --]

>>>>> In <b4md5j0gvob.fsf@jpl.org> Katsumi Yamaoka wrote:

> Therefore, the present w3m-standalone code in Gnus should never work
> with non-ASCII text.

The attached code will allow you to read HTML mails containing
non-ASCII text.  I tested it only with Latin-1 and Japanese.
Note that HTML mails must declare the charset in a MIME
header, and it won't work as well as emacs-w3m.
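
To make the charset requirement concrete, here is a small sketch of
reading the declared charset from a Content-Type header value with
the mail-parse library that ships with Gnus.  It is not the attached
code, and `my-declared-charset' is a made-up name.

```elisp
(require 'mail-parse)

;; Hypothetical helper, for illustration only.
(defun my-declared-charset (content-type-header)
  "Return the charset parameter of CONTENT-TYPE-HEADER, or nil if absent."
  (mail-content-type-get
   ;; Parse "type/subtype; attr=value" into (TYPE (ATTR . VALUE) ...).
   (mail-header-parse-content-type content-type-header)
   'charset))
```

Called as (my-declared-charset "text/html; charset=iso-8859-1"), it
should give back the charset name that a renderer would need to be
told about; a part without the parameter leaves nothing to pass on.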


[-- Attachment #2: Type: application/emacs-lisp, Size: 957 bytes --]

[-- Attachment #3: Type: text/plain, Size: 67 bytes --]


I have no idea how to do the same with mm-text-html-washer-alist.

