Gnus development mailing list
* documentation of gnus-article-treat-dumbquotes outdated?
       [not found] ` <v98yq07afl.fsf@marauder.physik.uni-ulm.de>
@ 2003-08-12 20:09   ` Reiner Steib
  2003-08-12 21:12     ` Matthias Andree
                       ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Reiner Steib @ 2003-08-12 20:09 UTC (permalink / raw)
  Cc: Dan Jacobson

[ taking this issue from bugs@gnus.org to ding ]

On Tue, Aug 12 2003, Dan Jacobson wrote:
| Newsgroups: gnus.gnus-bug
| Date: Tue Aug 12 06:09:50 2003 +0200
| Message-ID: <87ptjb7bzl.fsf@jidanni.org>

> gnus-article-treat-dumbquotes docstring doesn't mention it can fix the
> Euro symbol too...

I agree with Dan that the doc-string should mention that it covers
not only quotation marks, but also other characters (EUR, ...).
I would like to fix the doc-string and the manual entry.  It seems to
me that nowadays, the manual entry is not accurate anymore:

,----[ (info "(gnus)Article Washing") ]
| `W d'
|      Treat M****s*** sm*rtq**t*s according to
|      `gnus-article-dumbquotes-map' (`gnus-article-treat-dumbquotes').
|      Note that this function guesses whether a character is a
|      sm*rtq**t* or not, so it should only be used interactively.
| 
|      Sm*rtq**t*s are M****s***'s unilateral extension to the character
|      map in an attempt to provide more quoting characters.  If you see
|      something like `\222' or `\264' where you're expecting some kind of
|      apostrophe or quotation mark, then try this wash.
`----

- What does "unilateral extension to the character map" mean?  Which
  character map?  Latin1 (= iso-8859-1)?

- We should mention \200 (-> EUR), as it's probably the most frequent
  `dumb-quote' now (at least in Europe).

- I guess most (or all?) chars from `gnus-article-dumbquotes-map' are
  `windows-1252' characters, not present in Latin-1.  `windows-1252'
  is a registered[1] charset.  So the real problem is that some
  Windows clients send `windows-1252' labeled as `iso-8859-1'.

- A better alternative to `W d' in the development version of Emacs
  (CVS HEAD) is `1 g windows-1252 RET' after `(require 'code-pages)'.
  Should we mention this?
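
To make the mislabeling concrete, here is a minimal sketch.  It assumes
an Emacs recent enough that the `windows-1252' coding system is
available without loading anything extra; the function name
`my-decode-both-ways' is made up for illustration and is not part of
Gnus.

```elisp
(defun my-decode-both-ways (bytes)
  "Decode the unibyte string BYTES as iso-8859-1 and as windows-1252.
Return the two decoded strings as a list (LATIN-1 CP1252)."
  (list (decode-coding-string bytes 'iso-8859-1)
        (decode-coding-string bytes 'windows-1252)))

;; Bytes \223 \200 \224, as a Windows client might send them.
;; Read as iso-8859-1 they become C1 control characters; read as
;; windows-1252 they become curly quotes around a Euro sign.
(my-decode-both-ways (unibyte-string #x93 #x80 #x94))
```

The bytes are the same in both cases; only the charset label decides
what the reader sees.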

Any suggestions, opinions?

Bye, Reiner.

[1] http://www.iana.org/assignments/character-sets
    See also:
    http://www.microsoft.com/globaldev/reference/sbcs/1252.htm
    http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo--- PGP key available via WWW   http://rsteib.home.pages.de/





* Re: documentation of gnus-article-treat-dumbquotes outdated?
  2003-08-12 20:09   ` documentation of gnus-article-treat-dumbquotes outdated? Reiner Steib
@ 2003-08-12 21:12     ` Matthias Andree
  2003-08-13 15:27     ` Benjamin Riefenstahl
  2003-08-18  2:00     ` James H. Cloos Jr.
  2 siblings, 0 replies; 6+ messages in thread
From: Matthias Andree @ 2003-08-12 21:12 UTC (permalink / raw)


Reiner Steib <4.uce.03.r.s@nurfuerspam.de> writes:

> [ taking this issue from bugs@gnus.org to ding ]
>
> On Tue, Aug 12 2003, Dan Jacobson wrote:
> | Newsgroups: gnus.gnus-bug
> | Date: Tue Aug 12 06:09:50 2003 +0200
> | Message-ID: <87ptjb7bzl.fsf@jidanni.org>
>
>> gnus-article-treat-dumbquotes docstring doesn't mention it can fix the
>> Euro symbol too...
>
> I agree with Dan that the doc-string should mention that it covers
> not only quotation marks, but also other characters (EUR, ...).
> I would like to fix the doc-string and the manual entry.  It seems to
> me that nowadays, the manual entry is not accurate anymore:

Well, this function should also get a name that's not so easy to get
wrong. The dumbquotes stuff aims to fix misdeclared characters (outside
the declared character set), so it deals with quote MARKS -- but
Microsoft software also gets indenting quotation wrong.

So how about renaming that to gnus-article-treat-quotemarks and 

> - What does "unilateral extension to the character map" mean?  Which
>   character map?  Latin1 (= iso-8859-1)?

This is irrelevant. Elide it.

> - We should mention \200 (-> EUR), as it's probably the most frequent
>   `dumb-quote' now (at least in Europe).

Speaking of Europe and the Euro zone: Does Greek suffer from the same
problem in real life, i. e. ISO-8859-7 misdeclared for Windows-1253
content? 1253 also has the Euro sign (€) at \200 = \x80. Microsoft on
this, in their Euro FAQ:

  "Q What is the symbol's Windows codepage location?

   A The symbol has been added to the following codepages at position
     '0x80'; 1250 Eastern European, 1252 Western, 1253 Greek, 1254
     Turkish, 1257 Baltic, 1255 Hebrew, 1256 Arabic, 1258 Vietnamese,
     874 Thai. In 1251 Cyrillic the symbol will be added at position
     '0x88'. Other codepages are controlled by governments or standards
     bodies. Microsoft is working with these organizations on the
     placement of the euro."
  (http://www.microsoft.com/typography/faq/faq12.htm)
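
The positions listed in that FAQ can be cross-checked from Emacs.  This
is only a sketch: it assumes the `windows-125x' coding systems are
defined in the running Emacs (unknown ones are skipped), it leaves out
code page 874, and the names `my-euro-positions' and
`my-check-euro-positions' are hypothetical.

```elisp
(defvar my-euro-positions
  '((windows-1250 . #x80) (windows-1251 . #x88) (windows-1252 . #x80)
    (windows-1253 . #x80) (windows-1254 . #x80) (windows-1255 . #x80)
    (windows-1256 . #x80) (windows-1257 . #x80) (windows-1258 . #x80))
  "Code page and byte position of the Euro sign, per the quoted FAQ.")

(defun my-check-euro-positions ()
  "Return the code pages whose listed byte does NOT decode to a Euro sign.
Code pages that this Emacs does not know are skipped."
  (let (mismatches)
    (dolist (entry my-euro-positions (nreverse mismatches))
      (when (coding-system-p (car entry))
        (unless (string= (decode-coding-string
                          (unibyte-string (cdr entry)) (car entry))
                         "\u20ac")
          (push (car entry) mismatches))))))

;; An empty list means every known code page agrees with the FAQ.
(my-check-euro-positions)
```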

> - I guess most (or all?) chars from `gnus-article-dumbquotes-map' are
>   `windows-1252' characters, not present in Latin-1.  `windows-1252'
>   is a registered[1] charset.  So the real problem is that some
>   Windows clients send `windows-1252' labeled as `iso-8859-1'.

True for anything misdeclared I've found so far.

-- 
Matthias Andree




* Re: documentation of gnus-article-treat-dumbquotes outdated?
  2003-08-12 20:09   ` documentation of gnus-article-treat-dumbquotes outdated? Reiner Steib
  2003-08-12 21:12     ` Matthias Andree
@ 2003-08-13 15:27     ` Benjamin Riefenstahl
  2003-08-13 16:46       ` Reiner Steib
  2003-08-18  2:00     ` James H. Cloos Jr.
  2 siblings, 1 reply; 6+ messages in thread
From: Benjamin Riefenstahl @ 2003-08-13 15:27 UTC (permalink / raw)
  Cc: ding, Dan Jacobson

Hi Reiner, Dan,


Reiner Steib <4.uce.03.r.s@nurfuerspam.de> writes:
> - I guess most (or all?) chars from `gnus-article-dumbquotes-map' are
>   `windows-1252' characters, not present in Latin-1.  `windows-1252'
>   is a registered[1] charset.  So the real problem is that some
>   Windows clients send `windows-1252' labeled as `iso-8859-1'.

See http://mail.gnu.org/archive/html/emacs-devel/2003-04/msg00177.html
and the discussion around that.

I then wrote:
> That's almost always cp1252, often mislabeled as iso-8859-1 or
> us-ascii, or even unlabeled.  I use these settings in GNUS to cope:
> 
>  (setq gnus-newsgroup-ignored-charsets
>        '(unknown-8bit x-unknown us-ascii iso-8859-1))
>  (setq gnus-default-charset 'cp1252)
> 
> I.e. I disable using the MIME parameters for us-ascii and iso-8859-1
> and set the default to cp1252 instead.  This has worked fine so far
> for me.  Real us-ascii or iso-8859-1 messages don't have a problem
> with this, as cp1252 is a proper superset of iso-8859-1.
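
The superset claim can be checked byte by byte.  A minimal sketch,
assuming the `windows-1252' coding system is available; the function
name `my-charsets-agree-p' is hypothetical.  Note that iso-8859-1
assigns the range \200-\237 to C1 control codes, so the two charsets
can only agree outside that range.

```elisp
(defun my-charsets-agree-p (from to)
  "Return non-nil if bytes FROM..TO decode identically in both charsets.
The charsets compared are iso-8859-1 and windows-1252."
  (let ((agree t)
        (byte from))
    (while (<= byte to)
      (let ((raw (unibyte-string byte)))
        (unless (string= (decode-coding-string raw 'iso-8859-1)
                         (decode-coding-string raw 'windows-1252))
          (setq agree nil)))
      (setq byte (1+ byte)))
    agree))

(my-charsets-agree-p #xa0 #xff)  ; printable Latin-1 range
(my-charsets-agree-p #x80 #x9f)  ; C1 range, reused by cp1252
```

The first call should report agreement and the second should not, which
is why the settings above are harmless only for messages that contain
no genuine C1 control characters.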


benny





* Re: documentation of gnus-article-treat-dumbquotes outdated?
  2003-08-13 15:27     ` Benjamin Riefenstahl
@ 2003-08-13 16:46       ` Reiner Steib
  2003-08-13 18:12         ` Benjamin Riefenstahl
  0 siblings, 1 reply; 6+ messages in thread
From: Reiner Steib @ 2003-08-13 16:46 UTC (permalink / raw)


On Wed, Aug 13 2003, Benjamin Riefenstahl wrote:

> Reiner Steib <4.uce.03.r.s@nurfuerspam.de> writes:
>> - I guess most (or all?) chars from `gnus-article-dumbquotes-map' are
>>   `windows-1252' characters, not present in Latin-1.  `windows-1252'
>>   is a registered[1] charset.  So the real problem is that some
>>   Windows clients send `windows-1252' labeled as `iso-8859-1'.

[ http://mail.gnu.org/archive/html/emacs-devel/2003-04/msg00177.html ]
> I then wrote:
[...]
>>  (setq gnus-newsgroup-ignored-charsets
>>        '(unknown-8bit x-unknown us-ascii iso-8859-1))
>>  (setq gnus-default-charset 'cp1252)
>> 
>> I.e. I disable using the MIME parameters for us-ascii and iso-8859-1
>> and set the default to cp1252 instead.  This has worked fine so far
>> for me.  Real us-ascii or iso-8859-1 messages don't have a problem
>> with this, as cp1252 is a proper superset of iso-8859-1.

I understand that this works (as long as "un-/mis-labeled" doesn't
include other charsets).  Could you elaborate what you suggest WRT the
documentation of gnus-article-treat-dumbquotes?

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo--- PGP key available via WWW   http://rsteib.home.pages.de/





* Re: documentation of gnus-article-treat-dumbquotes outdated?
  2003-08-13 16:46       ` Reiner Steib
@ 2003-08-13 18:12         ` Benjamin Riefenstahl
  0 siblings, 0 replies; 6+ messages in thread
From: Benjamin Riefenstahl @ 2003-08-13 18:12 UTC (permalink / raw)
  Cc: Reiner Steib

Hi Reiner,


Reiner Steib <4.uce.03.r.s@nurfuerspam.de> writes:
> I understand that this works (as long as "un-/mis-labeled" doesn't
> include other charsets).  Could you elaborate what you suggest WRT
> the documentation of gnus-article-treat-dumbquotes?

Sorry, this was more of a drive-by response.  I had solved the
underlying issue differently: neither with
gnus-article-treat-dumbquotes nor with the solution from the
discussion I mentioned, which was actually about rmail.

I think that somehow patching/washing Latin-1 text is not an adequate
solution, when what we have is actually a problem of mis-labeling.
Correcting the label seems the right solution to me.

After that of course comes the separate question of how to represent
the result.  If you are on a latin-1-only text terminal or you don't
have the fonts, you would need to represent the characters involved
by approximation, e.g. by using the display table. 

In effect, if it was just me, I'd devise a solution along those lines
and deprecate things like gnus-article-treat-dumbquotes altogether.
Your mention of `1 g windows-1252 RET' points in the same
direction.

This could be simplified if we had a customizable variable that has a
selection of language environments and that sets the variables
gnus-newsgroup-ignored-charsets and gnus-default-charset accordingly.
Or it could even be automatic with setting gnus-default-charset.
I.e. when I do (setq gnus-default-charset 'windows-1252), us-ascii and
iso-8859-1 could be added to gnus-newsgroup-ignored-charsets
automatically.
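
A rough sketch of what such a coupling could look like from the user's
side.  The helper `my-gnus-assume-charset' is hypothetical and does not
exist in Gnus; the sketch also assumes that `gnus-sum' is the library
that defines both variables.

```elisp
;; Assumption: both variables used below are defined in gnus-sum.el.
(require 'gnus-sum)

(defun my-gnus-assume-charset (charset &rest mislabels)
  "Make CHARSET the Gnus default and distrust each label in MISLABELS.
Hypothetical helper; Gnus does not ship this function.
Return the resulting value of `gnus-newsgroup-ignored-charsets'."
  (setq gnus-default-charset charset)
  (dolist (label mislabels)
    ;; `add-to-list' avoids duplicates if the label is already present.
    (add-to-list 'gnus-newsgroup-ignored-charsets label))
  gnus-newsgroup-ignored-charsets)

;; Treat articles labeled us-ascii or iso-8859-1 as windows-1252.
(my-gnus-assume-charset 'windows-1252 'us-ascii 'iso-8859-1)
```

A real implementation would more likely hook into the customization of
`gnus-default-charset' itself rather than rely on an extra function
call.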


> Could you elaborate what you suggest WRT the documentation of
> gnus-article-treat-dumbquotes?

On a more "constructive" note:

> - What does "unilateral extension to the character map" mean?  Which
> character map?  Latin1 (= iso-8859-1)?

That's what it does, see gnus-article-dumbquotes-map. 

> - I guess most (or all?) chars from `gnus-article-dumbquotes-map'
> are `windows-1252' characters, not present in Latin-1.

Actually my version of gnus-article-dumbquotes-map (Gnus 5.10.1) also
includes "\264" (replacing it with "'") which is a regular Latin-1
character, I believe.  Not sure why, possibly for symmetry.

> - A better alternative to `W d' in the development version of Emacs
> (CVS HEAD) is `1 g windows-1252 RET' after `(require 'code-pages)'.
> Should we mention this?

> ,----[ (info "(gnus)Article Washing") ]
> | `W d'
> |      Treat M****s*** sm*rtq**t*s according to
> |      `gnus-article-dumbquotes-map' (`gnus-article-treat-dumbquotes').
> |      Note that this function guesses whether a character is a
> |      sm*rtq**t* or not, so it should only be used interactively.
> | 
> |      Sm*rtq**t*s are M****s***'s unilateral extension to the character
> |      map in an attempt to provide more quoting characters.  If you see
> |      something like `\222' or `\264' where you're expecting some kind of
> |      apostrophe or quotation mark, then try this wash.
> `----

How about:

  Replace characters in the article according to
  `gnus-article-dumbquotes-map' (`gnus-article-treat-dumbquotes').

  This function is intended for articles that show `\222' or `\264'
  where you're expecting some kind of apostrophe or quotation mark, or
  `\200' where you expect a Euro sign or similar breakage.

  Using the default settings for `gnus-article-dumbquotes-map' the
  function will replace characters from Microsoft's extension to the
  ISO-8859-1 charset with ASCII characters.  This means e.g. that
  curly-quote characters will be replaced with ASCII quotes and the
  Euro sign will be replaced with the string "EUR".

  Note that on displays that support it, `1 g windows-1252 RET' (as a
  one-shot) or customizing `gnus-default-charset' to `windows-1252'
  (as a permanent solution) is a better way of handling the problem.

I'm probably not fully adhering to the rules for Emacs documentation
strings here ;-).  The last sentence assumes the implementation I
mention above.  I'm not sure how and when to mention (require
'code-pages).  Ideally code-pages.el should be autoloaded or preloaded
IMO.  I hope that the extension for other language environments is
obvious to a user, otherwise that topic should probably be expanded in
the info docs.
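
For readers unfamiliar with the wash, the replacement step can be
sketched on a plain string.  Everything here is illustrative: the map
`my-sample-quote-map' is a small made-up subset, not the actual value
of `gnus-article-dumbquotes-map', and `my-wash-string' is a
hypothetical stand-in for what the wash does in the article buffer.

```elisp
(defvar my-sample-quote-map
  '((#x80 . "EUR") (#x91 . "`") (#x92 . "'") (#x93 . "\"") (#x94 . "\""))
  "Illustrative subset only; not the real `gnus-article-dumbquotes-map'.")

(defun my-wash-string (text)
  "Return TEXT with each character in `my-sample-quote-map' replaced.
Characters without an entry are copied unchanged."
  (mapconcat (lambda (ch)
               (or (cdr (assq ch my-sample-quote-map))
                   (string ch)))
             text ""))

;; A mislabeled article body: cp1252 bytes decoded as iso-8859-1.
(my-wash-string
 (decode-coding-string (unibyte-string #x93 ?O ?K #x94 ?\s ?5 #x80)
                       'iso-8859-1))
```

The result contains only ASCII, which is the point of the wash and also
its limitation: the original curly quotes and Euro sign are lost rather
than displayed.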


benny





* Re: documentation of gnus-article-treat-dumbquotes outdated?
  2003-08-12 20:09   ` documentation of gnus-article-treat-dumbquotes outdated? Reiner Steib
  2003-08-12 21:12     ` Matthias Andree
  2003-08-13 15:27     ` Benjamin Riefenstahl
@ 2003-08-18  2:00     ` James H. Cloos Jr.
  2 siblings, 0 replies; 6+ messages in thread
From: James H. Cloos Jr. @ 2003-08-18  2:00 UTC (permalink / raw)


Yes, gnus should start treating the doze charsets as actual charsets,
and provide a function that redisplays messages that were tagged as
latin1 et al under the corresponding doze charset.  Then that function
could replace gnus-article-treat-dumbquotes in the washing map.

Incidentally, as I primarily use utf8, I setq gnus-article-dumbquotes-map
to this in my ~/.gnus: (probably won’t look right if you cannot read
utf8....)

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(setq gnus-article-dumbquotes-map
  '(
    ("\200" "€")

    ("\202" "‚")
    ("\203" "ƒ")
    ("\204" "„")
    ("\205" "…")
    ("\206" "†")
    ("\207" "‡")
    ("\210" "ˆ")
    ("\211" "‰")
    ("\212" "Š")
    ("\213" "‹")
    ("\214" "Œ")

    ("\216" "Ž")


    ("\221" "‘")
    ("\222" "’")
    ("\223" "“")
    ("\224" "”")
    ("\225" "•")
    ("\226" "–")
    ("\227" "—")
    ("\230" "˜")
    ("\231" "™")
    ("\232" "š")
    ("\233" "›")
    ("\234" "œ")

    ("\236" "ž")
    ("\237" "Ÿ")
    ("\264" "´")
    ))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

That ensures that I see what the author intended.

-JimC




