* documentation of gnus-article-treat-dumbquotes outdated?
From: Reiner Steib @ 2003-08-12 20:09 UTC
Cc: Dan Jacobson

[ taking this issue from bugs@gnus.org to ding ]

On Tue, Aug 12 2003, Dan Jacobson wrote:
| Newsgroups: gnus.gnus-bug
| Date: Tue Aug 12 06:09:50 2003 +0200
| Message-ID: <87ptjb7bzl.fsf@jidanni.org>

> gnus-article-treat-dumbquotes docstring doesn't mention it can fix the
> Euro symbol too...

I agree with Dan that the doc-string should mention that it covers
not only quotations, but also other characters (EUR, ...).

I would like to fix the doc-string and the manual entry.  It seems to
me that nowadays, the manual entry is not accurate anymore:

,----[ (info "(gnus)Article Washing") ]
| `W d'
|      Treat M****s*** sm*rtq**t*s according to
|      `gnus-article-dumbquotes-map' (`gnus-article-treat-dumbquotes').
|      Note that this function guesses whether a character is a
|      sm*rtq**t* or not, so it should only be used interactively.
|
|      Sm*rtq**t*s are M****s***'s unilateral extension to the character
|      map in an attempt to provide more quoting characters.  If you see
|      something like `\222' or `\264' where you're expecting some kind of
|      apostrophe or quotation mark, then try this wash.
`----

- What does "unilateral extension to the character map" mean?  Which
  character map?  Latin1 (= iso-8859-1)?

- We should mention \200 (-> EUR), as it's probably the most frequent
  `dumb-quote' now (at least in Europe).

- I guess most (or all?) chars from `gnus-article-dumbquotes-map' are
  `windows-1252' characters, not present in Latin-1.  `windows-1252'
  is a registered[1] charset.  So the real problem is that some
  Windows clients send `windows-1252' labeled as `iso-8859-1'.
- A better alternative to `W d' in the development version of Emacs
  (CVS HEAD) is `1 g windows-1252 RET' after `(require 'code-pages)'.
  Should we mention this?

Any suggestions, opinions?

Bye, Reiner.

[1] http://www.iana.org/assignments/character-sets

See also:
http://www.microsoft.com/globaldev/reference/sbcs/1252.htm
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo--- PGP key available via WWW   http://rsteib.home.pages.de/
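To make the mislabeling concrete, here is a minimal sketch (not from
the thread) that decodes the same octets once under the declared label
and once as windows-1252.  It assumes an Emacs in which the
`windows-1252' coding system is defined; in the CVS HEAD discussed
here that would mean after `(require 'code-pages)'.  The sample string
is made up for illustration.

```elisp
;; Sketch: the same raw octets decoded under the declared label
;; (iso-8859-1) and under the charset the sender probably used
;; (windows-1252).  Assumes the `windows-1252' coding system exists.
(let ((octets "It\222s 5 \200"))   ; unibyte string; \222 and \200 are raw bytes
  (list (decode-coding-string octets 'iso-8859-1)
        (decode-coding-string octets 'windows-1252)))
```

Under iso-8859-1 the octets \222 and \200 come out as C1 control
characters, which is what shows up as `\222' and `\200' in the article
buffer; under windows-1252 they should decode to a right single
quotation mark and the Euro sign.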
* Re: documentation of gnus-article-treat-dumbquotes outdated?
From: Matthias Andree @ 2003-08-12 21:12 UTC

Reiner Steib <4.uce.03.r.s@nurfuerspam.de> writes:

> [ taking this issue from bugs@gnus.org to ding ]
>
> On Tue, Aug 12 2003, Dan Jacobson wrote:
> | Newsgroups: gnus.gnus-bug
> | Date: Tue Aug 12 06:09:50 2003 +0200
> | Message-ID: <87ptjb7bzl.fsf@jidanni.org>
>
>> gnus-article-treat-dumbquotes docstring doesn't mention it can fix the
>> Euro symbol too...
>
> I agree with Dan that the doc-string should mention that it covers
> not only quotations, but also other characters (EUR, ...).
> I would like to fix the doc-string and the manual entry.  It seems to
> me that nowadays, the manual entry is not accurate anymore:

Well, this function should also get a name that's not so easy to get
wrong.  The dumbquotes stuff aims to fix misdeclared characters
(outside the declared character set), so it deals with quote MARKS --
but Microsoft software also gets indenting quotation wrong.

So how about renaming that to gnus-article-treat-quotemarks and

> - What does "unilateral extension to the character map" mean?  Which
>   character map?  Latin1 (= iso-8859-1)?

This is irrelevant.  Elide it.

> - We should mention \200 (-> EUR), as it's probably the most frequent
>   `dumb-quote' now (at least in Europe).

Speaking of Europe and the Euro zone: Does Greek suffer from the same
problem in real life, i.e. Windows-1253 content misdeclared as
ISO-8859-7?  1253 also has the Euro sign (€) at \200 = \x80.

Microsoft on this, in their Euro FAQ:

"Q What is the symbol's Windows codepage location?
A The symbol has been added to the following codepages at position
'0x80'; 1250 Eastern European, 1252 Western, 1253 Greek, 1254 Turkish,
1257 Baltic, 1255 Hebrew, 1256 Arabic, 1258 Vietnamese, 874 Thai. In
1251 Cyrillic the symbol will be added at position '0x88'. Other
codepages are controlled by governments or standards bodies. Microsoft
is working with these organizations on the placement of the euro."
(http://www.microsoft.com/typography/faq/faq12.htm)

> - I guess most (or all?) chars from `gnus-article-dumbquotes-map' are
>   `windows-1252' characters, not present in Latin-1.  `windows-1252'
>   is a registered[1] charset.  So the real problem is that some
>   Windows clients send `windows-1252' labeled as `iso-8859-1'.

True for anything misdeclared I've found so far.

-- 
Matthias Andree
* Re: documentation of gnus-article-treat-dumbquotes outdated?
From: Benjamin Riefenstahl @ 2003-08-13 15:27 UTC
Cc: ding, Dan Jacobson

Hi Reiner, Dan,

Reiner Steib <4.uce.03.r.s@nurfuerspam.de> writes:
> - I guess most (or all?) chars from `gnus-article-dumbquotes-map' are
>   `windows-1252' characters, not present in Latin-1.  `windows-1252'
>   is a registered[1] charset.  So the real problem is that some
>   Windows clients send `windows-1252' labeled as `iso-8859-1'.

See
http://mail.gnu.org/archive/html/emacs-devel/2003-04/msg00177.html and
the discussion around that.  I then wrote:

> That's almost always cp1252, often mislabeled as iso-8859-1 or
> us-ascii, or even unlabeled.  I use these settings in Gnus to cope:
>
>   (setq gnus-newsgroup-ignored-charsets
>         '(unknown-8bit x-unknown us-ascii iso-8859-1))
>   (setq gnus-default-charset 'cp1252)
>
> I.e. I disable using the MIME parameters for us-ascii and iso-8859-1
> and set the default to cp1252 instead.  This has worked fine so far
> for me.  Real us-ascii or iso-8859-1 messages don't have a problem
> with this, as cp1252 is a proper superset of iso-8859-1.

benny
* Re: documentation of gnus-article-treat-dumbquotes outdated?
From: Reiner Steib @ 2003-08-13 16:46 UTC

On Wed, Aug 13 2003, Benjamin Riefenstahl wrote:

> Reiner Steib <4.uce.03.r.s@nurfuerspam.de> writes:
>> - I guess most (or all?) chars from `gnus-article-dumbquotes-map' are
>>   `windows-1252' characters, not present in Latin-1.  `windows-1252'
>>   is a registered[1] charset.  So the real problem is that some
>>   Windows clients send `windows-1252' labeled as `iso-8859-1'.

[ http://mail.gnu.org/archive/html/emacs-devel/2003-04/msg00177.html ]

> I then wrote:
[...]
>>   (setq gnus-newsgroup-ignored-charsets
>>         '(unknown-8bit x-unknown us-ascii iso-8859-1))
>>   (setq gnus-default-charset 'cp1252)
>>
>> I.e. I disable using the MIME parameters for us-ascii and iso-8859-1
>> and set the default to cp1252 instead.  This has worked fine so far
>> for me.  Real us-ascii or iso-8859-1 messages don't have a problem
>> with this, as cp1252 is a proper superset of iso-8859-1.

I understand that this works (as long as "un-/mis-labeled" doesn't
include other charsets).  Could you elaborate what you suggest WRT
the documentation of gnus-article-treat-dumbquotes?

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo--- PGP key available via WWW   http://rsteib.home.pages.de/
* Re: documentation of gnus-article-treat-dumbquotes outdated?
From: Benjamin Riefenstahl @ 2003-08-13 18:12 UTC
Cc: Reiner Steib

Hi Reiner,

Reiner Steib <4.uce.03.r.s@nurfuerspam.de> writes:
> I understand that this works (as long as "un-/mis-labeled" doesn't
> include other charsets).  Could you elaborate what you suggest WRT
> the documentation of gnus-article-treat-dumbquotes?

Sorry, this was more of a drive-by response.  I had solved the
underlying issue in a different way than
gnus-article-treat-dumbquotes or the solution in the discussion that
I mentioned, which was actually about rmail.

I think that somehow patching/washing Latin-1 text is not an adequate
solution, when what we have is actually a problem of mis-labeling.
Correcting the label seems the right solution to me.

After that of course comes the separate question of how to represent
the result.  If you are on a latin-1-only text terminal or you don't
have the fonts, you would need to represent the characters involved
by approximation, e.g. by using the display table.

In effect, if it was just me, I'd devise a solution along those lines
and deprecate things like gnus-article-treat-dumbquotes altogether.
Your mentioning of `1 g windows-1252 RET' goes in the same direction.

This could be simplified if we had a customizable variable that has a
selection of language environments and that sets the variables
gnus-newsgroup-ignored-charsets and gnus-default-charset accordingly.

Or it could even be automatic with setting gnus-default-charset.
I.e. when I do (setq gnus-default-charset 'windows-1252), us-ascii
and iso-8859-1 could be added to gnus-newsgroup-ignored-charsets
automatically.

> Could you elaborate what you suggest WRT the documentation of
> gnus-article-treat-dumbquotes?

On a more "constructive" note:

> - What does "unilateral extension to the character map" mean?
>   Which character map?  Latin1 (= iso-8859-1)?

That's what it does, see gnus-article-dumbquotes-map.

> - I guess most (or all?) chars from `gnus-article-dumbquotes-map'
>   are `windows-1252' characters, not present in Latin-1.

Actually my version of gnus-article-dumbquotes-map (Gnus 5.10.1) also
includes "\264" (replacing it with "'") which is a regular Latin-1
character, I believe.  Not sure why, possibly for symmetry.

> - A better alternative to `W d' in the development version of Emacs
>   (CVS HEAD) is `1 g windows-1252 RET' after `(require 'code-pages)'.
>   Should we mention this?

> ,----[ (info "(gnus)Article Washing") ]
> | `W d'
> |      Treat M****s*** sm*rtq**t*s according to
> |      `gnus-article-dumbquotes-map' (`gnus-article-treat-dumbquotes').
> |      Note that this function guesses whether a character is a
> |      sm*rtq**t* or not, so it should only be used interactively.
> |
> |      Sm*rtq**t*s are M****s***'s unilateral extension to the character
> |      map in an attempt to provide more quoting characters.  If you see
> |      something like `\222' or `\264' where you're expecting some kind of
> |      apostrophe or quotation mark, then try this wash.
> `----

How about:

  Replace characters in the article according to
  `gnus-article-dumbquotes-map' (`gnus-article-treat-dumbquotes').

  This function is intended for articles that show `\222' or `\264'
  where you're expecting some kind of apostrophe or quotation mark,
  or `\200' where you expect a Euro sign, or similar breakage.

  Using the default settings for `gnus-article-dumbquotes-map' the
  function will replace characters from Microsoft's extension to the
  ISO-8859-1 charset with ASCII characters.  This means e.g. that
  curly-quote characters will be replaced with ASCII quotes and the
  Euro sign will be replaced with the string "EUR".

  Note that on displays that support it, `1 g windows-1252 RET' (as a
  one-shot) or customizing `gnus-default-charset' to `windows-1252'
  (as a permanent solution) is a better way of handling the problem.
I'm probably not fully adhering to the rules for Emacs documentation
strings here ;-).

The last sentence assumes the implementation I mention above.  I'm
not sure how and when to mention (require 'code-pages).  Ideally
code-pages.el should be autoloaded or preloaded IMO.

I hope that the extension for other language environments is obvious
to a user, otherwise that topic should probably be expanded in the
info docs.

benny
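The "automatic" variant suggested in this message could look roughly
like the following.  This is only a sketch: the function name
`my-gnus-assume-windows-1252' is hypothetical and not an existing Gnus
function, and it assumes that both variables are defined by
gnus-sum.el and that the `windows-1252' coding system is available.

```elisp
;; Hypothetical helper, not part of Gnus: set the default charset and
;; make Gnus ignore the labels that are commonly wrong for such mail.
(require 'gnus-sum) ; assumed to define both variables used below

(defun my-gnus-assume-windows-1252 ()
  "Treat us-ascii and iso-8859-1 labels as windows-1252 (sketch)."
  (setq gnus-default-charset 'windows-1252)
  ;; Keep whatever is already on the list and add the two labels once.
  (dolist (charset '(us-ascii iso-8859-1))
    (add-to-list 'gnus-newsgroup-ignored-charsets charset)))
```

Calling `(my-gnus-assume-windows-1252)' from ~/.gnus would have the
same effect as the two `setq' forms quoted earlier in the thread,
without overwriting other entries a user has already put on
`gnus-newsgroup-ignored-charsets'.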
* Re: documentation of gnus-article-treat-dumbquotes outdated?
From: James H. Cloos Jr. @ 2003-08-18  2:00 UTC

Yes, gnus should start treating the doze charsets as actual charsets,
and provide a function that redisplays messages that were tagged as
latin1 et al under the corresponding doze charset.

Then that function could replace gnus-article-treat-dumbquotes in the
washing map.

Incidentally, as I primarily use utf8, I setq gnus-article-dumbquotes-map
to this in my ~/.gnus:  (probably won’t look right if you cannot read
utf8....)

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(setq gnus-article-dumbquotes-map
      '(
        ("\200" "€") ("\202" "‚") ("\203" "ƒ") ("\204" "„")
        ("\205" "…") ("\206" "†") ("\207" "‡") ("\210" "ˆ")
        ("\211" "‰") ("\212" "Š") ("\213" "‹") ("\214" "Œ")
        ("\216" "Ž") ("\221" "‘") ("\222" "’") ("\223" "“")
        ("\224" "”") ("\225" "•") ("\226" "–") ("\227" "—")
        ("\230" "˜") ("\231" "™") ("\232" "š") ("\233" "›")
        ("\234" "œ") ("\236" "ž") ("\237" "Ÿ") ("\264" "´")
        ))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

That ensures that I see what the author intended.

-JimC
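A function that redisplays an article "under the corresponding doze
charset" would first need to know which Windows code page to try for a
given label.  A minimal sketch of such a lookup follows.  Both names
are hypothetical, and the table only lists the pairings raised in this
thread; the iso-8859-7 entry in particular is the case Matthias asks
about, not a confirmed one.

```elisp
;; Hypothetical table, not part of Gnus:
;; declared label -> Windows code page to retry with.
(defvar my-gnus-windows-counterparts
  '((iso-8859-1 . windows-1252)
    (us-ascii   . windows-1252)
    (iso-8859-7 . windows-1253))   ; assumption, see the question above
  "Alist mapping a declared charset to the Windows code page to try.")

(defun my-gnus-windows-counterpart (charset)
  "Return the Windows code page to try for CHARSET, or CHARSET itself."
  (or (cdr (assq charset my-gnus-windows-counterparts))
      charset))
```

The result, e.g. of `(my-gnus-windows-counterpart 'iso-8859-1)', is
the charset name one would otherwise type by hand after `1 g' in the
summary buffer.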