* Re: bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
From: Reiner Steib @ 2008-10-17 18:19 UTC
To: Stefan Monnier; +Cc: Simon Josefsson, ding, 1174, Frank Schmitt

On Fri, Oct 17 2008, Stefan Monnier wrote:

>> ;; BEWARE: we used to use string-as-multibyte here which is braindead
>> ;; because it will turn accidental emacs-mule-valid byte sequences
>> ;; into multibyte chars. --Stef
>> ;; Reverted, braindead got 7.5 out of 10 on imdb, so it can't be
>> ;; that bad. --Simon
>
> Who's this Simon who reverted my change without even explaining why?

The one who's listed as the author of nnimap.el (cc-ed).

>> which is called at several places. And this breaks it. If I change
>> this function so that string is not changed, my mails are displayed
>> correctly.

Does it work correctly when using Stefan's version?
( s/string-as-multibyte/string-to-multibyte/ ...)

(defun nnimap-demule (string)
  ;; BEWARE: we used to use string-as-multibyte here which is braindead
  ;; because it will turn accidental emacs-mule-valid byte sequences
  ;; into multibyte chars. --Stef
  (funcall (if (and (fboundp 'string-to-multibyte)
                    (subrp (symbol-function 'string-to-multibyte)))
               'string-to-multibyte
             'identity)
           (or string "")))

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/
* Re: bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
From: Frank Schmitt @ 2008-10-17 18:36 UTC
To: Stefan Monnier; +Cc: Simon Josefsson, ding, 1174

Reiner Steib <reinersteib+gmane@imap.cc> writes:

> On Fri, Oct 17 2008, Stefan Monnier wrote:
>
>>> ;; BEWARE: we used to use string-as-multibyte here which is braindead
>>> ;; because it will turn accidental emacs-mule-valid byte sequences
>>> ;; into multibyte chars. --Stef
>>> ;; Reverted, braindead got 7.5 out of 10 on imdb, so it can't be
>>> ;; that bad. --Simon
>>
>> Who's this Simon who reverted my change without even explaining why?
>
> The one who's listed as the author of nnimap.el (cc-ed).
>
>>> which is called at several places. And this breaks it. If I change
>>> this function so that string is not changed, my mails are displayed
>>> correctly.
>
> Does it work correctly when using Stefan's version?
> ( s/string-as-multibyte/string-to-multibyte/ ...)
>
> (defun nnimap-demule (string)
>   ;; BEWARE: we used to use string-as-multibyte here which is braindead
>   ;; because it will turn accidental emacs-mule-valid byte sequences
>   ;; into multibyte chars. --Stef
>   (funcall (if (and (fboundp 'string-to-multibyte)
>                     (subrp (symbol-function 'string-to-multibyte)))
>                'string-to-multibyte
>              'identity)
>            (or string "")))

Yes, it does.

Frank

-- 
Have you ever considered how much text can fit in eighty columns? Given that a
signature typically contains up to four lines of text, this space allows you to
attach a tremendous amount of valuable information to your messages. Seize the
opportunity and don't waste your signature on bullshit that nobody cares about.
* Re: bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
From: Reiner Steib @ 2008-11-29 12:08 UTC
To: Simon Josefsson; +Cc: Frank Schmitt, Stefan Monnier, ding, 1174

On Fri, Oct 17 2008, Frank Schmitt wrote:

> Reiner Steib <reinersteib+gmane@imap.cc> writes:
>
>> On Fri, Oct 17 2008, Stefan Monnier wrote:
>>
>>>> ;; BEWARE: we used to use string-as-multibyte here which is braindead
>>>> ;; because it will turn accidental emacs-mule-valid byte sequences
>>>> ;; into multibyte chars. --Stef
>>>> ;; Reverted, braindead got 7.5 out of 10 on imdb, so it can't be
>>>> ;; that bad. --Simon

Simon, could you please clarify why you reverted Stefan's change in
`nnimap-demule'?  It breaks reading UTF-8 articles via nnimap.el in
Emacs 23.

>>> Who's this Simon who reverted my change without even explaining why?
>>
>> The one who's listed as the author of nnimap.el (cc-ed).
>>
>>>> which is called at several places. And this breaks it. If I change
>>>> this function so that string is not changed, my mails are displayed
>>>> correctly.
>>
>> Does it work correctly when using Stefan's version?
>> ( s/string-as-multibyte/string-to-multibyte/ ...)
>>
>> (defun nnimap-demule (string)
>>   ;; BEWARE: we used to use string-as-multibyte here which is braindead
>>   ;; because it will turn accidental emacs-mule-valid byte sequences
>>   ;; into multibyte chars. --Stef
>>   (funcall (if (and (fboundp 'string-to-multibyte)
>>                     (subrp (symbol-function 'string-to-multibyte)))
>>                'string-to-multibyte
>>              'identity)
>>            (or string "")))
>
> Yes, it does.

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/
* Re: bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
From: Simon Josefsson @ 2008-11-29 12:18 UTC
To: Frank Schmitt; +Cc: Stefan Monnier, ding, 1174

Reiner Steib <reinersteib+gmane@imap.cc> writes:

> On Fri, Oct 17 2008, Frank Schmitt wrote:
>
>> Reiner Steib <reinersteib+gmane@imap.cc> writes:
>>
>>> On Fri, Oct 17 2008, Stefan Monnier wrote:
>>>
>>>>> ;; BEWARE: we used to use string-as-multibyte here which is braindead
>>>>> ;; because it will turn accidental emacs-mule-valid byte sequences
>>>>> ;; into multibyte chars. --Stef
>>>>> ;; Reverted, braindead got 7.5 out of 10 on imdb, so it can't be
>>>>> ;; that bad. --Simon
>
> Simon, could you please clarify why you reverted Stefan's change in
> `nnimap-demule'?  It breaks reading UTF-8 articles via nnimap.el in
> Emacs 23.

I don't recall, but people should notice relatively quickly if there are
problems in this area (wrong display of non-ascii) so feel free to
revert the patch or apply another patch instead.  It needs to be tested
under Emacs 22 too, though, if it is installed in the Gnus CVS.

/Simon
* Re: bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
From: Reiner Steib @ 2008-11-29 15:30 UTC
To: Simon Josefsson; +Cc: Frank Schmitt, James Cloos, Stefan Monnier, ding, 1174, Clemens Schueller

On Sat, Nov 29 2008, Simon Josefsson wrote:

> Reiner Steib <reinersteib+gmane@imap.cc> writes:
>>>>>> ;; BEWARE: we used to use string-as-multibyte here which is braindead
>>>>>> ;; because it will turn accidental emacs-mule-valid byte sequences
>>>>>> ;; into multibyte chars. --Stef
>>>>>> ;; Reverted, braindead got 7.5 out of 10 on imdb, so it can't be
>>>>>> ;; that bad. --Simon
>>
>> Simon, could you please clarify why you reverted Stefan's change in
>> `nnimap-demule'?  It breaks reading UTF-8 articles via nnimap.el in
>> Emacs 23.
>
> I don't recall, but people should notice relatively quickly if there are
> problems in this area (wrong display of non-ascii)

Hm, both changes happened in 2004, but I don't recall any reports besides
the recent ones in 2008:

http://thread.gmane.org/gmane.emacs.gnus.general/67112
(bug#464, reported by James Cloos)
http://thread.gmane.org/gmane.emacs.bugs/21524
(bug#1174, reported by Frank Schmitt)

----------------------------
revision 7.9
date: 2004-09-13 13:52:48 +0200;  author: jas;  state: Exp;  lines: +5 -3
(nnimap-demule): Revert 2004-08-30 change.
----------------------------

> so feel free to revert the patch or apply another patch instead.  It
> needs to be tested under Emacs 22 too, though,

I never saw this problem myself.  I cannot see any difference with a
few UTF-8 articles (C-T-E: 8bit, [1]), neither with Emacs 22 (with
current Gnus trunk) nor Emacs trunk (Gnus 5.13 from there).  Stefan,
what are the "accidental emacs-mule-valid byte sequences" that trigger
this problem?  It would be good if someone could send me a problematic
article. [2]

I've just checked in (Gnus and Emacs) some code to debug this problem.
I'd like to ask those who saw the bug in Emacs 23 to test the articles
in question:

- With current Emacs 23 (Emacs CVS trunk)

- With Emacs 22 plus current Gnus CVS trunk (No Gnus)

- If you see wrong display: does it display correctly after evaluating
  the following:

  M-x gnus-backlog-shutdown RET
  (setq nnimap-demule-use-string-to-multibyte nil
        gnus-verbose 10)

  Check the *Messages* buffer for messages
  "nnimap-demule-use-string-to-multibyte: nil" to ensure that the
  article is decoded again with this setting.  You may need to
  re-enter the group.

> if it is installed in the Gnus CVS.

I don't want different code in Gnus and Emacs.  If all else fails, we
can make it conditional.

Bye, Reiner.

[1] Cc-s of the following articles (available on Gmane) from Aidan
    Kehoe:
    <18492.30425.377545.700503@parhasard.net>
    <18518.43672.183610.662699@parhasard.net>
    <18712.43474.265690.792714@parhasard.net>

    Non-ascii characters in the attribution line ("scríobh"), the
    signature ("¿Dónde estará ahora mi sobrino Yoghurtu Nghé ..."),
    and "’", "İ", "ı"

[2] (push '(utf-8 . 8bit) mm-body-charset-encoding-alist)
    --> trying to produce a problematic article here:

    AE-Ä OE-Ö UE-Ü ae-ä oe-ö ue-ü ss-ß

    Should be sent with:
    Content-Type: text/plain; charset=utf-8
    Content-Transfer-Encoding: 8bit
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/
* Re: bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
From: Stefan Monnier @ 2008-11-29 21:30 UTC
To: Reiner Steib; +Cc: Simon Josefsson, Frank Schmitt, James Cloos, ding, 1174, Clemens Schueller

> I never saw this problem myself.  I cannot see any difference with a
> few UTF-8 articles (C-T-E: 8bit, [1]), neither with Emacs 22 (with
> current Gnus trunk) nor Emacs trunk (Gnus 5.13 from there).  Stefan,
> what are the "accidental emacs-mule-valid byte sequences" that trigger
> this problem?  It would be good if someone could send me a problematic
> article. [2]

In Emacs-22, the problem was more difficult to trigger: you had to
receive an email whose undecoded text contained emacs-mule escape
sequences, which is rather uncommon.  With Emacs-23, it's a lot more
common since the internal encoding has changed to a variant of utf-8:
an 8bit body using utf-8 will see its content unwillingly decoded during
nnimap-demule which leads to the bugs we've seen recently.

I'm pretty sure that string-as-multibyte is wrong here in general.
Maybe the problem is that nnimap-demule is used blindly in different
contexts where some need string-to-multibyte and some need
string-as-multibyte.  E.g. maybe Simon's problem was linked to imap
groups with non-ASCII chars in their names, rather than in the
message bodies.

        Stefan
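
The difference between the two conversions discussed above can be sketched
in Emacs Lisp. This is only an illustration, assuming Emacs 23 or later
(UTF-8-based internal representation); the helper name
`demo-demule-lengths' is hypothetical and appears nowhere in Gnus or in
this thread. It takes a unibyte string and returns the character counts
after each conversion.

```elisp
;; Illustration only, not code from the thread or from Gnus.
;; `demo-demule-lengths' is a hypothetical helper name.
(defun demo-demule-lengths (raw)
  "Return (TO AS): lengths of RAW after each unibyte->multibyte conversion.
RAW should be a unibyte string of undecoded bytes."
  (list
   ;; `string-to-multibyte' keeps every byte as one undecoded raw-byte
   ;; character, so the MIME charset can still be applied later.
   (length (string-to-multibyte raw))
   ;; `string-as-multibyte' reinterprets the bytes as Emacs's internal
   ;; representation, so byte sequences that happen to be valid there
   ;; are silently merged into characters.  (Obsolete in recent Emacs.)
   (length (string-as-multibyte raw))))

;; "\303\244" is the two-byte UTF-8 encoding of "ä", as it would arrive
;; in an 8bit article body.
(demo-demule-lengths "\303\244")
```

With the first conversion the length equals the byte count; with the
second it can shrink whenever the bytes happen to form a valid internal
sequence, which is the "unwilling decoding" described above.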
* Re: bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
From: Reiner Steib @ 2008-11-30 13:12 UTC
To: Stefan Monnier; +Cc: Simon Josefsson, Frank Schmitt, James Cloos, ding, 1174, Clemens Schueller

On Sat, Nov 29 2008, Stefan Monnier wrote:

>> Stefan, what are the "accidental emacs-mule-valid byte sequences"
>> that trigger this problem?  It would be good if someone could send
>> me a problematic article. [2]
>
> In Emacs-22, the problem was more difficult to trigger: you had to
> receive an email whose undecoded text contained emacs-mule escape
> sequences, which is rather uncommon.  With Emacs-23, it's a lot more
> common since the internal encoding has changed to a variant of utf-8:
> an 8bit body using utf-8 will see its content unwillingly decoded during
> nnimap-demule which leads to the bugs we've seen recently.

Could you send me an article demonstrating the problem?

> I'm pretty sure that string-as-multibyte is wrong here in general.
> Maybe the problem is that nnimap-demule is used blindly in different
> contexts where some need string-to-multibyte and some need
> string-as-multibyte.  E.g. maybe Simon's problem was linked to imap
> groups with non-ASCII chars in their names, rather than in the
> message bodies.

I'm not familiar with the IMAP code, but AFAICS, `nnimap-demule' is
only used when getting headers or body:

| nnimap.el:611:                  headers (nnimap-demule

(defun nnimap-retrieve-headers-progress ()
  "Hook to insert NOV line for current article into `nntp-server-buffer'."

| nnimap.el:951:         (nnimap-demule

(defun nnimap-callback (article gnus-callback buffer)
  (when (eq article (imap-current-message))
    (remove-hook 'imap-fetch-data-hook
                 (nnimap-make-callback article gnus-callback buffer))
    (with-current-buffer buffer
      (insert
       (with-current-buffer nnimap-server-buffer
         (nnimap-demule
          (if (imap-capability 'IMAP4rev1)
              ;; xxx don't just use car? alist doesn't contain
              ;; anything else now, but it might...
              (nth 2 (car (imap-message-get article 'BODYDETAIL)))
            (imap-message-get article 'RFC822)))))
      (nnheader-ms-strip-cr)
      (funcall gnus-callback t))))

| nnimap.el:977:       (insert (nnimap-demule (if detail

(defun nnimap-request-article-part (article part prop &optional
                                            group server to-buffer detail)

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/
* Re: bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
From: James Cloos @ 2008-11-29 22:14 UTC
To: Reiner Steib; +Cc: Simon Josefsson, Frank Schmitt, Stefan Monnier, ding, 1174, Clemens Schueller

The easiest test cases are utf-8 emails which are 8bit mime and which
fail to specify their charset, or specify utf-8 in some incorrect way.

The most common example may be commit messages; the scripts usually
do not specify a charset since the commit messages and src could be in
any encoding, and they send them out as is.

NB that this bug never showed up for me when using the unicode-2 branch
before that was pulled into mainline.  Some unibyte vs multibyte change
was made after that happened which started triggering the bug.

A possibly related issue is that (gnus-article-treat-dumbquotes) also
does not work for me anymore.  I do specify my own
gnus-article-dumbquotes-map in ~/.gnus, but the syntax is the same as
what is still (defvar)ed in gnus-art.el.  (I use utf-8 results rather
than ascii fallbacks.)  The symptom is that the strings such as "\221"
do not match the relevant octets in the *Article* buffer, as they used
to do.  This also stopped working at the same time.

My goal would be for gnus to treat messages and mime blocks w/o a
charset as utf-8 rather than ascii by default.  That does work for qp
and base64, just not for 8bitmime.  Unless Stefan's patch is applied.

But it also must be easy to manually tell it to use some other charset
when one recognizes the need.  I can use (gnus-article-view-part-as-charset)
for a mime part (at least w/ Stefan's patch) but haven't managed to make
that work for a message w/o any attachments or inlines.

I'm running w/ Stefan's patch now and it works.  I'll try a new compile
tonight or tomorrow and test the committed code.

-JimC
-- 
James Cloos <cloos@jhcloos.com>         OpenPGP: 1024D/ED7DAEA6
* View articles with different charset (was: bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23)
From: Reiner Steib @ 2008-11-30 13:11 UTC
To: James Cloos; +Cc: ding

[ Stripping Cc-s since this isn't directly related to the bug. ]

On Sat, Nov 29 2008, James Cloos wrote:

> But it also must be easy to manually tell it to use some other charset
> when one recognizes the need.  I can use (gnus-article-view-part-as-charset)
> for a mime part (at least w/ Stefan's patch) but haven't managed to make
> that work for a message w/o any attachments or inlines.

Does this help?

,----[ (info "(gnus)Paging the Article") ]
| `A g'
| `g'
|      (Re)fetch the current article (`gnus-summary-show-article').  If
|      given a prefix, fetch the current article, but don't run any of
|      the article treatment functions.  This will give you a "raw"
|      article, just the way it came from the server.
|
|      If given a numerical prefix, you can do semi-manual charset
|      stuff.  `C-u 0 g cn-gb-2312 RET' will decode the message as if it
|      were encoded in the `cn-gb-2312' charset.  If you have
|
|           (setq gnus-summary-show-article-charset-alist
|                 '((1 . cn-gb-2312)
|                   (2 . big5)))
|
|      then you can say `C-u 1 g' to get the same effect.
`----

There should be a better index entry here.  Any ideas for useful
index entries?

@cindex charset, view article with different charset
@cindex view article with different charset

Maybe we should consider to specify a non-nil default for
`gnus-summary-show-article-charset-alist' [1].

Bye, Reiner.

[1] I use...

,----[ <f1> v gnus-summary-show-article-charset-alist RET ]
| gnus-summary-show-article-charset-alist is a variable defined in
| `gnus-sum.el'.  Its value is
| ((12 . windows-1252)
|  (0 . iso-8859-15)
|  (8 . utf-8))
|
| Documentation:
| Alist of number and charset.
| The article will be shown with the charset corresponding to the
| numbered argument.
| For example: ((1 . cn-gb-2312) (2 . big5)).
|
| You can customize this variable.
`----
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/
* Re: View articles with different charset
From: James Cloos @ 2008-11-30 21:23 UTC
To: ding; +Cc: Reiner Steib

>>>>> "Reiner" == Reiner Steib <reinersteib+gmane@imap.cc> writes:

>> But it also must be easy to manually tell it to use some other charset
>> when one recognizes the need.

Reiner> Does this help?

Reiner> ,----[ (info "(gnus)Paging the Article") ]
Reiner> | If given a numerical prefix, you can do semi-manual charset stuff

Damn.  I don't remember reading that.  But see below.

Reiner> There should be a better index entry here.  Any ideas for useful
Reiner> index entries?

Reiner> @cindex charset, view article with different charset
Reiner> @cindex view article with different charset

That looks helpful.

Reiner> Maybe we should consider to specify a non-nil default for
Reiner> `gnus-summary-show-article-charset-alist' [1].

I do have that set to ((1 . utf-8)).  I don't remember doing so, and it
was done via customize rather than ~/.gnus, so I don't have a RCS log
to see when I did it.  But it was presumably for using g with a prefix.
Too bad I forgot all about it.

Actually, I now do remember doing that.  I gave up because, w/o Stefan's
patch, it didn't do any good for the bug at hand.  The 8bit unclassified
utf-8 still wouldn't render.  (And I didn't know about said patch until
recently.)

-JimC
-- 
James Cloos <cloos@jhcloos.com>         OpenPGP: 1024D/ED7DAEA6
* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
From: Simon Josefsson @ 2009-01-12 10:54 UTC
To: Frank Schmitt; +Cc: 1174, ding

I have recently upgraded my Gnus installation, and it included this fix.
Now every e-mail I send has non-ASCII characters prefixed with \201.
So the patch installed does not seem to be the right one, or there is
something else wrong with my configuration.  Can anyone else reproduce
this?  I'm including 'åäö' in this e-mail for debugging.

I'll see if I can debug this further, and find the exact part of the
patch that causes the problem.

/Simon

Simon Josefsson <simon@josefsson.org> writes:

> Reiner Steib <reinersteib+gmane@imap.cc> writes:
>
>> On Fri, Oct 17 2008, Frank Schmitt wrote:
>>
>>> Reiner Steib <reinersteib+gmane@imap.cc> writes:
>>>
>>>> On Fri, Oct 17 2008, Stefan Monnier wrote:
>>>>
>>>>>> ;; BEWARE: we used to use string-as-multibyte here which is braindead
>>>>>> ;; because it will turn accidental emacs-mule-valid byte sequences
>>>>>> ;; into multibyte chars. --Stef
>>>>>> ;; Reverted, braindead got 7.5 out of 10 on imdb, so it can't be
>>>>>> ;; that bad. --Simon
>>
>> Simon, could you please clarify why you reverted Stefan's change in
>> `nnimap-demule'?  It breaks reading UTF-8 articles via nnimap.el in
>> Emacs 23.
>
> I don't recall, but people should notice relatively quickly if there are
> problems in this area (wrong display of non-ascii) so feel free to
> revert the patch or apply another patch instead.  It needs to be tested
> under Emacs 22 too, though, if it is installed in the Gnus CVS.
>
> /Simon
* Re: bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
From: Frank Schmitt @ 2009-01-12 11:03 UTC
To: Simon Josefsson; +Cc: Stefan Monnier, ding, 1174

Simon Josefsson <simon@josefsson.org> writes:

> I have recently upgraded my Gnus installation, and it included this fix.
> Now every e-mail I send has non-ASCII characters prefixed with \201.
> So the patch installed does not seem to be the right one, or there is
> something else wrong with my configuration.  Can anyone else reproduce
> this?  I'm including '\201å\201ä\201ö' in this e-mail for debugging.
>
> I'll see if I can debug this further, and find the exact part of the
> patch that causes the problem.

I see the \201 in your mail but for me everything is sent and displayed
perfectly since the patch was applied.  Maybe the difference is that you
are using Emacs 22?

äöüß

-- 
Have you ever considered how much text can fit in eighty columns? Given that a
signature typically contains up to four lines of text, this space allows you to
attach a tremendous amount of valuable information to your messages. Seize the
opportunity and don't waste your signature on bullshit that nobody cares about.
* Re: bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
From: Simon Josefsson @ 2009-01-12 11:10 UTC
To: Frank Schmitt; +Cc: Stefan Monnier, ding, 1174

Frank Schmitt <ich@frank-schmitt.net> writes:

> Simon Josefsson <simon@josefsson.org> writes:
>
>> I have recently upgraded my Gnus installation, and it included this fix.
>> Now every e-mail I send has non-ASCII characters prefixed with \201.
>> So the patch installed does not seem to be the right one, or there is
>> something else wrong with my configuration.  Can anyone else reproduce
>> this?  I'm including '\201å\201ä\201ö' in this e-mail for debugging.
>>
>> I'll see if I can debug this further, and find the exact part of the
>> patch that causes the problem.
>
> I see the \201 in your mail but for me everything is sent and displayed
> perfectly since the patch was applied.  Maybe the difference is that you
> are using Emacs 22?  äöüß

I'm testing Emacs 23 now and everything works fine.  So it seems the
problem only occurs in Emacs 22.

However, it seems the problem happens on _sending_ which confuses me.
Maybe the old patch just worked around the problem, and made \201 in
incoming e-mail disappear?  However, I think I would have noticed this
before anyway, because all my posts to mailing lists and news also have
\201 before any non-ascii character now (with Emacs 22) but I don't
think that has been the case before applying this patch.

/Simon
* Re: bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
From: Gabor Z. Papp @ 2009-01-14 11:20 UTC
To: Simon Josefsson; +Cc: ding

* Simon Josefsson <simon@josefsson.org>:

| I'm testing Emacs 23 now and everything works fine.  So it seems the
| problem only occurs in Emacs 22.

Same problem here using Emacs 22.

| However, it seems the problem happens on _sending_ which confuses
| me.

Exactly.

Should I upgrade to Emacs 23 or downgrade gnus cvs to an earlier snapshot?
* Re: bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
From: Reiner Steib @ 2009-01-14 20:12 UTC
To: Gabor Z. Papp, Simon Josefsson; +Cc: Stefan Monnier, ding, 1174, Frank Schmitt

reopen 1174
quit
----- end of commands for control@emacsbugs, bcc-ed -----

[ Gabor, please don't drop the other recipients, especially the bug
  data base 1174@emacsbugs.donarmstrong.com ]

On Wed, Jan 14 2009, Gabor Z. Papp wrote:

> * Simon Josefsson <simon@josefsson.org>:
>
> | I'm testing Emacs 23 now and everything works fine.  So it seems the
> | problem only occurs in Emacs 22.
>
> Same problem here using Emacs 22.
>
> | However, it seems the problem happens on _sending_ which confuses
> | me.
>
> Exactly.
>
> Should I upgrade to Emacs 23 or downgrade gnus cvs to an earlier snapshot?

Neither one.  We need to find a solution that works in all supported
Emacs versions.

Stefan, could you please suggest a fix or give some advice how to
debug this problem?

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/
* Re: bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
From: Stefan Monnier @ 2009-01-14 22:08 UTC
To: Reiner Steib; +Cc: Gabor Z. Papp, Simon Josefsson, ding, 1174, Frank Schmitt

> Stefan, could you please suggest a fix or give some advice how to
> debug this problem?

I don't know where the problem is, so I can't suggest a fix.
I guess what Simon should do is:
- make sure he can reliably reproduce the problem.
- change nnimap-demule to string-as-multibyte, make sure that removes
  the problem.
- then try and figure out why/where nnimap-demule is used in such a way
  as to cause the problem.

I must say I can't understand why nnimap-demule would be called on the
sending side.

        Stefan
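
One way to approach the last step above is to log every call to the
function while sending a test message. The following is a minimal sketch,
not something proposed in the thread: it assumes a Gnus version whose
nnimap.el still defines `nnimap-demule', and it relies only on the
standard trace.el and debugger commands.

```elisp
;; Hypothetical debugging aid, not from the thread.  It only does
;; something if the (old) nnimap.el defining `nnimap-demule' is loaded.
(require 'trace)

(when (fboundp 'nnimap-demule)
  ;; Record each call (argument and return value) in the
  ;; *trace-output* buffer without interrupting Gnus.
  (trace-function-background 'nnimap-demule))

;; To see *who* calls it, one could instead enter the debugger on each
;; call and read the backtrace:
;;   (debug-on-entry 'nnimap-demule)
;;
;; To undo either setting afterwards:
;;   (untrace-function 'nnimap-demule)
;;   (cancel-debug-on-entry 'nnimap-demule)
```

If nothing is logged while a message is being sent, that would suggest the
stray \201 bytes come from somewhere other than `nnimap-demule'.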
* Re: bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
  2008-11-29 12:08 ` Reiner Steib
  2008-11-29 12:18   ` Simon Josefsson
@ 2008-12-01 21:04   ` Stefan Monnier
  2008-12-01 22:48     ` Reiner Steib
  1 sibling, 1 reply; 21+ messages in thread
From: Stefan Monnier @ 2008-12-01 21:04 UTC (permalink / raw)
  To: Simon Josefsson; +Cc: Frank Schmitt, ding, 1174

> Simon, could you please clarify why you reverted Stefan's change in
> `nnimap-demule'? It breaks reading UTF-8 articles via nnimap.el in
> Emacs 23.

Having looked at the code again, I'm more than ever confident that
string-to-multibyte is the right thing to use. Maybe the code I installed
back then failed to fall back to string-as-multibyte when string-to-multibyte
was not available, which caused a bug for Simon?

In any case the newly committed code has a parenthesis typo that makes
it still use the old code and ignore the new config var
nnimap-demule-use-string-to-multibyte.

Also I recommend just using the patch below instead. The first hunk
removes an unnecessary use of nnimap-demule since the output will be
inserted into a unibyte buffer.


        Stefan


--- nnimap.el.~1.50.~ 2008-12-01 15:38:55.000000000 -0500
+++ nnimap.el 2008-12-01 15:49:53.000000000 -0500
@@ -608,12 +608,11 @@
           (with-current-buffer nnimap-server-buffer
             (setq uid imap-current-message
                   mbx imap-current-mailbox
-                  headers (nnimap-demule
-                           (if (imap-capability 'IMAP4rev1)
+                  headers (if (imap-capability 'IMAP4rev1)
                                ;; xxx don't just use car? alist doesn't contain
                                ;; anything else now, but it might...
                                (nth 2 (car (imap-message-get uid 'BODYDETAIL)))
-                             (imap-message-get uid 'RFC822.HEADER)))
+                             (imap-message-get uid 'RFC822.HEADER))
                   lines (imap-body-lines (imap-message-body imap-current-message))
                   chars (imap-message-get imap-current-message 'RFC822.SIZE)))
           (nnheader-insert-nov
@@ -901,40 +900,17 @@
   (when (nnimap-possibly-change-server server)
     (nnoo-status-message 'nnimap server)))
 
-(defvar nnimap-demule-use-string-to-multibyte (fboundp 'string-to-multibyte)
-  "Temporary internal debug variable.
-If you have problems (UTF-8 not decoded correctly on IMAP) with
-the default value, please report it as a bug!")
-;; FIXME: Clarify if we need to make this variable conditional on the Emacs
-;; version (Emacs 22 vs. Emacs 23;Emacs 21 doesn't have `string-to-multibyte'
-;; anyhow). --rsteib
-;;
-;; http://thread.gmane.org/gmane.emacs.gnus.general/67112
-;; (bug#464, reported by James Cloos)
-;; http://thread.gmane.org/gmane.emacs.bugs/21524
-;; (bug#1174, reported by Frank Schmitt)
-
-(defun nnimap-demule (string)
-  ;; BEWARE: we used to use string-as-multibyte here which is braindead
-  ;; because it will turn accidental emacs-mule-valid byte sequences
-  ;; into multibyte chars. --Stef
-  ;; Reverted, braindead got 7.5 out of 10 on imdb, so it can't be
-  ;; that bad. --Simon
-  (gnus-message 9 "nnimap-demule-use-string-to-multibyte: %s"
-                nnimap-demule-use-string-to-multibyte)
-  (if nnimap-demule-use-string-to-multibyte
-      ;; Stefan
-      (funcall (if (and (fboundp 'string-to-multibyte)
-                        (subrp (symbol-function 'string-to-multibyte)))
-                   'string-to-multibyte
-                 'identity)
-               (or string "")))
-    ;; Simon
-    (funcall (if (and (fboundp 'string-as-multibyte)
-                      (subrp (symbol-function 'string-as-multibyte)))
-                 'string-as-multibyte
-               'identity)
-             (or string "")))
+;; We used to use a string-as-multibyte here, but it is really incorrect.
+;; This function is used when we're about to insert a unibyte string
+;; into a potentially multibyte buffer. The string is either an article
+;; header or body (or both?), undecoded. When Emacs is asked to convert
+;; a unibyte string to multibyte, it may either use the equivalent of
+;; nothing (e.g. non-Mule XEmacs), string-make-unibyte (i.e. decode using
+;; locale), string-as-multibyte (decode using emacs-internal coding system)
+;; or string-to-multibyte (keep the data undecoded as a sequence of bytes).
+;; Only the last one preserves the data such that we can reliably later on
+;; decode the text using the mime info.
+(defalias 'nnimap-demule 'mm-string-to-multibyte)
 
 (defun nnimap-make-callback (article gnus-callback buffer)
   "Return a callback function."

^ permalink raw reply [flat|nested] 21+ messages in thread
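The last sentence of the comment in the patch above (only string-to-multibyte keeps the data decodable later) can be illustrated with a small sketch. This is not part of the patch. It assumes an Emacs that provides both `string-to-multibyte' and `string-to-unibyte' (the latter, as far as I know, only from Emacs 23 on), and it assumes the MIME headers name UTF-8 as the charset. The names `raw' and `kept' and the sample bytes are made up for illustration.

```elisp
;; Illustrative sketch, not part of the patch.  `raw' and `kept' are
;; hypothetical names; the sample bytes are the UTF-8 encoding of "café".
(let* ((raw "caf\303\251")                ; unibyte string of undecoded bytes
       (kept (string-to-multibyte raw)))  ; multibyte, but still undecoded
  ;; Each original byte is still one element of KEPT, so the bytes can be
  ;; recovered and decoded later with the charset taken from the MIME
  ;; info (assumed here to be UTF-8).
  (list (multibyte-string-p raw)
        (multibyte-string-p kept)
        (= (length raw) (length kept))
        (decode-coding-string (string-to-unibyte kept) 'utf-8)))
```

Evaluating the form returns a list of the two multibyteness flags, whether the length was preserved, and the decoded text. The point of the design is that the conversion to multibyte and the charset decoding stay two separate steps.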
* Re: bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
  2008-12-01 21:04 ` Stefan Monnier
@ 2008-12-01 22:48   ` Reiner Steib
  2008-12-02 7:36     ` Stefan Monnier
  0 siblings, 1 reply; 21+ messages in thread
From: Reiner Steib @ 2008-12-01 22:48 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Simon Josefsson, Frank Schmitt, ding, 1174

On Mon, Dec 01 2008, Stefan Monnier wrote:

> Having looked at the code again, I'm more than ever confident that
> string-to-multibyte is the right thing to use. Maybe the code I installed
> back then failed to fall back to string-as-multibyte when string-to-multibyte
> was not available, which caused a bug for Simon?

Yes, it didn't fall back to string-as-multibyte:

--- nnimap.el 17 Aug 2004 14:27:16 -0000 7.7
+++ nnimap.el 30 Aug 2004 18:13:58 -0000 7.8
[...]
@@ -845,9 +847,12 @@
     (nnoo-status-message 'nnimap server)))
 
 (defun nnimap-demule (string)
-  (funcall (if (and (fboundp 'string-as-multibyte)
-                    (subrp (symbol-function 'string-as-multibyte)))
-               'string-as-multibyte
+  ;; BEWARE: we used to use string-as-multibyte here which is braindead
+  ;; because it will turn accidental emacs-mule-valid byte sequences
+  ;; into multibyte chars. --Stef
+  (funcall (if (and (fboundp 'string-to-multibyte)
+                    (subrp (symbol-function 'string-to-multibyte)))
+               'string-to-multibyte
              'identity)
            (or string "")))

> In any case the newly committed code has a parenthesis typo that makes
> it still use the old code and ignore the new config var
> nnimap-demule-use-string-to-multibyte.

Oops, stupid me.

> Also I recommend just using the patch below instead. The first hunk
> removes an unnecessary use of nnimap-demule since the output will be
> inserted into a unibyte buffer.

Thanks for your analysis. Please install the patch. I'll pull it into
Gnus CVS ASAP (unless Miles syncs first).

> +;; We used to use a string-as-multibyte here, but it is really incorrect.
> +;; This function is used when we're about to insert a unibyte string
> +;; into a potentially multibyte buffer. The string is either an article
> +;; header or body (or both?), undecoded. When Emacs is asked to convert
> +;; a unibyte string to multibyte, it may either use the equivalent of
> +;; nothing (e.g. non-Mule XEmacs), string-make-unibyte (i.e. decode using
> +;; locale), string-as-multibyte (decode using emacs-internal coding system)
> +;; or string-to-multibyte (keep the data undecoded as a sequence of bytes).
> +;; Only the last one preserves the data such that we can reliably later on
> +;; decode the text using the mime info.
> +(defalias 'nnimap-demule 'mm-string-to-multibyte)

In Emacs 21 (which Gnus still aims to be compatible with), we have
string-as-multibyte, but not string-to-multibyte. So your proposed
code (i.e. mm-string-to-multibyte) runs

  (string-as-multibyte (char-to-string string))

whereas we used to run

  (string-as-multibyte string)

Does char-to-string matter here?

(defalias 'mm-string-to-multibyte
  (cond
   ((featurep 'xemacs)
    'identity)
   ((fboundp 'string-to-multibyte)
    'string-to-multibyte)
   (t
    (lambda (string)
      "Return a multibyte string with the same individual chars as string."
      (mapconcat
       (lambda (ch) (mm-string-as-multibyte (char-to-string ch)))
       string "")))))

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/

^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
  2008-12-01 22:48 ` Reiner Steib
@ 2008-12-02 7:36   ` Stefan Monnier
  2008-12-04 19:43     ` Reiner Steib
  0 siblings, 1 reply; 21+ messages in thread
From: Stefan Monnier @ 2008-12-02 7:36 UTC (permalink / raw)
  To: Simon Josefsson; +Cc: Frank Schmitt, ding, 1174

> In Emacs 21 (which Gnus still aims to be compatible with), we have
> string-as-multibyte, but not string-to-multibyte. So your proposed
> code (i.e. mm-string-to-multibyte) runs

>   (string-as-multibyte (char-to-string string))

> whereas we used to run

>   (string-as-multibyte string)

> Does char-to-string matter here?

> (defalias 'mm-string-to-multibyte
>   (cond
>    ((featurep 'xemacs)
>     'identity)
>    ((fboundp 'string-to-multibyte)
>     'string-to-multibyte)
>    (t
>     (lambda (string)
>       "Return a multibyte string with the same individual chars as string."
>       (mapconcat
>        (lambda (ch) (mm-string-as-multibyte (char-to-string ch)))
>        string "")))))

Oh, that's clever: yes, the mapconcat/char-to-string dance does make it
implement the string-to-multibyte behavior because doing the
string-as-multibyte conversion one byte at a time avoids the
problematic case. To quote myself from mm-util.el:

   ;; string-as-multibyte often doesn't really do what you think it does.
   ;; Example:
   ;;    (aref (string-as-multibyte "\201") 0) -> 129 (aka ?\201)
   ;;    (aref (string-as-multibyte "\300") 0) -> 192 (aka ?\300)
   ;;    (aref (string-as-multibyte "\300\201") 0) -> 192 (aka ?\300)
   ;;    (aref (string-as-multibyte "\300\201") 1) -> 129 (aka ?\201)
   ;; but
   ;;    (aref (string-as-multibyte "\201\300") 0) -> 2240
   ;;    (aref (string-as-multibyte "\201\300") 1) -> <error>

Basically when the string passed is made of a single byte,
string-as-multibyte is equal to string-to-multibyte, which is the
property used by the code you quoted above to build a poor man's
string-to-multibyte.


        Stefan

^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
  2008-12-02 7:36 ` Stefan Monnier
@ 2008-12-04 19:43   ` Reiner Steib
  2008-12-04 21:43     ` Frank Schmitt
  0 siblings, 1 reply; 21+ messages in thread
From: Reiner Steib @ 2008-12-04 19:43 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Simon Josefsson, Frank Schmitt, ding, 1174

On Tue, Dec 02 2008, Stefan Monnier wrote:

>> In Emacs 21 (which Gnus still aims to be compatible with), we have
>> string-as-multibyte, but not string-to-multibyte. So your proposed
>> code (i.e. mm-string-to-multibyte) runs
>
>>   (string-as-multibyte (char-to-string string))
>> whereas we used to run
>>   (string-as-multibyte string)
>> Does char-to-string matter here?
[...]
>>     (lambda (string)
>>       "Return a multibyte string with the same individual chars as string."
>>       (mapconcat
>>        (lambda (ch) (mm-string-as-multibyte (char-to-string ch)))
>>        string "")))))
>
> Oh, that's clever: yes, the mapconcat/char-to-string dance does make it
> implement the string-to-multibyte behavior because doing the
> string-as-multibyte conversion one byte at a time avoids the
> problematic case.

Good. So I think you can close this bug. Thanks.

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/

^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
  2008-12-04 19:43 ` Reiner Steib
@ 2008-12-04 21:43   ` Frank Schmitt
  0 siblings, 0 replies; 21+ messages in thread
From: Frank Schmitt @ 2008-12-04 21:43 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Simon Josefsson, ding, 1174

Reiner Steib <reinersteib+gmane@imap.cc> writes:

> On Tue, Dec 02 2008, Stefan Monnier wrote:
>
>>> In Emacs 21 (which Gnus still aims to be compatible with), we have
>>> string-as-multibyte, but not string-to-multibyte. So your proposed
>>> code (i.e. mm-string-to-multibyte) runs
>>
>>>   (string-as-multibyte (char-to-string string))
>>> whereas we used to run
>>>   (string-as-multibyte string)
>>> Does char-to-string matter here?
> [...]
>>>     (lambda (string)
>>>       "Return a multibyte string with the same individual chars as string."
>>>       (mapconcat
>>>        (lambda (ch) (mm-string-as-multibyte (char-to-string ch)))
>>>        string "")))))
>>
>> Oh, that's clever: yes, the mapconcat/char-to-string dance does make it
>> implement the string-to-multibyte behavior because doing the
>> string-as-multibyte conversion one byte at a time avoids the
>> problematic case.
>
> Good. So I think you can close this bug. Thanks.

Yes, I can confirm the bug is fixed in CVS head.

-- 
Have you ever considered how much text can fit in eighty columns? Given that a
signature typically contains up to four lines of text, this space allows you to
attach a tremendous amount of valuable information to your messages. Seize the
opportunity and don't waste your signature on bullshit that nobody cares about.

^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2009-01-14 22:08 UTC | newest]

Thread overview: 21+ messages
  [not found] <m37i89boe6.fsf@mid.gehheimdienst.de>
  [not found] ` <jeabd5vvug.fsf@sykes.suse.de>
  [not found] ` <m363ntey23.fsf@mid.gehheimdienst.de>
  [not found] ` <u63nso2wk.fsf@gnu.org>
  [not found] ` <jebpxjvd53.fsf@sykes.suse.de>
  [not found] ` <m3ej2fd2o0.fsf@mid.gehheimdienst.de>
  [not found] ` <umyh3mttj.fsf@gnu.org>
  [not found] ` <m33aivfsa8.fsf@mid.gehheimdienst.de>
  [not found] ` <jwv8wsnjknd.fsf-monnier+emacsbugreports@gnu.org>
  2008-10-17 18:19 ` bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23 Reiner Steib
  2008-10-17 18:36 ` Frank Schmitt
  2008-11-29 12:08 ` Reiner Steib
  2008-11-29 12:18 ` Simon Josefsson
  2008-11-29 15:30 ` Reiner Steib
  2008-11-29 21:30 ` Stefan Monnier
  2008-11-30 13:12 ` Reiner Steib
  2008-11-29 22:14 ` James Cloos
  2008-11-30 13:11 ` View articles with different charset (was: bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23) Reiner Steib
  2008-11-30 21:23 ` View articles with different charset James Cloos
  2009-01-12 10:54 ` bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23 Simon Josefsson
  2009-01-12 11:03 ` Frank Schmitt
  2009-01-12 11:10 ` Simon Josefsson
  2009-01-14 11:20 ` Gabor Z. Papp
  2009-01-14 20:12 ` Reiner Steib
  2009-01-14 22:08 ` Stefan Monnier
  2008-12-01 21:04 ` Stefan Monnier
  2008-12-01 22:48 ` Reiner Steib
  2008-12-02 7:36 ` Stefan Monnier
  2008-12-04 19:43 ` Reiner Steib
  2008-12-04 21:43 ` Frank Schmitt