Announcements and discussions for Gnus, the GNU Emacs Usenet newsreader
* gnus-search & imap: always "CHARSET UTF-8" when literal+ is supported
From: David Edmondson @ 2021-10-22  7:34 UTC
  To: info-gnus-english

Using current emacs git head, talking to outlook.office365.com over
IMAP.

Attempts to use gnus-search always fail with the server reporting:

(("NO" ("BADCHARSET" "(US-ASCII)") "The" "specified" "charset" "is" "not" "supported."))

Looking at gnus-search.el, `gnus-search-imap-search-command' always
sends CHARSET UTF-8 if the server supports literal+ (which this one
does). Sending US-ASCII (or no charset at all) causes the server to
return the required results in simple test cases.

Is there a way to determine whether a server supports UTF-8 in searches,
and adjust the command sent accordingly? If not, could the use of UTF-8
be controlled with (yet another!) variable?
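
A minimal sketch of what such a variable could look like follows. The
names `my-gnus-search-imap-use-utf-8' and
`my-gnus-search-imap-command-prefix' are hypothetical, chosen only for
illustration; gnus-search.el does not define them.

```elisp
;; Hypothetical user option; gnus-search.el defines no such variable.
(defvar my-gnus-search-imap-use-utf-8 t
  "When non-nil, include \"CHARSET UTF-8\" in IMAP search commands.")

(defun my-gnus-search-imap-command-prefix ()
  "Return the IMAP search command prefix, honoring the option above."
  (if my-gnus-search-imap-use-utf-8
      "UID SEARCH CHARSET UTF-8"
    "UID SEARCH"))
```

Setting the variable to nil for an Exchange account would then drop
the CHARSET argument from the command.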

Thanks.

dme.
-- 
You know, somebody somewhere owes us a favor.




* Re: gnus-search & imap: always "CHARSET UTF-8" when literal+ is supported
From: Eric Abrahamsen @ 2021-10-22 17:47 UTC
  To: info-gnus-english

David Edmondson <dme@dme.org> writes:

> Using current emacs git head, talking to outlook.office365.com over
> IMAP.
>
> Attempts to use gnus-search always fail with the server reporting:
>
> (("NO" ("BADCHARSET" "(US-ASCII)") "The" "specified" "charset" "is" "not" "supported."))
>
> Looking at gnus-search.el, `gnus-search-imap-search-command' always
> sends CHARSET UTF-8 if the server supports literal+ (which this one
> does). Sending US-ASCII (or no charset at all) causes the server to
> return the required results in simple test cases.
>
> Is there a way to determine whether a server supports UTF-8 in searches,
> and adjust the command sent accordingly? If not, could the use of UTF-8
> be controlled with (yet another!) variable?

A bit of internet research seems to indicate that Exchange can't handle
UTF-8 encoded search strings, and also there's no way to test that in
advance apart from simply seeing if it errors. Awesome!

I think this means it's impossible to search for non-ASCII text on an
Exchange server (can that be true?!). If so, the IMAP search command
should send CHARSET UTF-8 only when the query actually contains a
multibyte string. You won't be able to search for a multibyte string on
such a server anyway.
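
The test described above can be sketched roughly as follows. The
function names are hypothetical, and this is not the code from
gnus-search.el; it only shows one way to decide from the query text
whether CHARSET UTF-8 is needed. It also ignores the literal encoding
that a real non-ASCII query needs on the wire.

```elisp
(defun my-imap-query-non-ascii-p (query)
  "Return non-nil if QUERY contains any non-ASCII character."
  (string-match-p "[^[:ascii:]]" query))

(defun my-imap-search-command-string (query)
  "Return an IMAP search command string for QUERY.
CHARSET UTF-8 is included only when QUERY has non-ASCII text."
  (if (my-imap-query-non-ascii-p query)
      (format "UID SEARCH CHARSET UTF-8 %s" query)
    (format "UID SEARCH %s" query)))
```

With this shape, a pure-ASCII query never mentions a charset, so a
server that rejects UTF-8 still answers it.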

Would you try eval'ling the below, and tell me if it works okay when
searching for a string with no non-ascii characters in it?

Also, when you do get the error message above, how does that present to
the user? Did you have to go digging to find it?


(cl-defmethod gnus-search-imap-search-command ((engine gnus-search-imap)
					       (query string))
  "Create the IMAP search command for QUERY.
Currently takes into account support for the LITERAL+ capability.
Other capabilities could be tested here."
  (with-slots (literal-plus) engine
    (when (and literal-plus
	       (string-match-p "\n" query))
      (setq query (split-string query "\n")))
    (cond
     ((consp query)
      ;; We're not really streaming, just need to prevent
      ;; `nnimap-send-command' from waiting for a response.
      (let* ((nnimap-streaming t)
	     (call
	      (nnimap-send-command
	       "UID SEARCH CHARSET UTF-8 %s"
	       (pop query))))
	(dolist (l query)
	  (process-send-string (get-buffer-process (current-buffer)) l)
	  (process-send-string (get-buffer-process (current-buffer))
			       (if (nnimap-newlinep nnimap-object)
				   "\n"
				 "\r\n")))
	(nnimap-get-response call)))
     (t (nnimap-command "UID SEARCH %s" query)))))




* Re: gnus-search & imap: always "CHARSET UTF-8" when literal+ is supported
From: David Edmondson @ 2021-10-25  8:19 UTC
  To: info-gnus-english

On Friday, 2021-10-22 at 10:47:42 -07, Eric Abrahamsen wrote:

> A bit of internet research seems to indicate that Exchange can't handle
> UTF-8 encoded search strings, and also there's no way to test that in
> advance apart from simply seeing if it errors. Awesome!

That's also the impression I gained.

> I think this means it's impossible to search for non-ASCII text on an
> Exchange server (can that be true?!). If so, the IMAP search command
> should send CHARSET UTF-8 only when the query actually contains a
> multibyte string. You won't be able to search for a multibyte string on
> such a server anyway.
>
> Would you try eval'ling the below, and tell me if it works okay when
> searching for a string with no non-ascii characters in it?

This works in a few simple tests, yes. Thanks!

> Also, when you do get the error message above, how does that present to
> the user? Did you have to go digging to find it?

I had to dig. The observed behaviour is that no messages match the
search.

dme.
-- 
I've been waiting for so long, to come here now and sing this song.




* Re: gnus-search & imap: always "CHARSET UTF-8" when literal+ is supported
From: Eric Abrahamsen @ 2021-10-25 16:37 UTC
  To: info-gnus-english

David Edmondson <dme@dme.org> writes:

> On Friday, 2021-10-22 at 10:47:42 -07, Eric Abrahamsen wrote:
>
>> A bit of internet research seems to indicate that Exchange can't handle
>> UTF-8 encoded search strings, and also there's no way to test that in
>> advance apart from simply seeing if it errors. Awesome!
>
> That's also the impression I gained.
>
>> I think this means it's impossible to search for non-ASCII text on an
>> Exchange server (can that be true?!). If so, the IMAP search command
>> should send CHARSET UTF-8 only when the query actually contains a
>> multibyte string. You won't be able to search for a multibyte string on
>> such a server anyway.
>>
>> Would you try eval'ling the below, and tell me if it works okay when
>> searching for a string with no non-ascii characters in it?
>
> This works in a few simple tests, yes. Thanks!

Great, I'll put this change in.

>> Also, when you do get the error message above, how does that present to
>> the user? Did you have to go digging to find it?
>
> I had to dig. The observed behaviour is that no messages match the
> search.

I'm hoping to make some changes to Gnus error reporting that should make
this less difficult. The user should definitely see the difference
between "no results" and an actual error.
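
As a rough illustration of that distinction, a check along these lines
could tell a server refusal apart from an empty result. It assumes the
parsed response has the shape shown in the BADCHARSET report earlier in
the thread: a list whose first element starts with the status string.
The function name is hypothetical, and real nnimap responses may take
other shapes.

```elisp
(require 'seq)

(defun my-imap-response-error (response)
  "Return an error description if RESPONSE is a NO or BAD reply.
RESPONSE is assumed to be a list of lists, as in the BADCHARSET
report quoted in this thread.  Return nil for any other response."
  (let ((line (car response)))
    (when (member (car-safe line) '("NO" "BAD"))
      ;; Join the human-readable words, skipping sublists such as
      ;; the bracketed response code.
      (mapconcat #'identity
                 (seq-filter #'stringp (cdr line))
                 " "))))
```

A caller could pass a non-nil return value to `user-error' rather than
reporting that no messages matched.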

Thanks for the report.



