From: Peter Stephenson <pws@csr.com>
To: zsh-workers@sunsite.dk (Zsh hackers list)
Subject: Re: PATCH: ordering of hash table scans
Date: Wed, 7 Feb 2007 10:10:47 +0000
Message-ID: <20070207101047.e4bff2a7.pws@csr.com>
In-Reply-To: <200702062142.l16LgOoa007853@pwslaptop.csr.com>

Peter Stephenson <p.w.stephenson@ntlworld.com> wrote:
> This started off as an attempt to make ztrcmp() handle multibyte
> characters when MULTIBYTE is turned on, which I've done.  ztrcmp() is
> only used when sorting the names of hash nodes for use when scanning
> through hash tables, usually in order to print them out:  this is used
> for commands of various sorts, parameters, and a few other miscellaneous
> bits and pieces.
> 
> I have a vague memory that it's deliberate that we don't use strcoll()
> here, which would make the sorting locale dependent and in particular
> possibly case-insensitive.

This gave me a sleepless night.  If we're not using strcoll(), it seems
overkill to convert every single multibyte character to a wide character on
every comparison between two strings when sorting.  So I've put that bit
back and stuck a note at the top.
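
For anyone who wants to convince themselves of the UTF-8 claim in that note, here is a minimal standalone sketch, not part of the patch. It encodes code points as UTF-8 and compares the raw bytes as unsigned values, which is the property the note relies on. The helper names utf8_encode() and raw_utf8_cmp() are hypothetical and exist only for this illustration; nothing like them is in the zsh source.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: encode one code point (up to U+10FFFF) as UTF-8.
 * Returns the number of bytes written.  No surrogate checking; this is
 * only meant to illustrate byte ordering.
 */
int utf8_encode(unsigned long cp, unsigned char *out)
{
    assert(cp <= 0x10FFFFUL);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/*
 * Hypothetical helper: compare two code points by their raw UTF-8 bytes.
 * The buffers are zero-padded to four bytes; memcmp() compares bytes as
 * unsigned char, so no wide-character conversion is involved.
 * Returns -1, 0 or 1.
 */
int raw_utf8_cmp(unsigned long a, unsigned long b)
{
    unsigned char ba[4] = { 0, 0, 0, 0 };
    unsigned char bb[4] = { 0, 0, 0, 0 };
    int r;

    utf8_encode(a, ba);
    utf8_encode(b, bb);
    r = memcmp(ba, bb, sizeof(ba));
    return (r > 0) - (r < 0);
}
```

With these, raw_utf8_cmp(0x7F, 0x80) is negative even though the first encodes to one byte and the second to two, because the lead byte of a longer sequence is always numerically larger than that of a shorter one.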

Index: Src/utils.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/utils.c,v
retrieving revision 1.153
diff -u -r1.153 utils.c
--- Src/utils.c	6 Feb 2007 21:47:55 -0000	1.153
+++ Src/utils.c	7 Feb 2007 10:06:38 -0000
@@ -3693,61 +3693,46 @@
     return fn;
 }
 
-/* Unmetafy and compare two strings, comparing unsigned character values.
- * "a\0" sorts after "a".  */
+/*
+ * Unmetafy and compare two strings, comparing unsigned character values.
+ * "a\0" sorts after "a".
+ *
+ * Currently this is only used in hash table sorting, where the
+ * keys are names of hash nodes and where we don't use strcoll();
+ * it's not clear if that's right but it does guarantee the ordering
+ * of shell structures on output.
+ *
+ * As we don't use strcoll(), it seems overkill to convert multibyte
+ * characters to wide characters for comparison every time.  In the case
+ * of UTF-8, Unicode ordering is preserved when sorted raw, and for
+ * other character sets we rely on an extension of ASCII so the result,
+ * while it may not be correct, is at least rational.
+ */
 
 /**/
 int
 ztrcmp(char const *s1, char const *s2)
 {
-    convchar_t c1 = 0, c2;
-
-#ifdef MULTIBYTE_SUPPORT
-    if (isset(MULTIBYTE)) {
-	mb_metacharinit();
-	while (*s1) {
-	    int clen = mb_metacharlenconv(s1, &c1);
-
-	    if (strncmp(s1, s2, clen))
-		break;
-	    s1 += clen;
-	    s2 += clen;
-	}
-    } else
-#endif
-	while (*s1 && *s1 == *s2) {
-	    s1++;
-	    s2++;
-	}
+    int c1, c2;
 
-    if (!*s1) {
-	if (!*s2)
-	    return 0;
-	return -1;
-    }
-    if (!*s2)
-	return 1;
-#ifdef MULTIBYTE_SUPPORT
-    if (isset(MULTIBYTE)) {
-	/* TODO: shift state for s2 might be wrong? */
-	mb_metacharinit();
-	(void)mb_metacharlenconv(s2, &c2);
-	if (c1 == WEOF)
-	    c1 = STOUC(*s1 == Meta ? s1[1] ^ 32 : *s1);
-	if (c2 == WEOF)
-	    c2 = STOUC(*s2 == Meta ? s2[1] ^ 32 : *s2);
-    }
-    else
-#endif
-    {
-	c1 = STOUC(*s1 == Meta ? s1[1] ^ 32 : *s1);
-	c2 = STOUC(*s2 == Meta ? s2[1] ^ 32 : *s2);
-    }
+    while(*s1 && *s1 == *s2) {
+	s1++;
+	s2++;
+    }
+
+    if(!(c1 = *s1))
+	c1 = -1;
+    else if(c1 == STOUC(Meta))
+	c1 = *++s1 ^ 32;
+    if(!(c2 = *s2))
+	c2 = -1;
+    else if(c2 == STOUC(Meta))
+	c2 = *++s2 ^ 32;
 
-    if (c1 < c2)
-	return -1;
-    else if (c1 == c2)
+    if(c1 == c2)
 	return 0;
+    else if(c1 < c2)
+	return -1;
     else
 	return 1;
 }
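
To try the restored comparison outside the shell, the following is a rough standalone sketch of the same logic, not the code in the tree. It assumes the Meta byte is 0x83 and that a metafied byte is stored as Meta followed by the original byte XOR 32, and it casts to unsigned char up front rather than using STOUC(). The names SKETCH_META and sketch_ztrcmp() are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this sketch: zsh's Meta byte has the value 0x83. */
#define SKETCH_META 0x83

/*
 * Compare two metafied strings by unsigned byte value, unmetafying only
 * the first byte at which they differ.  End of string is treated as -1,
 * so a string sorts before any longer string it is a prefix of,
 * including one that continues with a metafied NUL.
 */
int sketch_ztrcmp(const char *a, const char *b)
{
    const unsigned char *s1 = (const unsigned char *)a;
    const unsigned char *s2 = (const unsigned char *)b;
    int c1, c2;

    assert(s1 != NULL && s2 != NULL);

    /* Skip the common prefix, byte by byte. */
    while (*s1 && *s1 == *s2) {
        s1++;
        s2++;
    }

    if (!(c1 = *s1))
        c1 = -1;                /* end of string sorts first */
    else if (c1 == SKETCH_META)
        c1 = *++s1 ^ 32;        /* recover the byte hidden behind Meta */
    if (!(c2 = *s2))
        c2 = -1;
    else if (c2 == SKETCH_META)
        c2 = *++s2 ^ 32;

    return (c1 > c2) - (c1 < c2);
}
```

Under those assumptions "a\0" is stored as the three bytes 'a', 0x83, ' ', and sketch_ztrcmp("a", "a\x83 ") returns -1, which matches the "a\0" sorts after "a" remark in the comment.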

-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070