From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/2807
Path: news.gmane.org!not-for-mail
From: Rich Felker <dalias@aerifal.cx>
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: strverscmp
Date: Sun, 17 Feb 2013 13:39:44 -0500
Message-ID: <20130217183944.GF20323@brightrain.aerifal.cx>
References: <20130201071053.GA14593@brightrain.aerifal.cx>
 <20130201182340.c60061b6.idunham@lavabit.com>
 <20130202023833.GW20323@brightrain.aerifal.cx>
 <20130201200452.26687bed.idunham@lavabit.com>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: ger.gmane.org 1361126397 17319 80.91.229.3 (17 Feb 2013 18:39:57 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Sun, 17 Feb 2013 18:39:57 +0000 (UTC)
To: musl@lists.openwall.com
Original-X-From: musl-return-2808-gllmg-musl=m.gmane.org@lists.openwall.com Sun Feb 17 19:40:17 2013
Return-path: <musl-return-2808-gllmg-musl=m.gmane.org@lists.openwall.com>
Envelope-to: gllmg-musl@plane.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by plane.gmane.org with smtp (Exim 4.69)
	(envelope-from <musl-return-2808-gllmg-musl=m.gmane.org@lists.openwall.com>)
	id 1U799t-0001CX-As
	for gllmg-musl@plane.gmane.org; Sun, 17 Feb 2013 19:40:17 +0100
Original-Received: (qmail 11327 invoked by uid 550); 17 Feb 2013 18:39:57 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
Original-Received: (qmail 11319 invoked from network); 17 Feb 2013 18:39:56 -0000
Content-Disposition: inline
In-Reply-To: <20130201200452.26687bed.idunham@lavabit.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Xref: news.gmane.org gmane.linux.lib.musl.general:2807
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/2807>

Hey, sorry it's taken me so long to get to reading and reviewing this
well. Here are some comments, mostly stylistic, but one's about a
corner-case bug. See below. With that said, the algorithm looks right
to me, and I think we can commit soon.

Rich


On Fri, Feb 01, 2013 at 08:04:52PM -0800, Isaac Dunham wrote:
> On Fri, 1 Feb 2013 21:38:33 -0500
> Rich Felker <dalias@aerifal.cx> wrote:
> 
> > > Fix strverscmp (patch same as the last time I sent it)
> > 
> > I'm not sure whether we got it into a fully-working state or not; the
> > conversation kinda died out last time. I'll review it again too. I
> > remember it didn't look quite like the algorithm I described/proposed,
> > but that doesn't mean it's wrong. It looked like it could at least use
> > some streamlining though.
> 
> The last review was just before I got it working. Here's the final version.
> 
> Probably somewhere it could be optimized, though...
> As long as it doesn't end up looking like the GNU one.
> 
> -- 
> Isaac Dunham <idunham@lavabit.com>

> diff --git a/src/string/strverscmp.c b/src/string/strverscmp.c
> index 7054967..8f3f11f 100644
> --- a/src/string/strverscmp.c
> +++ b/src/string/strverscmp.c
> @@ -1,7 +1,41 @@
> +#define _GNU_SOURCE
> +#include <ctype.h>
>  #include <string.h>
>  
>  int strverscmp(const char *l, const char *r)
>  {
> -	/* FIXME */
> -	return strcmp(l, r);
> +	int haszero=1;
> +	while (*l && *r && l[0]==r[0]){

I used to do a lot of while loops like this, but now I feel like it
makes it hard to follow the end-of-string logic -- a lot more code
runs after the null terminator is hit, and it's nontrivial to follow
through it all and make sure the right thing happens, as the
subsequent code after the loop is now handling two very-distinct
loop-exit conditions:

1. Loop exited due to mismatch.
2. Loop exited due to hitting end-of-string with no mismatch.

What about instead doing something like this:

	while (*l==*r) {
		if (!*l) return 0;
		/* ... */

That way, the "equal strings" case is trivially correct. It also might
generate better code.

Note that I also made a couple stylistic changes here; unless you
object, I prefer *p to p[0] when p is just being used as a moving
pointer within a string, always accessed with offset 0. It's just more
concise/less cluttered and avoids suggesting it's used as a string in
itself. Also added a space before the open brace, for consistency with
style elsewhere.


> +		if (l[0]=='0'){
> +			if (haszero==1) {
> +				haszero=0;
> +			}
> +		} else if (isdigit(l[0])) {
> +			if (haszero==1) {
> +				haszero=2;
> +			}
> +		} else {
> +			haszero=1;
> +		}
> +		l++; r++;
> +	}
> +	if (haszero==1 && (l[0]=='0' || r[0]=='0')) {
> +		haszero=0;
> +	}
> +	if ((isdigit(l[0]) && isdigit(r[0]) ) && haszero) {
> +		int lenl=0, lenr=0, firstl=l[0], firstr=r[0];

These variables have the wrong types. int is not sufficient to store a
character count in a string. You need size_t.

> +		while (isdigit(l++[0]) ) {
> +			lenl++;
> +		}
> +		while (isdigit(r++[0]) ) {
> +			lenr++;
> +		}
> +		if (lenl==lenr) {
> +			return (firstl -  firstr);
> +		} else {
> +			return (lenl - lenr);
> +		}

I see what's going on here, but I find saving the first mismatching
characters (btw, you could have just saved the difference) then
continuing to advance the l/r pointers a bit confusing at first. Why
not instead do:

		while (isdigit(l[lenl])) lenl++;