[PATCH] conforming strverscmp() implementation

mailing list of musl libc
 help / color / mirror / code / Atom feed

* [PATCH] conforming strverscmp() implementation
@ 2015-03-03 10:45 Sergey Dmitrouk
  2015-03-03 15:54 ` Rich Felker
  0 siblings, 1 reply; 3+ messages in thread
From: Sergey Dmitrouk @ 2015-03-03 10:45 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 984 bytes --]

Hi,

Attached is strverscmp() that should conform to what is described in
manual pages.  It's not actually a diff as it would be harder to read
in this case.

The idea is just as described in manuals (once you get what they mean):

1. Find point where strings differ as in current code, but keep track
   of locations of last seen numbers.
2. If numbers are both integers (that is they don't start with zero or
   it's a single zero) compare as in current implementation.
3. If only one of numbers is integer, the other one is smaller.
4. Otherwise, number with longer leading sequence of zeroes is smaller.
5. If leading zeroes match, compare characters following last zero in
   both strings.

I didn't see much comments in musl, so there is none in this file, but
can add some if requested.

Implementation is admittedly more sophisticated than current version for
the reason that all simpler approaches I tried fail for at least one of
possible comparison cases.

Regards,
Sergey

[-- Attachment #2: strverscmp.c --]
[-- Type: text/plain, Size: 1030 bytes --]

#define _GNU_SOURCE
#include <ctype.h>
#include <string.h>

int strverscmp(const char *l, const char *r)
{
	const char *ln=(isdigit(*l) ? l : NULL), *rn=(isdigit(*r) ? r : NULL);
	while (*l==*r) {
		if (!*l) return 0;

		if (isdigit(*l)) {
			if (ln == NULL) {
				ln = l; rn = r;
			}
		} else {
			ln = NULL; rn = NULL;
		}

		l++; r++;
	}

	if ((*l != '\0' && !isdigit(*l)) || (*r != '\0' && !isdigit(*r))) {
		ln = NULL; rn = NULL;
	}

	if (ln != NULL) {
		int intl=(*ln != '0' || !isdigit(*(ln + 1)));
		int intr=(*rn != '0' || !isdigit(*(rn + 1)));

		if (intl ^ intr) {
			return intl ? 1 : -1;
		} else if (intl) {
			size_t lenl=0, lenr=0;
			while (isdigit(l[lenl])) lenl++;
			while (isdigit(r[lenr])) lenr++;
			if (lenl==lenr) {
				return (*l - *r);
			} else if (lenl>lenr) {
				return 1;
			} else {
				return -1;
			}
		} else {
			size_t zl=0, zr=0;
			while (ln[zl]=='0') zl++;
			while (rn[zr]=='0') zr++;
			if (zl>zr) {
				return -1;
			} else if (zl<zr) {
				return 1;
			}
		}
	}

	return (*l - *r);
}

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] conforming strverscmp() implementation
  2015-03-03 10:45 [PATCH] conforming strverscmp() implementation Sergey Dmitrouk
@ 2015-03-03 15:54 ` Rich Felker
  2015-03-03 19:27   ` Sergey Dmitrouk
  0 siblings, 1 reply; 3+ messages in thread
From: Rich Felker @ 2015-03-03 15:54 UTC (permalink / raw)
  To: musl

On Tue, Mar 03, 2015 at 12:45:07PM +0200, Sergey Dmitrouk wrote:
> Hi,
> 
> Attached is strverscmp() that should conform to what is described in
> manual pages.  It's not actually a diff as it would be harder to read
> in this case.
> 
> The idea is just as described in manuals (once you get what they mean):
> 
> 1. Find point where strings differ as in current code, but keep track
>    of locations of last seen numbers.
> 2. If numbers are both integers (that is they don't start with zero or
>    it's a single zero) compare as in current implementation.
> 3. If only one of numbers is integer, the other one is smaller.
> 4. Otherwise, number with longer leading sequence of zeroes is smaller.
> 5. If leading zeroes match, compare characters following last zero in
>    both strings.
> 
> I didn't see much comments in musl, so there is none in this file, but
> can add some if requested.
> 
> Implementation is admittedly more sophisticated than current version for
> the reason that all simpler approaches I tried fail for at least one of
> possible comparison cases.

Thanks. This has been an open issue for a while. I haven't looked at
your code in detail yet and at this point it will probably be
something I do after the 1.1.7 release, and I'd like to allow some
time for review. Have you run libc-test against it and checked that it
fixes all the test failures there? I would guess it does since they're
based on the man page examples but it would be good to double-check
anyway if you haven't.

Rich


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] conforming strverscmp() implementation
  2015-03-03 15:54 ` Rich Felker
@ 2015-03-03 19:27   ` Sergey Dmitrouk
  0 siblings, 0 replies; 3+ messages in thread
From: Sergey Dmitrouk @ 2015-03-03 19:27 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 1499 bytes --]

On Tue, Mar 03, 2015 at 07:54:11AM -0800, Rich Felker wrote:
> Have you run libc-test against it and checked that it
> fixes all the test failures there? I would guess it does since they're
> based on the man page examples but it would be good to double-check
> anyway if you haven't.

Yes, related libc-test regression tests pass now.  I also compared
results with glibc implementation and it seems to be correct for
numbers, but I just run check again in directory with some non-ASCII file
names and got:

glibc:      . .. 000 00 01 010 02 03 04 05 06 07 08 09 0 1 3 9 10 11 русские буквы в имени
musl (new): русские буквы в имени . .. 000 00 01 010 02 03 04 05 06 07 08 09 0 1 3 9 10 11
musl (old): русские буквы в имени . .. 0 00 000 01 010 02 03 04 05 06 07 08 09 1 3 9 10 11

You can see that sorting of Cyrillic file names differs from glibc in
UTF-8 locale.  I believe it's another bug and this one is related to:

    return (*l - *r);

which should be changed (maybe even for 1.1.6) to something equivalent to:

    return ((unsigned char)*l - (unsigned char)*r);

in which case results become:

glibc:      . .. 000 00 01 010 02 03 04 05 06 07 08 09 0 1 3 9 10 11 русские буквы в имени
musl (new): . .. 000 00 01 010 02 03 04 05 06 07 08 09 0 1 3 9 10 11 русские буквы в имени
musl (old): . .. 0 00 000 01 010 02 03 04 05 06 07 08 09 1 3 9 10 11 русские буквы в имени

-- 
Sergey

[-- Attachment #2: strverscmp.c --]
[-- Type: text/plain, Size: 1090 bytes --]

#define _GNU_SOURCE
#include <ctype.h>
#include <string.h>

int strverscmp(const char *l, const char *r)
{
	const char *ln=(isdigit(*l) ? l : NULL), *rn=(isdigit(*r) ? r : NULL);
	while (*l==*r) {
		if (!*l) return 0;

		if (isdigit(*l)) {
			if (ln == NULL) {
				ln = l; rn = r;
			}
		} else {
			ln = NULL; rn = NULL;
		}

		l++; r++;
	}

	if ((*l != '\0' && !isdigit(*l)) || (*r != '\0' && !isdigit(*r))) {
		ln = NULL; rn = NULL;
	}

	if (ln != NULL) {
		int intl=(*ln != '0' || !isdigit(*(ln + 1)));
		int intr=(*rn != '0' || !isdigit(*(rn + 1)));

		if (intl ^ intr) {
			return intl ? 1 : -1;
		} else if (intl) {
			size_t lenl=0, lenr=0;
			while (isdigit(l[lenl])) lenl++;
			while (isdigit(r[lenr])) lenr++;
			if (lenl==lenr) {
				return ((unsigned char)*l - (unsigned char)*r);
			} else if (lenl>lenr) {
				return 1;
			} else {
				return -1;
			}
		} else {
			size_t zl=0, zr=0;
			while (ln[zl]=='0') zl++;
			while (rn[zr]=='0') zr++;
			if (zl>zr) {
				return -1;
			} else if (zl<zr) {
				return 1;
			}
		}
	}

	return ((unsigned char)*l - (unsigned char)*r);
}

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2015-03-03 19:27 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-03 10:45 [PATCH] conforming strverscmp() implementation Sergey Dmitrouk
2015-03-03 15:54 ` Rich Felker
2015-03-03 19:27   ` Sergey Dmitrouk

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).