Non-stub gettext API functions committed, ready for testing

mailing list of musl libc

* Non-stub gettext API functions committed, ready for testing
@ 2014-07-27  8:46 Rich Felker
  2014-07-27 10:06 ` Harald Becker
  2014-07-27 10:19 ` Harald Becker
  0 siblings, 2 replies; 14+ messages in thread
From: Rich Felker @ 2014-07-27  8:46 UTC (permalink / raw)
  To: musl

As of commit 2068b4e8911a3a49cded44b4568f6c943a8c98f8, it should now
be possible to support message translation at the application level
using the gettext (libintl.h) functions provided by musl. Feedback
from users interested in this functionality would be much appreciated!
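
For anyone testing, a minimal sketch of application-level use of the libintl.h interface might look like the following. The domain name and catalog directory are hypothetical placeholders, not something musl defines; with no matching catalog installed, the lookups are expected to return the original English strings.

```c
#include <libintl.h>
#include <locale.h>

/* Hypothetical domain name and catalog directory, for illustration only. */
#define DEMO_DOMAIN    "musl-gettext-demo"
#define DEMO_LOCALEDIR "/usr/share/locale"

/* Pick up the locale from the environment and bind the catalog location. */
static void demo_init_i18n(void)
{
	setlocale(LC_ALL, "");
	bindtextdomain(DEMO_DOMAIN, DEMO_LOCALEDIR);
	textdomain(DEMO_DOMAIN);
}

/* Plain lookup: falls back to the msgid when no translation is found. */
static const char *demo_greeting(void)
{
	return gettext("Hello, world");
}

/* Plural-aware lookup: without a catalog, n == 1 selects the first
 * string and any other n selects the second. */
static const char *demo_files(unsigned long n)
{
	return ngettext("file", "files", n);
}
```

Calling demo_init_i18n() once at startup and then demo_greeting() or demo_files(n) is enough to exercise both the plain and the plural code paths.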

Note that some (many?) applications may attempt to use their own
included gettext rather than the one in libc, so before testing it
would be helpful to check this and make sure the functions in musl are
actually getting called. I'm not familiar with what types of checks
typical autoconf scripts do to choose whether to use libc gettext or
their own, so information on this topic would also be helpful,
especially if there's anything we could do to get apps to choose the
one in musl rather than pulling in their own bloated copy of GNU
gettext.

As mentioned in the commit message, some functionality is still
missing. For the plurals stuff, I can't find the information on how
you actually get the plural rules out of the .mo file and apply them.
For the LANGUAGE variable, it's just a matter of adding some
loop-and-retry logic.
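
The loop-and-retry idea can be sketched roughly as below. This is only an illustration under assumptions: the callback type and both function names are hypothetical, and the real lookup would try to open a catalog for each entry instead of comparing tags.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: try one language tag, return nonzero on success. */
typedef int (*try_lang_fn)(const char *lang, size_t len, void *ctx);

/* Walk a colon-separated list such as "de_AT:de:en" and stop at the
 * first entry the callback accepts. Returns 1 if one matched, else 0. */
static int for_each_language(const char *list, try_lang_fn try_lang, void *ctx)
{
	while (*list) {
		size_t len = strcspn(list, ":");
		if (len && try_lang(list, len, ctx))
			return 1;
		list += len;
		if (*list == ':')
			list++;
	}
	return 0;
}

/* Example callback: accept only the tag passed in ctx. */
static int match_tag(const char *lang, size_t len, void *ctx)
{
	const char *want = ctx;
	return strlen(want) == len && memcmp(lang, want, len) == 0;
}
```

Empty entries (as in "fr::it") are skipped, and an empty list simply reports no match.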

Rich


* Re: Non-stub gettext API functions committed, ready for testing
  2014-07-27  8:46 Non-stub gettext API functions committed, ready for testing Rich Felker
@ 2014-07-27 10:06 ` Harald Becker
  2014-07-27 14:14   ` Szabolcs Nagy
  2014-07-27 10:19 ` Harald Becker
  1 sibling, 1 reply; 14+ messages in thread
From: Harald Becker @ 2014-07-27 10:06 UTC (permalink / raw)
  To: musl

Hi Rich!
> As mentioned in the commit message, some functionality is still
> missing. For the plurals stuff, I can't find the information on how
> you actually get the plural rules out of the .mo file and apply them.
> For the LANGUAGE variable, it's just a matter of adding some
> loop-and-retry logic.

Does this text help to clarify the plurals question?

http://www.gnu.org/software/gettext/manual/gettext.html#Plural-forms




* Re: Non-stub gettext API functions committed, ready for testing
  2014-07-27  8:46 Non-stub gettext API functions committed, ready for testing Rich Felker
  2014-07-27 10:06 ` Harald Becker
@ 2014-07-27 10:19 ` Harald Becker
  1 sibling, 0 replies; 14+ messages in thread
From: Harald Becker @ 2014-07-27 10:19 UTC (permalink / raw)
  To: musl

 > [gettext plurals]

And this may also help:
http://localization-guide.readthedocs.org/en/latest/l10n/pluralforms.html




* Re: Non-stub gettext API functions committed, ready for testing
  2014-07-27 10:06 ` Harald Becker
@ 2014-07-27 14:14   ` Szabolcs Nagy
  2014-07-27 16:49     ` Rich Felker
  0 siblings, 1 reply; 14+ messages in thread
From: Szabolcs Nagy @ 2014-07-27 14:14 UTC (permalink / raw)
  To: musl

* Harald Becker <ralda@gmx.de> [2014-07-27 12:06:01 +0200]:
> >As mentioned in the commit message, some functionality is still
> >missing. For the plurals stuff, I can't find the information on how
> >you actually get the plural rules out of the .mo file and apply them.
> >For the LANGUAGE variable, it's just a matter of adding some
> >loop-and-retry logic.
> 
> Does this text help to clarify the plurals question?
> 
> http://www.gnu.org/software/gettext/manual/gettext.html#Plural-forms

it shows that a c arithmetic expression parser is needed to handle plurals
(and the expression has to be evaluated every time dcngettext is invoked)

  Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;

  The nplurals value must be a decimal number which specifies how many
  different plural forms exist for this language. The string following
  plural is an expression which is using the C language syntax.
  Exceptions are that no negative numbers are allowed, numbers must be
  decimal, and the only variable allowed is n.
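
To make the role of the expression concrete: its value is an index into the translated plural forms, which the MO format stores as NUL-separated strings in a single entry. A minimal sketch of that selection step, with hypothetical function names and the English-style rule from the header line above written directly in C:

```c
#include <stddef.h>
#include <string.h>

/* Return the idx-th string among the NUL-terminated strings stored in
 * buf[0..len), or NULL if fewer than idx+1 terminated strings exist. */
static const char *nth_form(const char *buf, size_t len, unsigned long idx)
{
	const char *p = buf, *end = buf + len;
	while (p < end) {
		const char *nul = memchr(p, 0, (size_t)(end - p));
		if (!nul)
			return NULL;	/* unterminated tail: treat as malformed */
		if (idx == 0)
			return p;
		idx--;
		p = nul + 1;
	}
	return NULL;
}

/* plural=n == 1 ? 0 : 1 */
static unsigned long english_rule(unsigned long n)
{
	return n == 1 ? 0 : 1;
}
```

An index at or beyond the number of stored forms yields NULL here, so a caller can fall back to the untranslated string.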



* Re: Non-stub gettext API functions committed, ready for testing
  2014-07-27 14:14   ` Szabolcs Nagy
@ 2014-07-27 16:49     ` Rich Felker
  2014-07-27 17:23       ` Szabolcs Nagy
  0 siblings, 1 reply; 14+ messages in thread
From: Rich Felker @ 2014-07-27 16:49 UTC (permalink / raw)
  To: musl

On Sun, Jul 27, 2014 at 04:14:18PM +0200, Szabolcs Nagy wrote:
> * Harald Becker <ralda@gmx.de> [2014-07-27 12:06:01 +0200]:
> > >As mentioned in the commit message, some functionality is still
> > >missing. For the plurals stuff, I can't find the information on how
> > >you actually get the plural rules out of the .mo file and apply them.
> > >For the LANGUAGE variable, it's just a matter of adding some
> > >loop-and-retry logic.
> > 
> > Does this text help to clarify the plurals question?
> > 
> > http://www.gnu.org/software/gettext/manual/gettext.html#Plural-forms

I read that before but thought it shows how the data is written in po
files but not where to find it in the mo file... But now I see it's in
the "header" that's, by convention, the translation for "". How ugly..
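
A rough sketch of pulling the rule out of that header text, assuming the header is available as a NUL-terminated string. The function name is hypothetical, and strtoul is more lenient than the format strictly allows (it skips leading whitespace and accepts a sign):

```c
#include <stdlib.h>
#include <string.h>

/* Locate the text following "plural=" in the Plural-Forms field of a
 * NUL-terminated header and store the nplurals value. Returns a pointer
 * to the expression text, or NULL if the field is absent or malformed. */
static const char *find_plural_expr(const char *hdr, unsigned long *nplurals)
{
	const char *p = strstr(hdr, "Plural-Forms:");
	const char *q;
	char *end;

	if (!p) return NULL;
	q = strstr(p, "nplurals=");
	if (!q) return NULL;
	q += strlen("nplurals=");
	*nplurals = strtoul(q, &end, 10);
	if (end == q) return NULL;
	/* "nplurals=" does not itself contain "plural=", so searching
	 * from the end of the number finds the expression field. */
	q = strstr(end, "plural=");
	if (!q) return NULL;
	return q + strlen("plural=");
}
```

The returned pointer still includes the trailing ';' and whatever follows it in the header, so the consumer has to stop at the semicolon.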

> it shows that a c arithmetic expression parser is needed to handle plurals
> (and the expression has to be evaluated every time dcngettext is invoked)

Not necessarily. You could cache results. Or (this is likely the more
reasonable implementation) just hard-code the expression strings that
are actually used for real languages and implement them in C when a
match is found.
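
Roughly, the hard-coding approach could look like the sketch below. The table contents and all names are illustrative assumptions; exact spellings (including whitespace) differ between catalogs, so a real table would need some normalization first.

```c
#include <stddef.h>
#include <string.h>

typedef unsigned long (*plural_fn)(unsigned long n);

static unsigned long rule_one_form(unsigned long n) { (void)n; return 0; }
static unsigned long rule_not_one(unsigned long n) { return n != 1; }
static unsigned long rule_gt_one(unsigned long n) { return n > 1; }

/* Illustrative entries only. */
static const struct {
	const char *expr;
	plural_fn fn;
} known_rules[] = {
	{ "0;", rule_one_form },
	{ "n != 1;", rule_not_one },
	{ "n > 1;", rule_gt_one },
};

/* Compare the expression (given as base+length) against the table and
 * return the matching C implementation, or NULL if it is unknown. */
static plural_fn lookup_rule(const char *expr, size_t len)
{
	size_t i;
	for (i = 0; i < sizeof known_rules / sizeof known_rules[0]; i++)
		if (strlen(known_rules[i].expr) == len &&
		    memcmp(known_rules[i].expr, expr, len) == 0)
			return known_rules[i].fn;
	return NULL;
}
```

A NULL result is the case where some fallback is needed for expressions the table does not know.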

>   Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;
> 
>   The nplurals value must be a decimal number which specifies how many
>   different plural forms exist for this language. The string following
>   plural is an expression which is using the C language syntax.
>   Exceptions are that no negative numbers are allowed, numbers must be
>   decimal, and the only variable allowed is n.

This is a very poor description. Does it allow casts? Compound
literals? Floating point? Function calls? ...?

Rich



* Re: Non-stub gettext API functions committed, ready for testing
  2014-07-27 16:49     ` Rich Felker
@ 2014-07-27 17:23       ` Szabolcs Nagy
  2014-07-27 17:36         ` Rich Felker
  0 siblings, 1 reply; 14+ messages in thread
From: Szabolcs Nagy @ 2014-07-27 17:23 UTC (permalink / raw)
  To: musl

* Rich Felker <dalias@libc.org> [2014-07-27 12:49:21 -0400]:
> On Sun, Jul 27, 2014 at 04:14:18PM +0200, Szabolcs Nagy wrote:
> > it shows that a c arithmetic expression parser is needed to handle plurals
> > (and the expression has to be evaluated every time dcngettext is invoked)
> 
> Not necessarily. You could cache results. Or (this is likely the more
> reasonable implementation) just hard-code the expression strings that
> are actually used for real languages and implement them in C when a
> match is found.
> 

hardcoding the strings will fail if .mo files are updated to
use different expressions

with caching the expr has to be evaluated for every uncached n
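
to illustrate the caching point, a small direct-mapped cache might look like this sketch. The slot count and all names are hypothetical, and thread safety, which a libc would need, is ignored here:

```c
#include <stddef.h>

#define PLCACHE_SLOTS 16	/* hypothetical size */

struct plcache {
	unsigned long n[PLCACHE_SLOTS];
	unsigned long idx[PLCACHE_SLOTS];
	unsigned char valid[PLCACHE_SLOTS];
};

typedef unsigned long (*plural_eval_fn)(unsigned long n);

/* Return the plural index for n, calling eval only on a cache miss. */
static unsigned long cached_plural(struct plcache *c, plural_eval_fn eval,
                                   unsigned long n)
{
	size_t slot = n % PLCACHE_SLOTS;
	if (!c->valid[slot] || c->n[slot] != n) {
		c->n[slot] = n;
		c->idx[slot] = eval(n);	/* miss: evaluate the expression */
		c->valid[slot] = 1;
	}
	return c->idx[slot];
}

/* Stand-in evaluator that counts how often it is invoked. */
static unsigned long eval_calls;
static unsigned long counting_rule(unsigned long n)
{
	eval_calls++;
	return n != 1;
}
```

a repeated n costs one comparison, while each new n (or a colliding one) still pays for a full evaluation.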

> >   Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;
> > 
> >   The nplurals value must be a decimal number which specifies how many
> >   different plural forms exist for this language. The string following
> >   plural is an expression which is using the C language syntax.
> >   Exceptions are that no negative numbers are allowed, numbers must be
> >   decimal, and the only variable allowed is n.
> 
> This is a very poor description. Does it allow casts? Compound
> literals? Floating point? Function calls? ...?
> 

the parser in gnu gettext:
http://git.savannah.gnu.org/cgit/gettext.git/tree/gettext-runtime/intl/plural.y

so they implement
 conditional (?:)
 logic (&&, ||, !)
 relational (==, !=, <, >, <=, >=)
 and arithmetic (+, -, *, /, %)
operators with unsigned long args only

but in practice the only arithmetic operator used is %
and operators are not combined arbitrarily



* Re: Non-stub gettext API functions committed, ready for testing
  2014-07-27 17:23       ` Szabolcs Nagy
@ 2014-07-27 17:36         ` Rich Felker
  2014-07-27 17:51           ` Szabolcs Nagy
  0 siblings, 1 reply; 14+ messages in thread
From: Rich Felker @ 2014-07-27 17:36 UTC (permalink / raw)
  To: musl

On Sun, Jul 27, 2014 at 07:23:09PM +0200, Szabolcs Nagy wrote:
> * Rich Felker <dalias@libc.org> [2014-07-27 12:49:21 -0400]:
> > On Sun, Jul 27, 2014 at 04:14:18PM +0200, Szabolcs Nagy wrote:
> > > it shows that a c arithmetic expression parser is needed to handle plurals
> > > (and the expression has to be evaluated every time dcngettext is invoked)
> > 
> > Not necessarily. You could cache results. Or (this is likely the more
> > reasonable implementation) just hard-code the expression strings that
> > are actually used for real languages and implement them in C when a
> > match is found.
> > 
> 
> hardcoding the strings will fail if .mo files are updated to
> use different expressions
> 
> with caching the expr has to be evaluated for every uncached n

Yes.

> > >   Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;
> > > 
> > >   The nplurals value must be a decimal number which specifies how many
> > >   different plural forms exist for this language. The string following
> > >   plural is an expression which is using the C language syntax.
> > >   Exceptions are that no negative numbers are allowed, numbers must be
> > >   decimal, and the only variable allowed is n.
> > 
> > This is a very poor description. Does it allow casts? Compound
> > literals? Floating point? Function calls? ...?
> > 
> 
> the parser in gnu gettext:
> http://git.savannah.gnu.org/cgit/gettext.git/tree/gettext-runtime/intl/plural.y
> 
> so they implement
>  conditional (?:)
>  logic (&&, ||, !)
>  relational (==, !=, <, >, <=, >=)
>  and arithmetic (+, -, *, /, %)
> operators with unsigned long args only

And parentheses?

From what I can tell, that's not so bad. Anyone feel like writing an
expression evaluator for it? I think recursive descent is fine as long
as the length of the string being evaluated is capped at a sane length
(or just keep a depth counter and abort the evaluation if it exceeds
some reasonable limit).
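
As a sketch of the depth-counter idea only (a toy grammar with decimal operands, '+' and parentheses, not the plural grammar; all names are made up for the example):

```c
struct rd {
	const char *s;	/* current position in a NUL-terminated string */
	int err;
};

static unsigned long rd_expr(struct rd *st, int depth);

/* Term = decimal | '(' Expr ')' */
static unsigned long rd_term(struct rd *st, int depth)
{
	unsigned long v = 0;
	if (*st->s == '(') {
		st->s++;
		v = rd_expr(st, depth - 1);	/* one level deeper */
		if (*st->s != ')') { st->err = 1; return 0; }
		st->s++;
		return v;
	}
	if (*st->s < '0' || *st->s > '9') { st->err = 1; return 0; }
	while (*st->s >= '0' && *st->s <= '9')
		v = v * 10 + (unsigned long)(*st->s++ - '0');
	return v;
}

/* Expr = Term | Expr '+' Term */
static unsigned long rd_expr(struct rd *st, int depth)
{
	unsigned long v;
	if (depth <= 0) { st->err = 1; return 0; }	/* nesting too deep */
	v = rd_term(st, depth);
	while (!st->err && *st->s == '+') {
		st->s++;
		v += rd_term(st, depth);
	}
	return v;
}

/* Returns 0 and stores the value on success, -1 on a syntax error or
 * when nesting exceeds max_depth. */
static int rd_eval(const char *s, int max_depth, unsigned long *out)
{
	struct rd st = { s, 0 };
	unsigned long v = rd_expr(&st, max_depth);
	if (st.err || *st.s != '\0') return -1;
	*out = v;
	return 0;
}
```

The stack use is bounded by max_depth regardless of how long or how deeply nested the input is.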

Rich



* Re: Non-stub gettext API functions committed, ready for testing
  2014-07-27 17:36         ` Rich Felker
@ 2014-07-27 17:51           ` Szabolcs Nagy
  2014-07-27 18:00             ` Rich Felker
  0 siblings, 1 reply; 14+ messages in thread
From: Szabolcs Nagy @ 2014-07-27 17:51 UTC (permalink / raw)
  To: musl

* Rich Felker <dalias@libc.org> [2014-07-27 13:36:05 -0400]:
> On Sun, Jul 27, 2014 at 07:23:09PM +0200, Szabolcs Nagy wrote:
> > so they implement
> >  conditional (?:)
> >  logic (&&, ||, !)
> >  relational (==, !=, <, >, <=, >=)
> >  and arithmetic (+, -, *, /, %)
> > operators with unsigned long args only
> 
> And parentheses?
> 

yes

> > From what I can tell, that's not so bad. Anyone feel like writing an
> expression evaluator for it? I think recursive descent is fine as long
> as the length of the string being evaluated is capped at a sane length
> (or just keep a depth counter and abort the evaluation if it exceeds
> some reasonable limit).
> 

i can try



* Re: Non-stub gettext API functions committed, ready for testing
  2014-07-27 17:51           ` Szabolcs Nagy
@ 2014-07-27 18:00             ` Rich Felker
  2014-07-28 10:18               ` Szabolcs Nagy
  0 siblings, 1 reply; 14+ messages in thread
From: Rich Felker @ 2014-07-27 18:00 UTC (permalink / raw)
  To: musl

On Sun, Jul 27, 2014 at 07:51:26PM +0200, Szabolcs Nagy wrote:
> > > From what I can tell, that's not so bad. Anyone feel like writing an
> > expression evaluator for it? I think recursive descent is fine as long
> > as the length of the string being evaluated is capped at a sane length
> > (or just keep a depth counter and abort the evaluation if it exceeds
> > some reasonable limit).
> > 
> 
> i can try

OK. Some thoughts on implementation: It should probably accept the
expression as a base+length rather than a C string so it can be used
in-place from within the mo file "header" (this design might help for
recursion anyway I suppose). And it should be safe against malicious
changes to the expression during evaluation (at worst give wrong
results or error out rather than risk of stack overflow, out-of-bounds
reads, etc.) since I'm aiming to make the whole system safe against
malicious translation files (assuming the caller doesn't use the
results in unsafe ways like as a format string).
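
A minimal sketch of the base+length idea, where every read is checked against an end pointer instead of relying on a terminator. The struct and function names are hypothetical, and the sketch only bounds the reads; it makes no attempt to handle concurrent modification:

```c
struct cur {
	const char *p;		/* next unread byte */
	const char *end;	/* one past the last readable byte */
};

/* Return the next byte without consuming it, or 0 at end of input. */
static int cur_peek(const struct cur *c)
{
	return c->p < c->end ? (unsigned char)*c->p : 0;
}

/* Parse an unsigned decimal number; never reads at or beyond c->end.
 * Returns 0 on success, -1 if no digit is present or on overflow. */
static int cur_decimal(struct cur *c, unsigned long *out)
{
	unsigned long v = 0;
	int ch = cur_peek(c);
	if (ch < '0' || ch > '9')
		return -1;
	while ((ch = cur_peek(c)) >= '0' && ch <= '9') {
		unsigned long d = (unsigned long)(ch - '0');
		if (v > ((unsigned long)-1 - d) / 10)
			return -1;	/* v*10 + d would overflow */
		v = v * 10 + d;
		c->p++;
	}
	*out = v;
	return 0;
}
```

Because the end of input reads as 0, which is not a valid expression character, running off the end looks like an ordinary syntax error.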

Rich


* Re: Non-stub gettext API functions committed, ready for testing
  2014-07-27 18:00             ` Rich Felker
@ 2014-07-28 10:18               ` Szabolcs Nagy
  2014-07-28 13:00                 ` Szabolcs Nagy
  0 siblings, 1 reply; 14+ messages in thread
From: Szabolcs Nagy @ 2014-07-28 10:18 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 1635 bytes --]

* Rich Felker <dalias@libc.org> [2014-07-27 14:00:41 -0400]:
> On Sun, Jul 27, 2014 at 07:51:26PM +0200, Szabolcs Nagy wrote:
> > > > From what I can tell, that's not so bad. Anyone feel like writing an
> > > expression evaluator for it? I think recursive descent is fine as long
> > > as the length of the string being evaluated is capped at a sane length
> > > (or just keep a depth counter and abort the evaluation if it exceeds
> > > some reasonable limit).
> > > 
> > 
> > i can try
> 
> OK. Some thoughts on implementation: It should probably accept the
> expression as a base+length rather than a C string so it can be used
> in-place from within the mo file "header" (this design might help for
> recursion anyway I suppose). And it should be safe against malicious
> changes to the expression during evaluation (at worst give wrong
> results or error out rather than risk of stack overflow, out-of-bounds
> reads, etc.) since I'm aiming to make the whole system safe against
> malicious translation files (assuming the caller doesn't use the
> results in unsafe ways like as a format string).
> 

ok i did something

i parse the expression once and then do the eval separately so
"changes to the expression during evaluation" does not apply
(i expected the expr to be const and evaluated several times
with different n)

currently it checks if base[length-1] == ';' and then does not
care about the length anymore, the first unexpected char ends
the parsing

the parser and eval code is about 2k now, i can try to do it
without a separate parsing step (my approach requires a 100-200
byte buffer to store the parsed expr now)


[-- Attachment #2: pl.h --]
[-- Type: text/x-chdr, Size: 207 bytes --]

#include <stddef.h>

// parse s into expr, returns -1 on failure
int parse(unsigned char *expr, size_t elen, const char *s, size_t slen);
// eval expr with input n
unsigned long eval(const unsigned char *expr, unsigned long n);

[-- Attachment #3: pl.c --]
[-- Type: text/x-csrc, Size: 5628 bytes --]

#include <stdlib.h>
#include <ctype.h>
#include <string.h>
#include "pl.h"

/*
grammar:

Start = Expr ';'
Expr  = Or | Or '?' Expr ':' Expr
Or    = And | Or '||' And
And   = Rel | And '&&' Rel
Rel   = Add | Add '==' Add | Add '!=' Add | Add '<=' Add | Add '>=' Add | Add '<' Add | Add '>' Add
Add   = Mul | Add '+' Mul | Add '-' Mul
Mul   = Term | Mul '*' Term | Mul '/' decimal | Mul '%' decimal
Term  = '(' Expr ')' | '!' Term | decimal | 'n'

compared to gnu gettext:
	right side of / and % must be const (and non-zero),
	chained relational/eq operators are not allowed,
	decimal is at most 255

internals:

parser is recursive descent, terminals are pushed on a stack
that grows down, a binary op "Left op Right" is parsed into
"op length-of-Right Right Left" so eval is easy to implement.

op chars on the stack
	n c ! | & = < > + - * / % ?
are
	var, const, neg, or, and, eq, less, greater, add, sub, mul, div, mod, cond

parse* functions push the parsed rule on the stack and return
a pointer to the next non-space char
*/

#include <stdio.h>

struct st {
	unsigned char *p;
	unsigned char *e;
};

static int ok(struct st *st)
{
	return st->p != st->e;
}

static void fail(struct st *st)
{
	st->p = st->e;
}

static void push(struct st *st, int c)
{
	if (ok(st))
		*--st->p = c;
}

static const char *skipspace(const char *s)
{
	while (isspace((unsigned char)*s)) s++;
	return s;
}

static const char *parseconst(struct st *st, const char *s)
{
	char *e;
	unsigned long n;
	n = strtoul(s, &e, 10);
	if (!isdigit((unsigned char)*s) || e == s || n > 255)
		fail(st);
	push(st, n);
	push(st, 'c');
	return skipspace(e);
}

static const char *parseexpr(struct st *st, const char *s, int d);

static const char *parseterm(struct st *st, const char *s, int d)
{
	if (d <= 0) {
		fail(st);
		return s;
	}
	s = skipspace(s);
	if (*s == '!') {
		s = parseterm(st, s+1, d-1);
		push(st, '!');
		return s;
	}
	if (*s == '(') {
		s = parseexpr(st, s+1, d-1);
		if (*s != ')') {
			fail(st);
			return s;
		}
		return skipspace(s+1);
	}
	if (*s == 'n') {
		push(st, 'n');
		return skipspace(s+1);
	}
	return parseconst(st, s);
}

static const char *parsemul(struct st *st, const char *s, int d)
{
	unsigned char *p;
	int op;
	s = parseterm(st, s, d-1);
	for (;;) {
		op = *s;
		p = st->p;
		if (op == '*') {
			s = parseterm(st, s+1, d-1);
		} else  if (op == '/' || op == '%') {
			s = skipspace(s+1);
			if (*s == '0') {
				fail(st);
				return s;
			}
			s = parseconst(st, s);
		} else
			return s;
		push(st, p - st->p);
		push(st, op);
	}
}

static const char *parseadd(struct st *st, const char *s, int d)
{
	unsigned char *p;
	int op;
	s = parsemul(st, s, d-1);
	for (;;) {
		op = *s;
		if (op != '+' && op != '-')
			return s;
		p = st->p;
		s = parsemul(st, s+1, d-1);
		push(st, p - st->p);
		push(st, op);
	}
}

static const char *parserel(struct st *st, const char *s, int d)
{
	unsigned char *p;
	int neg = 0, op;
	s = parseadd(st, s, d-1);
	if (s[0] == '=' && s[1] == '=') {
		op = '=';
		s++;
	} else if (s[0] == '!' && s[1] == '=') {
		op = '=';
		neg = 1;
		s++;
	} else if (s[0] == '<' && s[1] == '=') {
		op = '>';
		neg = 1;
		s++;
	} else if (s[0] == '<') {
		op = '<';
	} else if (s[0] == '>' && s[1] == '=') {
		op = '<';
		neg = 1;
		s++;
	} else if (s[0] == '>') {
		op = '>';
	} else
		return s;
	p = st->p;
	s = parseadd(st, s+1, d-1);
	push(st, p - st->p);
	push(st, op);
	if (neg)
		push(st, '!');
	return s;
}

static const char *parseand(struct st *st, const char *s, int d)
{
	unsigned char *p;
	s = parserel(st, s, d-1);
	for (;;) {
		if (s[0] != '&' || s[1] != '&')
			return s;
		p = st->p;
		s = parserel(st, s+2, d-1);
		push(st, p - st->p);
		push(st, '&');
	}
}

static const char *parseor(struct st *st, const char *s, int d)
{
	unsigned char *p;
	s = parseand(st, s, d-1);
	for (;;) {
		if (s[0] != '|' || s[1] != '|')
			return s;
		p = st->p;
		s = parseand(st, s+2, --d);
		push(st, p - st->p);
		push(st, '|');
	}
}

static const char *parseexpr(struct st *st, const char *s, int d)
{
	unsigned char *p1, *p2;
	if (d <= 0) {
		fail(st);
		return s;
	}
	s = parseor(st, s, d-1);
	if (*s == '?') {
		p1 = st->p;
		s = parseexpr(st, s+1, d-1);
		p2 = st->p;
		if (*s != ':')
			fail(st);
		else
			s = parseexpr(st, s+1, d-1);
		push(st, p2 - st->p);
		push(st, p1 - st->p);
		push(st, '?');
	}
	return s;
}

int parse(unsigned char *expr, size_t elen, const char *s, size_t slen)
{
	const char *e;
	unsigned char *p;
	struct st st;

	if (slen == 0 || s[slen-1] != ';')
		return -1;
	e = s + slen - 1;
	if (elen > 200)
		elen = 200;
	st.e = expr;
	p = st.p = expr + elen;
	s = parseexpr(&st, s, 100);
	if (!ok(&st) || s != e)
		return -1;
	memmove(expr, st.p, p - st.p);
	return 0;
}

static unsigned long evalcond(const unsigned char *e, unsigned long n)
{
	int offcond = *e++;
	unsigned long c = eval(e+offcond, n);
	int offtrue = *e++;
	return eval(c ? e+offtrue : e, n);
}

static unsigned long evalbin(int op, const unsigned char *e, unsigned long n)
{
	int offleft = *e++;
	unsigned long right = eval(e, n);
	unsigned long left = eval(e+offleft, n);
	switch (op) {
	case '|': return left || right;
	case '&': return left && right;
	case '=': return left == right;
	case '<': return left < right;
	case '>': return left > right;
	case '+': return left + right;
	case '-': return left - right;
	case '*': return left * right;
	case '/': return left / right;
	case '%': return left % right;
	}
	return -1;
}

unsigned long eval(const unsigned char *e, unsigned long n)
{
	int op = *e++;
	switch (op) {
	case 'n': return n;
	case 'c': return *e;
	case '!': return !eval(e, n);
	case '?': return evalcond(e, n);
	}
	return evalbin(op, e, n);
}
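
to illustrate the "op length-of-Right Right Left" layout described in the comment at the top of the file, here is a hand-encoded buffer for "n != 1" together with a cut-down evaluator that only understands the ops used in it. this is a sketch for reading the encoding, with made-up names, not a replacement for eval:

```c
/* "n != 1" is stored as '!' applied to "n == 1":
 *   '!'  '='  2  'c' 1  'n'
 *    |    |   |  \___/   \_ Left operand (n)
 *    |    |   |    \_ Right operand (const 1), 2 bytes long
 *    |    |   \_ length of Right
 *    |    \_ binary op
 *    \_ negation of what follows
 */
static const unsigned char ne1[] = { '!', '=', 2, 'c', 1, 'n' };

static unsigned long mini_eval(const unsigned char *e, unsigned long n)
{
	switch (*e++) {
	case 'n': return n;
	case 'c': return *e;
	case '!': return !mini_eval(e, n);
	case '=': {
		/* e[0] is the length of Right; Left starts right after it */
		unsigned long right = mini_eval(e + 1, n);
		unsigned long left = mini_eval(e + 1 + e[0], n);
		return left == right;
	}
	}
	return -1;
}
```

mini_eval(ne1, 1) gives 0 and any other n gives 1.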


* Re: Non-stub gettext API functions committed, ready for testing
  2014-07-28 10:18               ` Szabolcs Nagy
@ 2014-07-28 13:00                 ` Szabolcs Nagy
  2014-07-28 14:01                   ` Szabolcs Nagy
  0 siblings, 1 reply; 14+ messages in thread
From: Szabolcs Nagy @ 2014-07-28 13:00 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 362 bytes --]

* Szabolcs Nagy <nsz@port70.net> [2014-07-28 12:18:30 +0200]:
> the parser and eval code is about 2k now, i can try to do it
> without a separate parsing step (my approach requires a 100-200
> byte buffer to store the parsed expr now)
> 

attached a simpler solution without separate parsing
(code is about 1.4k now, and it is more compatible
with gnu gettext)


[-- Attachment #2: pl.c --]
[-- Type: text/x-csrc, Size: 3698 bytes --]

#include <stdlib.h>
#include <ctype.h>
#include "pl.h"

/*
grammar:

Start = Expr ';'
Expr  = Or | Or '?' Expr ':' Expr
Or    = And | Or '||' And
And   = Eq | And '&&' Eq
Eq    = Rel | Eq '==' Rel | Eq '!=' Rel
Rel   = Add | Rel '<=' Add | Rel '>=' Add | Rel '<' Add | Rel '>' Add
Add   = Mul | Add '+' Mul | Add '-' Mul
Mul   = Term | Mul '*' Term | Mul '/' Term | Mul '%' Term
Term  = '(' Expr ')' | '!' Term | decimal | 'n'

internals:

recursive descent expression evaluator with stack depth limit.
eval* functions return the value of the subexpression and set
the current string pointer to the next non-space char.
*/

struct st {
	const char *s;
	unsigned long n;
	int err;
};

static const char *skipspace(const char *s)
{
	while (isspace((unsigned char)*s)) s++;
	return s;
}

static unsigned long evalconst(struct st *st)
{
	char *e;
	unsigned long n;
	n = strtoul(st->s, &e, 10);
	if (!isdigit((unsigned char)*st->s) || e == st->s || n == -1)
		st->err = 1;
	st->s = skipspace(e);
	return n;
}

static unsigned long evalexpr(struct st *st, int d);

static unsigned long evalterm(struct st *st, int d)
{
	unsigned long a;
	if (d <= 0) {
		st->err = 1;
		return 0;
	}
	st->s = skipspace(st->s);
	if (*st->s == '!') {
		st->s++;
		return !evalterm(st, d-1);
	}
	if (*st->s == '(') {
		st->s++;
		a = evalexpr(st, d-1);
		if (*st->s != ')') {
			st->err = 1;
			return 0;
		}
		st->s = skipspace(st->s + 1);
		return a;
	}
	if (*st->s == 'n') {
		st->s = skipspace(st->s + 1);
		return st->n;
	}
	return evalconst(st);
}

static unsigned long evalmul(struct st *st, int d)
{
	unsigned long b, a = evalterm(st, d-1);
	int op;
	for (;;) {
		op = *st->s;
		if (op != '*' && op != '/' && op != '%')
			return a;
		st->s++;
		b = evalterm(st, d-1);
		if (op == '*') {
			a *= b;
		} else if (!b) {
			st->err = 1;
			return 0;
		} else if (op == '%') {
			a %= b;
		} else {
			a /= b;
		}
	}
}

static unsigned long evaladd(struct st *st, int d)
{
	unsigned long a = 0;
	int add = 1;
	for (;;) {
		a += (add?1:-1) * evalmul(st, d-1);
		if (*st->s != '+' && *st->s != '-')
			return a;
		add = *st->s == '+';
		st->s++;
	}
}

static unsigned long evalrel(struct st *st, int d)
{
	unsigned long b, a = evaladd(st, d-1);
	int less, eq;
	for (;;) {
		if (*st->s != '<' && *st->s != '>')
			return a;
		less = st->s[0] == '<';
		eq = st->s[1] == '=';
		st->s += 1 + eq;
		b = evaladd(st, d-1);
		a = (less ? a < b : a > b) || (eq && a == b);
	}
}

static unsigned long evaleq(struct st *st, int d)
{
	unsigned long a = evalrel(st, d-1);
	int neg;
	for (;;) {
		if ((st->s[0] != '=' && st->s[0] != '!') || st->s[1] != '=')
			return a;
		neg = st->s[0] == '!';
		st->s += 2;
		a = evalrel(st, d-1) == a;
		a ^= neg;
	}
}

static unsigned long evaland(struct st *st, int d)
{
	unsigned long a = evaleq(st, d-1);
	for (;;) {
		if (st->s[0] != '&' || st->s[1] != '&')
			return a;
		st->s += 2;
		a = evaleq(st, d-1) && a;
	}
}

static unsigned long evalor(struct st *st, int d)
{
	unsigned long a = evaland(st, d-1);
	for (;;) {
		if (st->s[0] != '|' || st->s[1] != '|')
			return a;
		st->s += 2;
		a = evaland(st, d-1) || a;
	}
}

static unsigned long evalexpr(struct st *st, int d)
{
	unsigned long a1, a2, a3;
	if (d <= 0) {
		st->err = 1;
		return 0;
	}
	a1 = evalor(st, d-1);
	if (*st->s != '?')
		return a1;
	st->s++;
	a2 = evalexpr(st, d-1);
	if (*st->s != ':') {
		st->err = 1;
		return 0;
	}
	st->s++;
	a3 = evalexpr(st, d-1);
	return a1 ? a2 : a3;
}

unsigned long eval(const char *s, size_t len, unsigned long n)
{
	unsigned long a;
	const char *e;
	struct st st;

	if (len == 0 || s[len-1] != ';')
		return -1;
	e = s+len-1;
	st.s = s;
	st.n = n;
	st.err = 0;
	a = evalexpr(&st, 100);
	if (st.err || st.s != e)
		return -1;
	return a;
}


* Re: Non-stub gettext API functions committed, ready for testing
  2014-07-28 13:00                 ` Szabolcs Nagy
@ 2014-07-28 14:01                   ` Szabolcs Nagy
  2014-07-28 16:27                     ` Rich Felker
  0 siblings, 1 reply; 14+ messages in thread
From: Szabolcs Nagy @ 2014-07-28 14:01 UTC (permalink / raw)
  To: musl

* Szabolcs Nagy <nsz@port70.net> [2014-07-28 15:00:17 +0200]:
> * Szabolcs Nagy <nsz@port70.net> [2014-07-28 12:18:30 +0200]:
> > the parser and eval code is about 2k now, i can try to do it
> > without a separate parsing step (my approach requires a 100-200
> > byte buffer to store the parsed expr now)
> > 
> 
> attached a simpler solution without separate parsing
> (code is about 1.4k now, and it is more compatible
> with gnu gettext)
> 

using a complex plural expression (arabic):

"(n==0 ? 0 : n==1 ? 1 : n==2 ? 2 : n%100>=3 && n%100<=10 ? 3 : n%100>=11 ? 4 : 5);"

the runtime of my preparsed vs interpreted implementation is
0.1-0.5us vs 3us testing on a few small n.
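
for reference, the same expression written directly as a C function (the function name is made up); something like this could serve as the expected-value side when cross-checking an evaluator:

```c
/* the arabic plural expression quoted above, on unsigned long as in the
 * evaluators */
static unsigned long arabic_rule(unsigned long n)
{
	return n==0 ? 0 : n==1 ? 1 : n==2 ? 2
	     : n%100>=3 && n%100<=10 ? 3
	     : n%100>=11 ? 4 : 5;
}
```

it yields six distinct form indices (0 to 5), for example 3 for n=103 and 5 for n=100.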



* Re: Non-stub gettext API functions committed, ready for testing
  2014-07-28 14:01                   ` Szabolcs Nagy
@ 2014-07-28 16:27                     ` Rich Felker
  2014-07-29 13:49                       ` Szabolcs Nagy
  0 siblings, 1 reply; 14+ messages in thread
From: Rich Felker @ 2014-07-28 16:27 UTC (permalink / raw)
  To: musl

On Mon, Jul 28, 2014 at 04:01:52PM +0200, Szabolcs Nagy wrote:
> * Szabolcs Nagy <nsz@port70.net> [2014-07-28 15:00:17 +0200]:
> > * Szabolcs Nagy <nsz@port70.net> [2014-07-28 12:18:30 +0200]:
> > > the parser and eval code is about 2k now, i can try to do it
> > > without a separate parsing step (my approach requires a 100-200
> > > byte buffer to store the parsed expr now)
> > > 
> > 
> > attached a simpler solution without separate parsing
> > (code is about 1.4k now, and it is more compatible
> > with gnu gettext)
> > 
> 
> using a complex plural expression (arabic):
> 
> "(n==0 ? 0 : n==1 ? 1 : n==2 ? 2 : n%100>=3 && n%100<=10 ? 3 : n%100>=11 ? 4 : 5);"
> 
> the runtime of my preparsed vs interpreted implementation is
> 0.1-0.5us vs 3us testing on a few small n.

My leaning is to go with the version that's smaller and more flexible;
I think the time spent in this function will usually be heavily
dominated by the binary search for the message text. But it's cool to
have both for possible future uses (independent of musl, even).

BTW one way to reduce the cost is to skip the whole plural computation
when msgid1==msgid2 (as pointers). This is always true when dcngettext
is called by one of the "non-n" gettext functions.
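
A sketch of that shortcut, with hypothetical names; treating the non-plural case as form index 0 is an assumption of the sketch:

```c
/* Stand-in rule that counts how often it is evaluated. */
static unsigned long rule_calls;
static unsigned long demo_rule(unsigned long n)
{
	rule_calls++;
	return n != 1;
}

/* Pick the form index, skipping rule evaluation entirely when both
 * msgid arguments are the same pointer (a non-plural lookup). */
static unsigned long plural_index(const char *msgid1, const char *msgid2,
                                  unsigned long n,
                                  unsigned long (*rule)(unsigned long))
{
	if (msgid1 == msgid2)
		return 0;
	return rule(n);
}
```

Only the pointers are compared, so the check costs nothing beyond a single branch.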

Rich



* Re: Non-stub gettext API functions committed, ready for testing
  2014-07-28 16:27                     ` Rich Felker
@ 2014-07-29 13:49                       ` Szabolcs Nagy
  0 siblings, 0 replies; 14+ messages in thread
From: Szabolcs Nagy @ 2014-07-29 13:49 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 501 bytes --]

* Rich Felker <dalias@libc.org> [2014-07-28 12:27:32 -0400]:
> My leaning is to go with the version that's smaller and more flexible;
> I think the time spent in this function will usually be heavily
> dominated by the binary search for the message text. But it's cool to
> have both for possible future uses (independent of musl, even).

i have a bit smaller pleval.c version:
non-pic .o with -Os is 980 vs 826 bytes here,
has about the same speed,
binary op parsing is a bit magical otherwise clean

[-- Attachment #2: pleval.c --]
[-- Type: text/x-csrc, Size: 3284 bytes --]

#include <stdlib.h>
#include <ctype.h>

/*
grammar:

Start = Expr ';'
Expr  = Or | Or '?' Expr ':' Expr
Or    = And | Or '||' And
And   = Eq | And '&&' Eq
Eq    = Rel | Eq '==' Rel | Eq '!=' Rel
Rel   = Add | Rel '<=' Add | Rel '>=' Add | Rel '<' Add | Rel '>' Add
Add   = Mul | Add '+' Mul | Add '-' Mul
Mul   = Term | Mul '*' Term | Mul '/' Term | Mul '%' Term
Term  = '(' Expr ')' | '!' Term | decimal | 'n'

internals:

recursive descent expression evaluator with stack depth limit.
eval* functions return the value of the subexpression and set
the current string pointer to the next non-space char.
*/

struct st {
	const char *s;
	unsigned long n;
	int err;
};

static const char *skipspace(const char *s)
{
	while (isspace((unsigned char)*s)) s++;
	return s;
}

static unsigned long fail(struct st *st)
{
	st->err = 1;
	return 0;
}

static unsigned long evalexpr(struct st *st, int d);

static unsigned long evalterm(struct st *st, int d)
{
	unsigned long a;
	char *e;
	if (--d < 0) return fail(st);
	st->s = skipspace(st->s);
	if (*st->s == '!') {
		st->s++;
		return !evalterm(st, d);
	}
	if (*st->s == '(') {
		st->s++;
		a = evalexpr(st, d);
		if (*st->s != ')') return fail(st);
		st->s = skipspace(st->s + 1);
		return a;
	}
	if (*st->s == 'n') {
		st->s = skipspace(st->s + 1);
		return st->n;
	}
	a = strtoul(st->s, &e, 10);
	if (!isdigit((unsigned char)*st->s) || e == st->s || a == -1) return fail(st);
	st->s = skipspace(e);
	return a;
}

static unsigned long binop(struct st *st, int op, unsigned long a, unsigned long b)
{
	switch (op&0xff) {
	case 0: return a||b;
	case 1: return a&&b;
	case 2: return a==b;
	case 3: return a!=b;
	case 4: return a>=b;
	case 5: return a<=b;
	case 6: return a>b;
	case 7: return a<b;
	case 8: return a-b;
	case 9: return a+b;
	case 10: return b ? a%b : fail(st);
	case 11: return b ? a/b : fail(st);
	case 12: return a*b;
	}
	return fail(st);
}

static int parseop(struct st *st)
{
	static const char opch[18] = "|&=!><-+%/*\0|&====";
	static const char prec[] = {1,2,3,3,4,4,5,5,6,6,6};
	int i, p;
	for (i=0; opch[i]; i++)
		if (*st->s == opch[i]) {
			p = prec[i]<<8;
			if (i<6 && st->s[1] == opch[i+12]) {
				st->s+=2;
				return i | p;
			}
			if (i>=4) {
				st->s++;
				return i+2 | p;
			}
			return 0;
		}
	return 0;
}

static unsigned long evalbinop2(struct st *st, int op, unsigned long a, int d)
{
	unsigned long a2;
	int op2, highprec;
	d--;
	for (;;) {
		a2 = evalterm(st, d);
		op2 = parseop(st);
		highprec = op2>>8 > op>>8;
		if (highprec)
			a2 = evalbinop2(st, op2, a2, d);
		a = binop(st, op, a, a2);
		if (!op2 || highprec)
			return a;
		op = op2;
	}
}

static unsigned long evalbinop(struct st *st, int d)
{
	unsigned long a;
	int op;

	a = evalterm(st, d);
	op = parseop(st);
	if (!op) return a;
	return evalbinop2(st, op, a, d);
}

static unsigned long evalexpr(struct st *st, int d)
{
	unsigned long a1, a2, a3;
	if (--d < 0) return fail(st);
	a1 = evalbinop(st, d);
	if (*st->s != '?')
		return a1;
	st->s++;
	a2 = evalexpr(st, d);
	if (*st->s != ':') return fail(st);
	st->s++;
	a3 = evalexpr(st, d);
	return a1 ? a2 : a3;
}

unsigned long __pleval(const char *s, unsigned long n)
{
	unsigned long a;
	struct st st;
	st.s = s;
	st.n = n;
	st.err = 0;
	a = evalexpr(&st, 100);
	if (st.err || *st.s != ';')
		return -1;
	return a;
}
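
for comparison, table-driven binary operator handling is often written as precedence climbing with a minimum-precedence argument. the sketch below is a conventional formulation on a toy input (decimal operands, single-character arithmetic operators, no whitespace or parentheses), not the method used in the attachment; all names are made up:

```c
struct pc {
	const char *s;	/* NUL-terminated input, no whitespace */
	int err;
};

/* Precedence of a single-character operator, or 0 if c is not one. */
static int pc_prec(int c)
{
	switch (c) {
	case '+': case '-': return 1;
	case '*': case '/': case '%': return 2;
	}
	return 0;
}

static unsigned long pc_apply(struct pc *st, int op,
                              unsigned long a, unsigned long b)
{
	switch (op) {
	case '+': return a + b;
	case '-': return a - b;
	case '*': return a * b;
	case '/': if (b) return a / b; break;
	case '%': if (b) return a % b; break;
	}
	st->err = 1;	/* division by zero or unknown operator */
	return 0;
}

/* Operand: a run of decimal digits. */
static unsigned long pc_operand(struct pc *st)
{
	unsigned long v = 0;
	if (*st->s < '0' || *st->s > '9') { st->err = 1; return 0; }
	while (*st->s >= '0' && *st->s <= '9')
		v = v * 10 + (unsigned long)(*st->s++ - '0');
	return v;
}

/* Evaluate operators whose precedence is at least minprec. */
static unsigned long pc_eval(struct pc *st, int minprec)
{
	unsigned long a = pc_operand(st);
	for (;;) {
		int op = *st->s;
		int p = pc_prec(op);
		if (st->err || !p || p < minprec)
			return a;
		st->s++;
		/* left-associative: the right side takes only tighter operators */
		a = pc_apply(st, op, a, pc_eval(st, p + 1));
	}
}
```

a lower-precedence operator met inside a recursive call is left unconsumed for the caller, so "10-2*3-1" groups as (10-(2*3))-1.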
