* Non-stub gettext API functions committed, ready for testing
From: Rich Felker @ 2014-07-27 8:46 UTC
To: musl

As of commit 2068b4e8911a3a49cded44b4568f6c943a8c98f8, it should now be
possible to support message translation at the application level using
the gettext (libintl.h) functions provided by musl. Feedback from users
interested in this functionality would be much appreciated!

Note that some (many?) applications may attempt to use their own
included gettext rather than the one in libc, so before testing it
would be helpful to check this and make sure the functions in musl are
actually getting called. I'm not familiar with what types of checks
typical autoconf scripts do to choose whether to use libc gettext or
their own, so information on this topic would also be helpful,
especially if there's anything we could do to get apps to choose the
one in musl rather than pulling in their own bloated copy of GNU
gettext.

As mentioned in the commit message, some functionality is still
missing. For the plurals stuff, I can't find the information on how
you actually get the plural rules out of the .mo file and apply them.
For the LANGUAGE variable, it's just a matter of adding some
loop-and-retry logic.

Rich
* Re: Non-stub gettext API functions committed, ready for testing
From: Harald Becker @ 2014-07-27 10:06 UTC
To: musl

Hi Rich !

> As mentioned in the commit message, some functionality is still
> missing. For the plurals stuff, I can't find the information on how
> you actually get the plural rules out of the .mo file and apply them.
> For the LANGUAGE variable, it's just a matter of adding some
> loop-and-retry logic.

Does this text help to clarify the plurals question?

http://www.gnu.org/software/gettext/manual/gettext.html#Plural-forms
* Re: Non-stub gettext API functions committed, ready for testing
From: Szabolcs Nagy @ 2014-07-27 14:14 UTC
To: musl

* Harald Becker <ralda@gmx.de> [2014-07-27 12:06:01 +0200]:
> >As mentioned in the commit message, some functionality is still
> >missing. For the plurals stuff, I can't find the information on how
> >you actually get the plural rules out of the .mo file and apply them.
> >For the LANGUAGE variable, it's just a matter of adding some
> >loop-and-retry logic.
>
> Does this text help to clarify the plurals question?
>
> http://www.gnu.org/software/gettext/manual/gettext.html#Plural-forms

it shows that a c arithmetic expression parser is needed to handle plurals
(and the expression has to be evaluated every time dcngettext is invoked)

	Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;

	The nplurals value must be a decimal number which specifies how many
	different plural forms exist for this language. The string following
	plural is an expression which is using the C language syntax.
	Exceptions are that no negative numbers are allowed, numbers must be
	decimal, and the only variable allowed is n.
* Re: Non-stub gettext API functions committed, ready for testing
From: Rich Felker @ 2014-07-27 16:49 UTC
To: musl

On Sun, Jul 27, 2014 at 04:14:18PM +0200, Szabolcs Nagy wrote:
> * Harald Becker <ralda@gmx.de> [2014-07-27 12:06:01 +0200]:
> > >As mentioned in the commit message, some functionality is still
> > >missing. For the plurals stuff, I can't find the information on how
> > >you actually get the plural rules out of the .mo file and apply them.
> > >For the LANGUAGE variable, it's just a matter of adding some
> > >loop-and-retry logic.
> >
> > Does this text help to clarify the plurals question?
> >
> > http://www.gnu.org/software/gettext/manual/gettext.html#Plural-forms

I read that before but thought it shows how the data is written in po
files but not where to find it in the mo file... But now I see it's in
the "header" that's, by convention, the translation for "". How ugly..

> it shows that a c arithmetic expression parser is needed to handle plurals
> (and the expression has to be evaluated every time dcngettext is invoked)

Not necessarily. You could cache results. Or (this is likely the more
reasonable implementation) just hard-code the expression strings that
are actually used for real languages and implement them in C when a
match is found.

> 	Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;
>
> 	The nplurals value must be a decimal number which specifies how many
> 	different plural forms exist for this language. The string following
> 	plural is an expression which is using the C language syntax.
> 	Exceptions are that no negative numbers are allowed, numbers must be
> 	decimal, and the only variable allowed is n.

This is a very poor description. Does it allow casts? Compound
literals? Floating point? Function calls? ...?

Rich
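The header lookup described in this message can be sketched roughly as follows. This is only an illustration, not musl's code: find_plural_rule is a hypothetical helper, and it assumes the header (the translation of "") has already been located and is available as a NUL-terminated string.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of musl): given the catalog header text,
 * find the "Plural-Forms:" entry, read nplurals, and point *expr at the
 * start of the plural expression. Returns 0 on success, -1 if the header
 * has no usable entry. */
static int find_plural_rule(const char *header, unsigned long *nplurals,
                            const char **expr)
{
	const char *p = strstr(header, "Plural-Forms:");
	const char *q;
	char *end;

	if (!p) return -1;
	p = strstr(p, "nplurals=");
	if (!p) return -1;
	p += strlen("nplurals=");
	/* the manual requires a plain decimal number here */
	if (!isdigit((unsigned char)*p)) return -1;
	*nplurals = strtoul(p, &end, 10);
	q = strstr(end, "plural=");
	if (!q) return -1;
	*expr = q + strlen("plural=");
	return 0;
}
```

For the English rule quoted above this would report nplurals as 2 and leave expr pointing at "n == 1 ? 0 : 1;". Evaluating that expression is a separate step.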
* Re: Non-stub gettext API functions committed, ready for testing
From: Szabolcs Nagy @ 2014-07-27 17:23 UTC
To: musl

* Rich Felker <dalias@libc.org> [2014-07-27 12:49:21 -0400]:
> On Sun, Jul 27, 2014 at 04:14:18PM +0200, Szabolcs Nagy wrote:
> > it shows that a c arithmetic expression parser is needed to handle plurals
> > (and the expression has to be evaluated every time dcngettext is invoked)
>
> Not necessarily. You could cache results. Or (this is likely the more
> reasonable implementation) just hard-code the expression strings that
> are actually used for real languages and implement them in C when a
> match is found.
>

hardcoding the strings will fail if .mo files are updated to
use different expressions

with caching the expr has to be evaluated for every uncached n

> > 	Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;
> >
> > 	The nplurals value must be a decimal number which specifies how many
> > 	different plural forms exist for this language. The string following
> > 	plural is an expression which is using the C language syntax.
> > 	Exceptions are that no negative numbers are allowed, numbers must be
> > 	decimal, and the only variable allowed is n.
>
> This is a very poor description. Does it allow casts? Compound
> literals? Floating point? Function calls? ...?
>

the parser in gnu gettext:
http://git.savannah.gnu.org/cgit/gettext.git/tree/gettext-runtime/intl/plural.y

so they implement
conditional (?:)
logic (&&, ||, !)
relational (==, !=, <, >, <=, >=)
and arithmetic (+, -, *, /, %)
operators with unsigned long args only

but in practice the only arithmetic operator used is %
and operators are not combined arbitrarily
* Re: Non-stub gettext API functions committed, ready for testing
From: Rich Felker @ 2014-07-27 17:36 UTC
To: musl

On Sun, Jul 27, 2014 at 07:23:09PM +0200, Szabolcs Nagy wrote:
> * Rich Felker <dalias@libc.org> [2014-07-27 12:49:21 -0400]:
> > On Sun, Jul 27, 2014 at 04:14:18PM +0200, Szabolcs Nagy wrote:
> > > it shows that a c arithmetic expression parser is needed to handle plurals
> > > (and the expression has to be evaluated every time dcngettext is invoked)
> >
> > Not necessarily. You could cache results. Or (this is likely the more
> > reasonable implementation) just hard-code the expression strings that
> > are actually used for real languages and implement them in C when a
> > match is found.
> >
>
> hardcoding the strings will fail if .mo files are updated to
> use different expressions
>
> with caching the expr has to be evaluated for every uncached n

Yes.

> > > 	Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;
> > >
> > > 	The nplurals value must be a decimal number which specifies how many
> > > 	different plural forms exist for this language. The string following
> > > 	plural is an expression which is using the C language syntax.
> > > 	Exceptions are that no negative numbers are allowed, numbers must be
> > > 	decimal, and the only variable allowed is n.
> >
> > This is a very poor description. Does it allow casts? Compound
> > literals? Floating point? Function calls? ...?
> >
>
> the parser in gnu gettext:
> http://git.savannah.gnu.org/cgit/gettext.git/tree/gettext-runtime/intl/plural.y
>
> so they implement
> conditional (?:)
> logic (&&, ||, !)
> relational (==, !=, <, >, <=, >=)
> and arithmetic (+, -, *, /, %)
> operators with unsigned long args only

And parentheses?

From what I can tell, that's not so bad. Anyone feel like writing an
expression evaluator for it? I think recursive descent is fine as long
as the length of the string being evaluated is capped at a sane length
(or just keep a depth counter and abort the evaluation if it exceeds
some reasonable limit).

Rich
* Re: Non-stub gettext API functions committed, ready for testing
From: Szabolcs Nagy @ 2014-07-27 17:51 UTC
To: musl

* Rich Felker <dalias@libc.org> [2014-07-27 13:36:05 -0400]:
> On Sun, Jul 27, 2014 at 07:23:09PM +0200, Szabolcs Nagy wrote:
> > so they implement
> > conditional (?:)
> > logic (&&, ||, !)
> > relational (==, !=, <, >, <=, >=)
> > and arithmetic (+, -, *, /, %)
> > operators with unsigned long args only
>
> And parentheses?
>

yes

> From what I can tell, that's not so bad. Anyone feel like writing an
> expression evaluator for it? I think recursive descent is fine as long
> as the length of the string being evaluated is capped at a sane length
> (or just keep a depth counter and abort the evaluation if it exceeds
> some reasonable limit).
>

i can try
* Re: Non-stub gettext API functions committed, ready for testing
From: Rich Felker @ 2014-07-27 18:00 UTC
To: musl

On Sun, Jul 27, 2014 at 07:51:26PM +0200, Szabolcs Nagy wrote:
> > From what I can tell, that's not so bad. Anyone feel like writing an
> > expression evaluator for it? I think recursive descent is fine as long
> > as the length of the string being evaluated is capped at a sane length
> > (or just keep a depth counter and abort the evaluation if it exceeds
> > some reasonable limit).
> >
>
> i can try

OK. Some thoughts on implementation:

It should probably accept the expression as a base+length rather than
a C string so it can be used in-place from within the mo file "header"
(this design might help for recursion anyway I suppose). And it should
be safe against malicious changes to the expression during evaluation
(at worst give wrong results or error out rather than risk of stack
overflow, out-of-bounds reads, etc.) since I'm aiming to make the
whole system safe against malicious translation files (assuming the
caller doesn't use the results in unsafe ways like as a format
string).

Rich
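One way to read the base+length suggestion is a small cursor that never dereferences past the end of the buffer, so no terminator is needed and a hostile buffer can at worst produce a wrong value or an error. The sketch below is an assumption-laden illustration, not code from this thread: struct cur, peek, skip_space and parse_decimal are hypothetical names.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical bounded cursor over a base+length buffer. */
struct cur {
	const char *p;    /* next byte to read */
	const char *end;  /* one past the last valid byte */
};

/* Return the next byte, or -1 at the end; never reads out of bounds. */
static int peek(const struct cur *c)
{
	return c->p < c->end ? (unsigned char)*c->p : -1;
}

static void skip_space(struct cur *c)
{
	while (peek(c) == ' ' || peek(c) == '\t') c->p++;
}

/* Parse an unsigned decimal number. Returns 0 and stores the value on
 * success, -1 if there is no digit or the value would overflow. */
static int parse_decimal(struct cur *c, unsigned long *out)
{
	unsigned long v = 0;
	int ch = peek(c);

	if (ch < '0' || ch > '9') return -1;
	while ((ch = peek(c)) >= '0' && ch <= '9') {
		unsigned long d = ch - '0';
		if (v > (ULONG_MAX - d) / 10) return -1;
		v = v * 10 + d;
		c->p++;
	}
	*out = v;
	return 0;
}
```

Because the cursor only advances after a successful bounded read, a byte that changes underneath the parser can alter the result but cannot move the read position outside the buffer.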
* Re: Non-stub gettext API functions committed, ready for testing
From: Szabolcs Nagy @ 2014-07-28 10:18 UTC
To: musl

[-- Attachment #1: Type: text/plain, Size: 1635 bytes --]

* Rich Felker <dalias@libc.org> [2014-07-27 14:00:41 -0400]:
> On Sun, Jul 27, 2014 at 07:51:26PM +0200, Szabolcs Nagy wrote:
> > > From what I can tell, that's not so bad. Anyone feel like writing an
> > > expression evaluator for it? I think recursive descent is fine as long
> > > as the length of the string being evaluated is capped at a sane length
> > > (or just keep a depth counter and abort the evaluation if it exceeds
> > > some reasonable limit).
> > >
> >
> > i can try
>
> OK. Some thoughts on implementation:
>
> It should probably accept the expression as a base+length rather than
> a C string so it can be used in-place from within the mo file "header"
> (this design might help for recursion anyway I suppose). And it should
> be safe against malicious changes to the expression during evaluation
> (at worst give wrong results or error out rather than risk of stack
> overflow, out-of-bounds reads, etc.) since I'm aiming to make the
> whole system safe against malicious translation files (assuming the
> caller doesn't use the results in unsafe ways like as a format
> string).
>

ok i did something

i parse the expression once and then do the eval separately so
"changes to the expression during evaluation" does not apply
(i expected the expr to be const and evaluated several times
with different n)

currently it checks if base[length-1] == ';' and then does not
care about the length anymore, the first unexpected char ends
the parsing

the parser and eval code is about 2k now, i can try to do it
without a separate parsing step (my approach requires a 100-200
byte buffer to store the parsed expr now)

[-- Attachment #2: pl.h --]
[-- Type: text/x-chdr, Size: 207 bytes --]

// parse s into expr, returns -1 on failure
int parse(unsigned char *expr, size_t elen, const char *s, size_t slen);
// eval expr with input n
unsigned long eval(const unsigned char *expr, unsigned long n);

[-- Attachment #3: pl.c --]
[-- Type: text/x-csrc, Size: 5628 bytes --]

#include <stdlib.h>
#include <ctype.h>
#include <string.h>
#include "pl.h"

/*
grammar:

Start = Expr ';'
Expr  = Or | Or '?' Expr ':' Expr
Or    = And | Or '||' And
And   = Rel | And '&&' Rel
Rel   = Add | Add '==' Add | Add '!=' Add | Add '<=' Add | Add '>=' Add | Add '<' Add | Add '>' Add
Add   = Mul | Add '+' Mul | Add '-' Mul
Mul   = Term | Mul '*' Term | Mul '/' decimal | Mul '%' decimal
Term  = '(' Expr ')' | '!' Term | decimal | 'n'

compared to gnu gettext:
	right side of / and % must be const (and non-zero),
	chained relational/eq operators are not allowed,
	decimal is at most 255

internals:

parser is recursive descent, terminals are pushed on a stack
that grows down, a binary op "Left op Right" is parsed into
"op length-of-Right Right Left" so eval is easy to implement.

op chars on the stack
	n c ! | & = < > + - * / % ?
are
	var, const, neg, or, and, eq, less, greater, add, sub, mul, div, mod, cond

parse* functions push the parsed rule on the stack and return
a pointer to the next non-space char
*/

#include <stdio.h>

struct st {
	unsigned char *p;
	unsigned char *e;
};

static int ok(struct st *st)
{
	return st->p != st->e;
}

static void fail(struct st *st)
{
	st->p = st->e;
}

static void push(struct st *st, int c)
{
	if (ok(st))
		*--st->p = c;
}

static const char *skipspace(const char *s)
{
	while (isspace(*s)) s++;
	return s;
}

static const char *parseconst(struct st *st, const char *s)
{
	char *e;
	unsigned long n;
	n = strtoul(s, &e, 10);
	if (!isdigit(*s) || e == s || n > 255)
		fail(st);
	push(st, n);
	push(st, 'c');
	return skipspace(e);
}

static const char *parseexpr(struct st *st, const char *s, int d);

static const char *parseterm(struct st *st, const char *s, int d)
{
	if (d <= 0) {
		fail(st);
		return s;
	}
	s = skipspace(s);
	if (*s == '!') {
		s = parseterm(st, s+1, d-1);
		push(st, '!');
		return s;
	}
	if (*s == '(') {
		s = parseexpr(st, s+1, d-1);
		if (*s != ')') {
			fail(st);
			return s;
		}
		return skipspace(s+1);
	}
	if (*s == 'n') {
		push(st, 'n');
		return skipspace(s+1);
	}
	return parseconst(st, s);
}

static const char *parsemul(struct st *st, const char *s, int d)
{
	unsigned char *p;
	int op;
	s = parseterm(st, s, d-1);
	for (;;) {
		op = *s;
		p = st->p;
		if (op == '*') {
			s = parseterm(st, s+1, d-1);
		} else if (op == '/' || op == '%') {
			s = skipspace(s+1);
			if (*s == '0') {
				fail(st);
				return s;
			}
			s = parseconst(st, s);
		} else
			return s;
		push(st, p - st->p);
		push(st, op);
	}
}

static const char *parseadd(struct st *st, const char *s, int d)
{
	unsigned char *p;
	int op;
	s = parsemul(st, s, d-1);
	for (;;) {
		op = *s;
		if (op != '+' && op != '-')
			return s;
		p = st->p;
		s = parsemul(st, s+1, d-1);
		push(st, p - st->p);
		push(st, op);
	}
}

static const char *parserel(struct st *st, const char *s, int d)
{
	unsigned char *p;
	int neg = 0, op;
	s = parseadd(st, s, d-1);
	if (s[0] == '=' && s[1] == '=') {
		op = '=';
		s++;
	} else if (s[0] == '!' && s[1] == '=') {
		op = '=';
		neg = 1;
		s++;
	} else if (s[0] == '<' && s[1] == '=') {
		op = '>';
		neg = 1;
		s++;
	} else if (s[0] == '<') {
		op = '<';
	} else if (s[0] == '>' && s[1] == '=') {
		op = '<';
		neg = 1;
		s++;
	} else if (s[0] == '>') {
		op = '>';
	} else
		return s;
	p = st->p;
	s = parseadd(st, s+1, d-1);
	push(st, p - st->p);
	push(st, op);
	if (neg)
		push(st, '!');
	return s;
}

static const char *parseand(struct st *st, const char *s, int d)
{
	unsigned char *p;
	s = parserel(st, s, d-1);
	for (;;) {
		if (s[0] != '&' || s[1] != '&')
			return s;
		p = st->p;
		s = parserel(st, s+2, d-1);
		push(st, p - st->p);
		push(st, '&');
	}
}

static const char *parseor(struct st *st, const char *s, int d)
{
	unsigned char *p;
	s = parseand(st, s, d-1);
	for (;;) {
		if (s[0] != '|' || s[1] != '|')
			return s;
		p = st->p;
		s = parseand(st, s+2, --d);
		push(st, p - st->p);
		push(st, '|');
	}
}

static const char *parseexpr(struct st *st, const char *s, int d)
{
	unsigned char *p1, *p2;
	if (d <= 0) {
		fail(st);
		return s;
	}
	s = parseor(st, s, d-1);
	if (*s == '?') {
		p1 = st->p;
		s = parseexpr(st, s+1, d-1);
		p2 = st->p;
		if (*s != ':')
			fail(st);
		else
			s = parseexpr(st, s+1, d-1);
		push(st, p2 - st->p);
		push(st, p1 - st->p);
		push(st, '?');
	}
	return s;
}

int parse(unsigned char *expr, size_t elen, const char *s, size_t slen)
{
	const char *e = s + slen - 1;
	unsigned char *p;
	struct st st;

	if (*e != ';')
		return -1;
	if (elen > 200)
		elen = 200;
	st.e = expr;
	p = st.p = expr + elen;
	s = parseexpr(&st, s, 100);
	if (!ok(&st) || s != e)
		return -1;
	memmove(expr, st.p, p - st.p);
	return 0;
}

static unsigned long evalcond(const unsigned char *e, unsigned long n)
{
	int offcond = *e++;
	unsigned long c = eval(e+offcond, n);
	int offtrue = *e++;
	return eval(c ? e+offtrue : e, n);
}

static unsigned long evalbin(int op, const unsigned char *e, unsigned long n)
{
	int offleft = *e++;
	unsigned long right = eval(e, n);
	unsigned long left = eval(e+offleft, n);
	switch (op) {
	case '|': return left || right;
	case '&': return left && right;
	case '=': return left == right;
	case '<': return left < right;
	case '>': return left > right;
	case '+': return left + right;
	case '-': return left - right;
	case '*': return left * right;
	case '/': return left / right;
	case '%': return left % right;
	}
	return -1;
}

unsigned long eval(const unsigned char *e, unsigned long n)
{
	int op = *e++;
	switch (op) {
	case 'n': return n;
	case 'c': return *e;
	case '!': return !eval(e, n);
	case '?': return evalcond(e, n);
	}
	return evalbin(op, e, n);
}
* Re: Non-stub gettext API functions committed, ready for testing
From: Szabolcs Nagy @ 2014-07-28 13:00 UTC
To: musl

[-- Attachment #1: Type: text/plain, Size: 362 bytes --]

* Szabolcs Nagy <nsz@port70.net> [2014-07-28 12:18:30 +0200]:
> the parser and eval code is about 2k now, i can try to do it
> without a separate parsing step (my approach requires a 100-200
> byte buffer to store the parsed expr now)
>

attached a simpler solution without separate parsing
(code is about 1.4k now, and it is more compatible
with gnu gettext)

[-- Attachment #2: pl.c --]
[-- Type: text/x-csrc, Size: 3698 bytes --]

#include <stdlib.h>
#include <ctype.h>
#include "pl.h"

/*
grammar:

Start = Expr ';'
Expr  = Or | Or '?' Expr ':' Expr
Or    = And | Or '||' And
And   = Eq | And '&&' Eq
Eq    = Rel | Eq '==' Rel | Eq '!=' Rel
Rel   = Add | Rel '<=' Add | Rel '>=' Add | Rel '<' Add | Rel '>' Add
Add   = Mul | Add '+' Mul | Add '-' Mul
Mul   = Term | Mul '*' Term | Mul '/' decimal | Mul '%' decimal
Term  = '(' Expr ')' | '!' Term | decimal | 'n'

internals:

recursive descent expression evaluator with stack depth limit.
eval* functions return the value of the subexpression and set
the current string pointer to the next non-space char.
*/

struct st {
	const char *s;
	unsigned long n;
	int err;
};

static const char *skipspace(const char *s)
{
	while (isspace(*s)) s++;
	return s;
}

static unsigned long evalconst(struct st *st)
{
	char *e;
	unsigned long n;
	n = strtoul(st->s, &e, 10);
	if (!isdigit(*st->s) || e == st->s || n == -1)
		st->err = 1;
	st->s = skipspace(e);
	return n;
}

static unsigned long evalexpr(struct st *st, int d);

static unsigned long evalterm(struct st *st, int d)
{
	unsigned long a;
	if (d <= 0) {
		st->err = 1;
		return 0;
	}
	st->s = skipspace(st->s);
	if (*st->s == '!') {
		st->s++;
		return !evalterm(st, d-1);
	}
	if (*st->s == '(') {
		st->s++;
		a = evalexpr(st, d-1);
		if (*st->s != ')') {
			st->err = 1;
			return 0;
		}
		st->s = skipspace(st->s + 1);
		return a;
	}
	if (*st->s == 'n') {
		st->s = skipspace(st->s + 1);
		return st->n;
	}
	return evalconst(st);
}

static unsigned long evalmul(struct st *st, int d)
{
	unsigned long b, a = evalterm(st, d-1);
	int op;
	for (;;) {
		op = *st->s;
		if (op != '*' && op != '/' && op != '%')
			return a;
		st->s++;
		b = evalterm(st, d-1);
		if (op == '*') {
			a *= b;
		} else if (!b) {
			st->err = 1;
			return 0;
		} else if (op == '%') {
			a %= b;
		} else {
			a /= b;
		}
	}
}

static unsigned long evaladd(struct st *st, int d)
{
	unsigned long a = 0;
	int add = 1;
	for (;;) {
		a += (add?1:-1) * evalmul(st, d-1);
		if (*st->s != '+' && *st->s != '-')
			return a;
		add = *st->s == '+';
		st->s++;
	}
}

static unsigned long evalrel(struct st *st, int d)
{
	unsigned long b, a = evaladd(st, d-1);
	int less, eq;
	for (;;) {
		if (*st->s != '<' && *st->s != '>')
			return a;
		less = st->s[0] == '<';
		eq = st->s[1] == '=';
		st->s += 1 + eq;
		b = evaladd(st, d-1);
		a = (less ? a < b : a > b) || (eq && a == b);
	}
}

static unsigned long evaleq(struct st *st, int d)
{
	unsigned long a = evalrel(st, d-1);
	int neg;
	for (;;) {
		if ((st->s[0] != '=' && st->s[0] != '!') || st->s[1] != '=')
			return a;
		neg = st->s[0] == '!';
		st->s += 2;
		a = evalrel(st, d-1) == a;
		a ^= neg;
	}
}

static unsigned long evaland(struct st *st, int d)
{
	unsigned long a = evaleq(st, d-1);
	for (;;) {
		if (st->s[0] != '&' || st->s[1] != '&')
			return a;
		st->s += 2;
		a = evaleq(st, d-1) && a;
	}
}

static unsigned long evalor(struct st *st, int d)
{
	unsigned long a = evaland(st, d-1);
	for (;;) {
		if (st->s[0] != '|' || st->s[1] != '|')
			return a;
		st->s += 2;
		a = evaland(st, d-1) || a;
	}
}

static unsigned long evalexpr(struct st *st, int d)
{
	unsigned long a1, a2, a3;
	if (d <= 0) {
		st->err = 1;
		return 0;
	}
	a1 = evalor(st, d-1);
	if (*st->s != '?')
		return a1;
	st->s++;
	a2 = evalexpr(st, d-1);
	if (*st->s != ':') {
		st->err = 1;
		return 0;
	}
	st->s++;
	a3 = evalexpr(st, d-1);
	return a1 ? a2 : a3;
}

unsigned long eval(const char *s, size_t len, unsigned long n)
{
	unsigned long a;
	const char *e = s+len-1;
	struct st st;

	if (*e != ';')
		return -1;
	st.s = s;
	st.n = n;
	st.err = 0;
	a = evalexpr(&st, 100);
	if (st.err || st.s != e)
		return -1;
	return a;
}
* Re: Non-stub gettext API functions committed, ready for testing
From: Szabolcs Nagy @ 2014-07-28 14:01 UTC
To: musl

* Szabolcs Nagy <nsz@port70.net> [2014-07-28 15:00:17 +0200]:
> * Szabolcs Nagy <nsz@port70.net> [2014-07-28 12:18:30 +0200]:
> > the parser and eval code is about 2k now, i can try to do it
> > without a separate parsing step (my approach requires a 100-200
> > byte buffer to store the parsed expr now)
> >
>
> attached a simpler solution without separate parsing
> (code is about 1.4k now, and it is more compatible
> with gnu gettext)
>

using a complex plural expression (arabic):

"(n==0 ? 0 : n==1 ? 1 : n==2 ? 2 : n%100>=3 && n%100<=10 ? 3 : n%100>=11 ? 4 : 5);"

the runtime of my preparsed vs interpreted implementation is
0.1-0.5us vs 3us testing on a few small n.
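For checking an evaluator against known answers, the Arabic expression quoted in this message can be written directly as C, since the plural syntax is a subset of C expressions. The function name plural_ar is made up for this sketch; it is a reference for comparison only, not one of the attached implementations.

```c
/* Hand-written C equivalent of the Arabic plural expression quoted above.
 * C precedence matches the plural grammar here: relational operators bind
 * tighter than &&, which binds tighter than ?: (right-associative). */
static unsigned long plural_ar(unsigned long n)
{
	return n==0 ? 0
	     : n==1 ? 1
	     : n==2 ? 2
	     : n%100>=3 && n%100<=10 ? 3
	     : n%100>=11 ? 4
	     : 5;
}
```

Values such as 100, 101 and 102 are useful test inputs because they fall through every earlier branch and select form 5, while 103 wraps back to form 3.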
* Re: Non-stub gettext API functions committed, ready for testing
From: Rich Felker @ 2014-07-28 16:27 UTC
To: musl

On Mon, Jul 28, 2014 at 04:01:52PM +0200, Szabolcs Nagy wrote:
> * Szabolcs Nagy <nsz@port70.net> [2014-07-28 15:00:17 +0200]:
> > * Szabolcs Nagy <nsz@port70.net> [2014-07-28 12:18:30 +0200]:
> > > the parser and eval code is about 2k now, i can try to do it
> > > without a separate parsing step (my approach requires a 100-200
> > > byte buffer to store the parsed expr now)
> > >
> >
> > attached a simpler solution without separate parsing
> > (code is about 1.4k now, and it is more compatible
> > with gnu gettext)
> >
>
> using a complex plural expression (arabic):
>
> "(n==0 ? 0 : n==1 ? 1 : n==2 ? 2 : n%100>=3 && n%100<=10 ? 3 : n%100>=11 ? 4 : 5);"
>
> the runtime of my preparsed vs interpreted implementation is
> 0.1-0.5us vs 3us testing on a few small n.

My leaning is to go with the version that's smaller and more flexible;
I think the time spent in this function will usually be heavily
dominated by the binary search for the message text. But it's cool to
have both for possible future uses (independent of musl, even).

BTW one way to reduce the cost is to skip the whole plural computation
when msgid1==msgid2 (as pointers). This is always true when dcngettext
is called by one of the "non-n" gettext functions.

Rich
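The pointer-equality shortcut mentioned at the end of this message might look like the sketch below. Everything here is hypothetical: plural_index is an illustrative name, eval_rule is a stand-in that applies the English rule and counts its calls instead of parsing a catalog's expression, and returning form 0 when the evaluation is skipped is an assumption about how a non-plural lookup would be treated.

```c
/* Stand-in for the real plural evaluator: applies the English rule
 * (n != 1) and counts how often it runs. */
static unsigned long eval_calls;

static unsigned long eval_rule(unsigned long n)
{
	eval_calls++;
	return n != 1;
}

/* Pick the plural form index. When both msgid pointers are identical the
 * caller is a non-plural lookup, so the expression is never evaluated. */
static unsigned long plural_index(const char *msgid1, const char *msgid2,
                                  unsigned long n)
{
	if (msgid1 == msgid2)
		return 0;  /* assumed: single-form message, first form */
	return eval_rule(n);
}
```

The comparison is on pointers, not string contents, so it costs one compare and never touches the message text.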
* Re: Non-stub gettext API functions committed, ready for testing
From: Szabolcs Nagy @ 2014-07-29 13:49 UTC
To: musl

[-- Attachment #1: Type: text/plain, Size: 501 bytes --]

* Rich Felker <dalias@libc.org> [2014-07-28 12:27:32 -0400]:
> My leaning is to go with the version that's smaller and more flexible;
> I think the time spent in this function will usually be heavily
> dominated by the binary search for the message text. But it's cool to
> have both for possible future uses (independent of musl, even).

i have a bit smaller pleval.c version:
non-pic .o with -Os is 980 vs 826 bytes here, has about the same speed,
binary op parsing is a bit magical otherwise clean

[-- Attachment #2: pleval.c --]
[-- Type: text/x-csrc, Size: 3284 bytes --]

#include <stdlib.h>
#include <ctype.h>

/*
grammar:

Start = Expr ';'
Expr  = Or | Or '?' Expr ':' Expr
Or    = And | Or '||' And
And   = Eq | And '&&' Eq
Eq    = Rel | Eq '==' Rel | Eq '!=' Rel
Rel   = Add | Rel '<=' Add | Rel '>=' Add | Rel '<' Add | Rel '>' Add
Add   = Mul | Add '+' Mul | Add '-' Mul
Mul   = Term | Mul '*' Term | Mul '/' Term | Mul '%' Term
Term  = '(' Expr ')' | '!' Term | decimal | 'n'

internals:

recursive descent expression evaluator with stack depth limit.
eval* functions return the value of the subexpression and set
the current string pointer to the next non-space char.
*/

struct st {
	const char *s;
	unsigned long n;
	int err;
};

static const char *skipspace(const char *s)
{
	while (isspace(*s)) s++;
	return s;
}

static unsigned long fail(struct st *st)
{
	st->err = 1;
	return 0;
}

static unsigned long evalexpr(struct st *st, int d);

static unsigned long evalterm(struct st *st, int d)
{
	unsigned long a;
	char *e;
	if (--d < 0) return fail(st);
	st->s = skipspace(st->s);
	if (*st->s == '!') {
		st->s++;
		return !evalterm(st, d);
	}
	if (*st->s == '(') {
		st->s++;
		a = evalexpr(st, d);
		if (*st->s != ')') return fail(st);
		st->s = skipspace(st->s + 1);
		return a;
	}
	if (*st->s == 'n') {
		st->s = skipspace(st->s + 1);
		return st->n;
	}
	a = strtoul(st->s, &e, 10);
	if (!isdigit(*st->s) || e == st->s || a == -1) return fail(st);
	st->s = skipspace(e);
	return a;
}

static unsigned long binop(struct st *st, int op, unsigned long a, unsigned long b)
{
	switch (op&0xff) {
	case 0: return a||b;
	case 1: return a&&b;
	case 2: return a==b;
	case 3: return a!=b;
	case 4: return a>=b;
	case 5: return a<=b;
	case 6: return a>b;
	case 7: return a<b;
	case 8: return a-b;
	case 9: return a+b;
	case 10: return b ? a%b : fail(st);
	case 11: return b ? a/b : fail(st);
	case 12: return a*b;
	}
	return fail(st);
}

static int parseop(struct st *st)
{
	static const char opch[18] = "|&=!><-+%/*\0|&====";
	static const char prec[] = {1,2,3,3,4,4,5,5,6,6,6};
	int i, p;
	for (i=0; opch[i]; i++)
		if (*st->s == opch[i]) {
			p = prec[i]<<8;
			if (i<6 && st->s[1] == opch[i+12]) {
				st->s+=2;
				return i | p;
			}
			if (i>=4) {
				st->s++;
				return i+2 | p;
			}
			return 0;
		}
	return 0;
}

static unsigned long evalbinop2(struct st *st, int op, unsigned long a, int d)
{
	unsigned long a2;
	int op2, highprec;
	d--;
	for (;;) {
		a2 = evalterm(st, d);
		op2 = parseop(st);
		highprec = op2>>8 > op>>8;
		if (highprec)
			a2 = evalbinop2(st, op2, a2, d);
		a = binop(st, op, a, a2);
		if (!op2 || highprec)
			return a;
		op = op2;
	}
}

static unsigned long evalbinop(struct st *st, int d)
{
	unsigned long a;
	int op;
	a = evalterm(st, d);
	op = parseop(st);
	if (!op)
		return a;
	return evalbinop2(st, op, a, d);
}

static unsigned long evalexpr(struct st *st, int d)
{
	unsigned long a1, a2, a3;
	if (--d < 0)
		return fail(st);
	a1 = evalbinop(st, d);
	if (*st->s != '?')
		return a1;
	st->s++;
	a2 = evalexpr(st, d);
	if (*st->s != ':')
		return fail(st);
	st->s++;
	a3 = evalexpr(st, d);
	return a1 ? a2 : a3;
}

unsigned long __pleval(const char *s, unsigned long n)
{
	unsigned long a;
	struct st st;
	st.s = s;
	st.n = n;
	st.err = 0;
	a = evalexpr(&st, 100);
	if (st.err || *st.s != ';')
		return -1;
	return a;
}
* Re: Non-stub gettext API functions committed, ready for testing
From: Harald Becker @ 2014-07-27 10:19 UTC
To: musl

> [gettext plurals]

And this may also help:

http://localization-guide.readthedocs.org/en/lates/l10n/pluralforms.html