Hello,

In TMSR we've made extensive use of musl, due to the very welcome dose of clear and concise code it provides as compared to the competition [1]. For example we have a static Ada compiler [2], the Bitcoin reference implementation [3], a reproducible and self-contained Gentoo system [4], and not least of all my own distribution [5] used in my consulting business [6].

However, the apparent goal of aggressive expansion of Unicode and localization "features" in musl sets off alarms; for instance, on the roadmap [7] I see:

> Unicode 12.1 update and related character handling work
> Locale support overhaul.
> Hostname resolver support for non-ASCII domains (IDN)
> LC_COLLATE support for collation orders other than simple codepoint order
> Support for LC_MONETARY and LC_NUMERIC properties.
> Message translation support for dynamic linker
> Locale data and libc message translations

We think this is such a bad idea that it threatens to undermine musl's otherwise substantial virtues. This kind of bloat imposes real costs on the users that matter - namely the literate ones, who value predictable, stable and bug-free code - in exchange for entirely unclear benefits.

Especially considering the rate at which bugs are still turning up, there is no justification for this added complexity. In any event we will not be using "upgrades" that import additional nonsense into this critical system component.

I'll be happy to discuss further here, in my blog comments or on irc [8].

Yours,
J. Welsh

[1] http://trinque.org/2019/12/29/a-republican-os-part-2/
[2] http://ave1.org/2018/building-gnat-on-musl/
[3] http://therealbitcoin.org/ml/btc-dev/2015-July/000133.html
[4] http://trinque.org/2018/11/27/cuntoo-bootstrapper/
[5] http://fixpoint.welshcomputing.com/2019/introducing-gales-linux
[6] http://dorion-mode.com/2019/11/jwrd-computing
[7] https://wiki.musl-libc.org/roadmap.html
[8] #ossasepia or #trilema on freenode; PM me (jfw) or someone talking to ask for voice.
On 18/02/2020 13:38, Jacob Welsh wrote:
> Hello,
>
> In TMSR we've made extensive use of musl, due to the very welcome dose
> of clear and concise code it provides as compared to the competition
> [1]. For example we have a static Ada compiler [2], the Bitcoin
> reference implementation [3], a reproducible and self-contained Gentoo
> system [4], and not least of all my own distribution [5] used in my
> consulting business [6].
>
> However, the apparent goal of aggressive expansion of Unicode and
> localization "features" in musl sets off alarms; for instance, on the
> roadmap [7] I see:

Why do you not believe that musl could provide any of these features using clear and concise code?

>> Unicode 12.1 update and related character handling work

This is necessary for actual real-world users that need to use the symbols added since the last Unicode update. For example, Unicode 12.1 added the symbol for the new Japanese era, Reiwa Era. You will be unable to represent current dates in the Japanese calendar without this update.

>> Locale support overhaul.

Also very important for real-world users that wish to use languages besides English to communicate with their computer.

>> Hostname resolver support for non-ASCII domains (IDN)

IDN domains are gaining significant traction, especially in Asia and the Middle East.

>> LC_COLLATE support for collation orders other than simple codepoint order

I have been personally impacted by the lack of LC_COLLATE support.

>> Support for LC_MONETARY and LC_NUMERIC properties.

This is necessary for a better desktop experience; especially LC_NUMERIC is egregious since many cultures/countries utilise ',' as the decimal separator.

>> Message translation support for dynamic linker

This will allow non-English speakers the ability to understand the errors that are happening on the computers they own.

>> Locale data and libc message translations

This is somewhat already possible with https://github.com/rilian-la-te/musl-locales - it would basically just be upstreaming the translation files into musl proper (to ensure they are kept up-to-date) and adding messages that are not already translated.

> We think this is such a bad idea that it threatens to undermine musl's
> otherwise substantial virtues. This kind of bloat imposes real costs on
> the users that matter - namely the literate ones, who value predictable,
> stable and bug-free code - in exchange for entirely unclear benefits.

No one user matters more than another. musl's own self-description is:

"musl is lightweight, fast, simple, free, and strives to be correct in the sense of standards-conformance and safety."

Locale support can be lightweight, fast, simple, free, and correct. In fact, musl is *not* conformant to the POSIX standard *because* it does not implement the requisite locale support.

The benefits are the ability for people in non-English speaking cultures and countries to be able to use systems based on musl instead of being stuck with inferior alternatives. Anglocentrism has no place in Libre software.

> Especially considering the rate at which bugs are still turning up,
> there is no justification for this added complexity. In any event we
> will not be using "upgrades" that import additional nonsense into this
> critical system component.

There is absolutely justification for these features: Wolfram Alpha[1] quotes the number of English speakers to be approximately 11% of the world population. That means 89% of living people on Earth cannot currently fully utilise musl-based systems the way they could if it was possible to support non-English languages. Adding better locale support will fix this.

--arw

[1]: https://www.wolframalpha.com/input/?i=number+of+english+speakers

-- 
A. Wilcox (awilfox)
Project Lead, Adélie Linux
https://www.adelielinux.org
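The LC_NUMERIC point above concerns an existing C API rather than new machinery. As a minimal sketch (assuming only standard C and POSIX; the helper names are hypothetical), the radix character a locale defines is visible through localeconv(), and the printf family takes its decimal point from LC_NUMERIC. In the "C" locale that is always "."; a libc with the LC_NUMERIC support the roadmap item describes would print a comma under a locale that uses one.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: the decimal separator defined by the current
   LC_NUMERIC locale, as reported by localeconv(). */
const char *current_decimal_point(void)
{
    return localeconv()->decimal_point;
}

/* Hypothetical helper: format a value with two fractional digits.
   The printf family takes the radix character from LC_NUMERIC, so in
   the "C" locale 3.5 is rendered as "3.50". */
int format_decimal(char *buf, size_t size, double value)
{
    return snprintf(buf, size, "%.2f", value);
}
```

A program normally opts in to the user's numeric conventions with setlocale(LC_NUMERIC, ""); until it does, it runs in the "C" locale.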
>No one user matters more than another.
Except the ones actually doing the work.
Hello

i discovered recently a race condition while playing with threads
and sem_wait/sem_post
sem_wait may fail with errno set EAGAIN which is not valid since
only sem_trywait is able to set that errno code.
this was causing a bug with a later select() and accept() which
failed since accept does not work if errno is set to EAGAIN.
from my point of view the bug is in sem_timedwait.c

if (!sem_trywait(sem)) return 0;

int spins = 100;
while (spins-- && sem->__val[0] <= 0 && !sem->__val[1]) a_spin();

while (sem_trywait(sem)) {


the first sem_trywait will fail with -1 and sets EAGAIN. but the
second sem_trywait will not fail and does return 0. the problem now
is that errno is still present and not reset.
this may happen if sem_post is called from a second thread on the
same semaphore.
of course the same bug affects sem_timedwait itself.
so i assume sem_wait is not thread safe which is bad and does not
follow the posix specification

or am i wrong here?


Sebastian
On Tue, Feb 18, 2020 at 07:38:29PM +0000, Jacob Welsh wrote:
> Hello,
>
> In TMSR we've made extensive use of musl, due to the very welcome
> dose of clear and concise code it provides as compared to the
> competition [1]. For example we have a static Ada compiler [2], the
> Bitcoin reference implementation [3], a reproducible and
> self-contained Gentoo system [4], and not least of all my own
> distribution [5] used in my consulting business [6].
>
> However, the apparent goal of aggressive expansion of Unicode and
> localization "features" in musl sets off alarms; for instance, on
> the roadmap [7] I see:

I think you're rather under-informed on this topic. Basically none of the following add any complexity:

> >Unicode 12.1 update and related character handling work

This was (1) an update of existing tables and (2) throwing out hand-written case mapping code that made lots of fragile assumptions and had to be updated by hand with every addition of new case mappings, and that got slower with each addition, and replacing it with a table-based approach I'd designed a year or so ago that's more like the rest of the character tables and admits automatic generation.

> >Locale support overhaul.

This is not adding anything new but fixing bugs where the code that's already there doesn't work as intended.

> >Hostname resolver support for non-ASCII domains (IDN)
>
> >LC_COLLATE support for collation orders other than simple codepoint order

These have been serious missing functionality since the beginning. There is no change here. If you missed them being on the roadmap for the past 6+ years, you weren't looking very closely.

> >Support for LC_MONETARY and LC_NUMERIC properties.

This is the only item that's controversial, but you don't seem to be coming from a good position to have input on it.

> >Message translation support for dynamic linker

This has also been on the agenda for a long time. It's the only place in musl where format strings containing natural-language text are used, and format strings are not candidates for translation because it's unsafe (data can replace format specifiers with incompatible ones), making it inconsistent with the rest of musl which does have message translation support.

> >Locale data and libc message translations

This is purely a matter of creating data to be used with functionality that already exists.

> We think this is such a bad idea that it threatens to undermine
> musl's otherwise substantial virtues. This kind of bloat imposes
> real costs on the users that matter - namely the literate ones, who
> value predictable, stable and bug-free code - in exchange for
> entirely unclear benefits.

If you think the above imply bloat, musl must already be bloated. You should probably be aware that first-class support for all characters in Unicode (vs glibc's bloated gconv-plugin layer for UTF-8 which originally made GNU grep over 100x slower than in 8-bit codepage locales) was _THE_ original motivation for what became musl. None of this is new. Not treating users like they're "illiterate" if they want to be able to write their own name has always been the most important core value of the project, and your attitude towards the matter here does not make me interested in going out of my way to cater to you. I suspect others in this community feel similarly.

> Especially considering the rate at which bugs are still turning up,
> there is no justification for this added complexity. In any event we
> will not be using "upgrades" that import additional nonsense into
> this critical system component.

If you want to stick with old versions and maintain them yourself or pay someone else to do so, that's your choice.

Rich
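Rich's objection to translating format strings can be made concrete. The following is a minimal sketch of the general safe pattern, not musl's dynamic-linker code: lookup_translation is a hypothetical stand-in for a message catalogue, and its German entry deliberately contains a stray "%s". Because the translated text is passed as an argument to a fixed format string, the stray specifier is printed literally instead of making snprintf read an argument that was never supplied.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical message catalogue lookup (not a musl API). The German
   entry deliberately contains a stray "%s" to model a bad or malicious
   translation. */
static const char *lookup_translation(const char *msg)
{
    if (strcmp(msg, "Error loading shared library") == 0)
        return "Fehler beim Laden der Bibliothek %s";
    return msg;
}

/* Safe pattern: the format string is a literal fixed in the source, and
   the translated text is only ever data for a %s conversion. A broken
   translation can change the wording printed, never the number or types
   of arguments snprintf reads. */
int format_load_error(char *buf, size_t size, const char *lib)
{
    return snprintf(buf, size, "%s: %s",
                    lookup_translation("Error loading shared library"), lib);
}
```

The unsafe alternative would be `snprintf(buf, size, lookup_translation("Error loading %s"), lib)`, where whatever the catalogue returns decides how the arguments are interpreted.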
On Wed, Feb 19, 2020 at 01:46:34AM +0100, Sebastian Gottschall wrote:
> Hello
>
> i discovered recently a race condition while playing with threads
> and sem_wait/sem_post
> sem_wait may fail with errno set EAGAIN which is not valid since
> only sem_trywait is able to set that errno code.
> this was causing a bug with a later select() and accept() which
> failed since accept does not work if errno is set to EAGAIN.
> from my point of view the bug is in sem_timedwait.c
>
> if (!sem_trywait(sem)) return 0;
>
> int spins = 100;
> while (spins-- && sem->__val[0] <= 0 && !sem->__val[1]) a_spin();
>
> while (sem_trywait(sem)) {
>
>
> the first sem_trywait will fail with -1 and sets EAGAIN. but the
> second sem_trywait will not fail and does return 0. the problem now
> is that errno is still present and not reset.
> this may happen if sem_post is called from a second thread on the
> same semaphore.
> of course the same bug affects sem_timedwait itself.
> so i assume sem_wait is not thread safe which is bad and does not
> follow the posix specification
>
> or am i wrong here?
errno is only meaningful on failure; unless specified otherwise (a few
functions are special because you can't [easily] distinguish success
from failure for them without examining errno), any standard function
may have changed the value of errno when it returns with success. The
only thing it's not allowed to do is clear it (set it to 0).
Rich
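Rich's rule translates into a simple calling pattern: test the return value first and consult errno only when that value signals failure. A minimal sketch, assuming a POSIX system with unnamed semaphores; wait_sem_retry is a hypothetical helper, not part of musl or POSIX. EINTR is the one sem_wait() failure worth retrying here, since a signal handler can interrupt the wait.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <semaphore.h>

/* Hypothetical helper (not a musl or POSIX function): wait on a semaphore,
   retrying when a signal interrupts the wait. errno is read only after
   sem_wait() has reported failure by returning -1; after a successful
   return its value is unspecified, so it is never examined. */
int wait_sem_retry(sem_t *sem)
{
    int r;
    do {
        r = sem_wait(sem);
    } while (r == -1 && errno == EINTR);
    return r; /* 0 on success; on -1, errno holds the reason */
}
```

A caller then branches on the return value alone, e.g. `if (wait_sem_retry(&slots) == -1) perror("sem_wait");` (with `slots` a hypothetical semaphore), and never on whatever errno happens to contain after a success.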
Sebastian Gottschall <s.gottschall@newmedia-net.de> writes:

> Hello
>
> i discovered recently a race condition while playing with threads and
> sem_wait/sem_post
> sem_wait may fail with errno set EAGAIN which is not valid since only
> sem_trywait is able to set that errno code.
> this was causing a bug with a later select() and accept() which failed
> since accept does not work if errno is set to EAGAIN.

Whether select/accept work or not should not be impacted by any existing value in errno.

> from my point of view the bug is in sem_timedwait.c
>
> if (!sem_trywait(sem)) return 0;
>
> int spins = 100;
> while (spins-- && sem->__val[0] <= 0 && !sem->__val[1]) a_spin();
>
> while (sem_trywait(sem)) {
>
>
> the first sem_trywait will fail with -1 and sets EAGAIN. but the second
> sem_trywait will not fail and does return 0. the problem now is that
> errno is still present and not reset.
> this may happen if sem_post is called from a second thread on the same
> semaphore.
> of course the same bug affects sem_timedwait itself.
> so i assume sem_wait is not thread safe which is bad and does not follow
> the posix specification

To quote POSIX [1]:

    The value of errno should only be examined when it is indicated to
    be valid by a function's return value. [...] The setting of errno
    after a successful call to a function is unspecified unless the
    description of that function specifies that errno shall not be
    modified.

If sem_wait() returns zero, then the value in errno after the call returns is not meaningful in any way.

[1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/errno.html

>
> or am i wrong here?
>
>
> Sebastian

Bobby
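The behaviour Bobby quotes can be reproduced with a single-threaded sketch. This is a simplification, not musl's code: a sem_post() in the same thread stands in for the second thread in Sebastian's report, and try_post_try is a hypothetical name. The first sem_trywait() on a zero-valued semaphore fails and sets errno to EAGAIN; the retry succeeds, and after that success errno is unspecified, so the sketch records errno only for the attempt that returned -1.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <semaphore.h>

/* Results of the hypothetical try/post/try sequence below. */
struct try_result {
    int first;       /* return value of the first sem_trywait() */
    int first_errno; /* errno after the first attempt (valid: it returned -1) */
    int second;      /* return value of the retry after sem_post() */
};

/* Expects a semaphore whose count is 0. Loosely mirrors the
   try-then-retry shape of the sem_timedwait.c excerpt in this thread. */
struct try_result try_post_try(sem_t *sem)
{
    struct try_result res;
    res.first = sem_trywait(sem);  /* count is 0: fails with EAGAIN */
    res.first_errno = errno;
    sem_post(sem);                 /* stands in for another thread posting */
    res.second = sem_trywait(sem); /* succeeds; errno is now unspecified
                                      and may well still read EAGAIN */
    return res;
}
```

Reading errno after `res.second == 0` would be exactly the mistake described above: nothing in POSIX says it has been reset.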
On 19.02.2020 at 04:39, Rich Felker wrote:
> On Wed, Feb 19, 2020 at 01:46:34AM +0100, Sebastian Gottschall wrote:
>> Hello
>>
>> i discovered recently a race condition while playing with threads
>> and sem_wait/sem_post
>> sem_wait may fail with errno set EAGAIN which is not valid since
>> only sem_trywait is able to set that errno code.
>> this was causing a bug with a later select() and accept() which
>> failed since accept does not work if errno is set to EAGAIN.
>> from my point of view the bug is in sem_timedwait.c
>>
>> if (!sem_trywait(sem)) return 0;
>>
>> int spins = 100;
>> while (spins-- && sem->__val[0] <= 0 && !sem->__val[1]) a_spin();
>>
>> while (sem_trywait(sem)) {
>>
>>
>> the first sem_trywait will fail with -1 and sets EAGAIN. but the
>> second sem_trywait will not fail and does return 0. the problem now
>> is that errno is still present and not reset.
>> this may happen if sem_post is called from a second thread on the
>> same semaphore.
>> of course the same bug affects sem_timedwait itself.
>> so i assume sem_wait is not thread safe which is bad and does not
>> follow the posix specification
>>
>> or am i wrong here?
> errno is only meaningful on failure; unless specified otherwise (a few
> functions are special because you can't [easily] distinguish success
> from failure for them without examining errno), any standard function
> may have changed the value of errno when it returns with success. The
> only thing it's not allowed to do is clear it (set it to 0).

the problem is the posix manual specifies explicitly that EAGAIN
cannot be returned by sem_wait and in my code sample

the following happens

sem_wait(semaphore)
select(....)
socket = accept(....) -> fails

accept fails because sem_wait did set errno to EAGAIN and accept
will fail if errno is set to EAGAIN
i use sem_wait to limit the number of threads in my webserver. on
the thread itself i call sem_post.
but to make it work correctly i have to set errno=0 before calling
accept since accept will not work if errno is set to EAGAIN
if you read the posix man for accept, you will find out that accept
will read errno unconditionally and this is also the case for the musl
implementation


Sebastian

>
> Rich
>
On Wed, Feb 19, 2020 at 09:26:30AM +0100, Sebastian Gottschall wrote:
>
> On 19.02.2020 at 04:39, Rich Felker wrote:
> >On Wed, Feb 19, 2020 at 01:46:34AM +0100, Sebastian Gottschall wrote:
> >>Hello
> >>
> >>i discovered recently a race condition while playing with threads
> >>and sem_wait/sem_post
> >>sem_wait may fail with errno set EAGAIN which is not valid since
> >>only sem_trywait is able to set that errno code.
> >>this was causing a bug with a later select() and accept() which
> >>failed since accept does not work if errno is set to EAGAIN.
> >>from my point of view the bug is in sem_timedwait.c
> >>
> >> if (!sem_trywait(sem)) return 0;
> >>
> >> int spins = 100;
> >> while (spins-- && sem->__val[0] <= 0 && !sem->__val[1]) a_spin();
> >>
> >> while (sem_trywait(sem)) {
> >>
> >>
> >>the first sem_trywait will fail with -1 and sets EAGAIN. but the
> >>second sem_trywait will not fail and does return 0. the problem now
> >>is that errno is still present and not reset.
> >>this may happen if sem_post is called from a second thread on the
> >>same semaphore.
> >>of course the same bug affects sem_timedwait itself.
> >>so i assume sem_wait is not thread safe which is bad and does not
> >>follow the posix specification
> >>
> >>or am i wrong here?
> >errno is only meaningful on failure; unless specified otherwise (a few
> >functions are special because you can't [easily] distinguish success
> >from failure for them without examining errno), any standard function
> >may have changed the value of errno when it returns with success. The
> >only thing it's not allowed to do is clear it (set it to 0).
> the problem is the posix manual specifies explicitly that EAGAIN
> cannot be returned by sem_wait and in my code sample
>
> the following happens
>
> sem_wait(semaphore)
> select(....)
> socket = accept(....) -> fails
>
> accept fails because sem_wait did set errno to EAGAIN and accept
> will fail if errno is set to EAGAIN
> i use sem_wait to limit the number of threads in my webserver. on
> the thread itself i call sem_post.
> but to make it work correctly i have to set errno=0 before calling
> accept since accept will not work if errno is set to EAGAIN
> if you read the posix man for accept, you will find out that accept
> will read errno unconditionally and this is also the case for the musl
> implementation
accept does not use errno as input. Unless I'm forgetting something,
no interfaces in libc except perror, syslog (%m), and *printf (%m
extension) use errno as input. If accept is failing (returning -1)
with errno==EAGAIN it's not because errno was EAGAIN before you called
it but because your listening socket is in non-blocking mode and there
is no pending connection to accept.
Rich
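Rich's explanation can be checked with a small sketch, assuming a POSIX system with IPv4 loopback available; open_nonblocking_listener and accept_one are hypothetical helpers, not library functions. It shows the shape Sebastian's server needs: wait for readiness (select() in his code; poll() works the same way), call accept(), and decide what happened purely from accept()'s return value. No `errno = 0` beforehand is required.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a TCP listener on 127.0.0.1 with a
   kernel-chosen port, switched to non-blocking mode. On success the
   bound address (including the port) is stored in *addr. */
int open_nonblocking_listener(struct sockaddr_in *addr)
{
    socklen_t len = sizeof *addr;
    int flags;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = 0; /* 0 = let the kernel pick a free port */

    if (bind(fd, (struct sockaddr *)addr, sizeof *addr) == -1 ||
        listen(fd, 8) == -1 ||
        getsockname(fd, (struct sockaddr *)addr, &len) == -1 ||
        (flags = fcntl(fd, F_GETFL)) == -1 ||
        fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: accept one connection from a non-blocking
   listener. Returns the new descriptor, -2 if no connection is pending
   (EAGAIN/EWOULDBLOCK), or -1 on any other error. errno is examined only
   after accept() has returned -1; its value before the call plays no part. */
int accept_one(int listen_fd)
{
    int fd = accept(listen_fd, NULL, NULL);
    if (fd != -1)
        return fd;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return -2;
    return -1;
}
```

Called before any client has connected, accept_one() returns -2, which is the EAGAIN case Rich describes for a non-blocking listener. Once a client connects and the listener polls readable, it returns a descriptor even if errno happened to hold EAGAIN beforehand.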
On Tue, 18 Feb 2020, A. Wilcox wrote:

> Why do you not believe that musl could provide any of these features
> using clear and concise code?

I fully expect it could. The point at issue however is whether it should be done at all.

> I have been personally impacted by the lack of LC_COLLATE support.

I have been personally "impacted" by its presence in glibc, but perhaps I'm not the sort of "real world" user whose needs you would like to represent.

> This will allow non-English speakers the ability to understand the
> errors that are happening on the computers they own.

You may be overestimating a bit there the abilities of most English speakers to "communicate with their computers" or specifically to decode error messages; anyway, what your approach actually does is to fragment the knowledge base and herd people *away* from where they might find the best information. Now, this dispute is at least as old as the Protestant Reformation so I do not expect or require it to be settled here.

> In fact, musl is *not* conformant to the POSIX standard *because* it
> does not implement the requisite locale support.

We're prepared to fork POSIX or any other document that proves necessary. Not like it's hard.

On the other hand, I suppose someone will get right to work translating POSIX and all the musl code and commentary to every presently spoken language, because after all they look mighty Anglocentric to me and no one coder's needs matter more than another's.

Sarcasm aside, I'm satisfied that our differences have been made clear and am happy to let it rest.

Yours truly,
J. Welsh
http://fixpoint.welshcomputing.com/
On Wed, Feb 19, 2020 at 09:28:10PM +0000, Jacob Welsh wrote:
> On Tue, 18 Feb 2020, A. Wilcox wrote:
>
> >Why do you not believe that musl could provide any of these features
> >using clear and concise code?
>
> I fully expect it could. The point at issue however is whether it
> should be done at all.
>
> >I have been personally impacted by the lack of LC_COLLATE support.
>
> I have been personally "impacted" by its presence in glibc, but
> perhaps I'm not the sort of "real world" user whose needs you would
> like to represent.

You avoid this by not setting LANG, LC_COLLATE, or LC_ALL in your environment, or by ensuring that the one that takes precedence yields a result of C or C.UTF-8 for the LC_COLLATE category. Plenty of users, myself included, prefer codepoint order for directory listings and such. This does not conflict in any way with providing support for other collation orders that are useful for things like sorting natural-language CSV tables, etc.

> >In fact, musl is *not* conformant to the POSIX standard *because*
> >it does not implement the requisite locale support.
>
> We're prepared to fork POSIX or any other document that proves
> necessary. Not like it's hard.

No comment.

Rich
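The environment-driven selection Rich describes can be sketched as follows, assuming a POSIX system; init_collation and sort_names are hypothetical helpers. A program picks up the user's collation order with setlocale(LC_COLLATE, ""), where POSIX gives LC_ALL precedence over LC_COLLATE, which in turn takes precedence over LANG. When the winning value is C (or C.UTF-8), strcoll() behaves like strcmp(), and for UTF-8 strings that byte order is the same as codepoint order.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: adopt the collation order named by the
   environment (LC_ALL, then LC_COLLATE, then LANG). Returns the name
   of the selected locale, or NULL if it could not be set. */
const char *init_collation(void)
{
    return setlocale(LC_COLLATE, "");
}

/* qsort comparator: compare two strings under the current LC_COLLATE. */
static int cmp_coll(const void *a, const void *b)
{
    return strcoll(*(const char *const *)a, *(const char *const *)b);
}

/* Hypothetical helper: sort an array of strings in LC_COLLATE order. */
void sort_names(const char **names, size_t n)
{
    qsort(names, n, sizeof *names, cmp_coll);
}
```

Under LC_ALL=C, sorting {"b", "a", "B", "_", "A"} this way yields A, B, _, a, b: plain byte order, the listing order Rich says he prefers. A locale with a natural-language collation table could order the same strings differently without any change to the calling code.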