* Constants to decode __ctype_b_loc() table
From: Sergey Dmitrouk @ 2014-10-15 10:41 UTC
To: musl

Hi,

musl provides symbols for the following functions:

 - __ctype_b_loc
 - __ctype_tolower_loc
 - __ctype_toupper_loc

The last two of them return values that do not need special
interpretation.  The first one returns value pointing to a table of
bitmasks, but I'm unable to find where musl defines meaning of the bits.
Values seem to be compatible to the ones defined by glibc though.

Is this intentional (some kind of compatibility with other libc
implementation) or definition of _IS... macros is just missing and
you'd accept a patch to add them?

Regards,
Sergey

* Re: Constants to decode __ctype_b_loc() table
From: Szabolcs Nagy @ 2014-10-15 11:32 UTC
To: musl

* Sergey Dmitrouk <sdmitrouk@accesssoftek.com> [2014-10-15 13:41:42 +0300]:
>
> musl provides symbols for the following functions:
>
>  - __ctype_b_loc
>  - __ctype_tolower_loc
>  - __ctype_toupper_loc
>

these are nonsense abi required for ctype_base::mask feature of c++
so used by libstdc++ and libc++

> The last two of them return values that do not need special
> interpretation.  The first one returns value pointing to a table of
> bitmasks, but I'm unable to find where musl defines meaning of the bits.
> Values seem to be compatible to the ones defined by glibc though.
>

iirc the meaning of the bits are defined in the c++ standard
either that or glibc <-> libstdc++ abi convention

> Is this intentional (some kind of compatibility with other libc
> implementation) or definition of _IS... macros is just missing and
> you'd accept a patch to add them?

i think glibc uses them for is* macros internally but musl doesnt

* Re: Constants to decode __ctype_b_loc() table
From: Sergey Dmitrouk @ 2014-10-15 12:05 UTC
To: musl

On Wed, Oct 15, 2014 at 04:32:07AM -0700, Szabolcs Nagy wrote:
> so used by libstdc++ and libc++

libc++ fails to build against musl as libc, because it doesn't know
values of masks (for glibc there is an ifdef).

> iirc the meaning of the bits are defined in the c++ standard

22.4.1 [category.ctype] of C++ standard doesn't define exact values.
Comment before the values says:

> numeric values are for exposition only.

So it seems to be implementation defined.

Regards,
Sergey

* Re: Constants to decode __ctype_b_loc() table
From: Rich Felker @ 2014-10-15 16:51 UTC
To: Sergey Dmitrouk; +Cc: musl

On Wed, Oct 15, 2014 at 03:05:31PM +0300, Sergey Dmitrouk wrote:
> On Wed, Oct 15, 2014 at 04:32:07AM -0700, Szabolcs Nagy wrote:
> > so used by libstdc++ and libc++
>
> libc++ fails to build against musl as libc, because it doesn't know
> values of masks (for glibc there is an ifdef).

See the patches for gcc in the musl-cross repo. Basically, libstdc++
should be using the files from the config/os/generic directory, not
the config/os/gnu-linux directory. The files in the latter are poking
at glibc internals which musl only mimics minimally for the sake of
running existing binaries. These are not interfaces supported at the
source level.

Rich

* Re: Constants to decode __ctype_b_loc() table
From: Sergey Dmitrouk @ 2014-10-15 19:19 UTC
To: musl

On Wed, Oct 15, 2014 at 09:51:36AM -0700, Rich Felker wrote:
> Basically, libstdc++ should be using ...

Well, I'm talking about libc++, not libstdc++.  libc++ doesn't have
such headers and all is kept in one big locale.cpp.  It's easy to
hard-wire these constants for generic case, but is it really correct
solution?  It doesn't seem to be standardized.  Values in
os/generic/ctype_base.h differ from those one can find in C++ standard.
There is even a comment:

    // Default information, may not be appropriate for specific host.

My point is that musl can have these masks defined to arbitrary values
and there is currently no way for a client to know exact values.  It
just happens to work, no guarantees.

The question is whether you want to keep it in this somewhat incomplete
state, when particular values of constants are assumed and undocumented (e.g.
if this is really just for libstdc++, which can live without constants).

Regards,
Sergey

* Re: Constants to decode __ctype_b_loc() table
From: Rich Felker @ 2014-10-16 0:58 UTC
To: musl

On Wed, Oct 15, 2014 at 10:19:46PM +0300, Sergey Dmitrouk wrote:
> On Wed, Oct 15, 2014 at 09:51:36AM -0700, Rich Felker wrote:
> > Basically, libstdc++ should be using ...
>
> Well, I'm talking about libc++, not libstdc++.  libc++ doesn't have
> such headers and all is kept in one big locale.cpp.  It's easy to
> hard-wire these constants for generic case, but is it really correct
> solution?

No, using those interfaces AT ALL is incorrect. They are not a public
API but glibc implementation internals. The correct way to implement
the locale functionality in C++ is to call the ctype.h/wctype.h
functions, not using glibc implementation internals.

> It doesn't seem to be standardized.  Values in
> os/generic/ctype_base.h differ from those one can find in C++ standard.

The standard has no such thing. These are implementation details.

> There is even a comment:
>
>     // Default information, may not be appropriate for specific host.
>
> My point is that musl can have these masks defined to arbitrary values
> and there is currently no way for a client to know exact values.  It
> just happens to work, no guarantees.

No, the generic implementation does not use the glibc internals at
all. Since the rest of libstdc++'s locale support is written based on
the glibc internals, the generic implementation just provides an
interface that looks like the glibc one using standard functions.

> The question is whether you want to keep it in this somewhat incomplete
> state, when particular values of constants are assumed and undocumented (e.g.
> if this is really just for libstdc++, which can live without constants).

No, the question is whether we want to provide glibc internals as a
public API, and the answer is no.

Rich
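
The approach described in this message, a mask interface built only from the standard ctype.h functions, might be sketched as follows. This is an illustration under stated assumptions, not code from libstdc++, libc++ or musl: the type `mask`, the `m_*` constants and the function `classify` are hypothetical names, and the bit values are picked arbitrarily to make the point that they need not match the table behind __ctype_b_loc.

```cpp
#include <cctype>

// Hypothetical mask bits, private to this sketch. The values are
// arbitrary and intentionally unrelated to glibc's internal table.
using mask = unsigned short;
constexpr mask m_upper  = 1 << 0;
constexpr mask m_lower  = 1 << 1;
constexpr mask m_alpha  = 1 << 2;
constexpr mask m_digit  = 1 << 3;
constexpr mask m_xdigit = 1 << 4;
constexpr mask m_space  = 1 << 5;
constexpr mask m_print  = 1 << 6;
constexpr mask m_cntrl  = 1 << 7;
constexpr mask m_punct  = 1 << 8;

// Build the classification mask for one character using only the
// standard <cctype> predicates. The conversion through unsigned char
// avoids undefined behaviour when plain char is signed and negative.
mask classify(char c) {
    const int u = static_cast<unsigned char>(c);
    mask m = 0;
    if (std::isupper(u))  m |= m_upper;
    if (std::islower(u))  m |= m_lower;
    if (std::isalpha(u))  m |= m_alpha;
    if (std::isdigit(u))  m |= m_digit;
    if (std::isxdigit(u)) m |= m_xdigit;
    if (std::isspace(u))  m |= m_space;
    if (std::isprint(u))  m |= m_print;
    if (std::iscntrl(u))  m |= m_cntrl;
    if (std::ispunct(u))  m |= m_punct;
    return m;
}
```

A caller would test membership with an expression such as `classify(ch) & m_alpha`. Because the bits are defined by the caller, nothing here depends on the layout of any libc-internal table.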

* Re: Constants to decode __ctype_b_loc() table
From: Szabolcs Nagy @ 2014-10-16 1:53 UTC
To: musl

* Rich Felker <dalias@libc.org> [2014-10-15 20:58:43 -0400]:
> On Wed, Oct 15, 2014 at 10:19:46PM +0300, Sergey Dmitrouk wrote:
> > hard-wire these constants for generic case, but is it really correct
> > solution?
>
> No, using those interfaces AT ALL is incorrect. They are not a public
> API but glibc implementation internals. The correct way to implement
> the locale functionality in C++ is to call the ctype.h/wctype.h
> functions, not using glibc implementation internals.
>

i think the c++ std lib has a hard time implementing that efficiently
(but it should be their problem not ours)

it has to parse istreams in terms of ctype<> and there are inefficient
apis like ctype<C>::is(const C*,const C*,mask*) which has to calculate
the ctype mask for each char in a substring (so calling all is* c apis
for each char..)

> > The question is whether you want to keep it in this somewhat incomplete
> > state, when particular values of constants are assumed and undocumented (e.g.
> > if this is really just for libstdc++, which can live without constants).
>
> No, the question is whether we want to provide glibc internals as a
> public API, and the answer is no.
>

well at least one standard specifies __ctype_b_loc

http://refspecs.linux-foundation.org/LSB_4.1.0/LSB-Core-generic/LSB-Core-generic/baselib---ctype-b-loc.html

but it does not specify the bitrepresentation (which is part of the abi)
so it is useless

i think this symbol has to be provided by glibc forever because of
abi compat and the meaning of the flags cannot be changed

however the glibc internal ctype implementation may change in the
future including the undocumented flags/masks in glibc ctype.h so
relying on those in any way is wrong
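
One way a C++ library could soften the per-character cost mentioned above, while still using only standard functions, is to compute its own table once and index it in the range call. The sketch below is an assumption-laden illustration and not how libc++ or libstdc++ is actually written: `mask`, the `m_*` bits, `mask_table`, `build_table` and `classify_range` are all hypothetical names, and only four categories are shown to keep it short.

```cpp
#include <array>
#include <cctype>
#include <climits>
#include <cstddef>

// Hypothetical mask bits owned by the C++ library in this sketch.
using mask = unsigned short;
constexpr mask m_alpha = 1 << 0;
constexpr mask m_digit = 1 << 1;
constexpr mask m_space = 1 << 2;
constexpr mask m_upper = 1 << 3;

// One entry per possible unsigned char value.
using mask_table = std::array<mask, UCHAR_MAX + 1>;

// Fill the table by calling the standard predicates once per value.
// The result reflects the C locale in effect at the time of the call,
// so it would have to be rebuilt if that locale changes.
mask_table build_table() {
    mask_table t{};
    for (int u = 0; u <= UCHAR_MAX; ++u) {
        mask m = 0;
        if (std::isalpha(u)) m |= m_alpha;
        if (std::isdigit(u)) m |= m_digit;
        if (std::isspace(u)) m |= m_space;
        if (std::isupper(u)) m |= m_upper;
        t[static_cast<std::size_t>(u)] = m;
    }
    return t;
}

// Range form, shaped like ctype<char>::is(first, last, out): writes one
// mask per input character with a single table lookup each and returns
// the end of the input range.
const char* classify_range(const mask_table& t, const char* first,
                           const char* last, mask* out) {
    for (; first != last; ++first, ++out)
        *out = t[static_cast<unsigned char>(*first)];
    return last;
}
```

The trade-off is a one-time pass of is* calls per category for every byte value in exchange for a single lookup per character afterwards, without reading any table exported by the C library.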

* Re: Constants to decode __ctype_b_loc() table
From: Rich Felker @ 2014-10-16 2:07 UTC
To: musl

On Thu, Oct 16, 2014 at 03:53:33AM +0200, Szabolcs Nagy wrote:
> * Rich Felker <dalias@libc.org> [2014-10-15 20:58:43 -0400]:
> > On Wed, Oct 15, 2014 at 10:19:46PM +0300, Sergey Dmitrouk wrote:
> > > hard-wire these constants for generic case, but is it really correct
> > > solution?
> >
> > No, using those interfaces AT ALL is incorrect. They are not a public
> > API but glibc implementation internals. The correct way to implement
> > the locale functionality in C++ is to call the ctype.h/wctype.h
> > functions, not using glibc implementation internals.
> >
>
> i think the c++ std lib has a hard time implementing that efficiently
> (but it should be their problem not ours)
>
> it has to parse istreams in terms of ctype<> and there are inefficient
> apis like ctype<C>::is(const C*,const C*,mask*) which has to calculate
> the ctype mask for each char in a substring (so calling all is* c apis
> for each char..)

This sounds like an inefficient API to use...

> > > The question is whether you want to keep it in this somewhat incomplete
> > > state, when particular values of constants are assumed and undocumented (e.g.
> > > if this is really just for libstdc++, which can live without constants).
> >
> > No, the question is whether we want to provide glibc internals as a
> > public API, and the answer is no.
> >
>
> well at least one standard specifies __ctype_b_loc
>
> http://refspecs.linux-foundation.org/LSB_4.1.0/LSB-Core-generic/LSB-Core-generic/baselib---ctype-b-loc.html

In the link you cited:

"This interface is not in the source standard; it is only in the
binary standard."

Rich

* Re: Constants to decode __ctype_b_loc() table
From: Sergey Dmitrouk @ 2014-10-16 8:37 UTC
To: musl

On Wed, Oct 15, 2014 at 07:07:12PM -0700, Rich Felker wrote:
> In the link you cited:
>
> "This interface is not in the source standard; it is only in the
> binary standard."

Even if it's a binary interface, it shouldn't be underspecified.  Right
now __ctype_b_loc.c contains an array of numbers which correspond to
what glibc has.  Consider the following situation: glibc changes masks
at some point, musl doesn't, someone uses masks from new glibc's
headers after reading a thread like this one and obtains broken locales.

Having this documented in form of a comment instead of public interface
would be good as well, in this case clients could consult place where
it's documented and be sure that their constants are correct.  Say, add
a comment to __ctype_b_loc.c to clarify meaning of the table and
document masks at the same time.

Regards,
Sergey

* Re: Constants to decode __ctype_b_loc() table
From: Szabolcs Nagy @ 2014-10-16 12:21 UTC
To: musl

* Sergey Dmitrouk <sdmitrouk@accesssoftek.com> [2014-10-16 11:37:39 +0300]:
> On Wed, Oct 15, 2014 at 07:07:12PM -0700, Rich Felker wrote:
> > In the link you cited:
> >
> > "This interface is not in the source standard; it is only in the
> > binary standard."
>
> Even if it's a binary interface, it shouldn't be underspecified.  Right
> now __ctype_b_loc.c contains an array of numbers which correspond to
> what glibc has.  Consider the following situation: glibc changes masks
> at some point, musl doesn't, someone uses masks from new glibc's
> headers after reading a thread like this one and obtains broken locales.

either glibc changes its locale system and keeps __ctype_b_loc around
with the old semantics (which they can do any time)

or they change the semantics of __ctype_b_loc breaking the current abi
(which they should not do)

this is why depending anything in source headers is broken when you
depend on internal abi: internal details of the source can change,
there is no guarantee about them, abi that is visible to existing
binaries cannot change (hopefully)

so libc++ should not use undocumented masks from glibc ctype.h

libc++ should use standard interfaces (that should be the default)
and if it must use __ctype_b_loc then recognize that glibc headers
are not under their control so they have to replicate the bits they
care about

> Having this documented in form of a comment instead of public interface
> would be good as well, in this case clients could consult place where
> it's documented and be sure that their constants are correct.  Say, add
> a comment to __ctype_b_loc.c to clarify meaning of the table and
> document masks at the same time.

i think adding a comment makes sense, but i think we should not
encourage the dependence on accidentally visible abi details of
the libc

* Re: Constants to decode __ctype_b_loc() table
From: Rich Felker @ 2014-10-16 15:37 UTC
To: musl

On Thu, Oct 16, 2014 at 11:37:39AM +0300, Sergey Dmitrouk wrote:
> On Wed, Oct 15, 2014 at 07:07:12PM -0700, Rich Felker wrote:
> > In the link you cited:
> >
> > "This interface is not in the source standard; it is only in the
> > binary standard."
>
> Even if it's a binary interface, it shouldn't be underspecified.  Right
> now __ctype_b_loc.c contains an array of numbers which correspond to
> what glibc has.  Consider the following situation: glibc changes masks
> at some point, musl doesn't, someone uses masks from new glibc's
> headers after reading a thread like this one and obtains broken locales.

glibc can't change these because every existing glibc binary using the
ctype functions depends on them. They could do it with a new symbol
version, but that would be a lot of gratuitous breakage, and if we
wanted to support that it would take a lot more hacks than just
"updating" our tables.

> Having this documented in form of a comment instead of public interface
> would be good as well, in this case clients could consult place where
> it's documented and be sure that their constants are correct.  Say, add
> a comment to __ctype_b_loc.c to clarify meaning of the table and
> document masks at the same time.

The documentation is purely that this object which __ctype_b_loc
returns a pointer to is a binary blob matching what glibc provides for
the purpose of running glibc-linked binaries. It's not an API
interface but an ABI interface.

Rich

Thread overview: 11+ messages (end of thread, newest: 2014-10-16 15:37 UTC)

2014-10-15 10:41 Constants to decode __ctype_b_loc() table Sergey Dmitrouk
2014-10-15 11:32 ` Szabolcs Nagy
2014-10-15 12:05   ` Sergey Dmitrouk
2014-10-15 16:51     ` Rich Felker
2014-10-15 19:19       ` Sergey Dmitrouk
2014-10-16  0:58         ` Rich Felker
2014-10-16  1:53           ` Szabolcs Nagy
2014-10-16  2:07             ` Rich Felker
2014-10-16  8:37               ` Sergey Dmitrouk
2014-10-16 12:21                 ` Szabolcs Nagy
2014-10-16 15:37                 ` Rich Felker