mailing list of musl libc
* Constants to decode __ctype_b_loc() table
@ 2014-10-15 10:41 Sergey Dmitrouk
  2014-10-15 11:32 ` Szabolcs Nagy
  0 siblings, 1 reply; 11+ messages in thread
From: Sergey Dmitrouk @ 2014-10-15 10:41 UTC
  To: musl

Hi,

musl provides symbols for the following functions:

 - __ctype_b_loc
 - __ctype_tolower_loc
 - __ctype_toupper_loc

The last two of them return values that do not need special
interpretation.  The first one returns a value pointing to a table of
bitmasks, but I'm unable to find where musl defines the meaning of the
bits.  The values seem to be compatible with the ones defined by glibc,
though.

Is this intentional (some kind of compatibility with other libc
implementations), or is the definition of the _IS... macros just missing
and you'd accept a patch to add them?

Regards,
Sergey



* Re: Constants to decode __ctype_b_loc() table
  2014-10-15 10:41 Constants to decode __ctype_b_loc() table Sergey Dmitrouk
@ 2014-10-15 11:32 ` Szabolcs Nagy
  2014-10-15 12:05   ` Sergey Dmitrouk
  0 siblings, 1 reply; 11+ messages in thread
From: Szabolcs Nagy @ 2014-10-15 11:32 UTC
  To: musl

* Sergey Dmitrouk <sdmitrouk@accesssoftek.com> [2014-10-15 13:41:42 +0300]:
> 
> musl provides symbols for the following functions:
> 
>  - __ctype_b_loc
>  - __ctype_tolower_loc
>  - __ctype_toupper_loc
> 

these are nonsense abi

required for ctype_base::mask feature of c++
so used by libstdc++ and libc++

> The last two of them return values that do not need special
> interpretation.  The first one returns a value pointing to a table of
> bitmasks, but I'm unable to find where musl defines the meaning of the
> bits.  The values seem to be compatible with the ones defined by glibc,
> though.
> 

iirc the meaning of the bits is defined in the c++ standard

either that or glibc <-> libstdc++ abi convention

> Is this intentional (some kind of compatibility with other libc
> implementations), or is the definition of the _IS... macros just missing
> and you'd accept a patch to add them?

i think glibc uses them for is* macros internally
but musl doesn't



* Re: Constants to decode __ctype_b_loc() table
  2014-10-15 11:32 ` Szabolcs Nagy
@ 2014-10-15 12:05   ` Sergey Dmitrouk
  2014-10-15 16:51     ` Rich Felker
  0 siblings, 1 reply; 11+ messages in thread
From: Sergey Dmitrouk @ 2014-10-15 12:05 UTC
  To: musl

On Wed, Oct 15, 2014 at 04:32:07AM -0700, Szabolcs Nagy wrote:
> so used by libstdc++ and libc++

libc++ fails to build against musl as libc, because it doesn't know
values of masks (for glibc there is an ifdef).

> iirc the meaning of the bits is defined in the c++ standard

22.4.1 [category.ctype] of the C++ standard doesn't define exact values.
The comment before the values says:

 > numeric values are for exposition only.

So it seems to be implementation defined.

Regards,
Sergey



* Re: Constants to decode __ctype_b_loc() table
  2014-10-15 12:05   ` Sergey Dmitrouk
@ 2014-10-15 16:51     ` Rich Felker
  2014-10-15 19:19       ` Sergey Dmitrouk
  0 siblings, 1 reply; 11+ messages in thread
From: Rich Felker @ 2014-10-15 16:51 UTC
  To: Sergey Dmitrouk; +Cc: musl

On Wed, Oct 15, 2014 at 03:05:31PM +0300, Sergey Dmitrouk wrote:
> On Wed, Oct 15, 2014 at 04:32:07AM -0700, Szabolcs Nagy wrote:
> > so used by libstdc++ and libc++
> 
> libc++ fails to build against musl as libc, because it doesn't know
> values of masks (for glibc there is an ifdef).

See the patches for gcc in the musl-cross repo. Basically, libstdc++
should be using the files from the config/os/generic directory, not
the config/os/gnu-linux directory. The files in the latter are poking
at glibc internals which musl only mimics minimally for the sake of
running existing binaries. These are not interfaces supported at the
source level.

Rich



* Re: Constants to decode __ctype_b_loc() table
  2014-10-15 16:51     ` Rich Felker
@ 2014-10-15 19:19       ` Sergey Dmitrouk
  2014-10-16  0:58         ` Rich Felker
  0 siblings, 1 reply; 11+ messages in thread
From: Sergey Dmitrouk @ 2014-10-15 19:19 UTC
  To: musl

On Wed, Oct 15, 2014 at 09:51:36AM -0700, Rich Felker wrote:
> Basically, libstdc++ should be using ...

Well, I'm talking about libc++, not libstdc++.  libc++ doesn't have
such headers; everything is kept in one big locale.cpp.  It's easy to
hard-wire these constants for the generic case, but is that really the
correct solution?  It doesn't seem to be standardized.  The values in
os/generic/ctype_base.h differ from those one can find in the C++ standard.
There is even a comment:

    // Default information, may not be appropriate for specific host.

My point is that musl can have these masks defined to arbitrary values
and there is currently no way for a client to know exact values.  It
just happens to work, no guarantees.

The question is whether you want to keep it in this somewhat incomplete
state, when particular values of constants are assumed and undocumented (e.g.
if this is really just for libstdc++, which can live without constants).

Regards,
Sergey



* Re: Constants to decode __ctype_b_loc() table
  2014-10-15 19:19       ` Sergey Dmitrouk
@ 2014-10-16  0:58         ` Rich Felker
  2014-10-16  1:53           ` Szabolcs Nagy
  0 siblings, 1 reply; 11+ messages in thread
From: Rich Felker @ 2014-10-16  0:58 UTC
  To: musl

On Wed, Oct 15, 2014 at 10:19:46PM +0300, Sergey Dmitrouk wrote:
> On Wed, Oct 15, 2014 at 09:51:36AM -0700, Rich Felker wrote:
> > Basically, libstdc++ should be using ...
> 
> Well, I'm talking about libc++, not libstdc++.  libc++ doesn't have
> such headers; everything is kept in one big locale.cpp.  It's easy to
> hard-wire these constants for the generic case, but is that really the
> correct solution?

No, using those interfaces AT ALL is incorrect. They are not a public
API but glibc implementation internals. The correct way to implement
the locale functionality in C++ is to call the ctype.h/wctype.h
functions, not using glibc implementation internals.
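
As a rough illustration of that approach, the sketch below builds a
classification mask for a single character using only the standard
<ctype.h> functions.  The MY_* constants and my_ctype_mask() are
hypothetical names invented for this example; their values are private
to the caller and deliberately unrelated to whatever the table behind
__ctype_b_loc() contains.

```c
#include <ctype.h>

/* Hypothetical mask bits, private to this sketch.  They are NOT the
 * values stored in the table that __ctype_b_loc() points to. */
enum {
    MY_UPPER  = 1 << 0,
    MY_LOWER  = 1 << 1,
    MY_ALPHA  = 1 << 2,
    MY_DIGIT  = 1 << 3,
    MY_XDIGIT = 1 << 4,
    MY_SPACE  = 1 << 5,
    MY_PRINT  = 1 << 6,
    MY_GRAPH  = 1 << 7,
    MY_CNTRL  = 1 << 8,
    MY_PUNCT  = 1 << 9
};

/* Build a classification mask for one character by asking the
 * standard <ctype.h> functions, one per class.  Taking an unsigned
 * char keeps the argument inside the range those functions accept. */
unsigned my_ctype_mask(unsigned char c)
{
    unsigned m = 0;
    if (isupper(c))  m |= MY_UPPER;
    if (islower(c))  m |= MY_LOWER;
    if (isalpha(c))  m |= MY_ALPHA;
    if (isdigit(c))  m |= MY_DIGIT;
    if (isxdigit(c)) m |= MY_XDIGIT;
    if (isspace(c))  m |= MY_SPACE;
    if (isprint(c))  m |= MY_PRINT;
    if (isgraph(c))  m |= MY_GRAPH;
    if (iscntrl(c))  m |= MY_CNTRL;
    if (ispunct(c))  m |= MY_PUNCT;
    return m;
}
```

Because the bit values are chosen by the caller, nothing here depends on
how any particular libc lays out its internal tables.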

> It doesn't seem to be standardized.  The values in
> os/generic/ctype_base.h differ from those one can find in the C++ standard.

The standard has no such thing. These are implementation details.

> There is even a comment:
> 
>     // Default information, may not be appropriate for specific host.
> 
> My point is that musl can have these masks defined to arbitrary values
> and there is currently no way for a client to know exact values.  It
> just happens to work, no guarantees.

No, the generic implementation does not use the glibc internals at
all. Since the rest of libstdc++'s locale support is written based on
the glibc internals, the generic implementation just provides an
interface that looks like the glibc one using standard functions.

> The question is whether you want to keep it in this somewhat incomplete
> state, when particular values of constants are assumed and undocumented (e.g.
> if this is really just for libstdc++, which can live without constants).

No, the question is whether we want to provide glibc internals as a
public API, and the answer is no.

Rich



* Re: Constants to decode __ctype_b_loc() table
  2014-10-16  0:58         ` Rich Felker
@ 2014-10-16  1:53           ` Szabolcs Nagy
  2014-10-16  2:07             ` Rich Felker
  0 siblings, 1 reply; 11+ messages in thread
From: Szabolcs Nagy @ 2014-10-16  1:53 UTC
  To: musl

* Rich Felker <dalias@libc.org> [2014-10-15 20:58:43 -0400]:
> On Wed, Oct 15, 2014 at 10:19:46PM +0300, Sergey Dmitrouk wrote:
> > hard-wire these constants for the generic case, but is that really the
> > correct solution?
> 
> No, using those interfaces AT ALL is incorrect. They are not a public
> API but glibc implementation internals. The correct way to implement
> the locale functionality in C++ is to call the ctype.h/wctype.h
> functions, not using glibc implementation internals.
> 

i think the c++ std lib has a hard time implementing that efficiently
(but it should be their problem not ours)

it has to parse istreams in terms of ctype<> and there are inefficient
apis like ctype<C>::is(const C*,const C*,mask*) which has to calculate
the ctype mask for each char in a substring (so calling all is* c apis
for each char..)
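
One way a library could soften that per-character cost while still using
only standard interfaces is to query the is* functions once per possible
byte value and serve range requests from a cached table.  The following
is only a sketch under that assumption: the CLS_* bits, cls_table_init()
and cls_range() are hypothetical names, only three classes are tracked,
and the cache reflects whatever locale was active when it was built.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical class bits, chosen freely for this sketch. */
enum { CLS_ALPHA = 1, CLS_DIGIT = 2, CLS_SPACE = 4 };

static unsigned char cls_table[256];
static int cls_table_ready;

/* Fill the cache by calling the standard is* functions once for each
 * possible unsigned char value.  Assumption: the locale does not
 * change afterwards; a real implementation would have to rebuild the
 * table per locale and guard this against concurrent callers. */
static void cls_table_init(void)
{
    for (int c = 0; c < 256; c++) {
        unsigned char m = 0;
        if (isalpha(c)) m |= CLS_ALPHA;
        if (isdigit(c)) m |= CLS_DIGIT;
        if (isspace(c)) m |= CLS_SPACE;
        cls_table[c] = m;
    }
    cls_table_ready = 1;
}

/* Range form, similar in spirit to ctype<C>::is(begin, end, mask*):
 * store one mask per input character into out[]. */
void cls_range(const char *begin, const char *end, unsigned char *out)
{
    if (!cls_table_ready)
        cls_table_init();
    for (; begin != end; begin++, out++)
        *out = cls_table[(unsigned char)*begin];
}
```

After the first call, each character costs a single table lookup rather
than one function call per class.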

> > The question is whether you want to keep it in this somewhat incomplete
> > state, when particular values of constants are assumed and undocumented (e.g.
> > if this is really just for libstdc++, which can live without constants).
> 
> No, the question is whether we want to provide glibc internals as a
> public API, and the answer is no.
> 

well at least one standard specifies __ctype_b_loc

http://refspecs.linux-foundation.org/LSB_4.1.0/LSB-Core-generic/LSB-Core-generic/baselib---ctype-b-loc.html

but it does not specify the bit representation (which is part of the
abi) so it is useless

i think this symbol has to be provided by glibc forever because of
abi compat and the meaning of the flags cannot be changed

however the glibc internal ctype implementation may change in the
future including the undocumented flags/masks in glibc ctype.h so
relying on those in any way is wrong



* Re: Constants to decode __ctype_b_loc() table
  2014-10-16  1:53           ` Szabolcs Nagy
@ 2014-10-16  2:07             ` Rich Felker
  2014-10-16  8:37               ` Sergey Dmitrouk
  0 siblings, 1 reply; 11+ messages in thread
From: Rich Felker @ 2014-10-16  2:07 UTC
  To: musl

On Thu, Oct 16, 2014 at 03:53:33AM +0200, Szabolcs Nagy wrote:
> * Rich Felker <dalias@libc.org> [2014-10-15 20:58:43 -0400]:
> > On Wed, Oct 15, 2014 at 10:19:46PM +0300, Sergey Dmitrouk wrote:
> > > hard-wire these constants for the generic case, but is that really the
> > > correct solution?
> > 
> > No, using those interfaces AT ALL is incorrect. They are not a public
> > API but glibc implementation internals. The correct way to implement
> > the locale functionality in C++ is to call the ctype.h/wctype.h
> > functions, not using glibc implementation internals.
> > 
> 
> i think the c++ std lib has a hard time implementing that efficiently
> (but it should be their problem not ours)
> 
> it has to parse istreams in terms of ctype<> and there are inefficient
> apis like ctype<C>::is(const C*,const C*,mask*) which has to calculate
> the ctype mask for each char in a substring (so calling all is* c apis
> for each char..)

This sounds like an inefficient API to use...

> > > The question is whether you want to keep it in this somewhat incomplete
> > > state, when particular values of constants are assumed and undocumented (e.g.
> > > if this is really just for libstdc++, which can live without constants).
> > 
> > No, the question is whether we want to provide glibc internals as a
> > public API, and the answer is no.
> > 
> 
> well at least one standard specifies __ctype_b_loc
> 
> http://refspecs.linux-foundation.org/LSB_4.1.0/LSB-Core-generic/LSB-Core-generic/baselib---ctype-b-loc.html

In the link you cited:

    "This interface is not in the source standard; it is only in the
    binary standard."

Rich



* Re: Constants to decode __ctype_b_loc() table
  2014-10-16  2:07             ` Rich Felker
@ 2014-10-16  8:37               ` Sergey Dmitrouk
  2014-10-16 12:21                 ` Szabolcs Nagy
  2014-10-16 15:37                 ` Rich Felker
  0 siblings, 2 replies; 11+ messages in thread
From: Sergey Dmitrouk @ 2014-10-16  8:37 UTC
  To: musl

On Wed, Oct 15, 2014 at 07:07:12PM -0700, Rich Felker wrote:
> In the link you cited:
> 
>     "This interface is not in the source standard; it is only in the
>     binary standard."

Even if it's a binary interface, it shouldn't be underspecified.  Right
now __ctype_b_loc.c contains an array of numbers which correspond to
what glibc has.  Consider the following situation: glibc changes masks
at some point, musl doesn't, someone uses masks from new glibc's
headers after reading a thread like this one and obtains broken locales.

Having this documented in the form of a comment instead of a public
interface would be good as well; in that case clients could consult the
place where it's documented and be sure that their constants are correct.
Say, add a comment to __ctype_b_loc.c to clarify the meaning of the table
and document the masks at the same time.

Regards,
Sergey



* Re: Constants to decode __ctype_b_loc() table
  2014-10-16  8:37               ` Sergey Dmitrouk
@ 2014-10-16 12:21                 ` Szabolcs Nagy
  2014-10-16 15:37                 ` Rich Felker
  1 sibling, 0 replies; 11+ messages in thread
From: Szabolcs Nagy @ 2014-10-16 12:21 UTC
  To: musl

* Sergey Dmitrouk <sdmitrouk@accesssoftek.com> [2014-10-16 11:37:39 +0300]:
> On Wed, Oct 15, 2014 at 07:07:12PM -0700, Rich Felker wrote:
> > In the link you cited:
> > 
> >     "This interface is not in the source standard; it is only in the
> >     binary standard."
> 
> Even if it's a binary interface, it shouldn't be underspecified.  Right
> now __ctype_b_loc.c contains an array of numbers which correspond to
> what glibc has.  Consider the following situation: glibc changes masks
> at some point, musl doesn't, someone uses masks from new glibc's
> headers after reading a thread like this one and obtains broken locales.

either glibc changes its locale system and keeps __ctype_b_loc around
with the old semantics (which they can do any time) or they change the
semantics of __ctype_b_loc breaking the current abi (which they should
not do)

this is why depending on anything in source headers is broken when you
depend on internal abi: internal details of the source can change,
there is no guarantee about them, abi that is visible to existing
binaries cannot change (hopefully)

so libc++ should not use undocumented masks from glibc ctype.h

libc++ should use standard interfaces (that should be the default)
and if it must use __ctype_b_loc then recognize that glibc headers
are not under their control so they have to replicate the bits they
care about

> Having this documented in the form of a comment instead of a public
> interface would be good as well; in that case clients could consult the
> place where it's documented and be sure that their constants are correct.
> Say, add a comment to __ctype_b_loc.c to clarify the meaning of the table
> and document the masks at the same time.

i think adding a comment makes sense, but i think we should not
encourage the dependence on accidentally visible abi details of
the libc



* Re: Constants to decode __ctype_b_loc() table
  2014-10-16  8:37               ` Sergey Dmitrouk
  2014-10-16 12:21                 ` Szabolcs Nagy
@ 2014-10-16 15:37                 ` Rich Felker
  1 sibling, 0 replies; 11+ messages in thread
From: Rich Felker @ 2014-10-16 15:37 UTC
  To: musl

On Thu, Oct 16, 2014 at 11:37:39AM +0300, Sergey Dmitrouk wrote:
> On Wed, Oct 15, 2014 at 07:07:12PM -0700, Rich Felker wrote:
> > In the link you cited:
> > 
> >     "This interface is not in the source standard; it is only in the
> >     binary standard."
> 
> Even if it's a binary interface, it shouldn't be underspecified.  Right
> now __ctype_b_loc.c contains an array of numbers which correspond to
> what glibc has.  Consider the following situation: glibc changes masks
> at some point, musl doesn't, someone uses masks from new glibc's
> headers after reading a thread like this one and obtains broken locales.

glibc can't change these because every existing glibc binary using the
ctype functions depends on them. They could do it with a new symbol
version, but that would be a lot of gratuitous breakage, and if we
wanted to support that it would take a lot more hacks than just
"updating" our tables.

> Having this documented in the form of a comment instead of a public
> interface would be good as well; in that case clients could consult the
> place where it's documented and be sure that their constants are correct.
> Say, add a comment to __ctype_b_loc.c to clarify the meaning of the table
> and document the masks at the same time.

The documentation is purely that this object which __ctype_b_loc
returns a pointer to is a binary blob matching what glibc provides for
the purpose of running glibc-linked binaries. It's not an API
interface but an ABI interface.

Rich

