discuss@mandoc.bsd.lv
* Full mandoc locale support committed.
@ 2011-05-17 23:07 Kristaps Dzonsons
  2011-05-18  6:21 ` Yuri Pankov
  2011-05-19 20:28 ` Ulrich Spörlein
From: Kristaps Dzonsons @ 2011-05-17 23:07 UTC
  To: discuss; +Cc: Stefan Sperling, Hiroki Sato

[-- Attachment #1: Type: text/plain, Size: 2093 bytes --]

Hi,

With this last commit, initial [full] locale support has been fitted 
into mandoc!  Attached is eye-candy: a manual full of random Unicode 
input (\[uNNNN]) first with -Tascii, then with -Tlocale.

From the manual:

    Locale Output
      Locale-depending output encoding is triggered with -Tlocale.
      This option is not available on all systems: systems without
      locale support, or those whose internal representation is not
      natively UCS-4, will fall back to -Tascii.  See ASCII Output
      for font style specification and available command-line
      arguments.

The check-ins:

   (1) http://mdocml.bsd.lv/archives/source/0920.html
   (2) http://mdocml.bsd.lv/archives/source/0919.html

This support is /very/ fast, and any overhead occurs if and only if 
-Tlocale is selected AND supported.  If this doesn't hold, then mandoc 
runs at -Tascii "native" speed.
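One way to get that property is to pick the per-character output routine once at start-up, so the -Tascii path never calls into the wide-character functions. The sketch below uses hypothetical names (struct out, ascii_put, locale_put, out_init) and illustrates the idea only; it is not the code in term_ascii.c, and it assumes setlocale(3) has been called before locale_put is used.

```c
#include <limits.h>
#include <string.h>
#include <wchar.h>

struct out {
	char	  buf[256];	/* rendered output, NUL-terminated */
	size_t	  len;
	void	(*put)(struct out *, unsigned long);
};

/* -Tascii: ASCII passes through, everything else becomes '?'. */
static void
ascii_put(struct out *o, unsigned long cp)
{
	if (o->len + 1 < sizeof(o->buf))
		o->buf[o->len++] = cp < 0x80 ? (char)cp : '?';
	o->buf[o->len] = '\0';
}

/* -Tlocale: let the C library encode the code point for LC_CTYPE. */
static void
locale_put(struct out *o, unsigned long cp)
{
	char		 mb[MB_LEN_MAX];
	mbstate_t	 st;
	size_t		 n;

	memset(&st, 0, sizeof(st));
	/* The (wchar_t) cast relies on __STDC_ISO_10646__ semantics. */
	n = wcrtomb(mb, (wchar_t)cp, &st);
	if (n == (size_t)-1) {		/* not representable in this locale */
		ascii_put(o, '?');
		return;
	}
	if (o->len + n < sizeof(o->buf)) {
		memcpy(o->buf + o->len, mb, n);
		o->len += n;
	}
	o->buf[o->len] = '\0';
}

/* Choose the routine once; no per-character mode checks afterwards. */
static void
out_init(struct out *o, int want_locale)
{
	o->len = 0;
	o->buf[0] = '\0';
#ifdef __STDC_ISO_10646__
	o->put = want_locale ? locale_put : ascii_put;
#else
	(void)want_locale;
	o->put = ascii_put;	/* -Tlocale falls back to -Tascii */
#endif
}
```

With out_init(&o, 0), feeding 'm' and U+00FC through o.put yields "m?"; the locale routine is only ever reached when it was selected and supported.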

The bad news: it comes at a price.  -Tlocale only works if Unicode 
code-point values (defined in the UCS-4 (-2?) standards) can be 
transformed directly into wide-character values usable by the system.  I 
check this with an optional C99 feature, __STDC_ISO_10646__.  See

  http://www.cl.cam.ac.uk/~mgk25/ucs/iso2022-wc.html

for details.  Unfortunately, this macro seems to be defined only by glibc. 
I'm told the conditions hold on OpenBSD and FreeBSD.  NetBSD?

If your system satisfies these conditions but doesn't define this macro, 
please let me know and we can special-case the macro test for this 
feature.  There is a way to convert from Unicode to a system's 
wide-character support without this feature, but it isn't pretty.  I'll 
probably have to implement this anyway for portability.  For now, if a 
system doesn't do __STDC_ISO_10646__, -Tlocale is a synonym for -Tascii.
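A minimal sketch of the compile-time test described above, assuming only the C99 rule that, when __STDC_ISO_10646__ is defined, every wchar_t value equals the ISO 10646 code point of the character it represents. USE_WCHAR mirrors the macro named later in this thread; the two helper functions are hypothetical, not mandoc's code.

```c
#include <wchar.h>

/*
 * Decide at compile time whether Unicode code points may be used
 * directly as wchar_t values.  Including <wchar.h> pulls in the C
 * library's feature macros, so __STDC_ISO_10646__ is visible if the
 * implementation provides it.
 */
#if defined(__STDC_ISO_10646__)
# define USE_WCHAR 1
#else
# undef USE_WCHAR
#endif

/* 1 if -Tlocale can be honoured, 0 if it must behave like -Tascii. */
static int
locale_output_supported(void)
{
#ifdef USE_WCHAR
	return 1;
#else
	return 0;
#endif
}

/*
 * Map the code point of a \[uNNNN] escape to a wchar_t.
 * Returns 0 on success, -1 if the value is not a Unicode scalar
 * value or wide output is unavailable on this system.
 */
static int
codepoint_to_wchar(unsigned long cp, wchar_t *out)
{
#ifdef USE_WCHAR
	if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
		return -1;
	*out = (wchar_t)cp;	/* no table needed: wchar_t == code point */
	return 0;
#else
	(void)cp;
	(void)out;
	return -1;
#endif
}
```

With the macro defined, the conversion is a plain cast; without it, some other mapping (iconv(3), for example) would be needed.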

Note that if this method is universally hated, it's easy to switch to 
another method.  This was simply the fastest, simplest, and most 
transparent to implement.  All of the logic either way is in one file, 
and easy to manipulate:

  http://mdocml.bsd.lv/cgi-bin/cvsweb/term_ascii.c?cvsroot=mdocml

Thoughts?

Kristaps

[-- Attachment #2: screen.png --]
[-- Type: image/png, Size: 34932 bytes --]


* Re: Full mandoc locale support committed.
  2011-05-17 23:07 Full mandoc locale support committed Kristaps Dzonsons
@ 2011-05-18  6:21 ` Yuri Pankov
  2011-05-18  9:53   ` Kristaps Dzonsons
  2011-05-19 20:28 ` Ulrich Spörlein
From: Yuri Pankov @ 2011-05-18  6:21 UTC
  To: discuss

On Wed, May 18, 2011 at 01:07:28AM +0200, Kristaps Dzonsons wrote:
> Hi,
> 
> With this last commit, initial [full] locale support has been fitted 
> into mandoc!  Attached is eye-candy: a manual full of random Unicode 
> input (\[uNNNN]) first with -Tascii, then with -Tlocale.

Unicode escapes work for me, but does that mean that existing localized
manpages can't be used? Tried several ja and ru pages from Debian 6; all of
them make mandoc die with "FATAL: line scope broken, syntax violated"
(-Tlint talks a lot about "ERROR: skipping bad character: ignoring
byte").

>  From the manual:
> 
>     Locale Output
>       Locale-depending output encoding is triggered with -Tlocale.
>       This option is not available on all systems: systems without
>       locale support, or those whose internal representation is not
>       natively UCS-4, will fall back to -Tascii.  See ASCII Output
>       for font style specification and available command-line
>       arguments.
> 
> The check-ins:
> 
>    (1) http://mdocml.bsd.lv/archives/source/0920.html

A small typo in the Makefile comment here: USE_CHAR (for USE_WCHAR).

>    (2) http://mdocml.bsd.lv/archives/source/0919.html
> 
> This support is /very/ fast, and any overhead occurs if and only if 
> -Tlocale is selected AND supported.  If this doesn't hold, then mandoc 
> runs at -Tascii "native" speed.
> 
> The bad news: it comes at a price.  -Tlocale only works if Unicode 
> code-point values (defined in the UCS-4 (-2?) standards) can be 
> transformed directly into wide-character values usable by the system.  I 
> check this with an optional C99 feature, __STDC_ISO_10646__.  See
> 
>   http://www.cl.cam.ac.uk/~mgk25/ucs/iso2022-wc.html
> 
> for details.  Unfortunately, this seems only to be exported on glibc. 
> I'm told the conditions hold on OpenBSD and FreeBSD.  NetBSD?
> 
> If your system abides by these rules and doesn't export this symbol, 
> please let me know and we can special-case the macro test for this 
> feature.  There is a way to convert from Unicode to a system's 
> wide-character support without this feature, but it isn't pretty.  I'll 
> probably have to implement this anyway for portability.  For now, if a 
> system doesn't do __STDC_ISO_10646__, -Tlocale is a synonym for -Tascii.

Checked on Solaris 11 and Illumos: neither defines the macro, but
commenting out #undef USE_WCHAR in term_ascii.c makes -Tlocale
work.

> Note that if this method is unilaterally hated, it's easy to switch to 
> another method.  This was simply the fastest, simplest, and most 
> transparent to implement.  All of the logic either way is in one file, 
> and easy to manipulate:
> 
>   http://mdocml.bsd.lv/cgi-bin/cvsweb/term_ascii.c?cvsroot=mdocml
> 
> Thoughts?


Yuri
--
 To unsubscribe send an email to discuss+unsubscribe@mdocml.bsd.lv


* Re: Full mandoc locale support committed.
  2011-05-18  6:21 ` Yuri Pankov
@ 2011-05-18  9:53   ` Kristaps Dzonsons
From: Kristaps Dzonsons @ 2011-05-18  9:53 UTC
  To: discuss

>> With this last commit, initial [full] locale support has been fitted
>> into mandoc!  Attached is eye-candy: a manual full of random Unicode
>> input (\[uNNNN]) first with -Tascii, then with -Tlocale.
>
> Unicode escapes work for me, but does that mean that existing localized
> manpages can't be used? Tried several ja and ru from debian 6, all of
> them make mandoc die with "FATAL: line scope broken, syntax violated"
> (-Tlint talks a lot about "ERROR: skipping bad character: ignoring
> byte").

Yuri,

Groff doesn't actually accept multi-byte input: it translates with 
preconv from the localised form to the Unicode escapes.  This is hidden 
away with the `-k' groff option.  See:

  http://manpages.ubuntu.com/manpages/lucid/man1/groff.1.html
  http://manpages.ubuntu.com/manpages/lucid/man1/preconv.1.html

(Why doesn't groff have its own manuals online instead of just the 
texinfo?  I bet it's because grohtml doesn't look so good... ;))

We'll worry about doing preconv-style translation later.  For this 
release, the push is for the Unicode support itself.
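For reference, a rough sketch of the preconv-style step, assuming UTF-8 input: each multi-byte sequence is decoded and rewritten as a \[uNNNN] escape that the new Unicode support already understands. The function names are hypothetical, and real preconv also handles other input encodings and encoding detection, which this ignores.

```c
#include <stdio.h>
#include <string.h>

/*
 * Decode one UTF-8 sequence from s (len bytes available).  Returns
 * the number of bytes consumed and stores the code point in *cp, or
 * returns 0 if the sequence is malformed or truncated.
 */
static size_t
utf8_decode(const unsigned char *s, size_t len, unsigned long *cp)
{
	size_t		 i, n;
	unsigned long	 v;

	if (s[0] < 0x80) {
		*cp = s[0];
		return 1;
	} else if ((s[0] & 0xE0) == 0xC0) {
		n = 2;
		v = s[0] & 0x1F;
	} else if ((s[0] & 0xF0) == 0xE0) {
		n = 3;
		v = s[0] & 0x0F;
	} else if ((s[0] & 0xF8) == 0xF0) {
		n = 4;
		v = s[0] & 0x07;
	} else
		return 0;

	if (n > len)
		return 0;
	for (i = 1; i < n; i++) {
		if ((s[i] & 0xC0) != 0x80)
			return 0;
		v = (v << 6) | (s[i] & 0x3F);
	}
	/* Reject overlong forms, UTF-16 surrogates and values past U+10FFFF. */
	if ((n == 2 && v < 0x80) || (n == 3 && v < 0x800) ||
	    (n == 4 && v < 0x10000) || v > 0x10FFFF ||
	    (v >= 0xD800 && v <= 0xDFFF))
		return 0;
	*cp = v;
	return n;
}

/*
 * Copy UTF-8 text from in to out, turning every non-ASCII character
 * into a \[uNNNN] escape.  Returns 0 on success, -1 on malformed input
 * or if outsz is too small.
 */
static int
utf8_to_escapes(const char *in, char *out, size_t outsz)
{
	const unsigned char	*s = (const unsigned char *)in;
	size_t			 len = strlen(in), pos = 0, k;
	unsigned long		 cp;
	int			 w;

	if (outsz == 0)
		return -1;
	out[0] = '\0';
	while (len > 0) {
		if ((k = utf8_decode(s, len, &cp)) == 0)
			return -1;
		if (cp < 0x80)
			w = snprintf(out + pos, outsz - pos, "%c", (int)cp);
		else
			w = snprintf(out + pos, outsz - pos, "\\[u%04lX]", cp);
		if (w < 0 || (size_t)w >= outsz - pos)
			return -1;	/* output would be truncated */
		pos += (size_t)w;
		s += k;
		len -= k;
	}
	return 0;
}
```

For example, the UTF-8 bytes of "grüß" come out as gr\[u00FC]\[u00DF], and a Cyrillic Ж (D0 96) as \[u0416].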

>> The check-ins:
>>
>>     (1) http://mdocml.bsd.lv/archives/source/0920.html
>
> A small typo here in Makefile's comment - USE_CHAR.

Fixed---thanks.

> Checked on Solaris 11 and Illumos - both do not export the symbol,
> though commenting out #undef USE_WCHAR in term_ascii.c makes -Tlocale
> work.

Great news!  (The Internet agrees, as there are bug reports regarding 
Solaris and 10646.)  I'm still trying to figure out a good way to let 
the lack of this symbol be ignored when it's supported anyway (as in 
Solaris' case).  For the release itself, I'll probably let USE_WCHAR be 
authoritative and set or unset by downstream.
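One way to make USE_WCHAR authoritative while keeping auto-detection as the default, sketched under assumptions: a downstream build passes -DUSE_WCHAR (as would suit Solaris and Illumos, per Yuri's test) or a hypothetical -DNO_WCHAR to force the ASCII fallback, and only when neither is given does __STDC_ISO_10646__ decide. NO_WCHAR, WCHAR_POLICY and wchar_policy() are illustrative names, not mandoc flags.

```c
#include <wchar.h>	/* makes __STDC_ISO_10646__ visible if provided */

#if defined(NO_WCHAR)
# undef USE_WCHAR
# define WCHAR_POLICY "disabled by the build"
#elif defined(USE_WCHAR)
# define WCHAR_POLICY "enabled by the build"
#elif defined(__STDC_ISO_10646__)
# define USE_WCHAR 1
# define WCHAR_POLICY "enabled by __STDC_ISO_10646__"
#else
# define WCHAR_POLICY "disabled: -Tlocale falls back to -Tascii"
#endif

/* Report which rule decided USE_WCHAR, e.g. for diagnostics. */
static const char *
wchar_policy(void)
{
	return WCHAR_POLICY;
}
```

The build-supplied definition always wins, so a system that behaves correctly without advertising the macro can simply be compiled with -DUSE_WCHAR.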

Thanks again,

Kristaps


* Re: Full mandoc locale support committed.
  2011-05-17 23:07 Full mandoc locale support committed Kristaps Dzonsons
  2011-05-18  6:21 ` Yuri Pankov
@ 2011-05-19 20:28 ` Ulrich Spörlein
  2011-05-19 22:49   ` Kristaps Dzonsons
  2011-05-20 15:52   ` Kristaps Dzonsons
From: Ulrich Spörlein @ 2011-05-19 20:28 UTC
  To: discuss; +Cc: Stefan Sperling, Hiroki Sato

On Wed, 18.05.2011 at 01:07:28 +0200, Kristaps Dzonsons wrote:
> Hi,
> 
> With this last commit, initial [full] locale support has been fitted 
> into mandoc!  Attached is eye-candy: a manual full of random Unicode 
> input (\[uNNNN]) first with -Tascii, then with -Tlocale.
> 
>  From the manual:
> 
>     Locale Output
>       Locale-depending output encoding is triggered with -Tlocale.
>       This option is not available on all systems: systems without
>       locale support, or those whose internal representation is not
>       natively UCS-4, will fall back to -Tascii.  See ASCII Output
>       for font style specification and available command-line
>       arguments.

Cool stuff! However, and this might be due to a case of "we've always
been doing it that way"-thinking: I think this automagic is in the wrong
place.

There might be cases where I really want ASCII output no matter what my
locale is (this is covered by -Tascii right now), and there might be
cases where I want UTF-8 output, no matter what the current locale is.
Perhaps because I write the output to disk or to some other
postprocessor.

What I'm arguing is that we need to have a -Tutf8 mode and that perhaps
*not* specifying *any* -T value turns on the automagic? This would make
more sense from a user's standpoint, IMHO.

Uli


* Re: Full mandoc locale support committed.
  2011-05-19 20:28 ` Ulrich Spörlein
@ 2011-05-19 22:49   ` Kristaps Dzonsons
  2011-05-20  7:26     ` Ulrich Spörlein
  2011-05-20 15:52   ` Kristaps Dzonsons
From: Kristaps Dzonsons @ 2011-05-19 22:49 UTC
  To: discuss; +Cc: Ulrich Spörlein, Stefan Sperling, Hiroki Sato

On 19/05/2011 22:28, Ulrich Spörlein wrote:
> On Wed, 18.05.2011 at 01:07:28 +0200, Kristaps Dzonsons wrote:
>> Hi,
>>
>> With this last commit, initial [full] locale support has been fitted
>> into mandoc!  Attached is eye-candy: a manual full of random Unicode
>> input (\[uNNNN]) first with -Tascii, then with -Tlocale.
>>
>>   From the manual:
>>
>>      Locale Output
>>        Locale-depending output encoding is triggered with -Tlocale.
>>        This option is not available on all systems: systems without
>>        locale support, or those whose internal representation is not
>>        natively UCS-4, will fall back to -Tascii.  See ASCII Output
>>        for font style specification and available command-line
>>        arguments.
>
> Cool stuff! However, and this might be due to a case of "we've always
> been doing it that way"-thinking: I think this automagic is in the wrong
> place.
>
> There might be cases, where I really want ASCII output no matter what my
> locale is (this is covered by -Tascii right now), and there might be
> cases where I want UTF-8 output, no matter what the current locale is.
> Perhaps because I write the output to disk or to some other
> postprocessor.
>
> What I'm arguing is that we need to have a -Tutf8 mode and that perhaps
> *not* specifying *any* -T value turns on the automagic? This would make
> more sense from a users standpoint, IMHO.

Ulrich,

I agree.  And we also want -Tutf8 for groff compatibility.

In short, I'll have -Tutf8 done for the next release.  It's simply a 
matter of indicating *which* locale when using setlocale().
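A minimal sketch of that distinction, assuming the backend only needs LC_CTYPE: -Tlocale asks setlocale(3) for the environment's locale by passing "", while -Tutf8 requests a UTF-8 locale by name. UTF-8 locale names are not standardized, so the candidate list is an assumption; set_ctype() and enum outmode are hypothetical.

```c
#include <locale.h>
#include <stddef.h>

enum outmode {
	OUT_LOCALE,	/* -Tlocale: whatever the environment selects */
	OUT_UTF8	/* -Tutf8: UTF-8 regardless of the environment */
};

/*
 * Set LC_CTYPE for the requested mode and return the name that
 * setlocale() reports.  Never returns NULL: "C" always exists.
 */
static const char *
set_ctype(enum outmode mode)
{
	/* Assumed names: UTF-8 locale names differ between systems. */
	static const char *const utf8[] = {
		"C.UTF-8", "en_US.UTF-8", "en_US.utf8"
	};
	const char	*r;
	size_t		 i;

	if (mode == OUT_LOCALE) {
		if ((r = setlocale(LC_CTYPE, "")) != NULL)
			return r;
	} else {
		for (i = 0; i < sizeof(utf8) / sizeof(utf8[0]); i++)
			if ((r = setlocale(LC_CTYPE, utf8[i])) != NULL)
				return r;
	}
	/* Nothing usable: the "C" locale, i.e. plain -Tascii behaviour. */
	return setlocale(LC_CTYPE, "C");
}
```

Everything downstream of this call can stay identical for both modes, which is what makes -Tutf8 cheap to add.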

(By the way, does FreeBSD have __STDC_ISO_10646__, whether explicitly or 
implicitly?)

Thanks,

Kristaps


* Re: Full mandoc locale support committed.
  2011-05-19 22:49   ` Kristaps Dzonsons
@ 2011-05-20  7:26     ` Ulrich Spörlein
  2011-05-20  7:59       ` Yuri Pankov
From: Ulrich Spörlein @ 2011-05-20  7:26 UTC
  To: discuss

On Fri, 20.05.2011 at 00:49:21 +0200, Kristaps Dzonsons wrote:
> On 19/05/2011 22:28, Ulrich Spörlein wrote:
> > On Wed, 18.05.2011 at 01:07:28 +0200, Kristaps Dzonsons wrote:
> >> Hi,
> >>
> >> With this last commit, initial [full] locale support has been fitted
> >> into mandoc!  Attached is eye-candy: a manual full of random Unicode
> >> input (\[uNNNN]) first with -Tascii, then with -Tlocale.
> >>
> >>   From the manual:
> >>
> >>      Locale Output
> >>        Locale-depending output encoding is triggered with -Tlocale.
> >>        This option is not available on all systems: systems without
> >>        locale support, or those whose internal representation is not
> >>        natively UCS-4, will fall back to -Tascii.  See ASCII Output
> >>        for font style specification and available command-line
> >>        arguments.
> >
> > Cool stuff! However, and this might be due to a case of "we've always
> > been doing it that way"-thinking: I think this automagic is in the wrong
> > place.
> >
> > There might be cases, where I really want ASCII output no matter what my
> > locale is (this is covered by -Tascii right now), and there might be
> > cases where I want UTF-8 output, no matter what the current locale is.
> > Perhaps because I write the output to disk or to some other
> > postprocessor.
> >
> > What I'm arguing is that we need to have a -Tutf8 mode and that perhaps
> > *not* specifying *any* -T value turns on the automagic? This would make
> > more sense from a users standpoint, IMHO.
> 
> Ulrich,
> 
> I agree.  And we also want -Tutf8 for groff compatibility.
> 
> In short, I'll have -Tutf8 done for the next release.  It's simply a 
> matter of indicating *which* locale when using setlocale().
> 
> (By the way, does FreeBSD have the STDC_ISO_10646, whether explicitly or 
> implicitly?)

It certainly isn't defined anywhere, and it also doesn't seem to be
implemented, as a short test using \[uc3bc] didn't produce ü but a
missing-glyph symbol. :(

Uli


* Re: Full mandoc locale support committed.
  2011-05-20  7:26     ` Ulrich Spörlein
@ 2011-05-20  7:59       ` Yuri Pankov
From: Yuri Pankov @ 2011-05-20  7:59 UTC
  To: discuss

On Fri, May 20, 2011 at 09:26:15AM +0200, Ulrich Spörlein wrote:
> On Fri, 20.05.2011 at 00:49:21 +0200, Kristaps Dzonsons wrote:
> > On 19/05/2011 22:28, Ulrich Spörlein wrote:
> > > On Wed, 18.05.2011 at 01:07:28 +0200, Kristaps Dzonsons wrote:
> > >> Hi,
> > >>
> > >> With this last commit, initial [full] locale support has been fitted
> > >> into mandoc!  Attached is eye-candy: a manual full of random Unicode
> > >> input (\[uNNNN]) first with -Tascii, then with -Tlocale.
> > >>
> > >>   From the manual:
> > >>
> > >>      Locale Output
> > >>        Locale-depending output encoding is triggered with -Tlocale.
> > >>        This option is not available on all systems: systems without
> > >>        locale support, or those whose internal representation is not
> > >>        natively UCS-4, will fall back to -Tascii.  See ASCII Output
> > >>        for font style specification and available command-line
> > >>        arguments.
> > >
> > > Cool stuff! However, and this might be due to a case of "we've always
> > > been doing it that way"-thinking: I think this automagic is in the wrong
> > > place.
> > >
> > > There might be cases, where I really want ASCII output no matter what my
> > > locale is (this is covered by -Tascii right now), and there might be
> > > cases where I want UTF-8 output, no matter what the current locale is.
> > > Perhaps because I write the output to disk or to some other
> > > postprocessor.
> > >
> > > What I'm arguing is that we need to have a -Tutf8 mode and that perhaps
> > > *not* specifying *any* -T value turns on the automagic? This would make
> > > more sense from a users standpoint, IMHO.
> > 
> > Ulrich,
> > 
> > I agree.  And we also want -Tutf8 for groff compatibility.
> > 
> > In short, I'll have -Tutf8 done for the next release.  It's simply a 
> > matter of indicating *which* locale when using setlocale().
> > 
> > (By the way, does FreeBSD have the STDC_ISO_10646, whether explicitly or 
> > implicitly?)
> 
> It certainly isn't defined anywhere, and it also doesn't seem to be
> implemented, as a short test using \[uc3bc] didn't produce ü but a
> missing-glyph symbol. :(

It does work for me on 8.2 and -CURRENT (once I remove the check for
__STDC_ISO_10646__ from term_ascii.c and the resulting #undef), though \[uc3bc]
looks like a CJK glyph (it is a Korean Hangul syllable); you probably want \[u00fc]
for the 'ü' :-)
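The mix-up is between a code point and its UTF-8 bytes: \[uNNNN] takes a code point, and ü is U+00FC, which UTF-8 encodes as the two bytes C3 BC. Writing those bytes as \[uc3bc] names U+C3BC instead. A small encoder (a hypothetical helper, not mandoc code) makes this concrete:

```c
#include <stddef.h>

/*
 * Encode code point cp as UTF-8 into out (room for 4 bytes).
 * Returns the number of bytes written, or 0 if cp is not a valid
 * Unicode scalar value.
 */
static size_t
utf8_encode(unsigned long cp, unsigned char *out)
{
	if (cp < 0x80) {
		out[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800) {
		out[0] = (unsigned char)(0xC0 | (cp >> 6));
		out[1] = (unsigned char)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp >= 0xD800 && cp <= 0xDFFF)	/* UTF-16 surrogates */
		return 0;
	if (cp < 0x10000) {
		out[0] = (unsigned char)(0xE0 | (cp >> 12));
		out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[2] = (unsigned char)(0x80 | (cp & 0x3F));
		return 3;
	}
	if (cp <= 0x10FFFF) {
		out[0] = (unsigned char)(0xF0 | (cp >> 18));
		out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
		out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[3] = (unsigned char)(0x80 | (cp & 0x3F));
		return 4;
	}
	return 0;
}
```

utf8_encode(0x00FC, b) yields C3 BC, while utf8_encode(0xC3BC, b) yields the three bytes EC 8E BC, a different character altogether.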


Yuri


* Re: Full mandoc locale support committed.
  2011-05-19 20:28 ` Ulrich Spörlein
  2011-05-19 22:49   ` Kristaps Dzonsons
@ 2011-05-20 15:52   ` Kristaps Dzonsons
From: Kristaps Dzonsons @ 2011-05-20 15:52 UTC
  To: discuss

> There might be cases, where I really want ASCII output no matter what my
> locale is (this is covered by -Tascii right now), and there might be
> cases where I want UTF-8 output, no matter what the current locale is.
> Perhaps because I write the output to disk or to some other
> postprocessor.
>
> What I'm arguing is that we need to have a -Tutf8 mode and that perhaps
> *not* specifying *any* -T value turns on the automagic? This would make
> more sense from a users standpoint, IMHO.

From the last two commits:

Log Message:
-----------
Turn on -Tutf8 in the frontend.  Here we go!

Log Message:
-----------
Flip on -Tutf8 backend support.  This forces the UTF-8 LC_CTYPE and does
little else.
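Whether a forced LC_CTYPE really is UTF-8 can be confirmed with POSIX nl_langinfo(CODESET). A hedged sketch follows; ctype_is_utf8() is a hypothetical helper, and the accepted codeset spellings are assumptions, since they vary by C library.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Report whether the current LC_CTYPE uses UTF-8 (1) or not (0). */
static int
ctype_is_utf8(void)
{
	const char	*cs = nl_langinfo(CODESET);

	return strcmp(cs, "UTF-8") == 0 || strcmp(cs, "utf8") == 0;
}
```

After a successful setlocale(LC_CTYPE, "C.UTF-8") this reports 1; in the plain "C" locale it reports 0.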

