The Unix Heritage Society mailing list
From: bqt@update.uu.se (Johnny Billquist)
Subject: [TUHS] Character sets
Date: Mon, 28 Mar 2016 01:56:31 +0200
Message-ID: <56F8732F.4010004@update.uu.se>
In-Reply-To: <20160327233049.GA11617@mercury.ccil.org>

On 2016-03-28 01:30, John Cowan wrote:
> Johnny Billquist scripsit:
>
>>>> Haha. Yes... Except that you now have multiple representations of each
>>>> character within one character set. So what has improved???
>
> Mojibake, though not unknown, is now much less common, and the number
> of documents on the web that are in UTF-8 (including its ASCII subset)
> is at 85% and rising.
>
>>> In the Good Old Days, characters were all the same size, and you could
>>> do nice, simple things like
>>>
>>>    while (*c && *c++ != ' ');
>
> That particular piece of code still works if the encoding is UTF-8.
> Fundamentally, Unicode is complicated because human writing systems
> are complicated.

While true, I do not agree that Unicode is complicated because of
writing systems. Unicode has surpassed the writing systems...

>> Another one I noted a while ago was that functions and commands in
>> Unix, such as lpq, which try to print things in nice columns now
>> fail, because the code doesn't actually know how many characters have
>> been output.
>
> Well, if the font isn't fixed-width, you're screwed anyway.  But if
> it is, there is information in the Unicode tables that tells you which
> characters have widths of 0, 1, or 2.  Print programs can be modified
> to use that information.

(...or 3)
Yeah, you just need to suck in a few gigabytes of Unicode libraries in 
your 4K program. I'm not sure I agree that this is an acceptable solution.

>> And let's not even talk about such wonderful concepts as colors in
>> the character set definition... Unicode seems to have it all...
>
> Colors are optional.

Really. So how should Green Book (U+1F4D7) be rendered differently than 
Blue Book (U+1F4D8), or Orange Book (U+1F4D9) ?

Curious minds want to know...

>> I wonder how many code points exist for 'A'. It's definitely more than
>> one...
>
> Other than Greek and Cyrillic A letters, there are the math letters, which
> are used *in plain text* to designate semantic differences: plain A,
> italic A, and bold A mean different things mathematically.  Using the
> math italics for emphasis or book titles is a Bad Thing.

And what are your thoughts on FULLWIDTH LATIN CAPITAL LETTER A (U+FF21)?
What is the semantic difference in having more whitespace around the
letter? (It has a compatibility decomposition to LATIN CAPITAL LETTER A
(U+0041), so under compatibility normalization (NFKC/NFKD) it compares
equal to A, but it's still a different code point.)

	Johnny (Yes, I do not like Unicode...)

-- 
Johnny Billquist                  || "I'm on a bus
                                   ||  on a psychedelic trip
email: bqt at softjar.se             ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol