The Unix Heritage Society mailing list
From: bqt@update.uu.se (Johnny Billquist)
Subject: [TUHS] Character sets
Date: Mon, 28 Mar 2016 00:19:32 +0200
Message-ID: <56F85C74.2040805@update.uu.se>
In-Reply-To: <20160327215947.GT3766@eureka.lemis.com>

On 2016-03-27 23:59, Greg 'groggy' Lehey wrote:
> On Sunday, 27 March 2016 at 23:53:32 +0200, Johnny Billquist wrote:
>> On 2016-03-27 23:49, Greg 'groggy' Lehey wrote:
>>> On Sunday, 27 March 2016 at 13:47:43 +0200, Johnny Billquist wrote:
>>>> On 2016-03-27 13:29, John Cowan wrote:
>>>>> Johnny Billquist scripsit:
>>>>>
>>>>>> On 2016-03-27 08:18, Greg 'groggy' Lehey <grog at lemis.com> wrote:
>>>>>>> Isn't it wonderful that we no longer have issues with character
>>>>>>> representation?
>>>>>>
>>>>>> I hope that comment was meant as a joke, ironic, cynical, or whatever...
>>>>>
>>>>> Undoubtedly.  But things *are* much better than they used to be:
>>>>> we can now do everything within a single character set, and convert
>>>>> only at the boundaries (and increasingly, only in one direction).
>>>>
>>>> Haha. Yes... Except that you now have multiple representations of each
>>>> character within one character set. So what has improved???
>>>
>>> In the Good Old Days, characters were all the same size, and you could
>>> do nice, simple things like
>>>
>>>     while (*c && *c++ != ' ');
>>>
>>> Now you need a whole library to do the same thing.
>>
>> Another one I noted a while ago was that functions and commands in Unix,
>> such as lpq, which try to print things in nice columns, now fail because
>> the code doesn't actually know how many characters have been output.
>>
>> And let's not even talk about such wonderful concepts as colors in the
>> character set definition... Unicode seems to have it all... I wonder how
>> many code points exist for 'A'. It's definitely more than one...
>
> For some definition of A, of course.  In addition there's clearly at
> least Α (0x391) and А (0x410).

Oh, definitely. I'm trying to limit myself to Latin-A at the moment. 
Otherwise the list will just be ridiculously long.

You have, of course, U+41, but you also have U+FF21. But if you want to 
go slightly silly, you also have U+1F110, U+1F130, U+1F150, U+1F170, 
U+1F1E6, U+E0041... And God knows if I've missed some other ones.
Of course, whitespace and other typographic details matter. That's why
we have different code points for the letter, depending on things like 
whitespace.

	Johnny

-- 
Johnny Billquist                  || "I'm on a bus
                                   ||  on a psychedelic trip
email: bqt at softjar.se             ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol

