From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
In-Reply-To: <7cfc9061f18bd9aba567124d64be1ff5@quanstro.net>
References: <20090726090437.GA29868@finiteless.net>
 <7cfc9061f18bd9aba567124d64be1ff5@quanstro.net>
Date: Sun, 26 Jul 2009 11:39:56 -0700
Message-ID: <6e35c0620907261139u610c0431rbc3ecff6b16def29@mail.gmail.com>
From: Jack Johnson
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Content-Type: text/plain; charset=ISO-8859-2
Content-Transfer-Encoding: quoted-printable
Subject: Re: [9fans] Woes of New Language Support
Topicbox-Message-UUID: 2e1e383e-ead5-11e9-9d60-3106f5b1d025

If I'm reading you right, you're saying it might be easier if
everything were encoded as combining (or maybe more aptly
non-combining) codes, regardless of language?

So, we might encode 'Waffles' as w+upper a f f l e s and let the
renderer (if there is one) handle the presentation of the case shift
and the potential ligature, but things like grep get noticeably easier
with no overlap of ő and o+umlaut.

Again, oversimplified, with no real understanding on my part of the
depth or breadth of the problem space.

If this is the case, could it be handled by pushing everything into a
subset of unicode rather than use the unallocated space to create a
superset?

-J

On 7/26/09, erik quanstrom wrote:
>> to be fair to the unicode people, this decoupling of glyphs and codepoints
>> is (i think) the most straightforward way to implement some languages like
>> arabic, where the glyphs for characters depend on their position within a
>> word.  that is, a letter at the beginning of a word looks different from
>> what it would look like if it was in the middle.
>
> my opinion (not that i'm entitled to one here) is
> that the unicode guys screwed up.  unicode is not
> consistent.  explain why there are two code points for sigma.
>     03c3  greek small letter sigma
>     03c2  greek small letter final sigma
> why does german get ä, ö, ü?  if you want to take
> this further, why are there capital forms of latin letters?
> can't that also be inferred by the font?
>
> what's called a ligature in one language is a character
> in another.  i see no consistency.  it seems like the
> unicode committee had a problem with too much
> knowledge of the specific problems and few actual
> unifying (sorry) concepts.
>
> i think it would make much more sense to put this logic
> in editors.  this would also allow the freedom to use a
> capital, ligature, final form in the wrong place.
> like say studlyCaps.  i can't imagine english is the only
> language in the world that gets abused.
>
> - erik
>
>

-- 
Sent from my mobile device
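
For what it's worth, the grep point in this thread (a precomposed letter versus o+umlaut, the 'Waffles' ligature, the two sigmas) can be roughly illustrated with the normalization and case-folding tables Unicode already ships. This is only a sketch, in Python since its standard library exposes those tables. The name search_key is made up for this example, and NFKD plus case folding is an assumed stand-in for the "subset of unicode" idea, not something anyone in the thread proposed.

```python
import unicodedata


def search_key(text):
    """Hypothetical helper: reduce text to a comparison key.

    NFKD splits precomposed letters into base + combining marks and
    expands compatibility ligatures; casefold() then removes case and
    maps final sigma (U+03C2) to ordinary sigma (U+03C3).
    """
    return unicodedata.normalize("NFKD", text).casefold()


# Precomposed o-with-diaeresis vs. 'o' followed by COMBINING DIAERESIS.
precomposed = "\u00f6"
combining = "o\u0308"

# 'Waffles' written with the single ffl ligature code point U+FB04.
ligature_word = "Wa\ufb04es"

# Raw comparison sees different code point sequences.
raw_equal = precomposed == combining

# Comparison through the key treats each pair as the same text.
umlaut_equal = search_key(precomposed) == search_key(combining)
waffles_equal = search_key(ligature_word) == search_key("waffles")
sigma_equal = search_key("\u03c2") == search_key("\u03c3")
```

A grep-like tool could run both the pattern and the input through a key like this before matching. The stored text keeps whatever capital, ligature or final form the author typed, which is close to erik's point about leaving that choice to the editor.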