From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4d11e29c8ac6819bed2e1a1e6d6da764@quanstro.net>
Date: Fri, 19 May 2006 19:44:56 -0500
From: quanstro@quanstro.net
To: 9fans@cse.psu.edu
Subject: Re: [9fans] combining characters
In-Reply-To: <20060520004344.GI14448@submarine>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Topicbox-Message-UUID: 53a4f9ca-ead1-11e9-9d60-3106f5b1d025

On Fri May 19 19:45:43 CDT 2006, rvs@sun.com wrote:
> There's no such thing as an accented letter in a Russian language.
> That was the exact point of my initial remark.

the text was /romanized/ russian names. it was not written in the cyrillic
alphabet.

>
> Now, if you allow me to educate myself in Unicode a little bit,
> I'm about to follow through with your example. Be patient with me ;-)

as long as you're patient with me.

>
> > suppose that U+x is the cp for the letter.
> > suppose U+y is the cp for the accent.
>
> Ok.
>
> > suppose that we're lucky and there exists U+w ≡ U+xU+y.
>
> Just to make sure I still follow: U+w is supposed to *visually*
> look like U+x followed by U+y, right ?

yes. they must be the same.

>
> > then U+w should be the same glyph as U+xU+y.
>
> The same glyph from a visual standpoint, right ?

a glyph IS the visual representation.

>
> > canonical composition would yield
> >	compose(U+xU+y)		U+w
> >	compose(U+w)		U+w
> > while canonical decomposition would yield
> >	decompose(U+xU+y)	U+xU+y
> >	decompose(U+w)		U+xU+y
>
> And that's exactly the place where I think Unicode goes against common
> sense and language rules. I would expect it to mandate that a *decomposable*
> character is supposed to be used over the decomposition. Which in your
> original example was the case.

rob agrees with you.
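A minimal sketch of the compose/decompose table quoted above, using Python's standard unicodedata module. The concrete code points are an assumption chosen for illustration (U+x = U+0065 'e', U+y = U+0301 combining acute accent, U+w = U+00E9 precomposed 'é'), and the sketch assumes "canonical composition" corresponds to normalization form NFC and "canonical decomposition" to NFD.

```python
import unicodedata

# Illustrative stand-ins (our assumption, not from the thread):
# U+x = U+0065 'e', U+y = U+0301 combining acute, U+w = U+00E9 'é'.
x, y, w = "\u0065", "\u0301", "\u00e9"


def show(s: str) -> str:
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


# NFC = canonical decomposition followed by canonical composition;
# NFD = canonical decomposition only. Print each row of the table.
for form in ("NFC", "NFD"):
    for src in (x + y, w):
        out = unicodedata.normalize(form, src)
        print(f"{form}({show(src)}) -> {show(out)}")

# The two spellings are canonically equivalent: same result under NFC.
assert unicodedata.normalize("NFC", x + y) == unicodedata.normalize("NFC", w)
```

Each printed line mirrors one row of the table, e.g. NFD(U+00E9) -> U+0065 U+0301.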
however, there is a big advantage to a composed character -- you don't have
to figure out how to stick the horn, breve, slash, &c on top of, under, on the
shoulder of, through, &c the original character. in plan 9, characters are
bitmaps, making this operation extra annoying.

also, there are no rules in unicode preventing /arbitrary/ compositions.
this is valid unicode

	u+0069 u+0300 u+0301 u+0302 u+0303

all those combining codepoints attach to the base cp u+0069. figure out how
to build that glyph.

>
> "There are no accents in Russian language" (*)
>

now you're confusing language and alphabet! ☺

- erik
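A rough Python sketch of the u+0069 u+0300 u+0301 u+0302 u+0303 example, again assuming the standard unicodedata module. The helper clusters() is hypothetical and deliberately naive: it attaches any character with a nonzero canonical combining class to the preceding one, which is much simpler than real grapheme segmentation (UAX #29). It shows that all four marks hang off a single base character, and that NFC can absorb only the first mark into a precomposed letter (U+00EC); the other three stay as combining marks that a renderer still has to stack.

```python
import unicodedata


def clusters(s: str) -> list[str]:
    """Hypothetical, naive grouping of base + trailing combining marks.

    Any character with a nonzero canonical combining class is appended
    to the previous cluster. Real grapheme segmentation has more rules.
    """
    out: list[str] = []
    for ch in s:
        if out and unicodedata.combining(ch) != 0:
            out[-1] += ch
        else:
            out.append(ch)
    return out


# u+0069 followed by combining grave, acute, circumflex, tilde
seq = "\u0069\u0300\u0301\u0302\u0303"

# Show each code point with its canonical combining class and name.
for ch in seq:
    print(f"U+{ord(ch):04X} ccc={unicodedata.combining(ch)} {unicodedata.name(ch)}")

# One user-visible character built from five code points.
print(len(clusters(seq)), "cluster(s) from", len(seq), "code points")

# NFC composes i + grave into U+00EC; the remaining marks are blocked
# (same combining class) and stay separate.
nfc = unicodedata.normalize("NFC", seq)
print("NFC:", " ".join(f"U+{ord(c):04X}" for c in nfc))
```

Nothing in the standard rejects such a sequence, so a font or bitmap renderer has to cope with it somehow, which is the point made above.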