From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 19 May 2006 17:43:44 -0700
From: Roman Shaposhnick
To: Fans of the OS Plan 9 from Bell Labs <9fans@cse.psu.edu>
Subject: Re: [9fans] combining characters
Message-ID: <20060520004344.GI14448@submarine>
References: <20060520001201.GF14448@submarine>
Mime-Version: 1.0
Content-Type: text/plain; charset=koi8-r
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.4.2.1i
Topicbox-Message-UUID: 538b0c4a-ead1-11e9-9d60-3106f5b1d025

On Fri, May 19, 2006 at 07:13:33PM -0500, quanstro@quanstro.net wrote:
> On Fri May 19 19:13:39 CDT 2006, rvs@sun.com wrote:
> > Since I'm no expert
> > in UNICODE I'm quite curious to know how one is supposed to
> > tell between a real character and a combination of a diacritic
> > and some other character when they are visually indistinguishable ?
>
> say i have a random accented letter.

  There's no such thing as an accented letter in the Russian language.
  That was the exact point of my initial remark. Now, if you allow me
  to educate myself in Unicode a little bit, I'm about to follow through
  with your example. Be patient with me ;-)

> suppose that U+x is the cp for the letter.
> suppose U+y is the cp for the accent.

  Ok.

> suppose that we're lucky and there exists U+w ? U+xU+y.

  Just to make sure I still follow: U+w is supposed to *visually*
  look like U+x followed by U+y, right ?

> then U+w should be the same glyph as U+xU+y.

  The same glyph from a visual standpoint, right ?

> canonical composition would yield
>	compose(U+xU+y)		U+w
>	compose(U+w)		U+w
> while canonical decomposition would yield
>	decompose(U+xU+y)	U+xU+y
>	decompose(U+w)		U+xU+y

  And that's exactly the place where I think Unicode goes against
  common sense and language rules. I would expect it to mandate
  that a *decomposable* character is supposed to be used over
  the decomposition. Which in your original example was the case.

> > I would expect unicode to always favor single glyphs from a particular
> > page over anything else.
>
> it's always a single glyph. don't confuse letters, codepoints, and glyphs.

  It is still a bit hard to not confuse letters and glyphs :-(

> i'll send you a png of the character. i don't have the books.
>
> what language rule are you trying to get at?

  "There are no accents in Russian language" (*)

Thanks,
Roman.

(*) well, except for a Ukrainian one ;-)
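
The compose/decompose table quoted above can be tried out with a
minimal sketch like the one below. It assumes Python's standard
unicodedata module. The helper names compose and decompose are taken
from the quoted pseudocode, and same_text is a hypothetical helper.
The concrete code points are an illustrative choice, not something
given in the thread: U+0438 (CYRILLIC SMALL LETTER I) stands in for
U+x, U+0306 (COMBINING BREVE) for U+y, and U+0439 (CYRILLIC SMALL
LETTER SHORT I) for U+w.

```python
import unicodedata

# Illustrative code points (an assumption; the thread only says U+x, U+y, U+w).
x = "\u0438"  # CYRILLIC SMALL LETTER I
y = "\u0306"  # COMBINING BREVE
w = "\u0439"  # CYRILLIC SMALL LETTER SHORT I


def compose(s: str) -> str:
    """Canonical composition (Unicode normalization form NFC)."""
    return unicodedata.normalize("NFC", s)


def decompose(s: str) -> str:
    """Canonical decomposition (Unicode normalization form NFD)."""
    return unicodedata.normalize("NFD", s)


def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after normalizing both."""
    return compose(a) == compose(b)


def codepoints(s: str) -> str:
    """Render a string as a sequence of U+XXXX labels."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


if __name__ == "__main__":
    for name, s in (("U+xU+y", x + y), ("U+w", w)):
        print(f"compose({name})   ->", codepoints(compose(s)))
        print(f"decompose({name}) ->", codepoints(decompose(s)))
```

A program that has to tell the two spellings apart, or treat them as
equal, would normalize both sides to one form before comparing, as
same_text does here.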