Date: Fri, 19 May 2006 17:12:01 -0700
From: Roman Shaposhnick
To: Fans of the OS Plan 9 from Bell Labs <9fans@cse.psu.edu>
Subject: Re: [9fans] combining characters
Message-ID: <20060520001201.GF14448@submarine>
References: <20060519234025.GB14448@submarine> <3221436ff9c5efcdaa9cc68f2b78bd39@quanstro.net>
In-Reply-To: <3221436ff9c5efcdaa9cc68f2b78bd39@quanstro.net>

On Fri, May 19, 2006 at 06:43:14PM -0500, quanstro@quanstro.net wrote:
> eh? you speak russian.

;-) and two versions of it, too ;-)

> no. the unicode sequences (e.g. U+0069 U+0361) are correct.
> i checked this and several other examples with the actual books.

How did you check it? Visual inspection? Since I'm no expert in Unicode, I'm quite curious to know how one is supposed to tell a real character from a combination of a diacritic and some other character when the two are visually indistinguishable. I would expect Unicode to always favor single glyphs from a particular page over anything else.

btw, could you send me a .png with the actual title?

> i think you misunderstand how unicode works.

That could very well be the case ;-) But I know how the Russian language works, regardless of what committee members think.

> a base cp like U+0069 followed by a combining cp like U+0361
> make a single character. this identification is called "composition".
> unicode contains some precomposed cps, but not U+0069 U+0361.

That's OK. My only point is this: I would expect anybody who enters titles into a database to adhere to the rules of the language the title is written in. Maybe it's too much to expect, though.

Thanks,
Roman.
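The question of how to tell a single character from a base letter plus a combining mark, when both render identically, can be settled by listing the code points rather than looking at the glyphs. The following is a minimal Python sketch, not taken from the thread. It uses only the standard `unicodedata` module, and the helper name `describe` is hypothetical. It contrasts "e" + U+0301, which has a precomposed form that NFC normalization produces, with the thread's U+0069 U+0361, which NFC leaves as two code points because no precomposed form exists.

```python
import unicodedata


def describe(s):
    """Hypothetical helper: list each code point of s as 'U+XXXX NAME'."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, '<unnamed>')}" for c in s]


# 'e' followed by COMBINING ACUTE ACCENT: a precomposed code point
# (U+00E9) exists, so NFC composes the pair into one code point.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)

# 'i' followed by COMBINING DOUBLE INVERTED BREVE (the sequence from the
# thread): no precomposed code point exists, so NFC leaves it unchanged.
tied = "i\u0361"
tied_nfc = unicodedata.normalize("NFC", tied)

for label, s in [("e + U+0301", decomposed), ("  after NFC", composed),
                 ("i + U+0361", tied), ("  after NFC", tied_nfc)]:
    # Print the number of code points and their names for each string.
    print(label, len(s), describe(s))
```

NFC composes a pair only when the standard defines a precomposed code point for it. Two strings that look the same on screen can therefore still differ in length and content, and only an inspection of the code points reveals which form was entered.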