From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Fri, 19 May 2006 17:12:01 -0700
From: Roman Shaposhnick <rvs@sun.com>
To: Fans of the OS Plan 9 from Bell Labs <9fans@cse.psu.edu>
Subject: Re: [9fans] combining characters
Message-ID: <20060520001201.GF14448@submarine>
References: <20060519234025.GB14448@submarine>
	<3221436ff9c5efcdaa9cc68f2b78bd39@quanstro.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=koi8-r
Content-Disposition: inline
In-Reply-To: <3221436ff9c5efcdaa9cc68f2b78bd39@quanstro.net>
User-Agent: Mutt/1.4.2.1i
Topicbox-Message-UUID: 536e74e0-ead1-11e9-9d60-3106f5b1d025

On Fri, May 19, 2006 at 06:43:14PM -0500, quanstro@quanstro.net wrote:
> eh?  you speak russian. ;-)

  and two versions of it too ;-)

> no.  the unicode sequences (e.g. U+0069 U+0361) are correct.
> i checked this and several other examples with the actual books.

  How did you check it? Visual inspection? Since I'm no expert
  in Unicode, I'm quite curious to know how one is supposed to
  tell apart a real character and a combination of a diacritic
  and some other character when they are visually indistinguishable.
  I would expect Unicode to always favor single glyphs from a particular
  page over anything else.

  btw, could you send me a .png with the actual title?

> i think you misunderstand how unicode works.  

  That could very well be the case ;-) But I know how the Russian language
  works regardless of what committee members think.

> a base cp like U+0069 followed by a combining cp like U+0361 
> make a single character.  this identification is called "composition".
> unicode contains some precomposed cps, but not U+0069 U+0361.

  That's ok. My only point is -- I would expect anybody who enters
  titles into a database to adhere to the rules of the language the
  title is written in. Maybe it's too much to expect, though.
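
  For what it's worth, the "precomposed or not" question can be checked
  mechanically instead of by eye. A minimal sketch, assuming Python and
  its standard unicodedata module (neither comes from this thread); the
  helper names describe and has_precomposed_form are made up for
  illustration, and the Cyrillic pair is my own example, not one of the
  titles under discussion:

```python
import unicodedata

def describe(text):
    """Return (code point, name, combining class) for each code point.

    A combining class of 0 marks a base character; a non-zero class
    marks a combining mark that attaches to the preceding base.
    """
    return [
        (f"U+{ord(c):04X}", unicodedata.name(c, "<unnamed>"), unicodedata.combining(c))
        for c in text
    ]

def has_precomposed_form(text):
    """True if NFC normalization folds the sequence into fewer code points."""
    return len(unicodedata.normalize("NFC", text)) < len(text)

# The sequence quoted in the thread: base U+0069 plus combining U+0361.
seq = "\u0069\u0361"

# Illustrative assumption: U+0438 plus combining breve U+0306,
# a pair that does have a precomposed code point (U+0439).
known = "\u0438\u0306"

for item in describe(seq):
    print(item)
print("thread sequence has a precomposed form:", has_precomposed_form(seq))
print("illustrative pair has a precomposed form:", has_precomposed_form(known))
```

  If this is right, the sequence from the thread stays as two code points
  under NFC, while the illustrative pair collapses to one. That only says
  what Unicode offers, not which form a cataloguer should have typed.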

Thanks,
Roman.