From mboxrd@z Thu Jan 1 00:00:00 1970
From: erik quanstrom
Date: Mon, 30 Nov 2009 11:00:08 -0500
To: 9fans@9fans.net
Message-ID: <22fc94a82c16f8b347bc45dd539b5fc6@coraid.com>
In-Reply-To: <3aaafc130911300754i7f244f02j7d161d907d7a8bed@mail.gmail.com>
References: <71b1e3b728efbd1b2a2ae2b5b4e2b1d0@coraid.com> <3aaafc130911300754i7f244f02j7d161d907d7a8bed@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
Subject: Re: [9fans] grëp (rhymes with creep) and cptmp
Topicbox-Message-UUID: a57623ec-ead5-11e9-9d60-3106f5b1d025

> ``unfold turns a character, say ë into the set of
> characters that can be folded to the same base
> character. so
>   ; unfold ë
>   [eèéêëēĕėęěȅȇȩḕḗḙḛḝẹẻẽếềểễệ]''
>
> To me, that sounds like [e-f] should be
>
> [eèéêëēĕėęěȅȇȩḕḗḙḛḝẹẻẽếềểễệfƒ]
>
> iff e unfolds to the same set as ë. If e only unfolds to [e], then
> [e-f] would unfold to [ef].

i don't think that works. consider [e-g]. normally this
would match 'f', but under your algorithm it wouldn't.

the problem is that [a-z] works because ascii is arranged
in alphabetical order. all the various accented characters
are not. that's why the folding approach has an advantage
[a-z] will work and will do the Right Thing.

- erik
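
the [e-g] case can be sketched in a few lines of python. this is only an illustration, not the fold or unfold code discussed in the thread: fold() here is a stand-in that strips combining marks after unicode NFD decomposition, and unfold(), match_unfolded_endpoints() and match_folded_range() are hypothetical names made up for the example. the candidate alphabet is also an assumption, kept tiny on purpose.

```python
import unicodedata


def fold(ch):
    """Map an accented letter to its base letter (assumed stand-in for fold)."""
    decomposed = unicodedata.normalize("NFD", ch)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base or ch


def unfold(base, candidates):
    """Return the members of candidates that fold to base."""
    return {c for c in candidates if fold(c) == base}


def match_unfolded_endpoints(lo, hi, ch, candidates):
    """Quoted proposal: replace [lo-hi] by the unfolded sets of its endpoints."""
    return ch in (unfold(lo, candidates) | unfold(hi, candidates))


def match_folded_range(lo, hi, ch):
    """Folding approach: fold the input character, then do the plain range test."""
    return lo <= fold(ch) <= hi


# hypothetical candidate alphabet, small enough to check by hand
candidates = "eèéêëfgĝ"

print(match_unfolded_endpoints("e", "g", "f", candidates))  # False: 'f' is lost
print(match_folded_range("e", "g", "f"))                    # True
print(match_folded_range("e", "g", "ë"))                    # True: ë folds to e
```

folding the input keeps the range test in ascii order, so nothing between the endpoints drops out, while unfolding only the endpoints turns the range into two unrelated sets.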