9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] Woes of New Language Support
@ 2009-07-26  1:55 akumar
  2009-07-26  5:08 ` erik quanstrom
  0 siblings, 1 reply; 17+ messages in thread
From: akumar @ 2009-07-26  1:55 UTC (permalink / raw)
  To: 9fans

I've been trying to add support for Sanskrit-derived
languages, but just rendering the characters has halted
progress.  For the currently supported languages,
such as English, Russian, Greek, French, even Japanese, the
characters are more or less statically mapped to Unicode
code points (looking at my $font again, I see that Kanji
bitmaps are perhaps mapped to unspecified Unicode ranges?).

However, in the class of languages for which I am trying to
provide support, certain characters are meant to be produced
by an ordered combination of other characters.  For example,
the general sequence in Devanagari script (and this extends
to the other scripts as well) is that
consonant+virama+consonant produces
half-consonant+consonant, where the half-consonant has no
Unicode code point of its own.  As a concrete case in
Devanagari, na virama sa (viz., \u0928\u094d\u0938) should
produce the nsa character (this sequence can be seen in any
Unicode representation of the word "Sanskrit" in Devanagari
script).
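
To make the rule concrete, here is a rough sketch (in Python, only for
illustration; nothing here is Plan 9 code) that finds
consonant+virama+consonant runs in a string.  The names `is_consonant`
and `find_conjuncts` are made up for this example, and the consonant
test assumes only the basic Devanagari range U+0915 (ka) through
U+0939 (ha), ignoring nukta forms and the extended consonants.

```python
VIRAMA = "\u094d"  # DEVANAGARI SIGN VIRAMA


def is_consonant(ch):
    # Assumption: only the basic consonant block U+0915..U+0939 counts.
    return "\u0915" <= ch <= "\u0939"


def find_conjuncts(text):
    """Return (index, sequence) for each consonant+virama+consonant run."""
    hits = []
    for i in range(len(text) - 2):
        a, b, c = text[i], text[i + 1], text[i + 2]
        if is_consonant(a) and b == VIRAMA and is_consonant(c):
            hits.append((i, text[i:i + 3]))
    return hits
```

Calling `find_conjuncts("\u0928\u094d\u0938")` reports one run at
index 0, the na virama sa case above.  A renderer would have to
substitute a half-form glyph at each such position, which is the part
a static code-point-to-bitmap mapping cannot express.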

It seems to me that TTF font specifications (i.e., those I
converted to subfonts using Federico's ttf2subf) include
these sequence definitions, which are then processed by each
application providing support for the fonts.  Plan 9
subfonts are much too simple for this.

So, in this case, what are some ideas towards representing
the above?  I've thought about a ktrans-alike that perhaps
filters the data rio gets and processes it for this sort of
thing, but it doesn't seem to be the best possible way to
proceed.  If we can even get past this hurdle, I'd be more
than happy to patch ktrans for input support for this class
of languages.


Thanks,
ak




* Re: [9fans] Woes of New Language Support
@ 2009-07-26 11:43 Akshat Kumar
  0 siblings, 0 replies; 17+ messages in thread
From: Akshat Kumar @ 2009-07-26 11:43 UTC (permalink / raw)
  To: 9fans, quanstro

> what is the total number of stealth characters like nsa?
> if it's not too unreasonable, it might be good enough to steal part of
> the operating system or application reserved areas.

Any consonant should be able to become a half-consonant,
but only when followed by another consonant. In the TTF
method, character type checking falls out easily. I'm still up
for your suggestion, which if I understand it correctly, is to
take up parts of the unspecified unicode ranges and dedicate
them to half-consonants? You would then have to do this for
Bengali, Telugu, Tamil, Gujarati, Gurumukhi (I think), and
perhaps a couple of others. It's the fastest implementation, but
has a couple of setbacks:
    (a) it is not homogeneous across all Plan 9 distributions, and
    (b) it diverges from the Unicode standard, and thus, the
    problem of reading texts is still present, as everyone else is
    still using the consonant+virama+consonant sequence as
    opposed to following our self-defined code maps.
One can deal with (a) if dedicated enough to language support
for a billion or so people, but (b) is pretty serious and still presents
us with the same full-stop as before.
If there were some way to map Unicode sequences to our self-defined
codes, then this approach could work. kbmap perhaps?
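
A rough sketch of what such a mapping might look like, again in Python
purely as illustration.  Everything here is an assumption: the base
code point `PUA_BASE` (0xE000, the start of the Unicode Private Use
Area), the function names `to_private` and `from_private`, and the
restriction to the basic Devanagari consonants U+0915..U+0939.  The
idea is that text stays in standard consonant+virama+consonant form
everywhere except on the way to the font, which would carry half-form
bitmaps at the private code points.

```python
VIRAMA = "\u094d"
FIRST_CONS, LAST_CONS = 0x0915, 0x0939  # basic Devanagari consonants
PUA_BASE = 0xE000  # hypothetical: where our half-consonant codes start


def is_consonant(ch):
    return FIRST_CONS <= ord(ch) <= LAST_CONS


def to_private(text):
    """Replace consonant+virama, when followed by a consonant,
    with one self-defined half-consonant code."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if (i + 2 < len(text) and is_consonant(ch)
                and text[i + 1] == VIRAMA and is_consonant(text[i + 2])):
            out.append(chr(PUA_BASE + ord(ch) - FIRST_CONS))
            i += 2  # consume consonant and virama; keep the next consonant
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def from_private(text):
    """Inverse mapping: expand each half-consonant code back to
    consonant+virama so stored text stays standard Unicode."""
    out = []
    for ch in text:
        off = ord(ch) - PUA_BASE
        if 0 <= off <= LAST_CONS - FIRST_CONS:
            out.append(chr(FIRST_CONS + off) + VIRAMA)
        else:
            out.append(ch)
    return "".join(out)
```

With these definitions, `to_private("\u0928\u094d\u0938")` gives the
half-na code followed by sa, and `from_private` restores the original
sequence, so (b) would only bite if the private codes leaked into
stored files.  Where such a filter should sit in Plan 9 is exactly the
open question.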


Best,
ak



* Re: [9fans] Woes of New Language Support
@ 2009-07-26 12:01 Akshat Kumar
  0 siblings, 0 replies; 17+ messages in thread
From: Akshat Kumar @ 2009-07-26 12:01 UTC (permalink / raw)
  To: 9fans

Please disregard the question, "kbmap perhaps?" in my
last post.
I quickly realised that kbmap is only for inputs, while
I'm discussing plain old output from every other source.

partying too much
ak




Thread overview: 17+ messages
2009-07-26  1:55 [9fans] Woes of New Language Support akumar
2009-07-26  5:08 ` erik quanstrom
2009-07-26  7:41   ` andrey mirtchovski
2009-07-26 14:32     ` erik quanstrom
2009-07-28 10:39       ` Charles Forsyth
2009-07-28 14:11         ` Ethan Grammatikidis
2009-07-28 14:52           ` John Floren
2009-07-28 17:46             ` Ethan Grammatikidis
2009-07-26  9:04   ` Salman Aljammaz
2009-07-26 13:48     ` erik quanstrom
2009-07-26 14:12       ` tlaronde
2009-07-26 14:24         ` erik quanstrom
2009-07-26 17:56       ` Nathaniel W Filardo
2009-07-26 18:39       ` Jack Johnson
2009-07-27  0:28         ` erik quanstrom
2009-07-26 11:43 Akshat Kumar
2009-07-26 12:01 Akshat Kumar
