[9fans] Simplified Chinese plan 9

9fans - fans of the OS Plan 9 from Bell Labs

* [9fans] Simplified Chinese  plan 9
@ 2009-09-11  8:40 xiangyu
  2009-09-11 10:23 ` erik quanstrom
  0 siblings, 1 reply; 22+ messages in thread
From: xiangyu @ 2009-09-11  8:40 UTC (permalink / raw)
  To: 9fans

Hi everyone,
Is there a way to input Simplified Chinese in Plan 9? I
know Plan 9 supports Unicode, so displaying Simplified Chinese is
no problem, and I have seen pictures on the Internet that prove
it. My question is about typing it.
I'm looking forward to an answer. Thanks in advance!




* Re: [9fans] Simplified Chinese  plan 9
  2009-09-11  8:40 [9fans] Simplified Chinese plan 9 xiangyu
@ 2009-09-11 10:23 ` erik quanstrom
  2009-09-11 11:29   ` Alexander Sychev
  0 siblings, 1 reply; 22+ messages in thread
From: erik quanstrom @ 2009-09-11 10:23 UTC (permalink / raw)
  To: 9fans

> Hi everyone,
> Is there a way to input Simplified Chinese in Plan 9? I
> know Plan 9 supports Unicode, so displaying Simplified Chinese is
> no problem, and I have seen pictures on the Internet that prove
> it. My question is about typing it.
> I'm looking forward to an answer. Thanks in advance!

the only way to input simplified chinese currently
is to use the general codepoint input method:
<compose> 'x' + four hex digits.  on a pc compose =
<alt>.  that's probably not what you're looking for.
i am not aware that anyone has written an input
method specifically for simplified chinese.

- erik
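
As a rough illustration of the codepoint method described above, the
sketch below turns four hex digits into the character they name. It only
models the lookup and is not Plan 9's keyboard code; the function name
compose_hex is made up for this example.

```python
import string

def compose_hex(digits: str) -> str:
    """Return the character named by exactly four hex digits."""
    if len(digits) != 4 or any(d not in string.hexdigits for d in digits):
        raise ValueError("expected exactly four hex digits")
    return chr(int(digits, 16))

# U+4E2D and U+6587 together spell "Chinese (language)".
print(compose_hex("4e2d") + compose_hex("6587"))
```

Typing this way means knowing the code point of every character in
advance, which is why it is impractical for ordinary Chinese text.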




* Re: [9fans] Simplified Chinese  plan 9
  2009-09-11 10:23 ` erik quanstrom
@ 2009-09-11 11:29   ` Alexander Sychev
  2009-09-11 16:13     ` Eris Discordia
  2009-09-11 16:54     ` Anthony Sorace
  0 siblings, 2 replies; 22+ messages in thread
From: Alexander Sychev @ 2009-09-11 11:29 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Hello!

Some time ago I wrote an analog of kbmap for Inferno, with an extension:
the ability to produce complex symbols from sequences of more basic
symbols.
I use it for typing Russian in translit.
Here is a piece of my kbmap file:
<------------cut --------------->
1       45      0
1       46      'Ц
1       47      'В
1       48      'Б
1       49      'Н
1       50      'М
C       цх      'ч
C       Цх      'Ч
C       сх      'ш
C       Сх      'Ш
C       сцх     'щ
C       Сцх     'Щ
<------------cut--------------->

Latin symbols are mapped to Russian ones where possible. Other Russian
symbols are produced from sequences of mapped symbols, e.g. the Russian
symbol 'Ч' [ch] is entered as the sequence 'ц' [c] and 'х' [h].
A sequence can be broken by pressing any non-symbol key.
There is at least one big disadvantage of this method: the input focus
can be changed, e.g. by mouse. In Inferno I didn't solve this problem,
because /dev/pointer can be opened only once.

Maybe it makes sense to build something like this for Plan 9 (an analog of
kbmap) for typing complex symbols such as hieroglyphs?
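
The sequence idea can be sketched roughly as follows. This is not the
Inferno program itself, only a small model of it: the compose rules are
copied from the "C" lines of the excerpt, the key map for c and h follows
the message, and the mapping for s as well as all names in the code are
assumptions made for the example. It works on a whole string at once and
does not model breaking a sequence with a non-symbol key.

```python
# Key map: Latin key -> Cyrillic symbol (c and h follow the message;
# s is an assumption made for this example).
KEYMAP = {"c": "ц", "h": "х", "s": "с", "C": "Ц", "S": "С"}

# Compose rules, copied from the "C" lines of the kbmap excerpt.
COMPOSE = {
    "цх": "ч", "Цх": "Ч",
    "сх": "ш", "Сх": "Ш",
    "сцх": "щ", "Сцх": "Щ",
}
LONGEST = max(len(seq) for seq in COMPOSE)

def translit(keys: str) -> str:
    """Map keys to Cyrillic, then fold known sequences, longest first."""
    mapped = "".join(KEYMAP.get(k, k) for k in keys)
    out = []
    i = 0
    while i < len(mapped):
        for n in range(LONGEST, 0, -1):
            chunk = mapped[i:i + n]
            if chunk in COMPOSE:
                out.append(COMPOSE[chunk])
                i += len(chunk)
                break
        else:
            # No rule starts here; emit the symbol unchanged.
            out.append(mapped[i])
            i += 1
    return "".join(out)

print(translit("ch"), translit("sh"), translit("sch"))
```

Trying the longest sequence first matters: otherwise "sch" would stop at
the two-symbol rule and never reach the three-symbol one.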


-- 
Best regards,
   santucco




* Re: [9fans] Simplified Chinese  plan 9
  2009-09-11 11:29   ` Alexander Sychev
@ 2009-09-11 16:13     ` Eris Discordia
  2009-09-11 17:49       ` erik quanstrom
  2009-09-11 16:54     ` Anthony Sorace
  1 sibling, 1 reply; 22+ messages in thread
From: Eris Discordia @ 2009-09-11 16:13 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> Maybe it makes sense to build something like this for Plan 9 (an analog of
> kbmap) for typing complex symbols such as hieroglyphs?

Your method is in essence what Microsoft's IME on Windows and various IMEs 
on UNIX-likes (such as SCIM) use. However, an IME for inputting from a list
of over twenty thousand characters takes quite an effort to devise before 
it can be practical and useful. Right now even display of CJK is not quite 
fully supported on any existing FOSS platform (Ruby character display was 
added to Firefox only somewhere after version 3). Non-integrated pieces of 
FOSS with great capabilities do exist.

In the case of (Simplified and Traditional) Chinese there apparently exist only
two successful IMEs out there: one is Microsoft's, the other belongs to a 
Chinese company that has put lots of money and effort into developing the 
software. I believe both support input by Pinyin romanization, although I 
may be wrong. There's also Google's Pinyin IME which was involved in a 
lawsuit with said Chinese company.

In case of Japanese an IME needs to support three writing systems at once, 
firstly the two kana, and then transforming from kana to kanji. Abundance 
of homonyms in Japanese as well as a certain writing strategy called ateji 
(using kanji for phonetic value rather than semantic value) makes embedding 
of a dictionary into the IME unavoidable. Good dictionaries for this 
purpose don't come free--they must either be bought from professional 
companies or compiled by people who intimately know the language, 
preferably native speakers. This latter, I believe, is how IMEs on 
UNIX-likes came to be. Anyhow, Japanese IMEs, too, rely on input based on a 
romanization of the language. The actual number of distinct kanji required 
for input of text at a high school literate level is around two 
thousand--JLPT Level One roughly corresponds to that--but people, of 
course, expect a much larger dictionary. Microsoft IME also provides 
semantic aid by offering short descriptions of kanji so that people can 
decide which corresponds to the meaning they want to convey. Although 
unnecessary, it is a most welcome addition.

I don't know anything about the Korean writing system or IMEs but since CJK
ideographs (most importantly Han characters) are involved similar 
statements may apply.

Overall, there's no easy way that is light on financial and/or human 
resources--the two types of resources are interchangeable, i.e. if you have 
an active user base you may be able to avoid expenditure--to put CJK input 
support into a UI, which is probably why Plan 9 doesn't have that at the 
moment. It isn't a computer thing--it's a human thing. I might add porting 
IMEs from some UNIX-like system is probably the best option (for those with 
the technical prowess).

**********

DISTRACTION

While googling around for the existence of IMEs on Plan 9 I came across 
this document from 1996 titled "Unicode: Writing in the Global Village:"

> Despite these hurdles, Unicode may soon become the most common
> multilingual character-coding system. Support for multiple-language use
> is quickly growing. New operating systems—AT&T's Plan 9, Windows NT,
> Novell's Netware 4.01 Directory Services, Sybase's Gain Momentum, and
> Apple's Newton already support Unicode.

-- 
<http://www.nyu.edu/its/pubs/connect/archives/96fall/hargitaivillage.html>

It's funny how the author assumes display and input are the same thing
while they so greatly differ, input being many times harder to implement.


* Re: [9fans] Simplified Chinese plan 9
  2009-09-11 11:29   ` Alexander Sychev
  2009-09-11 16:13     ` Eris Discordia
@ 2009-09-11 16:54     ` Anthony Sorace
  2009-09-11 18:36       ` Eris Discordia
  1 sibling, 1 reply; 22+ messages in thread
From: Anthony Sorace @ 2009-09-11 16:54 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

i know very little about existing chinese input methods, so this is more a
question for my own understanding than a suggestion, but:

there is ktrans for Plan 9; the latest version i'm aware of is described here:
	http://basalt.cias.osakafu-u.ac.jp/plan9/s39.html
although that page is a bit hard to read since line breaks are not preserved.
the contents are just the README from the tar file; maybe easier to just
download that and read there.

anyway, the general idea is that it can compose kanji from strings of
hiragana. it's also been used for other languages (although my memory of
that says it was mostly for the transliteration function, rather than the
compositing function). is it possible to do something similar for the hanzi,
composing them up from roots/stems? i've seen reference to the idea in
chinese dictionaries, but have no idea if its use is widespread.

i've had ktrans working on 4th edition in the past, although i just tried again
(after a long gap), and it blows an assert, which i've not looked into yet.


* Re: [9fans] Simplified Chinese  plan 9
  2009-09-11 16:13     ` Eris Discordia
@ 2009-09-11 17:49       ` erik quanstrom
  2009-09-11 19:14         ` Eris Discordia
                           ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: erik quanstrom @ 2009-09-11 17:49 UTC (permalink / raw)
  To: 9fans

> I don't know anything about Korean writing system or IMEs but since CJK
> ideographs (most importantly Han characters) are involved similar
> statements may apply.

for korean per se, there are only 24 characters:

http://thinkzone.wlonk.com/Language/Korean.htm

one would imagine that han input methods would work
well for han in korean text.

- erik
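
Those letters combine into syllable blocks, and Unicode encodes the
precomposed blocks arithmetically, so a Korean input method can compute a
syllable instead of looking it up. Below is a minimal sketch, assuming the
standard Unicode Hangul composition formula. The table and function names
are made up for the example, and the tables include the doubled and
clustered letters that Unicode lists in addition to the 24 basic ones.

```python
# Letters in Unicode's composition order (compatibility jamo, for readability).
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"  # 19 initial consonants
VOWELS = "".join(chr(c) for c in range(0x314F, 0x3164))  # 21 vowels
TAILS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28, incl. none

def compose(lead: str, vowel: str, tail: str = "") -> str:
    """Combine an initial, a vowel and an optional final into one syllable."""
    l = LEADS.index(lead)
    v = VOWELS.index(vowel)
    t = TAILS.index(tail)
    # Precomposed syllables start at U+AC00 and are laid out lead-major.
    return chr(0xAC00 + (l * 21 + v) * 28 + t)

print(compose("ㅎ", "ㅏ", "ㄴ") + compose("ㄱ", "ㅡ", "ㄹ"))
```

Every combination of the three tables yields a code point (19 x 21 x 28 =
11172 blocks), so the encoding itself does not rule out syllables that
never occur in actual Korean words.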




* Re: [9fans] Simplified Chinese plan 9
  2009-09-11 16:54     ` Anthony Sorace
@ 2009-09-11 18:36       ` Eris Discordia
  0 siblings, 0 replies; 22+ messages in thread
From: Eris Discordia @ 2009-09-11 18:36 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> anyway, the general idea is that it can compose kanji from strings of
> hiragana. it's also been used for other languages (although my memory of
> that says it was mostly for the transliteration function, rather than the
> compositing function). is it possible to do something similar for the
> hanzi, composing them up from roots/stems? i've seen reference to the
> idea in chinese dictionaries, but have no idea if its use is widespread.

Kana to kanji conversion is peculiar to Japanese and that's basically how
all Japanese IMEs work. You input a series of kana (in Roman/Latin letters
converted on-the-fly), then either assert them as they are or accept a
corresponding kanji the IME offers. It's called inline conversion.
Conversion may also be explicitly requested from the software when for some
reason inline conversion results are unsatisfactory. It takes really good
UI design to make the process practical.
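
The two steps described above (Roman letters to kana, then kana to an
offered kanji) can be sketched as below. This is a toy model, not how any
particular IME is implemented: the tables hold only a handful of entries
chosen for the example, and the names to_kana and convert are invented
here.

```python
# Tiny illustrative tables; a real IME ships a full syllable table
# and a large dictionary.
ROMAJI_TO_KANA = {"ka": "か", "n": "ん", "ji": "じ", "ni": "に", "ho": "ほ", "go": "ご"}
CANDIDATES = {"かんじ": ["漢字", "感じ"], "にほんご": ["日本語"]}
LONGEST = max(len(r) for r in ROMAJI_TO_KANA)

def to_kana(romaji: str) -> str:
    """Convert Roman letters to kana by greedy longest match."""
    out = []
    i = 0
    while i < len(romaji):
        for n in range(LONGEST, 0, -1):
            chunk = romaji[i:i + n]
            if chunk in ROMAJI_TO_KANA:
                out.append(ROMAJI_TO_KANA[chunk])
                i += len(chunk)
                break
        else:
            raise ValueError(f"no kana for {romaji[i:]!r}")
    return "".join(out)

def convert(romaji: str, choice=None) -> str:
    """Return the kana as typed, or the chosen kanji candidate."""
    kana = to_kana(romaji)
    if choice is None:
        return kana  # the user accepts the kana as they are
    return CANDIDATES[kana][choice]

print(convert("kanji"), convert("kanji", 0), convert("kanji", 1))
```

The candidate list is where homonyms show up: one kana string maps to
several written forms, and the user has to pick.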

For Chinese, input from a standardized romanization is required, Pinyin
being the most widely used (cellphones, computers, people who learn Chinese
as a second language and would have an immensely hard time if they were to
write in ideographs, even many Chinese people). Kana to kanji conversion is
not viable there simply because kana is not the syllabary system used to
express Chinese. Chinese syllables do not correspond to kana, plus Chinese
is tonal while Japanese is not. Phonetically, and therefore input-wise
since practical CJK input is based on sounds rather than meanings, the two
languages are universes apart even though they share Han characters in the
semantic sphere. Actually, any practical input system should rely on sound
representation rather than meaning--there are only so many sounds while there
are infinitely many meanings.
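
A sound-based lookup of this kind can be sketched as a table from a
romanized syllable to the characters that share it. The sketch assumes
toneless Pinyin as the key, which is a simplification, and the two table
entries and the function names are chosen only for illustration.

```python
# Toneless Pinyin syllable -> characters read with that syllable.
# A real IME ranks thousands of candidates by frequency and context.
PINYIN = {
    "zhong": ["中", "种", "重", "钟"],
    "wen": ["文", "问", "闻"],
}

def candidates(syllable: str) -> list:
    """Return the characters offered for one typed syllable."""
    return PINYIN.get(syllable.lower(), [])

def pick(syllable: str, index: int) -> str:
    """Return the candidate the user selected by position."""
    return candidates(syllable)[index]

print(candidates("zhong"))
print(pick("zhong", 0) + pick("wen", 0))
```

Because the tone is dropped, each syllable fans out to several characters,
so most of the real work in such a system is ordering the candidates well.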

Roots/stems you refer to are elements in the ideographs used to classify
Han characters. They are more properly called radicals and are ordered by
stroke count, i.e. the number of times you put down the pen to compose one
from the basic strokes. Most IMEs, _besides_ automatic conversion, offer
the option to choose a kanji/hanzi/hanja by any one of various lookup
methods. Radical lookup is one such method. There are other classifications
of Han characters such as Hadamitzky-Spahn (applicable to kanji) which
aren't present in many IMEs.
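
Radical lookup amounts to a two-level index: radicals sorted by stroke
count, and under each radical the characters that contain it. A minimal
sketch with three radicals follows; the sample characters and all names
are chosen for this example and do not come from any particular IME.

```python
# Stroke count of each radical, used only to order the radical list.
RADICAL_STROKES = {"氵": 3, "口": 3, "木": 4}

# Characters filed under each radical (a tiny sample).
BY_RADICAL = {
    "氵": ["江", "河", "海", "湖"],
    "口": ["吃", "叫"],
    "木": ["林", "森", "村"],
}

def radicals_in_order() -> list:
    """List radicals by stroke count, ties broken by code point."""
    return sorted(RADICAL_STROKES, key=lambda r: (RADICAL_STROKES[r], r))

def lookup(radical: str) -> list:
    """Return the characters filed under a radical."""
    return BY_RADICAL.get(radical, [])

for r in radicals_in_order():
    print(RADICAL_STROKES[r], r, "".join(lookup(r)))
```

This path needs no knowledge of pronunciation, which is why it is useful
as a fallback when the reading of a character is unknown.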

This is a great example of a full-blown Japanese word processor (it's
Windows freeware):

<http://www.physics.ucla.edu/~grosenth/jwpce.html>

It features nearly everything expected from a CJK input system and works
independently of MS IME, although it can also be used in conjunction with it.

At present, Windows and MS Office do an unrivalled job of enabling
multi-lingual input and display. I can't help but feel this is sort of a
lock-in situation for people who need/fancy that sort of capability. This
isn't really something I would revel in but it's at least reassuring that
there is _some_ convenient, stable, uniform way to get these things done.


* Re: [9fans] Simplified Chinese  plan 9
  2009-09-11 17:49       ` erik quanstrom
@ 2009-09-11 19:14         ` Eris Discordia
       [not found]         ` <68F5914168759B188DF09A60@192.168.1.2>
  2009-09-14  9:33         ` Paul Donnelly
  2 siblings, 0 replies; 22+ messages in thread
From: Eris Discordia @ 2009-09-11 19:14 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> http://thinkzone.wlonk.com/Language/Korean.htm

Interesting. I used to think Korean, too, uses a syllabary. Turns out it's
expressed alphabetically. Expressing Japanese that way would create some
space for confusion as there are certain sounds that never combine with
certain other sounds, e.g. there are 'sa,' 'se,' 'so,' and 'su' syllables
in which 's' is heard just like 's' in 'say' but there's no 'si'--there's
only 'shi.' If there existed an 's' character and also characters for
vowels the invalid combination 'si' could be created in writing. I wonder
if the Korean alphabet can be used to make invalid combinations or whether
all possible combinations correspond to existing phonetic constructs.


* Re: [9fans] Simplified Chinese plan 9
       [not found]         ` <68F5914168759B188DF09A60@192.168.1.2>
@ 2009-09-11 19:53           ` Anthony Sorace
  2009-09-11 21:28             ` Eris Discordia
       [not found]             ` <C890B1F2A8C2EC12D5383D7C@192.168.1.2>
  0 siblings, 2 replies; 22+ messages in thread
From: Anthony Sorace @ 2009-09-11 19:53 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

lots of romance languages have exactly that characteristic, though
(maybe other languages, too). see C and G in italian. "ci" is simply
pronounced "correctly" as "chi".




* Re: [9fans] Simplified Chinese plan 9
  2009-09-11 19:53           ` Anthony Sorace
@ 2009-09-11 21:28             ` Eris Discordia
  2009-09-11 22:16               ` erik quanstrom
       [not found]             ` <C890B1F2A8C2EC12D5383D7C@192.168.1.2>
  1 sibling, 1 reply; 22+ messages in thread
From: Eris Discordia @ 2009-09-11 21:28 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> lots of romance languages have exactly that characteristic, though
> (maybe other languages, too). see C and G in italian. "ci" is simply
> pronounced "correctly" as "chi".

That's true but isn't exactly the same thing. "Irregularly" pronounced
combinations are still valid combinations. I'd say the universal example
for languages that are written in Latin alphabet or a variation thereof
would be the (notorious) 'fgsfds.' It's an invalid combination because
there is _no_ pronunciation at all--except 'figgis-fiddis' which is a
really recent, and ground-breaking, invention ;-)

With Japanese syllabaries one cannot produce unpronounceable sequences.
Nonsense, yes, but nothing that cannot be uttered.


* Re: [9fans] Simplified Chinese plan 9
       [not found]             ` <C890B1F2A8C2EC12D5383D7C@192.168.1.2>
@ 2009-09-11 21:59               ` Anthony Sorace
  0 siblings, 0 replies; 22+ messages in thread
From: Anthony Sorace @ 2009-09-11 21:59 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

that's a whole different problem, though.

your first problem was whether japanese would have some sort of
new or unique problem with an alphabet given the absence of certain
syllables (like shi) from the language. the answer is, of course, no:
the language would fall into either of the two extant conventions for
dealing with the syllable: always write "shi", or write "si" and just
change the pronunciation.

no written language stands independent of its pronunciation rules.
alphabets need a somewhat larger set of rules than syllabaries, but
that's true independent of language.


* Re: [9fans] Simplified Chinese plan 9
  2009-09-11 21:28             ` Eris Discordia
@ 2009-09-11 22:16               ` erik quanstrom
  2009-09-12  1:19                 ` Eris Discordia
  0 siblings, 1 reply; 22+ messages in thread
From: erik quanstrom @ 2009-09-11 22:16 UTC (permalink / raw)
  To: 9fans

> That's true but isn't exactly the same thing. "Irregularly" pronounced
> combinations are still valid combinations. I'd say the universal example
> for languages that are written in Latin alphabet or a variation thereof
> would be the (notorious) 'fgsfds.' It's an invalid combination because
> there is _no_ pronunciation at all--except 'figgis-fiddis' which is a
> really recent, and ground-breaking, invention ;-)

by this definition, one could devise a valid input method
with which it would be impossible to type "xyzzy".

> no written language stands independent of its pronunciation rules.
> alphabets need a somewhat larger set of rules than syllabaries, but
> that's true independent of language.

i'm not sure they are fully dependent.  consider acronyms.  or even
variable names.  (sometimes these need to be referred to
in speech.)  there are special hacks for making these
pronounceable.  in mathematics the same symbol can
have many pronunciations that depend entirely on the
context.

i'm not a linguist, but the linguists i know subscribe to the
viewpoint that the written and spoken language are separate.
and evolve separately.  i would derive from this that writability
is independent of pronounceability.

trying to think as a linguist, i would consider spoken acronyms
to be cognates from the written language.

as an homage to j. arthur seebach i'd say, "english is *neat*".

- erik


* Re: [9fans] Simplified Chinese plan 9
  2009-09-11 22:16               ` erik quanstrom
@ 2009-09-12  1:19                 ` Eris Discordia
  2009-09-12  1:46                   ` erik quanstrom
  0 siblings, 1 reply; 22+ messages in thread
From: Eris Discordia @ 2009-09-12  1:19 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> your first problem was whether japanese would have some sort of
> new or unique problem with an alphabet given the absence of certain
> syllables (like shi) from the language. the answer is, of course, no:
> the language would fall into either of the two extant conventions for
> dealing with the syllable: always write "shi", or write "si" and just
> change the pronunciation.

You're right. There wouldn't be any "new or unique" problems but there
might have been some space for confusion, which is what I asserted. A
gojuuon (kana table) contains all permitted syllables (kana representatives
of _families_ of syllables, actually) while an alphabet would allow many
invalid combinations. For a syllabic-moraic language where there are almost
as many invalid combinations as there are valid ones this method makes good
sense.

> no written language stands independent of its pronunciation rules.
> alphabets need a somewhat larger set of rules than syllabaries, but
> that's true independent of language.

Um, "no written language" would be too strong. Avestan script was invented
to make obsolete pronunciation rules by containing a large enough, but not
too large, set of basic symbols that were to be in one-to-one
correspondence with phonetic constructs of the language(s) that mattered to
its inventors. Since there were no exceptions there was no need for rules
beyond the correspondence between symbols and phonetic constructs. Of
course, the script itself became obsolete in due time. Modern day IPA is a
better informed attempt with an expanded albeit similar goal, although it
still needs to "approximate" sounds of some languages and it is extremely
hard to learn and use for non-phoneticians; or phoneticians for that
matter, but at least learning IPA is part of their job.

**********

> i'm not a linguist, but the linguists i know subscribe to the
> viewpoint that the written and spoken language are separate.
> and evolve separately.  i would derive from this that writability
> is independent of pronounceability.

If a sequence of symbols corresponds to something from a natural language
then it must be pronounceable since it must have been uttered at some time.
The same rule may not apply to "extensions" to natural language (acronyms,
stenography) or artificial languages (mathematics, computer programs).


* Re: [9fans] Simplified Chinese plan 9
  2009-09-12  1:19                 ` Eris Discordia
@ 2009-09-12  1:46                   ` erik quanstrom
  2009-09-12  7:05                     ` Eris Discordia
  0 siblings, 1 reply; 22+ messages in thread
From: erik quanstrom @ 2009-09-12  1:46 UTC (permalink / raw)
  To: 9fans

> > i'm not a linguist, but the linguists i know subscribe to the
> > viewpoint that the written and spoken language are separate.
> > and evolve separately.  i would derive from this that writability
> > is independent of pronounceability.
>
> If a sequence of symbols corresponds to something from a natural language
> then it must be pronounceable since it must have been uttered at some time.
> The same rule may not apply to "extensions" to natural language (acronyms,
> stenography) or artificial languages (mathematics, computer programs).

i believe this distinction between "natural" and "artificial"
languages is, uh, arbitrary.  think of the symbols that people
im each other with.  these are largely unpronounceable.  and
i've only heard a few ever pronounced at all.  (rofl comes to mind,
though that term predates my knowledge of text messaging).

i also am not sure that there is such a thing as an extension to
a language.  natural languages never have sharp boundaries
and are pretty dynamic.  when did "byte" become a word?
when did "gift" become a verb?  look how fast text-ese has
evolved.

my concept of a language looks more like a standard deviation
than a box.

- erik


* Re: [9fans] Simplified Chinese plan 9
  2009-09-12  1:46                   ` erik quanstrom
@ 2009-09-12  7:05                     ` Eris Discordia
  2009-09-12  8:39                       ` Daniel Lyons
  0 siblings, 1 reply; 22+ messages in thread
From: Eris Discordia @ 2009-09-12  7:05 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> i believe this distinction between "natural" and "artificial"
> languages is, uh, arbitrary.

Well, I don't think this is true. The distinction is strong enough for
everyone to be able to immediately tell apart a language from a
non-language. Actually, I think the term "artificial language" is kind of a
courtesy. Natural language, to which the term "language" is most properly
applied, is way different in how much more redundant, imprecise, and
semantically potent it is.

Still, final judgment, or any judgment, in this matter is really linguists'
to make so I guess I had better suspend my own while listening to them
:-D

> these are largely unpronounceable.  and i've only heard a few ever
> pronounced at all.  (rofl comes to mind, though that term predates my
> knowledge of text messaging).

They fall into the category of stenography. Circumstances, e.g.
technological burden or limitations, inspire the trend of their creation.
"Coolness" factor creates new ones and sustains some. After many years of
IM (or SMS) they continue to be ad hoc and bound to subcultures--have you
yet seen 'inb4' or 'caek' used? I have--which is why I think their features
can't be used to draw inferences about language (they may be studied for
other purposes, of course). They aren't subject to the same dynamism,
particularly same constraints, the core of language is. Precisely because
they aren't used in actual conversation or any type of text that is worth,
to the writer, more than a throw-away note.

> natural languages never have sharp boundaries and are pretty dynamic.
> when did "byte" become a word? when did "gift" become a verb?  look how
> fast text-ese has evolved.

Sharp boundaries with what? That's some question ;-) Natural languages are
immediately discernible from most communication protocols used by non-human
entities. Byte has a long and confused story that doesn't quite make it
clear what it [byte] was initially meant to mean. Merriam-Webster dates
'gift' as a transitive verb to ca. 1550 CE.

There's a discussion of evolution of languages that involves a language
going from pidgin to creole to full-blown. Maybe "text-ese" is some sort of
pidgin, or more leniently creole, that draws on the "speakers'" native
language but the point here is that it will never evolve into a full-blown
language. All of its "speakers" are speakers of much stronger native
languages. Most of them share proper English as a language of global
communication. "Text-ese" and its (often self-professed) importance seem
like a fad to me. Do you think it will survive fast and reliable
speech-to-text and/or brain-to-computer interfaces, i.e. a time when the
technical burden of typing is removed without one having to expose one's
voice to the insecure Internet and complete strangers (as in voice chat)? I
know English will (because people think in it) but I seriously doubt
"text-ese," essentially required by technological limitations and peer
pressure among teens, will. Teen and other subculture languages, of course,
will continue to exist. Ain't it "magical and rad?"

--On Friday, September 11, 2009 21:46 -0400 erik quanstrom
<quanstro@quanstro.net> wrote:

>> > i'm not a linguist, but the linguists i know subscribe to the
>> > viewpoint that the written and spoken language are separate.
>> > and evolve separately.  i would derive from this that writability
>> > is independent of pronounceability.
>>
>> If a sequence of symbols corresponds to something from a natural
>> language  then it must be pronounceable since it must have been uttered
>> at some time.  The same rule may not apply to "extensions" to natural
>> language (acronyms,  stenography) or artificial languages (mathematics,
>> computer programs).
>
> i believe this distinction between "natural" and "artificial"
> languages is, uh, arbitrary.  think of the symbols that people
> im each other with.  these are largely unpronounceable.  and
> i've only heard a few ever pronounced at all.  (rofl comes to mind,
> though that term predates my knowledge of text messaging).
>
> i also am not sure that there is such a thing as an extension to
> a language.  natural languages never have sharp boundaries
> and are pretty dynamic.  when did "byte" become a word?
> when did "gift" become a verb?  look how fast text-ese has
> evolved.
>
> my concept of a language looks more like a standard deviation
> than a box.
>
> - erik
>


* Re: [9fans] Simplified Chinese plan 9
  2009-09-12  7:05                     ` Eris Discordia
@ 2009-09-12  8:39                       ` Daniel Lyons
  2009-09-12 14:22                         ` Eris Discordia
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel Lyons @ 2009-09-12  8:39 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sep 12, 2009, at 1:05 AM, Eris Discordia wrote:

> There's a discussion of evolution of languages that involves a  
> language going from pidgin to creole to full-blown. Maybe "text-ese"  
> is some sort of pidgin, or more leniently creole, that draws on the  
> "speakers'" native language but the point here is that it will never  
> evolve into a full-blown language.

Once again, words you use recklessly turn out to have actual  
definitions. From Wikipedia:

"A pidgin language is a simplified language that develops as a means  
of communication between two or more groups that do not have a  
language in common..."

"A creole language, or simply a creole, is a stable language that  
originates from a mixture of various languages. The lexicon of a  
creole usually consists of words clearly borrowed from the parent  
languages, except for phonetic and semantic shifts. On the other hand,  
the grammar often has original features and may differ substantially  
from those of the parent languages."

I'm sure you'll provide us with the definitions from Merriam-Webster  
as well.

In other words, a pidgin is what you get when you have two groups  
without a common language being forced to communicate. A creole is  
what you get when their kids learn the pidgin as a first language.  
Linguists and physicists have a bad habit of making their jargon  
colorful so I'll only deduct half the usual points.

I agree with your conclusion, but I disagree with a couple steps in  
your reasoning. Namely, I don't think you could discover a systemic  
grammatical deviation from English in leet or text-speak or whatever.  
These are novel and amusing orthographies and in-crowd jargon and  
nothing more—people pronounce ROFL and LOL to be ironic and cute, not  
because they think they're words and would be surprised to learn their  
true origin. My wife and her best friend have a policy of pronouncing  
those abbreviations by spelling them out and then saying the last word  
("double-you tee ef fuck") to be funny. Also, plenty of people think  
in English differently than I do yet we all manage to communicate to  
the same degree (i.e. poorly but well enough to get by).

I also doubt that we'll have the kind of technology you're talking  
about, because I think 90% of the hard part of being a programmer  
comes from learning to think rigorously and that will be the stumbling  
block for anything digital that wants to try and digitize our  
thoughts. This is also the crux of my argument against the idea that  
computers will someday program themselves: the real barrier isn't  
hardware or motivation, it's that by the time you teach someone to be  
explicit enough that a computer can derive what they're trying to do,  
you've made them a programmer already (see Prolog for example). Same  
with strong AI: nobody has a clue how to word the problem precisely  
enough to write a program to solve it.

—
Daniel Lyons


* Re: [9fans] Simplified Chinese plan 9
  2009-09-12  8:39                       ` Daniel Lyons
@ 2009-09-12 14:22                         ` Eris Discordia
  2009-09-12 14:27                           ` erik quanstrom
  0 siblings, 1 reply; 22+ messages in thread
From: Eris Discordia @ 2009-09-12 14:22 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> Once again, words you use recklessly turn out to have actual definitions.

I am aware of those definitions. Please refer to the Jared Diamond lecture 
titled "The Great Leap Forward" to (gracefully) understand what I am 
talking about. It is supposed in the discussion of language evolution I 
referred to (and Diamond beautifully explains in that lecture) that pidgins 
and creoles may be clues to the "universal language/grammar" contained
in human genetic heritage: the innate linguistic capability of humankind. 
Those two categories of "proto-languages" show the emergent nature of 
language and that when confronted with a new medium--on a plantation in a 
community of slaves and masters of various origins or in an electronic 
messaging system--humans tend to rework from scratch or from whatever 
available material a complete language guided by their inborn universal 
language. A few generations is all it takes to go from "proto-language" to 
language.

My argument was that in case of electronic messaging systems the 
"proto-language," while creating new symbols and even new syntax, never 
evolves into a full-blown language no matter how many generations use it 
(to date, at least two consecutive generations). In fact, because it is 
bound to subcultures that come and go, and because it is used to set up 
"cliques" within larger communities of users of the medium its usage never 
becomes effortless and "natural." The effort required to learn and keep up 
with the flavor of the month is part of the price one pays to stay in the 
clique. Hence, what I wrote: "they aren't subject to the same dynamism, 
particularly same constraints, the core of language is."

> Namely, I don't think you could discover a systemic grammatical deviation
> from English in leet or text-speak or whatever.

"Doesn't afraid of anything," eh? Or "inb4 pr0n?" "Amirite desu?" I have 
encountered dozens of those consistently-used constructs but you've been 
coding too much and IRCing too little, apparently, which is appreciable but 
undermines your judgment about "text-speak."

(Just in case, that third example performs at least three contortions at 
once: combines the Japanese SOV sentence order with English's SVO, uses a 
Japanese word in a semantically wrong, subculture-specific manner, and 
employs a "cool" version of "am I right" with only a subset of connotations 
that "am I right" can carry. Syntactic, lexical, and semantic.)

> These are novel and amusing orthographies and in-crowd jargon and nothing
> more [...]

I think we agree there: I said they were fad.

> I also doubt that we'll have the kind of technology you're talking about
> [...]

I cannot guarantee things but I can tell you this: expect speech synthesis
from neural readings for the motor-incapacitated (think Stephen Hawking) in
one decade or less. And, of course, I have my doubts, too, but I also have my
hopes _and_ my thought experiments.


* Re: [9fans] Simplified Chinese plan 9
  2009-09-12 14:22                         ` Eris Discordia
@ 2009-09-12 14:27                           ` erik quanstrom
  2009-09-12 14:39                             ` Eris Discordia
       [not found]                             ` <160F5E4B5D4057F12BB54C75@192.168.1.2>
  0 siblings, 2 replies; 22+ messages in thread
From: erik quanstrom @ 2009-09-12 14:27 UTC (permalink / raw)
  To: 9fans

> > These are novel and amusing orthographies and in-crowd jargon and nothing
> > more [...]
>
> I think we agree there: I said they were fad.

i think you need to read some chaucer.  you are
the boiling frog in a pot of words.

- erik




* Re: [9fans] Simplified Chinese plan 9
  2009-09-12 14:27                           ` erik quanstrom
@ 2009-09-12 14:39                             ` Eris Discordia
       [not found]                             ` <160F5E4B5D4057F12BB54C75@192.168.1.2>
  1 sibling, 0 replies; 22+ messages in thread
From: Eris Discordia @ 2009-09-12 14:39 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> i think you need to read some chaucer.  you are
> the boiling frog in a pot of words.

English isn't my native tongue. It's a bit too much to expect me to read
14th century "stuff" only to understand what probably amounts to an
affront. You tell me what is "the boiling frog in a pot of words."



* Re: [9fans] Simplified Chinese plan 9
       [not found]                             ` <160F5E4B5D4057F12BB54C75@192.168.1.2>
@ 2009-09-12 20:22                               ` Nick LaForge
  0 siblings, 0 replies; 22+ messages in thread
From: Nick LaForge @ 2009-09-12 20:22 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

PLEASE ERIS!!  Your cerebral core-dumps are making me claustrophobic!




* Re: [9fans] Simplified Chinese  plan 9
  2009-09-11 17:49       ` erik quanstrom
  2009-09-11 19:14         ` Eris Discordia
       [not found]         ` <68F5914168759B188DF09A60@192.168.1.2>
@ 2009-09-14  9:33         ` Paul Donnelly
  2009-09-14 12:47           ` Eris Discordia
  2 siblings, 1 reply; 22+ messages in thread
From: Paul Donnelly @ 2009-09-14  9:33 UTC (permalink / raw)
  To: 9fans

eris.discordia@gmail.com (Eris Discordia) writes:

>> http://thinkzone.wlonk.com/Language/Korean.htm
>
> Interesting. I used to think Korean, too, uses a syllabary. Turns out
> it's expressed alphabetically. Expressing Japanese that way would
> create some space for confusion as there are certain sounds that never
> combine with certain other sounds, e.g. there are 'sa,' 'se,' 'so,'
> and 'su' syllables in which 's' is heard just like 's' in 'say' but
> there's no 'si'--there's only 'shi.'

Actually, I believe that in Korean, "si" (시, if that displays for you at
all) is pronounced "shi". :P

> If there existed an 's' character and also characters for vowels the
> invalid combination 'si' could be created in writing. I wonder if
> Korean alphabet can be used to make invalid combinations or all
> possible combinations correspond to existing phonetic constructs.

Some combinations don't occur. Especially there are diphthongs that don't
occur. But that's not really strange or a problem. Consider the word:
qimk. It doesn't work in English, but the Latin alphabet still
functions.




* Re: [9fans] Simplified Chinese  plan 9
  2009-09-14  9:33         ` Paul Donnelly
@ 2009-09-14 12:47           ` Eris Discordia
  0 siblings, 0 replies; 22+ messages in thread
From: Eris Discordia @ 2009-09-14 12:47 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

I've been, for the time being, officially p9-gagged due to "core-dumping" 
on the list. But thanks anyway for the information. And yes, the Latin 
alphabet does function.


Thread overview: 22+ messages
2009-09-11  8:40 [9fans] Simplified Chinese plan 9 xiangyu
2009-09-11 10:23 ` erik quanstrom
2009-09-11 11:29   ` Alexander Sychev
2009-09-11 16:13     ` Eris Discordia
2009-09-11 17:49       ` erik quanstrom
2009-09-11 19:14         ` Eris Discordia
     [not found]         ` <68F5914168759B188DF09A60@192.168.1.2>
2009-09-11 19:53           ` Anthony Sorace
2009-09-11 21:28             ` Eris Discordia
2009-09-11 22:16               ` erik quanstrom
2009-09-12  1:19                 ` Eris Discordia
2009-09-12  1:46                   ` erik quanstrom
2009-09-12  7:05                     ` Eris Discordia
2009-09-12  8:39                       ` Daniel Lyons
2009-09-12 14:22                         ` Eris Discordia
2009-09-12 14:27                           ` erik quanstrom
2009-09-12 14:39                             ` Eris Discordia
     [not found]                             ` <160F5E4B5D4057F12BB54C75@192.168.1.2>
2009-09-12 20:22                               ` Nick LaForge
     [not found]             ` <C890B1F2A8C2EC12D5383D7C@192.168.1.2>
2009-09-11 21:59               ` Anthony Sorace
2009-09-14  9:33         ` Paul Donnelly
2009-09-14 12:47           ` Eris Discordia
2009-09-11 16:54     ` Anthony Sorace
2009-09-11 18:36       ` Eris Discordia