From mboxrd@z Thu Jan 1 00:00:00 1970 Message-Id: <4B61A280020000CC0001D4A1@wlgw07.wlu.ca> Date: Thu, 28 Jan 2010 14:43:12 -0500 From: "Karljurgen Feuerherm" To: <9fans@9fans.net> Mime-Version: 1.0 Content-Type: multipart/alternative; boundary="=__Part6A404DC0.0__=" Subject: [9fans] Lex, Yacc, Unicode Plane 1 Topicbox-Message-UUID: c9b6534e-ead5-11e9-9d60-3106f5b1d025

This is a MIME message. If you are reading this text, you may want to consider changing to a mail reader or gateway that understands how to properly handle MIME multipart messages.

--=__Part6A404DC0.0__= Content-Type: text/plain; charset=ISO-8859-15 Content-Transfer-Encoding: quoted-printable

Hello,

A colleague put me on to Plan9, some of whose online documentation I have read with interest, in particular the "Hello World" discussion as it relates to Unicode/UTF-8.

I'm one of the authors of the Cuneiform proposal now encoded under Unicode (see block U+12000), and I'm interested in lex/yacc-like parsing of Unicode input to produce (among other things) Cuneiform output.

I realize some of the documentation was written long ago... so I'm unclear as to whether or not (or how easily) Plan9 (and specifically its lex/yacc software, etc.) handles such things? (this sparked by the references to four hex digits etc.)

Many thanks if you can point me in the right direction :) (or to an alternative solution, if need be!)

Best

K

Karljürgen G. Feuerherm, PhD
Department of Archaeology and Classical Studies
Wilfrid Laurier University
75 University Avenue West
Waterloo, Ontario N2L 3C5
Tel. (519) 884-1970 x3193
Fax (519) 883-0991 (ATTN Arch. & Classics)

--=__Part6A404DC0.0__= Content-Type: text/html; charset=ISO-8859-15 Content-Transfer-Encoding: quoted-printable Content-Description: HTML
--=__Part6A404DC0.0__=-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: erik quanstrom Date: Thu, 28 Jan 2010 15:05:03 -0500 To: 9fans@9fans.net Message-ID: <6277a4dcc738c2eee17e029efeb1b324@ladd.quanstro.net> In-Reply-To: <4B61A280020000CC0001D4A1@wlgw07.wlu.ca> References: <4B61A280020000CC0001D4A1@wlgw07.wlu.ca> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit Subject: Re: [9fans] Lex, Yacc, Unicode Plane 1 Topicbox-Message-UUID: c9bd4334-ead5-11e9-9d60-3106f5b1d025 > A colleague put me on to Plan9, some of whose online documentation I > have read with interest, in particular the "Hello World" discussion as > it relates to Unicode/UTF-8. > > I'm one of the authors of the Cuneiform proposal now encoded under > Unicode (see block U+12000), and I'm interesting in lex/yacc-like > parsing of Unicode input to produce (among other things) Cuneiform > output. > > I realize some of the documentation was written long ago... so I'm > unclear as to whether or not (or how easily) Plan9 (and specifically its > lex/yacc software, etc.) handles such things? (this sparked by the > references to four hex digits etc.) that's interesting stuff. lex(1) is generally not used, and doesn't support unicode. yacc(1) does a fine job with unicode. though, to be fair, most of that job falls on the lexer. however this is not hard to do by hand. there are many good examples in the distribution. the bio(2) buffered io library provides a Bgetrune function, which is generally what is desired. (i have some patches, partially stolen from russ, that should support extended plane runes at the cost of double the storage.) 
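As a rough illustration of the hand-written lexer approach described above, here is a minimal sketch in portable C. It is not code from the Plan 9 distribution: decoderune is a hypothetical stand-in for what Bgetrune/chartorune would do, Rune32 is an assumed 32-bit rune type, and the token kinds are invented for the example. It reads one rune at a time and tags anything in the Cuneiform block (U+12000 to U+123FF) as a sign.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Plan 9's Rune; 32 bits so plane 1 fits. */
typedef uint32_t Rune32;

enum { RuneError = 0xFFFD };
enum { Teof, Tword, Tsign, Tother };	/* invented token kinds */

/*
 * Decode one UTF-8 sequence from s into *r and return the number of
 * bytes consumed (always at least 1, so callers make progress).
 * Malformed, overlong, surrogate or out-of-range input gives RuneError.
 */
static int
decoderune(Rune32 *r, const unsigned char *s)
{
	static const Rune32 minval[] = { 0, 0, 0x80, 0x800, 0x10000 };
	Rune32 c;
	int i, n;

	assert(r != NULL && s != NULL);
	c = s[0];
	if (c < 0x80) {
		*r = c;
		return 1;
	}
	if ((c & 0xE0) == 0xC0) { n = 2; c &= 0x1F; }
	else if ((c & 0xF0) == 0xE0) { n = 3; c &= 0x0F; }
	else if ((c & 0xF8) == 0xF0) { n = 4; c &= 0x07; }
	else {
		*r = RuneError;
		return 1;
	}
	for (i = 1; i < n; i++) {
		/* a NUL or any non-continuation byte stops the scan here */
		if ((s[i] & 0xC0) != 0x80) {
			*r = RuneError;
			return 1;
		}
		c = (c << 6) | (s[i] & 0x3F);
	}
	if (c < minval[n] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) {
		*r = RuneError;
		return 1;
	}
	*r = c;
	return n;
}

/*
 * Return the kind of the next token at *pp and advance *pp past it.
 * *val receives the rune (for Tword, the first letter of the word).
 */
static int
nexttoken(const unsigned char **pp, Rune32 *val)
{
	const unsigned char *p = *pp;
	Rune32 r;
	int kind;

	while (*p == ' ' || *p == '\t' || *p == '\n')
		p++;
	if (*p == '\0') {
		*pp = p;
		*val = 0;
		return Teof;
	}
	p += decoderune(&r, p);
	*val = r;
	if ((r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z')) {
		while ((*p >= 'a' && *p <= 'z') || (*p >= 'A' && *p <= 'Z'))
			p++;
		kind = Tword;
	} else if (r >= 0x12000 && r <= 0x123FF)
		kind = Tsign;	/* Cuneiform block */
	else
		kind = Tother;
	*pp = p;
	return kind;
}
```

A yacc grammar would call something like nexttoken from yylex and hand the rune back through yylval. On Plan 9 itself the decoding step would presumably be Bgetrune on a Biobuf, provided Rune is wide enough for plane 1.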
- erik
From mboxrd@z Thu Jan 1 00:00:00 1970 Message-ID: <982374feab1ff1d8ea2f176256d16934@plan9.bell-labs.com> To: 9fans@9fans.net Date: Thu, 28 Jan 2010 15:46:27 -0500 From: geoff@plan9.bell-labs.com In-Reply-To: <4B61A280020000CC0001D4A1@wlgw07.wlu.ca> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit Subject: Re: [9fans] Lex, Yacc, Unicode Plane 1 Topicbox-Message-UUID: c9c5536c-ead5-11e9-9d60-3106f5b1d025

I've extended old code using lex to accept utf by massaging the input
stream, before lex sees it, to parse utf and encode non-ascii Runes
into '\33' (escape) followed by 4 hex digits.  A simple lex rule then
decodes for the benefit of yacc.

This encodes:

/*
 * lex can't cope with character sets wider than 8 bits, so convert
 * s to runes and encode non-ascii runes as <esc><hex><hex><hex><hex>.
 * result is malloced.
 */
char *
utf2lex(char *s)
{
	int nb, bytes;
	Rune r;
	char *news, *p, *ds;

	/* pass 1: count bytes needed by the converted string; watch for UTF */
	for (p = s, nb = 0; *p != '\0'; p += bytes, nb++) {
		bytes = chartorune(&r, p);
		if (bytes > 1)
			nb += 4;
	}
	news = malloc(nb+1);
	if (news != 0) {
		/* pass 2: convert s into new string */
		news[nb] = '\0';
		for (p = s, ds = news; *p != '\0'; p += bytes) {
			bytes = chartorune(&r, p);
			if (bytes == 1)
				*ds++ = r;
			else
				ds += sprint(ds, "\33%.4ux", (int)r);
		}
	}
	return news;
}

and this lex code decodes:

%{
char *lex2rune(Rune *rp, char *s);
char *estrdup(char *);

static Rune inrune;
%}
E	\33
%%
{E}....	{
		yylval.charp = estrdup(lex2rune(&inrune, yytext+1));
		return inrune;
	}
%%
char *
lex2rune(Rune *rp, char *s)
{
	static char utf[UTFmax+1];

	*rp = strtoul(s, 0, 16);
	utf[runetochar(utf, rp)] = '\0';
	return utf;
}

From mboxrd@z Thu Jan 1 00:00:00 1970 Message-Id: <4B61B461020000CC0001D4FA@wlgw07.wlu.ca> Date: Thu, 28 Jan 2010 15:59:29 -0500 From: "Karljurgen Feuerherm" To: <9fans@9fans.net> References: <4B61A280020000CC0001D4A1@wlgw07.wlu.ca> <982374feab1ff1d8ea2f176256d16934@plan9.bell-labs.com> In-Reply-To: <982374feab1ff1d8ea2f176256d16934@plan9.bell-labs.com> Mime-Version: 1.0 Content-Type: multipart/alternative; boundary="=__Part19333EA1.0__=" Subject: Re: [9fans] Lex, Yacc, Unicode Plane 1 Topicbox-Message-UUID: c9cdb49e-ead5-11e9-9d60-3106f5b1d025

--=__Part19333EA1.0__= Content-Type: text/plain; charset=ISO-8859-15 Content-Transfer-Encoding: quoted-printable

Thanks, Geoff, and Erik.

However... (with my 5 minute intro to Runes courtesy of Hello World doc...) we're still talking BMP, right?

(I programmed in B back in the day... i.e. 1980-ish and due to a career shift have been out of things for a while, so forgive my potential obtuseness as I gradually reintegrate...!)

This reminds me of what I read here: http://www.w3.org/2005/03/23-lex-U

K

Karljürgen G. Feuerherm, PhD
Department of Archaeology and Classical Studies
Wilfrid Laurier University
75 University Avenue West
Waterloo, Ontario N2L 3C5
Tel. (519) 884-1970 x3193
Fax (519) 883-0991 (ATTN Arch. & Classics)

>>> <geoff@plan9.bell-labs.com> 28/01/2010 3:46:27 pm >>>
I've extended old code using lex to accept utf by massaging the input
stream, before lex sees it, to parse utf and encode non-ascii Runes
into '\33' (escape) followed by 4 hex digits.  A simple lex rule then
decodes for the benefit of yacc.
This encodes:

/*
 * lex can't cope with character sets wider than 8 bits, so convert
 * s to runes and encode non-ascii runes as <esc><hex><hex><hex><hex>.
 * result is malloced.
 */
char *
utf2lex(char *s)
{
	int nb, bytes;
	Rune r;
	char *news, *p, *ds;

	/* pass 1: count bytes needed by the converted string; watch for UTF */
	for (p = s, nb = 0; *p != '\0'; p += bytes, nb++) {
		bytes = chartorune(&r, p);
		if (bytes > 1)
			nb += 4;
	}
	news = malloc(nb+1);
	if (news != 0) {
		/* pass 2: convert s into new string */
		news[nb] = '\0';
		for (p = s, ds = news; *p != '\0'; p += bytes) {
			bytes = chartorune(&r, p);
			if (bytes == 1)
				*ds++ = r;
			else
				ds += sprint(ds, "\33%.4ux", (int)r);
		}
	}
	return news;
}

and this lex code decodes:

%{
char *lex2rune(Rune *rp, char *s);
char *estrdup(char *);

static Rune inrune;
%}
E	\33
%%
{E}....	{
		yylval.charp = estrdup(lex2rune(&inrune, yytext+1));
		return inrune;
	}
%%
char *
lex2rune(Rune *rp, char *s)
{
	static char utf[UTFmax+1];

	*rp = strtoul(s, 0, 16);
	utf[runetochar(utf, rp)] = '\0';
	return utf;
}

--=__Part19333EA1.0__= Content-Type: text/html; charset=ISO-8859-15 Content-Transfer-Encoding: quoted-printable Content-Description: HTML
--=__Part19333EA1.0__=--
From mboxrd@z Thu Jan 1 00:00:00 1970 Message-ID: <3156d55fd27c66805eb5621e34222bb6@plan9.bell-labs.com> To: 9fans@9fans.net Date: Thu, 28 Jan 2010 16:20:41 -0500 From: geoff@plan9.bell-labs.com In-Reply-To: <4B61B461020000CC0001D4FA@wlgw07.wlu.ca> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit Subject: Re: [9fans] Lex, Yacc, Unicode Plane 1 Topicbox-Message-UUID: c9d3ee04-ead5-11e9-9d60-3106f5b1d025

Yes, we only support the 16-bit runes of Unicode plane 0.  That really
should be enough space, except for bungling by the Unicode Consortium.

From mboxrd@z Thu Jan 1 00:00:00 1970 Message-Id: <4B61C07B020000CC0001D530@wlgw07.wlu.ca> Date: Thu, 28 Jan 2010 16:51:07 -0500 From: "Karljurgen Feuerherm" To: <9fans@9fans.net> References: <4B61B461020000CC0001D4FA@wlgw07.wlu.ca> <3156d55fd27c66805eb5621e34222bb6@plan9.bell-labs.com> In-Reply-To: <3156d55fd27c66805eb5621e34222bb6@plan9.bell-labs.com> Mime-Version: 1.0 Content-Type: multipart/alternative; boundary="=__Part9CB6B8DB.0__=" Subject: Re: [9fans] Lex, Yacc, Unicode Plane 1 Topicbox-Message-UUID: c9dee458-ead5-11e9-9d60-3106f5b1d025

--=__Part9CB6B8DB.0__= Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: quoted-printable

Well.... having worked with the Unicode Consortium, I know there's a little more to it than that... :)

But it's ok. If I have to write a preprocessor to make it work on Plan9, I might as well stay with the Unix system currently available to me and write a preprocessor for that.

I suppose I could use the PUA and write a postprocessor....

Anyhow, thanks for the info. Now, at least, I know more clearly what my options are.

K

>>> <geoff@plan9.bell-labs.com> 28/01/2010 4:20:41 pm >>>
Yes, we only support the 16-bit runes of Unicode plane 0.  That really
should be enough space, except for bungling by the Unicode Consortium.

--=__Part9CB6B8DB.0__= Content-Type: text/html; charset=US-ASCII Content-Transfer-Encoding: quoted-printable Content-Description: HTML
--=__Part9CB6B8DB.0__=-- From mboxrd@z Thu Jan 1 00:00:00 1970 MIME-Version: 1.0 In-Reply-To: <4B61C07B020000CC0001D530@wlgw07.wlu.ca> References: <4B61B461020000CC0001D4FA@wlgw07.wlu.ca> <3156d55fd27c66805eb5621e34222bb6@plan9.bell-labs.com> <4B61C07B020000CC0001D530@wlgw07.wlu.ca> Date: Thu, 28 Jan 2010 14:07:41 -0800 Message-ID: <13426df11001281407j6d8e5cecy413a9fbd9a707cca@mail.gmail.com> From: ron minnich To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net> Content-Type: text/plain; charset=ISO-8859-1 Subject: Re: [9fans] Lex, Yacc, Unicode Plane 1 Topicbox-Message-UUID: c9e85222-ead5-11e9-9d60-3106f5b1d025 On Thu, Jan 28, 2010 at 1:51 PM, Karljurgen Feuerherm wrote: > Well.... having worked with the Unicode Consortium, I know there's a little > more to it than that... :) I'm curious because I don't know much about all this stuff, I'm just grateful I can live in the low 7 bits ... what more is there? thanks ron From mboxrd@z Thu Jan 1 00:00:00 1970 MIME-Version: 1.0 In-Reply-To: <13426df11001281407j6d8e5cecy413a9fbd9a707cca@mail.gmail.com> References: <4B61B461020000CC0001D4FA@wlgw07.wlu.ca> <3156d55fd27c66805eb5621e34222bb6@plan9.bell-labs.com> <4B61C07B020000CC0001D530@wlgw07.wlu.ca> <13426df11001281407j6d8e5cecy413a9fbd9a707cca@mail.gmail.com> Date: Thu, 28 Jan 2010 23:19:58 +0100 Message-ID: From: hiro <23hiro@googlemail.com> To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net> Content-Type: text/plain; charset=UTF-8 Subject: Re: [9fans] Lex, Yacc, Unicode Plane 1 Topicbox-Message-UUID: c9ee7f44-ead5-11e9-9d60-3106f5b1d025 extinct languages, mahjong tiles,... On Thu, Jan 28, 2010 at 11:07 PM, ron minnich wrote: > On Thu, Jan 28, 2010 at 1:51 PM, Karljurgen Feuerherm wrote: >> Well.... having worked with the Unicode Consortium, I know there's a little >> more to it than that... :) > > > I'm curious because I don't know much about all this stuff, I'm just > grateful I can live in the low 7 bits ... what more is there? 
> > thanks > > ron > >
From mboxrd@z Thu Jan 1 00:00:00 1970 Message-Id: <4B61CABC020000CC0001D558@wlgw07.wlu.ca> Date: Thu, 28 Jan 2010 17:34:52 -0500 From: "Karljurgen Feuerherm" To: "Fans of the OS Plan 9 from Bell Labs" <9fans@9fans.net> References: <4B61B461020000CC0001D4FA@wlgw07.wlu.ca> <3156d55fd27c66805eb5621e34222bb6@plan9.bell-labs.com> <4B61C07B020000CC0001D530@wlgw07.wlu.ca> <13426df11001281407j6d8e5cecy413a9fbd9a707cca@mail.gmail.com> In-Reply-To: Mime-Version: 1.0 Content-Type: multipart/alternative; boundary="=__Part4C66681C.0__=" Subject: Re: [9fans] Lex, Yacc, Unicode Plane 1 Topicbox-Message-UUID: c9f3b73e-ead5-11e9-9d60-3106f5b1d025

--=__Part4C66681C.0__= Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: quoted-printable

There's a lot more than that! See http://www.unicode.org/charts/

K

>>> hiro <23hiro@googlemail.com> 28/01/2010 5:19:58 pm >>>
extinct languages, mahjong tiles,...

On Thu, Jan 28, 2010 at 11:07 PM, ron minnich <rminnich@gmail.com> wrote:
> On Thu, Jan 28, 2010 at 1:51 PM, Karljurgen Feuerherm <kfeuerherm@wlu.ca> wrote:
>> Well.... having worked with the Unicode Consortium, I know there's a little
>> more to it than that... :)
>
>
> I'm curious because I don't know much about all this stuff, I'm just
> grateful I can live in the low 7 bits ... what more is there?
>
> thanks
>
> ron
>
>

--=__Part4C66681C.0__= Content-Type: text/html; charset=US-ASCII Content-Transfer-Encoding: quoted-printable Content-Description: HTML
--=__Part4C66681C.0__=--
From mboxrd@z Thu Jan 1 00:00:00 1970 From: erik quanstrom Date: Thu, 28 Jan 2010 17:56:06 -0500 To: 9fans@9fans.net Message-ID: In-Reply-To: <13426df11001281407j6d8e5cecy413a9fbd9a707cca@mail.gmail.com> References: <4B61B461020000CC0001D4FA@wlgw07.wlu.ca> <3156d55fd27c66805eb5621e34222bb6@plan9.bell-labs.com> <4B61C07B020000CC0001D530@wlgw07.wlu.ca> <13426df11001281407j6d8e5cecy413a9fbd9a707cca@mail.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8bit Subject: Re: [9fans] Lex, Yacc, Unicode Plane 1 Topicbox-Message-UUID: ca029b0a-ead5-11e9-9d60-3106f5b1d025

On Thu Jan 28 17:09:45 EST 2010, rminnich@gmail.com wrote:
> On Thu, Jan 28, 2010 at 1:51 PM, Karljurgen Feuerherm wrote:
> > Well.... having worked with the Unicode Consortium, I know there's a little
> > more to it than that... :)
>
>
> I'm curious because I don't know much about all this stuff, I'm just
> grateful I can live in the low 7 bits ... what more is there? ☺

there, fixed that for ya!

- erik

From mboxrd@z Thu Jan 1 00:00:00 1970 MIME-Version: 1.0 In-Reply-To: References: <4B61B461020000CC0001D4FA@wlgw07.wlu.ca> <3156d55fd27c66805eb5621e34222bb6@plan9.bell-labs.com> <4B61C07B020000CC0001D530@wlgw07.wlu.ca> <13426df11001281407j6d8e5cecy413a9fbd9a707cca@mail.gmail.com> Date: Thu, 28 Jan 2010 21:38:55 -0200 Message-ID: <32d987d51001281538h5046c850n898b0628962b5824@mail.gmail.com> From: "Federico G. Benavento" To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Subject: Re: [9fans] Lex, Yacc, Unicode Plane 1 Topicbox-Message-UUID: ca0fcf00-ead5-11e9-9d60-3106f5b1d025

as erik mentioned earlier, plan9port has 32bit runes...

On Thu, Jan 28, 2010 at 8:56 PM, erik quanstrom wrote:
> On Thu Jan 28 17:09:45 EST 2010, rminnich@gmail.com wrote:
>> On Thu, Jan 28, 2010 at 1:51 PM, Karljurgen Feuerherm wrote:
>> > Well.... having worked with the Unicode Consortium, I know there's a little
>> > more to it than that... :)
>>
>>
>> I'm curious because I don't know much about all this stuff, I'm just
>> grateful I can live in the low 7 bits ... what more is there? ☺
>
> there, fixed that for ya!
>
> - erik
>
>

-- 
Federico G. Benavento

From mboxrd@z Thu Jan 1 00:00:00 1970 From: erik quanstrom Date: Thu, 28 Jan 2010 18:42:45 -0500 To: 9fans@9fans.net Message-ID: <1f4d3cc302892f8e9d9c788a0c3a7145@ladd.quanstro.net> In-Reply-To: <3156d55fd27c66805eb5621e34222bb6@plan9.bell-labs.com> References: <3156d55fd27c66805eb5621e34222bb6@plan9.bell-labs.com> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit Subject: Re: [9fans] Lex, Yacc, Unicode Plane 1 Topicbox-Message-UUID: ca16190a-ead5-11e9-9d60-3106f5b1d025

On Thu Jan 28 16:22:58 EST 2010, geoff@plan9.bell-labs.com wrote:
> Yes, we only support the 16-bit runes of Unicode plane 0.  That really
> should be enough space, except for bungling by the Unicode Consortium.

good point.

at this point only ~21829 codepoints are assigned, depending
on your definition of assigned.  and i agree that the unicode
consortium has taken a number of decisions that make life difficult
(unnecessary combiners and font-encodings for math characters
are my pet peeves).

but now that the decision has been made, i think it makes sense
to adapt, or at least put ourselves in the position to adapt.
such a principled stance doesn't help someone who needs
codepoints outside the basic plane.
otherwise we become the 64000 characters ought to
be enough for everyone guys.
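One way to put the escape scheme from earlier in the thread "in the position to adapt" is to widen the hex field from 4 to 6 digits, so that code points up to 0x10FFFF (the Cuneiform block at U+12000, for instance) survive the trip through lex. The following is only a sketch in portable C under that assumption; rune2esc, esc2rune and Ndigits are hypothetical names and are not part of geoff's code or any Plan 9 library.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

enum {
	Esc = 033,	/* same escape byte as the 4-digit scheme */
	Ndigits = 6	/* six hex digits reach 0x10FFFF */
};

/*
 * Write ESC followed by Ndigits hex digits for code point r into buf.
 * buf must hold at least Ndigits+2 bytes (ESC, digits, NUL).
 * Returns the number of bytes written, not counting the NUL.
 */
static int
rune2esc(char *buf, size_t size, uint32_t r)
{
	assert(buf != NULL && size >= (size_t)Ndigits + 2 && r <= 0x10FFFF);
	return snprintf(buf, size, "\033%0*lx", Ndigits, (unsigned long)r);
}

/*
 * Parse the hex digits that follow an ESC; s points just past the ESC,
 * as yytext+1 does in the lex rule.  Only Ndigits characters are read,
 * so hex-looking text after the escape is not swallowed.
 */
static uint32_t
esc2rune(const char *s)
{
	char digits[Ndigits + 1];
	int i;

	assert(s != NULL);
	for (i = 0; i < Ndigits && s[i] != '\0'; i++)
		digits[i] = s[i];
	digits[i] = '\0';
	return (uint32_t)strtoul(digits, NULL, 16);
}
```

With this width the matching lex pattern would presumably become {E}...... (six dots), and pass 1 of utf2lex would add 6 rather than 4 bytes per non-ASCII rune. None of it helps unless Rune itself is wider than 16 bits.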
- erik
From mboxrd@z Thu Jan 1 00:00:00 1970 Message-Id: <4B61E094020000CC0001D591@wlgw07.wlu.ca> Date: Thu, 28 Jan 2010 19:08:04 -0500 From: "Karljurgen Feuerherm" To: <9fans@9fans.net> References: <3156d55fd27c66805eb5621e34222bb6@plan9.bell-labs.com> <1f4d3cc302892f8e9d9c788a0c3a7145@ladd.quanstro.net> In-Reply-To: <1f4d3cc302892f8e9d9c788a0c3a7145@ladd.quanstro.net> Mime-Version: 1.0 Content-Type: multipart/alternative; boundary="=__Part93B9B7F4.0__=" Subject: Re: [9fans] Lex, Yacc, Unicode Plane 1 Topicbox-Message-UUID: ca1d2a7e-ead5-11e9-9d60-3106f5b1d025

--=__Part93B9B7F4.0__= Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: quoted-printable

thanks, erik. that's exactly right--we may or may not agree with Unicode decisions, but the fact is some of us poor slobs are stuck in the higher planes whether we like it or not, so are looking for help...

erik, should we pursue the plan9port question offlist? it is going to take me time to integrate info, no need to bog the list down in what is likely to be simple for most folk...

(apologies if i'm not catching on as quickly as one might like...)

K

>>> erik quanstrom <quanstro@quanstro.net> 28/01/2010 6:42:45 pm >>>
On Thu Jan 28 16:22:58 EST 2010, geoff@plan9.bell-labs.com wrote:
> Yes, we only support the 16-bit runes of Unicode plane 0.  That really
> should be enough space, except for bungling by the Unicode Consortium.

good point.

at this point only ~21829 codepoints are assigned, depending
on your definition of assigned.  and i agree that the unicode
consortium has taken a number of decisions that make life difficult
(unnecessary combiners and font-encodings for math characters
are my pet peeves).

but now that the decision has been made, i think it makes sense
to adapt, or at least put ourselves in the position to adapt.
such a principled stance doesn't help someone who needs
codepoints outside the basic plane.
otherwise we become the 64000 characters ought to
be enough for everyone guys.

- erik

--=__Part93B9B7F4.0__= Content-Type: text/html; charset=US-ASCII Content-Transfer-Encoding: quoted-printable Content-Description: HTML
--=__Part93B9B7F4.0__=-- From mboxrd@z Thu Jan 1 00:00:00 1970 MIME-Version: 1.0 In-Reply-To: <3156d55fd27c66805eb5621e34222bb6@plan9.bell-labs.com> References: <4B61B461020000CC0001D4FA@wlgw07.wlu.ca> <3156d55fd27c66805eb5621e34222bb6@plan9.bell-labs.com> From: Rob Pike Date: Fri, 29 Jan 2010 11:19:37 +1100 Message-ID: <7359f0491001281619m45734186l181d8380b024d04f@mail.gmail.com> To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net> Content-Type: text/plain; charset=ISO-8859-1 Subject: Re: [9fans] Lex, Yacc, Unicode Plane 1 Topicbox-Message-UUID: ca25c9fe-ead5-11e9-9d60-3106f5b1d025 On Fri, Jan 29, 2010 at 8:20 AM, wrote: > Yes, we only support the 16-bit runes of Unicode plane 0. Really? They're 32 bits in plan9port and, although there are a few things that need to be patched, we know what they are. Rune should be 32 bits by now. -rob From mboxrd@z Thu Jan 1 00:00:00 1970 From: erik quanstrom Date: Thu, 28 Jan 2010 19:24:14 -0500 To: 9fans@9fans.net Message-ID: In-Reply-To: <7359f0491001281619m45734186l181d8380b024d04f@mail.gmail.com> References: <4B61B461020000CC0001D4FA@wlgw07.wlu.ca> <3156d55fd27c66805eb5621e34222bb6@plan9.bell-labs.com> <7359f0491001281619m45734186l181d8380b024d04f@mail.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit Subject: Re: [9fans] Lex, Yacc, Unicode Plane 1 Topicbox-Message-UUID: ca2b0392-ead5-11e9-9d60-3106f5b1d025 > Really? They're 32 bits in plan9port and, although there are a few > things that need to be patched, we know what they are. > > Rune should be 32 bits by now. there is a patch for plan 9. 
actually 2 that enable one to set UTFmax = 4 and Runemax = 0x10ffff: /n/sources/patch/saved/runesize /n/sources/patch/saved/runesize2 - erik From mboxrd@z Thu Jan 1 00:00:00 1970 MIME-Version: 1.0 In-Reply-To: References: <4B61B461020000CC0001D4FA@wlgw07.wlu.ca> <3156d55fd27c66805eb5621e34222bb6@plan9.bell-labs.com> <7359f0491001281619m45734186l181d8380b024d04f@mail.gmail.com> Date: Thu, 28 Jan 2010 16:36:30 -0800 Message-ID: Subject: Re: [9fans] Lex, Yacc, Unicode Plane 1 From: Russ Cox To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net> Content-Type: multipart/alternative; boundary=000e0cd32b749db68f047e42d364 Topicbox-Message-UUID: ca30653a-ead5-11e9-9d60-3106f5b1d025 --000e0cd32b749db68f047e42d364 Content-Type: text/plain; charset=UTF-8 Specifically: http://code.swtch.com/plan9port/changeset/3095 http://code.swtch.com/plan9port/changeset/3102 http://code.swtch.com/plan9port/changeset/3103 http://code.swtch.com/plan9port/changeset/3104 http://code.swtch.com/plan9port/changeset/3110 http://code.swtch.com/plan9port/changeset/3121 That should cover the bulk of the Plan 9 libraries and commands but omits the kernel. Russ --000e0cd32b749db68f047e42d364 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Specifically:



--000e0cd32b749db68f047e42d364-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: erik quanstrom Date: Thu, 28 Jan 2010 19:42:35 -0500 To: 9fans@9fans.net Message-ID: <08b26d23eca5c5d9ce909b869bf73928@ladd.quanstro.net> In-Reply-To: References: <4B61B461020000CC0001D4FA@wlgw07.wlu.ca> <3156d55fd27c66805eb5621e34222bb6@plan9.bell-labs.com> <7359f0491001281619m45734186l181d8380b024d04f@mail.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit Subject: Re: [9fans] Lex, Yacc, Unicode Plane 1 Topicbox-Message-UUID: ca362254-ead5-11e9-9d60-3106f5b1d025 > Specifically: > > http://code.swtch.com/plan9port/changeset/3095 > http://code.swtch.com/plan9port/changeset/3102 > http://code.swtch.com/plan9port/changeset/3103 > http://code.swtch.com/plan9port/changeset/3104 > http://code.swtch.com/plan9port/changeset/3110 > http://code.swtch.com/plan9port/changeset/3121 > > That should cover the bulk of the Plan 9 libraries > and commands but omits the kernel. plan 9 port rejects fonts with characters at codepoints outside the basic plane. 
- erik From mboxrd@z Thu Jan 1 00:00:00 1970 MIME-Version: 1.0 In-Reply-To: <08b26d23eca5c5d9ce909b869bf73928@ladd.quanstro.net> References: <4B61B461020000CC0001D4FA@wlgw07.wlu.ca> <3156d55fd27c66805eb5621e34222bb6@plan9.bell-labs.com> <7359f0491001281619m45734186l181d8380b024d04f@mail.gmail.com> <08b26d23eca5c5d9ce909b869bf73928@ladd.quanstro.net> Date: Thu, 28 Jan 2010 16:58:55 -0800 Message-ID: Subject: Re: [9fans] Lex, Yacc, Unicode Plane 1 From: Russ Cox To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net> Content-Type: multipart/alternative; boundary=000e0cd32b74ccb546047e43231d Topicbox-Message-UUID: ca3b637c-ead5-11e9-9d60-3106f5b1d025 --000e0cd32b74ccb546047e43231d Content-Type: text/plain; charset=UTF-8 On Thu, Jan 28, 2010 at 4:42 PM, erik quanstrom wrote: > > Specifically: > > > > http://code.swtch.com/plan9port/changeset/3095 > > http://code.swtch.com/plan9port/changeset/3102 > > http://code.swtch.com/plan9port/changeset/3103 > > http://code.swtch.com/plan9port/changeset/3104 > > http://code.swtch.com/plan9port/changeset/3110 > > http://code.swtch.com/plan9port/changeset/3121 > > > > That should cover the bulk of the Plan 9 libraries > > and commands but omits the kernel. > > plan 9 port rejects fonts with characters at codepoints > outside the basic plane. > http://code.swtch.com/plan9port/changeset/3140 Russ --000e0cd32b74ccb546047e43231d Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable

--000e0cd32b74ccb546047e43231d-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: erik quanstrom Date: Fri, 29 Jan 2010 01:08:25 -0500 To: 9fans@9fans.net Message-ID: <2179391c5e79f180197d72711e5fe00a@ladd.quanstro.net> In-Reply-To: References: <4B61B461020000CC0001D4FA@wlgw07.wlu.ca> <3156d55fd27c66805eb5621e34222bb6@plan9.bell-labs.com> <7359f0491001281619m45734186l181d8380b024d04f@mail.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit Subject: Re: [9fans] Lex, Yacc, Unicode Plane 1 Topicbox-Message-UUID: ca4142ba-ead5-11e9-9d60-3106f5b1d025 if you're doing this in plan 9, bootstrapping the compiler is a bit of a pain. this could save some hassle: /n/sources/contrib/quanstro/8c-32bitrune these are the patches it turns out were missing /n/sources/patch/cc-32bitrune /n/sources/patch/sed-32bitrune /n/sources/patch/ed-32bitrune /n/sources/patch/libdraw-32bitrune /n/sources/patch/sambufsz there doesn't appear to be a convention for entering 32-bit runes in p9p yet, as in compose + X + hhhh. i propose by silly extension compose + Y + hhhhhh. i've got my system working to the point where i can type compose + Y01d510 with the clarisr font and get a fraktur m on the screen. nothing like a useless demo. one thing i really love about plan 9 is the ability to make big changes like this without having as step 1: boil the oceans. 
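To make the UTFmax = 4 and Runemax = 0x10ffff settings mentioned in this thread concrete, here is a small sketch in portable C of what a 4-byte-capable UTF-8 encoder looks like. It is an illustration only, not the code from the runesize patches: encoderune is a hypothetical name, and the enum merely reuses the constant names from the discussion with locally chosen values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
	UTFmax = 4,		/* longest UTF-8 sequence in bytes */
	Runemax = 0x10FFFF,	/* largest valid code point */
	Runeerror = 0xFFFD	/* substituted for invalid input */
};

/*
 * Encode code point r as UTF-8 into buf, which must hold at least
 * UTFmax bytes.  Returns the number of bytes written.  Values above
 * Runemax and surrogates are replaced by Runeerror.
 */
static int
encoderune(unsigned char *buf, uint32_t r)
{
	assert(buf != NULL);
	if (r > Runemax || (r >= 0xD800 && r <= 0xDFFF))
		r = Runeerror;
	if (r < 0x80) {
		buf[0] = (unsigned char)r;
		return 1;
	}
	if (r < 0x800) {
		buf[0] = (unsigned char)(0xC0 | (r >> 6));
		buf[1] = (unsigned char)(0x80 | (r & 0x3F));
		return 2;
	}
	if (r < 0x10000) {
		buf[0] = (unsigned char)(0xE0 | (r >> 12));
		buf[1] = (unsigned char)(0x80 | ((r >> 6) & 0x3F));
		buf[2] = (unsigned char)(0x80 | (r & 0x3F));
		return 3;
	}
	/* planes 1 to 16 need the fourth byte that 16-bit runes never use */
	buf[0] = (unsigned char)(0xF0 | (r >> 18));
	buf[1] = (unsigned char)(0x80 | ((r >> 12) & 0x3F));
	buf[2] = (unsigned char)(0x80 | ((r >> 6) & 0x3F));
	buf[3] = (unsigned char)(0x80 | (r & 0x3F));
	return 4;
}
```

The code point 01d510 from the compose example comes out as the four bytes f0 9d 94 90, which is why buffers sized for a 3-byte UTFmax have to grow.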
- erik From mboxrd@z Thu Jan 1 00:00:00 1970 MIME-Version: 1.0 In-Reply-To: <2179391c5e79f180197d72711e5fe00a@ladd.quanstro.net> References: <4B61B461020000CC0001D4FA@wlgw07.wlu.ca> <3156d55fd27c66805eb5621e34222bb6@plan9.bell-labs.com> <7359f0491001281619m45734186l181d8380b024d04f@mail.gmail.com> <2179391c5e79f180197d72711e5fe00a@ladd.quanstro.net> Date: Fri, 29 Jan 2010 06:18:06 +0000 Message-ID: From: Justin Jackson To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net> Content-Type: text/plain; charset=ISO-8859-1 Subject: Re: [9fans] Lex, Yacc, Unicode Plane 1 Topicbox-Message-UUID: ca4669de-ead5-11e9-9d60-3106f5b1d025 >one thing i really love about plan 9 is the ability to make >big changes like this without having as step 1: boil the >oceans. That's the quote of the week. Love it. From mboxrd@z Thu Jan 1 00:00:00 1970 Mime-Version: 1.0 (Apple Message framework v753.1) In-Reply-To: References: <4B61B461020000CC0001D4FA@wlgw07.wlu.ca> <3156d55fd27c66805eb5621e34222bb6@plan9.bell-labs.com> <7359f0491001281619m45734186l181d8380b024d04f@mail.gmail.com> <2179391c5e79f180197d72711e5fe00a@ladd.quanstro.net> Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: <613540B9-177C-49EE-BAF9-59379AAA5C34@fastmail.fm> Content-Transfer-Encoding: 7bit From: Ethan Grammatikidis Date: Fri, 29 Jan 2010 14:36:35 +0000 To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net> Subject: Re: [9fans] Lex, Yacc, Unicode Plane 1 Topicbox-Message-UUID: ca52b5d6-ead5-11e9-9d60-3106f5b1d025 On 29 Jan 2010, at 6:18 am, Justin Jackson wrote: >> one thing i really love about plan 9 is the ability to make >> big changes like this without having as step 1: boil the >> oceans. > > That's the quote of the week. Love it. > I keep saying I like namespaces and file interfaces, but really this is _the_ reason I use Plan 9. :D -- http://xkcd.com/676/ Ethan Grammatikidis eekee57@fastmail.fm