9fans - fans of the OS Plan 9 from Bell Labs
From: erik quanstrom <quanstro@quanstro.net>
To: 9fans@9fans.net
Subject: [9fans] hard-coded UTF-8 in wc.c
Date: Tue, 28 Dec 2010 23:40:43 -0500
Message-ID: <a851adfdb809ddbea3aa2c161c75ee60@plug.quanstro.net>

this just popped up when i was searching the archive.

On Mon 15 Mar 2010 18:44:41 EST 2010, quanstro@quanstro.net wrote:
> On Mon Mar 15 17:46:11 EDT 2010, aim0shei@lav... wrote:
> > Yes, but why wc utility counts runes (wc(1) call them runes) manually
> > using huge table instead of using functions from rune(3) such as utflen?
>
> i didn't write wc, but i would imagine that it's for speed.

i took some time a few weeks ago to extend wc to handle runes
up to 0x10ffff, which meant adding 3 states for 4-byte runes and
adding an additional table.  with that perspective ...

wc is a big state machine.  using the rune functions would hide
a good deal of the state machine, which would make the states
harder to understand and some of this work would need to be redone.
the tables are actually really easy to understand and generate.
wikipedia's utf-8 article has a discussion of the bit patterns, which can help.
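
for anyone following along, here is a minimal sketch of the idea.
it is not the code in wc.c.  it builds a 256-entry byte-class table
from the utf-8 bit patterns and a small transition table with three
extra states for the continuation bytes a multi-byte rune still owes.
the names (bclass, nextstate, buildtabs, countrunes) are made up for
illustration.  the sketch only counts runes, and it assumes loose
handling of malformed input: it does not reject overlong forms or
values above 0x10ffff.

```c
#include <assert.h>
#include <stddef.h>

/* states: how many continuation bytes the current rune still owes */
enum { Start, Need1, Need2, Need3, Nstate };

/* byte classes, taken straight from the utf-8 bit patterns */
enum { Ascii, Cont, Lead2, Lead3, Lead4, Bad, Nclass };

static unsigned char bclass[256];               /* hypothetical name */
static unsigned char nextstate[Nstate][Nclass]; /* hypothetical name */

/* generate both tables; nothing here is typed in by hand */
static void
buildtabs(void)
{
	int b, s;

	for(b = 0; b < 256; b++){
		if((b & 0x80) == 0x00)
			bclass[b] = Ascii;	/* 0xxxxxxx */
		else if((b & 0xC0) == 0x80)
			bclass[b] = Cont;	/* 10xxxxxx */
		else if((b & 0xE0) == 0xC0)
			bclass[b] = Lead2;	/* 110xxxxx */
		else if((b & 0xF0) == 0xE0)
			bclass[b] = Lead3;	/* 1110xxxx */
		else if((b & 0xF8) == 0xF0)
			bclass[b] = Lead4;	/* 11110xxx */
		else
			bclass[b] = Bad;
	}
	for(s = 0; s < Nstate; s++){
		nextstate[s][Ascii] = Start;
		nextstate[s][Cont] = s > Start ? s - 1 : Start;
		nextstate[s][Lead2] = Need1;
		nextstate[s][Lead3] = Need2;
		nextstate[s][Lead4] = Need3;
		nextstate[s][Bad] = Start;
	}
}

/*
 * count runes in n bytes at p.  a byte starts a new rune unless it
 * is a continuation byte that the current state was waiting for.
 * assumption of this sketch: a stray or bad byte counts as one rune.
 */
long
countrunes(const unsigned char *p, size_t n)
{
	static int built;
	int state, c;
	long nrune;
	size_t i;

	assert(p != NULL || n == 0);
	if(!built){
		buildtabs();
		built = 1;
	}
	state = Start;
	nrune = 0;
	for(i = 0; i < n; i++){
		c = bclass[p[i]];
		if(state == Start || c != Cont)
			nrune++;
		state = nextstate[state][c];
	}
	return nrune;
}
```

the point of the sketch is that the whole decoder is one table lookup
and one state transition per byte, so states for words and lines could
sit in the same machine without calling out to a rune function.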

- erik



