tech@mandoc.bsd.lv
From: Ingo Schwarze <schwarze@usta.de>
To: tech@mdocml.bsd.lv
Subject: Re: Need hash: uthash?
Date: Sun, 31 Jul 2011 12:42:23 +0200
Message-ID: <20110731104223.GB1831@iris.usta.de>
In-Reply-To: <4E313AC0.2010106@bsd.lv>

Hi Kristaps,

Kristaps Dzonsons wrote on Thu, Jul 28, 2011 at 12:32:32PM +0200:

> mandoc is getting a `tr' implementation*, needed primarily for
> perlpod.

It's astounding how fast this happened.
I can't even keep up with reading the code right now,
but i'm sure it's good to have it.

> This is expensive as it involves iterating over each character in
> each text string, then each element in an array of `tr' characters
[...]

Good that Joerg's suggestions mitigated this problem.

> For the time being, this is implemented with the same linear search
> of the `ds' and `de' macro keys.  This means O(mn) performance over
> the number of words/characters (!) and keys.  In practise, when
> profiling the code, libroff spends a lot of time running through
> these lists. -mdoc gets away without even touching this code, but
> -man (especially perlpod) is smacked hard.

Is that still true with Joerg's suggestion?
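
For illustration, the O(mn) cost described in the quoted paragraph can
be sketched as below.  This is not mandoc's code and does not show
Joerg's suggestion: struct trpair and tr_linear() are hypothetical
names, and the function returns a comparison count only to make the
cost visible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical key/value pair for a `tr'-style character mapping. */
struct trpair {
	char from;
	char to;
};

/*
 * Translate buf in place with a linear scan over the key list.
 * Returns the number of key comparisons performed: up to nkeys
 * comparisons for each of the characters in buf, hence O(mn).
 */
size_t
tr_linear(char *buf, const struct trpair *keys, size_t nkeys)
{
	size_t cmp = 0;
	size_t i;

	assert(buf != NULL);
	for (; *buf != '\0'; buf++) {
		for (i = 0; i < nkeys; i++) {
			cmp++;
			if (keys[i].from == *buf) {
				*buf = keys[i].to;
				break;
			}
		}
	}
	return cmp;
}
```

A character that matches no key pays for the whole list, so text that
mostly consists of untranslated characters is the worst case.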

> In short, the `ds', `de', and now `tr' macros can really benefit
> from a hash table.  I'm settling on using uthash** due to its
> license---I can directly include it in the code.

All the required tools for developing mandoc ought to be in
OpenBSD base.  So, *if* we settle on uthash, we should first
discuss that with the other developers, and even if people
agree, we will likely have to maintain it in tree ourselves.

The first thing we will be asked is:  Why not use the existing
tools?  If i understand correctly, uthash is a dynamic hasher
(as opposed to a perfect static hasher).  What about hash(3) in
/usr/src/lib/libc/db/hash/ ?  Is that usable?
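
To make "dynamic hasher" concrete, here is a minimal sketch of a
chained hash table for `ds'-style string keys in plain C.  It is
neither uthash nor hash(3), and not mandoc's code: struct dstab,
ds_set(), ds_get(), the bucket count and the hash function are all
assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKET 64	/* arbitrary size, for illustration only */

struct dsent {
	struct dsent	*next;
	char		*key;
	char		*val;
};

struct dstab {
	struct dsent	*bucket[NBUCKET];	/* must start zeroed */
};

/* Simple multiplicative string hash, reduced to a bucket index. */
static unsigned int
ds_hash(const char *s)
{
	unsigned int h = 5381;

	while (*s != '\0')
		h = h * 33 + (unsigned char)*s++;
	return h % NBUCKET;
}

/* Return a malloc'ed copy of s, or NULL on allocation failure. */
static char *
ds_dup(const char *s)
{
	size_t len = strlen(s) + 1;
	char *p = malloc(len);

	if (p != NULL)
		memcpy(p, s, len);
	return p;
}

/* Look up key; return its value, or NULL if it is not defined. */
const char *
ds_get(const struct dstab *t, const char *key)
{
	const struct dsent *e;

	assert(t != NULL && key != NULL);
	for (e = t->bucket[ds_hash(key)]; e != NULL; e = e->next)
		if (strcmp(e->key, key) == 0)
			return e->val;
	return NULL;
}

/* Define or redefine key; return 0 on success, -1 if out of memory. */
int
ds_set(struct dstab *t, const char *key, const char *val)
{
	struct dsent *e;
	char *nv;
	unsigned int h;

	assert(t != NULL && key != NULL && val != NULL);
	h = ds_hash(key);
	if ((nv = ds_dup(val)) == NULL)
		return -1;
	for (e = t->bucket[h]; e != NULL; e = e->next) {
		if (strcmp(e->key, key) == 0) {
			free(e->val);	/* redefinition replaces the value */
			e->val = nv;
			return 0;
		}
	}
	if ((e = malloc(sizeof(*e))) == NULL) {
		free(nv);
		return -1;
	}
	if ((e->key = ds_dup(key)) == NULL) {
		free(e);
		free(nv);
		return -1;
	}
	e->val = nv;
	e->next = t->bucket[h];
	t->bucket[h] = e;
	return 0;
}
```

A lookup then touches only the entries sharing one bucket instead of
the whole key list; whether that is worth the extra code is exactly
the question that profiling has to answer.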

But even if we use tools from base, i think in the current
phase of development, we should only do optimizations that
require additional libraries if they really yield HUGE
performance gains in practice.  All of the places where this
might be used are still covered in dust that is hardly starting
to settle down, so simplicity and flexibility are still top
priorities, and will stay so for some time.

Yours,
  Ingo
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv


Thread overview: 8+ messages
2011-07-28 10:32 Kristaps Dzonsons
2011-07-28 15:04 ` Joerg Sonnenberger
2011-07-28 15:12   ` Kristaps Dzonsons
2011-07-28 15:18     ` Joerg Sonnenberger
2011-07-28 19:14       ` Kristaps Dzonsons
2011-07-31 10:42 ` Ingo Schwarze [this message]
2011-07-31 11:29   ` Joerg Sonnenberger
2011-07-31 12:00   ` Kristaps Dzonsons
