Re: Need hash: uthash? - Kristaps Dzonsons

tech@mandoc.bsd.lv
 help / color / mirror / Atom feed

From: Kristaps Dzonsons <kristaps@bsd.lv>
To: tech@mdocml.bsd.lv
Cc: Ingo Schwarze <schwarze@usta.de>
Subject: Re: Need hash: uthash?
Date: Sun, 31 Jul 2011 14:00:45 +0200	[thread overview]
Message-ID: <4E3543ED.5030103@bsd.lv> (raw)
In-Reply-To: <20110731104223.GB1831@iris.usta.de>

>> For the time being, this is implemented with the same linear search
>> of the `ds' and `de' macro keys.  This means O(mn) performance over
>> the number of words/characters (!) and keys.  In practise, when
>> profiling the code, libroff spends a lot of time running through
>> these lists. -mdoc gets away without even touching this code, but
>> -man (especially perlpod) is smacked hard.
>
> Is that still true with Joerg's suggestion?

Ingo,

No---well, not really.  As you can see, the code switches between using 
a byte-table for lookups (regular characters) and a linear search 
(escape sequences).  Cf. roff_strdup().  Again, most mdoc manuals never 
touch this code.

The topic of hashes is important and could make a big difference, but I 
don't yet have the benchmarks to substantiate this, so I'm not pushing 
it too hard.  I'm mainly bothered by the many diverse tables and methods 
that are floating around in the codebase.

The situation is as follows.

Perfect [minimal?] hashing.

   Usage: escape table, mdoc macro table, man macro table, roff macro table.

   Pros: removes a big chunk of code/space (depending on the hash 
implementation of course) and initialisation overhead (static), some 
speed-ups in lookups.

   Cons: machine-generated code, needs Yet Another Tool.  NetBSD has one 
(nmbperf), GNU has one (gperf).

   Alternatives: use a dynamic hash and eat the overhead.  Since I 
already manually "hash" these, this is an acceptable start.

Dynamic hashing.

   Usage: `de' and `ds' symbol table, `tr' symbol table.

   Pros: cleans up interfaces, significantly speeds up some areas (the 
linear lookup).  Will speed up -man documents by quite a bit.

   Cons: needs a hash mechanism.  Some space overhead for the table vs. 
the linked list.

   Alternatives: continue being slow.

What I'm going to do is implement most of these and run benchmarks to 
see what we actually get.  I'd done this before for a small part but 
have forgotten the results, so it's time to do it again.  I'm interested 
in both the single-manual improvement and the many-manual improvement.

Thanks,

Kristaps
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

     prev parent reply	other threads:[~2011-07-31 12:00 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-07-28 10:32 Kristaps Dzonsons
2011-07-28 15:04 ` Joerg Sonnenberger
2011-07-28 15:12   ` Kristaps Dzonsons
2011-07-28 15:18     ` Joerg Sonnenberger
2011-07-28 19:14       ` Kristaps Dzonsons
2011-07-31 10:42 ` Ingo Schwarze
2011-07-31 11:29   ` Joerg Sonnenberger
2011-07-31 12:00   ` Kristaps Dzonsons [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4E3543ED.5030103@bsd.lv \
    --to=kristaps@bsd.lv \
    --cc=schwarze@usta.de \
    --cc=tech@mdocml.bsd.lv \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).