From: Kristaps Dzonsons <kristaps@bsd.lv>
To: tech@mdocml.bsd.lv
Subject: Improve catman mandocdb(8) heuristics.
Date: Wed, 07 Dec 2011 15:56:52 +0100
Message-ID: <4EDF7EB4.7040906@bsd.lv>
[-- Attachment #1: Type: text/plain, Size: 763 bytes --]
Hi,
Enclosed is a patch to de-backspace Nm/Nd lines for mandocdb(8), that
is, to strip the backspace overstrike sequences found in preformatted
(catman) pages. This arose from seeing the results for some LAPACK
manuals, which are notoriously shitty. It also cleans up handling of
the non-terminated string a bit and adds a quick check for whether
SYNOPSIS has been reached right after NAME. This occurs when manuals
look like this:

	NAME
	SYNOPSIS
	Blah blah blah

Again, LAPACK...
If it's relevant, the code could also check for a repeated NAME line
and then loop back into the fgets(). Thoughts?
This area still needs a bit more attention to handle situations like:

	foo -[\n]?
	[whitespace]foo - bar[whitespace][\n]?

I need to check this over more carefully to make sure I'm not
trampling past the end of the array, but this is a start.
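
A minimal sketch of how those two cases might be handled, assuming the line is available as a pointer plus length. The helper name desc_after_dash() and its signature are hypothetical and do not appear in mandocdb.c. It trims surrounding blanks and the newline, guards against memchr() finding no dash at all, and keeps the fall-back to the whole line and the 70-byte cap described in the patch:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of mandocdb.c.
 * Return a pointer to the description text of a "name - description"
 * line and store its length (at most 70 bytes) in *outlen.
 * Falls back to the whole trimmed line if no text follows a dash.
 */
static const char *
desc_after_dash(const char *line, size_t len, size_t *outlen)
{
	const char	*p, *end;

	end = line + len;

	/* Trim the trailing newline and blanks. */
	while (end > line && ('\n' == end[-1] ||
	    ' ' == end[-1] || '\t' == end[-1]))
		end--;

	/* Trim leading blanks. */
	while (line < end && (' ' == *line || '\t' == *line))
		line++;

	/* Find the first dash; memchr() returns NULL if there is none. */
	p = memchr(line, '-', (size_t)(end - line));
	if (NULL != p)
		while (p < end && ('-' == *p || ' ' == *p))
			p++;

	/* No dash, or nothing after it: reuse the whole line. */
	if (NULL == p || p == end)
		p = line;

	*outlen = (size_t)(end - p);
	if (*outlen > 70)
		*outlen = 70;
	return p;
}
```

As in the patch, the first dash on the line is taken as the separator, so a name that itself contains a hyphen would still be split in the wrong place.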
Best,
Kristaps
[-- Attachment #2: patch.txt --]
[-- Type: text/plain, Size: 1954 bytes --]
Index: mandocdb.c
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/mandocdb.c,v
retrieving revision 1.25
diff -u -p -r1.25 mandocdb.c
--- mandocdb.c 7 Dec 2011 01:57:20 -0000 1.25
+++ mandocdb.c 7 Dec 2011 14:55:39 -0000
@@ -1301,31 +1301,69 @@ pformatted(DB *hash, struct buf *buf, st
}
fclose(stream);
+ /*
+ * Strip out backspace-encoding.
+ * Also handle the bogus case where the backspace is malformed
+ * at the beginning or end of the line.
+ */
+
+ while (NULL != (p = memchr(line, '\b', len))) {
+ plen = p - line;
+ if (plen == --len)
+ continue;
+ if (plen > 0) {
+ memmove(p - 1, p + 1, len - plen);
+ len--;
+ } else
+ memmove(p, p + 1, len);
+ }
+
+ /*
+ * Check if there's no name/description information. This
+ * happens with some manuals, e.g., LAPACK. If not, reuse our
+ * title.
+ */
+
+ if (len > 0 && '\n' == line[len - 1]) {
+ line[--len] = '\0';
+ if (0 == strcmp(line, "SYNOPSIS")) {
+ buf_appendb(dbuf, buf->cp, buf->size);
+ hash_put(hash, buf, TYPE_Nd);
+ return;
+ }
+ } else if (0 == len) {
+ buf_appendb(dbuf, buf->cp, buf->size);
+ hash_put(hash, buf, TYPE_Nd);
+ return;
+ }
+
/*
* If there is a dash, skip to the text following it.
*/
- for (p = line, plen = len; plen; p++, plen--)
- if ('-' == *p)
- break;
+ p = memchr(line, '-', len);
+ plen = len - (p - line);
+
for ( ; plen; p++, plen--)
- if ('-' != *p && ' ' != *p && 8 != *p)
+ if ('-' != *p && ' ' != *p)
break;
- if (0 == plen) {
- p = line;
- plen = len;
- }
/*
* Copy the rest of the line, but no more than 70 bytes.
*/
- if (70 < plen)
+ if (0 == plen) {
+ p = line;
+ plen = len;
+ } else if (70 < plen)
plen = 70;
- p[plen-1] = '\0';
+
buf_appendb(dbuf, p, plen);
+ buf_appendb(dbuf, "", 1);
+
buf->len = 0;
buf_appendb(buf, p, plen);
+ buf_appendb(buf, "", 1);
hash_put(hash, buf, TYPE_Nd);
}
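
For comparison, the backspace stripping can also be sketched as a standalone single-pass function. This is not the memmove() loop from the patch but an alternative intended to follow the same rules: a backspace erases the byte before it, while a backspace at the very start or very end of the line is simply dropped. The name strip_backspaces() is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of mandocdb.c.
 * Remove backspace-encoding from line[0..len) in place and
 * return the new length.  The result is not NUL-terminated.
 */
static size_t
strip_backspaces(char *line, size_t len)
{
	size_t	 in, out;

	/* Common case: nothing to strip. */
	if (NULL == memchr(line, '\b', len))
		return len;

	for (in = out = 0; in < len; in++) {
		if ('\b' != line[in]) {
			line[out++] = line[in];
			continue;
		}
		/*
		 * A backspace erases the previously kept byte.
		 * Malformed cases: at the start of the line there is
		 * nothing to erase, and at the end of the line only
		 * the backspace itself is dropped.
		 */
		if (out > 0 && in + 1 < len)
			out--;
	}
	return out;
}
```

For example, the bold-encoded input "N\bNA\bAM\bME\bE" (13 bytes) is reduced to the four bytes "NAME". Because only one read index and one write index are used, there is no pointer arithmetic that can step before or past the buffer.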