tech@mandoc.bsd.lv
From: Kristaps Dzonsons <kristaps@bsd.lv>
To: tech@mdocml.bsd.lv
Subject: Improve catman mandocdb(8) heuristics.
Date: Wed, 07 Dec 2011 15:56:52 +0100
Message-ID: <4EDF7EB4.7040906@bsd.lv>

[-- Attachment #1: Type: text/plain, Size: 763 bytes --]

Hi,

Enclosed is a patch to de-backspace Nm/Nd lines for mandocdb(8), that is,
to strip backspace overstrike encoding from the name and description
line of preformatted (catman) pages.  This arose from seeing the results
for some LAPACK manuals, which are notoriously badly formatted.  It also
cleans up handling of the non-terminated string a bit and adds a quick
check to see if SYNOPSIS has been reached right after the NAME.  This
occurs when manuals look like this:

  NAME
  SYNOPSIS
    Blah blah blah

Again, LAPACK...
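
As a rough sketch of that check outside of mandocdb.c: the helper below
reports whether the line read after NAME carries no name/description
text at all.  The name name_line_missing() is made up for illustration,
and it is slightly broader than the patch in that a line consisting of
only a newline also counts as missing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if the line following NAME carries no name/description
 * text (it is empty or is exactly the SYNOPSIS header), 0 otherwise.
 * The line need not be NUL-terminated; one trailing newline is ignored.
 * name_line_missing() is a hypothetical helper, not part of mandocdb.c.
 */
static int
name_line_missing(const char *line, size_t len)
{
	static const char syn[] = "SYNOPSIS";

	assert(line != NULL || len == 0);

	/* Ignore a single trailing newline, if present. */
	if (len > 0 && line[len - 1] == '\n')
		len--;
	if (len == 0)
		return 1;

	/* Compare by length first: the input is not NUL-terminated. */
	return len == sizeof(syn) - 1 && memcmp(line, syn, len) == 0;
}
```

When this returns 1, the caller would fall back to the manual's title
for the description, as the patch does.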

If it's relevant, a check for NAME could also occur, then loop back into 
the fgets().  Thoughts?

This area still needs a bit more attention to handle situations like:

  foo -[\n]?
  [whitespace]foo - bar[whitespace][\n]?

I still need to check this over more carefully to make sure I'm not
trampling past the end of the array, but this is a start.
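
One possible way to cover those two cases, as a sketch only: trim
blanks and the newline at both ends before looking for the dash, and
report an empty description instead of reading past the end.  The
helper name find_desc() is an assumption for illustration and is not in
the patch.  Like the patch, it takes the first dash on the line, so
hyphenated names would still be mis-split.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Locate the description in a "name - description" line of len bytes
 * (not necessarily NUL-terminated).  Leading and trailing blanks and a
 * trailing newline are trimmed.  On return *outlen is the description
 * length; it is 0 when nothing follows the dash.  If there is no dash
 * at all, the whole trimmed line is returned.
 * find_desc() is a hypothetical helper, not part of mandocdb.c.
 */
static const char *
find_desc(const char *line, size_t len, size_t *outlen)
{
	const char *p, *end;

	assert(line != NULL && outlen != NULL);
	end = line + len;

	/* Trim trailing newline, spaces and tabs. */
	while (end > line &&
	    (end[-1] == '\n' || end[-1] == ' ' || end[-1] == '\t'))
		end--;
	/* Trim leading spaces and tabs. */
	while (line < end && (*line == ' ' || *line == '\t'))
		line++;

	p = memchr(line, '-', (size_t)(end - line));
	if (p == NULL)
		p = line;	/* no dash: use the whole trimmed line */
	else
		while (p < end && (*p == '-' || *p == ' ' || *p == '\t'))
			p++;	/* skip the dash(es) and blanks after them */

	*outlen = (size_t)(end - p);
	return p;
}
```

A zero *outlen tells the caller that the line was of the "foo -" form,
so it can fall back to the whole line or to the title.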

Best,

Kristaps

[-- Attachment #2: patch.txt --]
[-- Type: text/plain, Size: 1954 bytes --]

Index: mandocdb.c
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/mandocdb.c,v
retrieving revision 1.25
diff -u -p -r1.25 mandocdb.c
--- mandocdb.c	7 Dec 2011 01:57:20 -0000	1.25
+++ mandocdb.c	7 Dec 2011 14:55:39 -0000
@@ -1301,31 +1301,69 @@ pformatted(DB *hash, struct buf *buf, st
 	}
 	fclose(stream);
 
+	/* 
+	 * Strip out backspace-encoding.
+	 * Also handle the bogus case where the backspace is malformed
+	 * at the beginning or end of the line.
+	 */
+
+	while (NULL != (p = memchr(line, '\b', len))) {
+		plen = p - line;
+		if (plen == --len)
+			continue;
+		if (plen > 0) {
+			memmove(p - 1, p + 1, len - plen);
+			len--;
+		} else
+			memmove(p, p + 1, len);
+	}
+
+	/*
+	 * Check if there's no name/description information.  This
+	 * happens with some manuals, e.g., LAPACK.  If not, reuse our
+	 * title.
+	 */
+
+	if (len > 0 && '\n' == line[len - 1]) {
+		line[--len] = '\0';
+		if (0 == strcmp(line, "SYNOPSIS")) {
+			buf_appendb(dbuf, buf->cp, buf->size);
+			hash_put(hash, buf, TYPE_Nd);
+			return;
+		}
+	} else if (0 == len) {
+		buf_appendb(dbuf, buf->cp, buf->size);
+		hash_put(hash, buf, TYPE_Nd);
+		return;
+	}
+
 	/*
 	 * If there is a dash, skip to the text following it.
 	 */
 
-	for (p = line, plen = len; plen; p++, plen--)
-		if ('-' == *p)
-			break;
+	if (NULL == (p = memchr(line, '-', len)))
+		p = line + len;	/* no dash: plen becomes 0 */
+	plen = len - (p - line);
 	for ( ; plen; p++, plen--)
-		if ('-' != *p && ' ' != *p && 8 != *p)
+		if ('-' != *p && ' ' != *p)
 			break;
-	if (0 == plen) {
-		p = line;
-		plen = len;
-	}
 
 	/*
 	 * Copy the rest of the line, but no more than 70 bytes.
 	 */
 
-	if (70 < plen)
+	if (0 == plen) {
+		p = line;
+		plen = len;
+	} else if (70 < plen)
 		plen = 70;
-	p[plen-1] = '\0';
+
 	buf_appendb(dbuf, p, plen);
+	buf_appendb(dbuf, "", 1);
+
 	buf->len = 0;
 	buf_appendb(buf, p, plen);
+	buf_appendb(buf, "", 1);
 	hash_put(hash, buf, TYPE_Nd);
 }
 

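For experimenting with the backspace stripping outside of mandocdb.c,
here is a minimal single-pass sketch.  It is not the code from the
patch above: strip_backspace() is a hypothetical name, it copies bytes
forward once instead of calling memmove() per backspace, and a
backspace at the end of the buffer also removes the byte before it,
whereas the patch drops only the backspace in that case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Remove backspace overstrike sequences ("X\bX", "_\bX") in place.
 * Each backspace deletes itself and the byte kept before it; a
 * backspace with nothing before it deletes only itself.
 * Returns the new length; the buffer is not NUL-terminated here.
 * strip_backspace() is a hypothetical helper, not part of mandocdb.c.
 */
static size_t
strip_backspace(char *buf, size_t len)
{
	size_t in, out;

	assert(buf != NULL || len == 0);

	for (in = out = 0; in < len; in++) {
		if (buf[in] != '\b')
			buf[out++] = buf[in];	/* keep ordinary bytes */
		else if (out > 0)
			out--;			/* drop the overstruck byte */
	}
	return out;
}
```

Since the write index never passes the read index, the loop cannot
run past the end of the buffer.
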
Thread overview: 2+ messages
2011-12-07 14:56 Kristaps Dzonsons [this message]
2011-12-08  1:10 ` Ingo Schwarze
