* roff_getstr() and input characters
@ 2010-07-06 23:08 Kristaps Dzonsons
From: Kristaps Dzonsons @ 2010-07-06 23:08 UTC
To: tech, Jason McIntyre
[-- Attachment #1: Type: text/plain, Size: 1165 bytes --]
Hi,
(Jason, the bits I'd like you to weigh in on are a few paragraphs down.)
Enclosed is a patch pushing the roff_getstr functionality directly into
libmdoc. It works by testing against roff_getstr() in-band and splicing
together a new buffer if necessary.
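A minimal sketch of the splicing step, for illustration only. The helper name splice() and its arguments are hypothetical; the patch itself builds the buffer with mandoc_malloc() and strlcat() and takes the replacement from roff_getstrn(). Plain malloc() and memcpy() are used here so the sketch compiles anywhere.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): return a newly
 * allocated copy of `line' in which the `esclen' bytes starting at
 * offset `off' are replaced by `res'.  Returns NULL on a bad range
 * or allocation failure.  The caller frees the result.
 */
static char *
splice(const char *line, size_t off, size_t esclen, const char *res)
{
	size_t linelen, reslen, taillen;
	char *cp;

	assert(line != NULL && res != NULL);

	linelen = strlen(line);
	reslen = strlen(res);

	/* The escape must lie entirely inside the line. */
	if (off > linelen || esclen > linelen - off)
		return NULL;

	taillen = linelen - off - esclen;
	cp = malloc(off + reslen + taillen + 1);
	if (cp == NULL)
		return NULL;

	memcpy(cp, line, off);                  /* text before the escape */
	memcpy(cp + off, res, reslen);          /* the `ds'-defined value */
	memcpy(cp + off + reslen,               /* tail, including the NUL */
	    line + off + esclen, taillen + 1);
	return cp;
}
```

For example, splice("See \\*(Px for details.", 4, 5, "Foo") yields "See Foo for details.", where "Foo" stands in for whatever `ds' defined.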
I thought about putting the entire mandoc_special() check in libroff,
but don't want to cause yet another scan over the line buffer.
check_text() needs to warn against '\t' and '\b' anyway. This is an
open question I'll answer later when I start looking at performance.
The reason I want to air it with you (I know it works: I've tested it
across all manuals) is that it also removes the check for isprint(),
using strcspn() instead. As you can see, the reject set covers only
'\b', which we must prohibit or else we break output encoding; '\t' in
non-literal contexts (a warning); and '\\' for the specials check.
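A rough sketch of the strcspn() scan, assuming the same three-byte reject set as the patch. The enum and the function name next_stop() are made up for illustration and do not appear in the patch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical classification of the byte the scan stopped on. */
enum stop { STOP_END, STOP_TAB, STOP_BACKSPACE, STOP_ESCAPE };

/*
 * Skip the longest run of bytes that are not '\t', '\b' or '\\',
 * store the offset of the stopping byte in *off and report what it
 * was.  Bytes above 0x7f (e.g. Latin-1) are skipped like any other
 * text, since they are not in the reject set.
 */
static enum stop
next_stop(const char *p, size_t *off)
{
	assert(p != NULL && off != NULL);

	*off = strcspn(p, "\t\b\\");

	switch (p[*off]) {
	case '\0':
		return STOP_END;
	case '\t':
		return STOP_TAB;
	case '\b':
		return STOP_BACKSPACE;
	default:
		return STOP_ESCAPE;	/* only '\\' remains */
	}
}
```

The caller would warn on STOP_TAB (outside literal context) and STOP_BACKSPACE, and hand the text at STOP_ESCAPE to the specials check.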
I argue for lifting the ASCII-constraint because (1) there's nothing in
mdoc/groff/etc that disallows non-ASCII (e.g., Latin-1) characters and
(2) it makes the code much cleaner.
Thoughts?
Kristaps
PS, the patch doesn't mandate '\b': I just caught that now and will fix
it later.
[-- Attachment #2: patch.txt --]
[-- Type: text/plain, Size: 7005 bytes --]
? DONTDELETE.c
? config.h
? config.log
? foo.1
? foo.1.html
? mandoc
? mandoc.core
? mdoc.7.pdf
? patch.txt
? ssh.1.html
? user.8
? regress/mandoc.core
? regress/output
Index: libmandoc.h
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/libmandoc.h,v
retrieving revision 1.8
diff -u -r1.8 libmandoc.h
--- libmandoc.h 19 Jun 2010 20:46:27 -0000 1.8
+++ libmandoc.h 6 Jul 2010 23:06:02 -0000
@@ -19,7 +19,7 @@
__BEGIN_DECLS
-int mandoc_special(char *);
+int mandoc_special(char *, char **, size_t *);
void *mandoc_calloc(size_t, size_t);
char *mandoc_strdup(const char *);
void *mandoc_malloc(size_t);
Index: man_validate.c
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/man_validate.c,v
retrieving revision 1.45
diff -u -r1.45 man_validate.c
--- man_validate.c 28 Jun 2010 14:39:17 -0000 1.45
+++ man_validate.c 6 Jul 2010 23:06:02 -0000
@@ -204,14 +204,15 @@
static int
check_text(CHKARGS)
{
- char *p;
+ char *p, *spec;
+ size_t specsz;
int pos, c;
assert(n->string);
for (p = n->string, pos = n->pos + 1; *p; p++, pos++) {
if ('\\' == *p) {
- c = mandoc_special(p);
+ c = mandoc_special(p, &spec, &specsz);
if (c) {
p += c - 1;
pos += c - 1;
Index: mandoc.c
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/mandoc.c,v
retrieving revision 1.21
diff -u -r1.21 mandoc.c
--- mandoc.c 6 Jul 2010 22:04:31 -0000 1.21
+++ mandoc.c 6 Jul 2010 23:06:02 -0000
@@ -1,4 +1,4 @@
-/* $Id: mandoc.c,v 1.21 2010/07/06 22:04:31 kristaps Exp $ */
+/* $Id: libmandoc.c,v 1.1 2010/07/05 20:00:55 kristaps Exp $ */
/*
* Copyright (c) 2008, 2009 Kristaps Dzonsons <kristaps@bsd.lv>
*
@@ -52,7 +52,7 @@
int
-mandoc_special(char *p)
+mandoc_special(char *p, char **v, size_t *vsz)
{
int terminator; /* Terminator for \s. */
int lim; /* Limit for N in \s. */
@@ -60,6 +60,8 @@
char *sv;
sv = p;
+ *v = NULL;
+ *vsz = 0;
if ('\\' != *p++)
return(spec_norm(sv, 0));
@@ -181,8 +183,12 @@
case ('*'):
if ('\0' == *++p || isspace((u_char)*p))
return(spec_norm(sv, 0));
+ *v = p + 1;
switch (*p) {
case ('('):
+ *vsz = 2;
+ if ('\0' == *++p || isspace((u_char)*p))
+ return(spec_norm(sv, 0));
if ('\0' == *++p || isspace((u_char)*p))
return(spec_norm(sv, 0));
return(spec_norm(sv, 4));
@@ -190,10 +196,12 @@
for (c = 3, p++; *p && ']' != *p; p++, c++)
if (isspace((u_char)*p))
break;
+ *vsz = (size_t)c - 3;
return(spec_norm(sv, *p == ']' ? c : 0));
default:
break;
}
+ *vsz = 1;
return(spec_norm(sv, 3));
case ('('):
if ('\0' == *++p || isspace((u_char)*p))
Index: mdoc_validate.c
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/mdoc_validate.c,v
retrieving revision 1.109
diff -u -r1.109 mdoc_validate.c
--- mdoc_validate.c 4 Jul 2010 21:59:30 -0000 1.109
+++ mdoc_validate.c 6 Jul 2010 23:06:02 -0000
@@ -47,7 +47,7 @@
static int check_parent(PRE_ARGS, enum mdoct, enum mdoc_type);
static int check_stdarg(PRE_ARGS);
-static int check_text(struct mdoc *, int, int, char *);
+static int check_text(struct mdoc *, int, int, char **);
static int check_argv(struct mdoc *,
struct mdoc_node *, struct mdoc_argv *);
static int check_args(struct mdoc *, struct mdoc_node *);
@@ -275,13 +275,11 @@
{
v_pre *p;
int line, pos;
- char *tp;
if (MDOC_TEXT == n->type) {
- tp = n->string;
line = n->line;
pos = n->pos;
- return(check_text(mdoc, line, pos, tp));
+ return(check_text(mdoc, line, pos, &n->string));
}
if ( ! check_args(mdoc, n))
@@ -439,7 +437,7 @@
int i;
for (i = 0; i < (int)v->sz; i++)
- if ( ! check_text(m, v->line, v->pos, v->value[i]))
+ if ( ! check_text(m, v->line, v->pos, &v->value[i]))
return(0);
if (MDOC_Std == v->arg) {
@@ -454,43 +452,95 @@
static int
-check_text(struct mdoc *mdoc, int line, int pos, char *p)
+check_text(struct mdoc *m, int ln, int pos, char **pp)
{
int c;
+ size_t sz, specsz, cpsz;
+ char *p, *spec, *cp;
+ const char *res;
+
+ for (p = *pp; *p; p++, pos++) {
+ sz = strcspn(p, "\t\b\\");
+
+ p += (int)sz;
+
+ if ('\0' == *p)
+ break;
+
+ pos += (int)sz;
+
+ /*
+ * Filter backspace (not allowed, as it will screw up
+ * our output formatting) and tabs, which are only
+ * suggested in literal contexts. Also halt at escapes
+ * so we can check that they're acceptable.
+ */
+
+ switch (*p) {
+ case ('\t'):
+ if (MDOC_LITERAL & m->flags)
+ continue;
+ /* FALLTHROUGH */
+ case ('\b'):
+ if (mdoc_pmsg(m, ln, pos, MANDOCERR_BADCHAR))
+ continue;
+ return(0);
+ default:
+ break;
+ }
+
+ /* Check the special character. */
- /*
- * FIXME: we absolutely cannot let \b get through or it will
- * destroy some assumptions in terms of format.
- */
-
- for ( ; *p; p++, pos++) {
- if ('\t' == *p) {
- if ( ! (MDOC_LITERAL & mdoc->flags))
- if ( ! mdoc_pmsg(mdoc, line, pos, MANDOCERR_BADCHAR))
- return(0);
- } else if ( ! isprint((u_char)*p) && ASCII_HYPH != *p)
- if ( ! mdoc_pmsg(mdoc, line, pos, MANDOCERR_BADCHAR))
+ c = mandoc_special(p, &spec, &specsz);
+
+ if (0 == c) {
+ c = mdoc_pmsg(m, ln, pos, MANDOCERR_BADESCAPE);
+ if ( ! (MDOC_IGN_ESCAPE & m->pflags) && ! c)
return(0);
+ continue;
+ }
- if ('\\' != *p)
+ if (NULL == spec) {
+ p += c - 1;
+ pos += c - 1;
continue;
+ }
+
+ /* Reserved word. Was it defined using `ds'? */
- c = mandoc_special(p);
- if (c) {
+ if (NULL == (res = roff_getstrn(spec, specsz))) {
+ c = mdoc_pmsg(m, ln, pos, MANDOCERR_BADESCAPE);
+ if ( ! (MDOC_IGN_ESCAPE & m->pflags) && ! c)
+ return(0);
p += c - 1;
pos += c - 1;
continue;
}
- c = mdoc_pmsg(mdoc, line, pos, MANDOCERR_BADESCAPE);
- if ( ! (MDOC_IGN_ESCAPE & mdoc->pflags) && ! c)
- return(c);
+ /* Replace the roff-defined string with our own. */
+
+ cpsz = strlen(res) + strlen(*pp) + 1;
+ cp = mandoc_malloc(cpsz);
+ *cp = '\0';
+
+ /* Force only p - *pp + '\0' chars. */
+ strlcat(cp, *pp, (size_t)(p - *pp + 1));
+ strlcat(cp, res, cpsz);
+ strlcat(cp, p + c + 1, cpsz);
+
+ cpsz = (size_t)(p - *pp);
+
+ free(*pp);
+ *pp = cp;
+
+ /* Remember to readjust our position. */
+
+ p = *pp + (int)cpsz - 1;
+ pos = (int)cpsz - 1;
}
return(1);
}
-
-
static int
Index: term.c
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/term.c,v
retrieving revision 1.159
diff -u -r1.159 term.c
--- term.c 4 Jul 2010 22:04:04 -0000 1.159
+++ term.c 6 Jul 2010 23:06:02 -0000
@@ -379,11 +379,6 @@
size_t sz;
rhs = chars_a2res(p->symtab, word, len, &sz);
- if (NULL == rhs) {
- rhs = roff_getstrn(word, len);
- if (rhs)
- sz = strlen(rhs);
- }
if (rhs)
encode(p, rhs, sz);
}
* Re: roff_getstr() and input characters
From: Kristaps Dzonsons @ 2010-07-07 8:49 UTC
To: Jason McIntyre; +Cc: tech
>> The reason I want to air it with you (I know it works: I've tested it
>> across all manuals) is that it also removes the check for isprint(),
>> using strcspn() instead. As you can see, the reject set covers only
>> '\b', which we must prohibit or else we break output encoding; '\t' in
>> non-literal contexts (a warning); and '\\' for the specials check.
>>
>> I argue for lifting the ASCII-constraint because (1) there's nothing in
>> mdoc/groff/etc that disallows non-ASCII (e.g., Latin-1) characters and
>> (2) it makes the code much cleaner.
>>
>> Thoughts?
>>
>
> i don't really know what you mean, to be honest. you'll have to dumb
> down your question a bit, i'm afraid...
Jason, right now, mandoc spits out a warning for any character that is
not printable ASCII.
This patch lifts this restriction, instead warning only about tabs and
the "backspace" character.
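To make the difference concrete, here is a small sketch of the two warning rules side by side. The function names warns_old() and warns_new() are hypothetical, the `literal' flag stands in for the MDOC_LITERAL test, and the old rule's ASCII_HYPH exception is left out for brevity. It assumes the default "C" locale, where isprint() is false for bytes above 0x7f.

```c
#include <assert.h>
#include <ctype.h>

/*
 * Old rule (simplified): warn on tabs outside literal context and on
 * any byte that isprint() rejects, which in the "C" locale includes
 * every non-ASCII byte.
 */
static int
warns_old(unsigned char c, int literal)
{
	assert(literal == 0 || literal == 1);

	if (c == '\t')
		return !literal;
	return !isprint(c);
}

/*
 * New rule: warn on tabs outside literal context and on backspace;
 * everything else, including non-ASCII bytes, passes silently.
 */
static int
warns_new(unsigned char c, int literal)
{
	assert(literal == 0 || literal == 1);

	if (c == '\t')
		return !literal;
	return c == '\b';
}
```

A Latin-1 byte such as 0xe9 trips warns_old() but not warns_new(); backspace trips both.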
We'd spoken about this before, but seeing it in action, I'm no longer
sure. The killer points are that -Tps will throw away all non-ASCII
characters as it can't calculate their glyph widths, and -Thtml
stipulates UTF-8 encoding, so anything but UTF-8 input will be gobbledygook.
In effect, once one uses a non-ASCII encoding, the rendered output will
vary across output modes and, more importantly, across user environments
(terminals, etc.). This is, in my opinion, a Bad Thing (tm).
Thoughts?
Kristaps