source@mandoc.bsd.lv
* mandoc: Support some escape sequences, in particular character escape
@ 2023-10-23 20:25 schwarze
From: schwarze @ 2023-10-23 20:25 UTC (permalink / raw)
  To: source

Log Message:
-----------
Support some escape sequences, in particular character escape sequences,
inside \w arguments, and skip most other escape sequences when measuring
the output length in this way because most escape sequences contribute
little or nothing to text width: for example, consider font escapes in
terminal output.

This implementation is very rudimentary.  In particular, it assumes that
every character has the same width.  No attempt is made to detect 
double-width or zero-width Unicode characters or to take dependencies on
output devices or fonts into account.  These limitations are hard to
avoid because mandoc has to interpolate \w at the parsing stage when the
output device is not yet known.  I really do not want the content of the
syntax tree to depend on the output device.

Feature requested by Paul <Eggert at cs dot ucla dot edu>, who also
submitted a patch, but I chose to commit this very different patch
with almost the same functionality.  His input was still very valuable
because complete support for \w is out of the question, so the main
task is identifying subsets of the feature that are needed for
real-world manual pages and can be supported without uprooting the
whole forest.
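
For readers who want to see the counting rule in isolation, here is a
minimal standalone sketch.  It is not the committed code: the function
name sketch_width() and its hand-rolled escape parsing are hypothetical
and stand in for roff_escape(), which the real patch uses.  The sketch
only recognizes \(xx, \[name], one-letter \fX, \& and \z, counts any
other escape as the single character after the backslash, and assumes,
as described above, that every counted character is 24 basic units wide.

```c
#include <assert.h>
#include <string.h>

#define UNITS_PER_CHAR 24	/* fixed width assumed for every character */

/*
 * Hypothetical helper, not part of mandoc: return the width of s
 * in basic units, handling only a small subset of escape sequences.
 */
static int
sketch_width(const char *s)
{
	const char	*end;	/* closing bracket of \[name] */
	int		 n;	/* number of characters counted */
	int		 skip;	/* set by \z: next character counts as 0 */
	int		 i;

	assert(s != NULL);
	n = 0;
	skip = 0;
	while (*s != '\0') {
		if (*s != '\\') {
			/* Ordinary character. */
			n += skip ? 0 : 1;
			skip = 0;
			s++;
			continue;
		}
		s++;	/* step over the backslash */
		switch (*s) {
		case '\0':
			/* Trailing backslash: nothing to measure. */
			break;
		case '(':
			/* \(xx: two-letter special character. */
			s++;
			for (i = 0; i < 2 && *s != '\0'; i++)
				s++;
			n += skip ? 0 : 1;
			skip = 0;
			break;
		case '[':
			/* \[name]: one character, whatever the name. */
			end = strchr(s, ']');
			s = end == NULL ? s + strlen(s) : end + 1;
			n += skip ? 0 : 1;
			skip = 0;
			break;
		case 'f':
			/* \fX: font change, zero width (one-letter form only). */
			s++;
			if (*s != '\0')
				s++;
			break;
		case '&':
			/* \&: zero-width space. */
			s++;
			break;
		case 'z':
			/* \z: the following character gets zero width. */
			skip = 1;
			s++;
			break;
		default:
			/* Anything else: count the escape name literally. */
			n += skip ? 0 : 1;
			skip = 0;
			s++;
			break;
		}
	}
	return n * UNITS_PER_CHAR;
}
```

Under these assumptions, sketch_width("a\\zb\\z\\(buc") returns 48 and
sketch_width("\\fB\\&\\fP") returns 0, which agrees with the skipchar
and zero-width lines in the w.out_ascii hunk of this commit.  Numbered
(\N) and overstrike (\o) escapes are deliberately left out of the
sketch.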

Modified Files:
--------------
    mandoc:
        roff.7
        roff.c
    mandoc/regress/roff/esc:
        w.in
        w.out_ascii
        w.out_lint

Revision Data
-------------
Index: roff.7
===================================================================
RCS file: /home/cvs/mandoc/mandoc/roff.7,v
retrieving revision 1.120
retrieving revision 1.121
diff -Lroff.7 -Lroff.7 -u -p -r1.120 -r1.121
--- roff.7
+++ roff.7
@@ -1,6 +1,6 @@
 .\" $Id$
 .\"
-.\" Copyright (c) 2010-2019, 2022 Ingo Schwarze <schwarze@openbsd.org>
+.\" Copyright (c) 2010-2019, 2022-2023 Ingo Schwarze <schwarze@openbsd.org>
 .\" Copyright (c) 2010, 2011, 2012 Kristaps Dzonsons <kristaps@bsd.lv>
 .\"
 .\" Permission to use, copy, modify, and distribute this software for any
@@ -2224,7 +2224,8 @@ The
 .Xr mandoc 1
 implementation assumes that after expansion of user-defined strings, the
 .Ar string
-only contains normal characters, no escape sequences, and that each
+only contains normal characters, characters expressed as escape sequences,
+and zero-width escape sequences, and that each
 character has a width of 24 basic units.
 .It Ic \eX\(aq Ns Ar string Ns Ic \(aq
 Output
Index: roff.c
===================================================================
RCS file: /home/cvs/mandoc/mandoc/roff.c,v
retrieving revision 1.398
retrieving revision 1.399
diff -Lroff.c -Lroff.c -u -p -r1.398 -r1.399
--- roff.c
+++ roff.c
@@ -1,6 +1,6 @@
 /* $Id$ */
 /*
- * Copyright (c) 2010-2015, 2017-2022 Ingo Schwarze <schwarze@openbsd.org>
+ * Copyright (c) 2010-2015, 2017-2023 Ingo Schwarze <schwarze@openbsd.org>
  * Copyright (c) 2008-2012, 2014 Kristaps Dzonsons <kristaps@bsd.lv>
  *
  * Permission to use, copy, modify, and distribute this software for any
@@ -1362,6 +1362,7 @@ roff_expand(struct roff *r, struct buf *
 	const char	*res;		/* the string to be pasted */
 	const char	*src;		/* source for copying */
 	char		*dst;		/* destination for copying */
+	enum mandoc_esc	 subtype;	/* return value from roff_escape */
 	int		 iesc;		/* index of leading escape char */
 	int		 inam;		/* index of the escape name */
 	int		 iarg;		/* index beginning the argument */
@@ -1551,8 +1552,34 @@ roff_expand(struct roff *r, struct buf *
 			res = ubuf;
 			break;
 		case 'w':
-			(void)snprintf(ubuf, sizeof(ubuf),
-			    "%d", (iendarg - iarg) * 24);
+			rsz = 0;
+			subtype = ESCAPE_UNDEF;
+			while (iarg < iendarg) {
+				asz = subtype == ESCAPE_SKIPCHAR ? 0 : 1;
+				if (buf->buf[iarg] != '\\') {
+					rsz += asz;
+					iarg++;
+					continue;
+				}
+				switch ((subtype = roff_escape(buf->buf, 0,
+				    iarg, NULL, NULL, NULL, NULL, &iarg))) {
+				case ESCAPE_SPECIAL:
+				case ESCAPE_NUMBERED:
+				case ESCAPE_UNICODE:
+				case ESCAPE_OVERSTRIKE:
+				case ESCAPE_UNDEF:
+					break;
+				case ESCAPE_DEVICE:
+					asz *= 8;
+					break;
+				case ESCAPE_EXPAND:
+					abort();
+				default:
+					continue;
+				}
+				rsz += asz;
+			}
+			(void)snprintf(ubuf, sizeof(ubuf), "%d", rsz * 24);
 			res = ubuf;
 			break;
 		default:
Index: w.out_ascii
===================================================================
RCS file: /home/cvs/mandoc/mandoc/regress/roff/esc/w.out_ascii,v
retrieving revision 1.3
retrieving revision 1.4
diff -Lregress/roff/esc/w.out_ascii -Lregress/roff/esc/w.out_ascii -u -p -r1.3 -r1.4
--- regress/roff/esc/w.out_ascii
+++ regress/roff/esc/w.out_ascii
@@ -8,6 +8,13 @@ D\bDE\bES\bSC\bCR\bRI\bIP\bPT\bTI\bIO\bON\bN
      character: 24
      blank: 24
      text: 96
+     special: 24
+     numbered: 24
+     Unicode: 24
+     overstrike: 24
+     undefined: 24
+     zero-width: 0
+     skipchar: 48
 
    A\bAr\brg\bgu\bum\bme\ben\bnt\bt d\bde\bel\bli\bim\bmi\bit\bte\ber\brs\bs
      unsupported \r: 24u
@@ -27,4 +34,4 @@ D\bDE\bES\bSC\bCR\bRI\bIP\bPT\bTI\bIO\bON\bN
      overstrike: 24u
      unterminated: 72
 
-OpenBSD                          June 8, 2022                          OpenBSD
+OpenBSD                        October 23, 2023                        OpenBSD
Index: w.out_lint
===================================================================
RCS file: /home/cvs/mandoc/mandoc/regress/roff/esc/w.out_lint,v
retrieving revision 1.7
retrieving revision 1.8
diff -Lregress/roff/esc/w.out_lint -Lregress/roff/esc/w.out_lint -u -p -r1.7 -r1.8
--- regress/roff/esc/w.out_lint
+++ regress/roff/esc/w.out_lint
@@ -1,4 +1,5 @@
-mandoc: w.in:17:20: UNSUPP: unsupported escape sequence: \r
-mandoc: w.in:17:23: UNSUPP: unsupported escape sequence: \r
-mandoc: w.in:23:16: WARNING: undefined escape, printing literally: \G
-mandoc: w.in:51:15: ERROR: incomplete escape sequence: \w'foo
+mandoc: w.in:25:15: WARNING: undefined escape, printing literally: \G
+mandoc: w.in:31:20: UNSUPP: unsupported escape sequence: \r
+mandoc: w.in:31:23: UNSUPP: unsupported escape sequence: \r
+mandoc: w.in:37:16: WARNING: undefined escape, printing literally: \G
+mandoc: w.in:65:15: ERROR: incomplete escape sequence: \w'foo
Index: w.in
===================================================================
RCS file: /home/cvs/mandoc/mandoc/regress/roff/esc/w.in,v
retrieving revision 1.3
retrieving revision 1.4
diff -Lregress/roff/esc/w.in -Lregress/roff/esc/w.in -u -p -r1.3 -r1.4
--- regress/roff/esc/w.in
+++ regress/roff/esc/w.in
@@ -1,4 +1,4 @@
-.\" $OpenBSD: w.in,v 1.4 2022/06/08 13:08:00 schwarze Exp $
+.\" $OpenBSD: w.in,v 1.5 2023/10/23 20:07:19 schwarze Exp $
 .Dd $Mdocdate$
 .Dt ESC-W 1
 .Os
@@ -13,6 +13,20 @@ character: \w'n'
 blank: \w' '
 .br
 text: \w'text'
+.br
+special: \w'\(bu'
+.br
+numbered: \w'\N'100''
+.br
+Unicode: \w'\[u2013]'
+.br
+overstrike: \w'\o'ab''
+.br
+undefined: \w'\G'
+.br
+zero-width: \w'\fB\&\fP'
+.br
+skipchar: \w'a\zb\z\(buc'
 .Ss Argument delimiters
 unsupported \er: \w\rM\ru
 .br

