* [TUHS] Trying to date "A Supplemental Document For Awk" @ 2023-06-28 6:26 Aharon Robbins 2023-06-28 6:45 ` [TUHS] " arnold ` (2 more replies) 0 siblings, 3 replies; 20+ messages in thread From: Aharon Robbins @ 2023-06-28 6:26 UTC (permalink / raw) To: tuhs [-- Attachment #1: Type: text/plain, Size: 558 bytes --] Hi All. Attached is "A Supplemental Document For Awk". This circulated on USENET in the 80s. My copy is dated January 18, 1989, but I'm sure it's older than that. One clue is the reference to the 4.2 BSD manual, and 4.3 came out already in 1986 or so. Does anyone else have a copy of this with perhaps an older date? As far as I can tell from a short search, the author is no longer living. If someone knows better and can provide contact info for him, that'd be great. In the meantime, Warren, do you want to add it to the archives? Thanks! Arnold [-- Attachment #2: awkdoc --] [-- Type: text/plain, Size: 19193 bytes --] .RP .TL .B A Supplemental Document For AWK .sp .R - or - .sp .I Things Al, Pete, And Brian Didn't Mention Much .R .AU John W. Pierce .AI Department of Chemistry University of California, San Diego La Jolla, California 92093 jwp%chem@sdcsvax.ucsd.edu .AB As .B awk and its documentation are distributed with .I 4.2 BSD UNIX* .R there are a number of bugs, undocumented features, and features that are touched on so briefly in the documentation that the casual user may not realize their full significance. While this document applies primarily to the \fI4.2 BSD\fR version of \fIUNIX\fR, it is known that the \fI4.3 BSD\fR version does not have all of the bugs fixed, and that it does not have updated documentation. The situation with respect to the versions of \fBawk\fR distributed with other versions of \fIUNIX\fR and similar systems is unknown to the author. 
.FS *UNIX is a trademark of AT&T .FE .AE .LP In this document references to "the user manual" mean .I Awk - A Pattern Scanning and Processing Language (Second Edition) .R by Aho, Kernighan, and Weinberger. References to "awk(1)" mean the entry for .B awk in the .I UNIX Programmer's Manual, 4th Berkeley Distribution. .R References to "the documentation" mean both of those. .LP In most examples, the outermost set of braces ('{ }') has been omitted. They would, of course, be necessary in real scripts. .NH Known Bugs .LP There are three main bugs known to me. They involve: .IP Assignment to input fields. .IP Piping output to a program from within an \fBawk\fR script. .IP Using '*' in \fIprintf\fR field width and precision specifications. .NH 2 Assignment to Input Fields .LP [This problem is partially fixed in \fI4.3BSD\fR; see the last paragraph of this section regarding the unfixed portion.] .LP The user manual states that input fields may be objects of assignment statements. Given the input line .DS field_one field_two field_three .DE the script .DS $2 = "new_field_2" print $0 .DE should print .DS field_one new_field_2 field_three .DE .LP This does not work; it will print .DS field_one field_two field_three .DE That is, the script will behave as if the assignment to $2 had not been made. However, explicitly referencing an "assigned to" field .I does recognize that the assignment has been made. If the script .DS $2 = "new_field_2" print $1, $2, $3 .DE is given the same input it will [properly] print .DS field_one new_field_2 field_three .DE Therefore, you can get around this bug with, e.g., .DS $2 = "new_field_2" output = $1 # Concatenate output fields for(i = 2; i <= NF; ++i) # into a single output line output = output OFS $i # with OFS between fields print output .DE .LP In \fI4.3BSD\fR, this bug has been fixed to the extent that the failing example above works correctly. 
However, a script like .DS $2 = "new_field_2" var = $0 print var .DE still gives incorrect output. This problem can be bypassed by using .DS \fIvar\fR = sprintf("%s", $0) .DE instead of "\fIvar\fR = $0"; \fIvar\fR will have the correct value. .NH 2 Piping Output to a Program .LP [This problem appears to have been fixed in \fI4.3BSD\fR, but that has not been exhaustively tested.] .LP The user manual states that .I print and .I printf statements may write to a program using, e.g., .DS print | "\fIcommand\fR" .DE This would pipe the output into \fIcommand\fR, and it does work. However, you should be aware that this causes .B awk to spawn a child process (\fIcommand\fR), and that it .I does not .R wait for the child to exit before it exits itself. In the case of a "slow" command like .B sort, .B awk may exit before .I command has finished. .LP This can cause problems in, for example, a shell script that depends on everything done by .B awk being finished before the next shell command is executed. Consider the shell script .DS awk -f awk_script input_file mv sorted_output somewhere_else .DE and the .B awk script .DS print output_line | "sort -o sorted_output" .DE If .I input_file is large .B awk will exit long before .B sort is finished. That means that the .B mv command will be executed before .B sort is finished, and the result is unlikely to be what you wanted. Other than fixing the source, there is no way to avoid this problem except to handle such pipes outside of the awk script, e.g. .DS awk -f awk_file input_file | sort -o sorted_output mv sorted_output somewhere_else .DE which is not wholly satisfactory. .LP See .I Sketchily Documented Features .R below for other considerations in redirecting output from within an .B awk script. .NH 2 Printf Field Width and Precision Specification With '*' .LP The document says that the \fIprintf\fR function provided is identical to the \fIprintf\fR provided by the \fIC\fR language \fBstdio\fR package. 
This is not true for the case of using '*' to specify a field width or precision. The command .DS printf("%*.s", len, string) .DE will cause a core dump. Given \fBawk\fR's age, it is likely that its \fIprintf\fR was written well before the use of '*' for specifying field width and precision appeared in the \fBstdio\fR library's \fIprintf\fR. Another possibility is that it wasn't implemented because it isn't really needed to achieve the same effect. .LP To accomplish this effect, you can utilize the fact that \fBawk\fR concatenates variables before it does any other processing on them. For example, assume a script has two variables \fIwid\fR and \fIprec\fR which control the width and precision used for printing another variable \fIval\fR: .DS [code to set "wid", "prec", and "val"] printf("%" wid "." prec "d\en", val) .DE If, for example, \fIwid\fR is 8 and \fIprec\fR is 3, then \fBawk\fR will concatenate everything to the left of the comma in the \fIprintf\fR statement, and the statement will really be .DS printf("%8.3d\en", val) .DE These could, of course, have been assigned to some variable \fIfmt\fR before being used: .DS fmt = "%" wid "." prec "d" printf(fmt "\en", val) .DE Note, however, that the newline ("\en") in the second form \fIcannot\fR be included in the assignment to \fIfmt\fR. .bp .NH Undocumented Features .LP There are several undocumented features: .IP Variable values may be established on the command line. .IP A .B getline function exists that reads the next input line and starts processing it immediately. .IP Regular expressions accept octal representations of characters. .IP A .B -d flag argument produces debugging output if .B awk was compiled with "DEBUG" defined. .IP Scripts may be "compiled" and run later (providing the installer did what is necessary to make this work). 
.NH 2 Defining Variables On The Command Line .LP To pass variable values into a script at run time, you may use .IP .I variable=value .LP (as many as you like) between any "\fB-f \fIscriptname\fR" or .I program and the names of any files to be processed. For example, .DS awk -f awkscript today=\e"`date`\e" infile .DE would establish for .I awkscript a variable named .B today that had as its value the output of the .B date command. .LP There are a number of caveats: .IP Such assignments may appear only between .B -f .I awkscript (or \fIprogram\fR or [see below] \fB-R\fIawk.out\fR) and the name of any input file (or '-'). .IP Each .I variable=value combination must be a single argument (i.e. there must not be spaces around the '=' sign); .I value may be either a numeric value or a string. If it is a string, it must be enclosed in double quotes at the time \fBawk\fR reads the argument. That means that the double quotes enclosing \fIvalue\fR on the command line must be protected from the shell as in the example above or it will remove them. .IP .I Variable is not available for use within the script until after the first record has been read and parsed, but it is available as soon as that has occurred so that it may be used before any other processing begins. It does not exist at the time the .B BEGIN block is executed, and if there was no input it will not exist in the .B END block (if any). .NH 2 Getline Function .LP .B Getline immediately reads the next input line (which is parsed into \fI$1\fR, \fI$2\fR, etc) and starts processing it at the location of the call (as opposed to .B next which immediately reads the next input line but starts processing from the start of the script). .LP .B Getline facilitates performing some types of tasks such as processing files with multiline records and merging information from several files. To use the latter as an example, consider a case where two files, whose lines do not share a common format, must be processed together. 
Shell and \fBawk\fR scripts to do this might look something like .sp In the shell script .DS ( echo DATA1; cat datafile1; echo ENDdata1; \e echo DATA2; cat datafile2; echo ENDdata2 \e ) | \e awk -f awkscript - > awk_output_file .DE In the .B awk script .DS /^DATA1/ { # Next input line starts datafile1 while (getline && $1 !~ /^ENDdata1$/) { [processing for \fIdata1\fR lines] } } .sp 1 /^DATA2/ { # Next input line starts datafile2 while (getline && $1 !~ /^ENDdata2$/) { [processing for \fIdata2\fR lines] } } .DE There are, of course, other ways of accomplishing this particular task (primarily using \fBsed\fR to preprocess the information), but they are generally more difficult to write and more subject to logic errors. Many cases arising in practice are significantly more difficult, if not impossible, to handle without \fBgetline\fR. .NH 2 Regular Expressions .LP The sequence "\fI\eddd\fR" (where 'd' is a digit) may be used to include explicit octal values in regular expressions. This is often useful if "nonprinting" characters have been used as "markers" in a file. It has not been tested for ASCII values outside the range 01 through 0127. .NH 2 Debugging output .LP [This is unlikely to be of interest to the casual user.] .sp If \fBawk\fR was compiled with "DEBUG" defined, then giving it a .B -d flag argument will cause it to produce debugging output when it is run. This is sometimes useful in finding obscure problems in scripts, though it is primarily intended for tracking down problems with \fBawk\fR itself. .NH 2 Script "Compilation" .LP [It is likely that this does not work at most sites. If it does not, the following will probably not be of interest to the casual user.] .sp The command .DS awk -S -f script.awk .DE produces a file named .B awk.out. This is a core image of .B awk after parsing the file .I script.awk. The command .DS awk -Rawk.out datafile .DE causes .B awk.out to be applied to \fIdatafile\fR (or the standard input if no input file is given). 
This avoids having to reparse large scripts each time they are used. Unfortunately, the way this is implemented requires some special action on the part of the person installing \fBawk\fR. .LP As \fBawk\fR is delivered with \fI4.2 BSD\fR (and \fI4.3 BSD\fR), .I awk.out is created by the \fBawk -S ...\fR process by calling .B sbrk() with '0', writing out the returned value, then writing out the core image from location 0 to the returned address. The \fBawk -R...\fR process reads the first word of .I awk.out to get the length of the image, calls .B brk() with that length, and then reads the image into itself starting at location 0. For this to work, \fBawk\fR must have been loaded with its text segment writeable. Unfortunately, the \fIBSD\fR default for \fBld\fR is to load with the text read-only and shareable. Thus, the installer must remember to take special action (e.g. "cc -N ..." [equivalently "ld -N ..."] for \fI4BSD\fR) if these flags are to work. .LP [Personally, I don't think it is a very good idea to give \fBawk\fR the opportunity to write on its text segment; I changed it so that only the data segment is overwritten.] .LP Also, due to what appears to be a lapse in logic, the first non-flag argument following \fB-R\fIawk.out\fR is discarded. [Disliking that behavior, I changed it so that the \fB-R\fR flag is treated like the \fB-f\fR flag: no flag arguments may follow it.] .bp .NH Sketchily Documented Features .LP .NH 2 Exit .LP The user manual says that using the .B exit function causes the script to behave as if end-of-input has been reached. Not mentioned explicitly is the fact that this will cause the .B END block to be executed if it exists. Also, two things are omitted: .IP \fBexit(\fIexpr\fB)\fR causes the script's exit status to be set to the value of \fIexpr\fR. .IP If .B exit is called within the .B END block, the script exits immediately. 
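The exit behavior described above can be checked with a small sketch. It assumes a present-day POSIX awk (not the 4.2 BSD version this document covers), and the input lines and the names "out" and "status" are made up for illustration.

```shell
# Sketch, assuming a modern POSIX awk; "out" and "status" are illustrative names.
status=0
out=$(printf 'a\nb\nc\n' | awk '
    NR == 2 { exit }            # acts like end-of-input: control passes to END
    { print "line", $0 }        # only reached for the first record
    END {
        print "in END"
        exit 3                  # leaves END at once and sets the exit status
        print "not reached"
    }
') || status=$?
printf '%s\n' "$out"
echo "exit status: $status"
```

The rule for the second record never reaches the print rule, END still runs, and the statement after the exit in END is skipped.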
.NH 2 Mathematical Functions .LP The following builtin functions exist and are mentioned in .I awk(1) but not in the user manual. .IP \fBint(\fIx\fB)\fR 10 \fIx\fR truncated to an integer. .IP \fBsqrt(\fIx\fB)\fR 10 the square root of \fIx\fR for \fIx\fR >= 0, otherwise zero. .IP \fBexp(\fIx\fB)\fR 10 \fBe\fR-to-the-\fIx\fR for -88 <= \fIx\fR <= 88, zero for \fIx\fR < -88, and dumps core for \fIx\fR > 88. .IP \fBlog(\fIx\fB)\fR 10 the natural log of \fIx\fR. .NH 2 OFMT Variable .LP The variable .B OFMT may be set to, e.g. "%.2f", and purely numerical output will be bound by that restriction in .B print statements. The default value is "%.6g". Again, this is mentioned in .I awk(1) but not in the user manual. .NH 2 Array Elements .LP The user manual states that "Array elements ... spring into existence by being mentioned." This is literally true; .I any reference to an array element causes it to exist. ("I was thought about, therefore I am.") Take, for example, .DS if(array[$1] == "blah") { [process blah lines] } .DE If there is not an existing element of .B array whose subscript is the same as the contents of the current line's first field, .I one is created .R and its value (null, of course) is then compared with "blah". This can be a bit disconcerting, particularly when later processing is using .DS for (i in \fBarray\fR) { [do something with result of processing "blah" lines] } .DE to walk the array and expects all the elements to be non-null. Succinct practical examples are difficult to construct, but when this happens in a 500 line script it can be difficult to determine what has gone wrong. .NH 2 FS and Input Fields .LP By default any number of spaces or tabs can separate fields (i.e. there are no null input fields) and trailing spaces and tabs are ignored. 
However, if .B FS is explicitly set to any character other than a space (e.g., a tab: \fBFS = "\et"\fR), then a field is defined by each such character and trailing field separator characters are not ignored. For example, if '>' represents a tab then .DS one>>three>>five> .DE defines six fields, with fields two, four, and six being empty. .LP If .B FS is explicitly set to a space (\fBFS\fR = "\ "), then the default behavior obtains (this may be a bug); that is, both spaces and tabs are taken as field separators, there can be no null input fields, and trailing spaces and tabs are ignored. .NH 2 RS and Input Records .LP If .B RS is explicitly set to the null string (\fBRS\fR = ""), then the input record separator becomes a blank line, and the newline at the end of each input line is a field separator. This facilitates handling multiline records. .NH 2 "Fall Through" .LP This is mentioned in the user manual, but it is important enough that it is worth pointing out here, also. .LP In the script .DS /\fIpattern_1\fR/ { [do something] } .sp /\fIpattern_2\fR/ { [do something] } .DE all input lines will be compared with both .I pattern_1 and .I pattern_2 unless the .B next function is used before the closing '}' in the .I pattern_1 portion. .NH 2 Output Redirection .LP Once a file (or pipe) is opened by .B awk it is not closed until .B awk exits. This can occasionally cause problems. For example, it means that a script that sorts its input lines into output files named by the contents of their first fields (similar to an example in the user manual) .DS { print $0 > $1 } .DE is going to fail if the number of different first fields exceeds about 10. This problem .I cannot be avoided by using something like .DS { command = "cat >> " $1 print $0 | command } .DE as the value of the variable .B command is different for each different value of .I $1 and is therefore treated as a different output "file". 
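For comparison, later awks (the "new awk" and anything following POSIX) provide a close() function, which the awk described here does not have. A minimal sketch of the split-by-first-field script under that assumption follows; the temporary directory, the sample input, and the name "tmpdir" are illustrative only.

```shell
# Sketch, assuming a POSIX awk with close(); not applicable to the 4.2 BSD awk.
tmpdir=$(mktemp -d)
printf 'a 1\nb 2\na 3\n' | (
    cd "$tmpdir" &&
    awk '{
        print $0 >> ($1)    # append, so reopening the file keeps earlier lines
        close($1)           # release the descriptor before the next record
    }'
)
cat "$tmpdir/a"
```

Appending with ">>" matters here: a file reopened with ">" after close() would be truncated each time, leaving only the last line for each first field.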
.LP [I have not been able to create a truly satisfactory fix for this that doesn't involve having \fBawk\fR treat output redirection to pipes differently from output to files; I would greatly appreciate hearing of one.] .NH 2 Field and Variable Types, Values, and Comparisons .LP The following is a synopsis of notes included with \fBawk\fR's source code. .NH 3 Types .LP Variables and fields can be strings or numbers or both. .NH 4 Variable Types .LP When a variable is set by the assignment .DS \fIvar\fR = \fIexpr\fR .DE its type is set to the type of .I expr (this includes +=, ++, etc). An arithmetic expression is of type .I number, a concatenation is of type .I string, etc. If the assignment is a simple copy, e.g. .DS \fIvar1\fR = \fIvar2\fR .DE then the type of .I var1 becomes that of .I var2. .LP Type is determined by context; rarely, but always very inconveniently, this context-determined type is incorrect. As mentioned in .I awk(1) the type of an expression can be coerced to that desired. E.g. .DS { \fIexpr1\fR + 0 .sp 1 \fIexpr2\fR "" # Concatenate with a null string } .DE coerces .I expr1 to numeric type and .I expr2 to string type. .NH 4 Field Types .LP As with variables, the type of a field is determined by context when possible, e.g. .RS .IP $1++ 8 clearly implies that \fI$1\fR is to be numeric, and .IP $1\ =\ $1\ ","\ $2 16 implies that $1 and $2 are both to be strings. .RE .LP Coercion is done as needed. In contexts where types cannot be reliably determined, e.g., .DS if($1 == $2) ... .DE the type of each field is determined on input by inspection. All fields are strings; in addition, each field that contains only a number is also considered numeric. Thus, the test .DS if($1 == $2) ... 
.DE will succeed on the inputs .DS 0 0.0 100 1e2 +100 100 1e-3 1e-3 .DE and fail on the inputs .DS (null) 0 (null) 0.0 2E-518 6E-427 .DE "only a number" in this case means matching the regular expression .DS ^[+-]?[0-9]*\e.?[0-9]+(e[+-]?[0-9]+)?$ .DE .NH 3 Values .LP Uninitialized variables have the numeric value 0 and the string value "". Therefore, if \fIx\fR is uninitialized, .DS if(x) ... if (x == "0") ... .DE are false, and .DS if(!x) ... if(x == 0) ... if(x == "") ... .DE are true. .LP Fields which are explicitly null have the string value "", and are not numeric. Non-existent fields (i.e., fields past \fBNF\fR) are also treated this way. .NH 3 Types of Comparisons .LP If both operands are numeric, the comparison is made numerically. Otherwise, operands are coerced to type string if necessary, and the comparison is made on strings. .NH 3 Array Elements .LP Array elements created by .B split are treated in the same way as fields. ^ permalink raw reply [flat|nested] 20+ messages in thread
* [TUHS] Re: Trying to date "A Supplemental Document For Awk" 2023-06-28 6:26 [TUHS] Trying to date "A Supplemental Document For Awk" Aharon Robbins @ 2023-06-28 6:45 ` arnold 2023-06-28 17:48 ` Adam Sampson 2023-06-29 0:26 ` Jeremy C. Reed 2 siblings, 0 replies; 20+ messages in thread From: arnold @ 2023-06-28 6:45 UTC (permalink / raw) To: tuhs, arnold Hmmm, skimming the file for the first time in a long time, I see that he references 4.3 BSD as well. Clearly, this document evolved over time. I would still be interested in earlier versions if anyone has. Thanks, Arnold Aharon Robbins <arnold@skeeve.com> wrote: > Hi All. > > Attached is "A Supplemental Document For Awk". This circulated on USENET > in the 80s. My copy is dated January 18, 1989, but I'm sure it's > older than that. One clue is the reference to the 4.2 BSD manual, > and 4.3 came out already in 1986 or so. > > Does anyone else have a copy of this with perhaps an older date? > > As far as I can tell from a short search, the author is no > longer living. If someone knows better and can provide contact > info for him, that'd be great. > > In the meantime, Warren, do you want to add it to the archives? > > Thanks! > > Arnold ^ permalink raw reply [flat|nested] 20+ messages in thread
* [TUHS] Re: Trying to date "A Supplemental Document For Awk" 2023-06-28 6:26 [TUHS] Trying to date "A Supplemental Document For Awk" Aharon Robbins 2023-06-28 6:45 ` [TUHS] " arnold @ 2023-06-28 17:48 ` Adam Sampson 2023-06-28 18:03 ` KenUnix 2023-06-29 0:26 ` Jeremy C. Reed 2 siblings, 1 reply; 20+ messages in thread From: Adam Sampson @ 2023-06-28 17:48 UTC (permalink / raw) To: tuhs [-- Attachment #1: Type: text/plain, Size: 1169 bytes --] On Wed, Jun 28, 2023 at 09:26:02AM +0300, Aharon Robbins wrote: > Attached is "A Supplemental Document For Awk". This circulated on > USENET in the 80s. My copy is dated January 18, 1989, but I'm sure > it's older than that. In the utzoo Usenet archive, there are two versions of this document and a few mentions of it... John Pierce posted to comp.unix.questions on 1989-04-02, saying he'd written it "four or five years ago". Stu Heiss, in comp.unix.questions on 1989-03-06, said it was "posted to net.sources 18 Jun 86 with message-id 238@sdchema.sdchem.uucp". Unfortunately this isn't in the utzoo archive or the net.sources.mbox in archive.org's Usenet Historical Collection. A copy identical to yours was posted by Jim Harkins to comp.unix.questions on 1990-03-29. There's a later version, fixing a typo and some formatting and adding a mention of \f and \b in printf, which was posted by Brian Kantor to comp.doc on 1987-10-11 -- I've attached this. The same file (with two .bps commented out) was reposted in comp.unix.questions on 1989-11-16 by Francois-Michel Lang. 
Thanks, -- Adam Sampson <ats@offog.org> <http://offog.org/> [-- Attachment #2: supplemental-19871011-brian --] [-- Type: text/plain, Size: 20115 bytes --] Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!mnetor!uunet!seismo!esosun!ucsdhub!sdcsvax!brian From: brian@sdcsvax.UCSD.EDU (Brian Kantor) Newsgroups: comp.doc Subject: AWK supplementary document - troff with 'ms' macros Message-ID: <4070@sdcsvax.UCSD.EDU> Date: Sun, 11-Oct-87 02:40:02 EDT Article-I.D.: sdcsvax.4070 Posted: Sun Oct 11 02:40:02 1987 Date-Received: Mon, 12-Oct-87 21:20:14 EDT Sender: root@sdcsvax.UCSD.EDU Organization: UCSD wombat breeding society Lines: 745 Approved: brian@cyberpunk.ucsd.edu .RP .TL .B A Supplemental Document For AWK .sp .R - or - .sp .I Things Al, Pete, And Brian Didn't Mention Much .R .AU John W. Pierce .AI Department of Chemistry University of California, San Diego La Jolla, California 92093 jwp%chem@sdcsvax.ucsd.edu .AB As .B awk and its documentation are distributed with .I 4.2 BSD UNIX* .R there are a number of bugs, undocumented features, and features that are touched on so briefly in the documentation that the casual user may not realize their full significance. While this document applies primarily to the \fI4.2 BSD\fR version of \fIUNIX\fR, it is known that the \fI4.3 BSD\fR version does not have all of the bugs fixed, and that it does not have updated documentation. The situation with respect to the versions of \fBawk\fR distributed with other versions \fIUNIX\fR and similar systems is unknown to the author. .FS *UNIX is a trademark of AT&T .FE .AE .LP In this document references to "the user manual" mean .I Awk - A Pattern Scanning and Processing Language (Second Edition) .R by Aho, Kernighan, and Weinberger. References to "awk(1)" mean the entry for .B awk in the .I UNIX Programmer's Manual, 4th Berkeley Distribution. .R References to "the documentation" mean both of those. 
.LP In most examples, the outermost set of braces ('{ }') has been omitted. They would, of course, be necessary in real scripts. .NH Known Bugs .LP There are three main bugs known to me. They involve: .IP Assignment to input fields. .IP Piping output to a program from within an \fBawk\fR script. .IP Using '*' in \fIprintf\fR field width and precision specifications does not work, nor do '\\f' and '\\b' print formfeed and backspace respectively. .NH 2 Assignment to Input Fields .LP [This problem is partially fixed in \fI4.3BSD\fR; see the last paragraph of this section regarding the unfixed portion.] .LP The user manual states that input fields may be objects of assignment statements. Given the input line .DS field_one field_two field_three .DE the script .DS $2 = "new_field_2" print $0 .DE should print .DS field_one new_field_2 field_three .DE .LP This does not work; it will print .DS field_one field_two field_three .DE That is, the script will behave as if the assignment to $2 had not been made. However, explicitly referencing an "assigned to" field .I does recognize that the assignment has been made. If the script .DS $2 = "new_field_2" print $1, $2, $3 .DE is given the same input it will [properly] print .DS field_one new_field_2 field_three .DE Therefore, you can get around this bug with, e.g., .DS $2 = "new_field_2" output = $1 # Concatenate output fields for(i = 2; i <= NF; ++i) # into a single output line output = output OFS $i # with OFS between fields print output .DE .LP In \fI4.3BSD\fR, this bug has been fixed to the extent that the failing example above works correctly. However, a script like .DS $2 = "new_field_2" var = $0 print var .DE still gives incorrect output. This problem can be bypassed by using .DS \fIvar\fR = sprintf("%s", $0) .DE instead of "\fIvar\fR = $0"; \fIvar\fR will have the correct value. .NH 2 Piping Output to a Program .LP [This problem appears to have been fixed in \fI4.3BSD\fR, but that has not been exhaustively tested.] 
.LP The user manual states that .I print and .I printf statements may write to a program using, e.g., .DS print | "\fIcommand\fR" .DE This would pipe the output into \fIcommand\fR, and it does work. However, you should be aware that this causes .B awk to spawn a child process (\fIcommand\fR), and that it .I does not .R wait for the child to exit before it exits itself. In the case of a "slow" command like .B sort, .B awk may exit before .I command has finished. .LP This can cause problems in, for example, a shell script that depends on everything done by .B awk being finished before the next shell command is executed. Consider the shell script .DS awk -f awk_script input_file mv sorted_output somewhere_else .DE and the .B awk script .DS print output_line | "sort -o sorted_output" .DE If .I input_file is large .B awk will exit long before .B sort is finished. That means that the .B mv command will be executed before .B sort is finished, and the result is unlikely to be what you wanted. Other than fixing the source, there is no way to avoid this problem except to handle such pipes outside of the awk script, e.g. .DS awk -f awk_file input_file | sort -o sorted_output mv sorted_output somewhere_else .DE which is not wholly satisfactory. .LP See .I Sketchily Documented Features .R below for other considerations in redirecting output from within an .B awk script. .NH 2 Printf and '*', '\\f', and '\\b' .LP The document says that the \fIprintf\fR function provided is identical to the \fIprintf\fR provided by the \fIC\fR language \fBstdio\fR package. This is incorrect: '*' cannot be used to specify a field width or precision, and '\\f' and '\\b' cannot be used to print formfeeds and backspaces. .LP The command .DS printf("%*.s", len, string) .DE will cause a core dump. Given \fBawk\fR's age, it is likely that its \fIprintf\fR was written well before the use of '*' for specifying field width and precision appeared in the \fBstdio\fR library's \fIprintf\fR. 
Another possibility is that it wasn't implemented because it isn't really needed to achieve the same effect. .LP To accomplish this effect, you can utilize the fact that \fBawk\fR concatenates variables before it does any other processing on them. For example, assume a script has two variables \fIwid\fR and \fIprec\fR which control the width and precision used for printing another variable \fIval\fR: .DS [code to set "wid", "prec", and "val"] printf("%" wid "." prec "d\en", val) .DE If, for example, \fIwid\fR is 8 and \fIprec\fR is 3, then \fBawk\fR will concatenate everything to the left of the comma in the \fIprintf\fR statement, and the statement will really be .DS printf("%8.3d\en", val) .DE These could, of course, have been assigned to some variable \fIfmt\fR before being used: .DS fmt = "%" wid "." prec "d" printf(fmt "\en", val) .DE Note, however, that the newline ("\en") in the second form \fIcannot\fR be included in the assignment to \fIfmt\fR. .LP To allow use of '\\f' and '\\b', \fBawk\fR's \fIlex\fR script must be changed. This is trivial to do (it is done at the point where '\\n' and '\\t' are processed), but requires having source code. [I have fixed this and have not seen any unwanted effects.] .bp .NH Undocumented Features .LP There are several undocumented features: .IP Variable values may be established on the command line. .IP A .B getline function exists that reads the next input line and starts processing it immediately. .IP Regular expressions accept octal representations of characters. .IP A .B -d flag argument produces debugging output if .B awk was compiled with "DEBUG" defined. .IP Scripts may be "compiled" and run later (providing the installer did what is necessary to make this work). .NH 2 Defining Variables On The Command Line .LP To pass variable values into a script at run time, you may use .IP .I variable=value .LP (as many as you like) between any "\fB-f \fIscriptname\fR" or .I program and the names of any files to be processed. 
For example, .DS awk -f awkscript today=\e"`date`\e" infile .DE would establish for .I awkscript a variable named .B today that had as its value the output of the .B date command. .LP There are a number of caveats: .IP Such assignments may appear only between .B -f .I awkscript (or \fIprogram\fR or [see below] \fB-R\fIawk.out\fR) and the name of any input file (or '-'). .IP Each .I variable=value combination must be a single argument (i.e. there must not be spaces around the '=' sign); .I value may be either a numeric value or a string. If it is a string, it must be enclosed in double quotes at the time \fBawk\fR reads the argument. That means that the double quotes enclosing \fIvalue\fR on the command line must be protected from the shell as in the example above or it will remove them. .IP .I Variable is not available for use within the script until after the first record has been read and parsed, but it is available as soon as that has occurred so that it may be used before any other processing begins. It does not exist at the time the .B BEGIN block is executed, and if there was no input it will not exist in the .B END block (if any). .NH 2 Getline Function .LP .B Getline immediately reads the next input line (which is parsed into \fI$1\fR, \fI$2\fR, etc) and starts processing it at the location of the call (as opposed to .B next which immediately reads the next input line but starts processing from the start of the script). .LP .B Getline facilitates performing some types of tasks such as processing files with multiline records and merging information from several files. To use the latter as an example, consider a case where two files, whose lines do not share a common format, must be processed together. 
Shell and \fBawk\fR scripts to do this might look something like .sp In the shell script .DS ( echo DATA1; cat datafile1; echo ENDdata1; \e echo DATA2; cat datafile2; echo ENDdata2 \e ) | \e awk -f awkscript - > awk_output_file .DE In the .B awk script .DS /^DATA1/ { # Next input line starts datafile1 while (getline && $1 !~ /^ENDdata1$/) { [processing for \fIdata1\fR lines] } } .sp 1 /^DATA2/ { # Next input line starts datafile2 while (getline && $1 !~ /^ENDdata2$/) { [processing for \fIdata2\fR lines] } } .DE There are, of course, other ways of accomplishing this particular task (primarily using \fBsed\fR to preprocess the information), but they are generally more difficult to write and more subject to logic errors. Many cases arising in practice are significantly more difficult, if not impossible, to handle without \fBgetline\fR. .NH 2 Regular Expressions .LP The sequence "\fI\eddd\fR" (where 'd' is a digit) may be used to include explicit octal values in regular expressions. This is often useful if "nonprinting" characters have been used as "markers" in a file. It has not been tested for ASCII values outside the range 01 through 0127. .NH 2 Debugging Output .LP [This is unlikely to be of interest to the casual user.] .sp If \fBawk\fR was compiled with "DEBUG" defined, then giving it a .B -d flag argument will cause it to produce debugging output when it is run. This is sometimes useful in finding obscure problems in scripts, though it is primarily intended for tracking down problems with \fBawk\fR itself. .NH 2 Script "Compilation" .LP [It is likely that this does not work at most sites. If it does not, the following will probably not be of interest to the casual user.] .sp The command .DS awk -S -f script.awk .DE produces a file named .B awk.out. This is a core image of .B awk after parsing the file .I script.awk. The command .DS awk -Rawk.out datafile .DE causes .B awk.out to be applied to \fIdatafile\fR (or the standard input if no input file is given).
This avoids having to reparse large scripts each time they are used. Unfortunately, the way this is implemented requires some special action on the part of the person installing \fBawk\fR. .LP As \fBawk\fR is delivered with \fI4.2 BSD\fR (and \fI4.3 BSD\fR), .I awk.out is created by the \fBawk -S ...\fR process by calling .B sbrk() with '0', writing out the returned value, then writing out the core image from location 0 to the returned address. The \fBawk -R...\fR process reads the first word of .I awk.out to get the length of the image, calls .B brk() with that length, and then reads the image into itself starting at location 0. For this to work, \fBawk\fR must have been loaded with its text segment writeable. Unfortunately, the \fIBSD\fR default for \fBld\fR is to load with the text read-only and shareable. Thus, the installer must remember to take special action (e.g. "cc -N ..." [equivalently "ld -N ..."] for \fI4BSD\fR) if these flags are to work. .LP [Personally, I don't think it is a very good idea to give \fBawk\fR the opportunity to write on its text segment; I changed it so that only the data segment is overwritten.] .LP Also, due to what appears to be a lapse in logic, the first non-flag argument following \fB-R\fIawk.out\fR is discarded. [Disliking that behavior, I changed it so that the \fB-R\fR flag is treated like the \fB-f\fR flag: no flag arguments may follow it.] .bp .NH Sketchily Documented Features .LP .NH 2 Exit .LP The user manual says that using the .B exit function causes the script to behave as if end-of-input has been reached. Not mentioned explicitly is the fact that this will cause the .B END block to be executed if it exists. Also, two things are omitted: .IP \fBexit(\fIexpr\fB)\fR causes the script's exit status to be set to the value of \fIexpr\fR. .IP If .B exit is called within the .B END block, the script exits immediately.
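The behavior of exit described above can be checked with a short shell session. This is only a sketch: it assumes a POSIX-style awk (for example a current gawk, mawk, or one-true-awk), which may not match the 4.2 BSD awk in every detail, and the input lines and the status value 3 are made up for illustration.

```shell
# Assumption: "awk" here is a POSIX-style awk, not necessarily 4.2 BSD awk.

# 1. exit in a main rule stops reading input, still runs the END block,
#    and the expression given to exit becomes the exit status.
status=0
out=$(printf 'a\nb\nc\n' |
      awk '$1 == "b" { exit 3 }
           { print }
           END { print "END ran" }') || status=$?
echo "$out"
echo "status=$status"

# 2. exit inside the END block leaves at once; the second print is
#    never reached.
out2=$(echo x | awk 'END { print "first"; exit; print "never" }')
echo "$out2"
```

The first command prints the line before "b" and then the END block's message; the line "c" is never read. Capturing the status with `|| status=$?` keeps the sketch usable under `set -e`.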
.NH 2 Mathematical Functions .LP The following builtin functions exist and are mentioned in .I awk(1) but not in the user manual. .IP \fBint(\fIx\fB)\fR 10 \fIx\fR truncated to an integer. .IP \fBsqrt(\fIx\fB)\fR 10 the square root of \fIx\fR for \fIx\fR >= 0, otherwise zero. .IP \fBexp(\fIx\fB)\fR 10 \fBe\fR-to-the-\fIx\fR for -88 <= \fIx\fR <= 88, zero for \fIx\fR < -88, and dumps core for \fIx\fR > 88. .IP \fBlog(\fIx\fB)\fR 10 the natural log of \fIx\fR. .NH 2 OFMT Variable .LP The variable .B OFMT may be set to, e.g. "%.2f", and purely numerical output will be bound by that restriction in .B print statements. The default value is "%.6g". Again, this is mentioned in .I awk(1) but not in the user manual. .NH 2 Array Elements .LP The user manual states that "Array elements ... spring into existence by being mentioned." This is literally true; .I any reference to an array element causes it to exist. ("I was thought about, therefore I am.") Take, for example, .DS if(array[$1] == "blah") { [process blah lines] } .DE If there is not an existing element of .B array whose subscript is the same as the contents of the current line's first field, .I one is created .R and its value (null, of course) is then compared with "blah". This can be a bit disconcerting, particularly when later processing is using .DS for (i in \fBarray\fR) { [do something with result of processing "blah" lines] } .DE to walk the array and expects all the elements to be non-null. Succinct practical examples are difficult to construct, but when this happens in a 500-line script it can be difficult to determine what has gone wrong. .NH 2 FS and Input Fields .LP By default any number of spaces or tabs can separate fields (i.e. there are no null input fields) and trailing spaces and tabs are ignored.
However, if .B FS is explicitly set to any character other than a space (e.g., a tab: \fBFS = "\et"\fR), then a field is defined by each such character and trailing field separator characters are not ignored. For example, if '>' represents a tab then .DS one>>three>>five> .DE defines six fields, with fields two, four, and six being empty. .LP If .B FS is explicitly set to a space (\fBFS\fR = "\ "), then the default behavior obtains (this may be a bug); that is, both spaces and tabs are taken as field separators, there can be no null input fields, and trailing spaces and tabs are ignored. .NH 2 RS and Input Records .LP If .B RS is explicitly set to the null string (\fBRS\fR = ""), then the input record separator becomes a blank line, and the newline at the end of each input line is a field separator. This facilitates handling multiline records. .NH 2 "Fall Through" .LP This is mentioned in the user manual, but it is important enough that it is worth pointing out here, also. .LP In the script .DS /\fIpattern_1\fR/ { [do something] } .sp /\fIpattern_2\fR/ { [do something] } .DE all input lines will be compared with both .I pattern_1 and .I pattern_2 unless the .B next function is used before the closing '}' in the .I pattern_1 portion. .NH 2 Output Redirection .LP Once a file (or pipe) is opened by .B awk it is not closed until .B awk exits. This can occasionally cause problems. For example, it means that a script that sorts its input lines into output files named by the contents of their first fields (similar to an example in the user manual) .DS { print $0 > $1 } .DE is going to fail if the number of different first fields exceeds about 10. This problem .I cannot be avoided by using something like .DS { command = "cat >> " $1 print $0 | command } .DE as the value of the variable .B command is different for each different value of .I $1 and is therefore treated as a different output "file".
.LP [I have not been able to create a truly satisfactory fix for this that doesn't involve having \fBawk\fR treat output redirection to pipes differently from output to files; I would greatly appreciate hearing of one.] .NH 2 Field and Variable Types, Values, and Comparisons .LP The following is a synopsis of notes included with \fBawk\fR's source code. .NH 3 Types .LP Variables and fields can be strings or numbers or both. .NH 4 Variable Types .LP When a variable is set by the assignment .DS \fIvar\fR = \fIexpr\fR .DE its type is set to the type of .I expr (this includes +=, ++, etc). An arithmetic expression is of type .I number, a concatenation is of type .I string, etc. If the assignment is a simple copy, e.g. .DS \fIvar1\fR = \fIvar2\fR .DE then the type of .I var1 becomes that of .I var2. .LP Type is determined by context; rarely, but always very inconveniently, this context-determined type is incorrect. As mentioned in .I awk(1) the type of an expression can be coerced to that desired. E.g. .DS { \fIexpr1\fR + 0 .sp 1 \fIexpr2\fR "" # Concatenate with a null string } .DE coerces .I expr1 to numeric type and .I expr2 to string type. .NH 4 Field Types .LP As with variables, the type of a field is determined by context when possible, e.g. .RS .IP $1++ 8 clearly implies that \fI$1\fR is to be numeric, and .IP $1\ =\ $1\ ","\ $2 16 implies that $1 and $2 are both to be strings. .RE .LP Coercion is done as needed. In contexts where types cannot be reliably determined, e.g., .DS if($1 == $2) ... .DE the type of each field is determined on input by inspection. All fields are strings; in addition, each field that contains only a number is also considered numeric. Thus, the test .DS if($1 == $2) ... 
.DE will succeed on the inputs .DS 0 0.0 100 1e2 +100 100 1e-3 1e-3 .DE and fail on the inputs .DS (null) 0 (null) 0.0 2E-518 6E-427 .DE "only a number" in this case means matching the regular expression .DS ^[+-]?[0-9]*\e.?[0-9]+(e[+-]?[0-9]+)?$ .DE .NH 3 Values .LP Uninitialized variables have the numeric value 0 and the string value "". Therefore, if \fIx\fR is uninitialized, .DS if(x) ... if (x == "0") ... .DE are false, and .DS if(!x) ... if(x == 0) ... if(x == "") ... .DE are true. .LP Fields which are explicitly null have the string value "", and are not numeric. Non-existent fields (i.e., fields past \fBNF\fR) are also treated this way. .NH 3 Types of Comparisons .LP If both operands are numeric, the comparison is made numerically. Otherwise, operands are coerced to type string if necessary, and the comparison is made on strings. .NH 3 Array Elements .LP Array elements created by .B split are treated in the same way as fields. ^ permalink raw reply [flat|nested] 20+ messages in thread
* [TUHS] Re: Trying to date "A Supplemental Document For Awk" 2023-06-28 17:48 ` Adam Sampson @ 2023-06-28 18:03 ` KenUnix 2023-06-28 18:38 ` Clem Cole 2023-06-29 1:04 ` Bakul Shah 0 siblings, 2 replies; 20+ messages in thread From: KenUnix @ 2023-06-28 18:03 UTC (permalink / raw) To: Adam Sampson; +Cc: tuhs [-- Attachment #1: Type: text/plain, Size: 1538 bytes --] Guys, It's been too long. What would I use to compile this man page source? I do remember some option switches are required. Yes? Thanks On Wed, Jun 28, 2023 at 1:49 PM Adam Sampson <ats@offog.org> wrote: > On Wed, Jun 28, 2023 at 09:26:02AM +0300, Aharon Robbins wrote: > > Attached is "A Supplemental Document For Awk". This circulated on > > USENET in the 80s. My copy is dated January 18, 1989, but I'm sure > > it's older than that. > > In the utzoo Usenet archive, there are two versions of this document and > a few mentions of it... > > John Pierce posted to comp.unix.questions on 1989-04-02, saying he'd > written it "four or five years ago". > > Stu Heiss, in comp.unix.questions on 1989-03-06, said it was "posted to > net.sources 18 Jun 86 with message-id 238@sdchema.sdchem.uucp". > Unfortunately this isn't in the utzoo archive or the net.sources.mbox > in archive.org's Usenet Historical Collection. > > A copy identical to yours was posted by Jim Harkins to > comp.unix.questions on 1990-03-29. > > There's a later version, fixing a typo and some formatting and adding a > mention of \f and \b in printf, which was posted by Brian Kantor to > comp.doc on 1987-10-11 -- I've attached this. The same file (with two > .bps commented out) was reposted in comp.unix.questions on 1989-11-16 by > Francois-Michel Lang. > > Thanks, > > -- > Adam Sampson <ats@offog.org> <http://offog.org/> > -- End of line JOB TERMINATED -->> Okey Dokey, OK Boss [-- Attachment #2: Type: text/html, Size: 2425 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* [TUHS] Re: Trying to date "A Supplemental Document For Awk" 2023-06-28 18:03 ` KenUnix @ 2023-06-28 18:38 ` Clem Cole 2023-06-28 23:47 ` Greg 'groggy' Lehey 2023-06-29 1:04 ` Bakul Shah 1 sibling, 1 reply; 20+ messages in thread From: Clem Cole @ 2023-06-28 18:38 UTC (permalink / raw) To: KenUnix; +Cc: tuhs [-- Attachment #1: Type: text/plain, Size: 2607 bytes --] Download the file and make sure you save it in "UNIX" format, not DOS ( *i.e.* newline delimited not the nasty <CR><LF> cruft) -- (if you are not sure how to do that running the dos2unix(1) command will assure it's was not unix format when you are done). % file awkdoc awkdoc: troff or preprocessor input text, ASCII text % man groff We'll leave it to you to figure out which switches for troff/groff and macro package (hint: try the head(1) command to peak at the first few lines -- there are three likely choices, but it's pretty obvious since the same one as most V7 documents). FWIW: If you got a copy of Kernighan and Pike's - "The Unix Programming Environment" [ISBN 0-13-937699-2] which is available at most retailers. You can read Chapter 9 for this question. Although, given so many of the questions you seem to like to ask here, please consider doing all the exercises in the entire book. ᐧ On Wed, Jun 28, 2023 at 2:04 PM KenUnix <ken.unix.guy@gmail.com> wrote: > Guys, > > It's been too long. What would I use to compile this man page source? > > I do remember some option switches are required. Yes? > > Thanks > > > On Wed, Jun 28, 2023 at 1:49 PM Adam Sampson <ats@offog.org> wrote: > >> On Wed, Jun 28, 2023 at 09:26:02AM +0300, Aharon Robbins wrote: >> > Attached is "A Supplemental Document For Awk". This circulated on >> > USENET in the 80s. My copy is dated January 18, 1989, but I'm sure >> > it's older than that. >> >> In the utzoo Usenet archive, there are two versions of this document and >> a few mentions of it... 
>> >> John Pierce posted to comp.unix.questions on 1989-04-02, saying he'd >> written it "four or five years ago". >> >> Stu Heiss, in comp.unix.questions on 1989-03-06, said it was "posted to >> net.sources 18 Jun 86 with message-id 238@sdchema.sdchem.uucp". >> Unfortunately this isn't in the utzoo archive or the net.sources.mbox >> in archive.org's Usenet Historical Collection. >> >> A copy identical to yours was posted by Jim Harkins to >> comp.unix.questions on 1990-03-29. >> >> There's a later version, fixing a typo and some formatting and adding a >> mention of \f and \b in printf, which was posted by Brian Kantor to >> comp.doc on 1987-10-11 -- I've attached this. The same file (with two >> .bps commented out) was reposted in comp.unix.questions on 1989-11-16 by >> Francois-Michel Lang. >> >> Thanks, >> >> -- >> Adam Sampson <ats@offog.org> <http://offog.org/> >> > > > -- > End of line > JOB TERMINATED -->> Okey Dokey, OK Boss > > > [-- Attachment #2: Type: text/html, Size: 5062 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* [TUHS] Re: Trying to date "A Supplemental Document For Awk" 2023-06-28 18:38 ` Clem Cole @ 2023-06-28 23:47 ` Greg 'groggy' Lehey 2023-06-29 1:59 ` Stuff Received 2023-06-29 13:34 ` G. Branden Robinson 0 siblings, 2 replies; 20+ messages in thread From: Greg 'groggy' Lehey @ 2023-06-28 23:47 UTC (permalink / raw) To: Clem Cole, KenUnix; +Cc: tuhs [-- Attachment #1: Type: text/plain, Size: 1059 bytes --] On Wednesday, 28 June 2023 at 14:38:40 -0400, Clem Cole wrote: > On Wed, Jun 28, 2023 at 2:04 PM KenUnix <ken.unix.guy@gmail.com> wrote: > >> It's been too long. What would I use to compile this man page source? >> >> I do remember some option switches are required. Yes? > > Download the file and make sure you save it in "UNIX" format, not DOS ( > *i.e.* newline delimited not the nasty <CR><LF> cruft) -- (if you are not > sure how to do that running the dos2unix(1) command will assure it's was > not unix format when you are done). > > % file awkdoc > awkdoc: troff or preprocessor input text, ASCII text > % man groff There's also grog (groff guess) that may help. It's not very clever, but it recognizes a number of formats: $ grog ls.1 groff -mdoc ls.1 Greg -- Sent from my desktop computer. Finger grog@lemis.com for PGP public key. See complete headers for address and phone numbers. This message is digitally signed. If your Microsoft mail program reports problems, please read http://lemis.com/broken-MUA.php [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 163 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* [TUHS] Re: Trying to date "A Supplemental Document For Awk" 2023-06-28 23:47 ` Greg 'groggy' Lehey @ 2023-06-29 1:59 ` Stuff Received 2023-06-29 6:27 ` segaloco via TUHS 2023-06-29 13:45 ` G. Branden Robinson 2023-06-29 13:34 ` G. Branden Robinson 1 sibling, 2 replies; 20+ messages in thread From: Stuff Received @ 2023-06-29 1:59 UTC (permalink / raw) To: tuhs On 2023-06-28 19:47, Greg 'groggy' Lehey wrote: > On Wednesday, 28 June 2023 at 14:38:40 -0400, Clem Cole wrote: [...] > > There's also grog (groff guess) that may help. It's not very clever, > but it recognizes a number of formats: > > $ grog ls.1 > groff -mdoc ls.1 Thank you -- I never knew of its existence. But what did people use before grog and why was the compilation line never placed in a comment in the file? N. > > Greg ^ permalink raw reply [flat|nested] 20+ messages in thread
* [TUHS] Re: Trying to date "A Supplemental Document For Awk" 2023-06-29 1:59 ` Stuff Received @ 2023-06-29 6:27 ` segaloco via TUHS 2023-06-29 6:41 ` Andrew Hume ` (2 more replies) 2023-06-29 13:45 ` G. Branden Robinson 1 sibling, 3 replies; 20+ messages in thread From: segaloco via TUHS @ 2023-06-29 6:27 UTC (permalink / raw) To: The Eunuchs Hysterical Society > But what did people use before grog and why was the compilation line > never placed in a comment in the file? The primary macro packages I see come up between Bell and UCB are man, ms, mm, and me. Man of course finds use in the manual pages (although there are different representations of manpages in nroff over time.) From what I've seen (someone who was there can surely correct me) it seems that ms macros were more commonly used on the research side of things while the mm macros proliferated more in the supported side. Finally the me macros were a BSD component. Given these separations, the origin of or relative vicinity from which a paper originates provides much context as to which macros may be present. To a finer point, the papers published with V7 are ms macros papers while the new additions in PWB lineages are mm macros, while some papers that crop up in BSD likely use me (although I haven't gotten too far into BSD with doc research yet.) Papers from UNIX consumers such as universities are likely in ms or me most of the time. On the flip side, mm was the macro package touted with Documenter's Workbench, so many commercial operations using System V for documentation would've produced documents in mm. I'd be curious whether the earlier "Phototypesetter" package included ms or mm (or both.) I don't think I've seen a "papers" set with both the Lesk ms document and the Smith and Mashey mm one, so couldn't say how common both in the same Bell offering were. Additionally, my research hasn't touched on any officially sanctioned use of mm in BSD, so that's an area ripe for some more study. 
As for other breadcrumbs, Bell mm macros papers do often include a comment at the top indicating to print with nroff -mm or mm(1). I don't recall seeing similar in research papers, but haven't necessarily gone looking. In any case, the paper sets with UNIX itself typically had scripts included with the necessary command-lines, as many papers additionally needed some eqn and/or tbl processing. I imagine any other such formally distributed document sources would likewise include scripts in lieu of commentary, but it depends. - Matt G. ^ permalink raw reply [flat|nested] 20+ messages in thread
* [TUHS] Re: Trying to date "A Supplemental Document For Awk" 2023-06-29 6:27 ` segaloco via TUHS @ 2023-06-29 6:41 ` Andrew Hume 2023-06-29 6:45 ` Noel Hunt 2023-06-29 6:44 ` Noel Hunt 2023-06-29 14:02 ` G. Branden Robinson 2 siblings, 1 reply; 20+ messages in thread From: Andrew Hume @ 2023-06-29 6:41 UTC (permalink / raw) To: segaloco; +Cc: The Eunuchs Hysterical Society over time, folks in research tended to use make (or its descendants) to generate paper outputs. altho i do recall a tool similar to grog that correctly orchestrated the ideal/pic/eqn/tbl/troff pipeline needed to generate the output. the order was important. as for macros, for several years we tended to use the pm macros (akin to the ms macros) because they drove chris van wyck and kernighan’s page balancing backend, which was necessary to produce print ready copy for journals etc. > On Jun 28, 2023, at 11:27 PM, segaloco via TUHS <tuhs@tuhs.org> wrote: > >> But what did people use before grog and why was the compilation line >> never placed in a comment in the file? ^ permalink raw reply [flat|nested] 20+ messages in thread
* [TUHS] Re: Trying to date "A Supplemental Document For Awk" 2023-06-29 6:41 ` Andrew Hume @ 2023-06-29 6:45 ` Noel Hunt 2023-06-29 6:48 ` Andrew Hume 0 siblings, 1 reply; 20+ messages in thread From: Noel Hunt @ 2023-06-29 6:45 UTC (permalink / raw) To: Andrew Hume; +Cc: segaloco, The Eunuchs Hysterical Society > altho i do recall a tool similar to grog that correctly orchestrated the ideal/pic/eqn/tbl/troff pipeline Perhaps you are referring to 'doctype'? ^ permalink raw reply [flat|nested] 20+ messages in thread
* [TUHS] Re: Trying to date "A Supplemental Document For Awk" 2023-06-29 6:45 ` Noel Hunt @ 2023-06-29 6:48 ` Andrew Hume 2023-06-29 6:50 ` arnold 0 siblings, 1 reply; 20+ messages in thread From: Andrew Hume @ 2023-06-29 6:48 UTC (permalink / raw) To: Noel Hunt; +Cc: The Eunuchs Hysterical Society its possible; i simply can’t remember 40 years ago. > On Jun 28, 2023, at 11:45 PM, Noel Hunt <noel.hunt@gmail.com> wrote: > >> altho i do recall a tool similar to grog that correctly orchestrated the ideal/pic/eqn/tbl/troff pipeline > > Perhaps you are referring to 'doctype'? ^ permalink raw reply [flat|nested] 20+ messages in thread
* [TUHS] Re: Trying to date "A Supplemental Document For Awk" 2023-06-29 6:48 ` Andrew Hume @ 2023-06-29 6:50 ` arnold 0 siblings, 0 replies; 20+ messages in thread From: arnold @ 2023-06-29 6:50 UTC (permalink / raw) To: noel.hunt, andrew; +Cc: tuhs It is doctype. It's still alive (as an rc/grep/awk script) in Plan 9 and descendants. Andrew Hume <andrew@humeweb.com> wrote: > its possible; i simply can’t remember 40 years ago. > > > On Jun 28, 2023, at 11:45 PM, Noel Hunt <noel.hunt@gmail.com> wrote: > > > >> altho i do recall a tool similar to grog that correctly orchestrated the ideal/pic/eqn/tbl/troff pipeline > > > > Perhaps you are referring to 'doctype'? > ^ permalink raw reply [flat|nested] 20+ messages in thread
* [TUHS] Re: Trying to date "A Supplemental Document For Awk" 2023-06-29 6:27 ` segaloco via TUHS 2023-06-29 6:41 ` Andrew Hume @ 2023-06-29 6:44 ` Noel Hunt 2023-06-29 14:02 ` G. Branden Robinson 2 siblings, 0 replies; 20+ messages in thread From: Noel Hunt @ 2023-06-29 6:44 UTC (permalink / raw) To: segaloco; +Cc: The Eunuchs Hysterical Society And let us not forget the wonderful 'mv' macros, for typesetting over-head projection slides. ^ permalink raw reply [flat|nested] 20+ messages in thread
* [TUHS] Re: Trying to date "A Supplemental Document For Awk" 2023-06-29 6:27 ` segaloco via TUHS 2023-06-29 6:41 ` Andrew Hume 2023-06-29 6:44 ` Noel Hunt @ 2023-06-29 14:02 ` G. Branden Robinson 2 siblings, 0 replies; 20+ messages in thread From: G. Branden Robinson @ 2023-06-29 14:02 UTC (permalink / raw) To: segaloco; +Cc: The Eunuchs Hysterical Society [-- Attachment #1: Type: text/plain, Size: 3759 bytes --] At 2023-06-29T06:27:44+0000, segaloco via TUHS wrote: > Man of course finds use in the manual pages (although there are > different representations of manpages in nroff over time.) Setting aside the well known bifurcation between man(7) and mdoc(7), which manage to stay out of each other's way in the macro name space, I'm not aware of any comparative survey of different man(7) implementations. Ultrix at some point--I have no insight into the chronology of it--had a large set of extensions that remains quietly documented and supported by groff to this day, albeit off in a corner where it seems to receive little attention. (Just as well, in my opinion, as not all of its innovations are worthy of embrace.) As far as other vendor extensions and developments go, I have collected all of the information known to me into the groff_man(7) page in the any-minute-now groff 1.23.0 release. Here are the relevant sections. (There are two because concept and implementation are distinguishable.) History M. Douglas McIlroy designed, implemented, and documented the AT&T man macros for Unix Version 7 (1979) and employed them to edit the first volume of its Programmer's Manual, a compilation of all man pages supplied by the system. That man supported the macros listed in this page not described as extensions, except .P and the deprecated .AT and .UC. The only strings defined were R and S; no registers were documented. .UC appeared in 3BSD (1980). Unix System III (1980) introduced .P and exposed the registers IN and LL, which had been internal to Seventh Edition Unix man. 
PWB/UNIX 2.0 (1980) added the Tm string. 4BSD (1980) added lq and rq strings. SunOS 2.0 (1985) recognized C, D, P, and X registers. 4.3BSD (1986) added .AT and .P. Ninth Edition Research Unix (1986) introduced .EX and .EE. SunOS 4.0 (1988) added .SB. The foregoing features were what James Clark implemented in early versions of groff. Later, groff 1.20 (2009) originated .SY/.YS, .TQ, .MT/.ME, and .UR/.UE. Plan 9 from User Space's troff introduced .MR in 2020. Authors The initial GNU implementation of the man macro package was written by James Clark. Later, Werner Lemberg supplied the S, LT, and cR registers, the last a 4.3BSD-Reno mdoc(7) feature. Larry Kollar added the FT, HY, and SN registers; the HF string; and the PT and BT macros. G. Branden Robinson implemented the AD and MF strings; CS, CT, and U registers; and the MR macro. Except for .SB, the extension macros were written by Lemberg, Eric S. Raymond, and Robinson. This document was originally written for the Debian GNU/Linux system by Susan G. Kleinmann. It was corrected and updated by Lemberg and Robinson. The extension macros were documented by Raymond and Robinson. I welcome any further insights people can offer. This man page isn't the best place to document extensions that withered on the vine (like Eighth/Ninth Edition Research Unix's addition of multi-column macros for man(7)), but I wouldn't mind collecting such things into some sort of auxiliary article. While the mandoc(1)/mdocml project's "History of UNIX Manpages"[1] is an invaluable resource, it doesn't really do what's written on the tin, and serves more as a history of (some) *roff _formatters_--not of the man(7) language. I assume that this stance is in part due to the unease bordering on antipathy that mandoc(1) proponents have for the man(7) macro package. In their view, everybody should be writing mdoc(7). Unfortunately this lacuna has left useful historical information about the man(7) package uncollected. 
Regards, Branden [1] https://manpages.bsd.lv/history.html [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* [TUHS] Re: Trying to date "A Supplemental Document For Awk" 2023-06-29 1:59 ` Stuff Received 2023-06-29 6:27 ` segaloco via TUHS @ 2023-06-29 13:45 ` G. Branden Robinson 1 sibling, 0 replies; 20+ messages in thread From: G. Branden Robinson @ 2023-06-29 13:45 UTC (permalink / raw) To: Stuff Received; +Cc: tuhs [-- Attachment #1: Type: text/plain, Size: 545 bytes --] At 2023-06-28T21:59:24-0400, Stuff Received wrote: > and why was the compilation line never placed in a comment in the > file? Having done some work with historical *roff documents, my conjecture is that the single source of truth was usually to be found in a Makefile. Unfortunately, *roff documents have not reliably been distributed along with the scripts directing control of their compilation and installation. If you insist upon that, you start sounding like one of those street-corner preaching copyleft people... ;-) Regards, Branden [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* [TUHS] Re: Trying to date "A Supplemental Document For Awk" 2023-06-28 23:47 ` Greg 'groggy' Lehey 2023-06-29 1:59 ` Stuff Received @ 2023-06-29 13:34 ` G. Branden Robinson 2023-06-29 13:47 ` Rich Salz 1 sibling, 1 reply; 20+ messages in thread From: G. Branden Robinson @ 2023-06-29 13:34 UTC (permalink / raw) To: Greg 'groggy' Lehey; +Cc: tuhs

[-- Attachment #1.1: Type: text/plain, Size: 686 bytes --]

At 2023-06-29T09:47:50+1000, Greg 'groggy' Lehey wrote:
> There's also grog (groff guess) that may help.  It's not very clever,
> but it recognizes a number of formats:
>
>   $ grog ls.1
>   groff -mdoc ls.1

I won't claim that grog is more clever now, but as of groff 1.23.0 it
is[1] avowedly less buggy.  It is also 52% of its former size (by
`wc -l`), has 14 bug fixes since groff 1.22.4 (with only a wish list
item remaining), sports an automated test suite, and the tool itself can
now be conveniently passed around as a single file--so I'm attaching it.

Regards,
Branden

[1] Will be.  We're up to release candidate 4 now.

    https://alpha.gnu.org/gnu/groff/

[-- Attachment #1.2: grog --]
[-- Type: text/plain, Size: 19221 bytes --]

#!/usr/bin/perl
# grog - guess options for groff command
# Inspired by doctype script in Kernighan & Pike, Unix Programming
# Environment, pp 306-8.

# Copyright (C) 1993-2021 Free Software Foundation, Inc.
# Written by James Clark.
# Rewritten in Perl by Bernd Warken <groff-bernd.warken-72@web.de>.
# Hacked up by G. Branden Robinson, 2021.

# This file is part of 'grog', which is part of 'groff'.

# 'groff' is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 2 of the License, or
# (at your option) any later version.

# 'groff' is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# General Public License for more details.

# You should have received a copy of the GNU General Public License
# along with this program.  If not, see
# <http://www.gnu.org/licenses/gpl-2.0.html>.

use warnings;
use strict;

use File::Spec;

my $groff_version = 'DEVELOPMENT';

my @command = ();		# the constructed groff command
my @requested_package = ();	# arguments to '-m' grog options
my @inferred_preprocessor = ();	# preprocessors the document uses
my @inferred_main_package = ();	# full-service package(s) detected
my $main_package;		# full-service package we go with
my $do_run = 0;			# run generated 'groff' command
my $use_compatibility_mode = 0;	# is -C being passed to groff?

my %preprocessor_for_macro = (
  'EQ', 'eqn',
  'G1', 'grap',
  'GS', 'grn',
  'PS', 'pic',
  '[', 'refer',
  #'so', 'soelim', # Can't be inferred this way; see grog man page.
  'TS', 'tbl',
  'cstart', 'chem',
  'lilypond', 'glilypond',
  'Perl', 'gperl',
  'pinyin', 'gpinyin',
);

my $program_name = $0;
{
  my ($v, $d, $f) = File::Spec->splitpath($program_name);
  $program_name = $f;
}

my %user_macro;
my %score = ();

my @input_file;

# .TH is both a man(7) macro and often used with tbl(1).  We expect to
# find .TH in ms(7) documents only between .TS and .TE calls, and in
# man(7) documents only as the first macro call.
my $have_seen_first_macro_call = 0;
# man(7) and ms(7) use many of the same macro names; do extra checking.
my $man_score = 0;
my $ms_score = 0;

my $had_inference_problem = 0;
my $had_processing_problem = 0;
my $have_any_valid_arguments = 0;


sub fail {
  my $text = shift;
  print STDERR "$program_name: error: $text\n";
  $had_processing_problem = 1;
}


sub warn {
  my $text = shift;
  print STDERR "$program_name: warning: $text\n";
}


sub process_arguments {
  my $no_more_options = 0;
  my $delayed_option = '';
  my $was_minus = 0;
  my $optarg = 0;
  my $pdf_with_ligatures = 0;

  foreach my $arg (@ARGV) {
    if ( $optarg ) {
      push @command, $arg;
      $optarg = 0;
      next;
    }

    if ($no_more_options) {
      push @input_file, $arg;
      next;
    }

    if ($delayed_option) {
      if ($delayed_option eq '-m') {
        push @requested_package, $arg;
        $arg = '';
      } else {
        push @command, $delayed_option;
      }

      push @command, $arg if $arg;
      $delayed_option = '';
      next;
    }

    unless ( $arg =~ /^-/ ) { # file name, no opt, no optarg
      push @input_file, $arg;
      next;
    }

    # now $arg starts with '-'

    if ($arg eq '-') {
      unless ($was_minus) {
        push @input_file, $arg;
        $was_minus = 1;
      }
      next;
    }

    if ($arg eq '--') {
      $no_more_options = 1;
      next;
    }

    # Handle options that cause an early exit.
    &version() if ($arg eq '-v' || $arg eq '--version');
    &usage(0) if ($arg eq '-h' || $arg eq '--help');

    if ($arg =~ '^--.') {
      if ($arg =~ '^--(run|with-ligatures)$') {
        $do_run = 1 if ($arg eq '--run');
        $pdf_with_ligatures = 1 if ($arg eq '--with-ligatures');
      } else {
        &fail("unrecognized grog option '$arg'; ignored");
        &usage(1);
      }
      next;
    }

    # Handle groff options that take an argument.

    # Handle the option argument being separated by whitespace.
    if ($arg =~ /^-[dfFIKLmMnoPrTwW]$/) {
      $delayed_option = $arg;
      next;
    }

    # Handle '-m' option without subsequent whitespace.
    if ($arg =~ /^-m/) {
      my $package = $arg;
      $package =~ s/-m//;
      push @requested_package, $package;
      next;
    }

    # Treat anything else as (possibly clustered) groff options that
    # take no arguments.

    # Our do_line() needs to know if it should do compatibility parsing.
    $use_compatibility_mode = 1 if ($arg =~ /C/);

    push @command, $arg;
  }

  if ($pdf_with_ligatures) {
    push @command, '-P-y';
    push @command, '-PU';
  }

  @input_file = ('-') unless (@input_file);
} # process_arguments()


sub process_input {
  foreach my $file (@input_file) {
    unless ( open(FILE, $file eq "-" ? $file : "< $file") ) {
      &fail("cannot open '$file': $!");
      next;
    }

    $have_any_valid_arguments = 1;

    while (my $line = <FILE>) {
      chomp $line;
      &do_line($line);
    }

    close(FILE);
  } # end foreach
} # process_input()


# Push item onto inferred full-service list only if not already present.
sub push_main_package {
  my $pkg = shift;
  if (!grep(/^$pkg/, @inferred_main_package)) {
    push @inferred_main_package, $pkg;
  }
} # push_main_package()


sub do_line {
  my $command;			# request or macro name
  my $args;			# request or macro arguments

  my $line = shift;

  # Check for a Perl Pod::Man comment.
  #
  # An alternative to this kludge is noted below: if a "standard" macro
  # is redefined, we could delete it from the relevant lists and
  # hashes.
  if ($line =~ /\\\" Automatically generated by Pod::Man/) {
    $man_score += 100;
  }

  # Strip comments.
  $line =~ s/\\".*//;
  $line =~ s/\\#.*// unless $use_compatibility_mode;

  return unless ($line =~ /^[.']/);	# Ignore text lines.

  # Perform preprocessor checks; they scan their inputs using a rump
  # interpretation of roff(7) syntax that requires the default control
  # character and no space between it and the macro name.  In AT&T
  # compatibility mode, no space (or newline!) is required after the
  # macro name, either.  We mimic the preprocessors themselves; eqn(1),
  # for instance, does not recognize '.EN' if '.EQ' has not been seen.
  my $boundary = '\\b';
  $boundary = '' if ($use_compatibility_mode);

  if ($line =~ /^\.(\S\S)$boundary/ || $line =~ /^\.(\[)/) {
    my $macro = $1;
    # groff identifiers can have extremely weird characters in them.
    # The ones we care about are conventionally named, but me(7)
    # documents can call macros like '+c', so quote carefully.
    if (grep(/^\Q$macro\E$/, keys %preprocessor_for_macro)) {
      my $preproc = $preprocessor_for_macro{$macro};
      if (!grep(/$preproc/, @inferred_preprocessor)) {
        push @inferred_preprocessor, $preproc;
      }
    }
  }

  # Normalize control lines; convert no-break control character to the
  # regular one and remove unnecessary whitespace.
  $line =~ s/^['.]\s*/./;
  $line =~ s/\s+$//;

  return if ($line =~ /^\.$/);		# Ignore empty request.
  return if ($line =~ /^\.\\?\.$/);	# Ignore macro definition ends.

  # Split control line into a request or macro call and its arguments.

  # Handle single-letter macro names.
  if ($line =~ /^\.(\S)(\s+(.*))?$/) {
    $command = $1;
    # Use $3, as in the multi-letter case below; $2 would include the
    # whitespace separating the name from its arguments.
    $args = $3;
  # Handle two-letter macro/request names in compatibility mode.
  } elsif ($use_compatibility_mode) {
    $line =~ /^\.(\S\S)\s*(.*)$/;
    $command = $1;
    $args = $2;
  # Handle multi-letter macro/request names in groff mode.
  } else {
    $line =~ /^\.(\S+)(\s+(.*))?$/;
    $command = $1;
    $args = $3;
  }

  $command = '' unless ($command);
  $args = '' unless ($args);

  ######################################################################
  # user-defined macros

  # If the line calls a user-defined macro, skip it.
  return if (exists $user_macro{$command});

  # These are all requests supported by groff 1.23.0.
  my @request = ('ab', 'ad', 'af', 'aln', 'als', 'am', 'am1', 'ami',
                 'ami1', 'as', 'as1', 'asciify', 'backtrace', 'bd',
                 'blm', 'box', 'boxa', 'bp', 'br', 'brp', 'break', 'c2',
                 'cc', 'ce', 'cf', 'cflags', 'ch', 'char', 'chop',
                 'class', 'close', 'color', 'composite', 'continue',
                 'cp', 'cs', 'cu', 'da', 'de', 'de1', 'defcolor', 'dei',
                 'dei1', 'device', 'devicem', 'di', 'do', 'ds', 'ds1',
                 'dt', 'ec', 'ecr', 'ecs', 'el', 'em', 'eo', 'ev',
                 'evc', 'ex', 'fam', 'fc', 'fchar', 'fcolor', 'fi',
                 'fp', 'fschar', 'fspecial', 'ft', 'ftr', 'fzoom',
                 'gcolor', 'hc', 'hcode', 'hla', 'hlm', 'hpf', 'hpfa',
                 'hpfcode', 'hw', 'hy', 'hym', 'hys', 'ie', 'if', 'ig',
                 'in', 'it', 'itc', 'kern', 'lc', 'length', 'linetabs',
                 'lf', 'lg', 'll', 'lsm', 'ls', 'lt', 'mc', 'mk', 'mso',
                 'msoquiet', 'na', 'ne', 'nf', 'nh', 'nm', 'nn', 'nop',
                 'nr', 'nroff', 'ns', 'nx', 'open', 'opena', 'os',
                 'output', 'pc', 'pev', 'pi', 'pl', 'pm', 'pn', 'pnr',
                 'po', 'ps', 'psbb', 'pso', 'ptr', 'pvs', 'rchar', 'rd',
                 'return', 'rfschar', 'rj', 'rm', 'rn', 'rnn', 'rr',
                 'rs', 'rt', 'schar', 'shc', 'shift', 'sizes', 'so',
                 'soquiet', 'sp', 'special', 'spreadwarn', 'ss',
                 'stringdown', 'stringup', 'sty', 'substring', 'sv',
                 'sy', 'ta', 'tc', 'ti', 'tkf', 'tl', 'tm', 'tm1',
                 'tmc', 'tr', 'trf', 'trin', 'trnt', 'troff', 'uf',
                 'ul', 'unformat', 'vpt', 'vs', 'warn', 'warnscale',
                 'wh', 'while', 'write', 'writec', 'writem');

  # Add user-defined macro names to %user_macro.
  #
  # Macros can also be defined with .dei{,1}, ami{,1}, but supporting
  # that would be a heavy lift for the benefit of users that probably
  # don't require grog's help.  --GBR
  if ($command =~ /^(de|am)1?$/) {
    my $name = $args;
    # Strip off any end macro.
    $name =~ s/\s+.*$//;
    # Handle special cases of macros starting with '[' or ']'.
    if ($name =~ /^[][]/) {
      delete $preprocessor_for_macro{'['};
    }
    # XXX: If the macro name shadows a standard macro name, maybe we
    # should delete the latter from our lists and hashes.  This might
    # depend on whether the document is trying to remain compatible
    # with an existing interface, or simply colliding with names they
    # don't care about (consider a raw roff document that defines 'PP').
    # --GBR
    $user_macro{$name} = 0 unless (exists $user_macro{$name});
    return;
  }

  # XXX: Handle .rm as well?

  # Ignore all other requests.  Again, macro names can contain Perl
  # regex metacharacters, so be careful.
  return if (grep(/^\Q$command\E$/, @request));
  # What remains must be a macro name.
  my $macro = $command;

  # Record whether this is the first macro call before setting the flag;
  # otherwise the TH check below could never succeed.
  my $is_first_macro_call = !$have_seen_first_macro_call;
  $have_seen_first_macro_call = 1;
  $score{$macro}++;

  ######################################################################
  # macro package (tmac)
  ######################################################################

  # man and ms share too many macro names for the following approach to
  # be fruitful for many documents; see &infer_man_or_ms_package.
  #
  # We can put one thumb on the scale, however.
  if ($is_first_macro_call && ($macro eq 'TH')) {
    # TH as the first call in a document screams man(7).
    $man_score += 100;
  }

  ##########
  # mdoc
  if ($macro =~ /^Dd$/) {
    &push_main_package('doc');
    return;
  }

  ##########
  # old mdoc
  if ($macro =~ /^(Tp|Dp|De|Cx|Cl)$/) {
    &push_main_package('doc-old');
    return;
  }

  ##########
  # me

  if ($macro =~ /^(
                    [ilnp]p|
                    n[12]|
                    sh
                  )$/x) {
    &push_main_package('e');
    return;
  }

  #############
  # mm and mmse

  if ($macro =~ /^(
                    H|
                    MULB|
                    LO|
                    LT|
                    NCOL|
                    PH|
                    SA
                  )$/x) {
    if ($macro =~ /^LO$/) {
      if ( $args =~ /^(DNAMN|MDAT|BIL|KOMP|DBET|BET|SIDOR)/ ) {
        &push_main_package('mse');
        return;
      }
    } elsif ($macro =~ /^LT$/) {
      if ( $args =~ /^(SVV|SVH)/ ) {
        &push_main_package('mse');
        return;
      }
    }
    &push_main_package('m');
    return;
  }

  ##########
  # mom

  # Every alternative needs its own '|' separator: under /x, two names
  # on adjacent lines would otherwise run together into one, and a
  # trailing '|' would add an empty alternative.
  if ($macro =~ /^(
                    ALD|
                    AUTHOR|
                    CHAPTER_TITLE|
                    CHAPTER|
                    COLLATE|
                    DOCHEADER|
                    DOCTITLE|
                    DOCTYPE|
                    DOC_COVER|
                    FAMILY|
                    FAM|
                    FT|
                    LEFT|
                    LL|
                    LS|
                    NEWPAGE|
                    NO_TOC_ENTRY|
                    PAGENUMBER|
                    PAGE|
                    PAGINATION|
                    PAPER|
                    PRINTSTYLE|
                    PT_SIZE|
                    START|
                    TITLE|
                    TOC_AFTER_HERE|
                    TOC|
                    T_MARGIN
                  )$/x) {
    &push_main_package('om');
    return;
  }
} # do_line()

my @preprocessor = ();


sub infer_preprocessors {
  my %option_for_preprocessor = (
    'eqn', '-e',
    'grap', '-G',
    'grn', '-g',
    'pic', '-p',
    'refer', '-R',
    #'soelim', '-s', # Can't be inferred this way; see grog man page.
    'tbl', '-t',
    'chem', '-j'
  );

  # Use a temporary list we can sort later.  We want the options to show
  # up in a stable order for testing purposes instead of the order their
  # macros turn up in the input.  groff doesn't care about the order.
  my @opt = ();

  foreach my $preproc (@inferred_preprocessor) {
    my $preproc_option = $option_for_preprocessor{$preproc};

    if ($preproc_option) {
      push @opt, $preproc_option;
    } else {
      push @preprocessor, $preproc;
    }
  }
  push @command, sort @opt;
} # infer_preprocessors()


# Return true (1) if either the man or ms package is inferred.
sub infer_man_or_ms_package {
  my @macro_ms = ('RP', 'TL', 'AU', 'AI', 'DA', 'ND', 'AB', 'AE',
                  'QP', 'QS', 'QE', 'XP',
                  'NH',
                  'R',
                  'CW',
                  'BX', 'UL', 'LG', 'NL',
                  'KS', 'KF', 'KE', 'B1', 'B2',
                  'DS', 'DE', 'LD', 'ID', 'BD', 'CD', 'RD',
                  'FS', 'FE',
                  'OH', 'OF', 'EH', 'EF', 'P1',
                  'TA', '1C', '2C', 'MC',
                  'XS', 'XE', 'XA', 'TC', 'PX',
                  'IX', 'SG');

  my @macro_man = ('BR', 'IB', 'IR', 'RB', 'RI', 'P', 'TH', 'TP', 'SS',
                   'HP', 'PD',
                   'AT', 'UC',
                   'SB',
                   'EE', 'EX',
                   'OP',
                   'MT', 'ME', 'SY', 'YS', 'TQ', 'UR', 'UE');

  my @macro_man_or_ms = ('B', 'I', 'BI',
                         'DT',
                         'RS', 'RE',
                         'SH',
                         'SM',
                         'IP', 'LP', 'PP');

  for my $key (@macro_man_or_ms, @macro_man, @macro_ms) {
    $score{$key} = 0 unless exists $score{$key};
  }

  # Compute a score for each package by counting occurrences of their
  # characteristic macros.
  foreach my $key (@macro_man_or_ms) {
    $man_score += $score{$key};
    $ms_score += $score{$key};
  }

  foreach my $key (@macro_man) {
    $man_score += $score{$key};
  }

  foreach my $key (@macro_ms) {
    $ms_score += $score{$key};
  }

  if (!$ms_score && !$man_score) {
    # The input may be a "raw" roff document; this is not a problem,
    # but it does mean no package was inferred.
    return 0;
  } elsif ($ms_score == $man_score) {
    # If there was no TH call, it's not a (valid) man(7) document.
    if (!$score{'TH'}) {
      &push_main_package('s');
    } else {
      &warn("document ambiguous; disambiguate with -man or -ms option");
      $had_inference_problem = 1;
    }
    return 0;
  } elsif ($ms_score > $man_score) {
    &push_main_package('s');
  } else {
    &push_main_package('an');
  }

  return 1;
} # infer_man_or_ms_package()


sub construct_command {
  my @main_package = ('an', 'doc', 'doc-old', 'e', 'm', 'om', 's');
  my $file_args_included;	# file args now only at 1st preproc
  unshift @command, 'groff';
  if (@preprocessor) {
    my @progs;
    $progs[0] = shift @preprocessor;
    push(@progs, @input_file);
    for (@preprocessor) {
      push @progs, '|';
      push @progs, $_;
    }
    push @progs, '|';
    unshift @command, @progs;
    $file_args_included = 1;
  } else {
    $file_args_included = 0;
  }

  foreach (@command) {
    next unless /\s/;
    # when one argument has several words, use accents
    $_ = "'" . $_ . "'";
  }

  my $have_ambiguous_main_package = 0;
  my $inferred_main_package_count = scalar @inferred_main_package;

  # Did we infer multiple full-service packages?
  if ($inferred_main_package_count > 1) {
    $have_ambiguous_main_package = 1;
    # For each one the user explicitly requested...
    for my $pkg (@requested_package) {
      # ...did it resolve the ambiguity for us?
      if (grep(/$pkg/, @inferred_main_package)) {
        @inferred_main_package = ($pkg);
        $have_ambiguous_main_package = 0;
        last;
      }
    }
  } elsif ($inferred_main_package_count == 1) {
    $main_package = shift @inferred_main_package;
  }

  if ($have_ambiguous_main_package) {
    # TODO: Alphabetical is probably not the best ordering here.  We
    # should tally up scores on a per-package basis generally, not just
    # for an and s.
    for my $pkg (@main_package) {
      if (grep(/$pkg/, @inferred_main_package)) {
        $main_package = $pkg;
        &warn("document ambiguous (choosing '$main_package'"
              . " from '@inferred_main_package'); disambiguate with -m"
              . " option");
        $had_inference_problem = 1;
        last;
      }
    }
  }

  # If a full-service package was explicitly requested, warn if the
  # inference differs from the request.  This also ensures that all -m
  # arguments are placed in the same order that the user gave them;
  # caveat dictator.
  my @auxiliary_package_argument = ();
  for my $pkg (@requested_package) {
    my $is_auxiliary_package = 1;
    if (grep(/$pkg/, @main_package)) {
      $is_auxiliary_package = 0;
      if ($pkg ne $main_package) {
        &warn("overriding inferred package '$main_package'"
              . " with requested package '$pkg'");
        $main_package = $pkg;
      }
    }
    if ($is_auxiliary_package) {
      push @auxiliary_package_argument, "-m" . $pkg;
    }
  }

  push @command, '-m' . $main_package if ($main_package);
  push @command, @auxiliary_package_argument;
  push @command, @input_file unless ($file_args_included);

  #########
  # execute the 'groff' command here with option '--run'
  if ( $do_run ) { # with --run
    print STDERR "@command\n";
    my $cmd = join ' ', @command;
    system($cmd);
  } else {
    print "@command\n";
  }
} # construct_command()


sub usage {
  my $stream = *STDOUT;
  my $had_error = shift;
  $stream = *STDERR if $had_error;
  my $grog = $program_name;
  # The option accepted by process_arguments() is '--with-ligatures'.
  print $stream "usage: $grog [--with-ligatures] [--run]" .
    " [groff-option ...] [--] [file ...]\n" .
    "usage: $grog {-v | --version}\n" .
    "usage: $grog {-h | --help}\n";
  unless ($had_error) {
    print $stream "\n" .
"Read each roff(7) input FILE and attempt to infer an appropriate\n" .
"groff(1) command to format it.  See the grog(1) manual page.\n";
  }
  exit $had_error;
}


sub version {
  print "GNU $program_name (groff) $groff_version\n";
  exit 0;
} # version()


# initialize

my $in_unbuilt_source_tree = 0;
{
  my $at = '@';
  $in_unbuilt_source_tree = 1
    if ('1.23.0.rc4.391-325a' eq "${at}VERSION${at}");
}

$groff_version = '1.23.0.rc4.391-325a' unless ($in_unbuilt_source_tree);

&process_arguments();
&process_input();

if ($have_any_valid_arguments) {
  &infer_preprocessors();
  &infer_man_or_ms_package() if (scalar @inferred_main_package != 1);
  &construct_command();
}

exit 2 if ($had_processing_problem);
exit 1 if ($had_inference_problem);
exit 0;

# Local Variables:
# fill-column: 72
# mode: CPerl
# End:
# vim: set cindent noexpandtab shiftwidth=2 softtabstop=2 textwidth=72:

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
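The attached script ends with three exit statuses: 2 after a processing problem (for example, a file that cannot be opened), 1 after an inference problem (an ambiguous document), and 0 otherwise. A minimal shell sketch of acting on those statuses might look like the following. The names `try_grog` and `GROG` are hypothetical and introduced here only for illustration; `GROG` is assumed to name a saved copy of the script, defaulting to a `grog` found on the PATH.

```shell
#!/bin/sh
# Sketch only: try_grog and GROG are illustrative names, not part of groff.
# GROG is assumed to name a grog command (default: "grog" on PATH).
try_grog() {
    # grog prints its guessed command on standard output; warnings and
    # errors go to standard error and pass through untouched.
    guess=$("${GROG:-grog}" "$@")
    status=$?
    case $status in
        0) printf 'guess: %s\n' "$guess" ;;
        1) printf 'ambiguous guess: %s\n' "$guess" ;;      # inference problem
        2) printf 'processing problem (guess, if any: %s)\n' "$guess" ;;
        *) printf 'unexpected exit status %s\n' "$status" ;;
    esac
    return "$status"
}
```

Calling `try_grog awkdoc` would then report the guessed groff command together with how much to trust it, rather than silently discarding the exit status.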
* [TUHS] Re: Trying to date "A Supplemental Document For Awk" 2023-06-29 13:34 ` G. Branden Robinson @ 2023-06-29 13:47 ` Rich Salz 2023-06-29 19:03 ` Steffen Nurpmeso 0 siblings, 1 reply; 20+ messages in thread From: Rich Salz @ 2023-06-29 13:47 UTC (permalink / raw) To: G. Branden Robinson; +Cc: tuhs

A Perl script to intuit likely roff options is definitely a neat Unix hack.
* [TUHS] Re: Trying to date "A Supplemental Document For Awk" 2023-06-29 13:47 ` Rich Salz @ 2023-06-29 19:03 ` Steffen Nurpmeso 0 siblings, 0 replies; 20+ messages in thread From: Steffen Nurpmeso @ 2023-06-29 19:03 UTC (permalink / raw) To: tuhs

Rich Salz wrote in
 <CAFH29toi4aFfGY7g+SndAPz6ndjk8j+LKZOGfnd2GQGnrNXKhw@mail.gmail.com>:
 |A Perl script to intuit likely roff options is definitely a neat Unix hack.

The "problem" is that the "shebang" line used for UNIX man'uals on
at least a few ("newer" <> post Y2K) systems has never been
extended in plain *roff terms, for general macro things.
For example, newer man(1)s read the first line of the manual and
check for the syntax <^'\" > followed by a concatenation of
[egprtv]+ (and in fact *join in* the $MANROFFSEQ environment
[egprtv]+):

  while getopts 'egprtv' preproc_arg; do
    case "${preproc_arg}" in
    e) pipeline="$pipeline | $EQN" ;;
    g) GRAP ;; # Ignore for compatibility.
    p) pipeline="$pipeline | $PIC" ;;
    r) pipeline="$pipeline | $REFER" ;;
    t) pipeline="$pipeline | $TBL" ;;
    v) pipeline="$pipeline | $VGRIND" ;;
    *) usage ;;
    esac
  done

Of course, most roffs do not have that "super process" that groff
actually is, for one, so you have to formulate pipelines anyway.
And then roff is dead for the young.  Generally speaking.

It is only a pity in my opinion because the most widely used
implementation (GNU roff) actually does "magic" already and
anyway, namely in its preconv(1), which does

  preconv tries to find the input encoding with the following
  algorithm.
  ...
  2. Otherwise, check whether the input starts with a Byte Order
     Mark (BOM, see below).  If found, use it.
  3. Otherwise, check whether there is a known coding tag (see
     below) in either the first or second input line.  If found,
     use it.
  ...
  5. If everything fails[.]

And 3. is then

  [.]supports the coding tag convention (with some restrictions)
  as used by GNU Emacs and XEmacs[.]
  ...
    .\" -*- mode: troff; coding: latin-2 -*-

But possibly the future brings not only integrative and truthful
western white men, but also a roff which "can".  The former i
doubt, the latter i can still hope for.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
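The first-line convention described in the message above can be sketched in shell. This is only an illustration under stated assumptions, not code from any particular man(1): `preproc_stages` is a hypothetical helper name, the preprocessor command names are plain `eqn`, `pic`, `refer`, `tbl`, and `vgrind` rather than the `$EQN`-style variables of the quoted excerpt, and the check is stricter than a real implementation may be (anything after the letters makes the line unrecognized).

```shell
#!/bin/sh
# Sketch only: preproc_stages is a hypothetical helper, not taken from any
# man(1) implementation.  It prints the preprocessor stages selected by a
# first line of the form  '\" <letters>  with letters drawn from [egprtv].
preproc_stages() {
    first=
    # Read only the first line; tolerate a missing trailing newline.
    IFS= read -r first < "$1" || :
    prefix="'\\\" "             # apostrophe, backslash, double quote, space
    case $first in
        "$prefix"*) letters=${first#"$prefix"} ;;
        *) return 0 ;;          # no preprocessor line: print nothing
    esac
    # Accept only a non-empty run of the known letters.
    case $letters in
        ''|*[!egprtv]*) return 0 ;;
    esac
    stages=
    while [ -n "$letters" ]; do
        rest=${letters#?}               # everything after the first letter
        letter=${letters%"$rest"}       # the first letter itself
        case $letter in
            e) stages="${stages:+$stages | }eqn" ;;
            g) ;;                       # ignored, as in the quoted excerpt
            p) stages="${stages:+$stages | }pic" ;;
            r) stages="${stages:+$stages | }refer" ;;
            t) stages="${stages:+$stages | }tbl" ;;
            v) stages="${stages:+$stages | }vgrind" ;;
        esac
        letters=$rest
    done
    printf '%s\n' "$stages"
}
```

A page whose first line is `'\" t` would yield a single `tbl` stage; the stages are emitted in the order the letters appear.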
* [TUHS] Re: Trying to date "A Supplemental Document For Awk" 2023-06-28 18:03 ` KenUnix 2023-06-28 18:38 ` Clem Cole @ 2023-06-29 1:04 ` Bakul Shah 1 sibling, 0 replies; 20+ messages in thread From: Bakul Shah @ 2023-06-29 1:04 UTC (permalink / raw) To: KenUnix; +Cc: TUHS

The presence of .AB, .AU etc. says you need nroff -ms.
But why even bother unless you plan to become an awkspert?

> On Jun 28, 2023, at 11:03 AM, KenUnix <ken.unix.guy@gmail.com> wrote:
>
> Guys,
>
> It's been too long. What would I use to compile this man page source?
>
> I do remember some option switches are required. Yes?
>
> Thanks
>
>
> On Wed, Jun 28, 2023 at 1:49 PM Adam Sampson <ats@offog.org> wrote:
> On Wed, Jun 28, 2023 at 09:26:02AM +0300, Aharon Robbins wrote:
>> Attached is "A Supplemental Document For Awk". This circulated on
>> USENET in the 80s. My copy is dated January 18, 1989, but I'm sure
>> it's older than that.
>
> In the utzoo Usenet archive, there are two versions of this document and
> a few mentions of it...
>
> John Pierce posted to comp.unix.questions on 1989-04-02, saying he'd
> written it "four or five years ago".
>
> Stu Heiss, in comp.unix.questions on 1989-03-06, said it was "posted to
> net.sources 18 Jun 86 with message-id 238@sdchema.sdchem.uucp".
> Unfortunately this isn't in the utzoo archive or the net.sources.mbox
> in archive.org's Usenet Historical Collection.
>
> A copy identical to yours was posted by Jim Harkins to
> comp.unix.questions on 1990-03-29.
>
> There's a later version, fixing a typo and some formatting and adding a
> mention of \f and \b in printf, which was posted by Brian Kantor to
> comp.doc on 1987-10-11 -- I've attached this. The same file (with two
> .bps commented out) was reposted in comp.unix.questions on 1989-11-16 by
> Francois-Michel Lang.
>
> Thanks,
>
> --
> Adam Sampson <ats@offog.org> <http://offog.org/>
>
>
> --
> End of line
> JOB TERMINATED -->> Okey Dokey, OK Boss
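The rule of thumb in the reply above (calls such as .AB and .AU point to the ms macros) can be sketched as a small shell check. This is a rough illustration, not a substitute for grog: `suggest_roff_command` is a hypothetical name, and the macro list is only a small sample of ms names taken from the opening lines of the awk document (.RP, .TL, .AU, .AI, .AB, .AE, .NH).

```shell
#!/bin/sh
# Sketch only: suggest_roff_command is a hypothetical helper.  The macro
# list is a small sample of ms(7) names, not an exhaustive test.
suggest_roff_command() {
    # Count control lines that call a few characteristic ms macros.
    ms_calls=$(grep -c -E '^\.(RP|TL|AU|AI|AB|AE|NH)([[:space:]]|$)' "$1")
    if [ "${ms_calls:-0}" -gt 0 ]; then
        printf 'nroff -ms %s\n' "$1"
    else
        # No sampled ms macro seen; suggest plain nroff without a package.
        printf 'nroff %s\n' "$1"
    fi
}
```

For the attachment discussed in this thread, saved as `awkdoc`, `suggest_roff_command awkdoc` would print `nroff -ms awkdoc`, since the file opens with .RP, .TL, and .AU calls.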
* [TUHS] Re: Trying to date "A Supplemental Document For Awk" 2023-06-28 6:26 [TUHS] Trying to date "A Supplemental Document For Awk" Aharon Robbins 2023-06-28 6:45 ` [TUHS] " arnold 2023-06-28 17:48 ` Adam Sampson @ 2023-06-29 0:26 ` Jeremy C. Reed 2 siblings, 0 replies; 20+ messages in thread From: Jeremy C. Reed @ 2023-06-29 0:26 UTC (permalink / raw) To: Aharon Robbins; +Cc: tuhs

I found a copy from 1986 in usenix89/Lang/Awk_doc/:STUFF (the file is
called :STUFF) from the tarball usenix878889.tar.gz

I didn't check, but I assume it is the one here:
https://www.tuhs.org/Archive/Applications/Shoppa_Tapes/

Path: plus5!wuphys!wucs!we53!ltuxa!cuae2!ihnp4!mhuxn!mhuxr!ulysses!ucbvax!sdcsvax!sdchem!jwp
From: jwp@sdchem.UUCP (John Pierce)
Newsgroups: net.sources
Subject: Awk document
Message-ID: <238@sdchema.sdchem.UUCP>
Date: 18 Jun 86 20:04:32 GMT
Reply-To: jwp@sdchem.UUCP (John Pierce)
Organization: Chemistry Dept, UC San Diego
Lines: 743
Posted: Wed Jun 18 15:04:32 1986