From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from smtp-2.sys.kth.se (smtp-2.sys.kth.se [130.237.32.160])
    by krisdoz.my.domain (8.14.3/8.14.3) with ESMTP id o66N8Frc010363
    for ; Tue, 6 Jul 2010 19:08:16 -0400 (EDT)
Received: from smtp-2.sys.kth.se (localhost [127.0.0.1])
    by smtp-2.sys.kth.se (Postfix) with ESMTP id 8C0BF14C146;
    Wed, 7 Jul 2010 01:08:09 +0200 (CEST)
X-Virus-Scanned: by amavisd-new at kth.se
Received: from smtp-2.sys.kth.se ([127.0.0.1])
    by smtp-2.sys.kth.se (smtp-2.sys.kth.se [127.0.0.1]) (amavisd-new, port 10024)
    with LMTP id WzY+94++4q+L; Wed, 7 Jul 2010 01:08:08 +0200 (CEST)
X-KTH-Auth: kristaps [85.8.61.156]
X-KTH-mail-from: kristaps@bsd.lv
Received: from lappy.bsd.lv (h85-8-61-156.dynamic.se.alltele.net [85.8.61.156])
    by smtp-2.sys.kth.se (Postfix) with ESMTP id DF1F114C12F;
    Wed, 7 Jul 2010 01:08:05 +0200 (CEST)
Message-ID: <4C33B754.2010609@bsd.lv>
Date: Wed, 07 Jul 2010 01:08:04 +0200
From: Kristaps Dzonsons
User-Agent: Thunderbird 2.0.0.16 (X11/20080812)
X-Mailinglist: mdocml-tech
Reply-To: tech@mdocml.bsd.lv
MIME-Version: 1.0
To: "tech@mdocml.bsd.lv" , Jason McIntyre
Subject: roff_getstr() and input characters
Content-Type: multipart/mixed;
    boundary="------------090407060106070609080306"

This is a multi-part message in MIME format.
--------------090407060106070609080306
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

Hi,

(Jason, the bits I'd like you to weigh in on are a few paragraphs down.)

Enclosed is a patch pushing the roff_getstr functionality directly into libmdoc. It works by testing against roff_getstr() in-band and splicing together a new buffer if necessary.

I thought about putting the entire mandoc_special() check in libroff, but don't want to cause yet another scan over the line buffer. check_text() needs to warn against '\t' and '\b' anyway. This is an open question I'll answer later when I start looking at performance.

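To make the "splicing together a new buffer" step concrete, here is a minimal sketch in plain C. It is an illustration only, not the code in the attached patch: the helper name splice() and its parameters are hypothetical, and it uses memcpy() rather than the strlcat() calls a BSD tree would more likely use. It assumes the caller already knows the offset and length of the escape sequence and has the replacement string in hand (for instance, from a lookup like roff_getstr()).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): replace the `esclen`
 * bytes starting at offset `off` in the heap-allocated string *pp
 * with the NUL-terminated string `res`.
 *
 * On success, *pp points to a newly allocated string, the old one is
 * freed, and the return value is the offset of the first byte after
 * the inserted text, so a caller can resume scanning there.
 * On allocation failure, *pp is left untouched and (size_t)-1 is
 * returned.
 */
static size_t
splice(char **pp, size_t off, size_t esclen, const char *res)
{
	size_t	 oldsz, ressz, tailsz;
	char	*cp;

	oldsz = strlen(*pp);
	ressz = strlen(res);

	/* The escape must lie entirely inside the old string. */
	assert(off + esclen <= oldsz);
	tailsz = oldsz - off - esclen;

	cp = malloc(off + ressz + tailsz + 1);
	if (NULL == cp)
		return (size_t)-1;

	/* Text before the escape. */
	memcpy(cp, *pp, off);
	/* The replacement text. */
	memcpy(cp + off, res, ressz);
	/* Text after the escape, including the terminating NUL. */
	memcpy(cp + off + ressz, *pp + off + esclen, tailsz + 1);

	free(*pp);
	*pp = cp;
	return off + ressz;
}
```

Returning the resume offset rather than a pointer keeps the caller's position valid across the reallocation, which is the same bookkeeping problem any in-band replacement has to solve after swapping the buffer.
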
The reason I want to air it with you (I know it works: I've tested it across all manuals) is because it also removes the check for isprint(), using strcspn() instead. As you can see, the rej filter is only for '\b', which we must prohibit else we boff output encoding; '\t' for non-literals (warning); and '\\' for the specials check. I argue for lifting the ASCII-constraint because (1) there's nothing in mdoc/groff/etc that disallows non-ASCII (e.g., Latin-1) characters and (2) it makes the code much cleaner. Thoughts? Kristaps PS, the patch doesn't mandate '\b': I just caught that now and will fix it later. --------------090407060106070609080306 Content-Type: text/plain; name="patch.txt" Content-Transfer-Encoding: base64 Content-Disposition: inline; filename="patch.txt" PyBET05UREVMRVRFLmMKPyBjb25maWcuaAo/IGNvbmZpZy5sb2cKPyBmb28uMQo/IGZvby4x Lmh0bWwKPyBtYW5kb2MKPyBtYW5kb2MuY29yZQo/IG1kb2MuNy5wZGYKPyBwYXRjaC50eHQK PyBzc2guMS5odG1sCj8gdXNlci44Cj8gcmVncmVzcy9tYW5kb2MuY29yZQo/IHJlZ3Jlc3Mv b3V0cHV0CkluZGV4OiBsaWJtYW5kb2MuaAo9PT09PT09PT09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09ClJDUyBmaWxlOiAvdXNy L3Zob3N0cy9tZG9jbWwuYnNkLmx2L2N2cy9tZG9jbWwvbGlibWFuZG9jLmgsdgpyZXRyaWV2 aW5nIHJldmlzaW9uIDEuOApkaWZmIC11IC1yMS44IGxpYm1hbmRvYy5oCi0tLSBsaWJtYW5k b2MuaAkxOSBKdW4gMjAxMCAyMDo0NjoyNyAtMDAwMAkxLjgKKysrIGxpYm1hbmRvYy5oCTYg SnVsIDIwMTAgMjM6MDY6MDIgLTAwMDAKQEAgLTE5LDcgKzE5LDcgQEAKIAogX19CRUdJTl9E RUNMUwogCi1pbnQJCSBtYW5kb2Nfc3BlY2lhbChjaGFyICopOworaW50CQkgbWFuZG9jX3Nw ZWNpYWwoY2hhciAqLCBjaGFyICoqLCBzaXplX3QgKik7CiB2b2lkCQkqbWFuZG9jX2NhbGxv YyhzaXplX3QsIHNpemVfdCk7CiBjaGFyCQkqbWFuZG9jX3N0cmR1cChjb25zdCBjaGFyICop Owogdm9pZAkJKm1hbmRvY19tYWxsb2Moc2l6ZV90KTsKSW5kZXg6IG1hbl92YWxpZGF0ZS5j Cj09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT0KUkNTIGZpbGU6IC91c3Ivdmhvc3RzL21kb2NtbC5ic2QubHYvY3Zz L21kb2NtbC9tYW5fdmFsaWRhdGUuYyx2CnJldHJpZXZpbmcgcmV2aXNpb24gMS40NQpkaWZm 
IC11IC1yMS40NSBtYW5fdmFsaWRhdGUuYwotLS0gbWFuX3ZhbGlkYXRlLmMJMjggSnVuIDIw MTAgMTQ6Mzk6MTcgLTAwMDAJMS40NQorKysgbWFuX3ZhbGlkYXRlLmMJNiBKdWwgMjAxMCAy MzowNjowMiAtMDAwMApAQCAtMjA0LDE0ICsyMDQsMTUgQEAKIHN0YXRpYyBpbnQKIGNoZWNr X3RleHQoQ0hLQVJHUykgCiB7Ci0JY2hhcgkJKnA7CisJY2hhcgkJKnAsICpzcGVjOworCXNp emVfdAkJIHNwZWNzejsKIAlpbnQJCSBwb3MsIGM7CiAKIAlhc3NlcnQobi0+c3RyaW5nKTsK IAogCWZvciAocCA9IG4tPnN0cmluZywgcG9zID0gbi0+cG9zICsgMTsgKnA7IHArKywgcG9z KyspIHsKIAkJaWYgKCdcXCcgPT0gKnApIHsKLQkJCWMgPSBtYW5kb2Nfc3BlY2lhbChwKTsK KwkJCWMgPSBtYW5kb2Nfc3BlY2lhbChwLCAmc3BlYywgJnNwZWNzeik7CiAJCQlpZiAoYykg ewogCQkJCXAgKz0gYyAtIDE7CiAJCQkJcG9zICs9IGMgLSAxOwpJbmRleDogbWFuZG9jLmMK PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PQpSQ1MgZmlsZTogL3Vzci92aG9zdHMvbWRvY21sLmJzZC5sdi9jdnMv bWRvY21sL21hbmRvYy5jLHYKcmV0cmlldmluZyByZXZpc2lvbiAxLjIxCmRpZmYgLXUgLXIx LjIxIG1hbmRvYy5jCi0tLSBtYW5kb2MuYwk2IEp1bCAyMDEwIDIyOjA0OjMxIC0wMDAwCTEu MjEKKysrIG1hbmRvYy5jCTYgSnVsIDIwMTAgMjM6MDY6MDIgLTAwMDAKQEAgLTEsNCArMSw0 IEBACi0vKgkkSWQ6IG1hbmRvYy5jLHYgMS4yMSAyMDEwLzA3LzA2IDIyOjA0OjMxIGtyaXN0 YXBzIEV4cCAkICovCisvKgkkSWQ6IGxpYm1hbmRvYy5jLHYgMS4xIDIwMTAvMDcvMDUgMjA6 MDA6NTUga3Jpc3RhcHMgRXhwICQgKi8KIC8qCiAgKiBDb3B5cmlnaHQgKGMpIDIwMDgsIDIw MDkgS3Jpc3RhcHMgRHpvbnNvbnMgPGtyaXN0YXBzQGJzZC5sdj4KICAqCkBAIC01Miw3ICs1 Miw3IEBACiAKIAogaW50Ci1tYW5kb2Nfc3BlY2lhbChjaGFyICpwKQorbWFuZG9jX3NwZWNp YWwoY2hhciAqcCwgY2hhciAqKnYsIHNpemVfdCAqdnN6KQogewogCWludAkJIHRlcm1pbmF0 b3I7CS8qIFRlcm1pbmF0b3IgZm9yIFxzLiAqLwogCWludAkJIGxpbTsJCS8qIExpbWl0IGZv ciBOIGluIFxzLiAqLwpAQCAtNjAsNiArNjAsOCBAQAogCWNoYXIJCSpzdjsKIAkKIAlzdiA9 IHA7CisJKnYgPSBOVUxMOworCSp2c3ogPSAwOwogCiAJaWYgKCdcXCcgIT0gKnArKykKIAkJ cmV0dXJuKHNwZWNfbm9ybShzdiwgMCkpOwpAQCAtMTgxLDggKzE4MywxMiBAQAogCWNhc2Ug KCcqJyk6CiAJCWlmICgnXDAnID09ICorK3AgfHwgaXNzcGFjZSgodV9jaGFyKSpwKSkKIAkJ CXJldHVybihzcGVjX25vcm0oc3YsIDApKTsKKwkJKnYgPSBwICsgMTsKIAkJc3dpdGNoICgq cCkgewogCQljYXNlICgnKCcpOgorCQkJKnZzeiA9IDI7CisJCQlpZiAoJ1wwJyA9PSAqKytw 
IHx8IGlzc3BhY2UoKHVfY2hhcikqcCkpCisJCQkJcmV0dXJuKHNwZWNfbm9ybShzdiwgMCkp OwogCQkJaWYgKCdcMCcgPT0gKisrcCB8fCBpc3NwYWNlKCh1X2NoYXIpKnApKQogCQkJCXJl dHVybihzcGVjX25vcm0oc3YsIDApKTsKIAkJCXJldHVybihzcGVjX25vcm0oc3YsIDQpKTsK QEAgLTE5MCwxMCArMTk2LDEyIEBACiAJCQlmb3IgKGMgPSAzLCBwKys7ICpwICYmICddJyAh PSAqcDsgcCsrLCBjKyspCiAJCQkJaWYgKGlzc3BhY2UoKHVfY2hhcikqcCkpCiAJCQkJCWJy ZWFrOworCQkJKnZzeiA9IChzaXplX3QpYyAtIDM7CiAJCQlyZXR1cm4oc3BlY19ub3JtKHN2 LCAqcCA9PSAnXScgPyBjIDogMCkpOwogCQlkZWZhdWx0OgogCQkJYnJlYWs7CiAJCX0KKwkJ KnZzeiA9IDE7CiAJCXJldHVybihzcGVjX25vcm0oc3YsIDMpKTsKIAljYXNlICgnKCcpOgog CQlpZiAoJ1wwJyA9PSAqKytwIHx8IGlzc3BhY2UoKHVfY2hhcikqcCkpCkluZGV4OiBtZG9j X3ZhbGlkYXRlLmMKPT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PQpSQ1MgZmlsZTogL3Vzci92aG9zdHMvbWRvY21s LmJzZC5sdi9jdnMvbWRvY21sL21kb2NfdmFsaWRhdGUuYyx2CnJldHJpZXZpbmcgcmV2aXNp b24gMS4xMDkKZGlmZiAtdSAtcjEuMTA5IG1kb2NfdmFsaWRhdGUuYwotLS0gbWRvY192YWxp ZGF0ZS5jCTQgSnVsIDIwMTAgMjE6NTk6MzAgLTAwMDAJMS4xMDkKKysrIG1kb2NfdmFsaWRh dGUuYwk2IEp1bCAyMDEwIDIzOjA2OjAyIC0wMDAwCkBAIC00Nyw3ICs0Nyw3IEBACiAKIHN0 YXRpYwlpbnQJIGNoZWNrX3BhcmVudChQUkVfQVJHUywgZW51bSBtZG9jdCwgZW51bSBtZG9j X3R5cGUpOwogc3RhdGljCWludAkgY2hlY2tfc3RkYXJnKFBSRV9BUkdTKTsKLXN0YXRpYwlp bnQJIGNoZWNrX3RleHQoc3RydWN0IG1kb2MgKiwgaW50LCBpbnQsIGNoYXIgKik7CitzdGF0 aWMJaW50CSBjaGVja190ZXh0KHN0cnVjdCBtZG9jICosIGludCwgaW50LCBjaGFyICoqKTsK IHN0YXRpYwlpbnQJIGNoZWNrX2FyZ3Yoc3RydWN0IG1kb2MgKiwgCiAJCQlzdHJ1Y3QgbWRv Y19ub2RlICosIHN0cnVjdCBtZG9jX2FyZ3YgKik7CiBzdGF0aWMJaW50CSBjaGVja19hcmdz KHN0cnVjdCBtZG9jICosIHN0cnVjdCBtZG9jX25vZGUgKik7CkBAIC0yNzUsMTMgKzI3NSwx MSBAQAogewogCXZfcHJlCQkqcDsKIAlpbnQJCSBsaW5lLCBwb3M7Ci0JY2hhcgkJKnRwOwog CiAJaWYgKE1ET0NfVEVYVCA9PSBuLT50eXBlKSB7Ci0JCXRwID0gbi0+c3RyaW5nOwogCQls aW5lID0gbi0+bGluZTsKIAkJcG9zID0gbi0+cG9zOwotCQlyZXR1cm4oY2hlY2tfdGV4dCht ZG9jLCBsaW5lLCBwb3MsIHRwKSk7CisJCXJldHVybihjaGVja190ZXh0KG1kb2MsIGxpbmUs IHBvcywgJm4tPnN0cmluZykpOwogCX0KIAogCWlmICggISBjaGVja19hcmdzKG1kb2MsIG4p 
KQpAQCAtNDM5LDcgKzQzNyw3IEBACiAJaW50CQkgaTsKIAogCWZvciAoaSA9IDA7IGkgPCAo aW50KXYtPnN6OyBpKyspCi0JCWlmICggISBjaGVja190ZXh0KG0sIHYtPmxpbmUsIHYtPnBv cywgdi0+dmFsdWVbaV0pKQorCQlpZiAoICEgY2hlY2tfdGV4dChtLCB2LT5saW5lLCB2LT5w b3MsICZ2LT52YWx1ZVtpXSkpCiAJCQlyZXR1cm4oMCk7CiAKIAlpZiAoTURPQ19TdGQgPT0g di0+YXJnKSB7CkBAIC00NTQsNDMgKzQ1Miw5NSBAQAogCiAKIHN0YXRpYyBpbnQKLWNoZWNr X3RleHQoc3RydWN0IG1kb2MgKm1kb2MsIGludCBsaW5lLCBpbnQgcG9zLCBjaGFyICpwKQor Y2hlY2tfdGV4dChzdHJ1Y3QgbWRvYyAqbSwgaW50IGxuLCBpbnQgcG9zLCBjaGFyICoqcHAp CiB7CiAJaW50CQkgYzsKKwlzaXplX3QJCSBzeiwgc3BlY3N6LCBjcHN6OworCWNoYXIJCSpw LCAqc3BlYywgKmNwOworCWNvbnN0IGNoYXIJKnJlczsKKworCWZvciAocCA9ICpwcDsgKnA7 IHArKywgcG9zKyspIHsKKwkJc3ogPSBzdHJjc3BuKHAsICJcdFxiXFwiKTsKKworCQlwICs9 IChpbnQpc3o7CisKKwkJaWYgKCdcMCcgPT0gKnApCisJCQlicmVhazsKKworCQlwb3MgKz0g KGludClzejsKKworCQkvKgorCQkgKiBGaWx0ZXIgYmFja3NwYWNlIChub3QgYWxsb3dlZCwg YXMgaXQgd2lsbCBzY3JldyB1cAorCQkgKiBvdXIgb3V0cHV0IGZvcm1hdHRpbmcpIGFuZCB0 YWJzLCB3aGljaCBhcmUgb25seQorCQkgKiBzdWdnZXN0ZWQgaW4gbGl0ZXJhbCBjb250ZXh0 cy4gIEFsc28gaGFsdCBhdCBlc2NhcGVzCisJCSAqIHNvIHdlIGNhbiBjaGVjayB0aGF0IHRo ZXkncmUgYWNjZXB0YWJsZS4KKwkJICovCisKKwkJc3dpdGNoICgqcCkgeworCQljYXNlICgn XHQnKToKKwkJCWlmIChNRE9DX0xJVEVSQUwgJiBtLT5mbGFncykKKwkJCQljb250aW51ZTsK KwkJCS8qIEZBTExUSFJPVUdIICovCisJCWNhc2UgKCdcYicpOgorCQkJaWYgKG1kb2NfcG1z ZyhtLCBsbiwgcG9zLCBNQU5ET0NFUlJfQkFEQ0hBUikpCisJCQkJY29udGludWU7CisJCQly ZXR1cm4oMCk7CisJCWRlZmF1bHQ6CisJCQlicmVhazsKKwkJfQorCisJCS8qIENoZWNrIHRo ZSBzcGVjaWFsIGNoYXJhY3Rlci4gKi8KIAotCS8qIAotCSAqIEZJWE1FOiB3ZSBhYnNvbHV0 ZWx5IGNhbm5vdCBsZXQgXGIgZ2V0IHRocm91Z2ggb3IgaXQgd2lsbAotCSAqIGRlc3Ryb3kg c29tZSBhc3N1bXB0aW9ucyBpbiB0ZXJtcyBvZiBmb3JtYXQuCi0JICovCi0KLQlmb3IgKCA7 ICpwOyBwKyssIHBvcysrKSB7Ci0JCWlmICgnXHQnID09ICpwKSB7Ci0JCQlpZiAoICEgKE1E T0NfTElURVJBTCAmIG1kb2MtPmZsYWdzKSkKLQkJCQlpZiAoICEgbWRvY19wbXNnKG1kb2Ms IGxpbmUsIHBvcywgTUFORE9DRVJSX0JBRENIQVIpKQotCQkJCQlyZXR1cm4oMCk7Ci0JCX0g ZWxzZSBpZiAoICEgaXNwcmludCgodV9jaGFyKSpwKSAmJiBBU0NJSV9IWVBIICE9ICpwKQot 
CQkJaWYgKCAhIG1kb2NfcG1zZyhtZG9jLCBsaW5lLCBwb3MsIE1BTkRPQ0VSUl9CQURDSEFS KSkKKwkJYyA9IG1hbmRvY19zcGVjaWFsKHAsICZzcGVjLCAmc3BlY3N6KTsKKworCQlpZiAo MCA9PSBjKSB7CisJCQljID0gbWRvY19wbXNnKG0sIGxuLCBwb3MsIE1BTkRPQ0VSUl9CQURF U0NBUEUpOworCQkJaWYgKCAhIChNRE9DX0lHTl9FU0NBUEUgJiBtLT5wZmxhZ3MpICYmICEg YykKIAkJCQlyZXR1cm4oMCk7CisJCQljb250aW51ZTsKKwkJfQogCi0JCWlmICgnXFwnICE9 ICpwKQorCQlpZiAoTlVMTCA9PSBzcGVjKSB7CisJCQlwICs9IGMgLSAxOworCQkJcG9zICs9 IGMgLSAxOwogCQkJY29udGludWU7CisJCX0KKworCQkvKiBSZXNlcnZlZCB3b3JkLiAgV2Fz IGl0IGRlZmluZWQgdXNpbmcgYGRzJz8gKi8KIAotCQljID0gbWFuZG9jX3NwZWNpYWwocCk7 Ci0JCWlmIChjKSB7CisJCWlmIChOVUxMID09IChyZXMgPSByb2ZmX2dldHN0cm4oc3BlYywg c3BlY3N6KSkpIHsKKwkJCWMgPSBtZG9jX3Btc2cobSwgbG4sIHBvcywgTUFORE9DRVJSX0JB REVTQ0FQRSk7CisJCQlpZiAoICEgKE1ET0NfSUdOX0VTQ0FQRSAmIG0tPnBmbGFncykgJiYg ISBjKQorCQkJCXJldHVybigwKTsKIAkJCXAgKz0gYyAtIDE7CiAJCQlwb3MgKz0gYyAtIDE7 CiAJCQljb250aW51ZTsKIAkJfQogCi0JCWMgPSBtZG9jX3Btc2cobWRvYywgbGluZSwgcG9z LCBNQU5ET0NFUlJfQkFERVNDQVBFKTsKLQkJaWYgKCAhIChNRE9DX0lHTl9FU0NBUEUgJiBt ZG9jLT5wZmxhZ3MpICYmICEgYykKLQkJCXJldHVybihjKTsKKwkJLyogUmVwbGFjZSB0aGUg cm9mZi1kZWZpbmVkIHN0cmluZyB3aXRoIG91ciBvd24uICovCisKKwkJY3BzeiA9IHN0cmxl bihyZXMpICsgc3RybGVuKCpwcCkgKyAxOworCQljcCA9IG1hbmRvY19tYWxsb2MoY3Bzeik7 CisJCSpjcCA9ICdcMCc7CisKKwkJLyogRm9yY2Ugb25seSBwIC0gKnBwICsgJ1wwJyBjaGFy cy4gKi8KKwkJc3RybGNhdChjcCwgKnBwLCAoc2l6ZV90KShwIC0gKnBwICsgMSkpOworCQlz dHJsY2F0KGNwLCByZXMsIGNwc3opOworCQlzdHJsY2F0KGNwLCBwICsgYyArIDEsIGNwc3op OworCisJCWNwc3ogPSAoc2l6ZV90KShwIC0gKnBwKTsKKworCQlmcmVlKCpwcCk7CisJCSpw cCA9IGNwOworCisJCS8qIFJlbWVtYmVyIHRvIHJlYWRqdXN0IG91ciBwb3NpdGlvbi4gKi8K KworCQlwID0gKnBwICsgKGludCljcHN6IC0gMTsKKwkJcG9zID0gKGludCljcHN6IC0gMTsK IAl9CiAKIAlyZXR1cm4oMSk7CiB9Ci0KLQogCiAKIHN0YXRpYyBpbnQKSW5kZXg6IHRlcm0u Ywo9PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09ClJDUyBmaWxlOiAvdXNyL3Zob3N0cy9tZG9jbWwuYnNkLmx2L2N2 cy9tZG9jbWwvdGVybS5jLHYKcmV0cmlldmluZyByZXZpc2lvbiAxLjE1OQpkaWZmIC11IC1y 
MS4xNTkgdGVybS5jCi0tLSB0ZXJtLmMJNCBKdWwgMjAxMCAyMjowNDowNCAtMDAwMAkxLjE1 OQorKysgdGVybS5jCTYgSnVsIDIwMTAgMjM6MDY6MDIgLTAwMDAKQEAgLTM3OSwxMSArMzc5 LDYgQEAKIAlzaXplX3QJCSBzejsKIAogCXJocyA9IGNoYXJzX2EycmVzKHAtPnN5bXRhYiwg d29yZCwgbGVuLCAmc3opOwotCWlmIChOVUxMID09IHJocykgewotCQlyaHMgPSByb2ZmX2dl dHN0cm4od29yZCwgbGVuKTsKLQkJaWYgKHJocykKLQkJCXN6ID0gc3RybGVuKHJocyk7Ci0J fQogCWlmIChyaHMpCiAJCWVuY29kZShwLCByaHMsIHN6KTsKIH0K --------------090407060106070609080306-- -- To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv