From: Colin Watson
To: Rich Felker
Cc: "A. Wilcox", musl@lists.openwall.com, man-db-devel@nongnu.org
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Re: man-db 2.7.6.1: Test failures under musl libc
Date: Sat, 26 Aug 2017 16:13:23 +0100
Message-ID: <20170826151323.llqvlpwqkiv4lmhp@riva.ucam.org>
In-Reply-To: <20170826132808.GX1627@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com

On Sat, Aug 26, 2017 at 09:28:08AM -0400, Rich Felker wrote:
> On Sat, Aug 26, 2017 at 01:04:26PM +0100, Colin Watson wrote:
> > man-db can't reasonably do without //IGNORE, certainly not if you want
> > reliability.  Can you try building man-db with GNU libiconv?  The build
> > system uses AM_ICONV already, so should have enough options to let you
> > do this.
> >
> > (I'd take a patch to the build system to have it detect this situation
> > and emit an error earlier if //IGNORE isn't available.)
>
> Can you explain? This seems wrong; maybe I misunderstand //IGNORE but
> I can't come up with any plausible scenario where a conversion with
> //IGNORE would produce usable output.

No, it definitely did help in some cases.  Here's the NEWS entry from
when I added that:

   o apropos, lexgrog, man, mandb, and whatis ignore encoding conversion
     errors for the last possible encoding of the source page. This helps,
     for example, with pages including misencoded non-ASCII names of
     authors; it usually seems better to allow these pages to pass with
     small errors than to break them entirely.

That was nine years ago so I no longer have specific examples to hand,
but that's the sort of thing my past self wouldn't have bothered doing
without having run into it in practice. :-)  I seem to remember the case
of non-ASCII authors' names in otherwise-ASCII pages being quite common,
and especially back then the toolchain wasn't always happy to accept
UTF-8 at every stage in every environment.

(This is all after manconv has made its best guess as to the input
encoding using stricter checks; the choice at this point is normally
between mostly-correct output or an error.  For many programs I agree
that an error would be more appropriate, but for a program whose job is
to display documentation I prefer to make a best effort to do so.)

This is actually a bit less critical than I remembered.  I still think
it's worthwhile in general, but I'd also take a patch to use //IGNORE
only when an iconv implementation that supports it is in use.
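
As an illustration of that last point, here is a minimal sketch in C of
one way such a check could be made at run time.  It is not taken from
man-db: the helper names iconv_supports_ignore and open_lenient are
hypothetical, and the probe assumes that an implementation honouring
//IGNORE drops an unrepresentable character from the output rather than
substituting something for it.

```c
#include <iconv.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of man-db: returns 1 if the iconv
 * implementation in use appears to honour //IGNORE by silently dropping
 * unrepresentable characters, 0 otherwise. */
int iconv_supports_ignore(void)
{
    /* "a", then U+00E9 encoded as UTF-8, then "b".  The literal is split
     * so that 'b' is not parsed as part of the hex escape. */
    char input[] = "a\xc3\xa9" "b";
    char output[16];
    char *in = input, *out = output;
    size_t inleft = strlen(input), outleft = sizeof output;
    size_t written;

    iconv_t cd = iconv_open("ASCII//IGNORE", "UTF-8");
    if (cd == (iconv_t) -1)
        return 0;               /* the suffix was rejected outright */

    /* Implementations may differ in the value returned after skipping a
     * character, so judge by the bytes written, not the return value. */
    (void) iconv(cd, &in, &inleft, &out, &outleft);
    iconv_close(cd);

    written = sizeof output - outleft;
    return inleft == 0 && written == 2 && memcmp(output, "ab", 2) == 0;
}

/* Hypothetical helper: open a conversion to `to` from `from`, asking for
 * //IGNORE only when the probe above suggests it works. */
iconv_t open_lenient(const char *to, const char *from)
{
    char target[128];

    if (iconv_supports_ignore()) {
        int n = snprintf(target, sizeof target, "%s//IGNORE", to);
        if (n > 0 && (size_t) n < sizeof target) {
            iconv_t cd = iconv_open(target, from);
            if (cd != (iconv_t) -1)
                return cd;
        }
    }
    /* Fall back to a strict conversion without the suffix. */
    return iconv_open(to, from);
}
```

A caller would use open_lenient("UTF-8", "ISO-8859-1") in place of a
direct iconv_open call and compare the result against (iconv_t) -1 as
usual.  Probing with a real conversion, rather than only checking that
iconv_open accepts the suffix, is a design choice made here on the
assumption that accepting the name does not by itself show that
unrepresentable characters are skipped.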

> Also please be aware that the encoding on a system using musl is
> always UTF-8 (musl only supports UTF-8 locales), so conversion of
> man pages to another locale that can't represent their contents is
> out-of-scope.

Well, you also have the C locale which isn't really true UTF-8.  But
anyway, as noted above, the use of //IGNORE here is not intended for the
case where we are totally unable to represent any of the contents, but
rather for the case of small unrepresentable sections.

-- 
Colin Watson                                       [cjwatson@debian.org]