From mboxrd@z Thu Jan  1 00:00:00 1970
From: random832@fastmail.com (Random832)
Date: Mon, 28 Mar 2016 01:12:58 -0400
Subject: [TUHS] Character sets
In-Reply-To: <20160328015814.GC18027@mercury.ccil.org>
References: <56F7B14A.7040201@update.uu.se>
 <20160327112920.GH12921@mercury.ccil.org>
 <56F7C85F.6060503@update.uu.se>
 <20160327214943.GS3766@eureka.lemis.com>
 <56F8565C.3080704@update.uu.se>
 <20160327233049.GA11617@mercury.ccil.org>
 <1459128032.3939182.561077874.021FA249@webmail.messagingengine.com>
 <20160328015814.GC18027@mercury.ccil.org>
Message-ID: <1459141978.2176234.561172346.36763B9D@webmail.messagingengine.com>

On Sun, Mar 27, 2016, at 21:58, John Cowan wrote:
> Random832 scripsit:
>
> > Sure it does, but replace that != " " with !isblank(*c), and it doesn't
> > work anymore since it ignores multibyte characters.
>
> In which locales does isblank() actually return true on characters other
> than space and tab?  (This is a straight question.)

See, no, that's a trick question. None of the other blank class
characters are single-byte, so of course isblank doesn't. The following
characters return true on is*w*blank for me:

U+00a0 U+1680 U+2000 U+2001 U+2002 U+2003 U+2004 U+2005 U+2006 U+2007
U+2008 U+2009 U+200a U+200b U+202f U+205f U+3000

(Oddly enough, isblank(0xA0) is true even in the UTF-8 locale, though of
course U+00a0 is actually a multibyte character "\xc2\xa0".)

So, if what you _want_ is to find the next blank character, doing this
loop with isblank won't work. If what you want is to find space or tab,
sure. But that's why grep is so slow for patterns containing \s.
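
To make the difference concrete, here is a minimal sketch of the two
loops side by side. The function names next_blank_bytes and
next_blank_chars are made up for illustration; they are not from the
code discussed earlier in the thread. The second one assumes the
program has already called setlocale(LC_CTYPE, "") so that mbrtowc()
decodes the locale's multibyte encoding.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: byte-wise scan. Returns the offset of the first
 * byte for which isblank() is true, or strlen(s) if there is none. */
size_t next_blank_bytes(const char *s)
{
    assert(s != NULL);
    size_t i = 0;
    while (s[i] != '\0' && !isblank((unsigned char)s[i]))
        i++;
    return i;
}

/* Hypothetical helper: character-wise scan. Decodes one multibyte
 * character at a time with mbrtowc() and tests it with iswblank().
 * Returns the byte offset of the first blank character, or strlen(s). */
size_t next_blank_chars(const char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    size_t i = 0;
    mbstate_t st;
    memset(&st, 0, sizeof st);

    while (i < len) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s + i, len - i, &st);
        if (n == (size_t)-1 || n == (size_t)-2) {
            /* Invalid or truncated sequence: treat the byte as
             * non-blank, reset the conversion state, and move on. */
            memset(&st, 0, sizeof st);
            i++;
            continue;
        }
        if (n == 0)
            break;              /* decoded a NUL; nothing left to scan */
        if (iswblank((wint_t)wc))
            return i;
        i += n;
    }
    return len;
}
```

For plain ASCII input both functions agree. In a UTF-8 locale, on a
system where iswblank(0x3000) is true, next_blank_chars("foo\xe3\x80\x80bar")
should stop at offset 3, while the byte-wise loop can only stop where
isblank() happens to be true for an individual byte. The extra decoding
step per character is also the kind of cost that the \s case pays.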