From mboxrd@z Thu Jan 1 00:00:00 1970 From: random832@fastmail.com (Random832) Date: Sun, 27 Mar 2016 21:20:32 -0400 Subject: [TUHS] Character sets In-Reply-To: <20160327233049.GA11617@mercury.ccil.org> References: <56F7B14A.7040201@update.uu.se> <20160327112920.GH12921@mercury.ccil.org> <56F7C85F.6060503@update.uu.se> <20160327214943.GS3766@eureka.lemis.com> <56F8565C.3080704@update.uu.se> <20160327233049.GA11617@mercury.ccil.org> Message-ID: <1459128032.3939182.561077874.021FA249@webmail.messagingengine.com> On Sun, Mar 27, 2016, at 19:30, John Cowan wrote: > > > while (*c && *c++ != " "); > > That particular piece of code still works if the encoding is UTF-8. Sure it does, but replace that != " " with !isblank(*c), and it doesn't work anymore since it ignores multibyte characters. Often you don't care, but you've got to remember to set LC_ALL=C when running grep etc on large data sets or it will be much slower, since \w and \s care about multibyte characters (as does case-insensitive matching, etc).