From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <140e7ec30710092233t313a8e66qc99b57674c9f1e30@mail.gmail.com>
Date: Wed, 10 Oct 2007 13:33:10 +0800
From: sqweek
To: "Fans of the OS Plan 9 from Bell Labs" <9fans@cse.psu.edu>
Subject: Re: [9fans] simplicity
In-Reply-To: <5d375e920709180838t4070c23al11bc0eb5cc7280c9@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <8ccc8ba40709161155t356da3dcvc9735a2fe4f42a03@mail.gmail.com>
 <88ec1a25417025b5f86c7cdf76b249ff@quanstro.net>
 <46EE9A41.7DD78E60@null.net>
 <7359f0490709180827h6978ae52re27825646a091ec8@mail.gmail.com>
 <5d375e920709180838t4070c23al11bc0eb5cc7280c9@mail.gmail.com>
Topicbox-Message-UUID: cd8c028c-ead2-11e9-9d60-3106f5b1d025

On 9/18/07, Uriel wrote:
> Don't complain, at least it is not producing random behaviour. I have
> seen versions of GNU awk that, when fed plain ASCII input with the
> locale set to UTF-8, would match rules against random lines of input.
> The fix? Set the locale to 'C' at the top of all your scripts (and
> don't even think of dealing with files which actually contain
> non-ASCII UTF-8).
>
> This was some years ago, it might be fixed by now, but it demonstrates
> how the locale insanity makes life so much more fun.

 Heh, funny that this thread got revived the very day that my
colleague's backup script choked because he was running in a UTF-8
locale and hit a filename encoded in ISO-8859-1. Apparently GNU sed's .
stops matching when it hits an invalid byte sequence (which is not
entirely unreasonable, I guess).
-sqweek
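
To make the failure mode concrete, here is a rough Python sketch of why an ISO-8859-1 filename trips up a tool that expects UTF-8. It does not reproduce GNU sed's internals; the filename and the helper names (is_valid_utf8, valid_utf8_prefix) are made up for illustration. The byte-oriented regex at the end plays the role of running with the locale set to 'C'.

```python
import re

# Hypothetical filename "café.txt" as stored by an ISO-8859-1 system:
# the é is the single byte 0xE9, which is not valid UTF-8 on its own.
name = "caf\u00e9.txt".encode("iso-8859-1")


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def valid_utf8_prefix(raw: bytes) -> bytes:
    """Return the longest leading part of raw that is valid UTF-8.

    This is only an analogy for how far a character-oriented '.' could
    advance before reaching bytes that do not form a character.
    """
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError as err:
        return raw[:err.start]


# A byte-oriented pattern has no notion of invalid characters, so it
# matches the whole name, much like a tool running in the 'C' locale.
byte_match = re.fullmatch(rb".*", name)

print(is_valid_utf8(name))
print(valid_utf8_prefix(name))
print(byte_match is not None)
```

A script that has to survive filenames in unknown encodings can treat them as raw bytes throughout, which is roughly what forcing the 'C' locale does for sed and awk.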