From: Taco Hoekwater
Newsgroups: gmane.comp.tex.context
Subject: byte order marks in utf-8 (old)
Date: Wed, 28 Dec 2005 13:48:24 +0100
Message-ID: <43B28998.5070308@elvenkind.com>
Reply-To: mailing list for ConTeXt users

Hi,

A long time ago, Hans wrote:
> Patrick Gundlach wrote:
>
>> are you sure that your scite does not use utf-16 and puts the BOM (byte
>> order mark) there?
>
> scite indeed does this (kind of annoying)

It is the BOM, in utf-8 file encoding. A bit pointless (utf-8 is defined
as a sequence of bytes, so there is no byte order to mark), but it is
allowed by the Unicode specification.

> context can handle that for xml files
>
> i can consider handling it automatically (i.e. when BOM before first start-stop,
> then assume utf-8)

It is a safe bet that any document that starts with the three bytes
0xEF 0xBB 0xBF is encoded as UTF-8, especially if it is supposed to be
text input.

Cheers,
Taco
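
P.S. For illustration only, the three-byte check described above could be sketched as follows. This is Python rather than anything ConTeXt actually does, and the function name strip_utf8_bom is made up for this example; only codecs.BOM_UTF8 comes from the standard library.

```python
import codecs

# The UTF-8 encoding of U+FEFF: the three bytes 0xEF 0xBB 0xBF.
UTF8_BOM = codecs.BOM_UTF8


def strip_utf8_bom(data):
    """Return (payload, had_bom) for a bytes object.

    Hypothetical helper: if the input starts with the UTF-8 BOM,
    drop those three bytes and report that the input can be
    assumed to be UTF-8. Otherwise return the input unchanged.
    """
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):], True
    return data, False


if __name__ == "__main__":
    payload, had_bom = strip_utf8_bom(b"\xef\xbb\xbf\\starttext")
    if had_bom:
        # A BOM was found, so decoding as UTF-8 is the safe bet.
        print(payload.decode("utf-8"))
```

A missing BOM says nothing about the encoding, so the second return value only ever confirms UTF-8 and never rules it out.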