From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=AWL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105]) by yquem.inria.fr (Postfix) with ESMTP id D820EBBCA for ; Mon, 12 May 2008 16:21:28 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AuIBAKjvJ0jUnw7Vb2dsb2JhbACCMY9dAQwFAgQHEwOYHQ X-IronPort-AV: E=Sophos;i="4.27,473,1204498800"; d="scan'208";a="26080893" Received: from ptb-relay02.plus.net ([212.159.14.213]) by mail4-smtp-sop.national.inria.fr with ESMTP; 12 May 2008 16:21:15 +0200 Received: from [80.229.56.224] (helo=beast.local) by ptb-relay02.plus.net with esmtp (Exim) id 1JvYtr-0005XA-Lc for caml-list@yquem.inria.fr; Mon, 12 May 2008 15:21:11 +0100 From: Jon Harrop Organization: Flying Frog Consultancy Ltd. To: caml-list@yquem.inria.fr Subject: Re: [Caml-list] Re: Why OCaml sucks Date: Mon, 12 May 2008 15:16:13 +0100 User-Agent: KMail/1.9.9 References: <200805090139.54870.jon@ffconsultancy.com> <200805120854.45499.ober.14@osu.edu> In-Reply-To: <200805120854.45499.ober.14@osu.edu> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200805121516.13983.jon@ffconsultancy.com> X-Plusnet-Relay: 5a080f2b6caa80d2a98d127485711708 X-Spam: no; 0.00; ocaml:01 encodings:01 runtime:01 lexer:01 indirection:01 byte:01 frog:98 imho:01 wrote:01 caml-list:01 strings:01 bytes:03 dispatch:03 languages:03 character:04 On Monday 12 May 2008 13:54:45 Kuba Ober wrote: > > 5. Strings: pushing unicode throughout a general purpose language is a > > mistake, IMHO. This is why languages like Java and C# are so slow. > > Unicode by itself, when wider-than-byte encodings are used, adds "zero" > runtime overhead; the only overhead is storage (2 or 4 bytes per > character). You cannot degrade memory consumption without also degrading performance. Moreover, there are hidden costs such as the added complexity in a lexer which potentially has 256x larger dispatch tables or an extra indirection for every byte read. > Given that storage is cheap, I'd much rather have Unicode support than lack > of it. Sure. I don't mind unicode being available. I just don't want to have to use it myself because it is of no benefit to me (or many other people) but is a significant cost. -- Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/products/?e