From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id OAA02553; Tue, 4 Mar 2003 14:49:57 +0100 (MET) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id OAA01786 for ; Tue, 4 Mar 2003 14:49:55 +0100 (MET) Received: from TheWorld.com (pcls2.std.com [199.172.62.104]) by nez-perce.inria.fr (8.11.1/8.11.1) with ESMTP id h24DnsH19189 for ; Tue, 4 Mar 2003 14:49:54 +0100 (MET) Received: from watergate.world.std.com (pool-141-149-177-244.bos.east.verizon.net [141.149.177.244]) by TheWorld.com (8.9.3/8.9.3) with ESMTP id IAA08101 for ; Tue, 4 Mar 2003 08:49:52 -0500 Message-Id: <5.2.0.9.0.20030304080202.03220958@pop.theWorld.com> X-Sender: chase@pop.theWorld.com X-Mailer: QUALCOMM Windows Eudora Version 5.2.0.9 Date: Tue, 04 Mar 2003 08:49:52 -0500 To: caml-list@inria.fr From: David Chase Subject: Re: [Caml-list] [RANT] String representation (was: Strings as arrays or lists...) In-Reply-To: References: <20030303221022.GA24499@clipper.ens.fr> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" X-Spam: no; 0.00; caml-list:01 rant:01 pons:01 bugged:01 java's:01 idioms:01 threads:01 buffer:01 noticeably:01 slower:01 recognized:99 infer:01 arrays:01 compiler:01 fernandez:01 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk At 01:43 PM 3/4/2003 +0100, Diego Olivier Fernandez Pons wrote: > Bonjour, > >Some of the features you wish are 'not so hard to implement', at least >if you already have 'conceptually bugged low-level' strings : I hope you aren't proposing that everyone implement the "String" data structure that they wish they really had. Strings are too basic, and too common, to be left to this. Consider Java's approach of String and StringBuffer. My main gripe (after years of use) is that too many functions should take a StringBuffer (which should probably be an abstract type, though there is indeed some loss of efficiency) and return a StringBuffer, rather than simply returning a String. From an optimization point-of-view, the most useful thing for performance would be to find a way to have an abstract type with non-abstract synchronization. All StringBuffer ops are synchronized, and probably should be, but a compiler and idioms can help IF THE SYNCHRONIZATION IS EXPOSED. That is, in Java, this code StringBuffer sb ... synchronized (sb) { // I know that sb is thread-private while (...) { sb.append(...); } } is likely to run faster than it would without the outer synchronization because the JIT compiler can see that the (non-abstract, exposed, final) StringBuffer synchronizations are redundant. On-the-other-hand, multithreaded programming appears to be so hard for the typical and even good-typical programmer, that I'd be interested in seeing if there were better ways to do it (say, threads communicating only through queues). >> - Arrays need fast random access by numeric index, strings do not >> need that. > >You can easily hava a (log n/k + 1) acces where n is the total size of >the string and k is the size of each bucket (if you choose a data >structure with constant buffer size) That will be noticeably slower. In the Java compiler that I worked on, the String operations were specially recognized by the compiler so that it could infer success of string bounds checks, and that "minor" optimization was noticeably beneficial. On the other hand, with some cleverness and a biggish K, you might be able to get most of the benefit simply from the "bounds check" (as in, small strings fit in their first and only bucket, illegal indices necessarily also overflow off the end of the first bucket). David Chase ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners