From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id TAA06686; Mon, 26 Aug 2002 19:22:47 +0200 (MET DST) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id TAA06646 for ; Mon, 26 Aug 2002 19:22:46 +0200 (MET DST) Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by concorde.inria.fr (8.11.1/8.11.1) with ESMTP id g7QHM3H01031; Mon, 26 Aug 2002 19:22:03 +0200 (MET DST) Received: (from xleroy@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id TAA06522; Mon, 26 Aug 2002 19:22:02 +0200 (MET DST) Date: Mon, 26 Aug 2002 19:22:02 +0200 From: Xavier Leroy To: Oleg Cc: caml-list@inria.fr Subject: Re: [Caml-list] internal representation of string Message-ID: <20020826192202.A6666@pauillac.inria.fr> References: <200208261344.JAA23445@dewberry.cc.columbia.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0i In-Reply-To: <200208261344.JAA23445@dewberry.cc.columbia.edu>; from oleg_inconnu@myrealbox.com on Mon, Aug 26, 2002 at 09:46:40AM -0400 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk > What is the internal representation of string? Is it basically a C-string > [with or without terminating '\0'] plus integer storing its size? Or is it > something more sophisticated? Like all heap blocks, strings contain a header defining the size of the string in machine words. The actual block contents are: - the characters of the string - padding bytes to align the block on a word boundary. The padding is one of 00 00 01 00 00 02 00 00 00 03 on a 32-bit machine, and up to 00 00 .... 07 on a 64-bit machine. Thus, the string is always zero-terminated, and its length can be computed as follows: number_of_words_in_block * sizeof(word) + last_byte_of_block - 1 The null-termination comes handy when passing a string to C, but is not relied upon to compute the length (in Caml), allowing the string to contain nulls. > Also, do functions like String.sub implement > copy-on-write mechanism or do they copy when they are called? They copy when they are called. Caml strings really behave like compactly-represented character arrays. Hope this answers your question, - Xavier Leroy ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners