From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by sympa.inria.fr (Postfix) with ESMTPS id 7247C7FA13 for ; Tue, 8 Jul 2014 21:28:19 +0200 (CEST) Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of raould@gmail.com) identity=pra; client-ip=209.85.219.54; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="raould@gmail.com"; x-sender="raould@gmail.com"; x-conformance=sidf_compatible Received-SPF: Pass (mail2-smtp-roc.national.inria.fr: domain of raould@gmail.com designates 209.85.219.54 as permitted sender) identity=mailfrom; client-ip=209.85.219.54; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="raould@gmail.com"; x-sender="raould@gmail.com"; x-conformance=sidf_compatible; x-record-type="v=spf1" Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of postmaster@mail-oa0-f54.google.com) identity=helo; client-ip=209.85.219.54; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="raould@gmail.com"; x-sender="postmaster@mail-oa0-f54.google.com"; x-conformance=sidf_compatible X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AjgBANdFvFPRVds2lGdsb2JhbABZg2Bagm+rfJAhh0EBgRAIFg8BAQEBBwsLCRIqhAMBAQEEEhEdARsSDAMMBgMCCw0CAgkdAgIhAQERAQUBChIGExIQiAsBAxENkiWQFmqLJ4FygxCNZQoZJwMKZIUZEQEFDoEei2yBdzqCd4FMBYRpBYVjjiWCAIFIjDCEIxgphRMd X-IPAS-Result: AjgBANdFvFPRVds2lGdsb2JhbABZg2Bagm+rfJAhh0EBgRAIFg8BAQEBBwsLCRIqhAMBAQEEEhEdARsSDAMMBgMCCw0CAgkdAgIhAQERAQUBChIGExIQiAsBAxENkiWQFmqLJ4FygxCNZQoZJwMKZIUZEQEFDoEei2yBdzqCd4FMBYRpBYVjjiWCAIFIjDCEIxgphRMd X-IronPort-AV: E=Sophos;i="5.01,626,1400018400"; d="scan'208";a="84208770" Received: from mail-oa0-f54.google.com ([209.85.219.54]) by mail2-smtp-roc.national.inria.fr with ESMTP/TLS/RC4-SHA; 08 Jul 2014 21:28:18 +0200 Received: by mail-oa0-f54.google.com with SMTP id eb12so7023377oac.27 for ; Tue, 08 Jul 2014 12:28:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :content-type:content-transfer-encoding; bh=Xmp8gU470AgK4um8PJMrX6xhUDNiydBkhk4FfgT22lY=; b=Bz/N0TOkNW8Z5WPHylopnv8orqOwOrtziIi5xFlU4Q3h5whn+VCJfw8bxz9OAhrh5P WPkW0qEdywrkb1cLOmvmN/GCCQI2oy2kEJMySIWgqOLzQJvOzOfA1AAwqpf+svZzgEVO f5v0h1/nmRzn1adgbmhvyV8m+B9ztpqxd07939phCuC9IZnZZy3nqy23j7bkcGMuDQaf g7jO6c+vzII9mLVXByzWEewxSCDJRxu6bMHUUbEsv8wapadbRo5y6Z1KGB8WthiuRAKU muB8bwG04Nw1t4Ovu2ildk2LwAiLF2TvWj4u/QL8h74O0vwRqCaxHSXaxesSthVnDK8e o/Ig== X-Received: by 10.182.96.71 with SMTP id dq7mr41480403obb.42.1404847697416; Tue, 08 Jul 2014 12:28:17 -0700 (PDT) MIME-Version: 1.0 Received: by 10.182.76.134 with HTTP; Tue, 8 Jul 2014 12:27:57 -0700 (PDT) In-Reply-To: References: From: Raoul Duke Date: Tue, 8 Jul 2014 12:27:57 -0700 Message-ID: To: OCaml Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Subject: Re: [Caml-list] Immutable strings ja wohl, n'est pas, =D1=8D=D1=82=D0=BE =D0=B6=D0=B8=D0=B7=D0=BD=D1=8C, base= d on my experiences with strings and stuff over the years, i resonate with what Daniel posted. :-) things like UTF-whatever are baseline requirements, but beyond that (a) nobody has it right (b) unicode sucks. :-) On Tue, Jul 8, 2014 at 12:24 PM, Daniel B=C3=BCnzli wrote: > Le mardi, 8 juillet 2014 =C3=A0 19:15, mattiasw@gmail.com a =C3=A9crit : >> My two cents: >> >> To me it seems very strange to introduce a new string type and not make = it >> UTF-8 from start. > > No new string type was introduced. A bytes type was introduced. > >> ocaml will be that last language that doesn't have standardize unicode >> support. > > What do you mean by standarized unicode support in the language *exactly*= ? > > I'd be genuinely interested in knowing the actual real level of support f= or Unicode in these language, beyond saying our string is an UTF-X encoded = sequence of scalar values. For example do these other language do perform U= nicode normalisation on string literals/patterns (and identifiers if they c= hoose that craze) ? This for example would be absolutely necessary to have = for performing any kind of real world processing on unicode strings, but th= en there's not only a single normalisation form and the one you want depend= s on the context. Do they have a notation to indicate in which form they wa= nt the literal/pattern to be ? > >> Even old languages like Erlang has gone the UTF-8 way, and that >> includes program code. > > For a very very very very long time it has been possible to write, unnorm= alized or normalized according to the normal form your editor, UTF-8 encode= d literals in your OCaml sources; you just had to drop the idea of using la= tin1 identifiers, which are now anyway deprecated since 4.01. > > As for being able to write Unicode *identifiers* in the language I'm actu= ally quite glad OCaml hasn't that, there are both too many arrow characters= to use in Unicode and too many unreasonable programmers out there. > >> Bytes and strings have nothing in common, but str.[4] is still relevant = for >> UTF-8 strings. > > Direct indexing is rarely relevant in Unicode as usually you want those i= ndexes to correspond to user perceived characters (e.g. to align things in = text formatting) and user perceived characters may be written as a sequence= of unicode scalar value=E2=80=A6 or not (even in normal forms, since an ar= bitrary number of combining character can be applied to a base character). = The unicode segmentation algorithm allows you to find these boundaries, sim= ple indexing doesn't and is mostly worthless in Unicode processing. > > Best, > > Daniel > > -- > Caml-list mailing list. Subscription management and archives: > https://sympa.inria.fr/sympa/arc/caml-list > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs