From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82]) by walapai.inria.fr (8.13.6/8.13.6) with ESMTP id q1GAIuiL022844 for ; Thu, 16 Feb 2012 11:19:03 +0100 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AugGAOXXPE8+3JIEgWdsb2JhbABDgxKcQ5EVIgEBFiYngXIBAQQBJ0cQCwslIUUSK4dtAwa1aoweCwQFBQECAQgEBBIGAwKEHgcCEgUFBA2COmMEmxONDA X-IronPort-AV: E=Sophos;i="4.73,428,1325458800"; d="scan'208";a="144528446" Received: from a.mx.strauss-acoustics.ch ([62.220.146.4]) by mail1-smtp-roc.national.inria.fr with ESMTP; 16 Feb 2012 11:18:59 +0100 Received: from [192.168.1.102] (80-218-101-216.dclient.hispeed.ch [80.218.101.216]) by a.mx.strauss-acoustics.ch (Postfix) with ESMTPSA id D8C3569C5E9 for ; Thu, 16 Feb 2012 11:18:58 +0100 (CET) Content-Type: text/plain; charset=iso-8859-1 Mime-Version: 1.0 (Apple Message framework v1084) From: Philippe Strauss In-Reply-To: <32FE3555-556A-43AC-8B1B-9A4AF08DBA02@strauss-acoustics.ch> Date: Thu, 16 Feb 2012 11:18:58 +0100 Message-Id: <25EC3048-818F-49DA-9721-86C83BE7E18F@strauss-acoustics.ch> References: <32FE3555-556A-43AC-8B1B-9A4AF08DBA02@strauss-acoustics.ch> To: caml-list@inria.fr X-Mailer: Apple Mail (2.1084) Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by walapai.inria.fr id q1GAIuiL022844 Subject: Re: [Caml-list] ocaml-pcre and UTF-8 Oh found something on the PLEAC and man pcrepattern : let res_w = "^([\\p{Latin}\\-]+)$" Le 16 févr. 2012 à 10:29, Philippe Strauss a écrit : > Hello caml'ers, > > How do I convince PCRE to be UTF-8 friendly? example: > > -- > open Pcre > > external show : 'a -> string = "%show" > > let recomp = regexp ~flags:[`UTF8; `CASELESS] > > let res_w = "(*UTF8)^(\w+)$" > let rec_w = recomp res_w > > let accents = ["blurb"; "toxicité"; "velléités"; "à"; "où"; "über"; "marie-jeanne"] > > let () = > Printf.printf "config_utf8=%b\n" Pcre.config_utf8 ; > List.iter (fun word -> > try > let sub = Pcre.extract_opt ~full_match:false ~rex:rec_w word in > print_endline (show sub) > with Not_found -> Printf.eprintf "Not_found was raised on \"%s\" :-(\n%!" word > ) accents > -- > > at least on my setup it gives: > > -- > philou@air:~/mysrc/web/myco$ ./pcre_utf > config_utf8=true > [|Some ("blurb")|] > Not_found was raised on "toxicité" :-( > Not_found was raised on "velléités" :-( > Not_found was raised on "à" :-( > Not_found was raised on "où" :-( > Not_found was raised on "über" :-( > Not_found was raised on "marie-jeanne" :-( > -- > > A php user which happen to be a nice guy got it working on PHP :-( how lame. > > -- > Philippe Strauss > http://www.strauss-acoustics.ch/ > > > > > > > > -- > Caml-list mailing list. Subscription management and archives: > https://sympa-roc.inria.fr/wws/info/caml-list > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > -- Philippe Strauss http://www.strauss-acoustics.ch/