From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by sympa.inria.fr (Postfix) with ESMTPS id 17CB97F986 for ; Sun, 29 Jun 2014 00:08:05 +0200 (CEST) Received-SPF: None (mail3-smtp-sop.national.inria.fr: no sender authenticity information available from domain of daniel.buenzli@erratique.ch) identity=pra; client-ip=74.55.86.74; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="daniel.buenzli@erratique.ch"; x-sender="daniel.buenzli@erratique.ch"; x-conformance=sidf_compatible Received-SPF: None (mail3-smtp-sop.national.inria.fr: no sender authenticity information available from domain of daniel.buenzli@erratique.ch) identity=mailfrom; client-ip=74.55.86.74; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="daniel.buenzli@erratique.ch"; x-sender="daniel.buenzli@erratique.ch"; x-conformance=sidf_compatible Received-SPF: None (mail3-smtp-sop.national.inria.fr: no sender authenticity information available from domain of postmaster@smtp.webfaction.com) identity=helo; client-ip=74.55.86.74; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="daniel.buenzli@erratique.ch"; x-sender="postmaster@smtp.webfaction.com"; x-conformance=sidf_compatible X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AhcDAKI7r1NKN1ZKm2dsb2JhbABag1+CJ4Ehw3cPAQEBAQEGCwsJFCiELYELAiYCSRYbAYg5BAmZP48jnTGBK5BaNoEWBZwkhSkXkDpr X-IPAS-Result: AhcDAKI7r1NKN1ZKm2dsb2JhbABag1+CJ4Ehw3cPAQEBAQEGCwsJFCiELYELAiYCSRYbAYg5BAmZP48jnTGBK5BaNoEWBZwkhSkXkDpr X-IronPort-AV: E=Sophos;i="5.01,568,1400018400"; d="scan'208";a="69390223" Received: from mail6.webfaction.com (HELO smtp.webfaction.com) ([74.55.86.74]) by mail3-smtp-sop.national.inria.fr with ESMTP; 29 Jun 2014 00:08:03 +0200 Received: from [172.20.10.2] (188.29.164.60.threembb.co.uk [188.29.164.60]) by smtp.webfaction.com (Postfix) with ESMTP id 001BB228BF7C for ; Sat, 28 Jun 2014 22:08:00 +0000 (UTC) Date: Sat, 28 Jun 2014 23:08:00 +0100 From: =?utf-8?Q?Daniel_B=C3=BCnzli?= To: Caml-list Message-ID: <94E1372FFFCE40BF87C5F09286E6C3FA@erratique.ch> X-Mailer: sparrow 1.6.4 (build 1178) MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Content-Disposition: inline Subject: [Caml-list] [ANN] Uucp 0.9.0 Hello, I'd like to announce uucp: Uucp is an OCaml library providing efficient access to a selection of character properties of the [Unicode character database][1]. Uucp is independent from any Unicode text data structure and has no dependencies. It is distributed under the BSD3 license. [1]: http://www.unicode.org/reports/tr44/ Home page: http://erratique.ch/software/uucp Documentation: http://erratique.ch/software/uucp/doc/Uucp Github mirror: https://github.com/dbuenzli/uucp It should be available shortly in opam. A few comments about the library can be found at the end of this message. Part of this work was sponsored by OCaml Labs, many thanks for their support. Best, Daniel A few notes. * Having seen unwarranted sights of horror at the simple mention of Unicode by fellow peers I took the time to write an absolute minimal introduction to Unicode in the documentation: http://erratique.ch/software/uucp/doc/Uucp.html#uminimal At the end of this introduction end I also give a few biased tips on how Unicode can be handled in OCaml. * The data used to represent the properties is directly linked in your executables. In the future I will adapt the library to use OCaml 4.02 module aliases so that you only pay for the submodules you access. For now linking against the library on osx 64-bit, result in a 6.4 Mo executable and on linux 32-bit a 3.7 Mo executable. * If you are interested in Unicode caseless matching (equality) of strings or identifiers, the documentation of the Uucp.Case module has sample code on how to perform that using Uutf and Uunf. This code may eventually be gathered in a proper module in the future. http://erratique.ch/software/uucp/doc/Uucp.Case.html#caseexamples * Not all properties are exposed. Obsolete properties, deprecated properties and those that are specific to some unicode processing algorithm (bidi, segmentation, normalization, etc.) are left out. The reason for the latter is that these algorithm may need to devise their own, maybe more efficient, representations. For example the normalization properties are not included, as they are best stored, used and exposed by a module that performs normalization (e.g. Uunf). It is not excluded that I reverse this in the future and make Uunf dependent on Uucp as this could make maintenance easier (though Uunf sometimes uses optimized representations). Also if the data needs to be used by more than one module this may become less wasteful than each module including its own data. * Regarding the last point, the selection is subjective. If there's a property you feel is useful in the wide and missing please tell me (e.g. on github's issue tracker) with a good rationale and I will add it. The full list of omitted properties is available here: http://erratique.ch/software/uucp/doc/Uucp.html#distrib_omit