caml-list - the Caml user's mailing list
* [Caml-list] sedlex = ulex without camlp4
From: Alain Frisch @ 2013-01-18 14:24 UTC
  To: caml-list

Dear all,

I'd like to announce the first public release of sedlex, a 
Unicode-friendly lexer-generator.  It is the successor of ulex, which 
was implemented as a Camlp4 syntax extension.  sedlex is now based on 
the new -ppx feature and consequently requires a very recent development 
version of OCaml.

Homepage: http://www.lexifi.com/sedlex

sedlex is available as an OPAM package, to be installed after switching 
to 4.01.0dev+trunk ("opam switch 4.01.0dev+trunk").

Alain


* Re: [Caml-list] sedlex = ulex without camlp4
From: Daniel Bünzli @ 2013-01-18 15:32 UTC
  To: Alain Frisch; +Cc: caml-list

Hello Alain,  

I quickly went through your documentation.

If your UTF-8 and UTF-16 decoders are conformant, your module's output consists not of arbitrary Unicode code points but of Unicode scalar values (code points minus the UTF-16 surrogates [1]). If that is the case, it would be nice to state this invariant explicitly in the documentation.

This allows the data generated by sedlex to be passed directly to modules like Uunf without further checks, as those values belong to the Uunf.uchar type [2].
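
To make the invariant concrete, here is a minimal sketch of the check a consumer would otherwise have to perform on each value. It assumes code points are represented as plain OCaml ints; the names is_scalar_value and all_scalar_values are hypothetical and are not part of sedlex or Uunf.

```ocaml
(* Hypothetical helper, not part of sedlex or Uunf.
   A Unicode scalar value is any code point in U+0000..U+10FFFF
   except the surrogate range U+D800..U+DFFF. *)
let is_scalar_value (cp : int) : bool =
  (0x0000 <= cp && cp <= 0xD7FF) || (0xE000 <= cp && cp <= 0x10FFFF)

(* Hypothetical helper: true when every element of [cps] is a scalar value. *)
let all_scalar_values (cps : int list) : bool =
  List.for_all is_scalar_value cps
```

If the decoders are conformant, a check like this always succeeds on their output, which is why documenting the invariant would let callers skip it.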

Best,

Daniel 


[1] http://www.unicode.org/glossary/#unicode_scalar_value
[2] http://erratique.ch/software/uunf/doc/Uunf#TYPEuchar




* Re: [Caml-list] sedlex = ulex without camlp4
From: Alain Frisch @ 2013-01-18 15:38 UTC
  To: Daniel Bünzli; +Cc: caml-list

I have to admit that I don't know much about Unicode and surrogates 
(moreover, support for utf-16 was contributed by someone else).  I'll 
happily update the documentation if someone looks at the source code and 
tells me that the property you mention indeed holds.

-- Alain



