caml-list - the Caml user's mailing list
* migrate from ocamllex to ulex
@ 2006-04-18  7:37 Ruslan Kosolapov
  2006-04-18 17:04 ` [Caml-list] " Tom
  2006-04-18 20:20 ` Gerd Stolpmann
  0 siblings, 2 replies; 4+ messages in thread
From: Ruslan Kosolapov @ 2006-04-18  7:37 UTC (permalink / raw)
  To: caml-list


I want to use Polygen (http://polygen.org/web/), but this tool does not
work with UTF-8 (if I try to use UTF-8 symbols in a template, the error
"illegal character" appears).

As far as I understand, the problem is ocamllex: if I use UTF-8 symbols in
lexer.mll, ocamllex reports "illegal character", so I can't just
modify lexer.mll.

So I think I should modify Polygen to use ulex.

I have no OCaml experience, so this task is hard for me.

I am looking for code examples or any detailed documentation that shows
how to migrate from ocamllex to ulex.

Please help :)


PS: I tried to modify the file lexer.ml (the file produced by ocamllex),
but I don't know what exactly I should modify; lexer.ml is not
human-readable.

-- 
Ruslan Kosolapov
Plesk QA Department Second Manager
SWsoft, Inc.



* Re: [Caml-list] migrate from ocamllex to ulex
  2006-04-18  7:37 migrate from ocamllex to ulex Ruslan Kosolapov
@ 2006-04-18 17:04 ` Tom
  2006-04-19  3:54   ` Ruslan Kosolapov
  2006-04-18 20:20 ` Gerd Stolpmann
  1 sibling, 1 reply; 4+ messages in thread
From: Tom @ 2006-04-18 17:04 UTC (permalink / raw)
  To: Ruslan Kosolapov; +Cc: caml-list

On 18/04/06, Ruslan Kosolapov <rkosolapov@swsoft.com> wrote:
>
>
> I want to use Polygen (http://polygen.org/web/)


Is there an English site too?


* Re: [Caml-list] migrate from ocamllex to ulex
  2006-04-18  7:37 migrate from ocamllex to ulex Ruslan Kosolapov
  2006-04-18 17:04 ` [Caml-list] " Tom
@ 2006-04-18 20:20 ` Gerd Stolpmann
  1 sibling, 0 replies; 4+ messages in thread
From: Gerd Stolpmann @ 2006-04-18 20:20 UTC (permalink / raw)
  To: Ruslan Kosolapov; +Cc: caml-list

Am Dienstag, den 18.04.2006, 14:37 +0700 schrieb Ruslan Kosolapov:
> I want to use Polygen (http://polygen.org/web/), but this tool does not
> work with UTF-8 (if I try to use UTF-8 symbols in a template, the error
> "illegal character" appears).
> 
> As far as I understand, the problem is ocamllex: if I use UTF-8 symbols in
> lexer.mll, ocamllex reports "illegal character", so I can't just
> modify lexer.mll.

Well, ocamllex just processes bytes. In order to scan UTF-8, you must
create a regular expression that matches the byte representation. I did
this with great success for PXP, but it is absolutely non-trivial.
Better go with ulex.
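
To illustrate the byte-level view, here is a minimal sketch in plain OCaml (standard library only, no ulex). It decodes UTF-8 by checking the same byte patterns that an ocamllex regular expression would have to spell out. The function name decode_utf8 is hypothetical, and the sketch is simplified: it does not reject overlong encodings or surrogate code points, which a real scanner would have to do.

```ocaml
(* Decode a UTF-8 string into a list of Unicode code points by
   inspecting the lead byte of each sequence. Simplified sketch:
   overlong forms and surrogate code points are not rejected. *)
let decode_utf8 (s : string) : int list =
  let n = String.length s in
  (* Read a continuation byte (10xxxxxx) at index i; return its 6 payload bits. *)
  let cont i =
    if i >= n then failwith "truncated sequence";
    let b = Char.code s.[i] in
    if b land 0xC0 <> 0x80 then failwith "bad continuation byte";
    b land 0x3F
  in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      let b = Char.code s.[i] in
      if b < 0x80 then
        (* 0xxxxxxx: plain ASCII, one byte *)
        go (i + 1) (b :: acc)
      else if b land 0xE0 = 0xC0 then
        (* 110xxxxx 10xxxxxx: two bytes *)
        let c1 = cont (i + 1) in
        go (i + 2) ((((b land 0x1F) lsl 6) lor c1) :: acc)
      else if b land 0xF0 = 0xE0 then
        (* 1110xxxx 10xxxxxx 10xxxxxx: three bytes *)
        let c1 = cont (i + 1) in
        let c2 = cont (i + 2) in
        go (i + 3) ((((b land 0x0F) lsl 12) lor (c1 lsl 6) lor c2) :: acc)
      else if b land 0xF8 = 0xF0 then
        (* 11110xxx followed by three continuation bytes: four bytes *)
        let c1 = cont (i + 1) in
        let c2 = cont (i + 2) in
        let c3 = cont (i + 3) in
        go (i + 4)
          ((((b land 0x07) lsl 18) lor (c1 lsl 12) lor (c2 lsl 6) lor c3) :: acc)
      else failwith "bad lead byte"
  in
  go 0 []
```

A byte-oriented lexer sees the two bytes 0xC3 0xA9 as two separate characters, whereas this decoder turns them into the single code point 0xE9. Every lexer rule that should accept non-ASCII text would need this case analysis written out as a regular expression.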

> So I think I should modify Polygen to use ulex.
> 
> I have no OCaml experience, so this task is hard for me.

Probably.

> I am looking for code examples or any detailed documentation that shows
> how to migrate from ocamllex to ulex.

It is not that complicated. The main difference is not that ulex is
Unicode-based, but that ulex is a different kind of preprocessor. That
has consequences for how the preprocessor is invoked, and for the syntax
of the scanner.

ocamllex is a classical preprocessor that produces an intermediate file
which is then compiled. In contrast to that, ulex modifies the grammar
of the O'Caml language such that new constructs can be used. These
constructs are immediately mapped to the built-in elements of the
language, so it is actually a preprocessor, but much better integrated.

In order to run ulex, I strongly recommend first installing findlib
(http://ocaml-programming.de/packages). Then rename the lexer source:

mv lexer.mll lexer.ml

As ulex does not create intermediate files, there is no need for the
.mll extension. Compile with

ocamlfind ocamlc -package ulex -syntax camlp4o <args>

or

ocamlfind ocamlopt -package ulex -syntax camlp4o <args>

for the native-code compiler. <args> are the same arguments as for plain
ocamlc/ocamlopt. When linking the executable, also add the flag -linkpkg
to the compiler invocations.

You can simply use these compiler commands for all .ml and .mli files.

Of course, you must also modify lexer.ml. In principle, transform

{ <header> }

rule <name1> <arg1> <arg2> ... = 
  parse <regexp> { <action> }
      | <regexp> { <action> } ...

{ <trailer> }

to:

<header>

let <name1> <arg1> <arg2> ... =
  lexer <regexp> -> <action>
      | <regexp> -> <action> ...
;;

<trailer>

This is the purely syntactic part of the transformation. Furthermore,
typing is a bit different.

ocamllex uses the helper module Lexing. For example, to get the phrase
that was just scanned, you can use the function call

Lexing.lexeme lexbuf

within one of the <action>s. lexbuf is the buffer the lexer operates on.
ulex needs another type of buffer, suitable for Unicode. The module
Ulexing provides such a buffer. However, typing is different. The
corresponding call

Ulexing.lexeme lexbuf

returns the phrase, not as a string (O'Caml strings are simply
sequences of 8-bit characters), but as an array of integers. Use

Ulexing.utf8_lexeme lexbuf

to get a string of UTF-8 bytes.
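
The relation between the two return types can be sketched in plain OCaml without ulex. The helper utf8_of_code_points below is hypothetical (it is not part of Ulexing) and assumes OCaml 4.06 or later for Buffer.add_utf_8_uchar. It only illustrates how an array of code points, such as the one Ulexing.lexeme returns, corresponds to the UTF-8 string form.

```ocaml
(* Encode an array of Unicode code points as a UTF-8 string.
   Uchar.of_int raises Invalid_argument if a value is not a valid
   Unicode scalar value. *)
let utf8_of_code_points (cps : int array) : string =
  let buf = Buffer.create (Array.length cps) in
  Array.iter (fun cp -> Buffer.add_utf_8_uchar buf (Uchar.of_int cp)) cps;
  Buffer.contents buf
```

An array of two code points such as [| 0x48; 0xE9 |] becomes a three-byte string, because the second code point needs two bytes in UTF-8. This is why code that indexes into the lexeme behaves differently depending on which of the two forms it works on.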

You will also see the different typing when you call the generated
lexers. For ocamllex, this is something like:

let lexbuf = Lexing.from_string "Example string" in
<name> lexbuf

(where <name> is the name of a lexer). For ulex, this is

let lexbuf = Ulexing.from_utf8_string "Example string" in
<name> lexbuf

Look into ulexing.mli; it also lets you read from other sources.

> Please help :)
> 
> 
> PS: I tried to modify the file lexer.ml (the file produced by ocamllex),
> but I don't know what exactly I should modify; lexer.ml is not
> human-readable.

Well, this is a finite automaton expressed as a lookup table. After the
NFA to DFA transformation step, it is practically impossible to
understand it.
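
As a rough illustration of what "automaton as lookup table" means, here is a tiny hand-written table-driven DFA in plain OCaml. It is not the table format that ocamllex actually emits; the names transitions, accepting, char_class and accepts are made up for this sketch. It accepts strings of one or more ASCII digits.

```ocaml
(* States: 0 = start, 1 = at least one digit seen (accepting), 2 = dead.
   Character classes: 0 = ASCII digit, 1 = anything else.
   transitions.(state).(char_class) gives the next state. *)
let transitions = [| [| 1; 2 |]; [| 1; 2 |]; [| 2; 2 |] |]
let accepting = [| false; true; false |]

let char_class (c : char) : int = if c >= '0' && c <= '9' then 0 else 1

(* Run the automaton over the whole string and report whether it ends
   in an accepting state. *)
let accepts (s : string) : bool =
  let state = ref 0 in
  String.iter (fun c -> state := transitions.(!state).(char_class c)) s;
  accepting.(!state)
```

Even here the regular expression is no longer visible in the code, only numbers in a table. A generated lexer has many more states and packs its tables compactly, so editing it by hand is not practical.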

Gerd

P.S. Maybe this is also interesting for you:
http://www.gerd-stolpmann.de/buero/service_ocaml.html.en

-- 
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------



* Re: [Caml-list] migrate from ocamllex to ulex
  2006-04-18 17:04 ` [Caml-list] " Tom
@ 2006-04-19  3:54   ` Ruslan Kosolapov
  0 siblings, 0 replies; 4+ messages in thread
From: Ruslan Kosolapov @ 2006-04-19  3:54 UTC (permalink / raw)
  To: Tom; +Cc: caml-list


 >> I want to use Polygen (http://polygen.org/web/)
 T> Is there an English site too?

  AFAIK there is only an Italian site.

  But http://www.polygen.org/gs/dist/polygen-1.0.6-20040705-doc.zip
  contains English documentation.

-- 
Ruslan Kosolapov
Plesk QA Department Second Manager
SWsoft, Inc.

