caml-list - the Caml user's mailing list
* [Caml-list] backslashes in ocamllex
@ 2003-10-06  1:37 Rafael 'Dido' Sevilla
  2003-10-06  7:28 ` Christian Lindig
  2003-10-06  7:53 ` Jean-Christophe Filliatre
  0 siblings, 2 replies; 4+ messages in thread
From: Rafael 'Dido' Sevilla @ 2003-10-06  1:37 UTC (permalink / raw)
  To: caml-list

Now I'm stuck again.  I'm revising the lexical analyzer for my compiler
to enable it to recognize escaped strings, with conventions different
from OCaml's.  Currently, I'm using this regex:

'\'' ("\\\\"|"\\'"|[^'\''])* '\''

in an attempt to recognize strings that begin and end with single
quotes, but may possibly include sequences like \' that represent
escaped quotes, and '\\' that represent escaped backslashes.  Of course,
this doesn't work, as I lately realized, because this string:

'\\' '\\'

looks like I'm escaping the second quote, so I wind up with an empty
token.  Any hints on how I'd go about doing this sort of thing?

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners



* Re: [Caml-list] backslashes in ocamllex
  2003-10-06  1:37 [Caml-list] backslashes in ocamllex Rafael 'Dido' Sevilla
@ 2003-10-06  7:28 ` Christian Lindig
  2003-10-06  8:26   ` Alain.Frisch
  2003-10-06  7:53 ` Jean-Christophe Filliatre
  1 sibling, 1 reply; 4+ messages in thread
From: Christian Lindig @ 2003-10-06  7:28 UTC (permalink / raw)
  To: Rafael 'Dido' Sevilla; +Cc: Caml Mailing List

On Mon, Oct 06, 2003 at 09:37:40AM +0800, Rafael 'Dido' Sevilla wrote:
> Now I'm stuck again.  I'm revising the lexical analyzer for my compiler
> to enable it to recognize escaped strings, with conventions different
> from OCaml's.  Currently, I'm using this regex:
> 
> '\'' ("\\\\"|"\\'"|[^'\''])* '\''
> 
> in an attempt to recognize strings that begin and end with single
> quotes, but may possibly include sequences like \' that represent
> escaped quotes, and '\\' that represent escaped backslashes.

As you discovered, a single regular expression is an awkward tool for
strings with escapes, and it cannot decode the escapes for you. A
sub-lexer handles both:

{
let get         = Lexing.lexeme
let getchar     = Lexing.lexeme_char
}

rule token = parse (* main lexer *)
    eof     { ... }
  | ...
  | "'"     { string lexbuf (Buffer.create 80) } (* use sub-lexer *)


and string = parse (* lexer for strings *)
    eof     { fun buf -> error "EOF in string" }
    (* "'" must precede _ : among matches of equal length, the first rule wins *)
  | "'"     { fun buf -> Buffer.contents buf } (* return string *)
  | '\\' _  { fun buf -> let c = getchar lexbuf 1 in
                         let k = match c with
                         | 'n'   -> '\n'
                         | 't'   -> '\t'
                         | ....
                         in
                             ( Buffer.add_char buf k
                             ; string lexbuf buf
                             )
            }
  | _       { fun buf -> ( Buffer.add_string buf (get lexbuf)
                         ; string lexbuf buf
                         )
            }

-- Christian

--
Christian Lindig         http://www.st.cs.uni-sb.de/~lindig/
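
The control flow of the sub-lexer above can also be sketched in plain
OCaml, without ocamllex. This is only an illustration under stated
assumptions: the names Lex_error, unescape and scan_string are
hypothetical and not part of the thread, and the sketch assumes that any
escaped character other than n or t stands for itself.

```ocaml
(* Hand-written counterpart of the string sub-lexer (illustrative only).
   Lex_error, unescape and scan_string are hypothetical names. *)
exception Lex_error of string

(* Assumption: escapes other than \n and \t denote the character itself. *)
let unescape = function
  | 'n' -> '\n'
  | 't' -> '\t'
  | c -> c

(* [scan_string src start] decodes a string body beginning at index
   [start], i.e. just after the opening quote. It returns the decoded
   contents and the index just past the closing quote. *)
let scan_string (src : string) (start : int) : string * int =
  let buf = Buffer.create 80 in
  let n = String.length src in
  let rec loop i =
    if i >= n then raise (Lex_error "EOF in string")
    else
      match src.[i] with
      | '\'' -> (Buffer.contents buf, i + 1)        (* closing quote *)
      | '\\' ->
          (* a backslash always consumes the next character *)
          if i + 1 >= n then raise (Lex_error "EOF in string")
          else begin
            Buffer.add_char buf (unescape src.[i + 1]);
            loop (i + 2)
          end
      | c ->
          Buffer.add_char buf c;
          loop (i + 1)
  in
  loop start
```

Called on the problematic input from the original question with start
index 1, scan_string stops at the first unescaped quote, so the first
token is a string holding a single backslash.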


* Re: [Caml-list] backslashes in ocamllex
  2003-10-06  1:37 [Caml-list] backslashes in ocamllex Rafael 'Dido' Sevilla
  2003-10-06  7:28 ` Christian Lindig
@ 2003-10-06  7:53 ` Jean-Christophe Filliatre
  1 sibling, 0 replies; 4+ messages in thread
From: Jean-Christophe Filliatre @ 2003-10-06  7:53 UTC (permalink / raw)
  To: Rafael 'Dido' Sevilla; +Cc: caml-list


Rafael 'Dido' Sevilla writes:
 > Now I'm stuck again.  I'm revising the lexical analyzer for my compiler
 > to enable it to recognize escaped strings, with conventions different
 > from OCaml's.  Currently, I'm using this regex:
 > 
 > '\'' ("\\\\"|"\\'"|[^'\''])* '\''
 > 
 > in an attempt to recognize strings that begin and end with single
 > quotes, but may possibly include sequences like \' that represent
 > escaped quotes, and '\\' that represent escaped backslashes.  Of course,
 > this doesn't work, as I lately realized, because this string:
 > 
 > '\\' '\\'
 > 
 > looks like I'm escaping the second quote, so I wind up with an empty
 > token.  Any hints on how I'd go about doing this sort of thing?

This should work:

	'\'' ([^'\'' '\\'] | '\\' _)* '\''

i.e. any backslash in the string must be followed by a character
(whatever its interpretation is). You can be more precise if
backslashes must be followed by \ or ' and nothing else. Then the
regexp is

	'\'' ([^'\'' '\\'] | '\\' ('\\' | '\''))* '\''

(For instance, OCaml strings conform to the former, but a warning is
emitted whenever the character following \ has no particular
interpretation.)

-- 
Jean-Christophe
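
To see why the first regexp above resolves the ambiguity, here is a
small hand-written matcher for the same language in plain OCaml. It is a
sketch, not ocamllex output; match_string is a hypothetical name
introduced for illustration. Because a backslash always consumes the
character after it, the body can never contain an unescaped quote, so
the end of the token is unambiguous.

```ocaml
(* Hand-written matcher (illustrative only) for the regexp
     '\'' ([^ '\'' '\\'] | '\\' _)* '\''
   [match_string s pos] returns the length of the match starting at
   [pos], or None when the input does not match there. *)
let match_string (s : string) (pos : int) : int option =
  let n = String.length s in
  let rec body i =
    if i >= n then None                       (* unterminated string *)
    else
      match s.[i] with
      | '\'' -> Some (i + 1 - pos)            (* unescaped closing quote *)
      | '\\' ->
          (* backslash followed by any one character *)
          if i + 1 < n then body (i + 2) else None
      | _ -> body (i + 1)                     (* ordinary character *)
  in
  if pos < n && s.[pos] = '\'' then body (pos + 1) else None
```

On the input from the original question, the matcher returns a
four-character token at offset 0 and another four-character token at
offset 5, instead of running across the space between them.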


* Re: [Caml-list] backslashes in ocamllex
  2003-10-06  7:28 ` Christian Lindig
@ 2003-10-06  8:26   ` Alain.Frisch
  0 siblings, 0 replies; 4+ messages in thread
From: Alain.Frisch @ 2003-10-06  8:26 UTC (permalink / raw)
  To: Christian Lindig; +Cc: Caml Mailing List

On Mon, 6 Oct 2003, Christian Lindig wrote:

> and string = parse (* lexer for strings *)
>     eof     { fun buf -> error "EOF in string" }
>   | "'"     { fun buf -> Buffer.contents buf } (* return string *)
>   | '\\' _  { fun buf -> let c = getchar lexbuf 1 in
>...
>   | _       { fun buf -> Buffer.add_string buf (get lexbuf); string lexbuf buf }

Note that the new ocamllex in OCaml 3.07 allows rules to have extra
arguments. It is both more readable and more efficient (no closure is
built for each action) than explicit abstractions in actions.

(Btw the CHANGES file fails to mention this new feature.)


-- Alain
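
A possible rewrite of the string sub-lexer using rule arguments might
look like the ocamllex specification below (OCaml 3.07 or later). It is
a sketch under assumptions: the helpers Lex_error, error and unescape,
the treatment of unknown escapes, and the minimal token rule are all
hypothetical and only there to make the file self-contained.

```ocaml
{
(* Hypothetical helpers; the thread does not define [error]. *)
exception Lex_error of string
let error msg = raise (Lex_error msg)

(* Assumption: escapes other than \n and \t denote the character itself. *)
let unescape = function
  | 'n' -> '\n'
  | 't' -> '\t'
  | c -> c
}

(* Minimal main lexer: returns the next string token, if any. *)
rule token = parse
  | [' ' '\t' '\n']  { token lexbuf }                    (* skip blanks *)
  | "'"              { Some (string (Buffer.create 80) lexbuf) }
  | eof              { None }
  | _                { error "unexpected character" }

(* The buffer is an ordinary rule argument, passed before lexbuf,
   so the actions no longer return closures. *)
and string buf = parse
  | eof              { error "EOF in string" }
  | "'"              { Buffer.contents buf }             (* return string *)
  | '\\' (_ as c)    { Buffer.add_char buf (unescape c); string buf lexbuf }
  | _ as c           { Buffer.add_char buf c; string buf lexbuf }
```

With this form the rule is invoked as string buf lexbuf rather than
string lexbuf buf, and the closing-quote case is still listed before the
catch-all case so that it takes priority among one-character matches.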
