From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id KAA24549; Mon, 6 Oct 2003 10:10:01 +0200 (MET DST) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id KAA03660 for ; Mon, 6 Oct 2003 10:10:00 +0200 (MET DST) Received: from lri.lri.fr (lri.lri.fr [129.175.15.1]) by nez-perce.inria.fr (8.11.1/8.11.1) with ESMTP id h968A0109552 for ; Mon, 6 Oct 2003 10:10:00 +0200 (MET DST) Received: from pc8-123 (pc8-123 [129.175.8.123]) by lri.lri.fr (8.12.10/jtpda-5.4) with ESMTP id h967rBUV019243 ; Mon, 6 Oct 2003 09:53:11 +0200 (MEST) Received: from filliatr by pc8-123 with local (Exim 3.35 #1 (Debian)) id 1A6QB5-0003jt-00; Mon, 06 Oct 2003 09:53:11 +0200 From: Jean-Christophe Filliatre MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16257.8039.671703.48266@gargle.gargle.HOWL> Date: Mon, 6 Oct 2003 09:53:11 +0200 To: "Rafael 'Dido' Sevilla" Cc: caml-list@inria.fr Subject: Re: [Caml-list] backslashes in ocamllex In-Reply-To: <20031006013740.GA2149@imperium.ph> References: <20031006013740.GA2149@imperium.ph> X-Mailer: VM 7.03 under Emacs 20.7.2 Reply-To: Jean-Christophe.Filliatre@lri.fr (Jean-Christophe Filliatre) X-MailScanner: Found to be clean X-Loop: caml-list@inria.fr X-Spam: no; 0.00; filliatre:01 filliatre:01 lri:01 caml-list:01 rafael:01 'dido':01 ocaml's:01 regex:01 emitted:01 compiler:01 regexp:01 ocaml:01 token:01 writes:01 string:03 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk Rafael 'Dido' Sevilla writes: > Now I'm stuck again. I'm revising the lexical analyzer for my compiler > to enable it to recognize escaped strings, with conventions different > from OCaml's. Currently, I'm using this regex: > > '\'' ("\\\\"|"\\'"|[^'\''])* '\'' > > in an attempt to recognize strings that begin and end with single > quotes, but may possibly include sequences like \' that represent > escaped quotes, and '\\' that represent escaped backslashes. Of course, > this doesn't work, as I lately realized, because this string: > > '\\' '\\' > > looks like I'm escaping the second quote, so I wind up with an empty > token. Any hints on how I'd go about doing this sort of thing? This should work: '\'' ([^'\'' '\\'] | '\\' _)* '\'' i.e. any backslash in the string must be followed by a character (whatever its interpretation is). You can be more precise if backslashes must be followed by \ or ' and nothing else. Then the regexp is '\'' ([^'\'' '\\'] | '\\' ('\\' | '\''))* '\'' (For instance ocaml strings conform to the former, but a warning is emitted whenever the character following \ has no particular interpretation). -- Jean-Christophe ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners