From: Michael Wohlwend
To: caml-list@yquem.inria.fr
Subject: Re: Another problem (was Re: [Caml-list] Ocamllex question)
Date: Sun, 23 Oct 2005 23:37:59 +0200
Message-Id: <200510232337.59207.micha-1@fantasymail.de>
In-Reply-To: <435BFCD4.8040009@gushee.net>

On Sunday 23 October 2005 23:12, Matt Gushee wrote:
> While we're on the subject of Ocamllex, there's another issue I'm
> wondering about.
> My lexer needs to handle quoted strings, something like:
>
>   | '"' [^ '"'] * as word '"'    { WORD word }
>
> Not essential, but it would be nice to allow escaped quotes within such
> strings:
>
>   "The quick brown fox jumped over the \"lazy\" dog."
>
> I haven't actually tried to implement this yet, but thinking about it,
> it seems like it would make the lexer hugely more complex. Can anyone
> suggest a reasonably simple way to deal with escape sequences?

You write an extra lexer rule that gets called when a '"' is seen. In this
rule you read single characters and append them to a string buffer. If you
see a backslash, you handle the next character specially. Example:

  {
  (* Raised on an unterminated string; declare it here unless your lexer
     already defines it. *)
  exception Lexical_error of string

  (* Map the character after a backslash to the character it denotes. *)
  let char_for_backslash = function
    | 'a' -> '\007'
    | 'v' -> '\011'
    | 'f' -> '\012'
    | 'n' -> '\n'
    | 't' -> '\t'
    | 'b' -> '\b'
    | 'r' -> '\r'
    | c -> c

  (* Shared buffer that collects the contents of the current string. *)
  let string_buff = Buffer.create 256
  let reset_string_buffer () = Buffer.clear string_buff
  let store_string_char c = Buffer.add_char string_buff c
  let store_string s = Buffer.add_string string_buff s
  let get_stored_string () = Buffer.contents string_buff
  }

  let bs_escapes = [ '\032' - '\255' ]

  rule dict = parse
    | ...
    | [ '"' ] as d
        { reset_string_buffer ();
          scan_str lexbuf;
          let s = get_stored_string () in
          (* Printf.printf " String (%c) read:%s " d s; *)
          STRING(s) }

  and scan_str = parse
    | [ '"' ]
        { () }
    | '\\' (bs_escapes as c)
        { store_string_char (char_for_backslash c);
          scan_str lexbuf }
    (* allowed_string_char is a character class assumed to be defined
       elsewhere in the lexer. *)
    | allowed_string_char as c
        { store_string_char c;
          scan_str lexbuf  (* keep scanning until the closing quote *) }
    | eof
        { raise (Lexical_error "unterminated string") }

Works well,
 Michael
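
To try the escape handling without running ocamllex, the same idea (append
characters to a buffer, treat the character after a backslash specially,
stop at the closing quote) can be sketched in plain OCaml. This is only an
illustration, not part of the lexer above: `scan_string` is a hypothetical
helper, it takes the text that follows the opening quote, and unlike
`bs_escapes` it accepts any character after a backslash.

```ocaml
(* Same escape table as in the message. *)
let char_for_backslash = function
  | 'a' -> '\007'
  | 'v' -> '\011'
  | 'f' -> '\012'
  | 'n' -> '\n'
  | 't' -> '\t'
  | 'b' -> '\b'
  | 'r' -> '\r'
  | c -> c

(* Hypothetical helper: scan [s] from index [start], which is assumed to be
   just past an opening quote. Returns the unescaped contents and the index
   just past the closing quote. *)
let scan_string (s : string) (start : int) : string * int =
  let buf = Buffer.create 256 in
  let n = String.length s in
  let rec loop i =
    if i >= n then failwith "unterminated string"
    else
      match s.[i] with
      | '"' -> (Buffer.contents buf, i + 1)
      | '\\' ->
          (* A backslash must be followed by the character it escapes. *)
          if i + 1 >= n then failwith "unterminated string"
          else begin
            Buffer.add_char buf (char_for_backslash s.[i + 1]);
            loop (i + 2)
          end
      | c ->
          Buffer.add_char buf c;
          loop (i + 1)
  in
  loop start
```

Using a local buffer instead of the global `string_buff` keeps the sketch
self-contained; in the lexer the shared buffer avoids allocating a new one
for each string.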