caml-list - the Caml user's mailing list
* Question re: camlp4 parser
@ 2005-07-26  0:11 Paul Snively
  2005-07-26  1:17 ` [Caml-list] " Stephane Glondu
  0 siblings, 1 reply; 6+ messages in thread
From: Paul Snively @ 2005-07-26  0:11 UTC
  To: caml-list

Hello everyone,

I'm beginning to explore some tasks in earnest. One of them is  
writing a simple .ini file parser. This seems like something that  
would fall easily within the LL(1) capabilities of the camlp4 parser  
keyword, but I'm having a bit of trouble remembering how to structure  
this.

For example, I'd like a parser that matches one or more printable  
ASCII characters. Something that looks like:

let rec printable = parser [< '' '..'~'; x = printable >] -> x

This, of course, has two obvious problems:

1) On test data such as Stream.of_string "Test!\013" it raises  
Stream.Failure, no doubt because it has found the \013 which doesn't  
match the character range, i.e. it is, of course, not doing lookahead.

2) Even if that weren't the case, the resulting "x" would be missing  
the first matching character. What I really need is the accumulation  
of all of the characters.

It's just been too long since I've had to do LL(1), I think. I'm sure  
I'm overlooking something obvious. Or do I just need to go ahead and  
use ulex, even though I can't use it from the toplevel, which really  
annoys me?

Many thanks and best regards,
Paul Snively




* Re: [Caml-list] Question re: camlp4 parser
  2005-07-26  0:11 Question re: camlp4 parser Paul Snively
@ 2005-07-26  1:17 ` Stephane Glondu
  2005-07-26 16:43   ` Paul Snively
  0 siblings, 1 reply; 6+ messages in thread
From: Stephane Glondu @ 2005-07-26  1:17 UTC
  To: Paul Snively; +Cc: caml-list

Paul Snively wrote:
> For example, I'd like a parser that matches one or more printable  ASCII
> characters. Something that looks like:
> 
> let rec printable = parser [< '' '..'~'; x = printable >] -> x

The inferred type should have given you a warning:
--> val printable : char Stream.t -> 'a = <fun>

In other words, your function never returns a correct value.

Try this:

let printable s =
  let buf = Buffer.create 100 in
  let rec aux = parser
      [< '' '..'~' as c; x = (Buffer.add_char buf c; aux) >] -> x
    | [< >] -> Buffer.contents buf
  in aux s ;;
--> val printable : char Stream.t -> string = <fun>

printable (Stream.of_string "Test!\013") ;;
--> - : string = "Test!"

Note that you cannot drop the argument "s" (even though the function
would have the same type) if you plan to call this function several
times: without it, the buffer would be created only once and shared
across calls.
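To make the point about "s" concrete, here is a minimal sketch in plain OCaml. It assumes nothing from camlp4: the `cursor` type and its `peek`/`junk` helpers are hypothetical stand-ins for `char Stream.t`, `Stream.peek` and `Stream.junk`, and `aux` is a hand-written approximation of the two parser cases above, not what camlp4 actually generates.

```ocaml
(* Hypothetical stand-in for char Stream.t: a string plus a read position. *)
type cursor = { text : string; mutable pos : int }

let of_string text = { text; pos = 0 }

(* Look at the next character without consuming it. *)
let peek c =
  if c.pos < String.length c.text then Some c.text.[c.pos] else None

(* Consume one character. *)
let junk c = c.pos <- c.pos + 1

(* Hand-written approximation of the two parser cases: accumulate
   printable characters, stop at the first one that does not match. *)
let rec aux buf c =
  match peek c with
  | Some (' ' .. '~' as ch) ->
      junk c;
      Buffer.add_char buf ch;
      aux buf c
  | _ -> Buffer.contents buf

(* With the explicit argument, every call allocates a fresh buffer. *)
let printable c =
  let buf = Buffer.create 100 in
  aux buf c

(* Without it, the buffer is allocated once, when printable_shared is
   defined, so later calls see the characters left by earlier ones. *)
let printable_shared =
  let buf = Buffer.create 100 in
  aux buf
```

Both functions have the same type, but calling `printable_shared` a second time returns the text of the first call followed by the text of the second.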

> Many thanks and best regards,

You're welcome.

-- 

Stephane Glondu.



* Re: [Caml-list] Question re: camlp4 parser
  2005-07-26  1:17 ` [Caml-list] " Stephane Glondu
@ 2005-07-26 16:43   ` Paul Snively
  2005-07-26 17:05     ` Stephane Glondu
  2005-07-27  7:04     ` Virgile Prevosto
  0 siblings, 2 replies; 6+ messages in thread
From: Paul Snively @ 2005-07-26 16:43 UTC
  To: Stephane Glondu; +Cc: caml-list

Hello, Stephane!

On Jul 25, 2005, at 6:17 PM, Stephane Glondu wrote:
>
> The inferred type should have given you a warning:
> --> val printable : char Stream.t -> 'a = <fun>
>
> In other words, your function never returns a correct value.
>
Excellent point.

> Try this:
>
> let printable s =
>   let buf = Buffer.create 100 in
>   let rec aux = parser
>       [< '' '..'~' as c; x = (Buffer.add_char buf c; aux) >] -> x
>     | [< >] -> Buffer.contents buf
>   in aux s ;;
> --> val printable : char Stream.t -> string = <fun>
>
> printable (Stream.of_string "Test!\013") ;;
> --> - : string = "Test!"
>
> Notice that you cannot remove the occurrences of "s" (even though it
> would have the same type) if you are planning to use this function
> several times.
>
Thanks, this is exactly the kind of thing I was hoping for! So the  
key points are:

1) Use the | and an empty alternative pattern to capture the "no more  
matches" case.
2) Use "as" and take advantage of expression sequencing to accumulate  
the matches into a variable (Buffer, in this case).

That makes perfect sense and now seems obvious. :-)

One hopefully final question: is there a convenient shorthand for  
saying something like "all printable characters except '=' or '['?" I  
assume not--that is, we have ranges (' '..'~') or we have variants  
('A' | 'B' | 'C'...) and that's it. I'm somewhat spoiled, I think, by  
Spirit in C++, and its notion of "character sets" and operations on  
them, so I can say, e.g. "print_p - '='" and that will match all
printable characters other than '='.

>
>> Many thanks and best regards,
>>
>
> You're welcome.
>

Thanks again,

> -- 
>
> Stephane Glondu.
>
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>

Paul Snively




* Re: [Caml-list] Question re: camlp4 parser
  2005-07-26 16:43   ` Paul Snively
@ 2005-07-26 17:05     ` Stephane Glondu
  2005-07-27  7:04     ` Virgile Prevosto
  1 sibling, 0 replies; 6+ messages in thread
From: Stephane Glondu @ 2005-07-26 17:05 UTC
  To: caml-list; +Cc: Paul Snively

On Tuesday 26 July 2005 09:43, Paul Snively wrote:
> One hopefully final question: is there a convenient shorthand for
> saying something like "all printable characters except '=' or '['?" I
> assume not--that is, we have ranges (' '..'~') or we have variants
> ('A' | 'B' | 'C'...) and that's it. I'm somewhat spoiled, I think, by
> Spirit in C++, and its notion of "character sets" and operations on
> them, so I can say, e.g. "print_p - '='" and that will match all
> printable characters other than '='.

I don't know whether there is a way to do this directly. You can split your 
range so that it avoids '=' and '[', or do something like this:

let printable s =
  let buf = Buffer.create 100 in
  let rec aux = parser
      [< '  ('=' | '[') >] -> Buffer.contents buf
    | [< '' '..'~' as c; x = (Buffer.add_char buf c; aux) >] -> x
    | [< >] -> Buffer.contents buf
  in aux s ;;

printable (Stream.of_string "path=/usr/src") ;;
--> - : string = "path"

Bear in mind that the '=' or '[' will be discarded by the parser. If you
don't want that, you can use Stream.peek (but it's much more annoying).
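As a rough illustration of the peek-based alternative, the sketch below stops at '=' or '[' without consuming it. It is written in plain OCaml with a hypothetical `cursor` type whose `peek` and `junk` mimic `Stream.peek` and `Stream.junk`; the name `printable_until_delim` is also made up for this example.

```ocaml
(* Hypothetical stand-in for char Stream.t: a string plus a read position. *)
type cursor = { text : string; mutable pos : int }

let of_string text = { text; pos = 0 }

(* Look at the next character without consuming it (like Stream.peek). *)
let peek c =
  if c.pos < String.length c.text then Some c.text.[c.pos] else None

(* Consume one character (like Stream.junk). *)
let junk c = c.pos <- c.pos + 1

(* Accumulate printable characters, stopping before '=' or '['.
   The delimiter is only peeked at, so it stays in the input. *)
let printable_until_delim c =
  let buf = Buffer.create 100 in
  let rec aux () =
    match peek c with
    | Some ('=' | '[') -> Buffer.contents buf
    | Some (' ' .. '~' as ch) ->
        junk c;
        Buffer.add_char buf ch;
        aux ()
    | _ -> Buffer.contents buf
  in
  aux ()
```

After the call, the next character available on the cursor is still the delimiter, so a following parser can decide what to do with it.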


-- 
Stephane Glondu.



* Re: [Caml-list] Question re: camlp4 parser
  2005-07-26 16:43   ` Paul Snively
  2005-07-26 17:05     ` Stephane Glondu
@ 2005-07-27  7:04     ` Virgile Prevosto
  2005-07-28  1:27       ` Paul Snively
  1 sibling, 1 reply; 6+ messages in thread
From: Virgile Prevosto @ 2005-07-27  7:04 UTC
  To: caml-list

2005/7/26, Paul Snively <psnively@mac.com>:
> One hopefully final question: is there a convenient shorthand for
> saying something like "all printable characters except '=' or '['?" I
> assume not--that is, we have ranges (' '..'~') or we have variants
> ('A' | 'B' | 'C'...) and that's it. I'm somewhat spoiled, I think, by
> Spirit in C++, and its notion of "character sets" and operations on
> them, so I can say, e.g. "print_p - '='" and that will match all
> printable characters other than '='.
> 

Like any other pattern, stream patterns can be refined with a 'when' condition:

let printable s =
 let buf = Buffer.create 100 in
 let rec aux = parser
   | [< '' '..'~' as c when c <> '=' && c <> '['; 
        x = (Buffer.add_char buf c; aux) >] -> x
   | [< >] -> Buffer.contents buf
 in aux s ;;

should do the trick. It might not be that convenient for a more
complex set of excluded characters, but it is possible to write a char
-> bool test outside of the stream parser.
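One possible shape for such an external char -> bool test is sketched below in plain OCaml. The names `is_printable`, `excluded`, `is_key_char` and `take_while` are made up for this example, and the `cursor` type is a hypothetical stand-in for `char Stream.t`; in the stream parser itself the predicate would presumably be called from the 'when' guard instead.

```ocaml
(* Hypothetical stand-in for char Stream.t: a string plus a read position. *)
type cursor = { text : string; mutable pos : int }

let of_string text = { text; pos = 0 }

let peek c =
  if c.pos < String.length c.text then Some c.text.[c.pos] else None

let junk c = c.pos <- c.pos + 1

(* Printable ASCII, the same range as ' '..'~'. *)
let is_printable ch = ch >= ' ' && ch <= '~'

(* The excluded set lives in one place, so it is easy to extend. *)
let excluded = "=["

(* "All printable characters except those in [excluded]". *)
let is_key_char ch = is_printable ch && not (String.contains excluded ch)

(* Accumulate characters for as long as the predicate holds; the first
   non-matching character is left in the input. *)
let take_while pred c =
  let buf = Buffer.create 100 in
  let rec aux () =
    match peek c with
    | Some ch when pred ch ->
        junk c;
        Buffer.add_char buf ch;
        aux ()
    | _ -> Buffer.contents buf
  in
  aux ()
```

Excluding another character then only means adding it to `excluded`, which is about as close as this gets to Spirit's character-set difference.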

-- 
E tutto per oggi, a la prossima volta
Virgile



* Re: [Caml-list] Question re: camlp4 parser
  2005-07-27  7:04     ` Virgile Prevosto
@ 2005-07-28  1:27       ` Paul Snively
  0 siblings, 0 replies; 6+ messages in thread
From: Paul Snively @ 2005-07-28  1:27 UTC
  To: virgile.prevosto; +Cc: caml-list


On Jul 27, 2005, at 12:04 AM, Virgile Prevosto wrote:

> Like any other pattern, stream patterns can be refined with a 'when'
> condition:
>
> let printable s =
>  let buf = Buffer.create 100 in
>  let rec aux = parser
>    | [< '' '..'~' as c when c <> '=' && c <> '[';
>         x = (Buffer.add_char buf c; aux) >] -> x
>    | [< >] -> Buffer.contents buf
>  in aux s ;;
>
> should do the trick. It might not be that convenient for a more
> complex set of excluded characters, but it is possible to write a char
> -> bool test outside of the stream parser.
>
Of course: it's all becoming quite clear now. Thanks for the  
excellent suggestion and your patience with my naïveté. :-)

> -- 
> E tutto per oggi, a la prossima volta
> Virgile
>

Best regards,
Paul Snively






