caml-list - the Caml user's mailing list
* [Caml-list] scanf and %2c
@ 2003-06-13 20:28 Alan Schmitt
  2003-06-19  8:57 ` Pierre Weis
  2003-06-20  9:06 ` Pierre Weis
  0 siblings, 2 replies; 5+ messages in thread
From: Alan Schmitt @ 2003-06-13 20:28 UTC (permalink / raw)
  To: caml-list

Hi,

As I needed to parse some string representing time (of the form hh:mm), 
I decided to use scanf. The correct code to do it is:
# let time_parse s =
  Scanf.sscanf s "%2s:%2s" (fun a b -> a,b) 
  ;;
val time_parse : string -> string * string = <fun>

but of course this is not what I tried first, thinking that I wanted 
a string of two chars:
# let time_parse s =
      Scanf.sscanf s "%2c:%2c" (fun a b -> a,b) 
      ;;
val time_parse : string -> char * char = <fun>

this leads to the following:

# time_parse "10:20" ;;
Exception: Scanf.Scan_failure "scanf: bad input at char number 2: 0".
# time_parse "1:2" ;;
- : char * char = ('1', '2')

So shouldn't there be a warning (or an error) when using a size field 
with chars ?

Alan

-- 
The hacker: someone who figured things out and made something cool happen.

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners



* Re: [Caml-list] scanf and %2c
  2003-06-13 20:28 [Caml-list] scanf and %2c Alan Schmitt
@ 2003-06-19  8:57 ` Pierre Weis
  2003-06-19 15:06   ` Nicolas George
  2003-06-20  9:06 ` Pierre Weis
  1 sibling, 1 reply; 5+ messages in thread
From: Pierre Weis @ 2003-06-19  8:57 UTC (permalink / raw)
  To: Alan Schmitt; +Cc: caml-list

Hello Alan,

> As I needed to parse some string representing time (of the form hh:mm), 
[...]

Welcome to the dates and time users' camp! Too bad that there is no
support for that kind of stuff in our favorite language :(

> So shouldn't there be a warning (or an error) when using a size field 
> with chars ?
> 
> Alan

We must be a bit more precise than that: we should check that the size
field is non-negative and less than or equal to 1.

Indeed:

- a 0 sized char scanf specification has a special useful meaning (see
Scanf.mli for details): it means ``peek'' at the current character
without reading it (in order to test its value and decide what to do
next),

- a 1 sized char scanf specification seems to be harmless.
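
A minimal sketch of the 0 sized case, assuming the %0c specification behaves as described in Scanf.mli (the helper name peek_then_read is purely illustrative and not part of Scanf):

```ocaml
(* Hypothetical helper: look at the next character without consuming it,
   then read the whole token, which still starts with that character. *)
let peek_then_read (s : string) : char * string =
  Scanf.sscanf s "%0c%s" (fun c rest -> (c, rest))

let () =
  let (c, rest) = peek_then_read "10:20" in
  Printf.printf "peeked %c, then read %s\n" c rest
```

Since the peeked character is not consumed, the following %s still sees it as the first character of its token.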

I will try to add a static check in the type-checker (the usual Caml
way), or else a runtime failure in Scanf (the usual way of more
conventional programming languages).

Best regards,

Pierre Weis

INRIA, Projet Cristal, Pierre.Weis@inria.fr, http://pauillac.inria.fr/~weis/



* Re: [Caml-list] scanf and %2c
  2003-06-19  8:57 ` Pierre Weis
@ 2003-06-19 15:06   ` Nicolas George
  0 siblings, 0 replies; 5+ messages in thread
From: Nicolas George @ 2003-06-19 15:06 UTC (permalink / raw)
  To: caml-list


On Primidi 1 Messidor, year CCXI, Pierre Weis wrote:
> > As I needed to parse some string representing time (of the form hh:mm), 
> Welcome to the dates and time users' camp! Too bad that there is no
> support for that kind of stuff in our favorite language :(

I have written a fairly complete date parser for the OCamlnet project. It
is available at Sourceforge:
<URL:http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/ocamlnet/ocamlnet/src/netstring/>
(netdate.mli and netdate.mlp).


* Re: [Caml-list] scanf and %2c
  2003-06-13 20:28 [Caml-list] scanf and %2c Alan Schmitt
  2003-06-19  8:57 ` Pierre Weis
@ 2003-06-20  9:06 ` Pierre Weis
  2003-06-20 10:45   ` Alan Schmitt
  1 sibling, 1 reply; 5+ messages in thread
From: Pierre Weis @ 2003-06-20  9:06 UTC (permalink / raw)
  To: Alan Schmitt; +Cc: caml-list

Hi Alan,

> As I needed to parse some string representing time (of the form hh:mm), 
> I decided to use scanf. The correct code to do it is:
> # let time_parse s =
>   Scanf.sscanf s "%2s:%2s" (fun a b -> a,b)
>   ;;
> val time_parse : string -> string * string = <fun>

Just to implement stricter parsing rules (and BTW to show scanf
capabilities), I will elaborate a bit on this ``correct'' code.

To ensure that hh and mm are indeed decimal digits, we could write:

# let scan_date s = Scanf.sscanf s "%2d:%2d";;
val scan_date : string -> (int -> int -> 'a) -> 'a = <fun>

This way, the fields hh and mm are parsed and returned as integers as
they are supposed to be.
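
As a small usage sketch (the function minutes_since_midnight is only an illustration, not something from the standard library), the continuation passed to scan_date receives both fields as ints and can compute with them directly:

```ocaml
let scan_date s = Scanf.sscanf s "%2d:%2d"

(* Hypothetical helper: the continuation receives hours and minutes as ints. *)
let minutes_since_midnight s = scan_date s (fun hh mm -> hh * 60 + mm)

let () = Printf.printf "%d\n" (minutes_since_midnight "10:20")
```

Leaving the continuation as a parameter of scan_date keeps the scanner independent of what the caller wants to build from the two fields.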

So far so good, but this is not precise enough, since (small) negative
hours are still accepted:

# scan_date "-2:12" (fun hh mm -> hh, mm);;
- : int * int = (-2, 12)

That's why I usually use:

# let scan_date s = Scanf.sscanf s "%2[0-9]:%2[0-9]";;
val scan_date : string -> (string -> string -> 'a) -> 'a = <fun>

# scan_date "-2:12" (fun x y -> x, y);;
Exception: Scanf.Scan_failure "scanf: bad input at char number 1: -".

Then, you may argue that we still parse dates like 99:99 which are
meaningless. Scanning the characters one at a time, we could be more
precise and reject a large class of those erroneous dates:

# let scan_date s = Scanf.sscanf s "%1[0-2]%1[0-9]:%1[0-5]%1[0-9]";;
val scan_date : string -> (string -> string -> string -> string -> 'a)
-> 'a = <fun>

Minutes are now handled appropriately, but we still accept hours
greater than 23 (anything up to 29)!
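
To make the remaining gap concrete, here is a short sketch using the same format; the helper show is hypothetical and merely glues the four fields back together:

```ocaml
let scan_date s = Scanf.sscanf s "%1[0-2]%1[0-9]:%1[0-5]%1[0-9]"

(* Hypothetical helper: concatenate the four one-character fields. *)
let show s = scan_date s (fun h0 h1 m0 m1 -> h0 ^ h1 ^ ":" ^ m0 ^ m1)

let () =
  print_endline (show "23:59");  (* a valid time *)
  print_endline (show "29:59")   (* not a valid time, yet it scans *)
```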

To deal with that problem, we first define two auxiliary functions am
and pm that parse, respectively, times before 20:00 and times from
20:00 onwards, once the first digit of the hour has been parsed:

let am ib = Scanf.bscanf ib "%1[0-9]:%1[0-5]%1[0-9]";;
let pm ib = Scanf.bscanf ib "%1[0-3]:%1[0-5]%1[0-9]";;

let scan_date_ib ib f =
  Scanf.bscanf ib "%c"
   (function c ->
    let h0 = String.make 1 c in
    match c with
    | '0' | '1' -> am ib (f h0)
    | '2' -> pm ib (f h0)
    | _ -> failwith ("Illegal date char " ^ h0));;

val scan_date_ib :
  Scanf.Scanning.scanbuf ->
  (string -> string -> string -> string -> 'a) -> 'a = <fun>

Note that we switched to bscanf, that is, scanning from scanning
buffers (and not strings), since the scanning is now split into
several phases that must keep reading from the same data structure
(doing so with strings would involve horrific substring manipulations
to pass the rest of the string argument to the next step).

As a rule of thumb, scanning from buffers is much more general and
easier than scanning from strings or files: scanning phases can be
composed smoothly, and scanning from any other data structure is easily
expressed in terms of a basic function scanning from buffers.
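
A minimal sketch of such phase composition, with illustrative names (scan_hours, scan_minutes and scan_time_from_string are assumptions made for this example): two independent bscanf calls read one after the other from a single buffer.

```ocaml
(* Two scanning phases reading from one shared buffer. *)
let scan_hours ib = Scanf.bscanf ib "%2[0-9]" (fun hh -> hh)
let scan_minutes ib = Scanf.bscanf ib ":%2[0-9]" (fun mm -> mm)

let scan_time_from_string s =
  let ib = Scanf.Scanning.from_string s in
  let hh = scan_hours ib in      (* phase 1 consumes the hours *)
  let mm = scan_minutes ib in    (* phase 2 resumes where phase 1 stopped *)
  (hh, mm)
```

The buffer remembers how far the first phase got, so the second phase needs no substring bookkeeping.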

For instance, if you insist on scanning from strings, you could define:

let scan_date s = scan_date_ib (Scanf.Scanning.from_string s);;

Now:
# scan_date "25:12" (fun h0 h1 m0 m1 -> h0 ^ h1 ^ ":" ^ m0 ^ m1);;
Exception: Scanf.Scan_failure "scanf: bad input at char number 2: 5".
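
As a complementary sketch rather than a replacement for the code above: assuming a recent OCaml (int_of_string_opt needs 4.05 or later), where the Scanf documentation says a %[range] conversion may read an empty string, one may prefer to scan the digits and then check the numeric ranges explicitly. The name parse_time is hypothetical.

```ocaml
(* Hypothetical helper: parse "hh:mm" into a pair of ints, or None.
   "%!" requires the end of input, so trailing characters are rejected. *)
let parse_time (s : string) : (int * int) option =
  match Scanf.sscanf s "%2[0-9]:%2[0-9]%!" (fun hh mm -> (hh, mm)) with
  | exception (Scanf.Scan_failure _ | End_of_file | Failure _) -> None
  | (hh, mm) ->
    (* A field may be empty, in which case int_of_string_opt gives None. *)
    (match int_of_string_opt hh, int_of_string_opt mm with
     | Some h, Some m when h < 24 && m < 60 -> Some (h, m)
     | _ -> None)
```

Returning an option instead of raising leaves the error handling policy to the caller.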

Hope this helps,

Pierre Weis

INRIA, Projet Cristal, Pierre.Weis@inria.fr, http://pauillac.inria.fr/~weis/

PS: Using the new format manipulation primitives, we could have
factored the functions am and pm a bit as:

let minutes_fmt () = format_of_string ":%1[0-5]%1[0-9]";;
let am_fmt () = "%1[0-9]" ^^ minutes_fmt ();;
let pm_fmt () = "%1[0-3]" ^^ minutes_fmt ();;

let am ib = Scanf.bscanf ib (am_fmt ());;
let pm ib = Scanf.bscanf ib (pm_fmt ());;

(Note the additional () abstractions to circumvent the value
restriction on polymorphic generalization.)
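
The same primitives can also assemble a complete format from its parts. A small sketch, where hours_fmt, time_fmt and scan_time are names invented for the illustration:

```ocaml
let minutes_fmt () = format_of_string ":%1[0-5]%1[0-9]"
let hours_fmt () = format_of_string "%1[0-2]%1[0-9]"

(* ^^ concatenates two formats into one. *)
let time_fmt () = hours_fmt () ^^ minutes_fmt ()

(* Hypothetical wrapper: scan a string with the assembled format. *)
let scan_time s = Scanf.sscanf s (time_fmt ())

let () =
  print_endline (scan_time "12:34" (fun h0 h1 m0 m1 -> h0 ^ h1 ^ m0 ^ m1))
```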


* Re: [Caml-list] scanf and %2c
  2003-06-20  9:06 ` Pierre Weis
@ 2003-06-20 10:45   ` Alan Schmitt
  0 siblings, 0 replies; 5+ messages in thread
From: Alan Schmitt @ 2003-06-20 10:45 UTC (permalink / raw)
  To: caml-list

* Pierre Weis (pierre.weis@inria.fr) wrote:
> Just to implement stricter parsing rules (and BTW to show scanf
> capabilities), I will elaborate a bit on this ``correct'' code.

Thanks a lot for this enlightening lecture.

Alan Schmitt

-- 
The hacker: someone who figured things out and made something cool happen.
