caml-list - the Caml user's mailing list
From: Pierre Weis <pierre.weis@inria.fr>
To: alan.schmitt@polytechnique.org (Alan Schmitt)
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] scanf and %2c
Date: Fri, 20 Jun 2003 11:06:47 +0200 (MET DST)
Message-ID: <200306200906.LAA12011@pauillac.inria.fr>
In-Reply-To: <20030613202820.GM9367@alan-schm1p> from Alan Schmitt at "Jun 13, 103 04:28:20 pm"

Hi Alan,

> As I needed to parse some string representing time (of the form hh:mm), 
> I decided to use scanf. The correct code to do it is:
> # let time_parse s =
>   Scanf.sscanf s "%2s:%2s" (fun a b -> a,b)
>   ;;
> val time_parse : string -> string * string = <fun>

Just to implement stricter parsing rules (and BTW to show scanf
capabilities), I will elaborate a bit on this ``correct'' code.

To ensure that hh and mm are indeed decimal digits, we could write:

# let scan_date s = Scanf.sscanf s "%2d:%2d";;
val scan_date : string -> (int -> int -> 'a) -> 'a = <fun>

This way, the fields hh and mm are parsed and returned as integers as
they are supposed to be.
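
As a small illustration of how this partially applied scanner is used
(the two continuations below are just examples of mine, not part of the
original code), the same scan_date can feed different receivers:

```ocaml
(* The scanner from above: the receiver function is supplied later. *)
let scan_date s = Scanf.sscanf s "%2d:%2d"

(* Receive the two fields as a pair of integers. *)
let hh, mm = scan_date "09:05" (fun hh mm -> (hh, mm))

(* Or compute something directly from them. *)
let minutes_since_midnight = scan_date "09:05" (fun hh mm -> (60 * hh) + mm)
```

Here hh is 9, mm is 5 and minutes_since_midnight is 545.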

So far so good, but this is not precise enough, since (small) negative
hours are still accepted:

# scan_date "-2:12" (fun hh mm -> hh, mm);;
- : int * int = (-2, 12)

That's why I usually use:

# let scan_date s = Scanf.sscanf s "%2[0-9]:%2[0-9]";;
val scan_date : string -> (string -> string -> 'a) -> 'a = <fun>

# scan_date "-2:12" (fun x y -> x, y);;
Exception: Scanf.Scan_failure "scanf: bad input at char number 1: -".
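
Since the fields now come back as strings, a caller who wants integers
has to convert them. A minimal sketch, where to_ints is a hypothetical
helper name of mine:

```ocaml
(* Digits-only scanner, as above. *)
let scan_date s = Scanf.sscanf s "%2[0-9]:%2[0-9]"

(* Hypothetical helper: convert the two digit strings to integers. *)
let to_ints s =
  scan_date s (fun hh mm -> (int_of_string hh, int_of_string mm))

(* A leading sign is rejected by the scanner itself. *)
let rejects_negative =
  try ignore (to_ints "-2:12"); false
  with Scanf.Scan_failure _ -> true
```

With this, to_ints "09:05" gives (9, 5) and rejects_negative is true.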

Then, you may argue that we still parse dates like 99:99 which are
meaningless. Scanning the characters one at a time, we could be more
precise and reject a large class of those erroneous dates:

# let scan_date s = Scanf.sscanf s "%1[0-2]%1[0-9]:%1[0-5]%1[0-9]";;
val scan_date : string -> (string -> string -> string -> string -> 'a)
-> 'a = <fun>

Minutes are now handled appropriately, but we still accept hours
greater than 23 (anything up to 29)!

To deal with that problem, we first define two auxiliary functions am
and pm that parse, respectively, times before 20:00 and times from
20:00 on, once the first digit of the hour has been read:

let am ib = Scanf.bscanf ib "%1[0-9]:%1[0-5]%1[0-9]";;
let pm ib = Scanf.bscanf ib "%1[0-3]:%1[0-5]%1[0-9]";;

let scan_date_ib ib f =
  Scanf.bscanf ib "%c"
   (function c ->
    let h0 = String.make 1 c in
    match c with
    | '0' | '1' -> am ib (f h0)
    | '2' -> pm ib (f h0)
    | _ -> failwith ("Illegal date char " ^ h0));;

val scan_date_ib :
  Scanf.Scanning.scanbuf ->
  (string -> string -> string -> string -> 'a) -> 'a = <fun>

Note that we turned to bscanf, that is, scanning from scanning
buffers (and not strings), since the scanning is now split into
several phases that must all read from the same data structure
(doing so with strings would involve horrific substring manipulations
to pass the rest of the string argument to the next step).

As a rule of thumb, scanning from buffers is much more general and
easier than scanning from strings or files: scanning phases can be
composed smoothly, and scanning from any other data structure is easily
expressed in terms of a basic function scanning from buffers.
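
To make the point about phases concrete, here is a small sketch (the
input string and the names ib, hh, mm and rest are mine, chosen only
for illustration): two successive bscanf calls read from the same
buffer, the second one resuming where the first one stopped.

```ocaml
(* One scanning buffer shared by both phases. *)
let ib = Scanf.Scanning.from_string "12:34 56"

(* Phase 1: read the hh:mm part. *)
let hh, mm = Scanf.bscanf ib "%d:%d" (fun h m -> (h, m))

(* Phase 2: continue from the same buffer, skipping the blank. *)
let rest = Scanf.bscanf ib " %d" (fun n -> n)
```

After this, hh is 12, mm is 34 and rest is 56, with no substring
bookkeeping on the caller's side.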

For instance, if you insist on scanning from strings, you could define:

let scan_date s = scan_date_ib (Scanf.Scanning.from_string s);;

Now:
# scan_date "25:12" (fun h0 h1 m0 m1 -> h0 ^ h1, m0 ^ m1);;
Exception: Scanf.Scan_failure "scanf: bad input at char number 2: 5".
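
For convenience, here are the pieces above assembled into one file,
as a sketch rather than a definitive version. The helpers hh_mm and
hh_mm_opt are hypothetical additions of mine: the first glues the four
one-character fields back together, the second turns the exceptions
into an option.

```ocaml
(* Second hour digit and minutes, once the first hour digit is known. *)
let am ib = Scanf.bscanf ib "%1[0-9]:%1[0-5]%1[0-9]"
let pm ib = Scanf.bscanf ib "%1[0-3]:%1[0-5]%1[0-9]"

(* Read the first hour digit, then dispatch to am or pm. *)
let scan_date_ib ib f =
  Scanf.bscanf ib "%c" (fun c ->
    let h0 = String.make 1 c in
    match c with
    | '0' | '1' -> am ib (f h0)
    | '2' -> pm ib (f h0)
    | _ -> failwith ("Illegal date char " ^ h0))

let scan_date s = scan_date_ib (Scanf.Scanning.from_string s)

(* Hypothetical helper: rebuild the "hh" and "mm" strings. *)
let hh_mm s = scan_date s (fun h0 h1 m0 m1 -> (h0 ^ h1, m0 ^ m1))

(* Hypothetical helper: None instead of an exception on bad input. *)
let hh_mm_opt s =
  try Some (hh_mm s)
  with Scanf.Scan_failure _ | Failure _ | End_of_file -> None
```

With these definitions, hh_mm "23:59" gives ("23", "59"), while
hh_mm_opt "25:12" and hh_mm_opt "31:00" both give None.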

Hope this helps,

Pierre Weis

INRIA, Projet Cristal, Pierre.Weis@inria.fr, http://pauillac.inria.fr/~weis/

PS: Using the new format manipulation primitives, we could have
factored the functions am and pm a bit:

let minutes_fmt () = format_of_string ":%1[0-5]%1[0-9]";;
let am_fmt () = "%1[0-9]" ^^ minutes_fmt ();;
let pm_fmt () = "%1[0-3]" ^^ minutes_fmt ();;

let am ib = Scanf.bscanf ib (am_fmt ());;
let pm ib = Scanf.bscanf ib (pm_fmt ());;

(Note the additional () abstractions to circumvent the value
restriction on polymorphic generalization.)
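
To illustrate why the () abstraction matters, here is a small sketch
(pair_fmt, as_pair and as_sum are hypothetical names of mine). Because
the format is rebuilt at each call, it can be used with receivers of
different result types; as far as I understand, a format bound directly
with ^^ would get a non-generalized type and be tied to its first use.

```ocaml
(* A concatenated format, hidden behind () so that it stays polymorphic. *)
let pair_fmt () = "%d" ^^ ":%d"

(* Same format, receiver returning a pair. *)
let as_pair s = Scanf.sscanf s (pair_fmt ()) (fun a b -> (a, b))

(* Same format, receiver returning an int. *)
let as_sum s = Scanf.sscanf s (pair_fmt ()) (fun a b -> a + b)
```

Here as_pair "12:34" gives (12, 34) and as_sum "12:34" gives 46.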


