From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id 7FA59BC84 for ; Thu, 24 Mar 2005 16:09:41 +0100 (CET) Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id j2OF9fuC026881 for ; Thu, 24 Mar 2005 16:09:41 +0100 Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id QAA08677 for ; Thu, 24 Mar 2005 16:09:40 +0100 (MET) Received: from alex.barettalocal.com (h213-255-109-130.albacom.net [213.255.109.130] (may be forged)) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id j2OF9ee6026875 for ; Thu, 24 Mar 2005 16:09:40 +0100 Received: from [127.0.0.1] (localhost.localdomain [127.0.0.1]) by alex.barettalocal.com (Postfix) with ESMTP id DD7C02BAA7B for ; Thu, 24 Mar 2005 16:09:41 +0100 (CET) Message-ID: <4242D835.4060301@barettadeit.com> Date: Thu, 24 Mar 2005 16:09:41 +0100 From: Alex Baretta User-Agent: Debian Thunderbird 1.0 (X11/20050116) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Ocaml Subject: Str library for channels? X-Enigmail-Version: 0.90.0.0 X-Enigmail-Supports: pgp-inline, pgp-mime Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Miltered: at concorde with ID 4242D835.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Miltered: at concorde with ID 4242D834.001 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; baretta:01 ocamllex:01 expr:01 expr:01 lexbuf:01 lexbuf:01 lexing:01 stdin:01 ocamllex:01 regexp:01 pervasives:01 stdin:01 stdout:01 rec:01 api:01 X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on yquem.inria.fr X-Spam-Status: No, score=0.1 required=5.0 tests=FORGED_RCVD_HELO autolearn=disabled version=3.0.2 X-Spam-Level: I have been confronted with the following little problem: finding the first occurrence of a string matching a given regular expression in an indefinitely long file. My solution was based on ocamllex. Here it is. let expr = ... as expr rule find = parse | ['\000'-'\255'] { find lexbuf } | expr { print_string expr; copy lexbuf } and copy = shortest | ([^'\n']* '\n') as line { print_string line; copy lexbuf } | ([^'\n']* as line) eof { print_string line } { find (Lexing.from_channel stdin) } I think that using ocamllex for such a minimal task is rather cumbersome. What I would like to do is the following: let re = Str.regexp ... let () = Pervasives.seek_in (Str.seek_forward re stdin) let () = copy_channel stdin stdout Where the following is defined somewhere. let copy_channel ?(s=String.create 1024) in_ch out_ch = let length = String.length s in let rec loop bytes_read = output out_ch s 0 bytes_read; loop (input in_ch s 0 length) in loop (input in_ch s 0 length) Str does not support scanning files. Is this a limitation in the API or in the regexp engine? Could Str be extended to handle files as well as strings? Alex -- ********************************************************************* http://www.barettadeit.com/ Baretta DE&IT A division of Baretta SRL tel. +39 02 370 111 55 fax. +39 02 370 111 54 Our technology: The Application System/Xcaml (AS/Xcaml) The FreerP Project