From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by c5ff346549e7 (Postfix) with ESMTPS id 79D1A5D4 for ; Wed, 26 Aug 2020 07:41:22 +0000 (UTC) X-IronPort-AV: E=Sophos;i="5.76,354,1592863200"; d="scan'208";a="464727984" Received: from prod-listesu18.inria.fr (HELO sympa.inria.fr) ([128.93.162.160]) by mail2-relais-roc.national.inria.fr with ESMTP; 26 Aug 2020 09:40:47 +0200 Received: by sympa.inria.fr (Postfix, from userid 20132) id 2403DE0B09; Wed, 26 Aug 2020 09:40:47 +0200 (CEST) Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by sympa.inria.fr (Postfix) with ESMTPS id EC95CE0095 for ; Wed, 26 Aug 2020 09:40:43 +0200 (CEST) Authentication-Results: mail3-smtp-sop.national.inria.fr; spf=None smtp.pra=oleg@okmij.org; spf=Pass smtp.mailfrom=oleg@okmij.org; spf=None smtp.helo=postmaster@mail1.g3.pair.com IronPort-PHdr: =?us-ascii?q?9a23=3ALPVWXxHKJuk0q3ILjLBhvZ1GYnF86YWxBRYc798d?= =?us-ascii?q?s5kLTJ78rs+wAkXT6L1XgUPTWs2DsrQY0rSQ6vq5EjVavd6oizMrSNR0TRgLiM?= =?us-ascii?q?EbzUQLIfWuLgnFFsPsdDEwB89YVVVorDmROElRH9viNRWJ+iXhpTEdFQ/iOgVr?= =?us-ascii?q?O+/7BpDdj9it1+C15pbffxhEiCCybL9vLRi6txjdutcLjYdtN6o91BTEqWZUdu?= =?us-ascii?q?pLwm9lOUidlAvm6Meq+55j/SVQu/Y/+MNFTK73Yac2Q6FGATo/K2w669HluhfF?= =?us-ascii?q?TQuU+3sTSX4WnQZSAwjE9x71QJH8uTbnu+Vn2SmaOcr2Ta0oWTmn8qxmRgPkhD?= =?us-ascii?q?sBOjUk9mzcl85+g79BoB+5qRJxw5DabpyWOvVxYqzSYN0VSHFdXspNTSFNHp+w?= =?us-ascii?q?YpERA+cHIO1Wr5P9p1wLrRamHAesAP3gyjBVjXLx2q061/ouEQ7d0QwnHNIOtX?= =?us-ascii?q?XUrNfvOKcVS+C1w7DFwDPeZPxZxTnz8pLHcgw9of6SR7Jwd9LcxEohGQ7LjFid?= =?us-ascii?q?t4PoMi2b2+kJr2SX8+VuWOChhWMppAx8oDuiy9sshITViY8Y1E3I+ypnzIg1O9?= =?us-ascii?q?G2R0p2b96qHpZWqiqUOYx2QsY4TGFpviY30roGuZ2+fCgLypQr3Rnfa+aIc4WO?= =?us-ascii?q?/xntV/6RLC95iX9kYr6yiRK//VKux+HmS8W4zVlHojJbntXQqHwBzQLf5tWIR/?= =?us-ascii?q?dn4Eus1yuD2xrO5uxGP0w5k7fQJYQ7zb4qjJUTtFzOHi/ol0Xyi6+bbkAk9fKp?= =?us-ascii?q?6+Tjf7nqvJCcOoFuhgHmKKsum9a/Df4kPQgJWmiX4eW81Lv98k3lWLhHj/w7nr?= =?us-ascii?q?PXvZ3eP8gWqLS1DxJI3oss8xq/Ci2p0NUcnXkJNlJFfxeHgpDuO1HKPv/4Auyy?= =?us-ascii?q?g1OvkDduxvDGPKftApLXLnjMiLvhZ6py61ZAyAovytBS/45bBasEIPL3Q0PxsN?= =?us-ascii?q?3YDgQlMwGv2ObmCNB91psEVm6VA6+ZNrnSsV6S6e41LemMftxdhDGoIPEg47vq?= =?us-ascii?q?jGQlsV4bZ6igm5UNO16iGfEzBEGUbjK4hdMMHk8NvQ8/TqrtklLUAm0bXGq7Q6?= =?us-ascii?q?9pvmJzM4mhF4qWHtnw0ozE5z+yG9htXk4DCl2IFi65JYCNWvNVLi3JZNdokyZC?= =?us-ascii?q?Xr+kGdd4iUOe8TTiwr8iFdL6vzUCvMu6ht924uzR0xYo+m4sVpXP4yS2V2hx21?= =?us-ascii?q?gwaXoz1aF7r1Z6zw7ag697hv1aU9tJ6KEQXw=3D=3D?= X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: =?us-ascii?q?A0D2AgCqEUZfh3IDJ0JfgQmBTIMYVQExL?= =?us-ascii?q?I04omKBaQsBAwEMIwoCBAEBhEyCOgIcBwEENBMCEAEBBQEBAQIBAwMEARMBAQE?= =?us-ascii?q?KCwkIKYVjDEMBEAGBYiKDRxM/dXWDDQGCewEPsE2BATOEOwGGEoE4AY0hDg2BQ?= =?us-ascii?q?ECBEYNOgkWBNQWECIItBJJKh0mBaZofLoJtgRmBKYYihg6LKDCDBYElgR2MVY4?= =?us-ascii?q?cAY1+jxOVJ4FrZ4ETTTAIgyQJRxkNjioPCYdudIVRMgEyNwIGCgEBAwlYhCKME?= =?us-ascii?q?gEB?= X-IPAS-Result: =?us-ascii?q?A0D2AgCqEUZfh3IDJ0JfgQmBTIMYVQExLI04omKBaQsBAwE?= =?us-ascii?q?MIwoCBAEBhEyCOgIcBwEENBMCEAEBBQEBAQIBAwMEARMBAQEKCwkIKYVjDEMBE?= =?us-ascii?q?AGBYiKDRxM/dXWDDQGCewEPsE2BATOEOwGGEoE4AY0hDg2BQECBEYNOgkWBNQW?= =?us-ascii?q?ECIItBJJKh0mBaZofLoJtgRmBKYYihg6LKDCDBYElgR2MVY4cAY1+jxOVJ4FrZ?= =?us-ascii?q?4ETTTAIgyQJRxkNjioPCYdudIVRMgEyNwIGCgEBAwlYhCKMEgEB?= X-IronPort-AV: E=Sophos;i="5.76,354,1592863200"; d="scan'208";a="357284734" X-MGA-submission: =?us-ascii?q?MDEnjwOjs12dbGA4HCXb1v+e1tJrKyLPT8y2Wi?= =?us-ascii?q?73cj01Z1c1rOHkhlESldbd5gJk1Vw1Fskwpvdmjf+hyMaBFAsnyC1Jlm?= =?us-ascii?q?VWConuXZbPAr0YgSJGUQCoJhpf5xwYzOYz9h5Qc9/mEdOWOg8j5GdKSN?= =?us-ascii?q?iyNhyVMmWn9tjDbjXr+ThGyA=3D=3D?= Received: from mail1.g3.pair.com ([66.39.3.114]) by mail3-smtp-sop.national.inria.fr with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Aug 2020 09:40:21 +0200 Received: from mail1.g3.pair.com (localhost [127.0.0.1]) by mail1.g3.pair.com (Postfix) with ESMTP id 8101F3FBABB; Wed, 26 Aug 2020 03:40:19 -0400 (EDT) Received: from Melchior.localnet (172.231.214.202.rev.vmobile.jp [202.214.231.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail1.g3.pair.com (Postfix) with ESMTPSA id 65C2C582FA5; Wed, 26 Aug 2020 03:40:18 -0400 (EDT) Date: Wed, 26 Aug 2020 16:44:23 +0900 From: Oleg To: caml-list@inria.fr Cc: murthy.chet@gmail.com Message-ID: <20200826074423.GA2109@Melchior.localnet> Mail-Followup-To: Oleg , caml-list@inria.fr, murthy.chet@gmail.com MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.10.1 (2018-07-13) Subject: [Caml-list] ANN: a small library for shell/AWK/Perl-like scripting Reply-To: Oleg X-Loop: caml-list@inria.fr X-Sequence: 18229 Errors-To: caml-list-owner@inria.fr Precedence: list Precedence: bulk Sender: caml-list-request@inria.fr X-no-archive: yes List-Id: List-Help: List-Subscribe: List-Unsubscribe: List-Post: List-Owner: List-Archive: Archived-At: Some time ago Chet Murthy asked about writing shell-like scripts in OCaml. Prompted by it, I also want to report on my experience and announce a small library that made it pleasant to do shell/AWK/Perl-like scripting in OCaml. The library is available at http://okmij.org/ftp/ML/myawk/0README.dr and consists of two small ML files, myawk.ml and strings.ml. The latter collects general-purpose string operations, more convenient than those in Stdlib.String. The rest of that web directory contains half a dozen sample scripts with comments. Here is the first example: a typical AWK script, but written in OCaml: #!/bin/env -S ocaml #load "myawk.cma" open Myawk open Strings let hash = string_of_int <|> Hashtbl.hash ;; (* Sanitize the files originally used by join1.ml and join2.ml The files are made of space-separated fields; the first field is the key. It is sensitive; but because it is a key it can't be replaced with meaningless garbage. We obfuscate it beyond recognition. The third field is obfuscated as well. The second and fourth can be left as they are, and the fifth, if present, is replaced with XXX The script is a proper filter: reads from stdin, writes to stdout *) for_each_line @@ map words @@ function (f1::f2::f3::f4::rest) -> print [hash f1; f2; hash f3; f4; if rest = [] then "" else "XXX"] ;; Here <|> is a function composition. I wish it were in Stdlib. The real example, used in real life, was performing a database join SELECT T2.* from Table1 as T1, Table2 as T2 where T1.f1 = T2.f1 where Table1 and Table2 are text files with space-separated column values. Table1 is supposed to be fed to stdin: let () = for_each_line @@ map words @@ map_option (function (x::_) -> Some x | _ -> None) @@ (ignore <|> shell "grep %s table1.txt") It is a typical rough-and-dirty script. Alas, it was too rough: I was so excited that it typechecked and worked the first time, that I didn't look carefully at the output and overlooked what I was looking for (resulting in an unneeded hassle and apology). I should have queried exactly for what I wanted: SELECT T1.f1, T1.f4 FROM Table1 as T1, Table2 as T2 WHERE T1.f1 = T2.f1 AND T1.f3 <> "3" which is actually easy to write in myawk (probably not so in AWK though) let () = for_each_line ~fname:"table2.txt" @@ map words @@ map_option (function (w::_) -> Some w | _ -> None) @@ fun w -> for_each_line ~fname:"table1.txt" @@ map words @@ map_option (function (x::f2::f3::f4::_) when x = w && f4 <> "3" -> Some [x;f4] | _ -> None) @@ print This is the classical nested loop join. Chet Murthy might be pleased to see the extensive use of the continuation-passing style. I was apprehensive at first, but it turned out not to be a hassle. The library has a few other examples, including case-branching and rewriting a real AWK script from the OCaml distribution. Finally, let's compare with shell scripts. The example below doesn't show off the library, but it does show the benefits of OCaml for scripting. The original shell script is a sample GIT commit hook, quoted in the comments: (* From GIT's sample hooks: ANY-GIT-REPO/.git/hooks/commit-msg.sample # Called by "git commit" with one argument, the name of the file # that has the commit message. The hook should exit with non-zero # status after issuing an appropriate message if it wants to stop the # commit. The hook is allowed to edit the commit message file. # This example catches duplicate Signed-off-by lines. test "" = "$(grep '^Signed-off-by: ' "$1" | sort | uniq -c | sed -e '/^[ ]*1[ ]/d')" || { echo >&2 Duplicate Signed-off-by lines. exit 1 } *) module H = Hashtbl let commit_msg = Sys.argv.(1) let ht = H.create 5 let () = for_each_line ~fname:commit_msg @@ fun l -> if is_prefix "Signed-off-by: " l <> None then begin if H.find_opt ht l <> None then begin prerr_endline "Duplicate Signed-off-by lines."; exit 1 end else H.add ht l () end Although the OCaml script seems to have more characters, one doesn't need to type them all. Scripts like that are meant to be entered in an editor; even ancient editors have completion facilities. Looking at the original shell script brings despair, and drives me right towards Unix Haters. Not only the script is algorithmically ugly: if a duplicate signed-off line occurs near the beginning, we can report it right away and stop. We don't need to read the rest of the commit message, filter it, sort it, precisely count all duplicates and filter again. Not only the script gratuitously wastes system resources (read: the laptop battery) by launching many processes and allocating communication buffers. Mainly, the script isn't good at its primary purpose: it isn't easy to write and read. Pipeline composition of small stream processors is generally a good thing -- but not when each stream processor is written in its own idiosyncratic language. Incidentally, I have doubts about the script: I think that quotes around $1 are meant to be embedded; but why they are not escaped then? Probably it is some edge case of bash, out of several 0thousands. In contrast, OCaml script does exactly what is required, with no extra work. Everything is written in only one language.