From: "François Pottier" <francois.pottier@inria.fr> To: OCaML Mailing List <caml-list@inria.fr>, menhir-list <menhir@inria.fr> Subject: [Caml-list] [ANN] New release of Menhir (20211230) Date: Fri, 31 Dec 2021 09:22:49 +0100 [thread overview] Message-ID: <82582cbe-09da-1410-0cce-f1ebf4f71294@inria.fr> (raw) Dear OCaml & Menhir users, I am pleased to announce a new release of Menhir, with a major improvement. The code back-end has been rewritten from the ground up by Émile Trotignon and by myself, and now produces efficient and well-typed OCaml code. The infamous Obj.magic is not used any more. Furthermore, the new code back-end produces code that is more aggressively optimized, leading to a significant reduction in memory allocation and a typical performance improvement of up to 20% compared to the previous code back-end. opam update opam install menhir.20211230 Happy well-typed parsing in 2022! -- François Pottier francois.pottier@inria.fr http://cambium.inria.fr/~fpottier/ ## 2021/12/30 * The code back-end has been rewritten from the ground up by Émile Trotignon and François Pottier, and now produces efficient and **well-typed** OCaml code. The infamous `Obj.magic` is not used any more. The table back-end and the Coq back-end are unaffected by this change. The main side effects of this change are as follows: - The code back-end now needs type information. This means that *either* Menhir's type inference mechanism must be enabled (the easiest way of enabling it is to use Menhir via `dune` and to check that the `dune-project` file says `(using menhir 2.0)` or later) *or* the type of every nonterminal symbol must be explicitly given via a `%type` declaration. - The code back-end no longer allows the type of any symbol to be an open polymorphic variant type, such as ```[> `A ]```. As a workaround, we suggest using a closed polymorphic variant instead. - The code back-end now adheres to the *simplified* error-handling strategy, as opposed to the *legacy* strategy. For grammars that do *not* use the `error` token, this makes no difference. For grammars that use the `error` token in the limited way permitted by the simplified strategy, this makes no difference either. The simplified strategy makes the following requirement: the `error` token should always appear at the end of a production, whose semantic action should abort the parser by raising an exception. Grammars that make more complex use of the `error` token, and therefore need the `legacy` strategy, cannot be compiled by the new code back-end. As a workaround, it is possible to switch to the table back-end (using `--table --strategy legacy`) or to the ancient code back-end (using `--code-ancient`). **In the long run, we recommend abandoning the use of the `error` token**. Support for the `error` token may be removed entirely at some point in the future. The original code back-end, which has been around since the early days of Menhir (2005), temporarily remains available (using `--code-ancient`). It will be removed at some point in the future. The new code back-end offers several levels of optimization, which remain undocumented and are subject to change in the future. At present, the main levels are roughly as follows: - `-O 0 --represent-everything` uses a uniform representation of the stack and produces straightforward code. - `-O 0` uses a non-uniform representation of the stack; some stack cells have fewer fields; some stack cells disappear altogether. - `-O 1` reduces memory traffic by moving `PUSH` operations so that they meet `POP` operations and cancel out. - `-O 2` optimizes the reduction of unit productions (that is, productions whose right-hand side has length 1) by performing a limited amount of code specialization. The default level of optimization is the maximum level, `-O 2`. * The new command line switch `--exn-carries-state` causes the exception `Error` to carry an integer parameter: `exception Error of int`. When the parser detects a syntax error, the number of the current state is reported in this way. This allows the caller to select a suitable syntax error message, along the lines described in [Section 11](http://cambium.inria.fr/~fpottier/menhir/manual.html#sec68) of the manual. This command line switch is currently supported by the code back-end only. * The `$syntaxerror` keyword is no longer supported. * Document the trick of wrapping module aliases in `open struct ... end`, like this: `%{ open struct module alias M = MyLongModuleName end %}`. This allows you to use the short name `M` in your grammar, but forces OCaml to infer types that refer to the long name `MyLongModuleName`. (Suggested by Frédéric Bour.)
reply other threads:[~2021-12-31 8:22 UTC|newest] Thread overview: [no followups] expand[flat|nested] mbox.gz Atom feed
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=82582cbe-09da-1410-0cce-f1ebf4f71294@inria.fr \ --to=francois.pottier@inria.fr \ --cc=caml-list@inria.fr \ --cc=menhir@inria.fr \ --subject='Re: [Caml-list] [ANN] New release of Menhir (20211230)' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).