From: Caipenghui
To: tuhs@minnie.tuhs.org, "G. Branden Robinson", tuhs@tuhs.org
Date: Fri, 07 Dec 2018 09:53:14 +0000
Subject: Re: [TUHS] c's comment
Message-ID: <37B2A52F-43FC-4B2B-970F-170318D89135@163.com>
In-Reply-To: <20181207083154.ucpd6qim3ghclfhn@crack.deadbeast.net>
References: <1C1BB8D2-9535-4A6D-96E8-3792F14BCBAE@163.com>
 <20181207083154.ucpd6qim3ghclfhn@crack.deadbeast.net>
List-Id: The Unix Heritage Society mailing list

On December 7, 2018 8:31:56 AM UTC, "G. Branden Robinson" wrote:
> At 2018-12-07T04:27:41+0000, Caipenghui wrote:
> > Why can't c language comments be nested?
> > What is the historical reason for language design? Feeling awkward.
>
> I'm a callow youth compared to many on this list but I'll give it a
> try.
>
> My understanding is that it's not so much a historical reason[1] as a
> design choice motivated by ease of lexical analysis.
>
> As you may be aware, interpretation and compilation of programming
> languages often split into two parts: lexical analysis and semantic
> parsing.
>
> For instance, in
>
>     int a = 1;
>
> A lexical analyzer breaks this input into several "tokens":
> * A type declarator;
> * A variable name;
> * An assignment operator;
> * A numerical literal (constant);
> * A statement separator;
> * A newline.
>
> Because we'll need it in a moment, here's another example:
>
>     char s[] = "foobar"; /* initial value */
>
> The tokens here are:
> * A type declarator;
> * A variable name;
> * An array indicator, which is really part of the type declarator
>   (nobody said C was easy);
> * An assignment operator;
> * A string literal;
> * A statement separator;
> * A comment;
> * A newline.
>
> The lexical analyzer ("lexer") categorizes the tokens and hands them to
> a semantic parser to give them meaning and build a "machine" that will
> execute the code. (That "machine" is then translated into instructions
> that will run on a general-purpose computer, either in silicon or
> software, as we see in virtual machines like Java's.)
>
> There is such a thing as "lexerless parsing" which combines these two
> steps into one, but tokenization vs. semantic analysis remains a useful
> way to learn about how programs actually become things that execute.
>
> To answer your question, it is often desirable, because it can keep
> matters simple and comprehensible, to have a programming language that
> is easy to tokenize. Regular expressions are a popular means of doing
> tokenization, and the classic Unix tool "lex" has convenient support
> for this.
> (Its classic counterpart, a parser generator, is known as "yacc".)
>
> If you have experience with regular expressions you may realize that
> there are things that it is hard (or impossible[2]) for them to do.
>
> In classic C, there is only one type of comment. It starts with '/*'
> not inside a string literal, and continues until a '*/' is seen.
>
> It is a simple rule. If you wanted to nest comments, then the lexer
> would have to keep track of state--specifically, how many '/*'s it had
> seen, and pop one and only one of them for each '*/' it encounters.
>
> Furthermore, you have another design decision to make; should '*/'
> count to close a comment if it occurs inside a string literal inside a
> comment? People might comment out code containing string literals,
> after all, and then you have to worry about what those string literals
> might contain.
>
> Not only is it easier on a programmer's suffering brain to keep a
> programming language lexically simple--see the recent thread on the
> nightmare that is Fortran lexing, for instance--but it also affords
> easier opportunities for things that are not language implementations
> to lexically analyze your language.
>
> A tremendously successful example of this is "syntax" highlighting in
> text editors and IDE editor windows, which mark up your code with
> pretty colors to help you understand what you are doing.
>
> At this point you may see, incidentally, why it is more correct to
> call "syntax highlighting" lexical highlighting instead.
>
> A well-trained lexical analyzer can correctly tokenize and highlight:
>
>     int a = 42;
>     int a = "foobar";
>
> But it takes the later stages of a compiler (parsing and type
> checking) to know that a string literal cannot be assigned to a
> variable of integral type--strictly speaking that's a type error
> rather than a syntax error. It might be nice if our text editors would
> catch this kind of mistake, too, and for all I know Eclipse or
> NetBeans does. But doing so adds significantly more
> machinery to the development environment. In my experience, lexical
> highlighting largely forecloses major categories of fumble-fingers or
> rookie mistakes that used to linger until a compilation was attempted.
> Back in the old days (1993!) a freshman programmer on SunOS 4 would be
> subjected to a truly incomprehensible chain of compiler errors that
> arose from a single lexical mistake like a missing semicolon. With the
> arms race of helpful compiler diagnostics currently going on between
> LLVM and GCC, and with our newfangled text editors doing lexical
> analysis and spraying terminal windows with avant-garde SGR escapes
> making things colorful, the learning process for C seems less savage
> than it used to be.
>
> If you'd like to learn more about lexing and parsing from a practical
> perspective, with the fun of implementing your own C-like language
> step-by-step which you can then customize to your heart's content, I
> recommend chapter 8 of:
>
>     _The Unix Programming Environment_, by Kernighan and Pike,
>     Prentice Hall 1984.
>
> I have to qualify that recommendation a bit because you will have to
> do some work to port traditional K&R C to ANSI C, and point out that
> these days people use flex and bison (or flex and byacc) instead of
> lex and yacc, but if you're a moderately seasoned C programmer who
> hasn't checked off the "written a compiler" box, K&P hold one's hand
> pretty well through the process. It's how I got my feet wet, it taught
> me a lot, and was less intimidating than Aho, Sethi, and Ullman.
>
> I will venture that programming languages that are simple to parse
> tend to be easier to learn and retain, and promote more uniformity in
> presentation. In spite of the feats on display in the IOCCC, and
> interminable wars over brace style and whitespace, we see less
> variation in source code layout in lexically simple languages than we
> (historically) did in Fortran. As much as I would love to have
> another example of a hard-to-lex language, I don't know one. As others
> pointed out here, Backus knew the revolution when he saw it, and
> swiftly chose the winning side.
>
> I welcome correction on any of the above points by the sages on this
> list.
>
> Regards,
> Branden
>
> [1] A historical choice would be the BCPL comment style of '//',
> reintroduced in C++ and eventually admitted into C with the C99
> standard. An ahistorical choice would have been using '@@' for this
> purpose, for instance.
>
> [2] The identity between the CS regular languages and what can be
> recognized by "regular expression" implementations was broken long
> ago, and I am loath to make claims about what something like perlre
> can't do.

Thank you for your wonderful comments. I wonder what Dennis Ritchie
thought about this. Why not put the comment text on the first line of
the comment?

First example:

/* hello world
 *
 *
 */

Second example:

/*
 * hello world
 *
 */

To be honest, this looks a bit uncomfortable to some perfectionists.

As Dennis once put it, "You are not expected to understand this."

Caipenghui
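
For anyone reading this thread later, the state-tracking point in the
quoted reply can be sketched in C. This is only an illustration under
stated assumptions, not how any real compiler is written: the function
name skip_comment and its "nested" flag are made up for this example,
and the question of string literals inside comments is ignored
entirely.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper, invented for this illustration.
//
// Scan the comment that opens at src[start] and return the index just
// past its closing "*/". Returns 0 if no comment opens there or if the
// comment never closes. start must not exceed the length of src.
//
// nested == 0: classic C rule, the first "*/" ends the comment.
// nested != 0: imagined nesting rule, each inner "/*" bumps a depth
//              counter that its own "*/" must bring back down.
size_t skip_comment(const char *src, size_t start, int nested)
{
    assert(src != NULL);

    if (src[start] != '/' || src[start + 1] != '*')
        return 0;

    size_t i = start + 2;
    int depth = 1;              // the extra state that nesting requires

    while (src[i] != '\0') {
        if (src[i] == '*' && src[i + 1] == '/') {
            i += 2;
            if (--depth == 0)
                return i;       // matching close found
        } else if (nested && src[i] == '/' && src[i + 1] == '*') {
            i += 2;
            ++depth;            // one more "*/" is now owed
        } else {
            ++i;
        }
    }
    return 0;                   // ran off the end: unterminated comment
}
```

Given the text "/* a /* b */ c */ x", the classic rule stops at index
12, right after the first "*/", and leaves "c */ x" to be lexed as
code. The nesting rule runs on to index 17. In the classic case the
depth counter never rises above 1, so it could be dropped altogether,
which is the simplicity the quoted reply describes.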