From: Kolen Cheung
Newsgroups: gmane.text.pandoc
Subject: Re: Markdown, tables and CSV
Date: Tue, 29 Nov 2016 18:06:40 -0800 (PST)
To: pandoc-discuss

Idempotent in this case means that pantable and pantable2csv are “inverses” of each other. For example, the output of `pandoc -t markdown -F pantable -F pantable2csv test.md` should be identical to test.md. This works because pantable2csv losslessly represents all of a table’s information in pandoc’s AST as YAML+CSV inside a code block. The only cases where it isn’t lossless are:

  1. When pandoc parses the markdown in each cell into the AST, and pantable2csv then converts that AST back into markdown (again using pandoc). That is, whether the round trip is idempotent depends on pandoc’s own `markdown -> AST -> markdown` conversion.

  2. Column widths may pick up some truncation error, especially when the `--to` and `--from` formats differ.
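A minimal sketch of what “lossless” means here, using a simplified stand-in loosely modeled on pantable’s input (metadata between `---` lines, then CSV, inside a fenced `table` code block). The function names and exact layout are hypothetical, not pantable’s actual code; the metadata is written as one line of JSON, which is also valid YAML 1.2, so only the standard library is needed. Because the cells stay raw strings, this sketch sidesteps caveat (1) and round-trips exactly.

```python
import csv
import io
import json

FENCE = "`" * 3  # avoids a literal fence inside this example


def table_to_codeblock(meta, rows):
    """Serialize table metadata and cell strings into a YAML+CSV code block.

    Hypothetical simplified format, not pantable's exact syntax. The metadata
    is dumped as one line of JSON, which is also valid YAML 1.2.
    """
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    header = json.dumps(meta, sort_keys=True)
    return f"{FENCE}table\n---\n{header}\n---\n{buf.getvalue()}{FENCE}\n"


def codeblock_to_table(block):
    """Inverse of table_to_codeblock: recover (meta, rows) exactly."""
    lines = block.split("\n")
    # lines: [opening fence, '---', metadata, '---', *csv_lines, closing fence, '']
    meta = json.loads(lines[2])
    # Re-joining with '\n' restores quoted cells that contain newlines.
    body = "\n".join(lines[4:-2])
    rows = list(csv.reader(io.StringIO(body)))
    return meta, rows
```

The identity `codeblock_to_table(table_to_codeblock(meta, rows)) == (meta, rows)` is the property described above; in the real filters the cells also pass through pandoc’s markdown writer, which is where caveat (1) comes in.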

When I say I achieve $P^3 = P^2$, where $P$ is one pass of `-F pantable -F pantable2csv`, I mean that `pandoc -t native -F pantable -F pantable2csv -F pantable -F pantable2csv -F pantable -F pantable2csv csv_table.md` produces the same output as `pandoc -t native -F pantable -F pantable2csv -F pantable -F pantable2csv csv_table.md` (this is part of the unit tests). The difference between $P^2$ and $P$ comes entirely from (1).
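The $P^k$ comparison above can be scripted roughly as follows. This is a hypothetical helper, not pantable’s actual unit test; it assumes `pandoc`, `pantable` and `pantable2csv` are on `PATH` and Python 3.7+ (for `capture_output`).

```python
import subprocess


def pipeline_argv(k, path, to="native"):
    """Build the pandoc command for P^k, where P = -F pantable -F pantable2csv."""
    argv = ["pandoc", "-t", to]
    for _ in range(k):
        argv += ["-F", "pantable", "-F", "pantable2csv"]
    argv.append(path)
    return argv


def _pandoc_stdout(argv):
    # check=True raises CalledProcessError if pandoc or a filter fails.
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


def is_fixed_point(k, path):
    """True if P^(k+1) and P^k produce identical output for this file."""
    return _pandoc_stdout(pipeline_argv(k + 1, path)) == _pandoc_stdout(pipeline_argv(k, path))
```

Here `is_fixed_point(2, "csv_table.md")` checks $P^3 = P^2$, while `is_fixed_point(1, "csv_table.md")` may return False whenever caveat (1) changes a cell’s markdown.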

A corollary of (1) is that the conversion is somewhat slow, since pandoc is called once for every table cell. Probably nothing can be done about this short of reimplementing table parsing (there is probably no way to tell pandoc to ignore the markdown in a cell without escaping character sequences). And since a table has $m \times n$ cells, it is inherently slow. But I don’t worry much about performance if this solves a workflow problem.
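To make the cost concrete: if each cell gets its own pandoc process, an $m \times n$ table costs $m \cdot n$ process launches. The sketch below is a simplified stand-in with hypothetical names; it round-trips each cell’s markdown through `pandoc -f markdown -t markdown` rather than reproducing what pantable2csv does internally, and `convert` can be swapped out for testing.

```python
import subprocess


def pandoc_markdown_roundtrip(text):
    """One pandoc process per call: markdown -> AST -> markdown (needs pandoc on PATH)."""
    out = subprocess.run(
        ["pandoc", "-f", "markdown", "-t", "markdown"],
        input=text, check=True, capture_output=True, text=True,
    )
    return out.stdout.rstrip("\n")  # pandoc appends a trailing newline


def convert_cells(rows, convert=pandoc_markdown_roundtrip):
    """Apply `convert` to every cell; with the default this spawns m*n pandoc processes."""
    return [[convert(cell) for cell in row] for row in rows]
```

Process startup dominates for short cells, which is why the cost grows with the number of cells rather than with the amount of text.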

Eventually I plan to write a thin wrapper around both to provide a CLI version, on top of which Automator scripts can be built. Then I can select a table in a text editor and call a system service to convert it in place: highlight the table, convert it to CSV, edit it, then highlight it and convert it back to a table. The system-service part won’t be cross-platform, though. (By the way, the sad news is that Apple just fired the person responsible for Automator and AppleScript, and killed the whole team! This used to be a forte of OS X!)
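A minimal sketch of such a thin wrapper, assuming that running the whole selection through pandoc with `-F pantable2csv` or `-F pantable` is enough (the script and its sub-command names are hypothetical). It reads the selection on stdin and writes the converted markdown to stdout, which is, as far as I know, the shape an Automator service running a shell script needs in order to replace the selected text.

```python
import argparse
import subprocess
import sys

# Hypothetical sub-commands mapped to the pandoc filter each one runs.
FILTERS = {"to-csv": "pantable2csv", "to-table": "pantable"}


def build_argv(direction):
    """pandoc command that rewrites markdown with the chosen filter."""
    return ["pandoc", "-f", "markdown", "-t", "markdown", "-F", FILTERS[direction]]


def main(argv=None):
    parser = argparse.ArgumentParser(description="Convert tables <-> CSV code blocks via pandoc.")
    parser.add_argument("direction", choices=sorted(FILTERS))
    args = parser.parse_args(argv)
    # Read the selected text from stdin and write pandoc's output to stdout.
    result = subprocess.run(build_argv(args.direction), input=sys.stdin.read(),
                            check=True, capture_output=True, text=True)
    sys.stdout.write(result.stdout)
```

To run it as a script, add the usual `if __name__ == "__main__": main()` guard; then e.g. `python tablecsv.py to-csv < table.md`. Note that pandoc’s markdown writer may also normalize text outside the table.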

In a sense, pantable2csv makes it easy to edit a table “directly in the AST”, while pantable provides a way to pretty-print it back as native markdown. Note that as of commit 298e6f3, every piece of information in a pandoc table has a markdown representation (grid_tables lacked only alignment, which that commit adds).
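For reference, alignment in pandoc grid tables is written, as I understand the current pandoc manual, like pipe-table alignment: colons at the ends of each column’s segment in the `=` separator line below the header. A small hypothetical helper that emits that line from pandoc’s native alignment names (`AlignLeft`, `AlignRight`, `AlignCenter`, `AlignDefault`):

```python
# Which ends of a column segment get a colon, keyed by pandoc's Alignment names.
COLON_SIDES = {
    "AlignLeft": (True, False),
    "AlignRight": (False, True),
    "AlignCenter": (True, True),
    "AlignDefault": (False, False),
}


def grid_header_separator(widths, aligns):
    """Return the '+====+' line under a grid-table header, with alignment colons."""
    if len(widths) != len(aligns):
        raise ValueError("need exactly one alignment per column")
    segments = []
    for width, align in zip(widths, aligns):
        left, right = COLON_SIDES[align]
        fill = width - int(left) - int(right)  # '=' characters left after the colons
        if fill < 1:
            raise ValueError(f"column width {width} too narrow for {align}")
        segments.append((":" if left else "") + "=" * fill + (":" if right else ""))
    return "+" + "+".join(segments) + "+"
```

For example, `grid_header_separator([4, 4, 5], ["AlignRight", "AlignLeft", "AlignCenter"])` returns `+===:+:===+:===:+`.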


--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/e6213b58-11e9-4948-80c2-650347e26c2e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.