From: Joseph Reagle
Newsgroups: gmane.text.pandoc
Subject: Re: Pandoc Document Model in Python
Date: Wed, 22 Dec 2021 13:05:23 -0500

I finally had a chance to read through the documentation: wow,
impressive! And by that I mean not only the library but the
documentation itself. It's rare to see such comprehensive and clear
documentation of a new project. For folks who don't already read
Haskell, it's a great way to learn about Pandoc.

BTW: It's probably too late to change, but I wonder if you should've
given it a novel name? I wonder if it'll be easy for folks to find?
(I'm not sure why it's labeled on GitHub as a predominantly JavaScript
project?)

Also, I wonder if there will ever be a higher-level way of
searching/transforming markdown in Python?
Panflute is a bit higher-level and more Python-idiomatic, and your
examples [1] are fantastic, but I crave the intuitive XML-based
selectors (e.g., eTree, BeautifulSoup, and CSS). Your API, like most,
requires me to be familiar with the pandoc AST to do anything (e.g.,
meta is the first item -- `doc[0]` -- in the document structure).

[1]: https://boisgera.github.io/pandoc/cookbook/

In the examples below, I exercise the three options for pandoc and
Python. I kind of like using pandoc to convert it to HTML, using those
selectors, and then converting back if need be... It'd be great if
panflute (or pandoc) had high-level selectors.

```python
import pandoc
import panflute as pf
from bs4 import BeautifulSoup

# COMMONMARK_SPEC is assumed to be defined elsewhere: a markdown string
# whose YAML metadata block includes a `date` field.

# 1. Using pandoc API to print date
# Requires I remember pandoc data model via list indices
# No find/select; lots of iteration
doc = pandoc.read(COMMONMARK_SPEC)
meta = doc[0]  # doc: Pandoc(Meta, [Block])
meta_dict = meta[0]  # meta: Meta({Text: MetaValue})
date = meta_dict["date"]
date_inlines = date[0]  # date: MetaInlines([Inline])
print("pandoc:" + pandoc.write(date_inlines).strip())

# 2. Using panflute to print date
# Data-model is a bit more intuitive.
# No find/select
doc = pf.convert_text(COMMONMARK_SPEC, standalone=True)
print("panflute:" + doc.get_metadata()["date"])

# 3. Using pandoc + BeautifulSoup to print date
# Requires me to remember HTML model, but I'm more familiar.
# Can use BeautifulSoup or CSS selectors
doc = pandoc.read(COMMONMARK_SPEC)
html = pandoc.write(doc, format="html", options=["--standalone"])
soup = BeautifulSoup(html, "html5lib")
date = soup.find("meta", {"name": "dcterms.date"})["content"]
print("BS native selector:" + date)
# CSS selector
date = soup.select("""meta[name="dcterms.date"]""")[0]["content"]
print("BS/CSS selector:" + date)
```
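
To make the wish for high-level selectors a bit more concrete, here is
a minimal sketch of a find-by-type helper. It is not part of pandoc or
panflute: it works on pandoc's JSON AST (the `{"t": ..., "c": ...}`
form that `pandoc -t json` emits) using only the standard library. The
names `find_all` and `text_of` and the sample document are
hypothetical, made up for illustration.

```python
import json

# Hypothetical sample in the shape of pandoc's JSON AST: one `date`
# metadata field and one paragraph.
DOC_JSON = """
{
  "pandoc-api-version": [1, 22],
  "meta": {
    "date": {"t": "MetaInlines", "c": [{"t": "Str", "c": "2021-12-22"}]}
  },
  "blocks": [
    {"t": "Para", "c": [
      {"t": "Str", "c": "Hello"},
      {"t": "Space"},
      {"t": "Emph", "c": [{"t": "Str", "c": "world"}]}
    ]}
  ]
}
"""


def find_all(node, type_name):
    """Yield every element whose "t" tag equals type_name, in document order.

    Assumption: no metadata key is itself named "t"; a real tool would
    need to treat the meta map separately.
    """
    if isinstance(node, dict):
        if node.get("t") == type_name:
            yield node
        for value in node.values():
            yield from find_all(value, type_name)
    elif isinstance(node, list):
        for item in node:
            yield from find_all(item, type_name)


def text_of(node):
    """Concatenate the Str and Space content found under node."""
    if isinstance(node, dict):
        if node.get("t") == "Str":
            return node["c"]
        if node.get("t") == "Space":
            return " "
        return "".join(text_of(value) for value in node.values())
    if isinstance(node, list):
        return "".join(text_of(item) for item in node)
    return ""  # tag names, numbers, and other leaves carry no text


doc = json.loads(DOC_JSON)
date = text_of(doc["meta"]["date"])
emphasized = [text_of(e) for e in find_all(doc, "Emph")]
print("date:", date)
print("emphasized:", emphasized)
```

This only selects by element type; something closer to CSS (attributes,
nesting, position) would need a real query layer on top of the same
traversal.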