From: John MacFarlane
Newsgroups: gmane.text.pandoc
Subject: Re: RTF to Markdown questions
Date: Wed, 27 Apr 2022 11:28:14 -0700
To: Kris Wilk, pandoc-discuss

The issue with bold is probably because the RTF file includes some
spaces inside the boldface emphasis. That is depressingly common in
word processing documents, and we have code in the docx reader, if I
recall, that handles it by converting bold "helloSPACE" (with the space
inside the boldface) to bold "hello" followed by an unformatted SPACE.

We could port this over to the RTF reader, I think -- can you put up
an issue on the tracker so we don't forget?

The other issue (the unwanted underlines) can be handled using a simple
Lua filter. Save it as ununderline.lua and use -L ununderline.lua
(short for --lua-filter) on the command line:

    -- Replace each Underline element with its contents,
    -- keeping the text but dropping the underline.
    function Underline(el)
      return el.content
    end

You could probably handle the spacing issue with a more complex Lua
filter, as well.

Kris Wilk writes:

> Sorry if anyone gets this twice, had to correct my formatting...
>
> I'm trying to use pandoc (for the first time) to convert some RTF files to
> markdown. My goal is to extract the text with ***bold*** and **italics**
> preserved and no other formatting.
>
> Simply converting with "pandoc in.rtf -o out.md" produces a markdown file
> that's not quite what I need. For instance, here's a line from the output:
>
> **[Scientific Name]{.underline}: ***Aplysia parvula *Morch, 1863
>
> FIRST and foremost, pandoc tries to preserve the underlined text, which I
> don't want. Can this be disabled? I've tried the "bracketed_spans" and
> "native_spans" extensions but this still processes the underlines as:
>
> **Scientific Name: ***Aplysia parvula *Morch, 1863
>
> SECOND, at least when I view this in VSCode's markdown preview, the bold
> and emphasis are not presented correctly, I guess because they touch each
> other or have spaces (or both?)?
> It displays correctly if it's:
>
> **Scientific Name:** *Aplysia parvula* Morch, 1863
>
> I realize that the text in the RTF might have the bold/italic tagged
> weirdly but is there a way to deal with this or am I just stuck? I have
> about 500 such files to process, so I'm looking for automated methods.
>
> Thanks in advance for any help you can provide!
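A minimal sketch of the "more complex Lua filter" for the spacing issue
might look like the following. It is hypothetical: the file name
fixspaces.lua and the approach are assumptions, not the docx reader's
actual code. For every Strong or Emph element it strips leading and
trailing Space elements from the contents and re-emits them as plain
spaces outside the formatting, so bold "Scientific Name: " becomes bold
"Scientific Name:" followed by an unformatted space.

```lua
-- fixspaces.lua (hypothetical sketch, not part of pandoc):
-- move leading/trailing Space elements out of Strong and Emph.
local function move_spaces_out(el)
  local content = el.content
  local leading, trailing = false, false

  -- Drop Space elements at the start of the formatted run.
  while #content > 0 and content[1].t == "Space" do
    table.remove(content, 1)
    leading = true
  end

  -- Drop Space elements at the end of the formatted run.
  while #content > 0 and content[#content].t == "Space" do
    table.remove(content)
    trailing = true
  end

  if not leading and not trailing then
    return nil -- nothing to change; keep the element as-is
  end

  el.content = content

  -- Return a list of inlines; pandoc splices it in place of el.
  local result = {}
  if leading then table.insert(result, pandoc.Space()) end
  if #content > 0 then table.insert(result, el) end
  if trailing then table.insert(result, pandoc.Space()) end
  return result
end

Strong = move_spaces_out
Emph = move_spaces_out
```

Filters run in the order given on the command line, so the two could be
combined as `pandoc in.rtf -L ununderline.lua -L fixspaces.lua -o out.md`
(and looped over the ~500 files from a shell). Assuming the RTF reader
produces the structure implied by the output above, the sample line
should then come out as `**Scientific Name:** *Aplysia parvula* Morch, 1863`.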