From mboxrd@z Thu Jan 1 00:00:00 1970
From: Leonard Rosenthol
Newsgroups: gmane.text.pandoc
Subject: Re: Copy-pasting code from the PDF loses formatting
Date: Thu, 6 Jan 2022 08:59:53 -0500
Reply-To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="000000000000846db605d4ea492d"

--000000000000846db605d4ea492d
Content-Type: text/plain; charset="UTF-8"

Robert - the reason none of the viewers copy out the indentation is that
there isn't actually any indentation there (that is, no space or tab
characters); the text is simply "moved". Normally PDF viewers are able to
apply heuristics to "guess" when the amount of "movement" is supposed to
mean indentation, but this particular amount of "movement" is too small to
be considered. If you make the indent, say, 4 spaces' worth instead of 2, I
suspect you will get the result you want.

On Thu, Jan 6, 2022 at 4:10 AM Robert Fekete wrote:

> Hi Everyone,
>
> I'm trying to create PDF output from HTML input, and ran into a weird
> error:
>
> Code samples (for example, YAML or Python) are properly formatted in the
> pdf, but most of the formatting is lost when copy-pasting the code from the
> PDF into a text editor or terminal. Depending on the PDF viewer, either:
>
> - line breaks are retained, but indentation is lost (evince, preview,
>   adobe reader), or
> - line breaks are lost and everything becomes a single line, but
>   whitespace is retained (built-in pdf viewer of Firefox and VS Code)
>
> I'm currently using pandoc 2.14.2 on MacOS Big Sur.
>
> I have attached two test files (input and output), I created the pdf with
> the wkhtml2pdf engine, but I've tested other engines as well and the
> results were similar (xelatex, weasyprint).
>
> Has anyone seen a similar problem? Any pointers are appreciated.
>
> Kind Regards,
> Robert
>
> --
> You received this message because you are subscribed to the Google Groups
> "pandoc-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/pandoc-discuss/a976bf18-7019-43cf-84c2-0a2d375cef55n%40googlegroups.com
> .
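
To illustrate the point about "moved" text, here is a minimal Python sketch
of the kind of heuristic a viewer might apply when copying text out. It is
not taken from any actual viewer: the Run type, reconstruct_line, and all
three constants (margin, space width, threshold) are hypothetical values
chosen only so that a 2-space indent falls below the threshold and a
4-space indent falls above it.

```python
from dataclasses import dataclass

# Hypothetical values, chosen only for illustration.
LEFT_MARGIN = 72.0   # x position (in points) where unindented lines start
SPACE_WIDTH = 6.0    # advance of one space in a 10pt monospace font
MIN_INDENT = 18.0    # assumed threshold below which movement is ignored


@dataclass
class Run:
    """A piece of text drawn at horizontal position x; it holds no spaces."""
    x: float
    text: str


def reconstruct_line(run: Run) -> str:
    """Guess the leading spaces from how far the run was moved right."""
    offset = run.x - LEFT_MARGIN
    if offset < MIN_INDENT:
        # Movement too small to be treated as indentation: nothing is added.
        return run.text
    # Large enough: convert the movement into a whole number of spaces.
    return " " * round(offset / SPACE_WIDTH) + run.text


two_space = Run(x=LEFT_MARGIN + 2 * SPACE_WIDTH, text="key: value")
four_space = Run(x=LEFT_MARGIN + 4 * SPACE_WIDTH, text="key: value")
print(repr(reconstruct_line(two_space)))
print(repr(reconstruct_line(four_space)))
```

Under these assumed numbers the first line is copied without leading
spaces and the second with four. Real viewers presumably use more
elaborate rules, and their actual thresholds are not known here.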

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/CALu%3Dv3LO_f8GBNxwre9mTrMT%2BMttf6-b4eA45iKS1SUb8vSs%3DQ%40mail.gmail.com.

--000000000000846db605d4ea492d
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--000000000000846db605d4ea492d--