From: Robert Fekete
Newsgroups: gmane.text.pandoc
Subject: Re: Copy-pasting code from the PDF loses formatting
Date: Thu, 6 Jan 2022 06:50:02 -0800 (PST)
To: pandoc-discuss

Hi Leonard,

Thanks a lot for the tip. Unfortunately it doesn't seem to solve the
problem, but I'll play with it some more. Is there any way to force this,
maybe from the HTML side, like replacing spaces with tabs? (Sorry if this
doesn't make sense, I don't know much about the inner workings of the PDF
format.)

Leonard Rosenthol wrote (Thursday, January 6, 2022, 15:00:08 UTC+1):

> Robert - the reason why none of the viewers are copying out indentation
> is that there isn't actually indentation there (i.e., no spaces or tab
> characters); the text is simply "moved". Normally PDF viewers are able
> to apply heuristics to "guess" when the amount of "movement" is supposed
> to mean indentation - but this particular amount of "movement" is too
> small for consideration. If you make the indent, say, 4 spaces' worth
> instead of 2, I suspect you will get the result you wish.
>
> On Thu, Jan 6, 2022 at 4:10 AM Robert Fekete wrote:
>
>> Hi Everyone,
>>
>> I'm trying to create PDF output from HTML input, and ran into a weird
>> error:
>>
>> Code samples (for example, YAML or Python) are properly formatted in the
>> PDF, but most of the formatting is lost when copy-pasting the code from
>> the PDF into a text editor or terminal. Depending on the PDF viewer,
>> either:
>>
>> - line breaks are retained, but indentation is lost (Evince, Preview,
>>   Adobe Reader), or
>> - line breaks are lost and everything becomes a single line, but
>>   whitespace is retained (built-in PDF viewer of Firefox and VS Code)
>>
>> I'm currently using pandoc 2.14.2 on macOS Big Sur.
>>
>> I have attached two test files (input and output). I created the PDF
>> with the wkhtmltopdf engine, but I've tested other engines as well and
>> the results were similar (xelatex, weasyprint).
>>
>> Has anyone seen a similar problem? Any pointers are appreciated.
>>
>> Kind Regards,
>> Robert
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "pandoc-discuss" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to pandoc-discus...-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/pandoc-discuss/a976bf18-7019-43cf-84c2-0a2d375cef55n%40googlegroups.com
>

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/d82e995e-040c-44ae-9658-211660d69887n%40googlegroups.com.
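
For readers unfamiliar with the point Leonard makes above, here is a minimal sketch of the kind of heuristic he describes. It is a toy model, not the algorithm of any actual PDF viewer: the `TextRun` type, the `space_width` and `min_indent` parameters, and the 6 pt and 18 pt values are all assumptions chosen for illustration. It only shows why a small horizontal offset can be discarded on copy while a larger one is turned back into spaces.

```python
from dataclasses import dataclass


@dataclass
class TextRun:
    """One line of text as a page might position it (hypothetical model)."""
    x: float   # horizontal offset from the left edge of the code block, in points
    text: str  # the glyphs drawn there; no leading whitespace is stored


def extract_line(run: TextRun, space_width: float, min_indent: float) -> str:
    """Guess leading spaces for one run from its horizontal offset.

    Offsets below min_indent are treated as layout noise and dropped,
    which is the failure mode described in the thread.
    """
    if run.x < min_indent:
        return run.text
    return " " * round(run.x / space_width) + run.text


def extract(runs, space_width=6.0, min_indent=18.0):
    """Join the runs into copied text, one run per line (assumed defaults)."""
    return "\n".join(extract_line(r, space_width, min_indent) for r in runs)


# With a 6 pt space, a 2-space indent moves the text by 12 pt (below the
# assumed 18 pt threshold), while a 4-space indent moves it by 24 pt.
two_space = [TextRun(0.0, "key:"), TextRun(12.0, "child: 1")]
four_space = [TextRun(0.0, "key:"), TextRun(24.0, "child: 1")]

print(extract(two_space))   # indentation is lost
print(extract(four_space))  # indentation is reconstructed as 4 spaces
```

Under these assumed numbers the 2-space sample comes out flush left and the 4-space sample keeps its indent, which matches the suggestion to try a wider indent. Real viewers may use different thresholds or entirely different logic.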