* Copy-pasting code from the PDF loses formatting
From: Robert Fekete @ 2022-01-06 9:10 UTC
To: pandoc-discuss

Hi Everyone,

I'm trying to create PDF output from HTML input, and ran into a weird problem:

Code samples (for example, YAML or Python) are properly formatted in the PDF, but most of the formatting is lost when copy-pasting the code from the PDF into a text editor or terminal. Depending on the PDF viewer, either:

- line breaks are retained, but indentation is lost (Evince, Preview, Adobe Reader), or
- line breaks are lost and everything becomes a single line, but whitespace is retained (built-in PDF viewer of Firefox and VS Code)

I'm currently using pandoc 2.14.2 on macOS Big Sur.

I have attached two test files (input and output). I created the PDF with the wkhtmltopdf engine, but I've tested other engines as well (xelatex, weasyprint) and the results were similar.

Has anyone seen a similar problem? Any pointers are appreciated.

Kind Regards,
Robert

[-- Attachment: test-2.html, Type: text/html, Size: 2769 bytes --]
[-- Attachment: test-2.pdf, Type: application/pdf, Size: 26120 bytes --]
* Re: Copy-pasting code from the PDF loses formatting
From: Leonard Rosenthol @ 2022-01-06 13:59 UTC
To: pandoc-discuss

Robert - the reason none of the viewers copy out the indentation is that there isn't actually any indentation there (that is, no space or tab characters); the text is simply "moved". Normally PDF viewers are able to apply heuristics to "guess" when the amount of "movement" is supposed to mean indentation, but this particular amount of "movement" is too small to be considered. If you make the indent, say, 4 spaces' worth instead of 2, I suspect you will get the result you want.
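As a rough illustration of the kind of heuristic described above (not the actual logic of Evince, Preview, Adobe Reader, or any other viewer), the Python sketch below rebuilds indentation from the horizontal start position of each line. The function name `lines_to_text`, the 6-point space width, and the 15-point threshold are all made-up assumptions, chosen only to show how a small shift can fall below a cutoff while a larger one is converted into spaces.

```python
from typing import List, Tuple


def lines_to_text(
    lines: List[Tuple[float, str]],
    space_width: float,
    min_shift: float,
) -> str:
    """Rebuild plain text from positioned lines (hypothetical heuristic).

    lines:       (x, text) pairs; x is where the line starts, in points.
    space_width: assumed width of one space character in the code font.
    min_shift:   horizontal shifts smaller than this are treated as
                 layout noise and produce no indentation at all.
    """
    if not lines:
        return ""
    # Treat the left-most line as the margin (indentation level zero).
    left = min(x for x, _ in lines)
    out = []
    for x, text in lines:
        shift = x - left
        # Only shifts at or above the threshold are turned into spaces.
        n_spaces = round(shift / space_width) if shift >= min_shift else 0
        out.append(" " * n_spaces + text)
    return "\n".join(out)


# Two-space indent: the child line is shifted by 12 points (2 x 6).
two_space = [(72.0, "key:"), (84.0, "child: 1")]
# Four-space indent: the child line is shifted by 24 points (4 x 6).
four_space = [(72.0, "key:"), (96.0, "child: 1")]

print(lines_to_text(two_space, space_width=6.0, min_shift=15.0))
print(lines_to_text(four_space, space_width=6.0, min_shift=15.0))
```

With these assumed numbers, the 12-point shift is below the cutoff and the indentation is dropped, while the 24-point shift is recovered as four spaces. Real viewers may use different or more elaborate rules, so a wider indent is not guaranteed to help.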
* Re: Copy-pasting code from the PDF loses formatting
From: Robert Fekete @ 2022-01-06 14:50 UTC
To: pandoc-discuss

Hi Leonard,

Thanks a lot for the tip. Unfortunately, it doesn't seem to solve the problem, but I'll play with it some more. Is there any way to force this, maybe from the HTML side, for example by replacing spaces with tabs? (Sorry if this doesn't make sense; I don't know much about the inner workings of the PDF format.)