* [NTG-context] Copy PDF text without hyphenated words
@ 2025-02-14 1:19 Gerion Entrup
2025-02-14 7:48 ` [NTG-context] " Hans Hagen via ntg-context
0 siblings, 1 reply; 6+ messages in thread
From: Gerion Entrup @ 2025-02-14 1:19 UTC (permalink / raw)
To: ntg-context
Hi,
I recently learned that Typst seems to be able to produce PDFs where a hyphenated text can be copied without the hyphenation (so all words in the copied text are not hyphenated).
I seem to recall that the PDF format has an extra mode for this, where the creation program can embed some text that should only appear when copied and replace the word parts that are visible on the page.
ConTeXt, in its default mode, does not seem to embed this text. When copying hyphenated words, the hyphenated word parts appear as distinct words (even without the hyphen). Is there a way to tell ConTeXt to produce a PDF where the text can be copied without hyphenated words?
Gerion
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!
maillist : ntg-context@ntg.nl / https://mailman.ntg.nl/mailman3/lists/ntg-context.ntg.nl
webpage : https://www.pragma-ade.nl / https://context.aanhet.net (mirror)
archive : https://github.com/contextgarden/context
wiki : https://wiki.contextgarden.net
___________________________________________________________________________________
* [NTG-context] Re: Copy PDF text without hyphenated words
2025-02-14 1:19 [NTG-context] Copy PDF text without hyphenated words Gerion Entrup
@ 2025-02-14 7:48 ` Hans Hagen via ntg-context
2025-02-16 3:04 ` Gerion Entrup
0 siblings, 1 reply; 6+ messages in thread
From: Hans Hagen via ntg-context @ 2025-02-14 7:48 UTC (permalink / raw)
To: ntg-context; +Cc: Hans Hagen
On 2/14/2025 2:19 AM, Gerion Entrup wrote:
> Hi,
>
> I recently learned that Typst seems to be able to produce PDFs where a hyphenated text can be copied without the hyphenation (so all words in the copied text are not hyphenated).
> I seem to recall that the PDF format has an extra mode for this, where the creation program can embed some text that should only appear when copied and replace the word parts that are visible on the page.
>
> ConTeXt, in its default mode, does not seem to embed this text. When copying hyphenated words, the hyphenated word parts appear as distinct words (even without the hyphen). Is there a way to tell ConTeXt to produce a PDF where the text can be copied without hyphenated words?
This is a fuzzy area and has always depended on how pdf viewers see
things. The standard has some suggestions and one is to use soft
hyphens, which is what we do (can be turned off). From your description
it looks like actual text is used and in this case, although one can
make that work, to me it is not a solution: it not only pollutes the page
stream, it can also interfere with other features and increases overhead.
When a viewer sees a soft hyphen it is assumed that it looks for the next
part of the word. Afaik acrobat reader can handle both variants. The
other (open source) viewers that I use are a mixed bag (in areas like
these).
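To illustrate the viewer-side logic described above, here is a minimal Python sketch. It is not how any particular viewer is implemented. It assumes the text extracted from the page arrives as a list of lines and that a broken word still carries U+00AD at the end of its line; the function name `join_soft_hyphens` is hypothetical.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN


def join_soft_hyphens(lines):
    """Join extracted text lines, gluing words split by a trailing soft hyphen."""
    out = []
    glue = False  # True when the previous kept line ended in a soft hyphen
    for line in lines:
        text = line.strip()  # strip() removes whitespace only, not U+00AD
        if not text:
            continue
        ends_soft = text.endswith(SOFT_HYPHEN)
        if ends_soft:
            text = text[:-1]  # drop the soft hyphen itself
        if out and glue:
            out[-1] += text  # continue the broken word without a space
        else:
            out.append(text)
        glue = ends_soft
    return " ".join(out)
```

A viewer that does not do something along these lines would presumably paste the two word parts as separate words, which matches the behavior reported in the original mail.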
Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------
* [NTG-context] Re: Copy PDF text without hyphenated words
2025-02-14 7:48 ` [NTG-context] " Hans Hagen via ntg-context
@ 2025-02-16 3:04 ` Gerion Entrup
2025-02-16 7:33 ` Marco Patzer
2025-02-17 11:44 ` Ulrike Fischer
0 siblings, 2 replies; 6+ messages in thread
From: Gerion Entrup @ 2025-02-16 3:04 UTC (permalink / raw)
To: ntg-context
On Friday, 14 February 2025, 08:48:06 Central European Standard Time, Hans Hagen via ntg-context wrote:
> On 2/14/2025 2:19 AM, Gerion Entrup wrote:
> > Hi,
> >
> > I recently learned that Typst seems to be able to produce PDFs where a hyphenated text can be copied without the hyphenation (so all words in the copied text are not hyphenated).
> > I seem to recall that the PDF format has an extra mode for this, where the creation program can embed some text that should only appear when copied and replace the word parts that are visible on the page.
> >
> > ConTeXt, in its default mode, does not seem to embed this text. When copying hyphenated words, the hyphenated word parts appear as distinct words (even without the hyphen). Is there a way to tell ConTeXt to produce a PDF where the text can be copied without hyphenated words?
>
> This is a fuzzy area and has always depended on how pdf viewers see
> things. The standard has some suggestions and one is to use soft
> hyphens, which is what we do (can be turned off). From your description
> it looks like actual text is used and in this case, although one can
> make that work, to me it is not a solution: it not only pollutes the page
> stream, it can also interfere with other features and increases overhead.
>
> When a viewer sees a soft hyphen it is assumed that it looks for the next
> part of the word. Afaik acrobat reader can handle both variants. The
> other (open source) viewers that I use are a mixed bag (in areas like
> these).
Thanks for the answer. I researched this for my default PDF viewer, Okular from KDE, and this program seems to be really special in this regard. It appears to be actively responsible for the behavior described in my original mail.
See https://bugs.kde.org/show_bug.cgi?id=447094#c5 and https://bugs.kde.org/show_bug.cgi?id=233604.
Gerion
* [NTG-context] Re: Copy PDF text without hyphenated words
2025-02-16 3:04 ` Gerion Entrup
@ 2025-02-16 7:33 ` Marco Patzer
2025-02-16 8:06 ` Mikael Sundqvist
2025-02-17 11:44 ` Ulrike Fischer
1 sibling, 1 reply; 6+ messages in thread
From: Marco Patzer @ 2025-02-16 7:33 UTC (permalink / raw)
To: mailing list for ConTeXt users
On Sun, 16 Feb 2025 04:04:11 +0100
Gerion Entrup <gerion.entrup@flump.de> wrote:
> I researched this for my default PDF-viewer, Okular from KDE, and
> this program seems to be really special in this regard.
I don't think it is. I quickly tested a few other viewers and they
all behave the same way:
- zathura (0.5.11)
- evince (46.3.1)
- pdftotext (25.01.0)
- Firefox PDF Viewer (135.0)
Marco
* [NTG-context] Re: Copy PDF text without hyphenated words
2025-02-16 7:33 ` Marco Patzer
@ 2025-02-16 8:06 ` Mikael Sundqvist
0 siblings, 0 replies; 6+ messages in thread
From: Mikael Sundqvist @ 2025-02-16 8:06 UTC (permalink / raw)
To: mailing list for ConTeXt users
Hi!
On Sun, 16 Feb 2025 at 08:40, Marco Patzer <lists@homerow.info> wrote:
> On Sun, 16 Feb 2025 04:04:11 +0100
> Gerion Entrup <gerion.entrup@flump.de> wrote:
>
> > I researched this for my default PDF-viewer, Okular from KDE, and
> > this program seems to be really special in this regard.
>
> I don't think it is. I quickly tested a few other viewers and they
> all behave the same way:
>
> - zathura (0.5.11)
> - evince (46.3.1)
> - pdftotext (25.01.0)
> - Firefox PDF Viewer (135.0)
>
The only one that worked for me was the viewer in the Chrome browser.
/Mikael
* [NTG-context] Re: Copy PDF text without hyphenated words
2025-02-16 3:04 ` Gerion Entrup
2025-02-16 7:33 ` Marco Patzer
@ 2025-02-17 11:44 ` Ulrike Fischer
1 sibling, 0 replies; 6+ messages in thread
From: Ulrike Fischer @ 2025-02-17 11:44 UTC (permalink / raw)
To: ntg-context
On Sun, 16 Feb 2025 04:04:11 +0100, Gerion Entrup wrote:
> Thanks for the answer. I researched this for my default
> PDF-viewer, Okular from KDE, and this program seems to be really
> special in this regard. It should be actively responsible for the
> behavior described in my original mail. See
> https://bugs.kde.org/show_bug.cgi?id=447094#c5 and
> https://bugs.kde.org/show_bug.cgi?id=233604.
Well, 14-year-old bug reports are not really good references.
In any case: at least the second bug report is not relevant; the
attached PDF uses a hard hyphen (U+002D) and not a soft hyphen (U+00AD) as
ConTeXt (correctly) does.
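The difference between the two code points is easy to check on text extracted from a PDF. The following is a minimal Python sketch, assuming the extracted line is available as a string; the function name `line_end_hyphen` is made up for this illustration.

```python
HARD_HYPHEN = "\u002d"  # U+002D HYPHEN-MINUS, an ordinary visible character
SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN, marks a discretionary break


def line_end_hyphen(line):
    """Report which kind of hyphen, if any, ends an extracted text line."""
    text = line.rstrip()  # removes trailing whitespace only, keeps U+00AD
    if text.endswith(SOFT_HYPHEN):
        return "soft"
    if text.endswith(HARD_HYPHEN):
        return "hard"
    return "none"
```

A line ending in "hard" gives a viewer no reliable hint that the hyphen came from line breaking, since U+002D also occurs in genuine compounds.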
I don't know what Typst does (you didn't attach an example PDF) but
if they use ActualText I wouldn't recommend copying that. Support
for ActualText is worse than support for soft hyphens (and I have even had
examples where words or syllables got lost in copy & paste if
ActualText was involved).
--
Ulrike Fischer
http://www.troubleshooting-tex.de/
end of thread, other threads:[~2025-02-17 11:52 UTC | newest]
Thread overview: 6+ messages
2025-02-14 1:19 [NTG-context] Copy PDF text without hyphenated words Gerion Entrup
2025-02-14 7:48 ` [NTG-context] " Hans Hagen via ntg-context
2025-02-16 3:04 ` Gerion Entrup
2025-02-16 7:33 ` Marco Patzer
2025-02-16 8:06 ` Mikael Sundqvist
2025-02-17 11:44 ` Ulrike Fischer