ntg-context - mailing list for ConTeXt users
* [NTG-context] How to make words searchable without diacritics
@ 2023-08-05 19:16 Marcus Vinicius Mesquita
  2023-08-06 14:05 ` [NTG-context] " Bruce Horrocks
  2023-08-06 18:37 ` Pablo Rodriguez
  0 siblings, 2 replies; 10+ messages in thread
From: Marcus Vinicius Mesquita @ 2023-08-05 19:16 UTC (permalink / raw)
  To: mailing list for ConTeXt users

Dear List,

I have a lot of Latin words in a document with the length of the
vowels indicated by diacritics, for example: fīlĭa.

Is it possible somehow to make these words searchable without the diacritics?
That is, if I search for filia in the final PDF file, would fīlĭa
also be found?

Regards

Marcus Vinicius

-- 
All things tire the body, except music, which tires neither the
body nor its limbs, for it is rest for the soul, springtime of the
heart, distraction for the afflicted, entertainment for the solitary,
and provisions for the traveler.

Kunnâsh al-Hâ'ik (Songbook of al-Hâ'ik)
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / https://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : https://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki     : https://contextgarden.net
___________________________________________________________________________________


* [NTG-context] Re: How to make words searchable without diacritics
  2023-08-05 19:16 [NTG-context] How to make words searchable without diacritics Marcus Vinicius Mesquita
@ 2023-08-06 14:05 ` Bruce Horrocks
  2023-08-06 18:37 ` Pablo Rodriguez
  1 sibling, 0 replies; 10+ messages in thread
From: Bruce Horrocks @ 2023-08-06 14:05 UTC (permalink / raw)
  To: ntg-context mailing list

In Adobe Reader there is an option, Preferences › Categories › Search › [ ] Ignore Diacritics and Accents, which you can tick to search on the underlying letters only.

If the search is for your own use only, then this might be a solution that avoids changing the generated PDF.

> On 5 Aug 2023, at 20:16, Marcus Vinicius Mesquita <marcusvinicius.mesquita@gmail.com> wrote:
> 
> Dear List,
> 
> I have a lot of latin words in a document with the length of the
> vowels indicated by diacritics, for example: fīlĭa.
> 
> Is it possible somehow to make these words searchable without the diacritics?
> That is, if I make a search for filia in the final pdf file, fīlĭa
> would also be found?

—
Bruce Horrocks
Hampshire, UK


* [NTG-context] Re: How to make words searchable without diacritics
  2023-08-05 19:16 [NTG-context] How to make words searchable without diacritics Marcus Vinicius Mesquita
  2023-08-06 14:05 ` [NTG-context] " Bruce Horrocks
@ 2023-08-06 18:37 ` Pablo Rodriguez
  2023-08-07  6:11   ` Henning Hraban Ramm
  1 sibling, 1 reply; 10+ messages in thread
From: Pablo Rodriguez @ 2023-08-06 18:37 UTC (permalink / raw)
  To: ntg-context

On 8/5/23 21:16, Marcus Vinicius Mesquita wrote:
> Dear List,
>
> I have a lot of latin words in a document with the length of the
> vowels indicated by diacritics, for example: fīlĭa.
>
> Is it possible somehow to make these words searchable without the diacritics?
> That is, if I make a search for filia in the final pdf file, fīlĭa
> would also be found?

Dear Marcus Vinicius,

In PDF (the format itself), ActualText is a way of providing
replacement text for a displayed element.

If you use ActualText, the string that a search matches is the
replacement text you provide. That way, the displayed text can be
literally “whatever you want” and still be found by searching for
"filia", its ActualText.

Hans provides this jewel in back-imp-pdf.mkxl and back-pdf.mkiv (adapt
it to your needs):

  \starttext
  text \pdfbackendactualtext{whatever you want}{filia} text
  \stoptext

That being said, I think this is the wrong approach to your issue.

Firefox's search also ignores diacritics by default (for me, at least,
this is not a minor issue).

In any case, the PDF viewer used to search must have ActualText implemented.
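
Applied to the word from the original question, the call might look like
the sketch below. It assumes the argument order of the example above
(first the printed text, then the ActualText string) and a font that
contains ī and ĭ:

  \starttext
  % printed as fīlĭa; a viewer that honours ActualText should match "filia"
  text \pdfbackendactualtext{fīlĭa}{filia} text
  \stoptext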

I hope it helps,

Pablo

* [NTG-context] Re: How to make words searchable without diacritics
  2023-08-06 18:37 ` Pablo Rodriguez
@ 2023-08-07  6:11   ` Henning Hraban Ramm
  2023-08-07 12:17     ` Marcus Vinicius Mesquita
  0 siblings, 1 reply; 10+ messages in thread
From: Henning Hraban Ramm @ 2023-08-07  6:11 UTC (permalink / raw)
  To: ntg-context

On 06.08.23 at 20:37, Pablo Rodriguez wrote:
> Hans provides this jewel in back-imp-pdf.mkxl and back-pdf.mkiv (adapt
> it to your needs):
> 
>    \starttext
>    text \pdfbackendactualtext{whatever you want}{filia} text
>    \stoptext
> 

> In any case, the PDF viewer used to search must have ActualText implemented.

Exactly. Apple’s PDF library, for example, does not; it is used not only
by Preview.app but also by Skim and TeXShop. (I should check this with
other viewers/libs.)

Hraban


* [NTG-context] Re: How to make words searchable without diacritics
  2023-08-07  6:11   ` Henning Hraban Ramm
@ 2023-08-07 12:17     ` Marcus Vinicius Mesquita
  2023-08-07 17:14       ` Ulrike Fischer
  2023-08-07 17:19       ` Henning Hraban Ramm
  0 siblings, 2 replies; 10+ messages in thread
From: Marcus Vinicius Mesquita @ 2023-08-07 12:17 UTC (permalink / raw)
  To: mailing list for ConTeXt users

Thank you for the answers, Bruce, Pablo and Hraban. I was not aware of
ActualText.

I work on Manjaro Linux, and I tested the example Pablo sent in
several programs:

mupdf-gl or mupdf: fails! [mupdf-gl is what I customarily use for its
blazing speed]
firefox: fails
vivaldi: passes
okular: passes
qpdfview: passes
evince: passes

But \pdfbackendactualtext is actually just what I needed, since it can
also be used for other things, such as:

\starttext
what a \pdfbackendactualtext{\hyphenatedword{wonderful}}{wonderful} text
\stoptext

Best regards

Marcus Vinicius


* [NTG-context] Re: How to make words searchable without diacritics
  2023-08-07 12:17     ` Marcus Vinicius Mesquita
@ 2023-08-07 17:14       ` Ulrike Fischer
  2023-08-07 17:19       ` Henning Hraban Ramm
  1 sibling, 0 replies; 10+ messages in thread
From: Ulrike Fischer @ 2023-08-07 17:14 UTC (permalink / raw)
  To: ntg-context

On Mon, 7 Aug 2023 09:17:19 -0300, Marcus Vinicius Mesquita wrote:

> Thank you for the answers, Bruce, Pablo and Hraban. I was not aware of
> ActualText.
> 
> But \pdfbackendactualtext is actually just what I needed since it can
> be used also for other things like:

I don't think that it would be a good idea to use ActualText for
this. You are effectively changing the content and meaning of your
text, not only for search, but also for copy and paste, screen readers,
HTML export, etc. If you think it is okay to claim that the text is
filia and that the accents are irrelevant, then why don't you print
filia directly?


-- 
Ulrike Fischer 
http://www.troubleshooting-tex.de/


* [NTG-context] Re: How to make words searchable without diacritics
  2023-08-07 12:17     ` Marcus Vinicius Mesquita
  2023-08-07 17:14       ` Ulrike Fischer
@ 2023-08-07 17:19       ` Henning Hraban Ramm
  2023-08-07 18:58         ` Marcus Vinicius Mesquita
  1 sibling, 1 reply; 10+ messages in thread
From: Henning Hraban Ramm @ 2023-08-07 17:19 UTC (permalink / raw)
  To: ntg-context

On 07.08.23 at 14:17, Marcus Vinicius Mesquita wrote:
> Thank you for the answers, Bruce, Pablo and Hraban. I was not aware of
> ActualText.
> 
> I work on a manjaro linux, and I tested the example Pablo sent on
> several programs:
> 
> mupdf-gl or mupdf: fails! [mupdf-gl is what I customarily use for its
> blazing speed]
> firefox: fails
> vivaldi: passes
> okular: passes
> qpdfview: passes
> evince: passes

Thank you for researching! I’ll include this in my viewer matrix. (But 
probably not before the ConTeXt meeting.)

> But \pdfbackendactualtext is actually just what I needed since it can
> be used also for other things like:
> 
> \starttext
> what a \pdfbackendactualtext{\hyphenatedword{wonderful}}{wonderful} text
> \stoptext

I’m not sure but I’d guess ActualText is also suitable for alternative 
texts (AltText) of images? Wouldn’t it make sense to have an alttext key 
in \externalfigure for accessibility (PDF/UA)?

Hraban

* [NTG-context] Re: How to make words searchable without diacritics
  2023-08-07 17:19       ` Henning Hraban Ramm
@ 2023-08-07 18:58         ` Marcus Vinicius Mesquita
  2023-08-07 19:57           ` Hans Hagen
  0 siblings, 1 reply; 10+ messages in thread
From: Marcus Vinicius Mesquita @ 2023-08-07 18:58 UTC (permalink / raw)
  To: mailing list for ConTeXt users

@ Ulrike: This is what my client wants, and the client is always right.

Regards

Marcus Vinicius


* [NTG-context] Re: How to make words searchable without diacritics
  2023-08-07 18:58         ` Marcus Vinicius Mesquita
@ 2023-08-07 19:57           ` Hans Hagen
  2023-08-08  1:22             ` Marcus Vinicius Mesquita
  0 siblings, 1 reply; 10+ messages in thread
From: Hans Hagen @ 2023-08-07 19:57 UTC (permalink / raw)
  To: ntg-context

On 8/7/2023 8:58 PM, Marcus Vinicius Mesquita wrote:
> @ Ulrike: This is what my client wants, and the client is always right.
You can try this:

\starttext

\protected\def\ProofOfConcept#1#2%
   {{#1\llap{\effect[hidden]{#2}}}}

test test \ProofOfConcept{föö}{foo} test

\stoptext

but forget about hyphenation (actualtext probably also doesn't always 
work well across lines in viewers).
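
A sketch of the same overlay applied to the word from the original
question. \SearchableWord is a hypothetical name, not a ConTeXt command;
the sketch assumes the plain form is no wider than the printed form, so
that the hidden copy stays within the visible word:

\protected\def\SearchableWord#1#2% #1: printed form, #2: plain form for searching
   {{#1\llap{\effect[hidden]{#2}}}}

\starttext
text \SearchableWord{fīlĭa}{filia} text
\stoptext

Because the plain form is real text on the page, copy and paste may
yield both forms.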

Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
        tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------


* [NTG-context] Re: How to make words searchable without diacritics
  2023-08-07 19:57           ` Hans Hagen
@ 2023-08-08  1:22             ` Marcus Vinicius Mesquita
  0 siblings, 0 replies; 10+ messages in thread
From: Marcus Vinicius Mesquita @ 2023-08-08  1:22 UTC (permalink / raw)
  To: mailing list for ConTeXt users

This is perfect, as it works also with mupdf-gl and firefox!

Thank you, Hans

Kind regards

Marcus Vinicius

