public inbox archive for pandoc-discuss@googlegroups.com
From: "Pablo Rodríguez" <oinos-S0/GAf8tV78@public.gmane.org>
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Subject: Re: Is Pandoc OCR Capable
Date: Tue, 26 Jun 2018 16:03:14 +0200
Message-ID: <a30d5e0e-9e4e-2379-cd5c-901a530ce5e6@web.de>
In-Reply-To: <CAAcdRmnLizCoq2Y8_x-M7nqb7R=D_WpHXAzdi1nz5S+vtXwY4g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On 06/26/2018 03:41 PM, Wei Wooi Peh wrote:
> [...] 
> Maybe I should put it this way: after I've converted my Markdown (text
> + images) to PDF, I open the PDF document and click on the images. I
> should be able to select the text inside the images without sending
> them to an OCR program again to recover that text.

Hi Wei Wooi,

I thought you were interested in tagged PDF documents, but I don’t think
that fits your explanation above.

> In Markdown, I am only able to select the text of my statement (text),
> not the text inside the images (because they are .jpg or .png files).

You mean you want vector images with text inside, don’t you?

I don’t think pandoc (or LaTeX) can do anything about the images here.
Bitmap images cannot contain text as text: any lettering in them is just
pixels. OCR software doesn’t read text from them; it only recognizes
patterns in those pixels and renders them as text.

If you want text in your images, you will need to generate them as
vector images in a format that can also contain text (such as SVG,
PostScript, EPS, or PDF).
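As an illustration of the difference, here is a minimal sketch (not something pandoc provides) that writes an SVG whose label is stored as a real `<text>` element rather than as pixels. The function name `make_labelled_svg` and the file name `diagram.svg` are hypothetical, chosen only for this example.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def make_labelled_svg(label: str, width: int = 300, height: int = 80) -> str:
    """Return an SVG document whose label is stored as characters in a
    <text> element, not as a picture of those characters."""
    # Emit the SVG namespace as the default (unprefixed) namespace.
    ET.register_namespace("", SVG_NS)
    svg = ET.Element(f"{{{SVG_NS}}}svg", {
        "width": str(width),
        "height": str(height),
        "viewBox": f"0 0 {width} {height}",
    })
    # A simple frame drawn as a vector shape.
    ET.SubElement(svg, f"{{{SVG_NS}}}rect", {
        "x": "1", "y": "1",
        "width": str(width - 2), "height": str(height - 2),
        "fill": "none", "stroke": "black",
    })
    # The label itself: actual text, which ElementTree escapes as needed.
    text = ET.SubElement(svg, f"{{{SVG_NS}}}text", {
        "x": str(width // 2), "y": str(height // 2),
        "text-anchor": "middle", "dominant-baseline": "middle",
        "font-family": "sans-serif", "font-size": "20",
    })
    text.text = label
    return ET.tostring(svg, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical output file, to be referenced from the Markdown source.
    with open("diagram.svg", "w", encoding="utf-8") as f:
        f.write(make_labelled_svg("Selectable label"))
```

The image can then be referenced as usual in Markdown, e.g. `![Diagram](diagram.svg)`. For PDF output through LaTeX, pandoc has to convert the SVG first (it relies on an external tool such as `rsvg-convert` for that), so whether the label stays selectable in the final PDF depends on that conversion step. Check the result rather than assuming it.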

I hope it helps,

Pablo
-- 
http://www.ousia.tk
