From: luigi scarso
To: mailing list for ConTeXt users
Subject: Re: Off-topic: Convert PDF to (Con/La)TeX
Date: Wed, 15 May 2013 18:18:57 +0200

\starttext
$\int_{i=1}^{\infty} x^2$
\stoptext

$ mudraw -ttt test.pdf
<?xml version="1.0"?>
<document name="test.pdf">
warning: ignoring surrogate pair mapping in cmap
<page>
<block bbox="280.534 67.7741 286.392 84.595">
<line bbox="280.534 67.7741 286.392 84.595">
<span bbox="280.534 67.7741 286.392 84.595" font="LMRoman12-Regular" size="11.9552">
<char bbox="280.534 67.7741 286.392 84.595" c="1"/>
</span>
</line>
</block>
<block bbox="75.4124 101.448 84.9435 156.849">
<line bbox="75.4124 101.448 84.9435 156.849">
<span bbox="75.4124 101.448 84.9435 156.849" font="LatinModernMath-Regular" size="8.36861">
<char bbox="75.4124 101.448 84.9435 156.849" c="&#x221e;"/>
</span>
</line>
</block>
<block bbox="72.2341 100.779 80.1837 179.922">
<line bbox="72.2341 100.779 80.1837 179.922">
<span bbox="72.2341 100.779 80.1837 179.922" font="LatinModernMath-Regular" size="11.9552">
<char bbox="72.2341 100.779 80.1837 179.922" c="&#x222b;"/>
</span>
</line>
</block>
<block bbox="68.8824 124.759 83.5355 180.159">
<line bbox="68.8824 124.759 83.5355 180.159">
<span bbox="68.8824 124.759 83.5355 180.159" font="LatinModernMath-Regular" size="8.36861">
<char bbox="68.8824 124.759 70.5728 180.159" c="u"/>
<char bbox="70.5728 124.759 72.2631 180.159" c="&#xdc56;"/>
<char bbox="72.2633 124.759 78.7736 180.159" c="="/>
<char bbox="78.7741 124.759 83.5355 180.159" c="1"/>
</span>
</line>
</block>
<block bbox="87.513 100.785 99.7105 179.928">
<line bbox="87.513 100.785 99.7105 179.928">
<span bbox="87.513 100.785 94.3509 179.928" font="LatinModernMath-Regular" size="11.9552">
<char bbox="87.513 100.785 94.3509 179.928" c="?"/>
</span>
<span bbox="94.9491 109.213 99.7105 164.614" font="LatinModernMath-Regular" size="8.36861">
<char bbox="94.9491 109.213 99.7105 164.614" c="2"/>
</span>
</line>
</block>
</page>
</document>
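
The output above can be post-processed with a short script. The following is only a minimal sketch in Python, assuming the output has been captured as a string: the helper name parse_mudraw_xml is hypothetical and not part of MuPDF. It drops the stray "warning:" line that is interleaved with the XML, and it replaces lone surrogate references such as &#xdc56; (which a strict XML parser rejects) with U+FFFD before parsing.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical helper, not part of MuPDF: it only post-processes text that
# looks like the `mudraw -ttt` output quoted in this message.
SURROGATE_REF = re.compile(r"&#x(d[89a-f][0-9a-f]{2});", re.IGNORECASE)


def parse_mudraw_xml(raw):
    """Return a list of (char, font, size) tuples from mudraw-style XML."""
    # Keep only lines that look like markup; this drops the stray
    # "warning: ..." line that was interleaved with the XML.
    lines = [ln.strip() for ln in raw.splitlines() if ln.strip().startswith("<")]
    text = "\n".join(lines)
    # A lone surrogate reference is not a legal XML character, so swap it
    # for U+FFFD (REPLACEMENT CHARACTER) before handing the text to the parser.
    text = SURROGATE_REF.sub("&#xfffd;", text)
    root = ET.fromstring(text)
    chars = []
    for span in root.iter("span"):
        for ch in span.iter("char"):
            chars.append((ch.get("c"), span.get("font"), float(span.get("size"))))
    return chars
```

Each tuple keeps the font name and size of the enclosing span, which is the information that the plain -t output does not carry.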



On Wed, May 15, 2013 at 6:14 PM, Xan <dxpublica@telefonica.net> wrote:
On 13/05/13 09:55, luigi scarso wrote:



On Sun, May 12, 2013 at 4:32 PM, Xan <dxpublica@telefonica.net> wrote:
Hi,

I would like to know whether there is any tool to convert a PDF (generated by LaTeX or ConTeXt) back to a LaTeX or ConTeXt source file. Does anyone have experience with that?

I'm thinking about two alternatives:
* a PDF-reading library such as PoDoFo, plus a custom script that turns the PDF content (text) into ConTeXt commands
* converting the PDF to JPG and applying http://detexify.kirelabs.org/classify.html to map the glyphs to TeX symbols.

For me it is vital that mathematical symbols such as the integral sign come out as TeX commands (\int) and not as UTF-8 characters.

Thanks a lot,
Xan.
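
The requirement above (TeX commands instead of UTF-8 characters) amounts to a lookup from Unicode code points to control sequences. The following is a rough sketch in Python under that assumption; the table UNICODE_TO_TEX and the function to_tex are illustrative names, and the three entries are only examples, not a complete mapping.

```python
# Illustrative table only; a real converter would need a far larger mapping.
UNICODE_TO_TEX = {
    "\u222b": r"\int",    # INTEGRAL
    "\u221e": r"\infty",  # INFINITY
    "\u00b2": "^2",       # SUPERSCRIPT TWO
}


def to_tex(text):
    """Replace known Unicode math characters with TeX equivalents."""
    out = []
    for ch in text:
        tex = UNICODE_TO_TEX.get(ch)
        if tex is None:
            out.append(ch)          # unknown characters pass through unchanged
        elif tex.startswith("\\"):
            out.append(tex + " ")   # a trailing space ends the control word
        else:
            out.append(tex)
    return "".join(out)
```

A table like this only recovers individual symbols; it says nothing about limits, subscripts, or superscripts, which have to be inferred from the layout.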


Have you seen the mudraw program from MuPDF?
http://www.mupdf.com/
It has a -t switch that outputs plain text, and -tt and -ttt switches that output XML.

--
luigi
Thank you for answering, and sorry for the delay. I will check it, but I suspect that if I have

$$\int_{i=1}^{\infty} x^2$$

in a LaTeX document and generate a PDF from it, then mudraw -t on that document will not reproduce the formula but give something like "S i=1 x²" instead.


Thanks,
Xan.
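
Whether the structure of such a formula can be recovered depends on the layout data that survives extraction. As a rough heuristic sketch in Python (the function classify_spans and the 0.9 size ratio are assumptions for illustration, not anything MuPDF provides), spans with a clearly smaller font size can be treated as scripts, and their vertical centre compared with that of the full-size spans. The sketch assumes that all spans belong to one formula and that y grows downwards, as the bbox values in the -ttt output of this thread suggest.

```python
from statistics import mean


def classify_spans(spans, ratio=0.9):
    """Label each span as 'base', 'sup' or 'sub'.

    spans: list of (text, size, y0, y1) tuples for ONE formula, where y0/y1
    are the top and bottom of the bbox and y is assumed to grow downwards.
    """
    base_size = max(size for _, size, _, _ in spans)

    def centre(y0, y1):
        return (y0 + y1) / 2

    # Vertical centre of the full-size spans, used as the reference line.
    base_centre = mean(
        centre(y0, y1) for _, size, y0, y1 in spans if size >= ratio * base_size
    )
    labelled = []
    for text, size, y0, y1 in spans:
        if size >= ratio * base_size:
            kind = "base"
        elif centre(y0, y1) < base_centre:
            kind = "sup"   # smaller and higher on the page
        else:
            kind = "sub"   # smaller and lower on the page
        labelled.append((text, kind))
    return labelled
```

This is a heuristic only: it would misfire on formulas that mix several base sizes, or when unrelated text such as a page number is passed in together with the formula.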



--
luigi