From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jesse Rosenthal
Newsgroups: gmane.text.pandoc
Subject: Re: Cannot: (i) extract images from the input docx files; (ii) convert 2 docx input files into 1 html
Date: Wed, 21 Jan 2015 13:47:10 -0500
Message-ID: <87zj9cc6oh.fsf@jhu.edu>
References: <4e05f6a9-4c84-49f2-bbf3-f0b5c5479a59@googlegroups.com> <87k30i7q2d.fsf@jhu.edu> <87ppaad8t1.fsf@jhu.edu> <87twzmcojp.fsf@jhu.edu>
To: Andrew Yim, pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org

Dear Andrew,

I cut some of the text out of it and used it as a test. The fix has now
been pushed. Note, though, that it will be quite easy to get weird
results with images of this sort, since they're often placed absolutely
on the page and anchored to text in weird ways. I'll try to solve the
problems as they come up.

Best,
Jesse

Andrew Yim writes:

> Hi Jesse
>
> Would this one be ok? Thanks.
>
> Andrew
>
> On Mon, Jan 19, 2015 at 11:56 PM, Jesse Rosenthal wrote:
>
>> Hi Andrew,
>>
>> This is actually a bit heavy for a test case, since it has to become
>> part of the permanent repo -- and because it also exhibits a lot of
>> behavior that might lead to failing tests for different reasons. What
>> would be great is something with one line of text and one small image
>> that the current pandoc doesn't convert properly.
>>
>> The saved images would be in their original format.
>>
>> Best,
>> Jesse
>>
>> Andrew Yim writes:
>>
>> > Hi Jesse
>> >
>> > Thanks. Not sure if the attached docx file can serve the test function. It
>> > has 7 pages with 3 images, taken out from the file in my original post. It
>> > was saved in Word 2007 .docx format with my Word 2003. The pdf shows what's
>> > inside the file. Would the extracted images be saved as .png files or what
>> > format?
>> >
>> > Thanks again.
>> >
>> > On Mon, Jan 19, 2015 at 4:39 PM, Jesse Rosenthal wrote:
>> >
>> >> Okay -- I implemented this change to read images on old versions of
>> >> Word. If Andrew (or anyone else with an older Word using VML) could send
>> >> a minimal file to use as a test case, I'll push it. (Unfortunately, I
>> >> can't make the test file myself, since I don't have access to a version
>> >> of Word that uses VML.)
>> >>
>> >> --
>> >> You received this message because you are subscribed to a topic in the
>> >> Google Groups "pandoc-discuss" group.
>> >> To unsubscribe from this topic, visit
>> >> https://groups.google.com/d/topic/pandoc-discuss/Xsg-phpJSCk/unsubscribe.
>> >> To unsubscribe from this group and all its topics, send an email to
>> >> pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>> >> To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>> >> To view this discussion on the web visit
>> >> https://groups.google.com/d/msgid/pandoc-discuss/87ppaad8t1.fsf%40jhu.edu.
>> >> For more options, visit https://groups.google.com/d/optout.