public inbox archive for pandoc-discuss@googlegroups.com
From: Z T <zzztirr-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: pandoc-discuss <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Subject: Re: New custom reader for extracting content from web pages
Date: Sat, 9 Apr 2022 23:58:32 -0700 (PDT)
Message-ID: <1ae43988-8d34-4bdf-ba11-f875d8f69943n@googlegroups.com>
In-Reply-To: <5d9fa569-19d0-490b-88cf-1fb5fe73a400n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>


I'm in a similar position to the previous poster. This reader looks great,
but despite reading as much documentation as I could find on Lua filters, I
haven't been able to get it to work.

Is it possible to get this working just by copying the script, or is it
necessary to run 'npm install -g readability-cli' first? Are there other
prerequisites for using Lua filters? I've seen
https://pandoc.org/lua-filters.html#lua-interpreter-initialization but
I'm not sure I understand what the requirements are.


On Sunday, January 23, 2022 at 9:02:05 AM UTC+11 myke...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote:

> Tried your suggestion from private (please excuse my shyness) message.
> Found out that this is well beyond my abilities. Got node installed.
> Installed readability-cli. Discovered that I had to have an init.lua
> file. Found one and saved it in directory holding pandoc. Copied your
> script and saved it as readable.lua, in the same directory. Discovered
> that I don't seem to know where init.lua and readable.lua need to go. So
> far, when I run "pandoc -f readable.lua [et cetera]", the response is:
> "error running Lua: [new line] cannot open readable.lua: No such file or
> directory." All this was on MacOS. Tried same on MSFT WinOS; similar
> result.
> My weariness convinces me that I am Well Out of My Depth. Don't seem to
> know enough about Node, Lua, or whatever pandoc is written in. Learned
> some things about Node and Lua. Success is over an horizon too far.
> Someday.
> Thanks anyway.
> Pandoc is still wonderful.
>
> On Sunday, January 16, 2022 at 1:59:14 PM UTC-5 John MacFarlane wrote:
>
>>
>> I've added a new example of a custom reader, which runs 
>> the 'readability-cli' program on HTML input before processing 
>> it with pandoc, extracting the content and omitting navigation 
>> and layout. 
>>
>> See 
>>
>> https://pandoc.org/custom-readers.html#example-extracting-the-content-from-web-pages 
>>
>> This shows how the new custom reader interface, when combined 
>> with pandoc.read in the Lua API, can be used to add 
>> preprocessors. 
>>
>> (Of course, you could do something similar in a shell script. 
>> But doing it this way ensures that pandoc will be able to 
>> retrieve resources (e.g. images) from the URL. In addition, 
>> the filter does some further processing to remove structural 
>> Divs that clutter the output, and it is easily customizable.) 
>>
>


Thread overview: 3+ messages
2022-01-16 18:58 John MacFarlane
2022-01-22 22:02   ` Michael Love
2022-04-10  6:58       ` Z T [this message]
