From: Z T <zzztirr-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: pandoc-discuss <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Subject: Re: New custom reader for extracting content from web pages
Date: Sat, 9 Apr 2022 23:58:32 -0700 (PDT)
Message-ID: <1ae43988-8d34-4bdf-ba11-f875d8f69943n@googlegroups.com>
In-Reply-To: <5d9fa569-19d0-490b-88cf-1fb5fe73a400n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
I'm in a similar position to the previous poster. This reader looks great,
but despite reading as much documentation on Lua filters as I could, I
haven't been able to get it to work.
Is it possible to get it working just by copying the script, or is it
necessary to run 'npm install -g readability-cli' first? Are there other
prerequisites for using Lua filters? I've
seen https://pandoc.org/lua-filters.html#lua-interpreter-initialization but
I'm not sure I understand what the requirements are.
On Sunday, January 23, 2022 at 9:02:05 AM UTC+11 myke...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote:
> Tried the suggestion from your private message (please excuse my shyness).
> Found out that this is well beyond my abilities. Got node installed.
> Installed readability-cli. Discovered that I had to have an init.lua
> file. Found one and saved it in directory holding pandoc. Copied your
> script and saved it as readable.lua, in the same directory. Discovered
> that I don't seem to know where init.lua and readable.lua need to go. So
> far, when I run "pandoc -f readable.lua [et cetera]", the response is:
> "error running Lua: [new line] cannot open readable.lua: No such file or
> directory." All this was on MacOS. Tried same on MSFT WinOS; similar
> result.
> My weariness convinces me that I am Well Out of My Depth. Don't seem to
> know enough about Node, Lua, or whatever pandoc is written in. Learned
> some things about Node and Lua. Success is over an horizon too far.
> Someday.
> Thanks anyway.
> Pandoc is still wonderful.
>
>
> On Sunday, January 16, 2022 at 1:59:14 PM UTC-5 John MacFarlane wrote:
>
>>
>> I've added a new example of a custom reader, which runs
>> the 'readability-cli' program on HTML input before processing
>> it with pandoc, extracting the content and omitting navigation
>> and layout.
>>
>> See
>>
>> https://pandoc.org/custom-readers.html#example-extracting-the-content-from-web-pages
>>
>> This shows how the new custom reader interface, when combined
>> with pandoc.read in the Lua API, can be used to add
>> preprocessors.
>>
>> (Of course, you could do something similar in a shell script.
>> But doing it this way ensures that pandoc will be able to
>> retrieve resources (e.g. images) from the URL. In addition,
>> the filter does some further processing to remove structural
>> Divs that clutter the output, and it is easily customizable.)
>>
>
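For anyone retracing the steps quoted above, here is a minimal pre-flight sketch, not a definitive recipe. It assumes that readability-cli installs its executable under the name "readable", and that the script was saved as readable.lua in the directory pandoc is run from. The "cannot open readable.lua" error suggests pandoc looked for the file relative to the current working directory. The helper names check_tool and check_file are made up for this illustration.

```shell
#!/bin/sh
# Pre-flight check before running: pandoc -f readable.lua ...
# Assumption: the reader script is saved as readable.lua in the
# current working directory, and readability-cli provides a
# command called "readable".

# Print whether a command is available on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
  fi
}

# Print whether a file exists in the current working directory.
check_file() {
  if [ -f "$1" ]; then
    echo "found: $1 in $(pwd)"
  else
    echo "missing: $1 in $(pwd)"
  fi
}

check_tool pandoc
check_tool node
check_tool readable
check_file readable.lua
```

If all four lines report "found", running pandoc -f readable.lua from that same directory, or passing the full path to readable.lua, should at least get past the "No such file or directory" error.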
Thread overview: 3+ messages
2022-01-16 18:58 John MacFarlane
2022-01-22 22:02 ` Michael Love
2022-04-10 6:58 ` Z T [this message]