public inbox archive for pandoc-discuss@googlegroups.com
* Making a filter to convert ruby characters from HTML to ConTeXt
@ 2019-08-08 16:26 Patrick Kenny
From: Patrick Kenny @ 2019-08-08 16:26 UTC (permalink / raw)
  To: pandoc-discuss



I'm trying to write a filter (in panflute) to convert ruby characters 
<https://en.wikipedia.org/wiki/Ruby_character> from HTML to ConTeXt.

Input HTML: 

<p>This is an example: <ruby>例<rt>レイ</rt></ruby></p>

After conversion (what I want my output to look like):

This is an example: \ruby{例}{レイ}

What's a good way to approach this kind of conversion?  I don't know how to 
target the ruby tags.

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/2e8d8fde-b107-41d3-ad59-bc249f8f0ae8%40googlegroups.com.


* Re: Making a filter to convert ruby characters from HTML to ConTeXt
@ 2019-08-08 17:49   ` John MacFarlane
From: John MacFarlane @ 2019-08-08 17:49 UTC (permalink / raw)
  To: Patrick Kenny, pandoc-discuss


This will show you how pandoc parses this content:

% pandoc -t native
<ruby>aa<rt>bb</rt></ruby>
^D
[Para [RawInline (Format "html") "<ruby>",Str "aa",RawInline
(Format "html") "<rt>",Str "bb",RawInline (Format "html")
"</rt>",RawInline (Format "html") "</ruby>"]]

So now you know what kind of structure you'll have to
intercept and deal with in your filter. Does that help?
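
For illustration, the same inline list can be written as pandoc JSON, the form in which a filter receives the document. The sketch below uses only the Python standard library rather than panflute, and the helper names `raw_html` and `collapse_ruby` are made up for this example. It assumes the exact six-element shape shown above (one Str for the base text, one Str for the annotation) and does not escape TeX special characters, so treat it as a starting point rather than a finished filter.

```python
def raw_html(inline, tag):
    """True if `inline` is a RawInline of format html whose text is exactly `tag`."""
    return inline.get("t") == "RawInline" and inline.get("c") == ["html", tag]


def collapse_ruby(inlines):
    """Replace each <ruby>base<rt>reading</rt></ruby> run with one raw TeX inline.

    Only the simple six-element shape from the native output is handled;
    every other inline is passed through unchanged.
    """
    out = []
    i = 0
    while i < len(inlines):
        run = inlines[i:i + 6]
        if (len(run) == 6
                and raw_html(run[0], "<ruby>")
                and run[1].get("t") == "Str"
                and raw_html(run[2], "<rt>")
                and run[3].get("t") == "Str"
                and raw_html(run[4], "</rt>")
                and raw_html(run[5], "</ruby>")):
            tex = "\\ruby{%s}{%s}" % (run[1]["c"], run[3]["c"])
            out.append({"t": "RawInline", "c": ["tex", tex]})
            i += 6  # skip the whole run
        else:
            out.append(inlines[i])
            i += 1
    return out


# The inline list from the native output, written as pandoc JSON.
para = [
    {"t": "RawInline", "c": ["html", "<ruby>"]},
    {"t": "Str", "c": "aa"},
    {"t": "RawInline", "c": ["html", "<rt>"]},
    {"t": "Str", "c": "bb"},
    {"t": "RawInline", "c": ["html", "</rt>"]},
    {"t": "RawInline", "c": ["html", "</ruby>"]},
]
print(collapse_ruby(para))
```

Matching the whole run at once keeps the base text and the annotation together, which is convenient if the two arguments later need different handling.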



* Re: Making a filter to convert ruby characters from HTML to ConTeXt
@ 2019-08-09 16:52       ` Patrick Kenny
From: Patrick Kenny @ 2019-08-09 16:52 UTC (permalink / raw)
  To: pandoc-discuss



Thank you, that does help.

I managed to walk the document and identify the elements I want to 
change (confirmed with debug output), but the changes don't take effect: 
the conversion doesn't appear in the output document.

How can I take the HTML and change it to TeX?

def ruby_convert(elem, doc):
    if isinstance(elem, pf.RawInline):
        if elem.text == '<ruby>':
            pf.debug(elem.text)
            elem = pf.RawInline('\\ruby', 'tex')
        elif elem.text == '<rt>':
            pf.debug(elem.text)
            elem = pf.RawInline('}{', 'tex')
        elif elem.text == '</rt>':
            pf.debug(elem.text)
            elem = pf.RawInline('}', 'tex')
        elif elem.text == '</ruby>':
            pf.debug(elem.text)
            # We can delete this because we already processed the end tag in </rt>
            return []

def action(elem, doc):
    if isinstance(elem, pf.Para) and (doc.format == 'context'):
        return elem.walk(ruby_convert)


