Issue with lua filter parsing a string including space

public inbox archive for pandoc-discuss@googlegroups.com

* Issue with lua filter parsing a string including space
From: JooYoung Seo @ 2019-12-09 18:13 UTC
  To: pandoc-discuss



Hello,

I was wondering if anyone could give me a hand.

To bold a specific author's name (Smith, J.) in my MD file, I used the
following Lua filter; however, the `%s` in the pattern never seems to match.

```lua
function Str(el)
  if el.text:match("Smith,%sJ.") then
    return pandoc.Strong(el)
  end
end
```

Why doesn't the pattern above match the space? Is there any
way to achieve this goal?

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/4089c1e2-a9b8-4943-875c-3ef454155536%40googlegroups.com.

* Re: Issue with lua filter parsing a string including space
From: John MacFarlane @ 2019-12-09 19:14 UTC
  To: JooYoung Seo, pandoc-discuss


Use 'pandoc -t native' to see what AST pandoc produces for
your input. You'll see that "Smith, J." turns into three
elements (Str, Space, Str), so the whole string isn't
there to be matched in one Str element.
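
A minimal plain-Lua sketch of the point above. The inline list is written by hand to mirror the three elements pandoc produces for "Smith, J." (it is an assumption standing in for real AST elements, not pandoc output), so it runs with any Lua interpreter. It also shows that `.` in a Lua pattern matches any character, so a literal dot is written `%.`.

```lua
-- Hand-written stand-in for what pandoc produces for "Smith, J.":
-- Str "Smith,", Space, Str "J." (plain tables, not real AST elements).
local inlines = {
  { t = "Str", text = "Smith," },
  { t = "Space" },
  { t = "Str", text = "J." },
}

-- "%s" matches one whitespace byte; "%." is a literal dot
-- ("." on its own would match any character).
local pattern = "Smith,%sJ%."

-- Test each Str on its own, the way a Str(el) filter function sees them.
local matched_any_str = false
for _, el in ipairs(inlines) do
  if el.t == "Str" and el.text:match(pattern) then
    matched_any_str = true
  end
end
print(matched_any_str)  --> false

-- Join the pieces back together, turning Space into " ": now it matches.
local parts = {}
for _, el in ipairs(inlines) do
  parts[#parts + 1] = (el.t == "Space") and " " or el.text
end
local joined = table.concat(parts)
print(joined:match(pattern))  --> Smith, J.
```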


* Re: Issue with lua filter parsing a string including space
From: EBkysko @ 2019-12-09 20:10 UTC
  To: pandoc-discuss


Firstly, I'll repeat what I told you on the pandoc issue tracker: inserting a
non-breaking space (`Smith,\ J.`) keeps the whole name in a single Str.
However, anything directly following it, e.g. a comma, is included too:
`Smith,\ J.,` converts to `Str "Smith,\160J.,"` (the non-breaking space shows
up as `\160` in `-t native` output), so that would have to be accounted for.
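
A minimal sketch of a `Str` filter for the non-breaking-space route, assuming the Markdown source writes the name as `Smith,\ J.` so it arrives as one Str. Lua patterns work on bytes and `%s` matches a single byte, so it cannot match U+00A0 (two bytes in UTF-8); the sketch spells those bytes out instead, and splits off any trailing text such as the comma. `split_name` and `NAME_PATTERN` are names made up for this example, not pandoc functions.

```lua
-- Sketch: bold "Smith, J." when the Markdown source writes it as `Smith,\ J.`
-- (escaped space = non-breaking space), so pandoc keeps it in a single Str.
-- U+00A0 is two bytes in UTF-8 (194, 160); Lua's %s matches one byte at a
-- time, so those bytes are spelled out explicitly.
local NBSP = "\194\160"
local NAME_PATTERN = "^(Smith," .. NBSP .. "J%.)(.*)$"

-- Hypothetical helper: split a Str's text into the name and whatever follows
-- it (e.g. ","). Returns nil when the text does not start with the name.
local function split_name(text)
  return text:match(NAME_PATTERN)
end

function Str(el)
  local name, rest = split_name(el.text)
  if not name then
    return nil  -- not the name: leave the element unchanged
  end
  local result = { pandoc.Strong { pandoc.Str(name) } }
  if rest ~= "" then
    result[#result + 1] = pandoc.Str(rest)  -- keep trailing punctuation unbolded
  end
  return result  -- a list of inlines replaces the single Str
end
```

Returning a list from an element function is how a pandoc filter replaces one inline with several, which is what lets the comma stay outside the bold text.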

Secondly, one quick way to deal with the problem is a filter that could
look like:

```lua
function Block (el)
  if el.t == "Para" or el.t == "Plain" then
    for k,_ in ipairs(el.content) do

      -- look for Str "Smith,", then Space, then a Str starting with "J."
      if el.content[k].t == "Str" and el.content[k].text == "Smith,"
      and el.content[k+1].t == "Space"
      and el.content[k+2].t == "Str" and el.content[k+2].text:find("^J%.") then

          -- "%." is a literal dot; e is the index of that dot
          local _,e = el.content[k+2].text:find("^J%.")
          local rest = el.content[k+2].text:sub(e+1)  -- empty if e+1>length
          el.content[k] = pandoc.Strong { pandoc.Str("Smith, J.") }
          el.content[k+1] = pandoc.Str(rest)
          -- safe? another way would be to set element k+2 to Str("")
          table.remove(el.content, k+2)
          -- no real need to skip ipairs items here

      end

    end
  end
  return el
end
```

In the above, we looked for `Str "Smith,"`, and then made sure it is 
followed by Space, and then by a Str whose content *begins* with "J." 
(because if we have something like "Smith, J., lawyer", then we'll have Str 
"J.,").

That may *not* be the best approach, because it does not work if the text
"Smith, J." is inside a span or another inline element (like emphasis); in
those cases, you'd have to apply it inside those elements as well.
It also only works for the hard-coded "Smith, J."; a better way would be to
take the name as an input variable.
There may be other subtleties not covered by that code.
Also, if that bit of text only appears in some portion of the document, try
to run the filter only there.
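
A rough sketch addressing both caveats, under these assumptions: the name is passed in as the list of words pandoc splits it into (`NAME_WORDS`, `match_at`, `bold_name` and `apply` are names invented for this example), and the same function is registered for Para, Plain, Emph and Span so names inside emphasis or spans are handled too. Other containers (Strong, Link, Header, ...) would need to be registered the same way. Every look-ahead is checked against nil, so a paragraph ending in "Smith," is simply left alone.

```lua
-- Hypothetical parameter: the name to bold, as the words pandoc splits it
-- into ("Smith, J." arrives as Str "Smith,", Space, Str "J.").
local NAME_WORDS = { "Smith,", "J." }

-- If `words` start at inlines[i], separated by Space elements, return the
-- index of the Str holding the last word plus any text left after it.
-- The last word only needs to be a prefix, so Str "J.," also matches.
local function match_at(inlines, i, words)
  local pos = i
  for w, word in ipairs(words) do
    local el = inlines[pos]
    if el == nil or el.t ~= "Str" then return nil end
    if w < #words then
      local sep = inlines[pos + 1]
      if el.text ~= word or sep == nil or sep.t ~= "Space" then
        return nil
      end
      pos = pos + 2
    elseif el.text:sub(1, #word) == word then
      return pos, el.text:sub(#word + 1)
    else
      return nil
    end
  end
  return nil
end

-- Build a new inline list with every occurrence of the name wrapped in Strong.
local function bold_name(inlines, words)
  local out, i = {}, 1
  while i <= #inlines do
    local last, rest = match_at(inlines, i, words)
    if last then
      local name = {}
      for j = i, last - 1 do
        name[#name + 1] = inlines[j]  -- the earlier words and their Spaces
      end
      name[#name + 1] = pandoc.Str(words[#words])
      out[#out + 1] = pandoc.Strong(name)
      if rest ~= "" then
        out[#out + 1] = pandoc.Str(rest)  -- e.g. the "," in "J.,"
      end
      i = last + 1
    else
      out[#out + 1] = inlines[i]
      i = i + 1
    end
  end
  return out
end

-- Same function for every element type whose content is a list of inlines.
local function apply(el)
  el.content = bold_name(el.content, NAME_WORDS)
  return el
end

Para = apply
Plain = apply
Emph = apply
Span = apply
```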

Another way, if those author names are in paragraphs of pure text (only
Str and Space elements), would be to stringify the Para/Plain, split that
string into parts, put the author-name part in a Strong element, and rebuild
the Para/Plain. But only you know what your text looks like, and this might
not apply.
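
A minimal sketch of that stringify-and-rebuild idea, assuming paragraphs of plain text only: any emphasis, links, etc. in a matching paragraph would be flattened away. It uses `pandoc.utils.stringify`; `NAME` and `to_inlines` are placeholders introduced for this example, only Para is handled (Plain would need the same function), and only the first occurrence per paragraph is bolded.

```lua
local NAME = "Smith, J."  -- hypothetical: the name to bold

-- Turn a plain string back into Str and Space inlines, one Space per run
-- of whitespace, keeping leading and trailing spaces.
local function to_inlines(s)
  local out, i = {}, 1
  while i <= #s do
    local _, e = s:find("^%s+", i)
    if e then
      out[#out + 1] = pandoc.Space()
    else
      _, e = s:find("^%S+", i)
      out[#out + 1] = pandoc.Str(s:sub(i, e))
    end
    i = e + 1
  end
  return out
end

function Para(el)
  local text = pandoc.utils.stringify(el)
  local s, e = text:find(NAME, 1, true)  -- plain find: "." is not magic here
  if not s then
    return nil  -- no match: leave the paragraph (and its formatting) alone
  end
  local inlines = to_inlines(text:sub(1, s - 1))
  inlines[#inlines + 1] = pandoc.Strong(to_inlines(text:sub(s, e)))
  for _, x in ipairs(to_inlines(text:sub(e + 1))) do
    inlines[#inlines + 1] = x
  end
  return pandoc.Para(inlines)
end
```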

Finally, the way I personally prefer, especially if there are a lot of these
author names, would be to mark up these names in the MD document itself,
like `[Smith, J.]{.author}` or `[Smith, J.]{.name}`. It's still legible, and
it would be easier to manipulate.
That could be done by preprocessing the text in your editor, or with some other
language (Python, a standalone Lua on your system, Perl, etc.), which can be
easier than navigating AST nodes.
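
A minimal sketch of both halves of that approach, under stated assumptions: `mark_author` is a made-up helper for the preprocessing step (run it with a standalone Lua interpreter on the Markdown text before calling pandoc; it does not skip code blocks or names that are already marked up), and the `Span` function is a filter that turns spans with class `author` into bold text. `has_class` is likewise a helper written for this example.

```lua
-- Preprocessing (plain Lua, applied to the Markdown source text):
-- wrap every literal occurrence of `name` as [name]{.author}.
local function mark_author(markdown, name)
  -- Escape Lua pattern magic characters so the name matches literally.
  local pat = name:gsub("[%^%$%(%)%%%.%[%]%*%+%-%?]", "%%%0")
  return (markdown:gsub(pat, "[%0]{.author}"))
end

-- Filter helper: does the class list contain `name`?
local function has_class(classes, name)
  for _, c in ipairs(classes) do
    if c == name then
      return true
    end
  end
  return false
end

-- Filter: replace a span with class "author" by bold text.
function Span(el)
  if has_class(el.classes, "author") then
    return pandoc.Strong(el.content)
  end
  return nil
end
```

Pandoc's Markdown reads `[text]{.author}` as a Span (the `bracketed_spans` extension), so after preprocessing the filter only has to look at the class, not at the text.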


* Re: Issue with lua filter parsing a string including space
From: EBkysko @ 2019-12-09 20:25 UTC
  To: pandoc-discuss



Edit: well, I was too quick with that code... you have to check that k+1 and
k+2 exist, of course (i.e. are less than or equal to #el.content). Otherwise
an error occurs in the odd case of a paragraph *ending* with the text
"Smith," or "Smith, ".

So take that code as a rough guide...
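
For reference, a sketch of the earlier `Block` filter with the bounds checks from this edit added (nil checks before reading k+1 and k+2), with the bold name built from Str/Space elements rather than one Str containing a space, and without leaving an empty Str behind. It keeps the original's hard-coded "Smith, J."; `is_str` is a helper made up for this example.

```lua
-- Hypothetical helper: true if `el` exists, is a Str and, when `text` is
-- given, has exactly that text.
local function is_str(el, text)
  return el ~= nil and el.t == "Str" and (text == nil or el.text == text)
end

function Block(el)
  if el.t ~= "Para" and el.t ~= "Plain" then
    return nil  -- other block types are left unchanged
  end
  local content = el.content
  local k = 1
  while k <= #content do
    -- content[k + 1] and content[k + 2] are nil near the end of the
    -- paragraph, so check for nil before reading their fields
    local a, sp, b = content[k], content[k + 1], content[k + 2]
    if is_str(a, "Smith,") and sp ~= nil and sp.t == "Space"
        and is_str(b) and b.text:find("^J%.") then
      local rest = b.text:sub(3)  -- whatever follows "J.", e.g. ","
      content[k] = pandoc.Strong { pandoc.Str("Smith,"), pandoc.Space(), pandoc.Str("J.") }
      table.remove(content, k + 2)
      table.remove(content, k + 1)
      if rest ~= "" then
        table.insert(content, k + 1, pandoc.Str(rest))
      end
    end
    k = k + 1
  end
  el.content = content  -- write the edited list back to the element
  return el
end
```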


* Re: Issue with lua filter parsing a string including space
From: JooYoung Seo @ 2019-12-09 22:50 UTC
  To: pandoc-discuss



Thank you so much for kindly walking me through each detail, @EBkysko
and @John!!

Finally, I have been able to make it bold!!
I am so new to Lua's interesting syntax that I was not able to solve
this problem by myself.

I really appreciate it again!

All the best,

JooYoung

