public inbox archive for pandoc-discuss@googlegroups.com
* Normalizing spaces in italics
@ 2022-07-01 16:36 r.d.go...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
       [not found] ` <bd84993b-b1cd-4128-aab2-ce1eff2c9768n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: r.d.go...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org @ 2022-07-01 16:36 UTC (permalink / raw)
  To: pandoc-discuss



I am a bit sloppy when typing italics in my word processor: I generally turn
off italics only after typing the space at the end of the word. When I convert
from RTF to Markdown, I end up with output that looks like this:

Strictly speaking the qualities that are imposed by the *logos *of a 
certain thing are the *activities *of the *logos*

This looks ugly when I open it in Emacs or another editor. I can fix these
with a regex replace in Emacs, but I thought pandoc now normalized by default,
which is supposed to fix this kind of stylistic error. I also tried passing
the Markdown through pandoc again, to regenerate Markdown, but it made no
difference.
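
For reference, the regex replacement mentioned above could look roughly like
the following Python sketch. It is only an illustration, not something pandoc
provides: the name `fix_trailing_space` and the pattern itself are assumptions.
The pattern handles only single-asterisk emphasis that opens and closes on one
line, knows nothing about code blocks or escaped asterisks, and can misfire on
unusual input such as intraword asterisks.

```python
import re

# Hypothetical helper, not part of pandoc: moves whitespace typed just before
# the closing asterisk of *emphasis* to just after it.
# The lookarounds skip ** (strong), so only single-asterisk emphasis is touched.
TRAILING_SPACE = re.compile(r"(?<!\*)\*([^\s*][^*\n]*?)[ \t]+\*(?!\*)")

def fix_trailing_space(markdown: str) -> str:
    """Rewrite '*word *' as '*word* ' wherever the pattern matches."""
    return TRAILING_SPACE.sub(r"*\1* ", markdown)

sample = ("imposed by the *logos *of a certain thing "
          "are the *activities *of the *logos*")
print(fix_trailing_space(sample))
```

A text-level replacement like this is fragile compared with working on the
parsed document, which is why a filter is usually the safer route.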

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/bd84993b-b1cd-4128-aab2-ce1eff2c9768n%40googlegroups.com.


* Re: Normalizing spaces in italics
       [not found] ` <bd84993b-b1cd-4128-aab2-ce1eff2c9768n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2022-07-02  8:49   ` BPJ
       [not found]     ` <CADAJKhCj=dCQ+1BkzkK7++bJn8ajpKkbxYHYVrHaC_NRjVQ15Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: BPJ @ 2022-07-02  8:49 UTC (permalink / raw)
  To: pandoc-discuss


I use this Lua filter to clean up when I convert from DOCX.

``````lua
local function handler (elem)
  -- Get the length of the content (local, so it does not leak a global)
  local len = #elem.content
  -- Check that the content isn't empty
  if 0 < len then
    -- Is the last child a space?
    if 'Space' == elem.content[len].tag then
      -- Remove the space (last child)
      elem.content:remove()
      -- Return a space *after* the element
      return { elem, pandoc.Space() }
    end
  end
  return nil
end

return {
  {
    Emph      = handler,
    Strong    = handler,
    Strikeout = handler,
    SmallCaps = handler,
    Underline = handler,
    Span      = handler,
    Link      = handler,
  }
}
``````
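
The same idea can be sketched outside Lua, against the JSON form of pandoc's
AST (as produced by `pandoc -t json`). This is an illustrative Python port, not
the filter above: the names `WRAPPERS` and `move_trailing_space` are made up
here, and Span and Link are deliberately left out because their JSON payload
carries attributes (and, for Link, a target) alongside the inline list.

```python
import json

# Inline types whose JSON payload "c" is just a list of inlines.
# Span and Link are omitted in this sketch (their payload has a different shape).
WRAPPERS = {"Emph", "Strong", "Strikeout", "SmallCaps", "Underline"}

def move_trailing_space(inlines):
    """Return a new inline list with a trailing Space moved outside each wrapper."""
    result = []
    for node in inlines:
        if node.get("t") in WRAPPERS:
            # Handle nested wrappers first, then look at this one.
            content = move_trailing_space(node["c"])
            if content and content[-1].get("t") == "Space":
                # Drop the last child and emit a Space *after* the element.
                result.append({"t": node["t"], "c": content[:-1]})
                result.append({"t": "Space"})
                continue
            node = {"t": node["t"], "c": content}
        result.append(node)
    return result

# Inlines for: the *logos *of
demo = [
    {"t": "Str", "c": "the"},
    {"t": "Space"},
    {"t": "Emph", "c": [{"t": "Str", "c": "logos"}, {"t": "Space"}]},
    {"t": "Str", "c": "of"},
]
print(json.dumps(move_trailing_space(demo)))
```

Like the Lua handler, this removes only one trailing Space per element and
leaves elements without a trailing Space unchanged.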


* Re: Normalizing spaces in italics
       [not found]     ` <CADAJKhCj=dCQ+1BkzkK7++bJn8ajpKkbxYHYVrHaC_NRjVQ15Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2022-07-02 21:13       ` r.d.go...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
  2022-07-05  8:26       ` John MacFarlane
  1 sibling, 0 replies; 4+ messages in thread
From: r.d.go...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org @ 2022-07-02 21:13 UTC (permalink / raw)
  To: pandoc-discuss



It works perfectly! Thanks, that saved me a lot of manual fixing of files.


* Re: Normalizing spaces in italics
       [not found]     ` <CADAJKhCj=dCQ+1BkzkK7++bJn8ajpKkbxYHYVrHaC_NRjVQ15Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2022-07-02 21:13       ` r.d.go...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
@ 2022-07-05  8:26       ` John MacFarlane
  1 sibling, 0 replies; 4+ messages in thread
From: John MacFarlane @ 2022-07-05  8:26 UTC (permalink / raw)
  To: BPJ, pandoc-discuss


Might be good to build this into the docx reader.

