public inbox archive for pandoc-discuss@googlegroups.com
* html checkbox to markdown
@ 2023-05-05 13:09 姓名
       [not found] ` <c528195a-0d1a-4795-9f53-5c5ddad34b4fn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: 姓名 @ 2023-05-05 13:09 UTC (permalink / raw)
  To: pandoc-discuss



I am trying to convert an HTML checkbox:
<input type="checkbox" />
to markdown_github, but the output is empty.
How can I convert it?


* Re: html checkbox to markdown
       [not found] ` <c528195a-0d1a-4795-9f53-5c5ddad34b4fn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2023-05-05 13:51   ` Albert Krewinkel
       [not found]     ` <87zg6ian3w.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Albert Krewinkel @ 2023-05-05 13:51 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


姓名 <iamhermit-9Onoh4P/yGk@public.gmane.org> writes:

> I am trying to convert an HTML checkbox:
> <input type="checkbox" />
> to markdown_github, but the output is empty.
> How can I convert it?

I'm not sure I understand what you mean. What kind of output do you
want to achieve?



-- 
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe  e836 388d c0b2 1f63 1124

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/87zg6ian3w.fsf%40zeitkraut.de.



* Re: html checkbox to markdown
       [not found]     ` <87zg6ian3w.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
@ 2023-05-05 14:56       ` 姓名
       [not found]         ` <0d96eb75-e25a-44b3-880d-94106f0b2cdbn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: 姓名 @ 2023-05-05 14:56 UTC (permalink / raw)
  To: pandoc-discuss



Well, I have a file "input.html" with this content:
    <input type="checkbox" />
and I run the command:
    pandoc input.html -o output.md
If you now open "output.md", you will see an empty file, whereas I want it
to be converted to:
    - [ ]
Is it possible to achieve this?



* Re: html checkbox to markdown
       [not found]         ` <0d96eb75-e25a-44b3-880d-94106f0b2cdbn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2023-05-05 15:27           ` Gwern Branwen
       [not found]             ` <CAMwO0gwyAFsVjJFyxJBB18p6innDv0ssH1Dx4NBo3Je5BvuoeQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2023-05-05 16:06           ` Albert Krewinkel
  1 sibling, 1 reply; 6+ messages in thread
From: Gwern Branwen @ 2023-05-05 15:27 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


The Pandoc HTML reader is, perhaps surprisingly, worse for reading HTML
than the Markdown reader, which will generally preserve HTML (because
Markdown is defined as a superset of HTML). So if you want to read HTML
without erasing stuff, you are generally better off specifying the
*Markdown* reader. The results can be kinda ugly, but there's no way around
it: there is no 'native' Markdown for a checkbox input, so it uses the
fallback.

Example:

    $ echo '<p><input type="checkbox" /></p>' | pandoc -f html -w markdown
    $ echo '<p><input type="checkbox" /></p>' | pandoc -f markdown -w markdown
    ```{=html}
    <p>
    ```
    `<input type="checkbox" />`{=html}
    ```{=html}
    </p>
    ```

The HTML reader can't understand the <input>, so it is silently dropped. The
Markdown reader treats it as an HTML fragment embedded in Markdown, which is
preserved as a literal and passed through.

-- 
gwern
https://gwern.net


* Re: html checkbox to markdown
       [not found]             ` <CAMwO0gwyAFsVjJFyxJBB18p6innDv0ssH1Dx4NBo3Je5BvuoeQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2023-05-05 16:04               ` John MacFarlane
  0 siblings, 0 replies; 6+ messages in thread
From: John MacFarlane @ 2023-05-05 16:04 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

The proper way to do this is:

 % echo '<p><input type="checkbox" /></p>' | pandoc -f html+raw_html -t markdown
`<input type="checkbox">`{=html}`</input>`{=html}

Using the `raw_html` extension with the html reader will cause the unknown things to be included as raw HTML rather than dropped.  If you don't want the pandoc 'raw attribute' syntax, you can disable that:

% echo '<p><input type="checkbox" /></p>' | pandoc -f html+raw_html -w markdown-raw_attribute
<input type="checkbox"></input>




* Re: html checkbox to markdown
       [not found]         ` <0d96eb75-e25a-44b3-880d-94106f0b2cdbn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2023-05-05 15:27           ` Gwern Branwen
@ 2023-05-05 16:06           ` Albert Krewinkel
  1 sibling, 0 replies; 6+ messages in thread
From: Albert Krewinkel @ 2023-05-05 16:06 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


姓名 <iamhermit-9Onoh4P/yGk@public.gmane.org> writes:

> Well, I have a file "input.html" with this content:
>     <input type="checkbox" />
> and I run the command:
>     pandoc input.html -o output.md
> If you now open "output.md", you will see an empty file, whereas I want
> it to be converted to:
>     - [ ]
> Is it possible to achieve this?

The HTML reader does not support the `task_lists` extension, so this is
not possible, unfortunately. However, it might be possible to combine
the method mentioned by jgm with a filter to get the desired output,
although that would be a bit of work.
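
As a rough illustration of that idea (a sketch only; it is a plain text
post-processing step, not a pandoc filter, and nothing here is part of
pandoc itself): assuming the input has first been converted with
`pandoc -f html+raw_html -t markdown-raw_attribute`, so that checkboxes
survive as raw `<input type="checkbox"></input>` tags, a small Python
script could rewrite those tags as task-list markers. The function name
`checkboxes_to_task_items` and the regular expression are hypothetical,
and the sketch only handles a checkbox that starts its line.

```python
import re

# Matches a raw <input> tag at the start of a line, as it may appear in
# pandoc's Markdown output, e.g. <input type="checkbox"></input> or
# <input type="checkbox" checked="" />.
# Assumption: the tag starts its line (optionally indented).
INPUT_TAG = re.compile(
    r'^(?P<indent>[ \t]*)<input\b(?P<attrs>[^>]*)>(?:</input>)?[ \t]*',
    re.IGNORECASE | re.MULTILINE,
)


def checkboxes_to_task_items(markdown):
    """Rewrite leading raw checkbox tags as '- [ ] ' or '- [x] ' markers."""

    def replace(match):
        attrs = match.group("attrs")
        # Leave non-checkbox inputs (text fields, buttons, ...) untouched.
        if not re.search(r'type\s*=\s*["\']?checkbox', attrs, re.IGNORECASE):
            return match.group(0)
        # A bare or valued `checked` attribute marks the box as ticked.
        checked = re.search(r'\bchecked\b', attrs, re.IGNORECASE) is not None
        mark = "x" if checked else " "
        return match.group("indent") + "- [" + mark + "] "

    return INPUT_TAG.sub(replace, markdown)
```

The function could be applied to the text produced by the command jgm
gave. A real filter working on pandoc's document tree would likely be
more robust than this text-level rewrite, for instance with checkboxes
that appear in the middle of a paragraph.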




-- 
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe  e836 388d c0b2 1f63 1124

