public inbox archive for pandoc-discuss@googlegroups.com
* converting html to markdown; want: simple results as in 'w3m -dump'
@ 2018-07-26 11:20 pcolabgooglegroups-hYYWqA7vhVBBDgjK7y7TUQ
       [not found] ` <201807261120.w6QBKd5l002412-Iacv5gYTstuYo1hQQC0LMg@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: pcolabgooglegroups-hYYWqA7vhVBBDgjK7y7TUQ @ 2018-07-26 11:20 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Consider: 

    wget -O - https://ptpb.pw/i_Fw |lynx -stdin 

        # I would like to use pandoc to convert above html to a simple
        # "bare bones" markdown version that I can edit with vim. I
        # tried various approaches but results are not friendly.

Pls show me a commandline example that converts this html to a
markdown version that is fairly simple - ideally w/an empty line
between paragraphs of text.

When I run: 

    wget -O - https://ptpb.pw/i_Fw| w3m -dump  -cols 78 -T text/html -O utf-8 
        # It seems to figure out the paragraphs, inserting empty
        # lines. Unfortunately there are no links.

--
thanks!
Tom



* Re: converting html to markdown; want: simple results as in 'w3m -dump'
       [not found] ` <201807261120.w6QBKd5l002412-Iacv5gYTstuYo1hQQC0LMg@public.gmane.org>
@ 2018-07-26 13:05   ` Joseph Reagle
       [not found]     ` <62483e07-d68d-1460-3260-36fbd9c3f45b-T1oY19WcHSwdnm+yROfE0A@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Joseph Reagle @ 2018-07-26 13:05 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On 7/26/18 7:20 AM, pcolabgooglegroups-hYYWqA7vhVBBDgjK7y7TUQ@public.gmane.org wrote:
> Pls show me a commandline example that converts this html to a
> markdown version that is fairly simple - ideally w/an empty line
> between paragraphs of text.

I don't understand if your issue is with the format of the source or the pandoc conversion?

The URL you are providing is not served as HTML but as text/plain.

And even as HTML, it's invalid HTML. It can be cleaned up some with `tidy -clean` or `lynx -preparsed -source` but that doesn't help with the ultimate problem. Markdown permits HTML within it, and pandoc tends to pass it through.

1. If you ask pandoc to convert to "markdown", it'll try to render the tables and styled content and pass other stuff.
2. If you ask pandoc to convert to "markdown_strict", it'll pass most all of it through verbatim.

You might be able to strip it down with a pandoc filter or XSLT but that is no longer fairly simple.
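
For illustration, the stripping step could also be done with a short script instead of a pandoc filter or XSLT. The following is a minimal sketch, not part of pandoc, that uses only Python's standard `html.parser`. The names `BareMarkdown` and `html_to_bare_markdown`, and the choice of which tags count as paragraph boundaries, are assumptions made for this example. It keeps visible text and links, drops everything else, and puts an empty line between paragraphs.

```python
from html.parser import HTMLParser

# Tags treated as paragraph boundaries (an assumption; adjust per document).
BLOCK_TAGS = {"p", "div", "br", "tr", "table", "li",
              "h1", "h2", "h3", "h4", "h5", "h6"}
# Tags whose text content is dropped entirely.
SKIP_TAGS = {"style", "script", "title"}


class BareMarkdown(HTMLParser):
    """Collect visible text and links; ignore all other markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []  # finished paragraphs
        self.current = []     # text pieces of the paragraph being built
        self.skip_depth = 0   # > 0 while inside a SKIP_TAGS element
        self.hrefs = []       # stack of hrefs for currently open <a> tags

    def flush(self):
        # Collapse runs of whitespace and store the paragraph if non-empty.
        text = " ".join("".join(self.current).split())
        if text:
            self.paragraphs.append(text)
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a":
            href = dict(attrs).get("href")
            self.hrefs.append(href)
            if href:
                self.current.append("[")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a" and self.hrefs:
            href = self.hrefs.pop()
            if href:
                self.current.append("](" + href + ")")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.current.append(data)


def html_to_bare_markdown(html):
    """Return text and links only, one empty line between paragraphs."""
    parser = BareMarkdown()
    parser.feed(html)
    parser.close()
    parser.flush()
    return "\n\n".join(parser.paragraphs)
```

To use it in a pipeline, one could call `html_to_bare_markdown(sys.stdin.read())` and print the result. A link that wraps a block element would be split across paragraphs, so this is a starting point, not a general converter.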



* Re: converting ("bad"?) html to markdown; want: simple results as in 'w3m -dump'
       [not found]     ` <62483e07-d68d-1460-3260-36fbd9c3f45b-T1oY19WcHSwdnm+yROfE0A@public.gmane.org>
@ 2018-07-26 15:43       ` pcolabgooglegroups-hYYWqA7vhVBBDgjK7y7TUQ
       [not found]         ` <201807261543.w6QFhPHi027357-Iacv5gYTstuYo1hQQC0LMg@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: pcolabgooglegroups-hYYWqA7vhVBBDgjK7y7TUQ @ 2018-07-26 15:43 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Hi Joseph:

On Thu 7/26/18 9:05 -0400 (Joseph Reagle) pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org wrote:
>On 7/26/18 7:20 AM, pcolabgooglegroups-hYYWqA7vhVBBDgjK7y7TUQ@public.gmane.org wrote:
>> Pls show me a commandline example that converts this html to a
>> markdown version that is fairly simple - ideally w/an empty line
>> between paragraphs of text.
>
>I don't understand if your issue is with the format of the source or the pandoc conversion?

Probably both.  

If the source is "garbage" or poorly written, I would like to give the
author constructive criticism, but I'm not knowledgable enough - what should
I say to them?

>The URL you are providing is not served as HTML but as text/plain.

I uploaded the html mime "part" of an email I received to the
pastebin https://ptpb.pw/i_Fw (search this email for mhlist).

>And even as HTML, it's invalid HTML. It can be cleaned up some with
>`tidy -clean` or `lynx -preparsed -source` but that doesn't help with
>the ultimate problem. Markdown permits HTML within it, and pandoc
>tends to pass it through.

I tried your `tidy -clean` and `lynx -preparsed -source` suggestions, thanks,
but the results were still nowhere near simple - probably no surprise to you.

Also tried:

    wget -O - https://ptpb.pw/i_Fw   |pandoc -f html -t markdown  --columns=160|egrep -v '^[-+| ]+$' |less

At least the grep got rid of the pipe-delimited lines that carry no text.
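
To make clear what that filter removes, here is a small Python sketch of the same test. The function name `drop_border_lines` and the sample lines are made up for illustration; only the regular expression comes from the command above.

```python
import re

# Same pattern as the egrep call: a line made only of '-', '+', '|' and spaces.
BORDER_LINE = re.compile(r"^[-+| ]+$")


def drop_border_lines(lines):
    """Keep every line that is not purely table-border characters."""
    return [line for line in lines if not BORDER_LINE.match(line)]


# Hypothetical grid-table fragment, similar in shape to pandoc's table output.
sample = [
    "+-------+-------+",
    "| Hi Nil,       |",
    "|               |",
    "+-------+-------+",
    "",
]
print(drop_border_lines(sample))
```

Rows that contain text keep their leading and trailing pipes, and empty lines are kept too, because the pattern requires at least one character.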

>1. If you ask pandoc to convert to "markdown", it'll try to render the tables and styled content and pass other stuff.
>2. If you ask pandoc to convert to "markdown_strict", it'll pass most all of it through verbatim.
>
>You might be able to strip it down with a pandoc filter or XSLT but that is no longer fairly simple.

Is there a way to get pandoc to behave more like lynx -dump, for this case?
I'm guessing the answer is no.

Thanks again for your help.  Sorry for my lack of background in HTML and CSS.

--
regards,
Tom
--
$ mhlist
 msg part  type/subtype              size description
  67       multipart/alternative      93K
             boundary="----=_Part_593049238_1281275443.1532459126632"
     1     text/html                  80K
             charset="utf-8"
     2     text/plain                7199
             charset="utf-8"
$ mhstore -auto
storing message 67 part 1 as file /a/Areopagitet/tmp/nmh/67.1.html.1
$ head /a/Areopagitet/tmp/nmh/67.1.html.1
<!DOCTYPE html>
<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
<meta content="width=device-width, initial-scale=1.0" name="viewport">
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1">
<!--[if gte mso 9]>
<style id="ol-styles">
/* OUTLOOK-SPECIFIC STYLES */



* converting ("bad"?) html to markdown; want: simple results as in 'w3m -dump'
       [not found]         ` <201807261543.w6QFhPHi027357-Iacv5gYTstuYo1hQQC0LMg@public.gmane.org>
@ 2018-07-26 20:43           ` Joseph Reagle
       [not found]             ` <b79653fb-8f2d-07a9-03da-09184a4742ee-T1oY19WcHSwdnm+yROfE0A@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Joseph Reagle @ 2018-07-26 20:43 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On 7/26/18 11:43 AM, pcolabgooglegroups-hYYWqA7vhVBBDgjK7y7TUQ@public.gmane.org wrote:
> If the source is "garbage" or poorly written, I would like to give
> the author constructive criticism, but I'm not knowledgable enough -
> what should I say to them?

It looks like it's being written in some WYSIWYG editor that cares nothing for semantic structure, so there's not much to recommend. It's all just divs and spans with tons of formatting attributes.

> wget -O - https://ptpb.pw/i_Fw   |pandoc -f html -t markdown
> --columns=160|egrep -v '^[-+| ]+$' |less

This is the best I can get it but it ain't easy. Using wget, python (with bleach library), pandoc, and perl!

    wget -O - https://ptpb.pw/i_Fw \
      | python3 -c "import sys, bleach; print(bleach.clean(''.join(sys.stdin), tags=['a', 'br', 'html', 'span', 'style'], attributes=['href'], styles=[], strip=True))" \
      | pandoc -f html -t markdown \
      | perl -pe 's/\\/\n/g'
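
For readers less familiar with Perl, the last stage of the pipeline above can be sketched in Python. pandoc's markdown writer represents a hard line break as a backslash followed by a newline, so replacing each backslash with a newline leaves an empty line wherever the HTML had a `<br>`. The function name `backslashes_to_newlines` and the sample string are made up for illustration.

```python
def backslashes_to_newlines(markdown):
    # Python equivalent of: perl -pe 's/\\/\n/g'
    # Every backslash becomes a newline, not only those at end of line.
    return markdown.replace("\\", "\n")


# Hypothetical pandoc output: two hard line breaks and one escaped '$'.
sample = "Hi Nil,\\\nOut here on the West Coast\\\nabout \\$23,000\n"
print(backslashes_to_newlines(sample))
```

Because the substitution is global, backslashes that pandoc emits to escape characters are replaced as well, which appears to be why `$23,000` starts on a line of its own in the output below. Separately, bleach 5.0 and later no longer accept the `styles=` argument to `clean()`, so on a current install that argument would probably need to be dropped.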


---

There are so many urgent issues
[](http://r20.rs6.net/tn.jsp?f=001viS32bI1pE9SN0zn_EfECxpprCB8r9kxebMM0JFIKAUDg8glMsTBEYH21mRwFvV3hvNztjfxUZHDlEykS_h7XyN_bRo0cmrHSkKHA5mG19yNdezbc8SiptkE9GY9pRJVS3XGc-751XG7Esc0I6DolQ==&c=0lRYX78uFhH7s5PQsTm0PpR0tB6VzJhscEF8ND6vckryEMek8cuJJA==&ch=Yl2hVU3YAAMSkn0_DKrMCNfLIQYbSTBsMgeedJkmuACvxYALEo5fyw==)
Protecting the Rights of People & Nature From the Local Up 
[Subscribe](http://r20.rs6.net/tn.jsp?f=001viS32bI1pE9SN0zn_EfECxpprCB8r9kxebMM0JFIKAUDg8glMsTBEeSGfQXfiZtn6ktIkxJyXflFaG4ydPRu4QQEROpA2bSnxMsxkP_ULCxiIgJXRaUXgyARBgheyei6CEfoRkO6cnt-_9iRjVXCougNim-4U3sJK2p21rfKbGl6e5gqG62ol6A1U5MlJGY7nRj1JhYA0MT15m6k_akr9o04vScbHIiy8NQ1yMWkvgvmcRkBjgYrFyGFn4D7hMen8PAWj3ZsnNKnpStWhO8ZsT7pzP1_8k6gdix2SVzjd3z_U37PjA_nKpahOe_LxafGjeQL5rKeOjql56S6kbLrZw==&c=0lRYX78uFhH7s5PQsTm0PpR0tB6VzJhscEF8ND6vckryEMek8cuJJA==&ch=Yl2hVU3YAAMSkn0_DKrMCNfLIQYbSTBsMgeedJkmuACvxYALEo5fyw==)

| [CRUS
homepage](http://r20.rs6.net/tn.jsp?f=001viS32bI1pE9SN0zn_EfECxpprCB8r9kxebMM0JFIKAUDg8glMsTBEYH21mRwFvV3hvNztjfxUZHDlEykS_h7XyN_bRo0cmrHSkKHA5mG19yNdezbc8SiptkE9GY9pRJVS3XGc-751XG7Esc0I6DolQ==&c=0lRYX78uFhH7s5PQsTm0PpR0tB6VzJhscEF8ND6vckryEMek8cuJJA==&ch=Yl2hVU3YAAMSkn0_DKrMCNfLIQYbSTBsMgeedJkmuACvxYALEo5fyw==)

|
[Donate](http://r20.rs6.net/tn.jsp?f=001viS32bI1pE9SN0zn_EfECxpprCB8r9kxebMM0JFIKAUDg8glMsTBES2kpyMahtqo0l_KkeuVNB0aYX1GTDAbCz-kakeAKmaWxKidpc4R1_nRCkbbmvGFXZb9ve35pn704cpsNOA1kv7PeM9vFLt6twe3N_zyS8hIvHS86gViv9_jWdk3IeNpdu83jaKCtQjv&c=0lRYX78uFhH7s5PQsTm0PpR0tB6VzJhscEF8ND6vckryEMek8cuJJA==&ch=Yl2hVU3YAAMSkn0_DKrMCNfLIQYbSTBsMgeedJkmuACvxYALEo5fyw==)
Hi Nil,

Out here on the West Coast, we're getting some gorgeous hot days, which
I am absolutely loving, having spent too many previous years living in
the coastal fog zone. 

And in just a few days, I go off grid for a two-week group retreat in
the woods to recharge and breathe deep.

In the interim, we've been busy reorganizing ourselves after our
crowd-funding adventure brought in a substantial amount of money last
winter & spring
--about 
$23,000 (though less than we had anticipated).


Thank you for your ongoing support of our critical work across the
country. (And if you haven't already contributed a one-time or monthly
donation of any size - no amount too small! - you can do
so [HERE](http://r20.rs6.net/tn.jsp?f=001viS32bI1pE9SN0zn_EfECxpprCB8r9kxebMM0JFIKAUDg8glMsTBES2kpyMahtqo0l_KkeuVNB0aYX1GTDAbCz-kakeAKmaWxKidpc4R1_nRCkbbmvGFXZb9ve35pn704cpsNOA1kv7PeM9vFLt6twe3N_zyS8hIvHS86gViv9_jWdk3IeNpdu83jaKCtQjv&c=0lRYX78uFhH7s5PQsTm0PpR0tB6VzJhscEF8ND6vckryEMek8cuJJA==&ch=Yl2hVU3YAAMSkn0_DKrMCNfLIQYbSTBsMgeedJkmuACvxYALEo5fyw==)).
We've launched a new working [Board of
Directors](http://r20.rs6.net/tn.jsp?f=001viS32bI1pE9SN0zn_EfECxpprCB8r9kxebMM0JFIKAUDg8glMsTBEfff9QduyM6SgoyTbbaMjtEOqYmLg10jUdDdGHHFALn1NM_CqrZvJ1qyiG7JHAsLBam8ptOBLzGIHx5NIf-Ubo7GQsnX0gZxAqhUzcfejaWV-tO6wipnOEo=&c=0lRYX78uFhH7s5PQsTm0PpR0tB6VzJhscEF8ND6vckryEMek8cuJJA==&ch=Yl2hVU3YAAMSkn0_DKrMCNfLIQYbSTBsMgeedJkmuACvxYALEo5fyw==),
which now meets monthly via web conferencing. 

And our
[website homepage](http://r20.rs6.net/tn.jsp?f=001viS32bI1pE9SN0zn_EfECxpprCB8r9kxebMM0JFIKAUDg8glMsTBEYH21mRwFvV3hvNztjfxUZHDlEykS_h7XyN_bRo0cmrHSkKHA5mG19yNdezbc8SiptkE9GY9pRJVS3XGc-751XG7Esc0I6DolQ==&c=0lRYX78uFhH7s5PQsTm0PpR0tB6VzJhscEF8ND6vckryEMek8cuJJA==&ch=Yl2hVU3YAAMSkn0_DKrMCNfLIQYbSTBsMgeedJkmuACvxYALEo5fyw==) has
become THE go-to spot for all news and analysis from and about the
larger Community Rights movement, thanks to our intrepid part-time staff
person Curt Hubatch, up in NW Wisconsin.  I recently returned home from
my 29th teaching visit to the Driftless region of our country, where
Iowa, Wisconsin, Illinois and Minnesota meet along the Big Muddy - the
breathtaking Mississippi River. 

Highlights of my trip included:

An invitation to present and answer questions for two full hours at a
specially called session of the Viroqua, Wisconsin City Council. The
session was also attended by more than 40 local residents excited about
bringing a Community Rights approach to solving local issues. The local
Crawford County Independent newspaper ran a detailed story about the
presentation [HERE](http://r20.rs6.net/tn.jsp?f=001viS32bI1pE9SN0zn_EfECxpprCB8r9kxebMM0JFIKAUDg8glMsTBEfff9QduyM6SO6Y7rgz5_4TJDOuG7JCEFyvqwlfFmiipdQB88qQ5T2ya_VQSjA5OMrWHR1vGAkAAzVw_tW_Zq8Kd-aGo4esXin8Ha_jTbjNfSF23qxbwrGYQq-0MTkdqbmpCQhDF2zds1yiHwsjwpchQJVo1OHS0A8wBqP-kf5_LwYGQZCD7L4U=&c=0lRYX78uFhH7s5PQsTm0PpR0tB6VzJhscEF8ND6vckryEMek8cuJJA==&ch=Yl2hVU3YAAMSkn0_DKrMCNfLIQYbSTBsMgeedJkmuACvxYALEo5fyw==).
And you can view my
presentation [HERE](http://r20.rs6.net/tn.jsp?f=001viS32bI1pE9SN0zn_EfECxpprCB8r9kxebMM0JFIKAUDg8glMsTBEfff9QduyM6SpJ53hAkMgP94GCwSxY0qLEwOBQDWQYNA6Mb7BpK-lhhdISspFbS0oIGz7S80GRCCjF7rKFbigot9ppWIxhx168-3ao6ak5sVVhMD57x4unJK7xYftHDz1I62MVP7uLuuhDWVVO_2lCcL_5S8nr3Nvw==&c=0lRYX78uFhH7s5PQsTm0PpR0tB6VzJhscEF8ND6vckryEMek8cuJJA==&ch=Yl2hVU3YAAMSkn0_DKrMCNfLIQYbSTBsMgeedJkmuACvxYALEo5fyw==).

Introductory Community Rights workshops in a number of Wisconsin and
Iowa towns, including in Baraboo, Wisconsin, where my co-host is a
long-time elected member of the Sauk County Board of Supervisors; and in
Lansing, Iowa, where my co-host is running for a seat on the Allamakee
County Board of Supervisors. 

A day-long planning session in preparation to launch our new national
ThinkTank on Rethinking the Regulatory State. It will publish policy
papers, and constructively nudge journalists and single-issue activists
to deepen their understanding of the true history and purpose
of [regulatory
agencies](http://r20.rs6.net/tn.jsp?f=001viS32bI1pE9SN0zn_EfECxpprCB8r9kxebMM0JFIKAUDg8glMsTBEfff9QduyM6S_QZz1_1z4HVlqvMv39yrU3zo-xBERvTOmFBoPETqWpJBask5UYyErEHEWWnW3cntoXo-K9WfNZVcSAgrxYwe9SOeFI00lHhz1nTBa60FsTXx7_PjsshPzA==&c=0lRYX78uFhH7s5PQsTm0PpR0tB6VzJhscEF8ND6vckryEMek8cuJJA==&ch=Yl2hVU3YAAMSkn0_DKrMCNfLIQYbSTBsMgeedJkmuACvxYALEo5fyw==).
Contrary to popular belief, regulatory law is not and never has been
about protecting human health, labor rights, and the environment. Our
president's dismantling of the regulatory state has created a golden
opportunity for us to re-imagine what We want our government agencies to
do to protect us.

An [OpEd piece](http://r20.rs6.net/tn.jsp?f=001viS32bI1pE9SN0zn_EfECxpprCB8r9kxebMM0JFIKAUDg8glMsTBEfff9QduyM6SU0-x1I7RbdvkfphJeYknGEIRViVcV7XhO-4wbTXRHLHVg8y2FngWndV5YeOeby3dir2_2CjNplUY9Uz7JI8t7pKa8Gtzq7j9naS4-J4fCctIuU-tE8v2UJLXvQCqeuvUMO7ZWKaZwXbSIA0wvhPBZSmGcg7wOwu3M6o0xG5jv6WpZwywB8mNuUPZfVXgD6eWntz9KvNDaB-Rzsdp2LUx88byYKbcvCcBB1b34pu_IHo-UPy_d0nSqQ==&c=0lRYX78uFhH7s5PQsTm0PpR0tB6VzJhscEF8ND6vckryEMek8cuJJA==&ch=Yl2hVU3YAAMSkn0_DKrMCNfLIQYbSTBsMgeedJkmuACvxYALEo5fyw==)
I wrote that was published in the Decorah, Iowa local newspaper,
reminding local residents that they have the power and authority to take
on Alliant Corporation in its drive to stop the community from creating
its own public electric utility.

One of the most significant political changes that have happened due to
my dozens of visits to this region since 2013 is an increasingly
widespread realization among local residents that their local elected
officials are the wrong people, and that those with a lot more political
backbone and vision should be the ones making the key decisions that
affect the people and their environment. 

So these newly awakened folks are running for local office and most of
them are winning! 

There are so many urgent issues not being properly addressed in this
region, such as rapidly increasing factory farms, new and expanded
fossil fuel pipelines, and frac sand mines serving the fracking
industry. So we're beginning to change the predominant narrative in town
and county halls across this region.

And I feel confident that in this next year we're going to witness an
uprising of rural residents who have had enough of their state
governments pushing them around. 

Many state elected officials are going to lose their seats to
pro-democracy anti-corporatist candidates who are ready to stand with
working people and defend our human and other natural communities
against corporate atrocities. 

This is an incredibly exciting moment for social change movements all
across this beautiful country of ours. I hope you will [contact
us](mailto:info-Kj657vGfITAHso8PZWHMfRQ3xjPIlXSw@public.gmane.org) soon at Community Rights US and get
involved!  All my best,

Paul Cienfuegos Founding Director, [Community Rights
US](http://r20.rs6.net/tn.jsp?f=001viS32bI1pE9SN0zn_EfECxpprCB8r9kxebMM0JFIKAUDg8glMsTBEYH21mRwFvV3hvNztjfxUZHDlEykS_h7XyN_bRo0cmrHSkKHA5mG19yNdezbc8SiptkE9GY9pRJVS3XGc-751XG7Esc0I6DolQ==&c=0lRYX78uFhH7s5PQsTm0PpR0tB6VzJhscEF8ND6vckryEMek8cuJJA==&ch=Yl2hVU3YAAMSkn0_DKrMCNfLIQYbSTBsMgeedJkmuACvxYALEo5fyw==)

P.S. If you're considering bringing me to lead a workshop in your
community, I'd love to hear from you, as my Fall calendar is already
starting to fill up! Contact me at
[info
@CommunityRights.US](mailto:info-Kj657vGfITAHso8PZWHMfRQ3xjPIlXSw@public.gmane.org). In other
Community Rights US news In Menomonie, WI: Local Community Rights leader
Joan Pougiales is beginning to conceptualize and organize a network of
mutual aid and solidarity between this region's local governments and
local residents, preparing for the day that these local governments
start passing Community Rights ordinances and then face possible
expensive lawsuits. 

The idea is that communities could offer each other financial support in
such situations, or even run identical Community Rights ordinances
simultaneously in multiple communities.



In Athens, Georgia Community Rights activist Carla Cao, until recently a
grad student at the University of Georgia in Athens, has been working
actively with me this past year towards developing a new Rights of
Nature consciousness in that community. 

She has brought artists, musicians, biologists, and other
environmentally concerned residents together to conceptualize a
Community Rights ordinance campaign to begin to protect the heavily
contaminated North Oconee River, which is the drinking water source for
the city. 

In April, they organized an extraordinary and very successful "[Arts on
the River
Celebration](http://r20.rs6.net/tn.jsp?f=001viS32bI1pE9SN0zn_EfECxpprCB8r9kxebMM0JFIKAUDg8glMsTBEfff9QduyM6SzXzQb1lHtFsmHzwvNcEUAukkeAI7ia8MaDup_83JKijJ2CuOFjYSkPgWeAR3QQNPrkCneMCStyRhR1uziP4YsFAWTDpx8oXjf2lQ6PUANB5cRJr9MKjosDYP44j1zzpP4ujJNwImT42RIenA5eRJRUPryh9LjpC1b2JM4k2JpgG4YO90QCc2QA==&c=0lRYX78uFhH7s5PQsTm0PpR0tB6VzJhscEF8ND6vckryEMek8cuJJA==&ch=Yl2hVU3YAAMSkn0_DKrMCNfLIQYbSTBsMgeedJkmuACvxYALEo5fyw==)"
to begin to realize this bold community-wide vision.

In the Midwest & Colorado: We're excited but not quite yet able to
report to all of you the details of a keynote speech invitation for a
Farmers Union conference in the Midwest, as well as a conference
workshop in Denver later this year. Details in our next newsletter.
Essential CR News from the Web
[](http://r20.rs6.net/tn.jsp?f=001viS32bI1pE9SN0zn_EfECxpprCB8r9kxebMM0JFIKAUDg8glMsTBEYH21mRwFvV3hvNztjfxUZHDlEykS_h7XyN_bRo0cmrHSkKHA5mG19yNdezbc8SiptkE9GY9pRJVS3XGc-751XG7Esc0I6DolQ==&c=0lRYX78uFhH7s5PQsTm0PpR0tB6VzJhscEF8ND6vckryEMek8cuJJA==&ch=Yl2hVU3YAAMSkn0_DKrMCNfLIQYbSTBsMgeedJkmuACvxYALEo5fyw==)
[Ending the Environmental Exploitation of Boulder County Through Direct
Democracy](http://r20.rs6.net/tn.jsp?f=001viS32bI1pE9SN0zn_EfECxpprCB8r9kxebMM0JFIKAUDg8glMsTBEfff9QduyM6SUEK5SMvpyW5_HyJSm6Rny72ZpZvirD_u2LurG7kVVXayOiDr2RPKi4UQNb4OTUHuZx5sqhGYyVt3Fa8mO_BapdsTJ4Jxqq-QXRS9Vt5DyIEsEfF9St5n1WoppsNoOdqEbU2iqcbhMAqgrIulz0Je0yB-Vh34ouQez0yuk2zqIGeqrxUWAaEpodoQtdqp9bgnFx5fsSLfKkbzYVUq0hgI-Q==&c=0lRYX78uFhH7s5PQsTm0PpR0tB6VzJhscEF8ND6vckryEMek8cuJJA==&ch=Yl2hVU3YAAMSkn0_DKrMCNfLIQYbSTBsMgeedJkmuACvxYALEo5fyw==)

[Fake Grassroots Campaigns Deserve
Uprooting](http://r20.rs6.net/tn.jsp?f=001viS32bI1pE9SN0zn_EfECxpprCB8r9kxebMM0JFIKAUDg8glMsTBEfff9QduyM6SvygXyaUeEV3pnVhiziUSHKLrW2BaecUQhgE53BdqtPtXsgM2-F7ytzRkJGVDWK6c-GpuOYUskEE4bpsP4oxAb69Pp4I1S5ieLAaB1p4vHxYv8UaiY08AS4HB16iLOhdD6FpHGZzQgGgU__IjUa2-XRAAbucsLfK62Prl1JPPFfAA4oKo7K_ETu9r5pnHkEjk&c=0lRYX78uFhH7s5PQsTm0PpR0tB6VzJhscEF8ND6vckryEMek8cuJJA==&ch=Yl2hVU3YAAMSkn0_DKrMCNfLIQYbSTBsMgeedJkmuACvxYALEo5fyw==)

[The Florida Legislature Keeps Stomping on Local
Laws](http://r20.rs6.net/tn.jsp?f=001viS32bI1pE9SN0zn_EfECxpprCB8r9kxebMM0JFIKAUDg8glMsTBEfff9QduyM6SoRHdFEuz3Rbdd4UNowPTVOJIhHIPkCkrf5CiOq0sZNntZT1d8YDkYBRLRPO9TQIAEwwqhliTRLLaZIQJlVlDiIZDwHq5COcxfWyf4WCNVBhnBVI_p0OndqhRGzpS2Yq_-qKTSWzPtEDV0AJipPPLMfRuXCCDhItPXZSMHL4yAItx0Le-KCmzvA==&c=0lRYX78uFhH7s5PQsTm0PpR0tB6VzJhscEF8ND6vckryEMek8cuJJA==&ch=Yl2hVU3YAAMSkn0_DKrMCNfLIQYbSTBsMgeedJkmuACvxYALEo5fyw==)

[Editorial: We're Smoking Mad That Harrisburg Won't Let Philly
Govern](http://r20.rs6.net/tn.jsp?f=001viS32bI1pE9SN0zn_EfECxpprCB8r9kxebMM0JFIKAUDg8glMsTBEfff9QduyM6SDYAyndr9aJjRchMcmx8UJqgdJe8Q4hyCUY7btN9UKLfzKltu87UTQf_6OSf3aBC_Lz4XqfR1wzujibH8AiigcpcK5TAvm6rv4JOmIosVrmYwY1kKmrYc1T55VXS6a4vzAtyUfhGfn0t_2ladu7elxPeFJWngrYQXU_gZ36ZKdn-QTLOpy7qDxyWXa3P7EC9Q1aSNybdJS6Q=&c=0lRYX78uFhH7s5PQsTm0PpR0tB6VzJhscEF8ND6vckryEMek8cuJJA==&ch=Yl2hVU3YAAMSkn0_DKrMCNfLIQYbSTBsMgeedJkmuACvxYALEo5fyw==)

[Wireless Industry Using First Amendment as a Cudgel in Its Battle
Against Safety
Warnings](http://r20.rs6.net/tn.jsp?f=001viS32bI1pE9SN0zn_EfECxpprCB8r9kxebMM0JFIKAUDg8glMsTBEfff9QduyM6SQ4bOfR7qb0a26vVAnaSbzQkgf0OD47uUOHmYEJQxwf0lg3NYXGtiwvl8aq_PoRUx_m6qonkH1FiLwoiuaHcusjzHyUvQsn1838oPO-xqLLbnymGm76kQLOu9yCgc9LJAEJ4rNNqlZ7zHctJtDzO_6V2lKeKjlcuaGGhXDm1uc_e0IbLp42Tgt_vLj12hxtZ9VVjRB-ZedUOxzA6pB1jFtpkHyZFSawimMpkEryw4xLY=&c=0lRYX78uFhH7s5PQsTm0PpR0tB6VzJhscEF8ND6vckryEMek8cuJJA==&ch=Yl2hVU3YAAMSkn0_DKrMCNfLIQYbSTBsMgeedJkmuACvxYALEo5fyw==)

[Press Release: Pennsylvania Township Bans Corporate Industrial
Farming](http://r20.rs6.net/tn.jsp?f=001viS32bI1pE9SN0zn_EfECxpprCB8r9kxebMM0JFIKAUDg8glMsTBEfff9QduyM6SxRrG4NHSQlNQsIlfjB_GnIXfDrdZbpbIeCxyNtsad0iime6RfCCkcM6R4tQA9LQzsbgr0WzrHlg2ljCjxgs9h2JWGzskKo4m12OHOTNIAPCJ7I7Vu4RzGkZnSnQsgliQ1sw9P0pEJfGgIyra-Je2EwI8H9LXwmhibW5EpIp8fTKGHwicjugd3Szwkdcm_1Zhs3HuIn_K6B8=&c=0lRYX78uFhH7s5PQsTm0PpR0tB6VzJhscEF8ND6vckryEMek8cuJJA==&ch=Yl2hVU3YAAMSkn0_DKrMCNfLIQYbSTBsMgeedJkmuACvxYALEo5fyw==)

[The Laconia Daily Sun: Our Creator Never Granted Unalienable Rights to
Corporations](http://r20.rs6.net/tn.jsp?f=001viS32bI1pE9SN0zn_EfECxpprCB8r9kxebMM0JFIKAUDg8glMsTBEfff9QduyM6SJm6XZv2E5s4jzlVdTmvE__XjCXpVLLVV0Z1qIv3-0_jBg56dmJU6vhCTL0WyfyiU-X4QuE9zVxZWC6qYCYEzkTOEsSk91_qDTBYC4TdExN_R6NhVyAH01fpYGmiwrh2h4ztu-3T33GQ54E1VUkBeld5yahoMjX1pNHDWh1flEg8H13Rz3x7gTl0fBBhG9Mv2_DdFkXg46ybMiRSYCNZlyqKSE1p_PBda&c=0lRYX78uFhH7s5PQsTm0PpR0tB6VzJhscEF8ND6vckryEMek8cuJJA==&ch=Yl2hVU3YAAMSkn0_DKrMCNfLIQYbSTBsMgeedJkmuACvxYALEo5fyw==)

Want the latest News & Analysis from and about the Community Rights
Movement? [CLICK
HERE](http://r20.rs6.net/tn.jsp?f=001viS32bI1pE9SN0zn_EfECxpprCB8r9kxebMM0JFIKAUDg8glMsTBEYH21mRwFvV3TRz9HmctThGycE2ZdHaxP2ImmN2tc_okzAMs_8ssqlGgnMGEB5VzCzQI03KpsmVZJokCchh5RBNo51-1cYv5tg==&c=0lRYX78uFhH7s5PQsTm0PpR0tB6VzJhscEF8ND6vckryEMek8cuJJA==&ch=Yl2hVU3YAAMSkn0_DKrMCNfLIQYbSTBsMgeedJkmuACvxYALEo5fyw==)

Want to access all archived articles? [CLICK
HERE](http://r20.rs6.net/tn.jsp?f=001viS32bI1pE9SN0zn_EfECxpprCB8r9kxebMM0JFIKAUDg8glMsTBES2kpyMahtqolsb4X8u4gYZbNDkgJ3RWwjv41rEY9ga3Cu-91CvQeH8_qZnNb-Ti48Ohy5PVJye7tAelwIbtNmVqNSaiO16yBqDabEArQ0MOUGzP3nkgeV8nziU2DF4ESQ==&c=0lRYX78uFhH7s5PQsTm0PpR0tB6VzJhscEF8ND6vckryEMek8cuJJA==&ch=Yl2hVU3YAAMSkn0_DKrMCNfLIQYbSTBsMgeedJkmuACvxYALEo5fyw==)
Check out the latest in Community Rights!
[‌](http://r20.rs6.net/tn.jsp?f=001viS32bI1pE9SN0zn_EfECxpprCB8r9kxebMM0JFIKAUDg8glMsTBERDgvWZsStPoj4Z8tnrfjvq9xvy7mHJ0TzZ_49fa5jlMBhjqwyFrNrYwnaAi9ngG7nhKLxh1CBeNRh4tDI4EMakPgtEgiwNch3nd6htSnVLxY_dMjbvJvIo=&c=0lRYX78uFhH7s5PQsTm0PpR0tB6VzJhscEF8ND6vckryEMek8cuJJA==&ch=Yl2hVU3YAAMSkn0_DKrMCNfLIQYbSTBsMgeedJkmuACvxYALEo5fyw==)
[‌](http://r20.rs6.net/tn.jsp?f=001viS32bI1pE9SN0zn_EfECxpprCB8r9kxebMM0JFIKAUDg8glMsTBERDgvWZsStPo7OrpEBiffceqzlB5MoK3D4nk5K1nuKAPzxgdUkuqFvx2MaHZDkTF4JPEpnRRCn_KvWm7-xnrokymjXb00rJ4qQ==&c=0lRYX78uFhH7s5PQsTm0PpR0tB6VzJhscEF8ND6vckryEMek8cuJJA==&ch=Yl2hVU3YAAMSkn0_DKrMCNfLIQYbSTBsMgeedJkmuACvxYALEo5fyw==)
[‌](http://r20.rs6.net/tn.jsp?f=001viS32bI1pE9SN0zn_EfECxpprCB8r9kxebMM0JFIKAUDg8glMsTBERDgvWZsStPoiynSYmz2kEJeuNVDAB_PTAOko2edcYY3JxuuRweopTwfTg_WsL1jgJtVsmhQ7DVw80UWVyiWfOsOyZa0NnbjEP2Rxf4NXxnitVuGEsBq86M=&c=0lRYX78uFhH7s5PQsTm0PpR0tB6VzJhscEF8ND6vckryEMek8cuJJA==&ch=Yl2hVU3YAAMSkn0_DKrMCNfLIQYbSTBsMgeedJkmuACvxYALEo5fyw==)
[Yes! I want to DONATE
now](http://r20.rs6.net/tn.jsp?f=001viS32bI1pE9SN0zn_EfECxpprCB8r9kxebMM0JFIKAUDg8glMsTBES2kpyMahtqo0l_KkeuVNB0aYX1GTDAbCz-kakeAKmaWxKidpc4R1_nRCkbbmvGFXZb9ve35pn704cpsNOA1kv7PeM9vFLt6twe3N_zyS8hIvHS86gViv9_jWdk3IeNpdu83jaKCtQjv&c=0lRYX78uFhH7s5PQsTm0PpR0tB6VzJhscEF8ND6vckryEMek8cuJJA==&ch=Yl2hVU3YAAMSkn0_DKrMCNfLIQYbSTBsMgeedJkmuACvxYALEo5fyw==)
Did a friend forward you this email? [Join our mailing
list](http://r20.rs6.net/tn.jsp?f=001viS32bI1pE9SN0zn_EfECxpprCB8r9kxebMM0JFIKAUDg8glMsTBEeSGfQXfiZtn6ktIkxJyXflFaG4ydPRu4QQEROpA2bSnxMsxkP_ULCxiIgJXRaUXgyARBgheyei6CEfoRkO6cnt-_9iRjVXCougNim-4U3sJK2p21rfKbGl6e5gqG62ol6A1U5MlJGY7nRj1JhYA0MT15m6k_akr9o04vScbHIiy8NQ1yMWkvgvmcRkBjgYrFyGFn4D7hMen8PAWj3ZsnNKnpStWhO8ZsT7pzP1_8k6gdix2SVzjd3z_U37PjA_nKpahOe_LxafGjeQL5rKeOjql56S6kbLrZw==&c=0lRYX78uFhH7s5PQsTm0PpR0tB6VzJhscEF8ND6vckryEMek8cuJJA==&ch=Yl2hVU3YAAMSkn0_DKrMCNfLIQYbSTBsMgeedJkmuACvxYALEo5fyw==),
and join the Community Rights US momentum.

Please also forward this to family, friends, and colleagues who may be
interested. Thanks! Community Rights US 
| PO Box 86605, Portland, OR
97286 [Unsubscribe
cienfuegos
@Nounce.com](https://visitor.constantcontact.com/do?p=un&m=001j7y8Yp2rsZjj1lBi5PMqGg%3D&ch=15a7a8a0-aca5-11e7-bf5a-d4ae5292c47d&ca=5a9e5fd3-18f9-47f5-9245-d4a323b2e57a)
[Update
Profile](https://visitor.constantcontact.com/do?p=oo&m=001j7y8Yp2rsZjj1lBi5PMqGg%3D&ch=15a7a8a0-aca5-11e7-bf5a-d4ae5292c47d&ca=5a9e5fd3-18f9-47f5-9245-d4a323b2e57a)

| [About our service
provider](http://www.constantcontact.com/legal/service-provider?cc=about-service-provider)
Sent by <contact-/6JGXy0y6WMHso8PZWHMfZyrNiDsoR7y@public.gmane.org>




* Re: converting ("bad"?) html to markdown; want: simple results as in 'w3m -dump'
       [not found]             ` <b79653fb-8f2d-07a9-03da-09184a4742ee-T1oY19WcHSwdnm+yROfE0A@public.gmane.org>
@ 2018-07-27 20:17               ` pcolabgooglegroups-hYYWqA7vhVBBDgjK7y7TUQ
       [not found]                 ` <201807272017.w6RKHDkU023593-Iacv5gYTstuYo1hQQC0LMg@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: pcolabgooglegroups-hYYWqA7vhVBBDgjK7y7TUQ @ 2018-07-27 20:17 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Hi Joseph:

On Thu 7/26/18 16:43 -0400 Joseph Reagle wrote:
>On 7/26/18 11:43 AM, pcolabgooglegroups-hYYWqA7vhVBBDgjK7y7TUQ@public.gmane.org wrote:
>> If the source is "garbage" or poorly written, I would like to give
>> the author constructive criticism, but I'm not knowledgable enough -
>> what should I say to them?
>
>It looks like it's being written in some WYSIWYG editor that
>cares nothing for semantic structure, so there's not much to
>recommend. It's all just divs and spans with tons of formatting
>attributes.

Thanks, sad, but that makes sense. 
Possibly some WYSIWYG tool from constantcontact.com, who knows?

--snip
>This is the best I can get it but it ain't easy. Using wget, python
>(with bleach library), pandoc, and perl!
>
>wget -O - https://ptpb.pw/i_Fw | python3 -c "import sys, bleach; print(bleach.clean(''.join(sys.stdin), tags=['a', 'br', 'html', 'span', 'style'], attributes=['href'], styles=[], strip=True))" | pandoc -f html -t markdown | perl -pe 's/\\/\n/g'
--snip

Nice!  I have python3, but 'pip install bleach' installs for python2... (I'm more of a Perl guy than a Python one ... never mind).

Again, thank you for digging into this!  I may come back and get your
pipeline to work for myself; imperfect as it is, it is interesting.

--
regards,
Tom



* Re: converting ("bad"?) html to markdown; want: simple results as in 'w3m -dump'
       [not found]                 ` <201807272017.w6RKHDkU023593-Iacv5gYTstuYo1hQQC0LMg@public.gmane.org>
@ 2018-07-30 13:37                   ` BP Jonsson
  0 siblings, 0 replies; 6+ messages in thread
From: BP Jonsson @ 2018-07-30 13:37 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw,
	pcolabgooglegroups-hYYWqA7vhVBBDgjK7y7TUQ

Den 2018-07-27 kl. 22:17, skrev pcolabgooglegroups-hYYWqA7vhVBBDgjK7y7TUQ@public.gmane.org:
> Nice!  I have python3, but 'pip install bleach' installs for python2... (I'm more of a Perl guy than a Python one ... never mind).

I believe it's `pip3 install bleach`, but I could be wrong as I'm 
just another Perl guy... :-)
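
One way to sidestep the pip/pip3 question is `python3 -m pip install bleach`, which runs pip under the interpreter that is named on the command line. As a quick check afterwards, here is a minimal sketch that reports which interpreter is running and whether it can import bleach; the helper name `module_available` is made up for this example.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if a top-level module can be imported by this interpreter."""
    return importlib.util.find_spec(name) is not None


print("interpreter:", sys.executable)
print("bleach importable:", module_available("bleach"))
```

Running this with `python3` and again with `python` should show which of the two installations actually received the package.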

/bpj



