public inbox archive for pandoc-discuss@googlegroups.com
* To Convert Raw Tables to Pandoc tables
@ 2015-07-01 20:46 sami.losoi-Re5JQEeQqe8AvxtiuMwx3w
       [not found] ` <6aa384be-0f9d-40fe-ad7f-32af347f8cb4-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: sami.losoi-Re5JQEeQqe8AvxtiuMwx3w @ 2015-07-01 20:46 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw



Dear All, 

I am trying to find a standard tool to convert raw tables to Pandoc tables. 
There may exist some combination of pandoc flags that does this. 
I opened a thread about this here: 
http://tex.stackexchange.com/q/253224/13173

How can you combine pandoc flags to get Pandoc tables from raw tables?

Best regards,

Sami

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/6aa384be-0f9d-40fe-ad7f-32af347f8cb4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


* Re: To Convert Raw Tables to Pandoc tables
       [not found] ` <6aa384be-0f9d-40fe-ad7f-32af347f8cb4-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2015-07-01 21:11   ` Daniel Staal
  2015-07-01 21:39     ` sami.losoi-Re5JQEeQqe8AvxtiuMwx3w
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Staal @ 2015-07-01 21:11 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

--As of July 1, 2015 1:46:39 PM -0700, sami.losoi-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org is alleged to 
have said:

> I am trying to find a standard tool to convert raw tables to Pandoc
> tables.
> There may exists some combination of flags in pandoc how to do this.
> I opened a thread about this here
> http://tex.stackexchange.com/q/253224/13173
>
>
> How can you combine pandoc flags to get Pandoc tables from raw tables?

--As for the rest, it is mine.

You'll need to define 'raw tables' somewhere, as it's not a format I've 
seen around anywhere.  (Unless that sample is tab-delimited.)  As it is, it 
would be near-impossible for a computer to distinguish that from a normal 
paragraph.

That said, it would not be hard to write a script to mirror the first line 
onto the second, replacing all non-space characters with dashes, which 
would allow it to be read as Pandoc's 'simple table' type.  (And you could 
put a fixed-width dashed line at the start and the end to convert to 
Pandoc's normal 'multiline tables'.)
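
The mirroring step described above could be sketched in Python roughly as 
follows.  This is only an illustration, not a standard tool: the name 
`to_simple_table` is made up here, and the sketch assumes the raw columns 
are already padded with spaces so that they line up, and that every header 
cell is a single word (each run of non-space characters in the header 
becomes its own column).

```python
import re


def to_simple_table(text):
    """Insert a dashed rule under the first line of whitespace-aligned text.

    Hypothetical helper: assumes the columns are already aligned with
    spaces and that every header cell is a single word.
    """
    lines = text.rstrip("\n").split("\n")
    header = lines[0]
    # Mirror the header: every non-space character becomes a dash and the
    # spaces stay where they are, so the rule marks where columns start.
    rule = re.sub(r"\S", "-", header)
    return "\n".join([header, rule] + lines[1:]) + "\n"


raw = "Size     File  EventSize\nL805067  009   L805+4\nL805067  001   L805+4\n"
print(to_simple_table(raw), end="")
```

In the document, the resulting table still has to be followed by a blank 
line for it to be read as a simple table.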

The question I really want to ask though is where are you getting this data 
from?  If it's coming from some standard tool there may be a better way to 
handle this than to blindly separate on whitespace.  (In fact, there may 
well be tools in existence.)  Your question actually trips most of my flags 
for an 'XY Problem' - I'd be interested in what you are actually trying to 
do, to see if there's a better overall solution.

Daniel T. Staal

---------------------------------------------------------------
This email copyright the author.  Unless otherwise noted, you
are expressly allowed to retransmit, quote, or otherwise use
the contents for non-commercial purposes.  This copyright will
expire 5 years after the author's death, or in 30 years,
whichever is longer, unless such a period is in excess of
local copyright law.
---------------------------------------------------------------



* Re: To Convert Raw Tables to Pandoc tables
  2015-07-01 21:11   ` Daniel Staal
@ 2015-07-01 21:39     ` sami.losoi-Re5JQEeQqe8AvxtiuMwx3w
       [not found]       ` <3040a90a-0e7f-498b-a513-80970ea8d1d9-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: sami.losoi-Re5JQEeQqe8AvxtiuMwx3w @ 2015-07-01 21:39 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw



 

Dear Staal, 


This data is coming from AWK. 

My processing actually starts from Pandoc data: Pandoc data > raw data > 
computation > back to Pandoc data.
The field separator is a space.

Best regards,


Sami



* Re: To Convert Raw Tables to Pandoc tables
       [not found]       ` <3040a90a-0e7f-498b-a513-80970ea8d1d9-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2015-07-01 22:03         ` Daniel Staal
  2015-07-02  6:49           ` Sami Losoi
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Staal @ 2015-07-01 22:03 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

--As of July 1, 2015 2:39:56 PM -0700, sami.losoi-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org is alleged to 
have said:

> This data is coming from AWK.
>
> I am actually starting processing initially from Pandoc data > raw data >
> computation > back to Pandoc data. Field separator is space.

--As for the rest, it is mine.

Ok, so your problem is that you are throwing away the table structure to do 
some computation, and now you want it back into a table.  (Well, assuming 
your real input and output is Pandoc data...)  I'm not an awk/sed expert, 
but it should be possible to get them to insert a line of text after the 
first line.  (I could probably do it with a bit of Perl, but I think the 
*correct* solution is to go from Pandoc data to Pandoc data; have the 
computation done in the same script that reads the initial data, so that 
you aren't trying to do multiple data format conversions.)

It'd probably be easiest to change the separator as you go, and write out 
pipe tables.  It reduces the problem to changing the separator to pipes and 
adding a line (with the same separators) of dashes after the first line.  I 
know awk or column can change the separators for you fairly easily.
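
As a rough sketch of that reduction (in Python rather than awk, purely for 
illustration): split each row on whitespace, join the cells with pipes, and 
add a line of dashes with the same separators after the first row.  The 
function name `to_pipe_table` is hypothetical, and the sketch assumes no 
cell contains whitespace.

```python
def to_pipe_table(text):
    """Turn whitespace-separated rows into a pipe table (first row = header).

    Hypothetical helper: assumes no cell contains whitespace.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows:
        return ""
    header = rows[0]
    # Separator line: one run of dashes per column, with the same pipes
    # as the data rows.
    separator = "|" + "|".join("---" for _ in header) + "|"
    lines = ["| " + " | ".join(header) + " |", separator]
    lines += ["| " + " | ".join(row) + " |" for row in rows[1:]]
    return "\n".join(lines) + "\n"


print(to_pipe_table("Size File EventSize\nL805067 009 L805+4\n"), end="")
```

Writing the outer pipes on every line keeps the sketch valid even for a 
single-column table, where there would otherwise be no pipe at all.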

As a side note, hearing that the field separator is a space worries me - 
it's *way* too easy to get an extra space in some future input, which would 
then cascade a bug through your whole setup.  Pandoc's table formats all 
handle this in different ways; pipe tables would be the easiest to parse.
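
To make the worry concrete, here is a small Python sketch (the name 
`check_field_counts` and the sample values are made up for illustration) 
that flags rows whose field count differs from the header's when splitting 
on whitespace.  A stray space inside a value is enough to trigger it.

```python
def check_field_counts(text):
    """Return (row_number, field_count) for rows that disagree with the header.

    Rows are numbered from 1, counting only non-blank lines.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    expected = len(rows[0]) if rows else 0
    return [
        (number, len(row))
        for number, row in enumerate(rows, start=1)
        if len(row) != expected
    ]


# The third row has a stray space inside its first value.
print(check_field_counts("Size File\nL805067 009\nL805 067 001\n"))
```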

Daniel T. Staal


* Re: To Convert Raw Tables to Pandoc tables
  2015-07-01 22:03         ` Daniel Staal
@ 2015-07-02  6:49           ` Sami Losoi
       [not found]             ` <28067F1C-7187-4142-A22C-A056B2B4BD8D-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Sami Losoi @ 2015-07-02  6:49 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Dear Staal, 

I changed the separator to be a pipe, instead of a space. 
Previously, I ran `pandoc sandbox.data -f markdown -t html` with Markdown 
tables, as described in Kurt's excellent answer:
http://tex.stackexchange.com/a/247003/13173

Now, the table format looks like this

------------------------
Size     | File  |  EventSize 
------------------------
L805067  |  009  |  L805+4 

L805067  |  001  |  L805+4
------------------------


which looks like Markdown with a pipe as the field separator. 
I did not find any field separator flag in pandoc. 

How can I get this into good HTML with the fields detected by Pandoc? 

Best regards,

Sami







* Re: To Convert Raw Tables to Pandoc tables
       [not found]             ` <28067F1C-7187-4142-A22C-A056B2B4BD8D-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2015-07-02 17:11               ` Daniel Staal
  0 siblings, 0 replies; 6+ messages in thread
From: Daniel Staal @ 2015-07-02 17:11 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

--As of July 2, 2015 9:49:51 AM +0300, Sami Losoi is alleged to have said:

> Now, the table format looks like this
>
> ------------------------
> Size     | File  |  EventSize
> ------------------------
> L805067  |  009  |  L805+4
>
> L805067  |  001  |  L805+4
> ------------------------
>
>
> which looks like markdown with the field separator, pipe.
> I did not find any field separator flag in pandoc.
>
> How can you get this into good html with fields detected by Pandoc?

--As for the rest, it is mine.

Ok, that's *close* to Pandoc's pipe tables, but not quite.  Here's a pipe 
table version:

Size     | File  |  EventSize
---------|-------|-----------
L805067  |  009  |  L805+4
L805067  |  001  |  L805+4

The lines above and below the table don't hurt - but they don't get 
translated into the table either.  They become `<hr />`, or a horizontal 
rule.  The important part is that the second line has as many pipes as the 
other lines, separated by dashes.  Note that how many dashes separate them 
is actually irrelevant - one is all you need.  But the pipes have to be 
there.  Pipes at the beginning or end are optional.  Therefore, this also 
works:

Size     | File  |  EventSize
|-|-|-
L805067  |  009  |  L805+4
L805067  |  001  |  L805+4
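
If you wanted to automate that fix-up, a sketch along these lines might do 
it (Python, with a made-up function name `fix_pipe_table`).  It assumes the 
input looks like your sample: at least two columns, no pipes at the start 
or end of the lines, and full-width dashed rules that can simply be 
dropped.

```python
import re


def fix_pipe_table(text):
    """Drop all-dash rules and blank lines, then add a pipe separator line.

    Hypothetical helper: assumes the first remaining line is the header
    and that lines carry no leading or trailing pipes.
    """
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and rules made only of dashes (no pipes).
        if not stripped or re.fullmatch(r"-+", stripped):
            continue
        rows.append(stripped)
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    ncols = header.count("|") + 1
    separator = "|".join(["---"] * ncols)
    return "\n".join([header, separator] + body) + "\n"
```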


Your sample also isn't far from Pandoc's grid tables mode.  A version of 
that:

+---------+-------+-------------+
|Size     | File  |  EventSize  |
+=========+=======+=============+
|L805067  |  009  |  L805+4     |
+---------+-------+-------------+
|L805067  |  001  |  L805+4     |
+---------+-------+-------------+

Note that this format is quite a bit pickier: everything has to 
actually line up for Pandoc to read it correctly.  However, it does allow a 
bit more flexibility for what's inside the cells.  (Not that I think you 
need it for this case.)
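
Because everything has to line up, it may be easier to generate a grid 
table than to type one.  A minimal Python sketch, assuming every row has 
the same number of cells and every cell fits on one line (the name 
`to_grid_table` is hypothetical):

```python
def to_grid_table(rows):
    """Format a list of rows (lists of strings) as a grid table.

    The first row is treated as the header.  Assumes all rows have the
    same number of single-line cells.
    """
    # Each column is as wide as its widest cell.
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]

    def rule(char):
        # One extra character of padding on each side of every cell.
        return "+" + "+".join(char * (w + 2) for w in widths) + "+"

    def fmt(row):
        cells = (" " + cell.ljust(w) + " " for cell, w in zip(row, widths))
        return "|" + "|".join(cells) + "|"

    out = [rule("-"), fmt(rows[0]), rule("=")]
    for row in rows[1:]:
        out += [fmt(row), rule("-")]
    return "\n".join(out) + "\n"


print(to_grid_table([["Size", "File"], ["L805067", "009"]]), end="")
```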

Your `pandoc sandbox.data -f markdown -t html` should work with either of 
those without problems, although you could enable the table extensions 
explicitly if you wanted.  (I tested: you don't need to.)
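
For completeness, naming the extensions explicitly would look something 
like the sketch below, which builds the command line in Python.  The 
extension names `pipe_tables` and `grid_tables` are Pandoc's own; the 
function names are made up for this example, and `to_html` assumes pandoc 
is installed and on the PATH.

```python
import subprocess


def pandoc_command(path, extensions=("pipe_tables", "grid_tables")):
    """Build a pandoc command line that names the table extensions explicitly."""
    reader = "markdown" + "".join("+" + ext for ext in extensions)
    return ["pandoc", path, "-f", reader, "-t", "html"]


def to_html(path):
    """Run pandoc on `path` and return the HTML (needs pandoc on the PATH)."""
    result = subprocess.run(
        pandoc_command(path), capture_output=True, text=True, check=True
    )
    return result.stdout


print(" ".join(pandoc_command("sandbox.data")))
```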

Basically, for this whole discussion: Pandoc tables need some way to 
determine that they are a table and how many cells you have.  Pure 
whitespace doesn't work, because people sometimes type funny. ;)  Guessing 
the columns from the text rows themselves doesn't work, for the same reason.  
Pandoc allows several solutions, but *all* of them look for a line of dashes 
with some sort of separator marking how many columns there are.  And that 
separator does need to be on that line.
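
For the pipe-table case, that rule can be approximated with a small check 
like the one below.  This is a rough heuristic in Python with made-up 
function names, not Pandoc's actual parser: it only tests that the second 
line is built from dashes and pipes and has as many cells as the header.

```python
def cell_count(line):
    """Count cells in a pipe-table line, ignoring optional outer pipes."""
    return len(line.strip().strip("|").split("|"))


def has_pipe_separator(header, second):
    """Heuristic: could `second` serve as the separator line for `header`?"""
    s = second.strip()
    # Must contain pipes and dashes, and nothing but dashes, pipes,
    # colons (alignment markers) and spaces.
    if "|" not in s or "-" not in s or set(s) - set("-|: "):
        return False
    cells = s.strip("|").split("|")
    # Every column needs at least one dash, and the counts must match.
    return all("-" in cell for cell in cells) and len(cells) == cell_count(header)


header = "Size     | File  |  EventSize"
print(has_pipe_separator(header, "|-|-|-"))
print(has_pipe_separator(header, "------------------------"))
```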

Daniel T. Staal

