public inbox archive for pandoc-discuss@googlegroups.com
* pandoc: Cannot decode byte '\xa1': [....]: Invalid UTF-8 stream
@ 2015-08-24 14:01 kurt.pfeifle-gM/Ye1E23mwN+BqQ9rBEUg
       [not found] ` <97277b03-f86a-4120-a07d-eecbf08a17e3-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: kurt.pfeifle-gM/Ye1E23mwN+BqQ9rBEUg @ 2015-08-24 14:01 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 1409 bytes --]



When I try to create a self-contained HTML document which uses 
*bootstrap.min.css*, I encounter the following error:

        pandoc: Cannot decode byte '\xa1': Data.Text.Internal.Encoding.Fusion.streamUtf8: Invalid UTF-8 stream

My command line includes --to html --self-contained --css 
http://cups.org/css/bootstrap.min.css. 

I’m unable to locate the position of the byte '\xa1' because, quite untrue 
to its name, this CSS is almost 1 MByte in file size!

The problem occurs with any, even the most minimal Markdown input. Changing 
the command to --standalone does get rid of the problem (as is to be 
expected from the symptoms).

   - Is this a problem with the specific CSS? 
   - Is it a problem with all CSS which are derived from bootstrap.css? 
   - Or is this a bug in Pandoc? 


-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/97277b03-f86a-4120-a07d-eecbf08a17e3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

[-- Attachment #1.2: Type: text/html, Size: 5701 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: pandoc: Cannot decode byte '\xa1': [....]: Invalid UTF-8 stream
       [not found] ` <97277b03-f86a-4120-a07d-eecbf08a17e3-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2015-08-24 16:56   ` BPJ
  2015-08-24 17:06   ` John MACFARLANE
  2017-01-05  0:41   ` Grady D
  2 siblings, 0 replies; 17+ messages in thread
From: BPJ @ 2015-08-24 16:56 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 2784 bytes --]

It probably means that you are getting Latin1 or cp1252 encoding instead of
UTF-8 somewhere along the chain. If you have access to a *nix system, what
does the 'file' command say about the file/data you are giving to pandoc?

Let's hope that you are not getting corrupt data with a mixture of UTF-8
and a single-byte encoding!
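
If 'file' is not available, the offset of the first offending byte can also
be found programmatically. What follows is only a minimal Python sketch, not
anything pandoc does itself; the function name first_invalid_utf8_offset and
the sample bytes are made up for illustration.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first byte that breaks UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte the decoder rejected.
        return err.start
    return None


# Invented sample: ASCII text followed by a lone 0xA1 byte (an inverted
# exclamation mark in Latin-1, but not a valid start byte in UTF-8).
sample = b"body{color:red}\xa1"
print(first_invalid_utf8_offset(sample))
```

Applied to the bytes of a downloaded file (open(path, "rb").read()), a result
of None would mean the file itself is valid UTF-8, so the bad byte must be
coming from somewhere else in the chain.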

On Monday, 24 August 2015, <kurt.pfeifle-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote:

> When I try to create a self-contained HTML document which uses
> *bootstrap.min.css*, I encounter the following error:
>
>         pandoc: Cannot decode byte '\xa1': Data.Text.Internal.Encoding.Fusion.streamUtf8: Invalid UTF-8 stream
>
> My command line includes --to html --self-contained --css
> http://cups.org/css/bootstrap.min.css.
>
> I’m unable to locate the position of the byte '\xa1' because, quite
> untrue to its name, this CSS is almost 1 MByte in file size!
>
> The problem occurs with any, even the most minimal Markdown input.
> Changing the command to --standalone does get rid of the problem (as is
> to be expected from the symptoms).
>
>    - Is this a problem with the specific CSS?
>    - Is it a problem with all CSS which are derived from bootstrap.css?
>    - Or is this a bug in Pandoc?


[-- Attachment #2: Type: text/html, Size: 7366 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: pandoc: Cannot decode byte '\xa1': [....]: Invalid UTF-8 stream
       [not found] ` <97277b03-f86a-4120-a07d-eecbf08a17e3-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2015-08-24 16:56   ` BPJ
@ 2015-08-24 17:06   ` John MACFARLANE
       [not found]     ` <20150824170637.GA45262-4kKid1p5UN4xFjuZnxJpBp3lxR28IOakuDuwTybUTCk@public.gmane.org>
  2017-01-05  0:41   ` Grady D
  2 siblings, 1 reply; 17+ messages in thread
From: John MACFARLANE @ 2015-08-24 17:06 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

The CSS file is 109,518 bytes, and contains only ASCII
characters.

This is the portion that causes the problem (confirmed by
removing it):

```
@font-face{font-family:'Glyphicons Halflings';src:url(../fonts/glyphicons-halfl\
ings-regular.eot);src:url(../fonts/glyphicons-halflings-regular.eot?#iefix) for\
mat('embedded-opentype'),url(../fonts/glyphicons-halflings-regular.woff) format\
('woff'),url(../fonts/glyphicons-halflings-regular.ttf) format('truetype'),url(\
../fonts/glyphicons-halflings-regular.svg#glyphicons_halflingsregular) format('\
svg')}
```

I expect, then, that the problem is the following:  pandoc is expecting
the source files (under `src`) to be UTF-8 encoded, and this is not the
case for all of these font files.  This needs further investigation.

+++ kurt.pfeifle-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org [Aug 24 15 07:01 ]:
>   When I try to create a self-contained HTML document which uses
>   bootstrap.min.css, I encounter the following error:
>        pandoc: Cannot decode byte '\xa1': Data.Text.Internal.Encoding.Fusion.streamUtf8: Invalid UTF-8 stream
>
>   My command line includes --to html --self-contained --css
>   http://cups.org/css/bootstrap.min.css.
>
>   I’m unable to locate the position of the byte '\xa1' because, quite
>   untrue to its name, this CSS is almost 1 MByte in file size!
>
>   The problem occurs with any, even the most minimal Markdown input.
>   Changing the command to --standalone does get rid of the problem (as is
>   to be expected from the symptoms).
>     * Is this a problem with the specific CSS?
>     * Is it a problem with all CSS which are derived from bootstrap.css?
>     * Or is this a bug in Pandoc?



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: pandoc: Cannot decode byte '\xa1': [....]: Invalid UTF-8 stream
       [not found]     ` <20150824170637.GA45262-4kKid1p5UN4xFjuZnxJpBp3lxR28IOakuDuwTybUTCk@public.gmane.org>
@ 2015-08-25  7:26       ` kurt.pfeifle-gM/Ye1E23mwN+BqQ9rBEUg
       [not found]         ` <a1a5ced8-ce61-4091-ab33-8ea9d200343c-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: kurt.pfeifle-gM/Ye1E23mwN+BqQ9rBEUg @ 2015-08-25  7:26 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 3968 bytes --]

Hi John,

I'm curious (and interested to learn):

On Monday, 24 August 2015 at 19:06:52 UTC+2, John MacFarlane wrote:
>
> The CSS file is 109,518 bytes, and contains only ASCII 
> characters. 
>
> This is the portion that causes the problem (confirmed by 
> removing it): 
>

How exactly did you narrow down to that section?

Did you use Pandoc running in some kind of "debugging mode" which told you 
at which line(s) of input(s) the problem(s) occurred?
Did you convert the input to "native" and use that for analyzing?
 

>
> ```
> @font-face{font-family:'Glyphicons Halflings';src:url(../fonts/glyphicons-halfl\
> ings-regular.eot);src:url(../fonts/glyphicons-halflings-regular.eot?#iefix) for\
> mat('embedded-opentype'),url(../fonts/glyphicons-halflings-regular.woff) format\
> ('woff'),url(../fonts/glyphicons-halflings-regular.ttf) format('truetype'),url(\
> ../fonts/glyphicons-halflings-regular.svg#glyphicons_halflingsregular) format('\
> svg')}
> ```
>
> I expect, then, that the problem is the following:  pandoc is expecting 
> the source files (under `src`) to be UTF-8 encoded, and this is not the 
> case for all of these font files.  This needs further investigation. 


[-- Attachment #1.2: Type: text/html, Size: 8760 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: pandoc: Cannot decode byte '\xa1': [....]: Invalid UTF-8 stream
       [not found]         ` <a1a5ced8-ce61-4091-ab33-8ea9d200343c-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2015-08-25 20:16           ` John MACFARLANE
       [not found]             ` <20150825201606.GC98439-4kKid1p5UN4xFjuZnxJpBp3lxR28IOakuDuwTybUTCk@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: John MACFARLANE @ 2015-08-25 20:16 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ kurt.pfeifle-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org [Aug 25 15 00:26 ]:
>   Hi John,
>   I'm curious (and interested to learn):
>   On Monday, 24 August 2015 at 19:06:52 UTC+2, John MacFarlane wrote:
>
>     The CSS file is 109,518 bytes, and contains only ASCII
>     characters.
>     This is the portion that causes the problem (confirmed by
>     removing it):
>
>   How exactly did you narrow down to that section?
>   Did you use Pandoc running in some kind of "debugging mode" which told
>   you at which line(s) of input(s) the problem(s) occurred?

I downloaded the CSS file.  I used 'file' to determine that
it was all ASCII, and verified this in Haskell ghci.
So I figured that any non-ascii character would have to
be coming in from a url referenced in the CSS file.
I then opened the CSS file in an editor and searched for
'url'.

To verify that this was the issue, I just took out that
declaration and tried again.
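
The same two checks (is the CSS pure ASCII, and which external resources
does it pull in) can be sketched in a few lines of Python. This is only an
illustration of the manual procedure described above, not how pandoc
resolves resources; the regular expression is a simplification of the CSS
grammar, and the helper names css_url_references and is_all_ascii are
hypothetical.

```python
import re
from typing import List

# Matches url(...) with optional quotes; a simplification of the CSS grammar.
URL_RE = re.compile(r"""url\(\s*['"]?([^'")]+?)['"]?\s*\)""")


def css_url_references(css: str) -> List[str]:
    """Return every target named in a url(...) token, in order of appearance."""
    return URL_RE.findall(css)


def is_all_ascii(data: bytes) -> bool:
    """True if every byte is below 0x80, i.e. the data is plain ASCII."""
    return all(byte < 0x80 for byte in data)


# Shortened excerpt of the @font-face rule quoted earlier in the thread.
css = (
    "@font-face{font-family:'Glyphicons Halflings';"
    "src:url(../fonts/glyphicons-halflings-regular.eot);"
    "src:url(../fonts/glyphicons-halflings-regular.woff) format('woff')}"
)
print(is_all_ascii(css.encode("utf-8")))
for target in css_url_references(css):
    print(target)
```

Each printed target is a candidate to remove and retest, as was done by hand
here.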


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: pandoc: Cannot decode byte '\xa1': [....]: Invalid UTF-8 stream
       [not found]             ` <20150825201606.GC98439-4kKid1p5UN4xFjuZnxJpBp3lxR28IOakuDuwTybUTCk@public.gmane.org>
@ 2015-08-25 20:20               ` kurt.pfeifle-gM/Ye1E23mwN+BqQ9rBEUg
       [not found]                 ` <a330fdd1-22fa-491c-b6b0-4888ddbfb68a-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: kurt.pfeifle-gM/Ye1E23mwN+BqQ9rBEUg @ 2015-08-25 20:20 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 1718 bytes --]

Thanks for this explanation!

I appreciate this very much.  :-)


On Tuesday, 25 August 2015 at 22:16:24 UTC+2, John MacFarlane wrote:
>
> +++ kurt.p...-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org <javascript:> [Aug 25 15 00:26 ]: 
> >   Hi John, 
> >   I'm curious (and interested to learn):
> >   On Monday, 24 August 2015 at 19:06:52 UTC+2, John MacFarlane wrote:
> > 
> >     The CSS file is 109,518 bytes, and contains only ASCII 
> >     characters. 
> >     This is the portion that causes the problem (confirmed by 
> >     removing it): 
> > 
> >   How exactly did you narrow down to that section? 
> >   Did you use Pandoc running in some kind of "debugging mode" which told
> >   you at which line(s) of input(s) the problem(s) occurred?
>
> I downloaded the CSS file.  I used 'file' to determine that 
> it was all ASCII, and verified this in Haskell ghci. 
> So I figured that any non-ascii character would have to 
> be coming in from a url referenced in the CSS file. 
> I then opened the CSS file in an editor and searched for 
> 'url'. 
>
> To verify that this was the issue, I just took out that 
> declaration and tried again. 
>


[-- Attachment #1.2: Type: text/html, Size: 2621 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: pandoc: Cannot decode byte '\xa1': [....]: Invalid UTF-8 stream
       [not found]                 ` <a330fdd1-22fa-491c-b6b0-4888ddbfb68a-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2015-08-27  7:47                   ` Melroch
       [not found]                     ` <CADAJKhDqj+g2Kot-gqOq4osi6VOb7U8+tdZ4Q7Ny1+TGF8hoOw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Melroch @ 2015-08-27  7:47 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 2735 bytes --]

I'm curious too. I guess that when creating a standalone document pandoc
needs to care about linked files, but shouldn't then font files and such be
treated as binary files?
On 25 Aug 2015 at 22:20, <kurt.pfeifle-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote:

> Thanks for this explanation!
>
> I appreciate this very much.  :-)
>
>
> On Tuesday, 25 August 2015 at 22:16:24 UTC+2, John MacFarlane wrote:
>>
>> +++ kurt.p...-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org [Aug 25 15 00:26 ]:
>> >   Hi John,
>> >   I'm curious (and interested to learn):
>> >   On Monday, 24 August 2015 at 19:06:52 UTC+2, John MacFarlane wrote:
>> >
>> >     The CSS file is 109,518 bytes, and contains only ASCII
>> >     characters.
>> >     This is the portion that causes the problem (confirmed by
>> >     removing it):
>> >
>> >   How exactly did you narrow down to that section?
>> >   Did you use Pandoc running in some kind of "debugging mode" which told
>> >   you at which line(s) of input(s) the problem(s) occurred?
>>
>> I downloaded the CSS file.  I used 'file' to determine that
>> it was all ASCII, and verified this in Haskell ghci.
>> So I figured that any non-ascii character would have to
>> be coming in from a url referenced in the CSS file.
>> I then opened the CSS file in an editor and searched for
>> 'url'.
>>
>> To verify that this was the issue, I just took out that
>> declaration and tried again.
>>


[-- Attachment #2: Type: text/html, Size: 4021 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: pandoc: Cannot decode byte '\xa1': [....]: Invalid UTF-8 stream
       [not found]                     ` <CADAJKhDqj+g2Kot-gqOq4osi6VOb7U8+tdZ4Q7Ny1+TGF8hoOw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-08-27 19:24                       ` John MACFARLANE
       [not found]                         ` <20150827192406.GC66925-4kKid1p5UN4xFjuZnxJpBp3lxR28IOakuDuwTybUTCk@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: John MACFARLANE @ 2015-08-27 19:24 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Melroch [Aug 27 15 09:47 ]:
>   I'm curious too. I guess that when creating a standalone document
>   pandoc needs to care about linked files, but shouldn't then font files
>   and such be treated as binary files?

Yes, I think that's right.  The current treatment for `src`
just assumed that src would be a text file, but this shows
that's not so.
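
To make the text-versus-binary distinction concrete: a resource can be
embedded without ever decoding it as UTF-8 by base64-encoding the raw bytes
into a data: URI. The Python below is a hedged sketch of that general
technique only; it is not pandoc's code, the function name to_data_uri is
hypothetical, and the MIME type and sample bytes are placeholders.

```python
import base64


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data: URI without decoding them as text."""
    # b64encode works on arbitrary bytes, so a byte such as 0xA1 is harmless.
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{payload}"


# Invented stand-in for the first bytes of a binary font file.
font_bytes = b"\x00\x01\xa1\xff"
print(to_data_uri(font_bytes, "application/octet-stream"))
```

Because the bytes are never interpreted as characters, no "Invalid UTF-8
stream" condition can arise on this path.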


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: pandoc: Cannot decode byte '\xa1': [....]: Invalid UTF-8 stream
       [not found]                         ` <20150827192406.GC66925-4kKid1p5UN4xFjuZnxJpBp3lxR28IOakuDuwTybUTCk@public.gmane.org>
@ 2016-02-03 19:35                           ` Alex Palecek
       [not found]                             ` <fae36051-8a58-492b-a8f2-5f8f0ff7f9d2-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Alex Palecek @ 2016-02-03 19:35 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 1139 bytes --]

I'm receiving a similar error message upon trying to use a reference-docx 
on OS X. 

pandoc: Cannot decode byte '\xa1': Data.Text.Internal.Encoding.Fusion.streamUtf8: Invalid UTF-8 stream

As suggested in the docs, I've used a base docx generated by pandoc, and 
then modified it using Word for Mac 2011 (v. 14.4.8). I've also tried it a 
few other ways-- exporting an existing styled doc from RTF to docx using 
Nisus, importing the styled RTF from Nisus into Word... 

When I remove the --reference-docx argument when invoking pandoc, 
everything processes fine.


[-- Attachment #1.2: Type: text/html, Size: 3536 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: pandoc: Cannot decode byte '\xa1': [....]: Invalid UTF-8 stream
       [not found]                             ` <fae36051-8a58-492b-a8f2-5f8f0ff7f9d2-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2016-02-03 21:26                               ` John MACFARLANE
       [not found]                                 ` <20160203212640.GA43660-nFAEphtLEs/fysO+viCLMa55KtNWUUjk@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: John MACFARLANE @ 2016-02-03 21:26 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Can you give the full command line you're using?

+++ Alex Palecek [Feb 03 16 11:35 ]:
>   I'm receiving a similar error message upon trying to use a
>   reference-docx on OS X.
>   pandoc: Cannot decode byte '\xa1':
>   Data.Text.Internal.Encoding.Fusion.streamUtf8: Invalid UTF-8 stream
>   As suggested in the docs, I've used a base docx generated by pandoc,
>   and then modified it using Word for Mac 2011 (v. 14.4.8). I've also
>   tried it a few other ways-- exporting an existing styled doc from RTF
>   to docx using Nisus, importing the styled RTF from Nisus into Word...
>   When I remove the --reference-docx argument when invoking pandoc,
>   everything processes fine.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: pandoc: Cannot decode byte '\xa1': [....]: Invalid UTF-8 stream
       [not found]                                 ` <20160203212640.GA43660-nFAEphtLEs/fysO+viCLMa55KtNWUUjk@public.gmane.org>
@ 2016-02-04  6:12                                   ` Alex Palecek
       [not found]                                     ` <CA+Vuhd1RgSdnxR-jGhMcLcuGi0FjuLGFA7Nm1gAwewZ2BR8s-Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Alex Palecek @ 2016-02-04  6:12 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 3447 bytes --]

Sure!

pandoc -s -S --normalize --bibliography \
~/Documents/reference\ management/My\ Library.bib \
--csl ~/Documents/reference\ management/styles\ for\ pandoc/apa.csl \
--reference-docx= ~/Documents/reference\ management/styles\ for\ pandoc/apa-6th-example.docx \
-f markdown -t docx \
-o example-doc.docx example-doc.txt

On Wed, Feb 3, 2016 at 1:26 PM, John MACFARLANE <jgm-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org> wrote:

> Can you give the full command line you're using?
>
> +++ Alex Palecek [Feb 03 16 11:35 ]:
>
>>   I'm receiving a similar error message upon trying to use a
>>   reference-docx on OS X.
>>   pandoc: Cannot decode byte '\xa1':
>>   Data.Text.Internal.Encoding.Fusion.streamUtf8: Invalid UTF-8 stream
>>   As suggested in the docs, I've used a base docx generated by pandoc,
>>   and then modified it using Word for Mac 2011 (v. 14.4.8). I've also
>>   tried it a few other ways-- exporting an existing styled doc from RTF
>>   to docx using Nisus, importing the styled RTF from Nisus into Word...
>>   When I remove the --reference-docx argument when invoking pandoc,
>>   everything processes fine.


[-- Attachment #2: Type: text/html, Size: 6126 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: pandoc: Cannot decode byte '\xa1': [....]: Invalid UTF-8 stream
       [not found]                                     ` <CA+Vuhd1RgSdnxR-jGhMcLcuGi0FjuLGFA7Nm1gAwewZ2BR8s-Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-02-04 20:33                                       ` John MACFARLANE
       [not found]                                         ` <20160204203356.GB58697-nFAEphtLEs/fysO+viCLMa55KtNWUUjk@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: John MACFARLANE @ 2016-02-04 20:33 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Try removing the space after --reference-docx=

If you use the =, the argument needs to come right after it.
Or you can leave off the = and just have a space.
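
Why the space matters can be seen from how the shell splits the command
line into words before the program sees it. The Python sketch below uses
the standard shlex module for the splitting and a deliberately tiny toy
parser (split_long_options, a made-up helper that assumes every long option
takes one argument); it is not pandoc's option parser, and the file names
are shortened placeholders.

```python
import shlex
from typing import Dict, List, Tuple


def split_long_options(argv: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Toy GNU-style parser: '--name=value' or '--name value'; rest is positional."""
    options: Dict[str, str] = {}
    positionals: List[str] = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg.startswith("--") and "=" in arg:
            # '--name=value': everything after the first '=' is the value,
            # even if that is the empty string.
            name, value = arg[2:].split("=", 1)
            options[name] = value
        elif arg.startswith("--"):
            # '--name value': the next word is taken as the value.
            options[arg[2:]] = argv[i + 1] if i + 1 < len(argv) else ""
            i += 1
        else:
            positionals.append(arg)
        i += 1
    return options, positionals


for line in (
    "--reference-docx= ref.docx input.txt",  # space after '=' (the problem)
    "--reference-docx=ref.docx input.txt",   # value directly after '='
    "--reference-docx ref.docx input.txt",   # no '=', just a space
):
    print(split_long_options(shlex.split(line)))
```

In the first form the option value is empty and ref.docx ends up among the
positional arguments. That the .docx was then read as a text input would be
consistent with the error message, but it is not confirmed in this thread.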


+++ Alex Palecek [Feb 03 16 22:12 ]:
>   Sure!
>   pandoc -s -S --normalize --bibliography \
>   ~/Documents/reference\ management/My\ Library.bib \
>   --csl ~/Documents/reference\ management/styles\ for\ pandoc/apa.csl \
>   --reference-docx= ~/Documents/reference\ management/styles\ for\ pandoc/apa-6th-example.docx \
>   -f markdown -t docx \
>   -o example-doc.docx example-doc.txt
>
>   On Wed, Feb 3, 2016 at 1:26 PM, John MACFARLANE <[1]jgm-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org>
>   wrote:
>
>     Can you give the full command line you're using?
>     +++ Alex Palecek [Feb 03 16 11:35 ]:
>
>       I'm receiving a similar error message upon trying to use a
>       reference-docx on OS X.
>       pandoc: Cannot decode byte '\xa1':
>       Data.Text.Internal.Encoding.Fusion.streamUtf8: Invalid UTF-8 stream
>       As suggested in the docs, I've used a base docx generated by pandoc,
>       and then modified it using Word for Mac 2011 (v. 14.4.8). I've also
>       tried it a few other ways-- exporting an existing styled doc from RTF
>       to docx using Nisus, importing the styled RTF from Nisus into Word...
>       When I remove the --reference-docx argument when invoking pandoc,
>       everything processes fine.
>       --
>       You received this message because you are subscribed to the Google
>       Groups "pandoc-discuss" group.
>       To unsubscribe from this group and stop receiving emails from it,
>     send
>       an email to [1][2]pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>       To post to this group, send email to
>       [2][3]pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>       To view this discussion on the web visit
>
>     [3][4]https://groups.google.com/d/msgid/pandoc-discuss/fae36051-8a58
>     -492b-
>       a8f2-5f8f0ff7f9d2%[5]40googlegroups.com.
>       For more options, visit [4][6]https://groups.google.com/d/optout.
>     References
>       1. mailto:[7]pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
>       2. mailto:[8]pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
>       3.
>     [9]https://groups.google.com/d/msgid/pandoc-discuss/fae36051-8a58-49
>     2b-a8f2-5f8f0ff7f9d2-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org?utm_medium=email&utm_source=fo
>     oter
>       4. [10]https://groups.google.com/d/optout
>
>     --
>     You received this message because you are subscribed to a topic in
>     the Google Groups "pandoc-discuss" group.
>     To unsubscribe from this topic, visit
>     [11]https://groups.google.com/d/topic/pandoc-discuss/KGE-0N4LxOo/uns
>     ubscribe.
>     To unsubscribe from this group and all its topics, send an email to
>     [12]pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>     To post to this group, send email to
>     [13]pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>     To view this discussion on the web visit
>     [14]https://groups.google.com/d/msgid/pandoc-discuss/20160203212640.
>     GA43660%40protagoras.berkeley.edu.
>
>   For more options, visit [15]https://groups.google.com/d/optout.
>
>   --
>   You received this message because you are subscribed to the Google
>   Groups "pandoc-discuss" group.
>   To unsubscribe from this group and stop receiving emails from it, send
>   an email to [16]pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>   To post to this group, send email to
>   [17]pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>   To view this discussion on the web visit
>   [18]https://groups.google.com/d/msgid/pandoc-discuss/CA%2BVuhd1RgSdnxR-
>   jGhMcLcuGi0FjuLGFA7Nm1gAwewZ2BR8s-Q%40mail.gmail.com.
>   For more options, visit [19]https://groups.google.com/d/optout.
>
>References
>
>   1. mailto:jgm-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org
>   2. mailto:pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
>   3. mailto:pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
>   4. https://groups.google.com/d/msgid/pandoc-discuss/fae36051-8a58-492b-
>   5. http://40googlegroups.com/
>   6. https://groups.google.com/d/optout
>   7. mailto:pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
>   8. mailto:pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
>   9. https://groups.google.com/d/msgid/pandoc-discuss/fae36051-8a58-492b-a8f2-5f8f0ff7f9d2-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org?utm_medium=email&utm_source=footer
>  10. https://groups.google.com/d/optout
>  11. https://groups.google.com/d/topic/pandoc-discuss/KGE-0N4LxOo/unsubscribe
>  12. mailto:pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
>  13. mailto:pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
>  14. https://groups.google.com/d/msgid/pandoc-discuss/20160203212640.GA43660-nFAEphtLEs/fysO+viCLMa55KtNWUUjk@public.gmane.org
>  15. https://groups.google.com/d/optout
>  16. mailto:pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
>  17. mailto:pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
>  18. https://groups.google.com/d/msgid/pandoc-discuss/CA+Vuhd1RgSdnxR-jGhMcLcuGi0FjuLGFA7Nm1gAwewZ2BR8s-Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org?utm_medium=email&utm_source=footer
>  19. https://groups.google.com/d/optout


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: pandoc: Cannot decode byte '\xa1': [....]: Invalid UTF-8 stream
       [not found]                                         ` <20160204203356.GB58697-nFAEphtLEs/fysO+viCLMa55KtNWUUjk@public.gmane.org>
@ 2016-02-04 23:12                                           ` Alex Palecek
  0 siblings, 0 replies; 17+ messages in thread
From: Alex Palecek @ 2016-02-04 23:12 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 7073 bytes --]

Ah, silly me.

Thank you.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: pandoc: Cannot decode byte '\xa1': [....]: Invalid UTF-8 stream
       [not found] ` <97277b03-f86a-4120-a07d-eecbf08a17e3-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2015-08-24 16:56   ` BPJ
  2015-08-24 17:06   ` John MACFARLANE
@ 2017-01-05  0:41   ` Grady D
       [not found]     ` <2fd0ec49-eed4-4a05-8b97-5389a7d4fb97-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2 siblings, 1 reply; 17+ messages in thread
From: Grady D @ 2017-01-05  0:41 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 1939 bytes --]

I am having an issue similar to this, but with a few members of a large 
collection of LaTeX files that I am trying to convert to plaintext's. It is 
not feasible for me to do the examination described here, so I'm wondering 
if anyone can recommend a filter or other method that would allow me to 
either ignore or replace the problematic bytes.

Can anyone help me?
Thanks


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: pandoc: Cannot decode byte '\xa1': [....]: Invalid UTF-8 stream
       [not found]     ` <2fd0ec49-eed4-4a05-8b97-5389a7d4fb97-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2017-01-05  1:46       ` Grady D
  2017-01-05 20:44       ` BP Jonsson
  1 sibling, 0 replies; 17+ messages in thread
From: Grady D @ 2017-01-05  1:46 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 2319 bytes --]

I don't think a filter can make pandoc ignore the bytes it doesn't like, 
because I can't even get the documents into AST format in the first place. 
Is there anything else I could try?

Also, the word "plaintext's" in my comment above should be "plaintexts." 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: pandoc: Cannot decode byte '\xa1': [....]: Invalid UTF-8 stream
       [not found]     ` <2fd0ec49-eed4-4a05-8b97-5389a7d4fb97-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2017-01-05  1:46       ` Grady D
@ 2017-01-05 20:44       ` BP Jonsson
       [not found]         ` <084fc935-ac91-6b27-44c0-e9f424e5ebf1-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  1 sibling, 1 reply; 17+ messages in thread
From: BP Jonsson @ 2017-01-05 20:44 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On 2017-01-05 at 01:41, Grady D wrote:
> I am having an issue similar to this, but with a few members of a large 
> collection of LaTeX files that I am trying to convert to plaintext's. It is 
> not feasible for me to do the examination described here, so I'm wondering 
> if anyone can recommend me a filter other method that would allow me to 
> either ignore or replace the problematic bytes.
> 
> Can anyone help me?
> Thanks

That error is usually a symptom of the files being in a legacy
encoding rather than UTF-8, which is what pandoc expects as input.
Unless you have reason to believe that a file is in a legacy
TeX encoding, which is unusual, it is probably in Latin-1 or its
bastard cousin cp1252. If you are on Mac or Linux your best bet
is to run the files through the iconv program, assuming cp1252:

    iconv -f CP1252 -t UTF8 legacy-file | pandoc [pandoc options]

If you are on Windows the best solution is to use the piconv
program which comes with Perl. You will need to install perl
<http://strawberryperl.com> and then do

    piconv -f cp1252 -t utf-8 legacy-file | pandoc [pandoc options]

If your files are in English that should do the trick. At the
very least the file will be readable to pandoc. If you get a lot
of out-of-place special characters you may have a legacy TeX
encoding. In that case let me know and I may be able to help you,
or find someone/something which can. If the file is in some other
language the correct legacy encoding may depend on the language,
but if you know what language it is there is a limited number of
suspects for each.
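
For a large collection, the same idea can be applied file by file. The
following is only a sketch under assumptions: to_utf8 and convert_all are
hypothetical helper names, the *.tex glob and the pandoc options are
placeholders, and every file that is not already valid UTF-8 is assumed to
be in CP1252.

```shell
# to_utf8 is a hypothetical helper, not part of pandoc or iconv.
# It writes the contents of file "$1" to stdout as UTF-8. A file that iconv
# accepts as valid UTF-8 is passed through unchanged; anything else is
# ASSUMED to be CP1252 and converted.
to_utf8() {
  if iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1; then
    cat "$1"
  else
    iconv -f CP1252 -t UTF-8 "$1"
  fi
}

# convert_all is also hypothetical: it converts every .tex file in the
# current directory to plain text next to the original.
convert_all() {
  for f in *.tex; do
    [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
    to_utf8 "$f" | pandoc -f latex -t plain -o "${f%.tex}.txt"
  done
}
```

Running convert_all in the directory that holds the files leaves files that
are already UTF-8 untouched by the conversion step. If the CP1252 guess is
wrong for a file, the output will still be readable to pandoc but may show
out-of-place special characters, as described above.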

/bpj


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: pandoc: Cannot decode byte '\xa1': [....]: Invalid UTF-8 stream
       [not found]         ` <084fc935-ac91-6b27-44c0-e9f424e5ebf1-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-01-05 21:21           ` Grady D
  0 siblings, 0 replies; 17+ messages in thread
From: Grady D @ 2017-01-05 21:21 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 4273 bytes --]

I set aside a few of the problem documents to test with, and iconv fixes
every one I've tested so far. I will proceed to try this on the larger set.

Thank you for your help.



^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2017-01-05 21:21 UTC | newest]

Thread overview: 17+ messages
2015-08-24 14:01 pandoc: Cannot decode byte '\xa1': [....]: Invalid UTF-8 stream kurt.pfeifle-gM/Ye1E23mwN+BqQ9rBEUg
     [not found] ` <97277b03-f86a-4120-a07d-eecbf08a17e3-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2015-08-24 16:56   ` BPJ
2015-08-24 17:06   ` John MACFARLANE
     [not found]     ` <20150824170637.GA45262-4kKid1p5UN4xFjuZnxJpBp3lxR28IOakuDuwTybUTCk@public.gmane.org>
2015-08-25  7:26       ` kurt.pfeifle-gM/Ye1E23mwN+BqQ9rBEUg
     [not found]         ` <a1a5ced8-ce61-4091-ab33-8ea9d200343c-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2015-08-25 20:16           ` John MACFARLANE
     [not found]             ` <20150825201606.GC98439-4kKid1p5UN4xFjuZnxJpBp3lxR28IOakuDuwTybUTCk@public.gmane.org>
2015-08-25 20:20               ` kurt.pfeifle-gM/Ye1E23mwN+BqQ9rBEUg
     [not found]                 ` <a330fdd1-22fa-491c-b6b0-4888ddbfb68a-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2015-08-27  7:47                   ` Melroch
     [not found]                     ` <CADAJKhDqj+g2Kot-gqOq4osi6VOb7U8+tdZ4Q7Ny1+TGF8hoOw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-08-27 19:24                       ` John MACFARLANE
     [not found]                         ` <20150827192406.GC66925-4kKid1p5UN4xFjuZnxJpBp3lxR28IOakuDuwTybUTCk@public.gmane.org>
2016-02-03 19:35                           ` Alex Palecek
     [not found]                             ` <fae36051-8a58-492b-a8f2-5f8f0ff7f9d2-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2016-02-03 21:26                               ` John MACFARLANE
     [not found]                                 ` <20160203212640.GA43660-nFAEphtLEs/fysO+viCLMa55KtNWUUjk@public.gmane.org>
2016-02-04  6:12                                   ` Alex Palecek
     [not found]                                     ` <CA+Vuhd1RgSdnxR-jGhMcLcuGi0FjuLGFA7Nm1gAwewZ2BR8s-Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-02-04 20:33                                       ` John MACFARLANE
     [not found]                                         ` <20160204203356.GB58697-nFAEphtLEs/fysO+viCLMa55KtNWUUjk@public.gmane.org>
2016-02-04 23:12                                           ` Alex Palecek
2017-01-05  0:41   ` Grady D
     [not found]     ` <2fd0ec49-eed4-4a05-8b97-5389a7d4fb97-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2017-01-05  1:46       ` Grady D
2017-01-05 20:44       ` BP Jonsson
     [not found]         ` <084fc935-ac91-6b27-44c0-e9f424e5ebf1-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-01-05 21:21           ` Grady D
