* Smart quotes recognition
From: Joost Kremers @ 2010-11-18 13:59 UTC
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Hi,

Today, I ran the following (German) text through markdown2pdf:

==========

1) Seite 5: "Ausgehend von der "Uniformity of Theta Assignment Hypothesis" (UTAH)
als Leitmotiv faßt Baker genau die morphologischen Konstruktionen als
syntaktische Operationen, d.h. als Instanzen syntaktischer X° Bewegung auf,
die einen "Grammatical Function-Changing"-Prozeß (GFC-Prozeß) indizieren,
[...]." - Ich glaube, ich verstehe gundlegend worum es geht, doch wäre es
schön dies nochmal an einem Beispiel zu demonstrieren.

==========

When I examined the pdf output, I noticed something weird: the double quotes
before «Uniformity» had become closing quotes and the space after «der» had
disappeared.

When converting to LaTeX, the output is the following:

==========

\item
  Seite 5: ``Ausgehend von der''Uniformity of Theta Assignment
  Hypothesis" (UTAH) als Leitmotiv faßt Baker genau die
  morphologischen Konstruktionen als syntatktische Operationen, d.h.
  als Instanzen syntaktischer X° Bewegung auf, die einen
  ``Grammatical Function-Changing''-Prozeß (GFC-Prozeß) indizieren,
  [\ldots{}]." - Ich glaube, ich verstehe gundlegend worum es geht,
  doch wäre es schön dies nochmal an einem Beispiel zu
  demonstrieren.

==========

I suspect Pandoc keeps track of quotes in order to determine whether a given
quote must be an opening or a closing quote, which obviously leads to false
results in this case. Not only is the double quote before «Uniformity»
interpreted as a closing quote and the space deleted, the double quotes after
«Hypothesis» and after the ellipsis are not converted to '' as they should be.
(Though I don't really understand why this happens...)

Wouldn't it make more sense to determine whether an opening or closing quote is
needed by examining the characters before and after the quote? It would of
course not be a simple matter of determining on which side the space is, since
that would fail for the string «"Grammatical Function-Changing"-Prozeß» in the
above example. I guess that would require some sort of character hierarchy,
something along the lines of:

1. letters
2. ,.!?;:
3. ()-
4. space

(I'm sure I left out other relevant characters.) Now, when the character
following " is further down this list than the character preceding it, " is
converted into a closing quote, and when the preceding character is further
down the list, " is converted into an opening quote.

I know one isn't really supposed to use double quotes inside double quotes, but
if I use them anyway, I would like them to work...

Perhaps an alternative implementation could be considered?

Thanks,

Joost

--
Joost Kremers
Life has its moments

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To unsubscribe from this group, send email to pandoc-discuss+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/pandoc-discuss?hl=en.
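
The character hierarchy proposed in the message above can be sketched in a few lines of Python. This is only an illustration of the idea, not anything pandoc does: the names `rank` and `smart_double_quotes` are made up here, unlisted characters are assumed to rank with the brackets, the start and end of the text are assumed to count as a space, and a tie is assumed to produce a closing quote.

```python
def rank(ch):
    """Position of a neighbouring character in the proposed hierarchy.

    1 = letters (digits are lumped in here as an assumption),
    2 = sentence punctuation, 3 = brackets and hyphen, 4 = space.
    """
    if ch is None or ch.isspace():
        return 4  # assumption: text boundaries behave like a space
    if ch.isalnum():
        return 1
    if ch in ",.!?;:":
        return 2
    return 3  # "()-" plus, as an assumption, anything not listed


def smart_double_quotes(text):
    """Replace each straight double quote with LaTeX `` or ''."""
    out = []
    for i, ch in enumerate(text):
        if ch != '"':
            out.append(ch)
            continue
        before = text[i - 1] if i > 0 else None
        after = text[i + 1] if i + 1 < len(text) else None
        # Opening quote: the following character is higher up the list
        # (more word-like) than the preceding one. Otherwise closing.
        out.append("``" if rank(after) < rank(before) else "''")
    return "".join(out)


sample = 'von der "Uniformity Hypothesis" (UTAH) einen "GFC"-Prozeß [...]." - Ich'
print(smart_double_quotes(sample))
```

Because each quote is classified from its two neighbours alone, the sketch keeps no record of whether a quote is already open, so double quotes nested inside double quotes do not confuse it.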

* Re: Smart quotes recognition
From: John MacFarlane @ 2010-11-21 21:45 UTC
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Joost Kremers [Nov 18 10 14:59 ]:
> Hi,
>
> Today, I ran the following (German) text through markdown2pdf:
>
> ==========
>
> 1) Seite 5: "Ausgehend von der "Uniformity of Theta Assignment Hypothesis" (UTAH)
> als Leitmotiv faßt Baker genau die morphologischen Konstruktionen als
> syntaktische Operationen, d.h. als Instanzen syntaktischer X° Bewegung auf,
> die einen "Grammatical Function-Changing"-Prozeß (GFC-Prozeß) indizieren,
> [...]." - Ich glaube, ich verstehe gundlegend worum es geht, doch wäre es
> schön dies nochmal an einem Beispiel zu demonstrieren.
>
> ==========
>
> When I examined the pdf output, I noticed something weird: the double quotes
> before «Uniformity» had become closing quotes and the space after «der» had
> disappeared.
>
> When converting to LaTeX, the output is the following:
>
> ==========
>
> \item
>   Seite 5: ``Ausgehend von der''Uniformity of Theta Assignment
>   Hypothesis" (UTAH) als Leitmotiv faßt Baker genau die
>   morphologischen Konstruktionen als syntatktische Operationen, d.h.
>   als Instanzen syntaktischer X° Bewegung auf, die einen
>   ``Grammatical Function-Changing''-Prozeß (GFC-Prozeß) indizieren,
>   [\ldots{}]." - Ich glaube, ich verstehe gundlegend worum es geht,
>   doch wäre es schön dies nochmal an einem Beispiel zu
>   demonstrieren.
>
> ==========
>
> I suspect Pandoc keeps track of quotes in order to determine whether a given
> quote must be an opening or a closing quote, which obviously leads to false
> results in this case. Not only is the double quote before «Uniformity»
> interpreted as a closing quote and the space deleted, the double quotes after
> «Hypothesis» and after the ellipsis are not converted to '' as they should be.
> (Though I don't really understand why this happens...)
>
> Wouldn't it make more sense to determine whether an open or closing quote is
> needed by examining the character before and after the quote?

You're right that pandoc's smart quote parsing works by keeping track
of when you've already got an open single or double quote, and then
waiting for a matching closer. It does not work by examining the
character before and after the quote. It couldn't, because the parser
doesn't keep track of the previously parsed character. (And as far
as I can see, it would be difficult to make it do so.)

The present system assumes that you aren't going to have double quotes
within double quotes. It breaks when you violate the convention of
alternating quote styles. I'm not sure there's much to be done about
this...

You can always use unicode open and close quote characters in your
markdown document, if you want to use double quotes within double quotes.

John
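
The behaviour John describes (remember that a quote is open, then wait for the matching closer) can be imitated with a toy scanner. This is a sketch only and not pandoc's actual parser. The function name `stateful_double_quotes` is hypothetical, and two rules are assumptions chosen to reproduce the LaTeX reported at the start of the thread: a `"` followed by whitespace can never open a quote, and trailing whitespace inside a quoted span is dropped.

```python
def stateful_double_quotes(text):
    """Toy model: pair each opening quote with the next double quote."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else " "
        if ch == '"' and not nxt.isspace():
            # A quote is now open: the very next straight quote is taken
            # as its closer, whatever the characters around it are.
            close = text.find('"', i + 1)
            if close != -1:
                inner = text[i + 1:close].rstrip()  # assumption: trim inside
                out.append("``" + inner + "''")
                i = close + 1
                continue
        # No opener here (or no closer found): keep the character literally.
        out.append(ch)
        i += 1
    return "".join(out)


sample = '"Ausgehend von der "Uniformity Hypothesis" (UTAH) einen "GFC"-Prozeß [...]." - Ich'
print(stateful_double_quotes(sample))
```

On the nested sample, the quote before «Uniformity» is consumed as the closer of the first quote, and the quotes after «Hypothesis» and after the ellipsis stay as straight quotes, which matches the pattern in the reported output. Text that already uses Unicode quotes (“ and ”) contains no straight quotes, so it passes through this sketch unchanged, in line with the workaround suggested above.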

* Re: Smart quotes recognition
From: Joost Kremers @ 2010-11-23 9:43 UTC
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Hi John,

[smart quotes]
> You're right that pandoc's smart quote parsing works by keeping track
> of when you've already got an open single or double quote, and then
> waiting for a matching closer. It does not work by examining the
> character before and after the quote. It couldn't, because the parser
> doesn't keep track of the previously parsed character. (And as far
> as I can see, it would be difficult to make it do so.)

There's certainly no point in making fundamental changes to how the parser works
just for this particular issue. In fact, I probably wouldn't even have run into
it if I hadn't blindly copied text from an email into a markdown document.

Thanks for your explanation, though. :-)

Joost

--
Joost Kremers
Life has its moments