public inbox archive for pandoc-discuss@googlegroups.com
* [citeproc] textual citation
@ 2010-11-11  1:49 Andrea Rossato
       [not found] ` <20101111014927.GP24988-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Andrea Rossato @ 2010-11-11  1:49 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 1397 bytes --]

Hi,

it took a bit, since in the meantime I tried to move on with the
CSL-1.0 implementation (now citation collapsing works) but I
eventually was able to get the textual citation running.

I didn't touch the parsers, so the syntax is not there, but there is
the '+' modifier to play with...

Now, a textual citation would look like [+@item1], instead of @item1.
A textual citation with multiple citations would look like [+@item1;
@item2], etc.

I'm attaching the code for pandoc:
0009-add-support-for-textual-citation.patch

and pandoc-types:
0001-mv-AuthorOnly-AuthoInText.patch

Here you can find updated tests:
http://gorgias.mine.nu/citeproc/

I didn't switch to the Map, yet.

As for the API: when a citation group with a leading AuthorInText
citation is sent to the processor, the first item in the returned list
is the label to be placed in-text. This citation is then rendered as a
SuppressAuthor one in the citation group.
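
To make the contract above concrete, here is a minimal sketch of how a caller might split the processor's result for one citation group. The types are simplified stand-ins (the real `Citation` lives in pandoc-types and the real output type in citeproc-hs), and `Rendered` and `splitTextual` are hypothetical names used only for illustration.

```haskell
-- Simplified stand-ins for the pandoc-types definitions.
data CitationMode = AuthorInText | SuppressAuthor | NormalCitation
  deriving (Show, Eq)

data Citation = Citation
  { citationId   :: String
  , citationMode :: CitationMode
  } deriving (Show, Eq)

-- | Hypothetical stand-in for the processor's formatted output.
type Rendered = String

-- | Split the output for one citation group into the in-text label
-- (present only when the group starts with an AuthorInText citation)
-- and the items that stay inside the citation itself.
splitTextual :: [Citation] -> [Rendered] -> (Maybe Rendered, [Rendered])
splitTextual (c:_) (label:rest)
  | citationMode c == AuthorInText = (Just label, rest)
splitTextual _ out = (Nothing, out)
```

Only the mode of the first citation is inspected, and an empty result list is passed through unchanged, which mirrors the `x /= []` guard in the attached patch.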

We should be almost done, I think.

Andrea

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To unsubscribe from this group, send email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
For more options, visit this group at http://groups.google.com/group/pandoc-discuss?hl=en.


[-- Attachment #2: 0001-mv-AuthorOnly-AuthoInText.patch --]
[-- Type: text/plain, Size: 897 bytes --]

From 5431832bc9d81d5391f10af463666aee71976a2d Mon Sep 17 00:00:00 2001
From: Andrea Rossato <andrea.rossato-/Q1r7N5in3P/wltNWqQaag@public.gmane.org>
Date: Wed, 10 Nov 2010 12:13:48 +0100
Subject: [PATCH] mv AuthorOnly AuthoInText

---
 Text/Pandoc/Definition.hs |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Text/Pandoc/Definition.hs b/Text/Pandoc/Definition.hs
index 1c2db89..3b8c7e6 100644
--- a/Text/Pandoc/Definition.hs
+++ b/Text/Pandoc/Definition.hs
@@ -142,7 +142,7 @@ instance Eq Citation where
     (==) (Citation _ _ _ _ _ ha)
          (Citation _ _ _ _ _ hb) = ha == hb
 
-data CitationMode = AuthorOnly | SuppressAuthor | NormalCitation
+data CitationMode = AuthorInText | SuppressAuthor | NormalCitation
                     deriving (Show, Eq, Ord, Read, Typeable, Data)
 
 -- | Applies a transformation on @a@s to matching elements in a @b@.
-- 
1.7.1


[-- Attachment #3: 0009-add-support-for-textual-citation.patch --]
[-- Type: text/plain, Size: 5543 bytes --]

From 32df46a6c66df16e7a125bda023447370b73dac5 Mon Sep 17 00:00:00 2001
From: Andrea Rossato <andrea.rossato-/Q1r7N5in3P/wltNWqQaag@public.gmane.org>
Date: Thu, 11 Nov 2010 01:56:03 +0100
Subject: [PATCH 9/9] add support for textual citation

---
 src/Text/Pandoc/Biblio.hs           |   45 +++++++++++++++++++++--------------
 src/Text/Pandoc/Readers/Markdown.hs |    2 +-
 2 files changed, 28 insertions(+), 19 deletions(-)

diff --git a/src/Text/Pandoc/Biblio.hs b/src/Text/Pandoc/Biblio.hs
index d8a4659..c334d89 100644
--- a/src/Text/Pandoc/Biblio.hs
+++ b/src/Text/Pandoc/Biblio.hs
@@ -54,19 +54,27 @@ processBiblio cf r p
             result     = citeproc csl r (setNearNote csl $ map (map toCslCite) grps)
             cits_map   = zip grps (citations result)
             biblioList = map (renderPandoc' csl) (bibliography result)
-            Pandoc m b = processWith (processCite csl cits_map) p'
+            Pandoc m b = processWith (procInlines $ processCite csl cits_map) p'
         return . generateNotes nts . Pandoc m $ b ++ biblioList
 
 -- | Substitute 'Cite' elements with formatted citations.
-processCite :: Style -> [([Citation],[FormattedOutput])] -> Inline -> Inline
-processCite s cs il
-    | Cite t _ <- il = Cite t (process t)
-    | otherwise      = il
+processCite :: Style -> [([Citation],[FormattedOutput])] -> [Inline] -> [Inline]
+processCite _ _ [] = []
+processCite s cs (i:is)
+    | Cite t _ <- i = process t ++ processCite s cs is
+    | otherwise     = i          : processCite s cs is
     where
       process t = case lookup t cs of
-                    Just  i -> renderPandoc s i
+                    Just  x -> if isTextualCitation t && x /= []
+                               then renderPandoc s [head x] ++ [Space] ++
+                                    [Cite t $ renderPandoc s $ tail x]
+                               else [Cite t $ renderPandoc s        x]
                     Nothing -> [Str ("Error processing " ++ show t)]
 
+isTextualCitation :: [Citation] -> Bool
+isTextualCitation (c:_) = citationMode c == AuthorInText
+isTextualCitation _     = False
+
 -- | Retrieve all citations from a 'Pandoc' docuument. To be used with
 -- 'queryWith'.
 getCitation :: Inline -> [[Citation]]
@@ -109,22 +117,22 @@ mvCiteInNote is = procInlines mvCite
       mvCite :: [Inline] -> [Inline]
       mvCite inls
           | x:i:xs <- inls, startWithPunct xs
-          , x == Space,   i `elem_` is = split i xs ++ mvCite (tailFirstInlineStr xs)
+          , x == Space,   i `elem_` is = switch i xs ++ mvCite (tailFirstInlineStr xs)
           | x:i:xs <- inls
-          , x == Space,   i `elem_` is = mvInNote i :  mvCite xs
+          , x == Space,   i `elem_` is = mvInNote i :   mvCite xs
           | i:xs <- inls, i `elem_` is
-          , startWithPunct xs          = split i xs ++ mvCite (tailFirstInlineStr xs)
-          | i:xs <- inls, Note _ <- i  = checkNt  i :  mvCite xs
-          | i:xs <- inls               = i          :  mvCite xs
+          , startWithPunct xs          = switch i xs ++ mvCite (tailFirstInlineStr xs)
+          | i:xs <- inls, Note _ <- i  = checkNt  i :   mvCite xs
+          | i:xs <- inls               = i          :   mvCite xs
           | otherwise                  = []
-      elem_ x xs = case x of Cite cs _ -> (Cite cs []) `elem` xs; _ -> False
-      split i xs = Str (headInline xs) : mvInNote i : []
+      elem_  x xs = case x of Cite cs _ -> (Cite cs []) `elem` xs; _ -> False
+      switch i xs = Str (headInline xs) : mvInNote i : []
       mvInNote i
           | Cite t o <- i = Note [Para [Cite t $ sanitize o]]
           | otherwise     = Note [Para [i                  ]]
       sanitize i
-          | endWithPunct i = toCapital i
-          | otherwise      = toCapital (i ++ [Str "."])
+          | endWithPunct   i = toCapital i
+          | otherwise        = toCapital (i ++ [Str "."])
 
       checkPt i
           | Cite c o : xs <- i
@@ -142,10 +150,10 @@ setCitationNoteNum :: Int -> [Citation] -> [Citation]
 setCitationNoteNum i = map $ \c -> c { citationNoteNum = i}
 
 toCslCite :: Citation -> CSL.Cite
-toCslCite (Citation i p l cm nn _)
+toCslCite (Citation i p l cm nn h)
     = let (la,lo) = parseLocator l
           citMode = case cm of
-                      AuthorOnly     -> (True, False)
+                      AuthorInText   -> (True, False)
                       SuppressAuthor -> (False,True )
                       NormalCitation -> (False,False)
       in   emptyCite { CSL.citeId         = i
@@ -153,6 +161,7 @@ toCslCite (Citation i p l cm nn _)
                      , CSL.citeLabel      = la
                      , CSL.citeLocator    = lo
                      , CSL.citeNoteNumber = show nn
-                     , CSL.authorOnly     = fst citMode
+                     , CSL.authorInText   = fst citMode
                      , CSL.suppressAuthor = snd citMode
+                     , CSL.citeHash       = h
                      }
diff --git a/src/Text/Pandoc/Readers/Markdown.hs b/src/Text/Pandoc/Readers/Markdown.hs
index b7c5220..7a42d90 100644
--- a/src/Text/Pandoc/Readers/Markdown.hs
+++ b/src/Text/Pandoc/Readers/Markdown.hs
@@ -1345,6 +1345,6 @@ parseLabel = try $ do
                else (p        , False)
       mode = case (na,o) of
                (True, False) -> SuppressAuthor
-               (False,True ) -> AuthorOnly
+               (False,True ) -> AuthorInText
                _             -> NormalCitation
   return $ Citation cit (trim p') (trim loc) mode 0 0


* Re: textual citation
       [not found] ` <20101111014927.GP24988-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
@ 2010-11-12  6:36   ` John MacFarlane
       [not found]     ` <20101112063622.GA8676-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: John MacFarlane @ 2010-11-12  6:36 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Andrea Rossato [Nov 11 10 02:49 ]:
> Hi,
> 
> it took a bit, since in the meantime I tried to move on with the
> CSL-1.0 implementation (now citation collapsing works) but I
> eventually was able to get the textual citation running.
> 
> I didn't touch the parsers, so the syntax is not there, but there is
> the '+' modifier to play with...
> 
> Now, a textual citation would look like [+@item1], instead of @item1.
> A textual citation with multiple citations would look like [+@item1;
> @item2], etc.
> 
> I'm attaching the code for pandoc:
> 0009-add-support-for-textual-citation.patch
> 
> and pandoc-types:
> 0001-mv-AuthorOnly-AuthoInText.patch
> 
> Here you can find updated tests:
> http://gorgias.mine.nu/citeproc/
> 
> I didn't switch to the Map, yet.
> 
> As for the API: when a citation group with a leading AuthorInText
> citation is sent to the processor, the first item in the returned list
> is the label to be placed in-text. This citation is then rendered as a
> SuppressAuthor one in the citation group.
> 
> We should be almost done, I think.

Great!  I've pushed these changes, and also the Map change.

I'm working on the parser now.

John



* Re: textual citation
       [not found]     ` <20101112063622.GA8676-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2010-11-12  8:43       ` John MacFarlane
       [not found]         ` <20101112084314.GA15038-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: John MacFarlane @ 2010-11-12  8:43 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ John MacFarlane [Nov 11 10 22:36 ]:
> +++ Andrea Rossato [Nov 11 10 02:49 ]:
> > Hi,
> > 
> > it took a bit, since in the meantime I tried to move on with the
> > CSL-1.0 implementation (now citation collapsing works) but I
> > eventually was able to get the textual citation running.
> > 
> > I didn't touch the parsers, so the syntax is not there, but there is
> > the '+' modifier to play with...
> > 
> > Now, a textual citation would look like [+@item1], instead of @item1.
> > A textual citation with multiple citations would look like [+@item1;
> > @item2], etc.
> > 
> > I'm attaching the code for pandoc:
> > 0009-add-support-for-textual-citation.patch
> > 
> > and pandoc-types:
> > 0001-mv-AuthorOnly-AuthoInText.patch
> > 
> > Here you can find updated tests:
> > http://gorgias.mine.nu/citeproc/
> > 
> > I didn't switch to the Map, yet.
> > 
> > As for the API: when a citation group with a leading AuthorInText
> > citation is sent to the processor, the first item in the returned list
> > is the label to be placed in-text. This citation is then rendered as a
> > SuppressAuthor one in the citation group.
> > 
> > We should be almost done, I think.
> 
> Great!  I've pushed these changes, and also the Map change.
> 
> I'm working on the parser now.

OK, I've pushed changes to the markdown reader integrating
the new format for textual citations.

I've also added a rudimentary test, similar to what Nathan had,
but with the new syntax.  However, the test currently fails
with a multibyte read error when it tries to access the locale
files in citeproc-hs.   Perhaps those need to be encoded as
UTF-8?

A couple other issues:

@doe99 [p. 30]

should give you

Doe 1999 (30)

but it doesn't work.  The Citation is getting its locator set
all right, but the formatted inlines don't include the locator.
I think this needs to be addressed in citeproc-hs.

Also, I got some bibliography glitches when I added publisher,
address, and some other fields to the test bibliography. You can
try it out with the markdown-citations.txt, biblio.bib,
and chicago-author-date.csl in the tests directory.

John



* Re: Re: textual citation
       [not found]         ` <20101112084314.GA15038-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2010-11-12 11:16           ` Andrea Rossato
       [not found]             ` <20101112111654.GE19143-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
  2010-11-12 13:45           ` Nathan Gass
                             ` (3 subsequent siblings)
  4 siblings, 1 reply; 71+ messages in thread
From: Andrea Rossato @ 2010-11-12 11:16 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On Fri, Nov 12, 2010 at 12:43:14AM -0800, John MacFarlane wrote:
> OK, I've pushed changes to the markdown reader integrating
> the new format for textual citations.
> 
> I've also added a rudimentary test, similar to what Nathan had,
> but with the new syntax.  However, the test currently fails
> with a multibyte read error when it tries to access the locale
> files in citeproc-hs.   Perhaps those need to be encoded as
> UTF-8?

I'm not able to reproduce the UTF-8 problem (but I didn't try the
test-suite: I'm missing Diff). I'm quite sure locale files should be
correctly encoded. I'll double check, also using the test-suite, and
report back.

> A couple other issues:
> 
> @doe99 [p. 30]
> 
> should give you
> 
> Doe 1999 (30)
>
> but it doesn't work.  The Citation is getting its locator set
> all right, but the formatted inlines don't include the locator.
> I think this needs to be addressed in citeproc-hs.

No, that should give you "Doe (1999, 30)", which is what you seem to
get: "Doe (2005, 30) says blah." That seems to me the correct
behavior.

> Also, I got some bibliography glitches when I added publisher,
> address, and some other fields to the test bibliography. You can
> try it out with the markdown-citations.txt, biblio.bib,
> and chicago-author-date.csl in the tests directory.

This is really strange: sometimes I can reproduce the problem and
sometimes I cannot. For instance, here I got everything right (except
for the citation in the first note):
http://gorgias.mine.nu/citeproc/markdown-citations.html

I'll investigate a bit further.

Andrea



* Re: textual citation
       [not found]         ` <20101112084314.GA15038-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  2010-11-12 11:16           ` Andrea Rossato
@ 2010-11-12 13:45           ` Nathan Gass
       [not found]             ` <4CDD4501.7030700-8UOIJiGH10pyDzI6CaY1VQ@public.gmane.org>
  2010-11-12 15:38           ` Andrea Rossato
                             ` (2 subsequent siblings)
  4 siblings, 1 reply; 71+ messages in thread
From: Nathan Gass @ 2010-11-12 13:45 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On 12.11.10 09:43, John MacFarlane wrote:
> +++ John MacFarlane [Nov 11 10 22:36 ]:
>> +++ Andrea Rossato [Nov 11 10 02:49 ]:
>>> Hi,
>>>
>>> it took a bit, since in the meantime I tried to move on with the
>>> CSL-1.0 implementation (now citation collapsing works) but I
>>> eventually was able to get the textual citation running.
>>>
>>> I didn't touch the parsers, so the syntax is not there, but there is
>>> the '+' modifier to play with...
>>>
>>> Now, a textual citation would look like [+@item1], instead of @item1.
>>> A textual citation with multiple citations would look like [+@item1;
>>> @item2], etc.
>>>
>>> I'm attaching the code for pandoc:
>>> 0009-add-support-for-textual-citation.patch
>>>
>>> and pandoc-types:
>>> 0001-mv-AuthorOnly-AuthoInText.patch
>>>
>>> Here you can find updated tests:
>>> http://gorgias.mine.nu/citeproc/
>>>
>>> I didn't switch to the Map, yet.
>>>
>>> As for the API: when a citation group with a leading AuthorInText
>>> citation is sent to the processor, the first item in the returned list
>>> is the label to be placed in-text. This citation is then rendered as a
>>> SuppressAuthor one in the citation group.
>>>
>>> We should be almost done, I think.
>>
>> Great!  I've pushed these changes, and also the Map change.
>>
>> I'm working on the parser now.
>
> OK, I've pushed changes to the markdown reader integrating
> the new format for textual citations.

I merged your changes and Andrea's changes into my branch.

>
> I've also added a rudimentary test, similar to what Nathan had,
> but with the new syntax.  However, the test currently fails
> with a multibyte read error when it tries to access the locale
> files in citeproc-hs.

The tests run on my system, but fail. My own branch added more failures, 
but I figured I can fix them once your branch has no failures.

By the way, how should the writers handle multiple AuthorInText in one 
Citation? This question arises because AuthorInText is marked by another 
CitationMode (as I wrongly suggested myself earlier) and not by a Bool 
on the Cite.

Greetings
Nathan




* Re: Re: textual citation
       [not found]         ` <20101112084314.GA15038-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  2010-11-12 11:16           ` Andrea Rossato
  2010-11-12 13:45           ` Nathan Gass
@ 2010-11-12 15:38           ` Andrea Rossato
       [not found]             ` <20101112153829.GG19143-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
  2010-11-12 23:23           ` Andrea Rossato
  2010-11-13  1:11           ` Andrea Rossato
  4 siblings, 1 reply; 71+ messages in thread
From: Andrea Rossato @ 2010-11-12 15:38 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On Fri, Nov 12, 2010 at 12:43:14AM -0800, John MacFarlane wrote:
> A couple other issues:
> 
> @doe99 [p. 30]
> 
> should give you
> 
> Doe 1999 (30)
> 
> but it doesn't work.  The Citation is getting its locator set
> all right, but the formatted inlines don't include the locator.
> I think this needs to be addressed in citeproc-hs.

I found the problem and fixed it (it was a recent regression).

> Also, I got some bibliography glitches when I added publisher,
> address, and some other fields to the test bibliography. You can
> try it out with the markdown-citations.txt, biblio.bib,
> and chicago-author-date.csl in the tests directory.

There are some issues here in the MODS parser (on the bibutils side):
the output for "address" seems to have changed recently.

There are also some problems in the reference type parsing, which are
due to the MODS model. I'm trying to address this problem and come
back with something.

There are also issues with the styles you are using for testing. I
believe some of them are still in development, because they show
problems which others, while using similar variables and producing
similar output, do not have. That makes me think we must be careful
also in choosing the styles for testing. We should not confuse a bug
in the style with a bug in the processor.

I'm going to add the ability to read JSON bibliographic
databases, and thus set CSL variables directly (which is useful when
bibtex is not expressive enough). Related to that: the "--biblio" flag
should be able to take a list of files. What is the best way to
address the problem? Comma-separated file names in a single String?
Any ideas?

Andrea



* Re: textual citation
       [not found]             ` <20101112111654.GE19143-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
@ 2010-11-12 16:08               ` John MacFarlane
  0 siblings, 0 replies; 71+ messages in thread
From: John MacFarlane @ 2010-11-12 16:08 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Andrea Rossato [Nov 12 10 12:16 ]:
> On Fri, Nov 12, 2010 at 12:43:14AM -0800, John MacFarlane wrote:
> > OK, I've pushed changes to the markdown reader integrating
> > the new format for textual citations.
> > 
> > I've also added a rudimentary test, similar to what Nathan had,
> > but with the new syntax.  However, the test currently fails
> > with a multibyte read error when it tries to access the locale
> > files in citeproc-hs.   Perhaps those need to be encoded as
> > UTF-8?
> 
> I'm not able to reproduce the UTF-8 problem (but I didn't try the
> test-suite: I'm missing Diff). 

cabal install Diff

> I'm quite sure locale files should be
> correctly encoded. I'll double check, also using the test-suite, and
> report back.
> 
> > A couple other issues:
> > 
> > @doe99 [p. 30]
> > 
> > should give you
> > 
> > Doe 1999 (30)
> >
> > but it doesn't work.  The Citation is getting its locator set
> > all right, but the formatted inlines don't include the locator.
> > I think this needs to be addressed in citeproc-hs.
> 
> No, that should give you "Doe (1999, 30)", which is what you seem to
> get: "Doe (2005, 30) says blah." That seems to me the correct
> behavior.

But I don't get that.  What I get is:

Doe (2005) says blah.

(The file in the test suite reflects what I think it should be,
not what pandoc currently gives me.)

Best,
John



* Re: textual citation
       [not found]             ` <4CDD4501.7030700-8UOIJiGH10pyDzI6CaY1VQ@public.gmane.org>
@ 2010-11-12 16:13               ` John MacFarlane
  2010-11-12 23:26               ` Andrea Rossato
  1 sibling, 0 replies; 71+ messages in thread
From: John MacFarlane @ 2010-11-12 16:13 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

> By the way, how should the writers handle multiple AuthorInText in
> one Citation? This question arises because AuthorInText is marked by
> another CitationMode (as I wrongly suggested myself earlier) and not
> by a Bool on the Cite.

I think I raised this question earlier.  Andrea, how does it work now
if you have a list of Citations where some of the later Citations
are marked AuthorInText?  The parser will never give you this, but
I assume a behavior is defined...
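
For the sake of discussion, one possible policy can be sketched as follows. This is an assumption for illustration, not a description of what citeproc-hs currently does: only the leading citation may keep AuthorInText, and any later one is demoted. `CitationMode` is redeclared locally and `normalizeModes` is a hypothetical name.

```haskell
-- Local copy of the mode type so the sketch stands alone.
data CitationMode = AuthorInText | SuppressAuthor | NormalCitation
  deriving (Show, Eq)

-- | Keep the mode of the first citation as it is; demote any later
-- AuthorInText to NormalCitation. (Assumed policy, for illustration.)
normalizeModes :: [CitationMode] -> [CitationMode]
normalizeModes []     = []
normalizeModes (m:ms) = m : map demote ms
  where
    demote AuthorInText = NormalCitation
    demote other        = other
```

A writer that only ever sees normalized groups would then need to check just the head of the list, as `isTextualCitation` in the patch already does.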

John



* Re: textual citation
       [not found]             ` <20101112153829.GG19143-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
@ 2010-11-12 16:16               ` John MacFarlane
  0 siblings, 0 replies; 71+ messages in thread
From: John MacFarlane @ 2010-11-12 16:16 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Andrea Rossato [Nov 12 10 16:38 ]:
> On Fri, Nov 12, 2010 at 12:43:14AM -0800, John MacFarlane wrote:
> > A couple other issues:
> > 
> > @doe99 [p. 30]
> > 
> > should give you
> > 
> > Doe 1999 (30)
> > 
> > but it doesn't work.  The Citation is getting its locator set
> > all right, but the formatted inlines don't include the locator.
> > I think this needs to be addressed in citeproc-hs.
> 
> I found the problem and fixed it (it was a recent regression).
> 
> > Also, I got some bibliography glitches when I added publisher,
> > address, and some other fields to the test bibliography. You can
> > try it out with the markdown-citations.txt, biblio.bib,
> > and chicago-author-date.csl in the tests directory.
> 
> There are some issues here in the MODS parser (on the bibutils side):
> the output for "address" seems to have changed recently.
> 
> There are also some problems in the reference type parsing, which are
> due to the MODS model. I'm trying to address this problem and come
> back with something.
> 
> There are also issues with the styles you are using for testing. I
> believe some of them are still in development, because they show
> problems which others, while using similar variables and producing
> similar output, do not have. That makes me think we must be careful
> also in choosing the styles for testing. We should not confuse a bug
> in the style with a bug in the processor.

Right.  Can you send me a few styles that are well-tested and would
work well for testing?  An author-date, a footnote style, and a
numerical style would be useful.

> I'm going to add the ability to read JSON bibliographic
> databases, and thus set CSL variables directly (which is useful when
> bibtex is not expressive enough). Related to that: the "--biblio" flag
> should be able to take a list of files. What is the best way to
> address the problem? Comma-separated file names in a single String?
> Any ideas?

Another possibility would be to allow the --biblio flag to occur
repeatedly:

pandoc --biblio bib1.bib --biblio bib2.bib ...

That would avoid problems with commas in the filenames.
That's also consistent with the current behavior of flags
like --include-before-body.
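
A rough sketch of the repeated-flag approach with `System.Console.GetOpt` might look like this. The `Opt` record here is a trimmed-down, hypothetical stand-in for the real one in pandoc.hs, and `optBiblioFiles`, `defaultOpt`, and `parseArgs` are illustrative names.

```haskell
import System.Console.GetOpt

-- | Hypothetical, trimmed-down options record (not pandoc's real Opt).
data Opt = Opt { optBiblioFiles :: [FilePath] } deriving (Show, Eq)

defaultOpt :: Opt
defaultOpt = Opt { optBiblioFiles = [] }

-- | Each occurrence of --biblio appends one file, keeping the order
-- in which the flags were given on the command line.
options :: [OptDescr (Opt -> Opt)]
options =
  [ Option "" ["biblio"]
      (ReqArg (\arg opt -> opt { optBiblioFiles = optBiblioFiles opt ++ [arg] })
              "FILENAME")
      "bibliography file (may be given more than once)"
  ]

-- | Apply the parsed flag updates to the defaults, left to right.
parseArgs :: [String] -> Either [String] Opt
parseArgs argv =
  case getOpt Permute options argv of
    (fs, _, [])   -> Right (foldl (flip ($)) defaultOpt fs)
    (_,  _, errs) -> Left errs
```

Appending rather than prepending keeps the files in command-line order, which matters if a second list of formats is later paired with them positionally.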

John



* Re: Re: textual citation
       [not found]         ` <20101112084314.GA15038-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
                             ` (2 preceding siblings ...)
  2010-11-12 15:38           ` Andrea Rossato
@ 2010-11-12 23:23           ` Andrea Rossato
       [not found]             ` <20101112232354.GH19143-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
  2010-11-13  1:11           ` Andrea Rossato
  4 siblings, 1 reply; 71+ messages in thread
From: Andrea Rossato @ 2010-11-12 23:23 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 2024 bytes --]

On Fri, Nov 12, 2010 at 12:43:14AM -0800, John MacFarlane wrote:
> I've also added a rudimentary test, similar to what Nathan had,
> but with the new syntax.  However, the test currently fails
> with a multibyte read error when it tries to access the locale
> files in citeproc-hs.   Perhaps those need to be encoded as
> UTF-8?

In RunTests you need to set the LANG variable. See the attached patch.
I also updated the test to the latest citeproc-hs fixes.
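
As an aside, a different way to avoid depending on LANG is to set the handle encoding explicitly when reading. This is not what the attached patch does (it sets LANG in the test runner), and `readFileUtf8` is a hypothetical helper, shown only as a sketch of the alternative.

```haskell
import System.IO

-- | Read a file as UTF-8 regardless of the current locale settings.
readFileUtf8 :: FilePath -> IO String
readFileUtf8 path = do
  h <- openFile path ReadMode
  hSetEncoding h utf8               -- override the locale-derived default
  contents <- hGetContents h
  length contents `seq` hClose h    -- force the lazy read before closing
  return contents
```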

If you can run the test (after pulling the latest changes from the
citeproc-hs repo), you'll see that the only remaining problem seems to be
the parser not eating some characters (the initial '[' and final ']',
only if the initial is followed by a '-' when in-text, always in
footnotes).

I fixed some of the other issues - I'm sure more are to be found. BTW,
there was a nasty and hidden bug so that "ed." was being printed after
the name of the editor, thus messing up everything. The
chicago-author-date style seems pretty fine.

The patches I'm sending:

 - we cannot use Map, since it uses the Ord instance, while we rely
   on the Eq instance to match only those citations that have the same
   citationHash: this is visible only with footnote styles, when the
   citationNoteNum is being updated during the citation processing
   (great if you can find a better fix);

 - I added support for multiple bibliographic databases: you can also
   mix their formats (this is the cool part, since this way you can use
   JSON for entries you cannot represent with bibtex). I don't know
   if '--biblio-format' is still consistent: any idea?
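
The Map issue in the first point can be reproduced with a toy type. In this sketch `Cit` is a hypothetical stand-in whose Eq instance compares only the hash (like the `Eq Citation` instance in pandoc-types), while its Ord instance is assumed to compare every field; the real Ord instance may differ.

```haskell
import qualified Data.Map as M

-- | Toy citation: equality looks only at the hash, while ordering
-- (an assumption for this sketch) also looks at the note number.
data Cit = Cit { noteNum :: Int, hash :: Int } deriving (Show)

instance Eq Cit where
  a == b = hash a == hash b

instance Ord Cit where
  compare a b = compare (noteNum a, hash a) (noteNum b, hash b)

-- | Table built before processing, when the note number was still 0.
table :: [(Cit, String)]
table = [(Cit 0 42, "formatted")]

-- | Same citation after processing: the note number changed, the hash did not.
key :: Cit
key = Cit 3 42

viaList :: Maybe String
viaList = lookup key table                 -- uses (==): finds the entry

viaMap :: Maybe String
viaMap = M.lookup key (M.fromList table)   -- uses compare: misses it
```

Here `viaList` is `Just "formatted"` and `viaMap` is `Nothing`. An alternative to dropping Map would be to key it on the hash alone, though that is only a suggestion.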

Andrea



[-- Attachment #2: 0013-do-not-use-Map.lookup-which-uses-the-Ord-instance.patch --]
[-- Type: text/plain, Size: 2219 bytes --]

From ded18807b133e8a256c8b1bc1bb38f219084eb5c Mon Sep 17 00:00:00 2001
From: Andrea Rossato <andrea.rossato-/Q1r7N5in3P/wltNWqQaag@public.gmane.org>
Date: Fri, 12 Nov 2010 23:57:18 +0100
Subject: [PATCH 13/16] do not use Map.lookup (which uses the Ord instance)

We rely on the Eq instance to match only the citation group whose
citation hashes match.
---
 src/Text/Pandoc/Biblio.hs |    7 +++----
 1 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/src/Text/Pandoc/Biblio.hs b/src/Text/Pandoc/Biblio.hs
index bca24d8..c334d89 100644
--- a/src/Text/Pandoc/Biblio.hs
+++ b/src/Text/Pandoc/Biblio.hs
@@ -32,7 +32,6 @@ module Text.Pandoc.Biblio ( processBiblio ) where
 import Control.Monad ( when )
 import Data.List
 import Data.Unique
-import qualified Data.Map as M
 import Text.CSL hiding ( Cite(..), Citation(..) )
 import qualified Text.CSL as CSL ( Cite(..) )
 import Text.Pandoc.Definition
@@ -53,19 +52,19 @@ processBiblio cf r p
                                   needNt = cits \\ concat ncits
                               in (,) needNt $ getNoteCitations needNt p'
             result     = citeproc csl r (setNearNote csl $ map (map toCslCite) grps)
-            cits_map   = M.fromList $ zip grps (citations result)
+            cits_map   = zip grps (citations result)
             biblioList = map (renderPandoc' csl) (bibliography result)
             Pandoc m b = processWith (procInlines $ processCite csl cits_map) p'
         return . generateNotes nts . Pandoc m $ b ++ biblioList
 
 -- | Substitute 'Cite' elements with formatted citations.
-processCite :: Style -> M.Map [Citation] [FormattedOutput] -> [Inline] -> [Inline]
+processCite :: Style -> [([Citation],[FormattedOutput])] -> [Inline] -> [Inline]
 processCite _ _ [] = []
 processCite s cs (i:is)
     | Cite t _ <- i = process t ++ processCite s cs is
     | otherwise     = i          : processCite s cs is
     where
-      process t = case M.lookup t cs of
+      process t = case lookup t cs of
                     Just  x -> if isTextualCitation t && x /= []
                                then renderPandoc s [head x] ++ [Space] ++
                                     [Cite t $ renderPandoc s $ tail x]
-- 
1.7.1


[-- Attachment #3: 0014-add-support-fot-multiple-bibliographic-databases.patch --]
[-- Type: text/plain, Size: 2057 bytes --]

From 1fc637673d1fb822f2404e7d6dc9a25cbd688819 Mon Sep 17 00:00:00 2001
From: Andrea Rossato <andrea.rossato-/Q1r7N5in3P/wltNWqQaag@public.gmane.org>
Date: Fri, 12 Nov 2010 23:58:58 +0100
Subject: [PATCH 14/16] add support fot multiple bibliographic databases

---
 src/pandoc.hs |   16 +++++++++++-----
 1 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/src/pandoc.hs b/src/pandoc.hs
index c8c414a..3ce4649 100644
--- a/src/pandoc.hs
+++ b/src/pandoc.hs
@@ -163,8 +163,8 @@ data Opt = Opt
     , optIndentedCodeClasses :: [String] -- ^ Default classes for indented code blocks
     , optDataDir           :: Maybe FilePath
 #ifdef _CITEPROC
-    , optBiblioFile        :: String
-    , optBiblioFormat      :: String
+    , optBiblioFile        :: [String]
+    , optBiblioFormat      :: [String]
     , optCslFile           :: String
 #endif
     }
@@ -522,12 +522,16 @@ options =
 #ifdef _CITEPROC
     , Option "" ["biblio"]
                  (ReqArg
-                  (\arg opt -> return opt { optBiblioFile = arg} )
+                  (\arg opt -> do
+                     let newFile = arg : optBiblioFile opt
+                     return opt { optBiblioFile = newFile} )
                   "FILENAME")
                  ""
     , Option "" ["biblio-format"]
                  (ReqArg
-                  (\arg opt -> return opt { optBiblioFormat = arg} )
+                  (\arg opt -> do
+                     let newFile = arg : optBiblioFormat opt
+                     return opt { optBiblioFormat = newFile} )
                   "STRING")
                  ""
     , Option "" ["csl"]
@@ -747,7 +751,9 @@ main = do
   let standalone' = standalone || isNonTextOutput writerName'
 
 #ifdef _CITEPROC
-  refs <- if null biblioFile then return [] else readBiblioFile biblioFile biblioFormat
+  refs <- if null biblioFile
+          then return []
+          else concat `fmap` mapM (uncurry readBiblioFile) (zip biblioFile $ biblioFormat ++ repeat [])
 #endif
 
   variables' <- case (writerName', standalone', offline) of
-- 
1.7.1


[-- Attachment #4: 0015-RunTest-must-set-the-LANG-environmental-variable-oth.patch --]
[-- Type: text/plain, Size: 1463 bytes --]

From 33694e9d0b99adfb76b138577307a336a4c98f32 Mon Sep 17 00:00:00 2001
From: Andrea Rossato <andrea.rossato-/Q1r7N5in3P/wltNWqQaag@public.gmane.org>
Date: Fri, 12 Nov 2010 23:59:43 +0100
Subject: [PATCH 15/16] RunTest must set the LANG environmental variable otherwise hGetContents will fail on multibyte content

---
 tests/RunTests.hs |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/tests/RunTests.hs b/tests/RunTests.hs
index f582218..4684515 100644
--- a/tests/RunTests.hs
+++ b/tests/RunTests.hs
@@ -114,6 +114,7 @@ main = do
              then mapM runLhsReaderTest lhsReaderFormats
              else putStrLn "Skipping lhs reader tests because they presuppose highlighting support" >> return []
   let results = r1s ++
+
                 [ r2, r3, r4, r5 -- S5
                 , r6, r7, r7a    -- markdown reader
                 , r8, r8a        -- rst
@@ -167,7 +168,8 @@ runTest testname opts inp norm = do
   let normPath = norm
   hFlush stdout
   -- Note: COLUMNS must be set for markdown table reader
-  ph <- runProcess pandocPath (opts ++ [inpPath] ++ ["--data-dir", ".."]) Nothing (Just [("COLUMNS", "80")]) Nothing (Just hOut) (Just stderr)
+  ph <- runProcess pandocPath (opts ++ [inpPath] ++ ["--data-dir", ".."]) Nothing
+        (Just [("LANG","en_US.UTF-8"),("COLUMNS", "80")]) Nothing (Just hOut) (Just stderr)
   ec <- waitForProcess ph
   result  <- if ec == ExitSuccess
                 then do
-- 
1.7.1


[-- Attachment #5: 0016-update-citation-test-to-latest-citeproc-hs-fixes.patch --]
[-- Type: text/plain, Size: 2201 bytes --]

From d84791830e8c274119d96bdf06426327e624e7d0 Mon Sep 17 00:00:00 2001
From: Andrea Rossato <andrea.rossato-/Q1r7N5in3P/wltNWqQaag@public.gmane.org>
Date: Sat, 13 Nov 2010 00:00:41 +0100
Subject: [PATCH 16/16] update citation test to latest citeproc-hs fixes

---
 tests/biblio.bib               |    2 +-
 tests/markdown-citations.plain |   17 +++++++++--------
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/tests/biblio.bib b/tests/biblio.bib
index d395eb5..755d535 100644
--- a/tests/biblio.bib
+++ b/tests/biblio.bib
@@ -20,7 +20,7 @@ author="John Doe and Jenny Roe",
 title="Why Water Is Wet",
 booktitle="Third Book",
 editor="Sam Smith",
-publisher="Oxford University PresS",
+publisher="Oxford University Press",
 address="Oxford",
 year="2007"
 }
diff --git a/tests/markdown-citations.plain b/tests/markdown-citations.plain
index 8521e8c..d734334 100644
--- a/tests/markdown-citations.plain
+++ b/tests/markdown-citations.plain
@@ -5,8 +5,8 @@ Pandoc with citeproc-hs
 
 @nonexistent
 
-Doe (2005) says blah. Doe (2005, 30) says blah. Doe (2005; 2006,
-30; see also Doe and Roe 2007) says blah.
+Doe (2005) says blah. Doe (2005, 30) says blah. Doe (2005) Doe
+(2006), p. 30; see also Doe and Roe (2007) says blah.
 
 In a note.[^1] A citation group
 (see Doe 2005, 34-35; also Doe and Roe 2007, chap. 3). Another one
@@ -17,19 +17,20 @@ Now some modifiers.[^3]
 References
 ==========
 
-Doe, John. 2005. First Book. Cambridge: Cambridge University Press.
+Doe, John. 2005. First Book. Cambridge: Cambridge University
+Press.
 
 ---. 2006. Article. Journal of Generic Studies 6: 33-34.
 
-Doe, John, and Jenny Roe. 2007. Why Water Is Wet. Sam Smith Ed.
-Third Book. Oxford: Oxford University Press.
+Doe, John, and Jenny Roe. 2007. Why Water Is Wet. In Third Book,
+ed. Sam Smith. Oxford: Oxford University Press.
 
 [^1]:
-    A citation without locators [Doe and Roe (2007)].
+    A citation without locators Doe and Roe (2007).
 
 [^2]:
-    Some citations (see Doe 2006, chap. 3; Doe and Roe 2007; Doe
-    2005).
+    Some citations see Doe (2006), chap. 3; Doe and Roe (2007); Doe
+    (2005).
 
 [^3]:
     Like a citation without author: (2005), and now Doe with a

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Re: textual citation
       [not found]             ` <4CDD4501.7030700-8UOIJiGH10pyDzI6CaY1VQ@public.gmane.org>
  2010-11-12 16:13               ` John MacFarlane
@ 2010-11-12 23:26               ` Andrea Rossato
  1 sibling, 0 replies; 71+ messages in thread
From: Andrea Rossato @ 2010-11-12 23:26 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On Fri, Nov 12, 2010 at 02:45:37PM +0100, Nathan Gass wrote:
> By the way, how should the writers handle multiple AuthorInText in
> one Citation? This question arises because AuthorInText is marked by
> another CitationMode (as I wrongly suggested myself earlier) and not
> by a Bool on the Cite.

If a citation with the AuthorInText mode is at the beginning of the
list, then we have a textual citation: in the generated output, the
first element of the list is the label to be placed in-text.

When a citation with the AuthorInText mode is processed inside a
citation group, it is rendered with the name of the author, without
formatting. That didn't change.
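
A minimal sketch of this convention, in Haskell. The types are cut-down
stand-ins and the function name splitTextual is invented for
illustration; neither is the actual pandoc-types or citeproc-hs code,
and the rendered output is modelled as a plain list of strings.

```haskell
-- Simplified stand-ins for the pandoc-types definitions (assumption).
data CitationMode = AuthorInText | SuppressAuthor | NormalCitation
  deriving (Show, Eq)

data Citation = Citation
  { citationId   :: String
  , citationMode :: CitationMode
  } deriving (Show, Eq)

-- | Given a citation group and the rendered output for it, return the
-- label to place in-text (if the group is textual) and the part that
-- stays inside the citation. splitTextual is a hypothetical helper.
splitTextual :: [Citation] -> [String] -> (Maybe String, [String])
splitTextual (c : _) (label : rest)
  | citationMode c == AuthorInText = (Just label, rest)
splitTextual _ rendered = (Nothing, rendered)
```

Only a leading AuthorInText citation triggers the split; any other
group is passed through unchanged.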

Andrea


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Re: textual citation
       [not found]         ` <20101112084314.GA15038-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
                             ` (3 preceding siblings ...)
  2010-11-12 23:23           ` Andrea Rossato
@ 2010-11-13  1:11           ` Andrea Rossato
       [not found]             ` <20101113011105.GJ19143-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
  4 siblings, 1 reply; 71+ messages in thread
From: Andrea Rossato @ 2010-11-13  1:11 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On Fri, Nov 12, 2010 at 12:43:14AM -0800, John MacFarlane wrote:
> OK, I've pushed changes to the markdown reader integrating
> the new format for textual citations.

I did not read the parser, but there are a few problems. I created
this example which should expose some of them.

Andrea

# Try Pandoc with Citeproc-hs

A working in-text citation @item2 [@item3], and a non working in-text
citation @item1 [@item3; @item1]

Another non working citation [@item1], and a working one, thanks to the
locator [@item3, p. 3]. And then a non working citation [@item2;
@item3; @item1] followed by a non working one [@item3; @item2, p. 3].

See the note.[^1]

# References

[^1]:
     Something left behind: [@item2].


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: textual citation
       [not found]             ` <20101112232354.GH19143-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
@ 2010-11-13  2:56               ` John MacFarlane
       [not found]                 ` <20101113025645.GA25386-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: John MacFarlane @ 2010-11-13  2:56 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Andrea Rossato [Nov 13 10 00:23 ]:
> On Fri, Nov 12, 2010 at 12:43:14AM -0800, John MacFarlane wrote:
> > I've also added a rudimentary test, similar to what Nathan had,
> > but with the new syntax.  However, the test currently fails
> > with a multibyte read error when it tries to access the locale
> > files in citeproc-hs.   Perhaps those need to be encoded as
> > UTF-8?
> 
> In RunTest you need to set the LANG variable. See the attached patch.
> I also updated the test to the latest citeproc-hs fixes.

Thanks.

> If you can run the test (after pulling the latest changes from the
> citeproc-hs repo), you'll see that the only remaining problem seems to be
> the parser not eating some characters (the initial '[' and final ']',
> only if the initial is followed by a '-' when in-text, always in
> footnotes).

I didn't agree with all of your changes to the test.  In particular,
it seems to me that

@item1 [-@item2, p. 30; see also @item3] says blah.

should turn into:

Doe (2005; 2006, 30; see also Doe and Roe 2007) says blah.

not (as you propose):

Doe (2005) Doe (2006), p. 30; see also Doe and Roe (2007) says blah.

Also, in n. 1 we should have:

   A citation without locators (Doe and Roe 2007).

not:

   A citation without locators Doe and Roe (2007).

And in n. 2 we should have:

    Some citations (see Doe 2006, chap. 3; Doe and Roe 2007; Doe
    2005).

not:

    Some citations see Doe (2006), chap. 3; Doe and Roe (2007); Doe
    (2005).

So far, I've only committed the changes I agree with.  But I'm open
to being convinced about the others...

Having a good standard copy of the CSLs would help, too.

> I fixed some of the other issues - I'm sure more are to be found. BTW,
> there was a nasty and hidden bug so that "ed." was being printed after
> the name of the editor, thus messing up everything. The
> chicago-author-date style seems pretty fine.
> 
> The patches I'm sending:
> 
>  - we cannot use Map, since it uses the Ord instance, while we rely
>    on the Eq instance to match only those citations that have the same
>    citationHash: this is visible only with footnote styles, when the
>    citationNoteNumber is being updated during the citation processing
>    (great if you can find a better fix);

Oh, shoot.  Where exactly does == get used on Citations in the citeproc
code?  If it's just in a couple places, we could just introduce a
special purpose equivalence relation (x `hashesMatch` y).  I'd like
to keep the Map if we can do it without too much trouble.
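
A rough sketch of what such a relation could look like. The Citation
type here is a cut-down stand-in (assumption), lookupByHash is a
hypothetical helper, and the key is a single Citation rather than the
list of citations that processCite actually looks up.

```haskell
import Data.List (find)

-- Cut-down Citation carrying only the fields needed here (assumption).
data Citation = Citation
  { citationId      :: String
  , citationNoteNum :: Int
  , citationHash    :: Int
  } deriving (Show, Eq)

-- | Two citations count as "the same" when their hashes agree,
-- regardless of the other fields.
hashesMatch :: Citation -> Citation -> Bool
hashesMatch x y = citationHash x == citationHash y

-- | Association-list lookup that compares keys with 'hashesMatch'
-- instead of the derived '(==)'. Hypothetical helper.
lookupByHash :: Citation -> [(Citation, a)] -> Maybe a
lookupByHash c = fmap snd . find (hashesMatch c . fst)
```

With this, a citation whose citationNoteNum was updated during
processing would still find its entry, while the derived (==) stays
structural.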

>  - I added support for multiple bibliographic databases: you can also
>    mix their formats (this is the cool part since this way you can use
>    json for entries you cannot represent with bibtex). I don't know
>    if '--biblio-format' is still consistent: any idea?

Where does --biblio-format get used?  I never needed to use it in
my tests -- it just seemed to recognize that the bibliography was
bibtex.  I'd prefer a solution that removed --biblio-format altogether.
If we still need to supply a biblio format, we could just determine
it from the file extension, perhaps.

I like the treatment of --biblio-file in this patch, but I don't
think this way of handling multiple biblio-file together with
multiple biblio-format is robust.
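
To illustrate the concern, here is a small pure model of what the patch
does. It assumes that options are applied left to right and that both
lists start out empty; collect and pairUp are names made up for this
sketch, not functions in pandoc.hs.

```haskell
-- | Mimic the option handlers in the patch: each new argument is
-- prepended to the list collected so far (assumed to start empty).
collect :: [String] -> [String]
collect = foldl (flip (:)) []

-- | Pair files with formats by position, the way the patched 'main'
-- does, padding missing formats with the empty string.
pairUp :: [FilePath] -> [String] -> [(FilePath, String)]
pairUp files formats = zip files (formats ++ repeat "")
```

Under those assumptions, the command line
'--biblio a.xml --biblio-format mods --biblio b.bib' is modelled by
pairUp (collect ["a.xml", "b.bib"]) (collect ["mods"]), which attaches
"mods" to b.bib rather than to a.xml: the pairing depends only on
position in the two lists, not on where the options appeared relative
to each other.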

John


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: textual citation
       [not found]             ` <20101113011105.GJ19143-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
@ 2010-11-13  3:38               ` John MacFarlane
       [not found]                 ` <20101113033806.GA27595-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: John MacFarlane @ 2010-11-13  3:38 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Andrea Rossato [Nov 13 10 02:11 ]:
> On Fri, Nov 12, 2010 at 12:43:14AM -0800, John MacFarlane wrote:
> > OK, I've pushed changes to the markdown reader integrating
> > the new format for textual citations.
> 
> I did not read the parser, but there are a few problems. I created
> this example which should expose some of them.

Turns out there were many problems with the parser! I've fixed the ones I
found. The tests now pass, except for the issue about @nonexistent.
More tests would be good.

A couple of issues:

1. What to do about nonexistent keys?  Currently the parser checks
keys only when they're in initial in-text position (i.e. not within
brackets). That's why, in the tests, '@nonexistent' comes across as
'@nonexistent', while '[@nonexistent]' gives 'Anon. (error)' or something like
that.

Of course, it seems inconsistent to check keys when they don't occur
within []s, but not when they do.  The rationale was this:

* If I don't check keys outside of brackets, we may get things treated
as citations when they shouldn't be.  Maybe this isn't a great concern;
but then again, people do sometimes use @ at the beginning of a word.

* If I do check all the keys inside of brackets, then there's the
question what to do if some are found but not others.  Do we return
a Cite, or not?  What do we do about the non-found keys?

So I wasn't sure what to do here.

2. As we discussed before, I'd like to make prefix and locator have type
[Inline] rather than String.  Is this doable on the citeproc-hs side?
I think it would have many advantages (and, keeping them as raw strings
has some disadvantages).

John


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Re: textual citation
       [not found]                 ` <20101113025645.GA25386-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2010-11-13  9:16                   ` Andrea Rossato
       [not found]                     ` <20101113091616.GK19143-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Andrea Rossato @ 2010-11-13  9:16 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On Fri, Nov 12, 2010 at 06:56:45PM -0800, John MacFarlane wrote:
> I didn't agree with all of your changes to the test.  In particular,
> it seems to me that
> 
> @item1 [-@item2, p. 30; see also @item3] says blah.
> 
> should turn into:
> 
> Doe (2005; 2006, 30; see also Doe and Roe 2007) says blah.


yea, you are right.

> 
> Also, in n. 1 we should have:
> 
>    A citation without locators (Doe and Roe 2007).

yes indeed.

> And in n. 2 we should have:
> 
>     Some citations (see Doe 2006, chap. 3; Doe and Roe 2007; Doe
>     2005).

yes, once again.

> So far, I've only committed the changes I agree with.  But I'm open
> to being convinced about the others...

sorry about the mess: it was very late, I was tired and the parser was
confusing me.

> Oh, shoot.  Where exactly does == get used on Citations in the citeproc
> code?  If it's just in a couple places, we could just introduce a
> special purpose equivalence relation (x `hashesMatch` y).  I'd like
> to keep the Map if we can do it without too much trouble.

Data.List.lookup uses it: it does the trick in Biblio.processCite.
Nowhere else.

So, if you can match a map key using only the hash, that would be fine
for me (to see if it works just check with a footnote style).

> >  - I added support for multiple bibliographic databases: you can also
> >    mix their formats (this is the cool part since this way you can use
> >    json for entries you cannot represent with bibtex). I don't know
> >    if '--biblio-format' is still consistent: any idea?
> 
> Where does --biblio-format get used?  I never needed to use it in
> my tests -- it just seemed to recognize that the bibliography was
> bibtex.  I'd prefer a solution that removed --biblio-format altogether.
> If we still need to supply a biblio format, we could just determine
> it from the file extension, perhaps.
> 
> I like the treatment of --biblio-file in this patch, but I don't
> think this way of handling multiple biblio-file together with
> multiple biblio-format is robust.

I agree, this is why I was asking. Citeproc uses the file extension to
determine how to read the bibliographic database. '--biblio-format'
would override the parsing of the file extension. If you want to get
rid of it, I'm in favor: if the extension is not recognized we just
print a clear message.
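
A sketch of the extension-only approach, with an explicit error for
unknown extensions. The BiblioFormat type, the set of extensions, and
the name formatFromPath are all assumptions made for illustration; the
real list of supported formats lives in citeproc-hs.

```haskell
import Data.Char (toLower)
import System.FilePath (takeExtension)

-- Hypothetical set of formats, for illustration only.
data BiblioFormat = BibTeX | Json | Mods
  deriving (Show, Eq)

-- | Pick the format from the file extension alone; an unrecognized
-- extension yields a message suitable for printing to the user.
formatFromPath :: FilePath -> Either String BiblioFormat
formatFromPath path =
  case map toLower (takeExtension path) of
    ".bib"  -> Right BibTeX
    ".json" -> Right Json
    ".mods" -> Right Mods
    ext     -> Left ("Unrecognized bibliography extension "
                     ++ show ext ++ " in " ++ path)
```

Because the format is derived per file, several databases in different
formats can be mixed without any positional pairing of options.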

Maybe '--biblio' should become '--bibliography', which is a bit more
descriptive and intuitive? Anyway, I'd leave this kind of stylistic
correction to a native English speaker.

Andrea


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Re: Re: textual citation
       [not found]                     ` <20101113091616.GK19143-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
@ 2010-11-13 11:40                       ` Andrea Rossato
       [not found]                         ` <20101113114018.GM19143-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
  2010-11-13 16:50                       ` John MacFarlane
  1 sibling, 1 reply; 71+ messages in thread
From: Andrea Rossato @ 2010-11-13 11:40 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 1398 bytes --]

On Sat, Nov 13, 2010 at 10:16:16AM +0100, Andrea Rossato wrote:
> On Fri, Nov 12, 2010 at 06:56:45PM -0800, John MacFarlane wrote:
> > Oh, shoot.  Where exactly does == get used on Citations in the citeproc
> > code?  If it's just in a couple places, we could just introduce a
> > special purpose equivalence relation (x `hashesMatch` y).  I'd like
> > to keep the Map if we can do it without too much trouble.
> 
> Data.List.lookup uses it: it does the trick in Biblio.processCite.
> Nowhere else.
> 
> So, if you can match a map key using only the hash, that would be fine
> for me (to see if it works just check with a footnote style).

I'm quite aware of the fact that I should be thinking twice before
opening my mouth or, as the present case teaches, I should at least be
listening more carefully to what I'm saying...

I'm attaching the proper fix for the use of maps: a proper Ord
instance instead of an Eq one!
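
The reason this works is that Data.Map locates keys with compare, not
with (==). A minimal sketch, using a cut-down Citation type and a
single-citation key (both simplifications for illustration):

```haskell
import qualified Data.Map as M

-- Cut-down Citation with only the fields relevant here (assumption).
data Citation = Citation
  { citationId      :: String
  , citationNoteNum :: Int
  , citationHash    :: Int
  } deriving (Show, Eq)

-- Order citations by hash only, mirroring the attached patch.
instance Ord Citation where
  compare a b = compare (citationHash a) (citationHash b)

-- | A toy table from citation to rendered output.
rendered :: M.Map Citation String
rendered = M.fromList [(Citation "item1" 0 42, "Doe 2005")]
```

Here M.lookup (Citation "item1" 3 42) rendered still finds the entry
even though the note number differs. One thing to keep in mind with
this design is that the derived Eq and the hash-only Ord no longer
agree with each other.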

Andrea

ps: obviously the patch is against pandoc-types.


[-- Attachment #2: 0001-define-Ord-instead-of-Eq-since-we-use-maps-from-Data.patch --]
[-- Type: text/plain, Size: 1211 bytes --]

From e9c3270b559b68a7001f30642a1d68a651d1e889 Mon Sep 17 00:00:00 2001
From: Andrea Rossato <andrea.rossato-/Q1r7N5in3P/wltNWqQaag@public.gmane.org>
Date: Sat, 13 Nov 2010 12:35:26 +0100
Subject: [PATCH] define Ord instead of Eq since we use maps from Data.Map

---
 Text/Pandoc/Definition.hs |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/Text/Pandoc/Definition.hs b/Text/Pandoc/Definition.hs
index 3b8c7e6..1f6703c 100644
--- a/Text/Pandoc/Definition.hs
+++ b/Text/Pandoc/Definition.hs
@@ -136,11 +136,11 @@ data Citation = Citation { citationId      :: String
                          , citationNoteNum :: Int
                          , citationHash    :: Int
                          }
-                deriving (Show, Ord, Read, Typeable, Data)
+                deriving (Show, Eq, Read, Typeable, Data)
 
-instance Eq Citation where
-    (==) (Citation _ _ _ _ _ ha)
-         (Citation _ _ _ _ _ hb) = ha == hb
+instance Ord Citation where
+    compare (Citation _ _ _ _ _ ha)
+            (Citation _ _ _ _ _ hb) = compare ha hb
 
 data CitationMode = AuthorInText | SuppressAuthor | NormalCitation
                     deriving (Show, Eq, Ord, Read, Typeable, Data)

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Re: textual citation
       [not found]                 ` <20101113033806.GA27595-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2010-11-13 13:15                   ` Andrea Rossato
       [not found]                     ` <20101113131538.GO19143-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
  2010-11-16 23:17                   ` Nathan Gass
  1 sibling, 1 reply; 71+ messages in thread
From: Andrea Rossato @ 2010-11-13 13:15 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On Fri, Nov 12, 2010 at 07:38:06PM -0800, John MacFarlane wrote:
> A couple of issues:
> 
> 1. What to do about nonexistent keys?  Currently the parser checks
> keys only when they're in initial in-text position (i.e. not within
> brackets). That's why, in the tests, '@nonexistent' comes across as
> '@nonexistent', while '[@nonexistent]' gives 'Anon. (error)' or something like
> that.
> 
> Of course, it seems inconsistent to check keys when they don't occur
> within []s, but not when they do.  The rationale was this:
> 
> * If I don't check keys outside of brackets, we may get things treated
> as citations when they shouldn't be.  Maybe this isn't a great concern;
> but then again, people do sometimes use @ at the beginning of a word.
> 
> * If I do check all the keys inside of brackets, then there's the
> question what to do if some are found but not others.  Do we return
> a Cite, or not?  What do we do about the non-found keys?
> 
> So I wasn't sure what to do here.

The citeproc side will check whether the citation has a corresponding
bibliographic reference: if not, an empty reference with the title set
to "citationId was not found!" is generated. No other variable is set,
which is why you get an "Anon." with chicago-author-date (the term
is used when the 'author' variable is not set). Maybe a fake but
descriptive author field should be set too.

If you run the example with the apa-x.csl style that comes with the
test-suite, you'll see that for:

    [@nonexistent]

you get:

    (“nonexistent not found!,” nd)

My opinion is that either we choose never to check or we always check:
either way you have a way of spotting errors. The present
implementation seems a bit confusing to me.
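
As a sketch of the "always resolve, never drop" behaviour described
above: the Reference type and the resolve function below are simplified
stand-ins invented for illustration, not the citeproc-hs API.

```haskell
import qualified Data.Map as M
import Data.Maybe (fromMaybe)

-- Minimal stand-in for a bibliographic reference (assumption).
data Reference = Reference
  { refAuthor :: Maybe String
  , refTitle  :: String
  } deriving (Show, Eq)

-- | Always resolve a key. A missing key yields a placeholder with no
-- author and a title naming the key, so the error is visible in the
-- rendered output. Hypothetical helper.
resolve :: M.Map String Reference -> String -> Reference
resolve refs key = fromMaybe placeholder (M.lookup key refs)
  where
    placeholder = Reference Nothing (key ++ " not found!")
```

Since the placeholder has no author, an author-date style would fall
back to its term for a missing author, which matches the "Anon."
seen with chicago-author-date.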

> 
> 2. As we discussed before, I'd like to make prefix and locator have type
> [Inline] rather than String.  Is this doable on the citeproc-hs side?
> I think it would have many advantages (and, keeping them as raw strings
> has some disadvantages).

This is fine for me, even though I think we will have to wrap the
[Inline] in some type before passing it to the processor:

data Affixes = PandocString [Inline]
             | PlainString  String

I'd prefer you to parse the suffix (which should be an [Inline]). I
think we should have a single locator between commas:

    [see @item, p. 1, suffix;]

'see' and 'suffix' would be [Inline], 'p. 1' a string to be passed to
parseLocator. It shouldn't be difficult, should it?

Andrea



^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: textual citation
       [not found]                     ` <20101113131538.GO19143-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
@ 2010-11-13 16:27                       ` John MacFarlane
       [not found]                         ` <20101113162702.GC1212-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: John MacFarlane @ 2010-11-13 16:27 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Andrea Rossato [Nov 13 10 14:15 ]:
> On Fri, Nov 12, 2010 at 07:38:06PM -0800, John MacFarlane wrote:
> > A couple of issues:
> > 
> > 1. What to do about nonexistent keys?  Currently the parser checks
> > keys only when they're in initial in-text position (i.e. not within
> > brackets). That's why, in the tests, '@nonexistent' comes across as
> > '@nonexistent', while '[@nonexistent]' gives 'Anon. (error)' or something like
> > that.
> > 
> > Of course, it seems inconsistent to check keys when they don't occur
> > within []s, but not when they do.  The rationale was this:
> > 
> > * If I don't check keys outside of brackets, we may get things treated
> > as citations when they shouldn't be.  Maybe this isn't a great concern;
> > but then again, people do sometimes use @ at the beginning of a word.
> > 
> > * If I do check all the keys inside of brackets, then there's the
> > question what to do if some are found but not others.  Do we return
> > a Cite, or not?  What do we do about the non-found keys?
> > 
> > So I wasn't sure what to do here.
> 
> The citeproc side will check if the citation has a corresponding
> bibliographic reference: if not an empty reference with the title set
> to "citationId was not found!" is generated. No other variable is set,
> this is why you get an "Anon." with the chicago-author-date (the term
> is used when the 'author' variable is not set). Maybe some fake but
> descriptive author's field should be set too.
> 
> If you run the example with the apa-x.csl style that comes with the
> test-suite, you'll see that for:
> 
>     [@nonexistent]
> 
> you get:
> 
>     (“nonexistent not found!,” nd)
> 
> My opinion is that either we choose never to check or we always check:
> either way you have a way of spotting errors. The present
> implementation seems a bit confusing to me.
> 
> > 
> > 2. As we discussed before, I'd like to make prefix and locator have type
> > [Inline] rather than String.  Is this doable on the citeproc-hs side?
> > I think it would have many advantages (and, keeping them as raw strings
> > has some disadvantages).
> 
> This is fine for me, even though I think we will have to wrap the
> [Inline] in some type before passing it to the processor:
> 
> data Affixes = PandocString [Inline]
>              | PlainString  String
> 
> I'd prefer you to parse the suffix (which should be an [Inline]). I
> think we should have a single locator between commas:
> 
>     [see @item, p. 1, suffix;]
> 
> 'see' and 'suffix' would be [Inline], 'p. 1' a string to be passed to
> parseLocator. It shouldn't be difficult, should it?

How would the parser distinguish the locator from the suffix?
Just splitting on commas won't work, because:

- sometimes there will be multiple commas in the locator
  [see @item, pp. 33-35, 38, 250n3] (can citeproc even handle
  this kind of case?)
- sometimes there will be a suffix but no locator
  [see @item, which was retracted in 2004]

I don't see how the parser can reliably separate locator from
suffix without knowing e.g. that 'pp.' means 'pages'. And I'd like to keep
this kind of locale-specific stuff in citeproc-hs if possible.

One possible solution would be to introduce a distinction in
punctuation:

[prefix @key: locator, suffix]
[prefix @key: locator only]
[prefix @key, suffix only]

But this leaves us with problems in the case where the locator
itself contains commas ('pp. 33, 35, 38').  I guess I need to know
how citeproc deals with such cases.
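
A sketch of the punctuation rule above, operating on the raw text that
follows the key. The Tail type and parseTail are hypothetical names,
and locator and suffix are kept as plain strings here rather than
[Inline].

```haskell
import Data.Char (isSpace)

-- | What follows the key in a citation: an optional locator and an
-- optional suffix, both as raw strings in this sketch.
data Tail = Tail
  { tailLocator :: Maybe String
  , tailSuffix  :: Maybe String
  } deriving (Show, Eq)

-- | Strip leading and trailing whitespace.
trim :: String -> String
trim = dropWhile isSpace . reverse . dropWhile isSpace . reverse

-- | Split the text after the key: ':' introduces a locator, which runs
-- up to the first ','; a ',' introduces the suffix.
parseTail :: String -> Tail
parseTail (':' : rest) =
  case break (== ',') rest of
    (loc, ',' : suf) -> Tail (Just (trim loc)) (Just (trim suf))
    (loc, _)         -> Tail (Just (trim loc)) Nothing
parseTail (',' : rest) = Tail Nothing (Just (trim rest))
parseTail _            = Tail Nothing Nothing
```

The comma problem shows up directly: parseTail ": pp. 33, 35, 38" gives
the locator "pp. 33" and the suffix "35, 38", so a locator containing
commas is cut at the first one.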

Another idea would be to have pandoc treat everything as 'suffix',
and not use the locator at all.  This would have the drawback that
style-specific ways of printing the locator (e.g. whether 'pp.'
is used for pages) would have no effect. But this is more or less
how it works in natbib and biblatex.

Any ideas?

John



^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: textual citation
       [not found]                     ` <20101113091616.GK19143-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
  2010-11-13 11:40                       ` Andrea Rossato
@ 2010-11-13 16:50                       ` John MacFarlane
  1 sibling, 0 replies; 71+ messages in thread
From: John MacFarlane @ 2010-11-13 16:50 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Andrea Rossato [Nov 13 10 10:16 ]:
> > Oh, shoot.  Where exactly does == get used on Citations in the citeproc
> > code?  If it's just in a couple places, we could just introduce a
> > special purpose equivalence relation (x `hashesMatch` y).  I'd like
> > to keep the Map if we can do it without too much trouble.
> 
> Data.List.lookup uses it: it does the trick in Biblio.processCite.
> Nowhere else.
> 
> So, if you can match a map key using only the hash, that would be fine
> for me (to see if it works just check with a footnote style).

What I meant to ask was: where in the citeproc code do you rely on the fact
that you've defined Eq for Citation so that only the citationHash matters?

> > >  - I added support for multiple bibliographic databases: you can also
> > >    mix their formats (this is the cool part since this way you can use
> > >    json for entries you cannot represent with bibtex). I don't know
> > >    if '--biblio-format' is still consistent: any idea?
> > 
> > Where does --biblio-format get used?  I never needed to use it in
> > my tests -- it just seemed to recognize that the bibliography was
> > bibtex.  I'd prefer a solution that removed --biblio-format altogether.
> > If we still need to supply a biblio format, we could just determine
> > it from the file extension, perhaps.
> > 
> > I like the treatment of --biblio-file in this patch, but I don't
> > think this way of handling multiple biblio-file together with
> > multiple biblio-format is robust.
> 
> I agree, this is why I was asking. Citeproc uses the file extension to
> determine how to read the bibliographic database. '--biblio-format'
> would override the parsing of the file extension. If you want to get
> rid of it I'd favor it: if the extension is not recognized we just
> print a clear message.
> 
> Maybe '--biblio' should become '--bibliography', which is a bit more
> descriptive and intuitive? Anyway, I'd leave this kind of stylistic
> corrections to a native English speaker.

Yes, that's a good idea.  The option parser will accept an unambiguous
prefix, so you'll still be able to write '--biblio' or even '--bib'
if you want.  Also, let's get rid of '--biblio-format'.

I'm about to push some patches that clean all of this up, and make
error messages for missing bibliographies clearer.

Eventually it would be good to remove the biblioFormat parameter from
readBiblioFile in citeproc. (Currently I'm just passing it an empty string.)

John


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: textual citation
       [not found]                         ` <20101113162702.GC1212-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2010-11-13 16:55                           ` Bruce
  2010-11-14 13:26                           ` Andrea Rossato
  1 sibling, 0 replies; 71+ messages in thread
From: Bruce @ 2010-11-13 16:55 UTC (permalink / raw)
  To: pandoc-discuss



On Nov 13, 11:27 am, John MacFarlane <fiddlosop...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> +++ Andrea Rossato [Nov 13 10 14:15 ]:
>
>
>
> > On Fri, Nov 12, 2010 at 07:38:06PM -0800, John MacFarlane wrote:
> > > A couple of issues:
>
> > > 1. What to do about nonexistent keys?  Currently the parser checks
> > > keys only when they're in initial in-text position (i.e. not within
> > > brackets). That's why, in the tests, '@nonexistent' comes across as
> > > '@nonexistent', while '[@nonexistent]' gives 'Anon. (error)' or something like
> > > that.
>
> > > Of course, it seems inconsistent to check keys when they don't occur
> > > within []s, but not when they do.  The rationale was this:
>
> > > * If I don't check keys outside of brackets, we may get things treated
> > > as citations when they shouldn't be.  Maybe this isn't a great concern;
> > > but then again, people do sometimes use @ at the beginning of a word.
>
> > > * If I do check all the keys inside of brackets, then there's the
> > > question what to do if some are found but not others.  Do we return
> > > a Cite, or not?  What do we do about the non-found keys?
>
> > > So I wasn't sure what to do here.
>
> > The citeproc side will check if the citation has a corresponding
> > bibliographic reference: if not an empty reference with the title set
> > to "citationId was not found!" is generated. No other variable is set,
> > this is why you get an "Anon." with the chicago-author-date (the term
> > is used when the 'author' variable is not set). Maybe some fake but
> > descriptive author's field should be set too.
>
> > If you run the example with the apa-x.csl style that comes with the
> > test-suite, you'll see that for:
>
> >     [@nonexistent]
>
> > you get:
>
> >     (“nonexistent not found!,” nd)
>
> > My opinion is that either we choose never to check or we always check:
> > either way you have a way of spotting errors. The present
> > implementation seems a bit confusing to me.
>
> > > 2. As we discussed before, I'd like to make prefix and locator have type
> > > [Inline] rather than String.  Is this doable on the citeproc-hs side?
> > > I think it would have many advantages (and, keeping them as raw strings
> > > has some disadvantages).
>
> > This is fine for me, even though I think we will have to wrap the
> > [Inline] in some type before passing it to the processor:
>
> > data Affixes = PandocString [Inline]
> >              | PlainString  String
>
> > I'd prefer you to parse the suffix (which should be an [Inline]). I
> > think we should have a single locator between commas:
>
> >     [see @item, p. 1, suffix;]
>
> > 'see' and 'suffix' would be [Inline], 'p. 1' a string to be passed to
> > parseLocator. It shouldn't be difficult, should it?
>
> How would the parser distinguish the locator from the suffix?
> Just splitting on commas won't work, because:
>
> - sometimes there will be multiple commas in the locator
>   [see @item, pp. 33-35, 38, 250n3] (can citeproc even handle
>   this kind of case?)
> - sometimes there will be a suffix but no locator
>   [see @item, which was retracted in 2004]
>
> I don't see how the parser can reliably separate locator from
> suffix without knowing e.g. that 'pp.' means 'pages'. And I'd like to keep
> this kind of locale-specific stuff in citeproc-hs if possible.
>
> One possible solution would be to introduce a distinction in
> punctuation:
>
> [prefix @key: locator, suffix]
> [prefix @key: locator only]
> [prefix @key, suffix only]
>
> But this leaves us with problems in the case where the locator
> itself contains commas ('pp. 33, 35, 38').  I guess I need to know
> how citeproc deals with such cases.
>
> Another idea would be to have pandoc treat everything as 'suffix',
> and not use the locator at all.  This would have the drawback that
> style-specific ways of printing the locator (e.g. whether 'pp.'
> is used for pages) would have no effect. But this is more or less
> how it works in natbib and biblatex.

I have no comment on the details (haven't had the time to follow the
conversation in depth), but I do consider it VERY important to allow
citeproc to handle style/locale-specific formatting of locators. CSL
is designed to handle these details, and so to know, for example, that
page-number locators should have a shortened prefix label for pages
in a particular style, and in turn to pull in the string appropriate
to the particular locale.

Bruce




^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: textual citation
       [not found]                         ` <20101113114018.GM19143-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
@ 2010-11-13 16:57                           ` John MacFarlane
  0 siblings, 0 replies; 71+ messages in thread
From: John MacFarlane @ 2010-11-13 16:57 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Good, pushed. You can ignore my last question about this; this clears things
up for me.

John

+++ Andrea Rossato [Nov 13 10 12:40 ]:
> On Sat, Nov 13, 2010 at 10:16:16AM +0100, Andrea Rossato wrote:
> > On Fri, Nov 12, 2010 at 06:56:45PM -0800, John MacFarlane wrote:
> > > Oh, shoot.  Where exactly does == get used on Citations in the citeproc
> > > code?  If it's just in a couple places, we could just introduce a
> > > special purpose equivalence relation (x `hashesMatch` y).  I'd like
> > > to keep the Map if we can do it without too much trouble.
> > 
> > Data.List.lookup uses it: it does the trick in Biblio.processCite.
> > Nowhere else.
> > 
> > So, if you can match a map key using only the hash, that would be fine
> > for me (to see if it works just check with a footnote style).
> 
> I'm quite aware of the fact that I should be thinking twice before
> opening my mouth or, as the present case teaches, I should at least be
> listening more carefully to what I'm saying...
> 
> I'm attaching the proper fix for the use of maps: a proper Ord
> > instance instead of an Eq one!
> 
> Andrea
> 
> ps: obviously the patch is against pandoc-types.
> 
> 

> From e9c3270b559b68a7001f30642a1d68a651d1e889 Mon Sep 17 00:00:00 2001
> From: Andrea Rossato <andrea.rossato-/Q1r7N5in3P/wltNWqQaag@public.gmane.org>
> Date: Sat, 13 Nov 2010 12:35:26 +0100
> Subject: [PATCH] define Ord instead of Eq since we use maps from Data.Map
> 
> ---
>  Text/Pandoc/Definition.hs |    8 ++++----
>  1 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/Text/Pandoc/Definition.hs b/Text/Pandoc/Definition.hs
> index 3b8c7e6..1f6703c 100644
> --- a/Text/Pandoc/Definition.hs
> +++ b/Text/Pandoc/Definition.hs
> @@ -136,11 +136,11 @@ data Citation = Citation { citationId      :: String
>                           , citationNoteNum :: Int
>                           , citationHash    :: Int
>                           }
> -                deriving (Show, Ord, Read, Typeable, Data)
> +                deriving (Show, Eq, Read, Typeable, Data)
>  
> -instance Eq Citation where
> -    (==) (Citation _ _ _ _ _ ha)
> -         (Citation _ _ _ _ _ hb) = ha == hb
> +instance Ord Citation where
> +    compare (Citation _ _ _ _ _ ha)
> +            (Citation _ _ _ _ _ hb) = compare ha hb
>  
>  data CitationMode = AuthorInText | SuppressAuthor | NormalCitation
>                      deriving (Show, Eq, Ord, Read, Typeable, Data)
> -- 
> 1.7.1
> 
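
Data.Map orders and looks up its keys through the Ord instance, which
is why the patch above moves the hash comparison from Eq to Ord. A
minimal sketch of the idea follows. The Cit type, the rendered map and
lookupByHash are hypothetical stand-ins, not pandoc-types code. Unlike
the patch, the sketch also defines Eq on the hash so that the two
instances agree; that is a choice made for the illustration only.

```haskell
import qualified Data.Map as M

-- Hypothetical, trimmed-down stand-in for pandoc's Citation record.
data Cit = Cit
  { citId   :: String
  , citHash :: Int
  } deriving (Show)

-- Equality and ordering both look only at the hash, so the two
-- instances agree (compare a b == EQ exactly when a == b).
instance Eq Cit where
  a == b = citHash a == citHash b

instance Ord Cit where
  compare a b = compare (citHash a) (citHash b)

-- A map from citations to rendered text, keyed via the Ord instance.
rendered :: M.Map Cit String
rendered = M.fromList
  [ (Cit "item1" 1, "(Doe 2005)")
  , (Cit "item2" 2, "(Doe and Roe 2007)")
  ]

-- Lookup succeeds for any citation carrying the same hash,
-- whatever its other fields contain.
lookupByHash :: Int -> Maybe String
lookupByHash h = M.lookup (Cit "" h) rendered
```

Here lookupByHash 2 finds the entry stored under "item2" even though
the probe key has an empty id, because only the hash is compared.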


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Re: textual citation
       [not found]                         ` <20101113162702.GC1212-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  2010-11-13 16:55                           ` Bruce
@ 2010-11-14 13:26                           ` Andrea Rossato
       [not found]                             ` <20101114132646.GR19143-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
  1 sibling, 1 reply; 71+ messages in thread
From: Andrea Rossato @ 2010-11-14 13:26 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 2206 bytes --]

On Sat, Nov 13, 2010 at 08:27:02AM -0800, John MacFarlane wrote:
> One possible solution would be to introduce a distinction in
> punctuation:
> 
> [prefix @key: locator, suffix]
> [prefix @key: locator only]
> [prefix @key, suffix only]
> 
> But this leaves us with problems in the case where the locator
> itself contains commas ('pp. 33, 35, 38').  I guess I need to know
> how citeproc deals with such cases.

As Bruce said, we MUST implement proper locator support, and I think
the solution you proposed is a nice one. The problem you are talking
about is not a difficult one: a locator is made of a locator label
(only one is supported in CSL) and a numeric value (a number or a list
of ranges). So after the label you'll only find 'words' (in the
Data.List.words sense), which have at least some digit: 64-68, 204n,
P124N21-P124N27, etc.

I'm attaching a patch to show you, with list processing
(Biblio.parseLocator') how the problem could be addressed (the patch
breaks the markdown parser -- I have little time to study it -- so
take it as an example).
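
The rule described above (one label, then every following word that
contains a digit) can be sketched on its own, independently of the
attached patch. The function name splitLocator is an assumption made
for this illustration; it expects the text that follows "@key:" and
does not deal with the leading ':' or ',' the way parseLocator' does.

```haskell
import Data.Char (isDigit)

-- Split the text following "@key:" into (locator, suffix).
-- The first word is taken as the locator label; the following words
-- stay in the locator as long as they contain at least one digit.
splitLocator :: String -> (String, String)
splitLocator s = case words s of
  []           -> ("", "")
  (label:rest) ->
    let (nums, suffix) = span (any isDigit) rest
    in  (unwords (label : nums), unwords suffix)
```

With this sketch, splitLocator "p. 34-35, 68n, for an example" gives
("p. 34-35, 68n,", "for an example"). The trailing comma stays on the
locator and would still have to be stripped by whoever consumes it.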

My approach removes the ',' before the locator (which seems to me more
in keeping with the spirit of your proposal). Should we leave it?

That is to say:

    [see @item1, with a suffix; see also @item3: chapter 3-5]

will become:

    (see Doe 2005 with a suffix; see also Doe and Roe 2007, chap. 3-5)


I'm writing the code to properly support numeric ranges in citeproc-hs
right now: presently citeproc-hs does nothing with them.

That means that something like:

    A citation [see @item1: p. 34-35, 68n, for an example].

would become:

    A citation (see Doe 2005, 34-35,68n, for an example).

But page ranges will be formatted appropriately ("34-5, 68n," in a style
with the option "page-range-format" set to "minimal").
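
As an illustration of the "minimal" abbreviation mentioned above, here
is a small sketch that shortens a single "first-last" range by leaving
out the leading digits the second number shares with the first. The
name minimalRange is hypothetical and this is not the citeproc-hs
code; it only handles two equally long runs of digits and returns
anything else unchanged.

```haskell
import Data.Char (isDigit)

-- Abbreviate a single "first-last" range in the "minimal" manner:
-- leading digits of the second number that repeat the first are dropped.
-- Anything that is not two equally long runs of digits is left as is.
minimalRange :: String -> String
minimalRange r = case break (== '-') r of
  (a, '-':b)
    | not (null a), all isDigit a, all isDigit b, length a == length b ->
        a ++ "-" ++ dropCommon a b
  _ -> r
  where
    -- Drop the shared leading digits, but always keep at least one digit.
    dropCommon (x:xs) (y:ys@(_:_)) | x == y = dropCommon xs ys
    dropCommon _      ys                    = ys
```

Under these assumptions minimalRange "34-35" yields "34-5", while a
value such as "68n" passes through untouched.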

Andrea



[-- Attachment #2: 0018-example-on-how-to-support-a-citation-suffix-after-th.patch --]
[-- Type: text/plain, Size: 3261 bytes --]

From 4da7bf626042d4ba0317849487fdf3050ceb69c9 Mon Sep 17 00:00:00 2001
From: Andrea Rossato <andrea.rossato-/Q1r7N5in3P/wltNWqQaag@public.gmane.org>
Date: Sun, 14 Nov 2010 14:02:49 +0100
Subject: [PATCH 18/18] example on how to support a citation suffix after the locator

---
 src/Text/Pandoc/Biblio.hs           |   18 +++++++++++++++++-
 src/Text/Pandoc/Readers/Markdown.hs |    9 +++++----
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/src/Text/Pandoc/Biblio.hs b/src/Text/Pandoc/Biblio.hs
index bca24d8..8a1f1c5 100644
--- a/src/Text/Pandoc/Biblio.hs
+++ b/src/Text/Pandoc/Biblio.hs
@@ -30,6 +30,7 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 module Text.Pandoc.Biblio ( processBiblio ) where
 
 import Control.Monad ( when )
+import Data.Char ( isDigit )
 import Data.List
 import Data.Unique
 import qualified Data.Map as M
@@ -152,13 +153,14 @@ setCitationNoteNum i = map $ \c -> c { citationNoteNum = i}
 
 toCslCite :: Citation -> CSL.Cite
 toCslCite (Citation i p l cm nn h)
-    = let (la,lo) = parseLocator l
+    = let (la,lo,su) = parseLocator' l
           citMode = case cm of
                       AuthorInText   -> (True, False)
                       SuppressAuthor -> (False,True )
                       NormalCitation -> (False,False)
       in   emptyCite { CSL.citeId         = i
                      , CSL.citePrefix     = p
+                     , CSL.citeSuffix     = su
                      , CSL.citeLabel      = la
                      , CSL.citeLocator    = lo
                      , CSL.citeNoteNumber = show nn
@@ -166,3 +168,17 @@ toCslCite (Citation i p l cm nn h)
                      , CSL.suppressAuthor = snd citMode
                      , CSL.citeHash       = h
                      }
+
+
+parseLocator' :: String -> (String, String, String)
+parseLocator' [] = ([], [], [])
+parseLocator' (x:xs)
+    | ':' <- x  = locator
+    | ',' <- x  = ([], [], xs)
+    | otherwise = ([], [], [])
+    where
+      locator = case words xs of
+                  (y:ys) -> let (a,b) = parseLocator $ unwords (y : takeWhile (or . map isDigit) ys)
+                                c     =                unwords $    dropWhile (or . map isDigit) ys
+                            in  (a,b,c)
+                  _      -> ([], [], [])
diff --git a/src/Text/Pandoc/Readers/Markdown.hs b/src/Text/Pandoc/Readers/Markdown.hs
index 0d0e850..2cd097a 100644
--- a/src/Text/Pandoc/Readers/Markdown.hs
+++ b/src/Text/Pandoc/Readers/Markdown.hs
@@ -1354,16 +1354,17 @@ citeKey :: GenParser Char st String
 citeKey = try $ do
   char '@'
   first <- letter
-  rest <- many $ noneOf ",;]@ \t\n"
+  rest <- many $ noneOf ",:;]@ \t\n"
   return (first:rest)
 
 locator :: GenParser Char st String
 locator = try $ do
-  optional $ char ','
+  a <- char ':' <|> char ','
   spnl
   -- TODO should eventually be list of inlines
-  many1 $ (char '\\' >> oneOf "];\n") <|> noneOf "];\n" <|>
-             (char '\n' >> notFollowedBy blankline >> return ' ')
+  b <- many1 $ (char '\\' >> oneOf "];\n") <|> noneOf "];\n" <|>
+                (char '\n' >> notFollowedBy blankline >> return ' ')
+  return (a:b)
 
 prefix :: GenParser Char st String
 prefix = liftM removeLeadingTrailingSpace $

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: textual citation
       [not found]                 ` <20101113033806.GA27595-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  2010-11-13 13:15                   ` Andrea Rossato
@ 2010-11-16 23:17                   ` Nathan Gass
  1 sibling, 0 replies; 71+ messages in thread
From: Nathan Gass @ 2010-11-16 23:17 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On 13.11.10 04:38, John MacFarlane wrote:
> +++ Andrea Rossato [Nov 13 10 02:11 ]:
>> On Fri, Nov 12, 2010 at 12:43:14AM -0800, John MacFarlane wrote:
>>> OK, I've pushed changes to the markdown reader integrating
>>> the new format for textual citations.
>>
>> I did not read the parser, but there are a few problems. I created
>> this example which should expose some of them.
>
> Turns out there were many problems with the parser! I've fixed the ones I
> found. The tests now pass, except for the issue about @nonexistent.
> More tests would be good.
>
> A couple of issues:
>
> 1. What to do about nonexistent keys?  Currently the parser checks
> keys only when they're in initial in-text position (i.e. not within
> brackets). That's why, in the tests, '@nonexistent' comes across as
> '@nonexistent', while '[@nonexistent]' gives 'Anon. (error)' or something like
> that.

Like Andrea, I think this is confusing.

>
> Of course, it seems inconsistent to check keys when they don't occur
> within []s, but not when they do.  The rationale was this:
>
> * If I don't check keys outside of brackets, we may get things treated
> as citations when they shouldn't be.  Maybe this isn't a great concern;
> but then again, people do sometimes use @ at the beginning of a word.

With my latex writer we can handle some use cases where pandoc does not 
need to know the citation keys at all. So this variant opens up some 
simpler commands to convert markdown with citations to latex with 
citations. So I think I'm for this variant or at least for some option 
to disable the key check.

>
> * If I do check all the keys inside of brackets, then there's the
> question what to do if some are found but not others.  Do we return
> a Cite, or not?  What do we do about the non-found keys?

Don't think this case is very important, as it will mostly/always be
an error in the document anyway. Including not found keys verbatim in 
the prefixes/locators seems to me to be the most consistent solution 
with the way unknown keys are handled in text. If this is too hard, 
nobody will complain if the whole cite does not parse.

>
> So I wasn't sure what to do here.
>
> 2. As we discussed before, I'd like to make prefix and locator have type
> [Inline] rather than String.  Is this doable on the citeproc-hs side?
> I think it would have many advantages (and, keeping them as raw strings
> has some disadvantages).

I already argued for this on the wiki page. A simple [Inline] -> String 
function which simply drops additional markup is imho better than the 
current state (which includes the markup verbatim in the output).
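
A minimal sketch of such a markup-dropping function, written against a
hypothetical, trimmed-down Inline type rather than the real one from
pandoc-types (which has many more constructors). The name toPlain is
likewise an assumption made for the example.

```haskell
-- Hypothetical, trimmed-down version of pandoc's Inline type.
data Inline
  = Str String
  | Space
  | Emph [Inline]
  | Strong [Inline]
  | Code String
  deriving (Show, Eq)

-- Flatten inlines to plain text, keeping the words and dropping the markup.
toPlain :: [Inline] -> String
toPlain = concatMap go
  where
    go (Str s)     = s
    go Space       = " "
    go (Emph xs)   = toPlain xs
    go (Strong xs) = toPlain xs
    go (Code s)    = s
```

For instance, toPlain [Str "see", Space, Emph [Str "also"]] gives
"see also": the emphasis is dropped and its text is kept.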

Nathan

>
> John
>


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: textual citation
       [not found]                             ` <20101114132646.GR19143-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
@ 2010-11-17  4:39                               ` John MacFarlane
       [not found]                                 ` <20101117043955.GA18136-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: John MacFarlane @ 2010-11-17  4:39 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Andrea Rossato [Nov 14 10 14:26 ]:
> On Sat, Nov 13, 2010 at 08:27:02AM -0800, John MacFarlane wrote:
> > One possible solution would be to introduce a distinction in
> > punctuation:
> > 
> > [prefix @key: locator, suffix]
> > [prefix @key: locator only]
> > [prefix @key, suffix only]
> > 
> > But this leaves us with problems in the case where the locator
> > itself contains commas ('pp. 33, 35, 38').  I guess I need to know
> > how citeproc deals with such cases.
> 
> As Bruce said, we MUST implement proper locator support, and I think
> the solution you proposed is a nice one. The problem you are talking
> about is not a difficult one: a locator is made of a locator label
> (only one is supported in CSL) and a numeric value (a number or a list
> of ranges). So after the label you'll only find 'words' (in the
> Data.List.words sense), which have at least some digit: 64-68, 204n,
> P124N21-P124N27, etc.
> 
> I'm attaching a patch to show you, with list processing
> (Biblio.parseLocator') how the problem could be addressed (the patch
> breaks the markdown parser -- I have little time to study it -- so
> take it as an example).
> 
> My approach removes the ',' before the locator (which seems to me more
> in keeping with the spirit of your proposal). Should we leave it?
> 
> That is to say:
> 
>     [see @item1, with a suffix; see also @item3: chapter 3-5]
> 
> will become:
> 
>     (see Doe 2005 with a suffix; see also Doe and Roe 2007, chap. 3-5)
> 
> 
> I'm writing the code to properly support numeric ranges in citeproc-hs
> right now: presently citeproc-hs does nothing with them.
> 
> That means that something like:
> 
>     A citation [see @item1: p. 34-35, 68n, for an example].
> 
> would become:
> 
>     A citation (see Doe 2005, 34-35,68n, for an example).
> 
> But page ranges will be formatted appropriately ("34-5, 68n," in a style
> with the option "page-range-format" set to "minimal").

OK, this should work.

As a first step, I've made the following changes (now on github):

pandoc-types:  citationPrefix is now [Inline] rather than String.
citationSuffix has been added (also [Inline]).

pandoc:  updated for changes in the Citation type in pandoc-types.

The pandoc changes don't presuppose any changes in citeproc-hs -- pandoc still
passes raw strings to citeproc. citeproc should be changed to take [Inline]
for CSL.citePrefix and CSL.citeSuffix. I'm guessing this will allow a lot
of code simplification, since citeproc currently uses a custom pandoc-like
structure (OStr, OSpace, etc.).

With these changes, I can start working on the parser, and Andrea
can work on converting citeproc to use [Inline].

John


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Re: textual citation
       [not found]                                 ` <20101117043955.GA18136-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2010-11-17 21:49                                   ` Andrea Rossato
  2010-11-19 19:51                                   ` John MacFarlane
  2010-11-23  9:56                                   ` Nathan Gass
  2 siblings, 0 replies; 71+ messages in thread
From: Andrea Rossato @ 2010-11-17 21:49 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 2049 bytes --]

On Tue, Nov 16, 2010 at 08:39:56PM -0800, John MacFarlane wrote:
> +++ Andrea Rossato [Nov 14 10 14:26 ]:
> pandoc-types:  citationPrefix is now [Inline] rather than String.
> citationSuffix has been added (also [Inline]).
> 
> pandoc:  updated for changes in the Citation type in pandoc-types.

I'm attaching a patch that removes stringify and directly passes
[Inline] to citeproc-hs.

There's also a small but important fix for the parser, just to
remind you to be careful not to pass citeproc empty content in a
non-empty list of [Inline]. citeproc will always assume that a
non-empty list of inlines will produce some output (this is relevant,
for instance, to suppress delimiters when suppressing the author's
name).
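
The point about empty content can be sketched as a small guard, in the
same spirit as the parser fix in the attached patch. The Inline type
here is a hypothetical two-constructor stand-in and prefixInlines is a
name invented for the example.

```haskell
-- Hypothetical, trimmed-down version of pandoc's Inline type.
data Inline = Str String | Space
  deriving (Show, Eq)

-- Build the prefix inlines from the parsed string, making sure that an
-- empty string yields an empty list instead of [Str ""].
prefixInlines :: String -> [Inline]
prefixInlines pref
  | null pref = []
  | otherwise = [Str pref]
```

This way a consumer that only tests the list for emptiness never sees
a non-empty list that renders to nothing.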
 
> The pandoc changes don't presuppose any changes in citeproc-hs -- pandoc still
> passes raw strings to citeproc. citeproc should be changed to take [Inline]
> for CSL.citePrefix and CSL.citeSuffix. I'm guessing this will allow a lot
> of code simplification, since citeproc currently uses a custom pandoc-like
> structure (OStr, OSpace, etc.).

Unfortunately this is not the case: the Output data structure is
needed to encode enough information during the evaluation of the style
so as to be able to process the output to remove ambiguous cites, to add
year suffixes, and to collapse citations (actually the most difficult
part of the processor code).

> With these changes, I can start working on the parser, and Andrea
> can work on converting citeproc to use [Inline].

I've already pushed the needed changes to citeproc-hs: the [Inline]
will be passed back to pandoc as citation affixes.

Andrea



[-- Attachment #2: 0021-update-tp-latest-changes.patch --]
[-- Type: text/plain, Size: 2426 bytes --]

From 0e077fa54824a90bcfccde4eeb912350c9bbca10 Mon Sep 17 00:00:00 2001
From: Andrea Rossato <andrea.rossato-/Q1r7N5in3P/wltNWqQaag@public.gmane.org>
Date: Wed, 17 Nov 2010 22:37:07 +0100
Subject: [PATCH 21/21] update tp latest changes

---
 src/Text/Pandoc/Biblio.hs           |   14 ++------------
 src/Text/Pandoc/Readers/Markdown.hs |    2 +-
 2 files changed, 3 insertions(+), 13 deletions(-)

diff --git a/src/Text/Pandoc/Biblio.hs b/src/Text/Pandoc/Biblio.hs
index 60e0591..dde822d 100644
--- a/src/Text/Pandoc/Biblio.hs
+++ b/src/Text/Pandoc/Biblio.hs
@@ -150,16 +150,6 @@ setCiteNoteNum               _  _ = []
 setCitationNoteNum :: Int -> [Citation] -> [Citation]
 setCitationNoteNum i = map $ \c -> c { citationNoteNum = i}
 
--- a temporary function to tide us over until citeproc is
--- changed to use Inline lists for prefixes and suffixes...
-stringify :: [Inline] -> String
-stringify = queryWith go
-  where go :: Inline -> [Char]
-        go Space = " "
-        go (Str x) = x
-        go (Code x) = x
-        go _ = ""
-
 toCslCite :: Citation -> CSL.Cite
 toCslCite c
     = let (la,lo) = parseLocator $ citationLocator c
@@ -168,8 +158,8 @@ toCslCite c
                       SuppressAuthor -> (False,True )
                       NormalCitation -> (False,False)
       in   emptyCite { CSL.citeId         = citationId c
-                     , CSL.citePrefix     = stringify $ citationPrefix c
-                     , CSL.citeSuffix     = stringify $ citationSuffix c
+                     , CSL.citePrefix     = PandocText $ citationPrefix c
+                     , CSL.citeSuffix     = PandocText $ citationSuffix c
                      , CSL.citeLabel      = la
                      , CSL.citeLocator    = lo
                      , CSL.citeNoteNumber = show $ citationNoteNum c
diff --git a/src/Text/Pandoc/Readers/Markdown.hs b/src/Text/Pandoc/Readers/Markdown.hs
index 8101d30..4975ee0 100644
--- a/src/Text/Pandoc/Readers/Markdown.hs
+++ b/src/Text/Pandoc/Readers/Markdown.hs
@@ -1392,7 +1392,7 @@ citation = try $ do
   key <- citeKey
   loc <- option "" locator
   return $ Citation{ citationId        = key
-                     , citationPrefix  = [Str pref]
+                     , citationPrefix  = if pref /= [] then [Str pref] else []
                      , citationSuffix  = []
                      , citationLocator = loc
                      , citationMode    = if suppress_auth

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: textual citation
       [not found]                                 ` <20101117043955.GA18136-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  2010-11-17 21:49                                   ` Andrea Rossato
@ 2010-11-19 19:51                                   ` John MacFarlane
       [not found]                                     ` <20101119195134.GB30277-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  2010-11-23  9:56                                   ` Nathan Gass
  2 siblings, 1 reply; 71+ messages in thread
From: John MacFarlane @ 2010-11-19 19:51 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

I've pushed some improvements to the parser, and updated the
tests.  The latest syntax calls for a space between key and locator, not
a comma.

There are some tests that fail due to what I think
are problems in citeproc.

For example, citeproc doesn't seem to like a textual citation
with both locator and suffix (the suffix is dropped).

Also, citeproc puts an extra space before the suffix, and
omits needed space between multiple page ranges:  so you
get (Doe 2005, 33-35,37-39) instead of (Doe 2005, 33-35, 37-39).

There are also issues with things like:

[@item1 pp. 33-35, see also @item2 p. 35]

This is the kind of thing that authors might well write, but currently
pandoc/citeproc mangles it.  I'm not sure how to handle this.  We
could forbid citations in suffixes, but I'm not sure that will seem
intuitive to writers.

Best,
John


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Re: textual citation
       [not found]                                     ` <20101119195134.GB30277-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2010-11-20  2:53                                       ` Andrea Rossato
       [not found]                                         ` <20101120025350.GA13438-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Andrea Rossato @ 2010-11-20  2:53 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On Fri, Nov 19, 2010 at 11:51:34AM -0800, John MacFarlane wrote:
> I've pushed some improvements to the parser, and updated the
> tests.  The latest syntax calls for a space between key and locator, not
> a comma.
> 
> There are some tests that fail due to what I think
> are problems in citeproc.
> 
> For example, citeproc doesn't seem to like a textual citation
> with both locator and suffix (the suffix is dropped).

A regression indeed. Fixed.


> Also, citeproc puts an extra space before the suffix, and
> omits needed space between multiple page ranges:  so you
> get (Doe 2005, 33-35,37-39) instead of (Doe 2005, 33-35, 37-39).

Well, the native processor input should be a trimmed string: "a
suffix". Pandoc is sending ", a suffix". Which is fine, it is just
something to be agreed on as I mentioned when I proposed an example on
how to handle the suffix. I fixed that too.

The second one was something I'm aware of, as I said, and is very high
in my todo list.

> 
> There are also issues with things like:
> 
> [@item1 pp. 33-35, see also @item2 p. 35]
> 
> This is the kind of thing that authors might well write, but currently
> pandoc/citeproc mangles it.  I'm not sure how to handle this.  We
> could forbid citations in suffixes, but I'm not sure that will seem
> intuitive to writers.

The second one is not a citation in a suffix: it clearly is a citation
with a prefix and a wrong separator from the first one. You should fix
that. Am I wrong?

I'm getting ready for a release. Now citeproc-hs passes 383 tests out
of 527. The last major issue, sorting, has been fixed. There are small
details to be fixed, but nothing to block a 0.3 release.

Before that I need a release of pandoc-types (I have a final proposal for
a new constructor: NamedInline String [Inline]. It could be used to
store information while processing the pandoc document. If you are
open to such an extension mechanism I could provide some use cases I'm
facing - but now it is bed time).

Andrea


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: textual citation
       [not found]                                         ` <20101120025350.GA13438-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
@ 2010-11-21 18:23                                           ` John MacFarlane
       [not found]                                             ` <20101121182302.GK24768-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: John MacFarlane @ 2010-11-21 18:23 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Andrea Rossato [Nov 20 10 03:53 ]:
> On Fri, Nov 19, 2010 at 11:51:34AM -0800, John MacFarlane wrote:
> > I've pushed some improvements to the parser, and updated the
> > tests.  The latest syntax calls for a space between key and locator, not
> > a comma.
> > 
> > There are some tests that fail due to what I think
> > are problems in citeproc.
> > 
> > For example, citeproc doesn't seem to like a textual citation
> > with both locator and suffix (the suffix is dropped).
> 
> A regression indeed. Fixed.
> 
> 
> > Also, citeproc puts an extra space before the suffix, and
> > omits needed space between multiple page ranges:  so you
> > get (Doe 2005, 33-35,37-39) instead of (Doe 2005, 33-35, 37-39).
> 
> Well, the native processor input should be a trimmed string: "a
> suffix". Pandoc is sending ", a suffix". Which is fine, it is just
> something to be agreed on as I mentioned when I proposed an example on
> how to handle the suffix. I fixed that too.
> 
> The second one was something I'm aware of, as I said, and is very high
> in my todo list.
> 
> > 
> > There are also issues with things like:
> > 
> > [@item1 pp. 33-35, see also @item2 p. 35]
> > 
> > This is the kind of thing that authors might well write, but currently
> > pandoc/citeproc mangles it.  I'm not sure how to handle this.  We
> > could forbid citations in suffixes, but I'm not sure that will seem
> > intuitive to writers.
> 
> The second one is not a citation in a suffix: it clearly is a citation
> with a prefix and a wrong separator from the first one. You should fix
> that. Am I wrong?

Hm, I guess you're right that it would be more natural to do it
that way.  My worry is that users of pandoc might naturally write
things the way I did above.  Currently the @item2 just disappears
from the output, which is unexpected.  I wonder if there's a better
way?

> I'm getting ready for a release. Now citeproc-hs passes 383 tests out
> of 527. The last major issue, sorting, has been fixed. There are small
> details to be fixed, but nothing to block a 0.3 release.
> 
> Before that I need a release of pandoc-types (I have a final proposal for
> a new constructor: NamedInline String [Inline]. It could be used to
> store information while processing the pandoc document. If you are
> open to such an extension mechanism I could provide some use cases I'm
> facing - but now it is bed time).

Yes, please do provide them when you have a chance.

John


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Re: textual citation
       [not found]                                             ` <20101121182302.GK24768-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2010-11-22 17:09                                               ` Andrea Rossato
  0 siblings, 0 replies; 71+ messages in thread
From: Andrea Rossato @ 2010-11-22 17:09 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On Sun, Nov 21, 2010 at 10:23:02AM -0800, John MacFarlane wrote:
> +++ Andrea Rossato [Nov 20 10 03:53 ]:
> > On Fri, Nov 19, 2010 at 11:51:34AM -0800, John MacFarlane wrote:
> > > There are also issues with things like:
> > > 
> > > [@item1 pp. 33-35, see also @item2 p. 35]
> > > 
> > > This is the kind of thing that authors might well write, but currently
> > > pandoc/citeproc mangles it.  I'm not sure how to handle this.  We
> > > could forbid citations in suffixes, but I'm not sure that will seem
> > > intuitive to writers.
> > 
> > The second one is not a citation in a suffix: it clearly is a citation
> > with a prefix and a wrong separator from the first one. You should fix
> > that. Am I wrong?
> 
> Hm, I guess you're right that it would be more natural to do it
> that way.  My worry is that users of pandoc might naturally write
> things the way I did above.  Currently the @item2 just disappears
> from the output, which is unexpected.  I wonder if there's a better
> way?

There surely are. Still, dropping @item2 will not go unnoticed, and
that tells the writer that something is wrong in the syntax.

We could drop the suffix for the sake of a better syntax. Then you
would be able to write something like

[see @item1, p. 12, for an example]

(which shows what a suffix is meant to be).

There may be intermediate solutions (a suffix may be allowed only for
the last citation of a citation group, for instance). Or we stick with
the present syntax, which must obey the grammar you proposed. In
affixes, Note, Cite, and possibly Image inlines should be silently
dropped.
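
A minimal sketch of that last idea, assuming a cut-down stand-in for
the pandoc Inline type (the constructors below are simplified and are
not the real Text.Pandoc.Definition ones; cleanAffix is a hypothetical
name):

```haskell
-- Stand-in for a few pandoc Inline constructors; NOT the real
-- Text.Pandoc.Definition type, just enough to illustrate the idea.
data Inline
  = Str String
  | Space
  | Emph [Inline]
  | Note [Inline]          -- footnote (simplified)
  | Cite String [Inline]   -- citation key plus rendered text (simplified)
  | Image String           -- image target (simplified)
  deriving (Eq, Show)

-- Remove inlines that make no sense inside a prefix or suffix,
-- descending into Emph so nested offenders are dropped as well.
cleanAffix :: [Inline] -> [Inline]
cleanAffix = concatMap go
  where
    go (Note _)   = []
    go (Cite _ _) = []
    go (Image _)  = []
    go (Emph xs)  = [Emph (cleanAffix xs)]
    go x          = [x]
```

With this, a suffix containing a stray @item2 would simply lose the
citation, which matches the behaviour described above.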

> > I'm getting ready for a release. Now citeproc-hs passes 383 tests out
> > of 527. The last major issue, sorting, has been fixed. There are small
> > details to be fixed, but nothing to block a 0.3 release.
> > 
> > Before I need a release of pandoc-types (I'd have a final proposal for
> > a new constructor: NamedInline String [Inline]. It could be used to
> > store information while processing the pandoc document. If you are
> > open to such an extension mechanism I could provide some use cases I'm
> > facing - but now it is bed time).
> 
> Yes, please do provide them when you have a chance.

CSL may be thought of as a recursive data type. Elements may define
formatting which will affect the elements they contain. Suppose a title
is to be displayed, for a given style, in headline style (title
capitalization, like "This is Doe's Book"), and, with another style,
in sentence capitalization ("This is Doe's book"). To achieve this
result, we need to assign the string "Doe" to a "nocase" class, which
upper levels may not modify.
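
A rough sketch of the "nocase" idea, under the assumption of a tiny
formatted-text type (Fmt, sentenceCase and render are hypothetical
names, not citeproc-hs code):

```haskell
import Data.Char (toLower, toUpper)

-- Hypothetical formatted-text chunks; NoCase marks text that enclosing
-- case transformations must leave alone.
data Fmt
  = Plain String
  | NoCase String
  deriving (Eq, Show)

-- Sentence case: upper-case the first letter if the title starts with a
-- plain chunk, lower-case every other plain character, and leave NoCase
-- chunks untouched.
sentenceCase :: [Fmt] -> [Fmt]
sentenceCase = go True
  where
    go _ [] = []
    go _ (NoCase s : rest) = NoCase s : go False rest
    go first (Plain s : rest) = Plain (conv first s) : go False rest
    conv True (c : cs) = toUpper c : map toLower cs
    conv _ s = map toLower s

-- Flatten the chunks back into a string.
render :: [Fmt] -> String
render = concatMap text
  where
    text (Plain s)  = s
    text (NoCase s) = s
```

Applied to [Plain "This Is ", NoCase "Doe", Plain "'s Book"], the
transformation lower-cases "Is" and "Book" but cannot touch "Doe".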

Another example: superscripts and subscripts. Pandoc does not have a
baseline definition. What is neither sub- nor superscript is normal.
But in the processor I need to set some strings to "baseline" so that
upper elements will not include them in a sub/superscript inline.

I'm presently using the Link constructor for storing such information.
I'm perplexed: I can't decide whether "NamedInline String [Inline]"
would be a nice constructor or just a super-ugly hack (and I actually
lean towards the second possibility).

Using Link is not nice either, though.

Andrea

ps: I'll try to catch up with the rest of the discussion shortly (I
had a busy day today)



* Re: textual citation
       [not found]                                 ` <20101117043955.GA18136-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  2010-11-17 21:49                                   ` Andrea Rossato
  2010-11-19 19:51                                   ` John MacFarlane
@ 2010-11-23  9:56                                   ` Nathan Gass
       [not found]                                     ` <4CEB8FB6.807-8UOIJiGH10pyDzI6CaY1VQ@public.gmane.org>
  2 siblings, 1 reply; 71+ messages in thread
From: Nathan Gass @ 2010-11-23  9:56 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On 17.11.10 05:39, John MacFarlane wrote:
> +++ Andrea Rossato [Nov 14 10 14:26 ]:
>> On Sat, Nov 13, 2010 at 08:27:02AM -0800, John MacFarlane wrote:
>>> One possible solution would be to introduce a distinction in
>>> punctuation:
>>>
>>> [prefix @key: locator, suffix]
>>> [prefix @key: locator only]
>>> [prefix @key, suffix only]
>>>
>>> But this leaves us with problems in the case where the locator
>>> itself contains commas ('pp. 33, 35, 38').  I guess I need to know
>>> how citeproc deals with such cases.
>>
>> As Bruce said, we MUST implement a proper locator support, and I think
>> the solution you proposed is a nice one. The problem you are talking
>> about is not a difficult one: a locator is made of a locator label
>> (only one is supported in CSL) and a numeric value (a number or a list
>> of ranges). So after the label you'll only find 'words' (in the
>> Data.List.words sense), which have at least some digit: 64-68, 204n,
>> P124N21-P124N27, etc.
>>
>> I'm attaching a patch to show you, with list processing
>> (Biblio.parseLocator') how the problem could be addressed (the patch
>> breaks the markdown parser -- I have little time to study it -- so
>> take it as an example).
>>
>> My approach removes the ',' before the locator (which seems to me more
>> adherent to the spirit of your proposal). Should we leave it?
>>
>> That is to say:
>>
>>      [see @item1, with a suffix; see also @item3: chapter 3-5]
>>
>> will become:
>>
>>      (see Doe 2005 with a suffix; see also Doe and Roe 2007, chap. 3-5)
>>
>>
>> I'm writing the code to properly support numeric ranges in citeproc-hs
>> right now: presently citeproc-hs does nothing.
>>
>> That means that something like:
>>
>>      A citation [see @item1: p. 34-35, 68n, for an example].
>>
>> would become:
>>
>>      A citation (see Doe 2005, 34-35,68n, for an example).
>>
>> But page ranges will be formatted appropriately ("34-5, 68n," in style
>> with the option "page-range-format" set to "minimal").
>
> OK, this should work.
>
> As a first step, I've made the following changes (now on github):
>
> pandoc-types:  citationPrefix is now [Inline] rather than String.
> citationSuffix has been added (also [Inline]).
>
> pandoc:  updated for changes in the Citation type in pandoc-types.

Having this distinction in the native format adds imho a lot of unneeded 
complexity. Every writer for citation formats which does not 
need/support this distinction has to reproduce the original string 
again, as it is written in the document. Moreover, every Reader has to 
be careful to parse a locator correctly. You then get a pletoria of 
combinations of writers and readers and a native format loosing some 
information, which is bound to buy us some annoying little unsolvable 
problems.

For instance, it is a lot harder this way to use pandoc to *correctly* 
convert natbib citations to biblatex citations in a latex document. The 
feature itself comes for free, but getting the locator/suffix 
distinction correct complicates making the conversion do the right thing 
a lot.

So why not drop the locator from the Citation type, and extract the 
locator from the suffix in Text.Pandoc.Biblio? This way any use case not 
involving citeproc does not need to think at all about the 
locator/suffix distinction. And we don't need an extra locator parser 
for every format which has its own citation support.

Nathan




* Re: textual citation
       [not found]                                     ` <4CEB8FB6.807-8UOIJiGH10pyDzI6CaY1VQ@public.gmane.org>
@ 2010-11-23 15:46                                       ` John MacFarlane
       [not found]                                         ` <20101123154639.GB12884-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: John MacFarlane @ 2010-11-23 15:46 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Nathan Gass [Nov 23 10 10:56 ]:
> On 17.11.10 05:39, John MacFarlane wrote:
> >+++ Andrea Rossato [Nov 14 10 14:26 ]:
> >>On Sat, Nov 13, 2010 at 08:27:02AM -0800, John MacFarlane wrote:
> >>>One possible solution would be to introduce a distinction in
> >>>punctuation:
> >>>
> >>>[prefix @key: locator, suffix]
> >>>[prefix @key: locator only]
> >>>[prefix @key, suffix only]
> >>>
> >>>But this leaves us with problems in the case where the locator
> >>>itself contains commas ('pp. 33, 35, 38').  I guess I need to know
> >>>how citeproc deals with such cases.
> >>
> >>As Bruce said, we MUST implement a proper locator support, and I think
> >>the solution you proposed is a nice one. The problem you are talking
> >>about is not a difficult one: a locator is made of a locator label
> >>(only one is supported in CSL) and a numeric value (a number or a list
> >>of ranges). So after the label you'll only find 'words' (in the
> >>Data.List.words sense), which have at least some digit: 64-68, 204n,
> >>P124N21-P124N27, etc.
> >>
> >>I'm attaching a patch to show you, with list processing
> >>(Biblio.parseLocator') how the problem could be addressed (the patch
> >>breaks the markdown parser -- I have little time to study it -- so
> >>take it as an example).
> >>
> >>My approach removes the ',' before the locator (which seems to me more
> >>adherent to the spirit of your proposal). Should we leave it?
> >>
> >>That is to say:
> >>
> >>     [see @item1, with a suffix; see also @item3: chapter 3-5]
> >>
> >>will become:
> >>
> >>     (see Doe 2005 with a suffix; see also Doe and Roe 2007, chap. 3-5)
> >>
> >>
> >>I'm writing the code to properly support numeric ranges in citeproc-hs
> >>right now: presently citeproc-hs does nothing.
> >>
> >>That means that something like:
> >>
> >>     A citation [see @item1: p. 34-35, 68n, for an example].
> >>
> >>would become:
> >>
> >>     A citation (see Doe 2005, 34-35,68n, for an example).
> >>
> >>But page ranges will be formatted appropriately ("34-5, 68n," in style
> >>with the option "page-range-format" set to "minimal").
> >
> >OK, this should work.
> >
> >As a first step, I've made the following changes (now on github):
> >
> >pandoc-types:  citationPrefix is now [Inline] rather than String.
> >citationSuffix has been added (also [Inline]).
> >
> >pandoc:  updated for changes in the Citation type in pandoc-types.
> 
> Having this distinction in the native format adds imho a lot of
> unneeded complexity. Every writer for citation formats which does
> not need/support this distinction has to reproduce the original
> string again, as it is written in the document. Moreover, every
> Reader has to be careful to parse a locator correctly. You then get
> a plethora of combinations of writers and readers and a native
> format losing some information, which is bound to buy us some
> annoying little unsolvable problems.
> 
> For instance, it is a lot harder this way to use pandoc to
> *correctly* convert natbib citations to biblatex citations in a
> latex document. The feature itself comes for free, but getting the
> locator/suffix distinction correct complicates making the conversion
> do the right thing a lot.
> 
> So why not drop the locator from the Citation type, and extract the
> locator from the suffix in Text.Pandoc.Biblio? This way any use case
> not involving citeproc does not need to think at all about the
> locator/suffix distinction. And we don't need an extra locator
> parser for every format which has its own citation support.

I suggested this originally, but Andrea thought there was a good reason to
separate locator and suffix while parsing the citation, rather than splitting
them in citeproc. I can't recall exactly what it was. Maybe he could weigh in
again?

On the other hand, it's not all that difficult to "reproduce the original
string again" -- every writer has a function to write an escaped string and to
write an inline list, and you just have to call the first on the locator and
the second on the suffix. So I don't see a big difficulty here.
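
To make that concrete, here is a minimal sketch for a LaTeX-style
writer, assuming a simplified stand-in Inline type and hypothetical
helper names (escapeLaTeX, inlinesToLaTeX, citeT); it is not the actual
pandoc writer code:

```haskell
-- Stand-in inline type (hypothetical, much smaller than pandoc's).
data Inline = Str String | Space | Emph [Inline]
  deriving (Eq, Show)

-- Escape a small subset of the characters LaTeX treats specially.
escapeLaTeX :: String -> String
escapeLaTeX = concatMap esc
  where
    esc c
      | c `elem` "&%$#_{}" = ['\\', c]
      | otherwise          = [c]

-- Render an inline list, escaping plain strings along the way.
inlinesToLaTeX :: [Inline] -> String
inlinesToLaTeX = concatMap go
  where
    go (Str s)   = escapeLaTeX s
    go Space     = " "
    go (Emph xs) = "\\emph{" ++ inlinesToLaTeX xs ++ "}"

-- Rebuild the optional argument of a natbib \citet from a plain-string
-- locator and an inline-list suffix.
citeT :: String -> String -> [Inline] -> String
citeT key locator suffix =
  "\\citet[" ++ escapeLaTeX locator ++ inlinesToLaTeX suffix ++ "]{" ++ key ++ "}"
```

For example, citeT "item1" "p. 10" [Str ",", Space, Emph [Str "passim"]]
gives \citet[p. 10, \emph{passim}]{item1}.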

Besides, you're just focused on natbib. Other citation formats *do* the
locator/suffix distinction. For example, biblatex does, if I recall correctly.
So the code used in citeproc to split into locator and suffix would have to be
reproduced in multiple places.

John



* Re: Re: textual citation
       [not found]                                         ` <20101123154639.GB12884-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2010-11-23 20:09                                           ` Andrea Rossato
  2010-11-24  1:20                                           ` Nathan Gass
  1 sibling, 0 replies; 71+ messages in thread
From: Andrea Rossato @ 2010-11-23 20:09 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On Tue, Nov 23, 2010 at 07:46:39AM -0800, John MacFarlane wrote:
> +++ Nathan Gass [Nov 23 10 10:56 ]:
> > On 17.11.10 05:39, John MacFarlane wrote:
> > >+++ Andrea Rossato [Nov 14 10 14:26 ]:
> > >pandoc-types:  citationPrefix is now [Inline] rather than String.
> > >citationSuffix has been added (also [Inline]).
> > 
> > Having this distinction in the native format adds imho a lot of
> > unneeded complexity. Every writer for citation formats which does
> > not need/support this distinction has to reproduce the original
> > string again, as it is written in the document. Moreover, every
> > Reader has to be careful to parse a locator correctly. You then get
> > a plethora of combinations of writers and readers and a native
> > format losing some information, which is bound to buy us some
> > annoying little unsolvable problems.
> > 
> > For instance, it is a lot harder this way to use pandoc to
> > *correctly* convert natbib citations to biblatex citations in a
> > latex document. The feature itself comes for free, but getting the
> > locator/suffix distinction correct complicates making the conversion
> > do the right thing a lot.
> > 
> > So why not drop the locator from the Citation type, and extract the
> > locator from the suffix in Text.Pandoc.Biblio? This way any use case
> > not involving citeproc does not need to think at all about the
> > locator/suffix distinction. And we don't need an extra locator
> > parser for every format which has its own citation support.
> 
> I suggested this originally, but Andrea thought there was a good reason to
> separate locator and suffix while parsing the citation, rather than splitting
> them in citeproc. I can't recall exactly what it was. Maybe he could weigh in
> again?
> 
> On the other hand, it's not all that difficult to "reproduce the original
> string again" -- every writer has a function to write an escaped string and to
> write an inline list, and you just have to call the first on the locator and
> the second on the suffix. So I don't see a big difficulty here.
> 
> Besides, you're just focused on natbib. Other citation formats *do* the
> locator/suffix distinction. For example, biblatex does, if I recall correctly.
> So the code used in citeproc to split into locator and suffix would have to be
> reproduced in multiple places.

If you want to move the locator/suffix parsing into Text.Pandoc.Biblio
because that would benefit other components, I do not have any kind of
objection.

The only requirement I'm pressing for is to pass to citeproc a locator
and a separate suffix (as a [Inline]) so that I do not have to
re-write a parser for the suffix. That's it.

I'd leave the parsing and, more generally, the pandoc side entirely to
your discretion, as long as you can correctly instantiate a complete
CSL.Cite type (abstractly, via the emptyCite function).

andrea



* Re: textual citation
       [not found]                                         ` <20101123154639.GB12884-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  2010-11-23 20:09                                           ` Andrea Rossato
@ 2010-11-24  1:20                                           ` Nathan Gass
       [not found]                                             ` <4CEC6860.7020908-8UOIJiGH10pyDzI6CaY1VQ@public.gmane.org>
  1 sibling, 1 reply; 71+ messages in thread
From: Nathan Gass @ 2010-11-24  1:20 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On 23.11.10 16:46, John MacFarlane wrote:
> +++ Nathan Gass [Nov 23 10 10:56 ]:
>> On 17.11.10 05:39, John MacFarlane wrote:
>>> +++ Andrea Rossato [Nov 14 10 14:26 ]:
>>>> On Sat, Nov 13, 2010 at 08:27:02AM -0800, John MacFarlane wrote:
>>>>> One possible solution would be to introduce a distinction in
>>>>> punctuation:
>>>>>
>>>>> [prefix @key: locator, suffix]
>>>>> [prefix @key: locator only]
>>>>> [prefix @key, suffix only]
>>>>>
>>>>> But this leaves us with problems in the case where the locator
>>>>> itself contains commas ('pp. 33, 35, 38').  I guess I need to know
>>>>> how citeproc deals with such cases.
>>>>
>>>> As Bruce said, we MUST implement a proper locator support, and I think
>>>> the solution you proposed is a nice one. The problem you are talking
>>>> about is not a difficult one: a locator is made of a locator label
>>>> (only one is supported in CSL) and a numeric value (a number or a list
>>>> of ranges). So after the label you'll only find 'words' (in the
>>>> Data.List.words sense), which have at least some digit: 64-68, 204n,
>>>> P124N21-P124N27, etc.
>>>>
>>>> I'm attaching a patch to show you, with list processing
>>>> (Biblio.parseLocator') how the problem could be addressed (the patch
>>>> breaks the markdown parser -- I have little time to study it -- so
>>>> take it as an example).
>>>>
>>>> My approach removes the ',' before the locator (which seems to me more
>>>> adherent to the spirit of your proposal). Should we leave it?
>>>>
>>>> That is to say:
>>>>
>>>>      [see @item1, with a suffix; see also @item3: chapter 3-5]
>>>>
>>>> will become:
>>>>
>>>>      (see Doe 2005 with a suffix; see also Doe and Roe 2007, chap. 3-5)
>>>>
>>>>
>>>> I'm writing the code to properly support numeric ranges in citeproc-hs
>>>> right now: presently citeproc-hs does nothing.
>>>>
>>>> That means that something like:
>>>>
>>>>      A citation [see @item1: p. 34-35, 68n, for an example].
>>>>
>>>> would become:
>>>>
>>>>      A citation (see Doe 2005, 34-35,68n, for an example).
>>>>
>>>> But page ranges will be formatted appropriately ("34-5, 68n," in style
>>>> with the option "page-range-format" set to "minimal").
>>>
>>> OK, this should work.
>>>
>>> As a first step, I've made the following changes (now on github):
>>>
>>> pandoc-types:  citationPrefix is now [Inline] rather than String.
>>> citationSuffix has been added (also [Inline]).
>>>
>>> pandoc:  updated for changes in the Citation type in pandoc-types.
>>
>> Having this distinction in the native format adds imho a lot of
>> unneeded complexity. Every writer for citation formats which does
>> not need/support this distinction has to reproduce the original
>> string again, as it is written in the document. Moreover, every
>> Reader has to be careful to parse a locator correctly. You then get
>> a plethora of combinations of writers and readers and a native
>> format losing some information, which is bound to buy us some
>> annoying little unsolvable problems.
>>
>> For instance, it is a lot harder this way to use pandoc to
>> *correctly* convert natbib citations to biblatex citations in a
>> latex document. The feature itself comes for free, but getting the
>> locator/suffix distinction correct complicates making the conversion
>> do the right thing a lot.
>>
>> So why not drop the locator from the Citation type, and extract the
>> locator from the suffix in Text.Pandoc.Biblio? This way any use case
>> not involving citeproc does not need to think at all about the
>> locator/suffix distinction. And we don't need an extra locator
>> parser for every format which has its own citation support.
>
> I suggested this originally, but Andrea thought there was a good reason to
> separate locator and suffix while parsing the citation, rather than splitting
> them in citeproc. I can't recall exactly what it was. Maybe he could weigh in
> again?
>
> On the other hand, it's not all that difficult to "reproduce the original
> string again" -- every writer has a function to write an escaped string and to
> write an inline list, and you just have to call the first on the locator and
> the second on the suffix. So I don't see a big difficulty here.

This depends on the complexity of the locator parser, which imho could 
get a little bit more complex. Anyway, what I find the bigger problem is 
having to repeat the locator parser in every reader, which is necessary 
for the reader and citeproc to work together. I currently don't see a 
good way to get a general locator parser which is adaptable for any 
reader, but if you can come up with one which can be used for markdown 
and latex, this certainly would be a solution too.

>
> Besides, you're just focused on natbib. Other citation formats *do* the
> locator/suffix distinction. For example, biblatex does, if I recall correctly.
> So the code used in citeproc to split into locator and suffix would have to be
> reproduced in multiple places.

But a function parseLocator :: [Inline] -> (Locator, [Inline]) can be 
used by any writer without modification.

I am developing the biblatex writer in parallel with the natbib one and 
did not find any locator feature in the biblatex documentation. The 
closest I could find were the various citation commands taking a volume 
argument (\volcite \pvolcite etc.). I was not planning to use them in my 
biblatex writer based on my understanding of the commands.

Anyway, how my code handles different citation constructs can be tested 
on my wiki, and I'm open to suggestions from anyone on how any case 
could be handled better, especially for biblatex, which I have not used 
until now.

Nathan


>
> John
>



* Re: textual citation
       [not found]                                             ` <4CEC6860.7020908-8UOIJiGH10pyDzI6CaY1VQ@public.gmane.org>
@ 2010-11-24  2:39                                               ` John MacFarlane
       [not found]                                                 ` <20101124023950.GA25133-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: John MacFarlane @ 2010-11-24  2:39 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Nathan Gass [Nov 24 10 02:20 ]:
> On 23.11.10 16:46, John MacFarlane wrote:
> >+++ Nathan Gass [Nov 23 10 10:56 ]:
> >>On 17.11.10 05:39, John MacFarlane wrote:
> >>>+++ Andrea Rossato [Nov 14 10 14:26 ]:
> >>>>On Sat, Nov 13, 2010 at 08:27:02AM -0800, John MacFarlane wrote:
> >>>>>One possible solution would be to introduce a distinction in
> >>>>>punctuation:
> >>>>>
> >>>>>[prefix @key: locator, suffix]
> >>>>>[prefix @key: locator only]
> >>>>>[prefix @key, suffix only]
> >>>>>
> >>>>>But this leaves us with problems in the case where the locator
> >>>>>itself contains commas ('pp. 33, 35, 38').  I guess I need to know
> >>>>>how citeproc deals with such cases.
> >>>>
> >>>>As Bruce said, we MUST implement a proper locator support, and I think
> >>>>the solution you proposed is a nice one. The problem you are talking
> >>>>about is not a difficult one: a locator is made of a locator label
> >>>>(only one is supported in CSL) and a numeric value (a number or a list
> >>>>of ranges). So after the label you'll only find 'words' (in the
> >>>>Data.List.words sense), which have at least some digit: 64-68, 204n,
> >>>>P124N21-P124N27, etc.
> >>>>
> >>>>I'm attaching a patch to show you, with list processing
> >>>>(Biblio.parseLocator') how the problem could be addressed (the patch
> >>>>breaks the markdown parser -- I have little time to study it -- so
> >>>>take it as an example).
> >>>>
> >>>>My approach removes the ',' before the locator (which seems to me more
> >>>>adherent to the spirit of your proposal). Should we leave it?
> >>>>
> >>>>That is to say:
> >>>>
> >>>>     [see @item1, with a suffix; see also @item3: chapter 3-5]
> >>>>
> >>>>will become:
> >>>>
> >>>>     (see Doe 2005 with a suffix; see also Doe and Roe 2007, chap. 3-5)
> >>>>
> >>>>
> >>>>I'm writing the code to properly support numeric ranges in citeproc-hs
> >>>>right now: presently citeproc-hs does nothing.
> >>>>
> >>>>That means that something like:
> >>>>
> >>>>     A citation [see @item1: p. 34-35, 68n, for an example].
> >>>>
> >>>>would become:
> >>>>
> >>>>     A citation (see Doe 2005, 34-35,68n, for an example).
> >>>>
> >>>>But page ranges will be formatted appropriately ("34-5, 68n," in style
> >>>>with the option "page-range-format" set to "minimal").
> >>>
> >>>OK, this should work.
> >>>
> >>>As a first step, I've made the following changes (now on github):
> >>>
> >>>pandoc-types:  citationPrefix is now [Inline] rather than String.
> >>>citationSuffix has been added (also [Inline]).
> >>>
> >>>pandoc:  updated for changes in the Citation type in pandoc-types.
> >>
> >>Having this distinction in the native format adds imho a lot of
> >>unneeded complexity. Every writer for citation formats which does
> >>not need/support this distinction has to reproduce the original
> >>string again, as it is written in the document. Moreover, every
> >>Reader has to be careful to parse a locator correctly. You then get
> >>a plethora of combinations of writers and readers and a native
> >>format losing some information, which is bound to buy us some
> >>annoying little unsolvable problems.
> >>
> >>For instance, it is a lot harder this way to use pandoc to
> >>*correctly* convert natbib citations to biblatex citations in a
> >>latex document. The feature itself comes for free, but getting the
> >>locator/suffix distinction correct complicates making the conversion
> >>do the right thing a lot.
> >>
> >>So why not drop the locator from the Citation type, and extract the
> >>locator from the suffix in Text.Pandoc.Biblio? This way any use case
> >>not involving citeproc does not need to think at all about the
> >>locator/suffix distinction. And we don't need an extra locator
> >>parser for every format which has its own citation support.
> >
> >I suggested this originally, but Andrea thought there was a good reason to
> >separate locator and suffix while parsing the citation, rather than splitting
> >them in citeproc. I can't recall exactly what it was. Maybe he could weigh in
> >again?
> >
> >On the other hand, it's not all that difficult to "reproduce the original
> >string again" -- every writer has a function to write an escaped string and to
> >write an inline list, and you just have to call the first on the locator and
> >the second on the suffix. So I don't see a big difficulty here.
> 
> This depends on the complexity of the locator parser, which imho
> could get a little bit more complex. Anyway, what I find the bigger
> problem is to repeat the locator parser for any reader, which is
> necessary for the reader and citeproc to work together. I currently
> don't see a good way to get a general locator parser which is
> adaptable for any reader, but if you can come up with one which can
> be used for markdown and latex, this certainly would be a solution
> too.

The current locator parser is very simple.  A locator is basically
just a word followed by any number of words containing digits.
(That's not quite precise, but you can look at the code.)
Not hard to reproduce that in other readers.
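
A minimal sketch of that heuristic on plain strings, assuming the text
after the key has already been isolated (splitLocator is a hypothetical
name, and the real parser differs in its details):

```haskell
import Data.Char (isDigit)

-- Split the text after a citation key into (locator, rest).
-- Heuristic: one label word, then every following word that contains
-- a digit. If no digit-bearing word follows, there is no locator.
splitLocator :: String -> (String, String)
splitLocator s =
  case words s of
    (label : ws)
      | (nums@(_ : _), rest) <- span (any isDigit) ws ->
          (unwords (label : nums), unwords rest)
    _ -> ("", s)
```

Here "p. 34-35, 68n, for an example" splits into the locator
"p. 34-35, 68n," and the remainder "for an example", while
"with a suffix" yields no locator at all. Note that the heuristic also
accepts "since 1975" as a locator, so it cannot tell a locator from a
suffix that merely contains a number.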

But I guess you're saying that you'd like the locator parsing to
be done inside citeproc, while Andrea wants the locator to be
split off before calling citeproc.  Doing things your way would
make it slightly easier to implement citations in the LaTeX reader.
citeproc has to do some nontrivial parsing of the locator anyway -- see
parseLocator in Text.CSL.Reference. Why not also split off the locator from
the suffix? But citeproc is Andrea's project, so you have to convince him.
I think his view is that the citeproc API should mirror the citeproc data
structures as much as possible.

John



* Re: textual citation
       [not found]                                                 ` <20101124023950.GA25133-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2010-11-24  9:39                                                   ` Nathan Gass
       [not found]                                                     ` <4CECDD5D.6010400-8UOIJiGH10pyDzI6CaY1VQ@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Nathan Gass @ 2010-11-24  9:39 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On 24.11.10 03:39, John MacFarlane wrote:
>
> The current locator parser is very simple.  A locator is basically
> just a word followed by any number of words containing digits.
> (That's not quite precise, but you can look at the code.)
> Not hard to reproduce that in other readers.

So [@item1 p. 10] works, but not [@item1, p. 10] or [@item1, *p.* 10]? 
I'd opt to drop markup in the locator and be a bit flexible with some 
special chars directly after the key. This would make the locator parser 
more complex and moreover format specific. Additionally, I can no longer 
translate the last one to \citet[\emph{p.} 10]{item1} in natbib, resp. 
to \autocite[\emph{p.} 10]{item1} in biblatex. I know that this is 
strange markdown to write, but handling it correctly in the latex writer 
comes for free with only an [Inline] suffix and would be very hard to 
implement at the moment.

>
> But I guess you're saying that you'd like the locator parsing to
> be done inside citeproc, while Andrea wants the locator to be
> split off before calling citeproc.  Doing things your way would
> make it slightly easier to implement citations in the LaTeX reader.
> citeproc has to do some nontrivial parsing of the locator anyway -- see
> parseLocator in Text.CSL.Reference. Why not also split off the locator from
> the suffix? But citeproc is Andrea's project, so you have to convince him.
> I think his view is that the citeproc API should mirror the citeproc data
> structures as much as possible.

No, I explicitly said I want to split off the Locator in 
Text.Pandoc.Biblio. I just want to reproduce the current parser as 
function seperateLocator :: [Inline] -> (String, [Inline]). I'd 
volunteer to do it, but I was given to understand that at most two 
people can work efficiently on the same code base...
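
A possible shape for such a function, as a sketch only: it assumes a
simplified stand-in Inline type, spells the name separateLocator, and
uses a hypothetical helper digitWords. It guesses at the heuristic and
is not the current parser:

```haskell
import Data.Char (isDigit)

-- Stand-in inline type (hypothetical, much smaller than pandoc's).
data Inline = Str String | Space | Emph [Inline]
  deriving (Eq, Show)

-- Split a leading locator off a suffix. A locator is a plain label word
-- followed by at least one plain word containing a digit. Anything
-- else, including a label wrapped in markup, stays in the suffix.
separateLocator :: [Inline] -> (String, [Inline])
separateLocator (Str label : Space : rest)
  | not (null nums) = (unwords (label : nums), rest')
  where
    (nums, rest') = digitWords rest
separateLocator inls = ("", inls)

-- Collect leading Str tokens (separated by Space) that contain a digit.
digitWords :: [Inline] -> ([String], [Inline])
digitWords (Str w : Space : more)
  | any isDigit w = let (ws, r) = digitWords more in (w : ws, r)
digitWords (Str w : more)
  | any isDigit w = ([w], more)
digitWords inls = ([], inls)
```

With this, [Str "p.", Space, Str "10"] yields the locator "p. 10" and
an empty suffix, while [Emph [Str "p."], Space, Str "10"] is left whole
in the suffix, so a LaTeX writer could still emit \emph{p.} 10.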

Nathan

>
> John
>



* Re: textual citation
       [not found]                                                     ` <4CECDD5D.6010400-8UOIJiGH10pyDzI6CaY1VQ@public.gmane.org>
@ 2010-11-24 16:09                                                       ` John MacFarlane
       [not found]                                                         ` <20101124160951.GD1590-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  2010-11-24 20:32                                                       ` Andrea Rossato
  1 sibling, 1 reply; 71+ messages in thread
From: John MacFarlane @ 2010-11-24 16:09 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Nathan Gass [Nov 24 10 10:39 ]:
> On 24.11.10 03:39, John MacFarlane wrote:
> >
> >The current locator parser is very simple.  A locator is basically
> >just a word followed by any number of words containing digits.
> >(That's not quite precise, but you can look at the code.)
> >Not hard to reproduce that in other readers.
> 
> So [@item1 p. 10] works, but not [@item1, p. 10] [@item1, *p.* 10]?

I'd like to be able to allow a comma before the locator.  But there
needs to be some way to identify a suffix.  Remember, a citation might
contain only a locator, only a suffix, both, or neither.  So currently
if pandoc sees a comma after the key, it knows we're on the suffix.
I'm not sure how to do it otherwise.  Of course, we could TRY to parse
a locator, using the heuristic above, but what if they want a suffix
like "since 1975"?
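
To make the heuristic concrete, here is a minimal sketch on plain strings.
It is not pandoc's actual parser: splitLocator is a hypothetical name, and
the sketch ignores escapes and commas inside the locator.

```haskell
import Data.Char (isDigit)

-- | Split the text that follows a citation key into (locator, suffix).
-- Hypothetical helper illustrating the heuristic only.
splitLocator :: String -> (String, String)
splitLocator s
  | take 1 trimmed == "," = ("", s)  -- a leading comma marks a pure suffix
  | otherwise =
      case words trimmed of
        []         -> ("", s)
        (w : rest) ->
          -- first word, then every following word that contains a digit
          let (locWords, suffWords) = span (any isDigit) rest
          in  (unwords (w : locWords), unwords suffWords)
  where
    trimmed = dropWhile (== ' ') s
```

With this sketch, " since 1975" comes back as a locator, which is the
misclassification to worry about if a locator were also tried after a comma.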

> I'd opt to drop markup in the locator and be a bit flexible with
> some special chars directly after the key. This would make the
> locator parser more complex and moreover format specific.
> Additionally I no longer can translate the last one to
> \citet[\emph{p.} 10]{item1} in natbib, resp. to \autocite[\emph{p.}
> 10]{item1} in biblatex. I know that this is strange markdown to
> write, but handling it correctly in the latex writer comes for free
> with only an [Inline] suffix and would be very hard to implement at
> the moment.
>
> >But I guess you're saying that you'd like the locator parsing to
> >be done inside citeproc, while Andrea wants the locator to be
> >split off before calling citeproc.  Doing things your way would
> >make it slightly easier to implement citations in the LaTeX reader.
> >citeproc has to do some nontrivial parsing of the locator anyway -- see
> >parseLocator in Text.CSL.Reference. Why not also split off the locator from
> >the suffix? But citeproc is Andrea's project, so you have to convince him.
> >I think his view is that the citeproc API should mirror the citeproc data
> >structures as much as possible.
> 
> No, I explicitly said I want to split off the Locator in
> Text.Pandoc.Biblio. I just want to reproduce the current parser as
> function seperateLocator :: [Inline] -> (String, [Inline]). I'd
> volunteer to do it, but I was given the understanding that maximally
> two people can work efficiently on the same code base...

I'm always willing to consider an atomic patch that makes one
surveyable change to the current code base. Just email me the patch, though --
much easier for me than dealing with github.

We should probably agree on the comma issue first, though.

John


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Re: textual citation
       [not found]                                                     ` <4CECDD5D.6010400-8UOIJiGH10pyDzI6CaY1VQ@public.gmane.org>
  2010-11-24 16:09                                                       ` John MacFarlane
@ 2010-11-24 20:32                                                       ` Andrea Rossato
  1 sibling, 0 replies; 71+ messages in thread
From: Andrea Rossato @ 2010-11-24 20:32 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On Wed, Nov 24, 2010 at 10:39:41AM +0100, Nathan Gass wrote:
> On 24.11.10 03:39, John MacFarlane wrote:
> >
> >The current locator parser is very simple.  A locator is basically
> >just a word followed by any number of words containing digits.
> >(That's not quite precise, but you can look at the code.)
> >Not hard to reproduce that in other readers.
> 
> So [@item1 p. 10] works, but not [@item1, p. 10] [@item1, *p.* 10]?
> I'd opt to drop markup in the locator and be a bit flexible with
> some special chars directly after the key. This would make the
> locator parser more complex and moreover format specific.
> Additionally I no longer can translate the last one to
> \citet[\emph{p.} 10]{item1} in natbib, resp. to \autocite[\emph{p.}
> 10]{item1} in biblatex. I know that this is strange markdown to
> write, but handling it correctly in the latex writer comes for free
> with only an [Inline] suffix and would be very hard to implement at
> the moment.
> 
> >
> >But I guess you're saying that you'd like the locator parsing to
> >be done inside citeproc, while Andrea wants the locator to be
> >split off before calling citeproc.  Doing things your way would
> >make it slightly easier to implement citations in the LaTeX reader.
> >citeproc has to do some nontrivial parsing of the locator anyway -- see
> >parseLocator in Text.CSL.Reference. Why not also split off the locator from
> >the suffix? But citeproc is Andrea's project, so you have to convince him.
> >I think his view is that the citeproc API should mirror the citeproc data
> >structures as much as possible.
> 
> No, I explicitly said I want to split off the Locator in
> Text.Pandoc.Biblio. I just want to reproduce the current parser as
> function seperateLocator :: [Inline] -> (String, [Inline]). I'd
> volunteer to do it, but I was given the understanding that maximally
> two people can work efficiently on the same code base...

The locator (which actually consists of a locator label - "page",
"chapter", etc. - and an actual locator - generally a numeric value)
is a CSL variable and must be a string, not an [Inline]. The *only*
constraint I'm subject to is that I cannot get, from pandoc, a single
string to generate a locator and a suffix as an [Inline], because
citeproc cannot depend on the pandoc library to parse a string into an
[Inline] (and I'm not going to replicate it, obviously).
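
As an illustration of the label/locator split, here is a sketch only. It is
not the parseLocator from Text.CSL.Reference: labelTable, splitLabel, and
the default label of "page" are all assumptions made for the example.

```haskell
-- | Hypothetical abbreviation table mapping locator labels to CSL terms.
labelTable :: [(String, String)]
labelTable =
  [ ("p.",   "page")
  , ("pp.",  "page")
  , ("ch.",  "chapter")
  , ("sec.", "section")
  ]

-- | Split a locator string into (label, locator), both plain strings.
-- Falls back to "page" when no known label is found (an assumption).
splitLabel :: String -> (String, String)
splitLabel s =
  case words s of
    (w : rest) | Just l <- lookup w labelTable -> (l, unwords rest)
    ws                                         -> ("page", unwords ws)
```

Both halves stay plain strings, which is why no [Inline] can travel
through this part of the API.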

So, if the suffix is to be an [Inline] it must be parsed in pandoc.
Otherwise you can even pass to citeproc a raw citation (something like
"see @item1 p. 7, for some details", for instance), and citeproc will
parse it into the desired data-type (but no markdown formatting would
be parsed).

Nathan, I'm not working on Pandoc.Biblio right now, and I have no plan
to modify it, except to implement the new bibliographic options if
needed. So if you want the Markdown parser to generate a single
[Inline] with the part of the citation that comes after the item's id,
or a string or whatever, and you have a proposal for that, please go
ahead. Then John will have the final word.

Andrea


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: textual citation
       [not found]                                                         ` <20101124160951.GD1590-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2010-11-26 12:33                                                           ` Nathan Gass
       [not found]                                                             ` <4CEFA937.7040606-8UOIJiGH10pyDzI6CaY1VQ@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Nathan Gass @ 2010-11-26 12:33 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 1741 bytes --]

On 24.11.10 17:09, John MacFarlane wrote:
> I'm always willing to consider an atomic patch, that makes one
> surveyable change to the current code base. Just email me the patch, though --
> much easier for me than dealing with github.
>
> We should probably agree about the issue about commas first, though.

I have attached patches which reproduce the current behavior regarding
commas. They reproduce the old code's output exactly for the given test files.

They would differ on locators which contain markup. AFAIK the old code
did not parse the markup, and included the symbols verbatim in the
locator. The new code does not recognize a locator with markup and
instead leaves it in the suffix. So the new code keeps the markup in
locators, but loses the special locator handling citeproc provides.
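
A minimal sketch of that difference, using a cut-down Inline type rather
than the one from pandoc-types. locatorOrSuffix is a hypothetical name and
much simpler than the attached locatorWords: it treats any leading run of
plain words that contains a digit as the locator.

```haskell
import Data.Char (isDigit)

-- | Cut-down stand-in for pandoc's Inline type (assumption for the sketch).
data Inline = Str String | Space | Emph [Inline]
  deriving (Eq, Show)

-- | Collect leading plain words, stopping at the first marked-up inline.
plainPrefix :: [Inline] -> ([String], [Inline])
plainPrefix (Str s : rest) = let (ws, r) = plainPrefix rest in (s : ws, r)
plainPrefix (Space : rest) = plainPrefix rest
plainPrefix rest           = ([], rest)

-- | Return (locator, remaining suffix). When the leading plain words hold
-- no digit, nothing is split off and the markup stays in the suffix.
locatorOrSuffix :: [Inline] -> (String, [Inline])
locatorOrSuffix inls =
  case plainPrefix inls of
    (ws, rest) | any (any isDigit) ws -> (unwords ws, rest)
    _                                 -> ("", inls)
```

Here [Space, Str "p.", Space, Str "10"] yields the locator "p. 10", while
[Space, Emph [Str "p."], Space, Str "10"] is returned unchanged as suffix.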

I can argue for this solution as well as for simply dropping the markup
(therefore keeping citeproc's locator handling but losing markup in
locators). If somebody really wants to include a verbatim * or similar
in the locator, I think it is more consistent to require him to escape it.

Should I provide another patch with some tests for markup inside of 
citations?

Nathan

P.S. The 0001-*.pandoc.patch was necessary for me to run the citation 
tests, and has nothing to do with the other changes.


>
> John
>



[-- Attachment #2: 0001-Removed-locator-from-Citation.pandoc-types.patch --]
[-- Type: text/plain, Size: 902 bytes --]

From bbf44f2eeafaa80d8500bb756fbaeb0fca42bd66 Mon Sep 17 00:00:00 2001
From: Nathan Gass <gass-H+JoUxikxhPtRgLqZ5aouw@public.gmane.org>
Date: Fri, 26 Nov 2010 13:07:53 +0100
Subject: [PATCH] Removed locator from Citation.

---
 Text/Pandoc/Definition.hs |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/Text/Pandoc/Definition.hs b/Text/Pandoc/Definition.hs
index df841a9..722d011 100644
--- a/Text/Pandoc/Definition.hs
+++ b/Text/Pandoc/Definition.hs
@@ -133,7 +133,6 @@ data Inline
 data Citation = Citation { citationId      :: String
                          , citationPrefix  :: [Inline]
                          , citationSuffix  :: [Inline]
-                         , citationLocator :: String
                          , citationMode    :: CitationMode
                          , citationNoteNum :: Int
                          , citationHash    :: Int
-- 
1.7.3.2


[-- Attachment #3: 0001-Added-HOME-environment-variable-to-fix-citation-test.pandoc.patch --]
[-- Type: text/plain, Size: 979 bytes --]

From 0ce84b90166af03441fc8042d42a5463812bd263 Mon Sep 17 00:00:00 2001
From: Nathan Gass <gass-H+JoUxikxhPtRgLqZ5aouw@public.gmane.org>
Date: Thu, 25 Nov 2010 21:29:05 +0100
Subject: [PATCH 1/2] Added HOME environment variable to fix citation tests.

---
 tests/RunTests.hs |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tests/RunTests.hs b/tests/RunTests.hs
index 2fa89bf..d6259ff 100644
--- a/tests/RunTests.hs
+++ b/tests/RunTests.hs
@@ -168,7 +168,7 @@ runTest testname opts inp norm = do
   hFlush stdout
   -- Note: COLUMNS must be set for markdown table reader
   ph <- runProcess pandocPath (opts ++ [inpPath] ++ ["--data-dir", ".."]) Nothing
-        (Just [("LANG","en_US.UTF-8"),("COLUMNS", "80")]) Nothing (Just hOut) (Just stderr)
+        (Just [("LANG","en_US.UTF-8"),("COLUMNS", "80"),("HOME", "./")]) Nothing (Just hOut) (Just stderr)
   ec <- waitForProcess ph
   result  <- if ec == ExitSuccess
                 then do
-- 
1.7.3.2


[-- Attachment #4: 0002-Adapted-to-change-Removed-locator-from-Citation.-in-.pandoc.patch --]
[-- Type: text/plain, Size: 6436 bytes --]

From 467974ed55308d4a632c5ac129a4d36bc4123a80 Mon Sep 17 00:00:00 2001
From: Nathan Gass <gass-H+JoUxikxhPtRgLqZ5aouw@public.gmane.org>
Date: Fri, 26 Nov 2010 13:03:51 +0100
Subject: [PATCH 2/2] Adapted to change "Removed locator from Citation." in pandoc-types.

For citeproc usage, the locator gets matched from the suffix in Text.Pandoc.Biblio.
---
 src/Text/Pandoc/Biblio.hs           |   38 +++++++++++++++++++++++++++++++---
 src/Text/Pandoc/Readers/Markdown.hs |   38 +---------------------------------
 2 files changed, 36 insertions(+), 40 deletions(-)

diff --git a/src/Text/Pandoc/Biblio.hs b/src/Text/Pandoc/Biblio.hs
index 921cf54..bf1624b 100644
--- a/src/Text/Pandoc/Biblio.hs
+++ b/src/Text/Pandoc/Biblio.hs
@@ -31,6 +31,7 @@ module Text.Pandoc.Biblio ( processBiblio ) where
 
 import Data.List
 import Data.Unique
+import Data.Char ( isDigit )
 import qualified Data.Map as M
 import Text.CSL hiding ( Cite(..), Citation(..) )
 import qualified Text.CSL as CSL ( Cite(..) )
@@ -97,8 +98,8 @@ getNoteCitations needNote
       in  queryWith getCitation . getCits
 
 setHash :: Citation -> IO Citation
-setHash (Citation i p s l cm nn _)
-    = hashUnique `fmap` newUnique >>= return . Citation i p s l cm nn
+setHash (Citation i p s cm nn _)
+    = hashUnique `fmap` newUnique >>= return . Citation i p s cm nn
 
 generateNotes :: [Inline] -> Pandoc -> Pandoc
 generateNotes needNote = processWith (mvCiteInNote needNote)
@@ -150,14 +151,15 @@ setCitationNoteNum i = map $ \c -> c { citationNoteNum = i}
 
 toCslCite :: Citation -> CSL.Cite
 toCslCite c
-    = let (la,lo) = parseLocator $ citationLocator c
+    = let (l, s)  = locatorWords $ citationSuffix c
+          (la,lo) = parseLocator $ unwords l
           citMode = case citationMode c of
                       AuthorInText   -> (True, False)
                       SuppressAuthor -> (False,True )
                       NormalCitation -> (False,False)
       in   emptyCite { CSL.citeId         = citationId c
                      , CSL.citePrefix     = PandocText $ citationPrefix c
-                     , CSL.citeSuffix     = PandocText $ citationSuffix c
+                     , CSL.citeSuffix     = PandocText $ s
                      , CSL.citeLabel      = la
                      , CSL.citeLocator    = lo
                      , CSL.citeNoteNumber = show $ citationNoteNum c
@@ -165,3 +167,31 @@ toCslCite c
                      , CSL.suppressAuthor = snd citMode
                      , CSL.citeHash       = citationHash c
                      }
+
+locatorWords :: [Inline] -> ([String], [Inline])
+locatorWords (Space : t) = locatorWords t
+locatorWords (Str "" : t) = locatorWords t
+locatorWords a@(Str (',' : s) : t)
+    = if ws /= [] then (ws, t') else ([], a)
+    where
+        (ws, t') = locatorWords (Str s:t)
+locatorWords i
+    = if any isDigit w then (w':ws, s'') else ([], i)
+    where
+        (w, s')   = locatorWord i
+        (ws, s'') = locatorWords s'
+        w'        = if ws == [] then w else w ++ ","
+
+locatorWord :: [Inline] -> (String, [Inline])
+locatorWord (Space : r) = (" " ++ ts, r')
+    where
+        (ts, r') = locatorWord r
+locatorWord (Str t : r)
+    | t' /= ""  = (w      , Str t' : r)
+    | otherwise = (t ++ ts, r'        )
+    where
+        w  = takeWhile (/= ',') t
+        t' = dropWhile (/= ',') t
+        (ts, r') = locatorWord r
+locatorWord i = ("", i)
+
diff --git a/src/Text/Pandoc/Readers/Markdown.hs b/src/Text/Pandoc/Readers/Markdown.hs
index 2d3ad11..1b39007 100644
--- a/src/Text/Pandoc/Readers/Markdown.hs
+++ b/src/Text/Pandoc/Readers/Markdown.hs
@@ -34,7 +34,7 @@ module Text.Pandoc.Readers.Markdown (
 import Data.List ( transpose, isSuffixOf, sortBy, findIndex, intercalate )
 import qualified Data.Map as M
 import Data.Ord ( comparing )
-import Data.Char ( isAlphaNum, isDigit )
+import Data.Char ( isAlphaNum )
 import Data.Maybe
 import Text.Pandoc.Definition
 import Text.Pandoc.Shared
@@ -1319,23 +1319,12 @@ spnl = try $ do
   skipSpaces
   notFollowedBy (char '\n')
 
-blankSpace :: GenParser Char st ()
-blankSpace = try $ do
-  res <- many1 $ oneOf " \t\n"
-  guard $ length res > 0
-  guard $ length (filter (=='\n') res) <= 1
-
-noneOfUnlessEscaped :: [Char] -> GenParser Char st Char
-noneOfUnlessEscaped cs =
-  try (char '\\' >> oneOf cs) <|> noneOf cs
-
 textualCite :: GenParser Char ParserState [Citation]
 textualCite = try $ do
   (_, key) <- citeKey
   let first = Citation{ citationId      = key
                       , citationPrefix  = []
                       , citationSuffix  = []
-                      , citationLocator = ""
                       , citationMode    = AuthorInText
                       , citationNoteNum = 0
                       , citationHash    = 0
@@ -1349,12 +1338,11 @@ bareloc :: Citation -> GenParser Char ParserState [Citation]
 bareloc c = try $ do
   spnl
   char '['
-  loc <- locator
   suff <- suffix
   rest <- option [] $ try $ char ';' >> citeList
   spnl
   char ']'
-  return $ c{ citationLocator = loc, citationSuffix = suff } : rest
+  return $ c{ citationSuffix = suff } : rest
 
 normalCite :: GenParser Char ParserState [Citation]
 normalCite = try $ do
@@ -1376,26 +1364,6 @@ citeKey = try $ do
   guard $ key `elem` stateCitations st
   return (suppress_author, key)
 
-locator :: GenParser Char st String
-locator = try $ do
-  spnl
-  w  <- many1 (noneOf " \t\n;,]")
-  ws <- many (locatorWord <|> locatorComma)
-  return $ unwords $ w:ws
-
-locatorWord :: GenParser Char st String
-locatorWord = try $ do
-  spnl
-  wd <- many1 $ noneOfUnlessEscaped "];, \t\n"
-  guard $ any isDigit wd
-  return wd
-
-locatorComma :: GenParser Char st String
-locatorComma = try $ do
-  char ','
-  lookAhead $ locatorWord
-  return ","
-
 suffix :: GenParser Char ParserState [Inline]
 suffix = try $ do
   spnl
@@ -1412,12 +1380,10 @@ citation :: GenParser Char ParserState Citation
 citation = try $ do
   pref <- prefix
   (suppress_author, key) <- citeKey
-  loc <- option "" $ try $ blankSpace >> locator
   suff <- suffix
   return $ Citation{ citationId        = key
                      , citationPrefix  = pref
                      , citationSuffix  = suff
-                     , citationLocator = loc
                      , citationMode    = if suppress_author
                                             then SuppressAuthor
                                             else NormalCitation

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: textual citation
       [not found]                                                             ` <4CEFA937.7040606-8UOIJiGH10pyDzI6CaY1VQ@public.gmane.org>
@ 2010-11-27 15:12                                                               ` John MacFarlane
       [not found]                                                                 ` <20101127151254.GA535-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: John MacFarlane @ 2010-11-27 15:12 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Nathan Gass [Nov 26 10 13:33 ]:
> On 24.11.10 17:09, John MacFarlane wrote:
> >I'm always willing to consider an atomic patch, that makes one
> >surveyable change to the current code base. Just email me the patch, though --
> >much easier for me than dealing with github.
> >
> >We should probably agree about the issue about commas first, though.
> 
> I have attached patches which reproduce the current behavior
> regarding commas. This patch reproduces the old code exactly for the
> given test files.
> 
> They would differ on locators which contain markup. AFAIK the old
> code did not parse the markup, and included the symbols verbatim in
> the locator. The new code does not recognize a locator with markup
> and instead leaves it in the suffix. So the new code keeps the markup
> in locators, but loses the special locator handling citeproc
> provides.
> 
> I can argue for this solution as well as for simply dropping the
> markup (therefore keeping citeproc's locator handling but losing
> markup in locators). If somebody really wants to include a verbatim
> * or similar in the locator, I think it is more consistent to
> require him to escape it.
> 
> Should I provide another patch with some tests for markup inside of
> citations?

Thanks for the patch.  I applied your patches, then rewrote the locator
parsing using Parsec -- this is easier for me to understand and
maintain.  The current code also strips formatting from the locator
prefix, as you wanted.

Note:  anyone tracking the development version will need to update
pandoc-types as well.

John


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Re: textual citation
       [not found]                                                                 ` <20101127151254.GA535-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2010-11-27 18:58                                                                   ` Andrea Rossato
       [not found]                                                                     ` <20101127185836.GD32527-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Andrea Rossato @ 2010-11-27 18:58 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 2728 bytes --]

On Sat, Nov 27, 2010 at 07:12:54AM -0800, John MacFarlane wrote:
> +++ Nathan Gass [Nov 26 10 13:33 ]:
> > On 24.11.10 17:09, John MacFarlane wrote:
> > >I'm always willing to consider an atomic patch, that makes one
> > >surveyable change to the current code base. Just email me the patch, though --
> > >much easier for me than dealing with github.
> > >
> > >We should probably agree about the issue about commas first, though.
> > 
> > I have attached patches which reproduce the current behavior
> > regarding commas. This patch reproduces the old code exactly for the
> > given test files.
> > 
> > They would differ on locators which contain markup. AFAIK the old
> > code did not parse the markup, and included the symbols verbatim in
> > the locator. The new code does not recognize a locator with markup
> > and instead leaves it in the suffix. So the new code keeps the markup
> > in locators, but loses the special locator handling citeproc
> > provides.
> > 
> > I can argue for this solution as well as for simply dropping the
> > markup (therefore keeping citeproc's locator handling but losing
> > markup in locators). If somebody really wants to include a verbatim
> > * or similar in the locator, I think it is more consistent to
> > require him to escape it.
> > 
> > Should I provide another patch with some tests for markup inside of
> > citations?
> 
> Thanks for the patch.  I applied your patches, then rewrote the locator
> parsing using Parsec -- this is easier for me to understand and
> maintain.  The current code also strips formatting from the locator
> prefix, as you wanted.
> 
> Note:  anyone tracking the development version will need to update
> pandoc-types as well.
> 

I'm attaching a patch to introduce a new parameter for the citeproc
function: I implemented the output filtering stuff, so if we want
multiple bibliographies and bibliographic entry filtering we only need
to implement the pandoc side.

I fixed the page range formatting. The tests will still fail, since in
some of your recent changes I saw that the suffix lost the initial
comma (it used to be ", a suffix" while now it is " a suffix"). This
is a pandoc regression, so you'll probably want to fix the suffix
parser.

If a reference is not found, the processor will now produce an
error.

Andrea



[-- Attachment #2: 0045-add-a-new-parameter-to-citeproc.patch --]
[-- Type: text/plain, Size: 1146 bytes --]

From 07e4ff61b9948af10a487c73fabfacbbee1a69d8 Mon Sep 17 00:00:00 2001
From: Andrea Rossato <andrea.rossato-/Q1r7N5in3P/wltNWqQaag@public.gmane.org>
Date: Sat, 27 Nov 2010 19:39:29 +0100
Subject: [PATCH 45/45] add a new parameter to citeproc

---
 src/Text/Pandoc/Biblio.hs |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/Text/Pandoc/Biblio.hs b/src/Text/Pandoc/Biblio.hs
index 4a8cea4..2f27525 100644
--- a/src/Text/Pandoc/Biblio.hs
+++ b/src/Text/Pandoc/Biblio.hs
@@ -54,7 +54,7 @@ processBiblio cslfile r p
                                   needNt = cits \\ concat ncits
                               in (,) needNt $ getNoteCitations needNt p'
                          else (,) [] $ queryWith getCitation p'
-            result     = citeproc csl r (setNearNote csl $ map (map toCslCite) grps)
+            result     = citeproc procOpts csl r (setNearNote csl $ map (map toCslCite) grps)
             cits_map   = M.fromList $ zip grps (citations result)
             biblioList = map (renderPandoc' csl) (bibliography result)
             Pandoc m b = processWith (procInlines $ processCite csl cits_map) p'

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: textual citation
       [not found]                                                                     ` <20101127185836.GD32527-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
@ 2010-11-27 19:32                                                                       ` John MacFarlane
       [not found]                                                                         ` <20101127193232.GA3576-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: John MacFarlane @ 2010-11-27 19:32 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Andrea Rossato [Nov 27 10 19:58 ]:
> On Sat, Nov 27, 2010 at 07:12:54AM -0800, John MacFarlane wrote:
> > +++ Nathan Gass [Nov 26 10 13:33 ]:
> > > On 24.11.10 17:09, John MacFarlane wrote:
> > > >I'm always willing to consider an atomic patch, that makes one
> > > >surveyable change to the current code base. Just email me the patch, though --
> > > >much easier for me than dealing with github.
> > > >
> > > >We should probably agree about the issue about commas first, though.
> > > 
> > > I have attached patches which reproduce the current behavior
> > > regarding commas. This patch reproduces the old code exactly for the
> > > given test files.
> > > 
> > > They would differ on locators which contain markup. AFAIK the old
> > > code did not parse the markup, and included the symbols verbatim in
> > > the locator. The new code does not recognize a locator with markup
> > > and instead leaves it in the suffix. So the new code keeps the markup
> > > in locators, but loses the special locator handling citeproc
> > > provides.
> > > 
> > > I can argue for this solution as well as for simply dropping the
> > > markup (therefore keeping citeproc's locator handling but losing
> > > markup in locators). If somebody really wants to include a verbatim
> > > * or similar in the locator, I think it is more consistent to
> > > require him to escape it.
> > > 
> > > Should I provide another patch with some tests for markup inside of
> > > citations?
> > 
> > Thanks for the patch.  I applied your patches, then rewrote the locator
> > parsing using Parsec -- this is easier for me to understand and
> > maintain.  The current code also strips formatting from the locator
> > prefix, as you wanted.
> > 
> > Note:  anyone tracking the development version will need to update
> > pandoc-types as well.
> > 
> 
> I'm attaching a patch to introduce a new parameter for the citeproc
> function: I implemented the output filtering stuff, so if we want
> multiple bibliographies and bibliographic entry filtering we only need
> to implement the pandoc side.

Could you document these a bit more?

I thought that the citeproc-js version we were looking at had
four variants, and allowed them to be combined in one filter.
(So, I was imagining that ProcOpts would hold a list of
BibOpts.)  Also, I wasn't sure why each BibOpt had two
association lists as parameters...  Finally, I gather that
procOpts is a kind of neutral default, but I didn't really
understand why "Select [] []" would, in effect, select
everything...
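
One possible reading of that last point, under the assumption that a select
filter keeps an entry only when every listed field/value pair matches (this
has not been checked against citeproc's code; Entry, selects, and
selectEntries are made-up names): with an empty list there is no condition
left to fail, so every entry passes.

```haskell
-- | A bibliography entry as field/value pairs (hypothetical representation).
type Entry = [(String, String)]

-- | An entry is selected when all requested field/value pairs are present.
-- For an empty condition list, 'all' is vacuously True.
selects :: [(String, String)] -> Entry -> Bool
selects conds entry = all (`elem` entry) conds

-- | Keep only the entries that satisfy every condition.
selectEntries :: [(String, String)] -> [Entry] -> [Entry]
selectEntries conds = filter (selects conds)
```

Under that assumption, selectEntries [] returns its input unchanged, which
would make an empty select a neutral default.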

> I fixed the page range formatting. Still the tests will fail, since in
> some of your recent changes I saw that the suffix lost the initial
> comma (it used to be ", a suffix" while now it is " a suffix"). This
> is a pandoc regression, so you'll probably want to fix the suffix
> parser.

Thanks, I made a few changes, and I think everything is fixed now. All tests
pass.

John


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: textual citation
       [not found]                                                                         ` <20101127193232.GA3576-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2010-11-27 20:09                                                                           ` John MacFarlane
       [not found]                                                                             ` <20101127200931.GA4421-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  2010-11-27 20:10                                                                           ` Andrea Rossato
  1 sibling, 1 reply; 71+ messages in thread
From: John MacFarlane @ 2010-11-27 20:09 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ John MacFarlane [Nov 27 10 11:32 ]:
> +++ Andrea Rossato [Nov 27 10 19:58 ]:
> > On Sat, Nov 27, 2010 at 07:12:54AM -0800, John MacFarlane wrote:
> > > +++ Nathan Gass [Nov 26 10 13:33 ]:
> > > > On 24.11.10 17:09, John MacFarlane wrote:
> > > > >I'm always willing to consider an atomic patch, that makes one
> > > > >surveyable change to the current code base. Just email me the patch, though --
> > > > >much easier for me than dealing with github.
> > > > >
> > > > >We should probably agree about the issue about commas first, though.
> > > > 
> > > > I have attached patches which reproduce the current behavior
> > > > regarding commas. This patch reproduces the old code exactly for the
> > > > given test files.
> > > > 
> > > > They would differ on locators which contain markup. AFAIK the old
> > > > code did not parse the markup, and included the symbols verbatim in
> > > > the locator. The new code does not recognize a locator with markup
> > > > and instead leaves it in the suffix. So the new code keeps the markup
> > > > in locators, but loses the special locator handling citeproc
> > > > provides.
> > > > 
> > > > I can argue for this solution as well as for simply dropping the
> > > > markup (therefore keeping citeproc's locator handling but losing
> > > > markup in locators). If somebody really wants to include a verbatim
> > > > * or similar in the locator, I think it is more consistent to
> > > > require him to escape it.
> > > > 
> > > > Should I provide another patch with some tests for markup inside of
> > > > citations?
> > > 
> > > Thanks for the patch.  I applied your patches, then rewrote the locator
> > > parsing using Parsec -- this is easier for me to understand and
> > > maintain.  The current code also strips formatting from the locator
> > > prefix, as you wanted.
> > > 
> > > Note:  anyone tracking the development version will need to update
> > > pandoc-types as well.
> > > 
> > 
> > I'm attaching a patch to introduce a new parameter for the citeproc
> > > function: I implemented the output filtering stuff, so if we want
> > multiple bibliographies and bibliographic entry filtering we only need
> > to implement the pandoc side.
> 
> Could you document these a bit more?
> 
> I thought that the citeproc-js version we were looking at had
> four variants, and allowed them to be combined in one filter.
> (So, I was imagining that ProcOpts would hold a list of
> BibOpts.)  Also, I wasn't sure why each BibOpt had two
> association lists as parameters...  Finally, I gather that
> procOpts is a kind of neutral default, but I didn't really
> understand why "Select [] []" would, in effect, select
> everything...
> 
> > I fixed the page range formatting. Still the tests will fail, since in
> > some of your recent changes I saw that the suffix lost the initial
> > comma (it used to be ", a suffix" while now it is " a suffix"). This
> > is a pandoc regression, so you'll probably want to fix the suffix
> > parser.
> 
> Thanks, I made a few changes, and I think everything is fixed now. All tests
> pass.

PS. It would be good if you could look over the test norms --

tests/markdown-citations.chicago-author-date.html
tests/markdown-citations.ieee.html
tests/markdown-citations.mhra.html

You can just view them in a browser (setting encoding to UTF-8).
I'm pretty sure the first is right, but I'm not so sure about the rest,
because I don't know those styles.  I have some doubts.  In ieee:

    Reference [1]  says blah. Reference [1]  says blah. Reference [1]  says
    blah. Reference [1]  says blah.

It seems that there are too many spaces after the [1]. In addition,
the locator & suffix are not put anywhere -- is that how it's supposed
to work in this style?

John


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Re: textual citation
       [not found]                                                                         ` <20101127193232.GA3576-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  2010-11-27 20:09                                                                           ` John MacFarlane
@ 2010-11-27 20:10                                                                           ` Andrea Rossato
       [not found]                                                                             ` <20101127201014.GF32527-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
  1 sibling, 1 reply; 71+ messages in thread
From: Andrea Rossato @ 2010-11-27 20:10 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On Sat, Nov 27, 2010 at 11:32:32AM -0800, John MacFarlane wrote:
> +++ Andrea Rossato [Nov 27 10 19:58 ]:
> > 
> > I'm attaching a patch to introduce a new parameter for the citeproc
> > function: I implemented the output filtering stuff, so if we want
> > multiple bibliographies and bibliographic entry filtering we only need
> > to implement the pandoc side.
> 
> Could you document these a bit more?
> 
> I thought that the citeproc-js version we were looking at had
> four variants, and allowed them to be combined in one filter.
> (So, I was imagining that ProcOpts would hold a list of
> BibOpts.)  Also, I wasn't sure why each BibOpt had two
> association lists as parameters...  Finally, I gather that
> procOpts is a kind of neutral default, but I didn't really
> understand why "Select [] []" would, in effect, select
> everything...
> 

I tried to strictly follow the citeproc-js API:
http://gsl-nagoya-u.net/http/pub/citeproc-doc.html#selective-output

BibOpt is an "object that may contain one of the objects 'select',
'include' or 'exclude', and optionally an additional quash object".

As you know:

   and [] = True

"select" must match all fields: this is why "Select [] []" matches
everything.

The second (optional) list of every BibOpt constructor is the optional
"quash" object (a list of (field, value) tuples).
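
A minimal sketch of these matching rules, assuming an entry is a plain
association list of fields. The Include and Exclude constructors are
assumed by analogy with Select, and the names Entry, matches and keep
are hypothetical, not taken from citeproc-hs. Treating an empty quash
list as "quash nothing" is also an assumption:

```haskell
-- An entry is modelled as a list of (field, value) pairs (assumption).
type Entry  = [(String, String)]
type Fields = [(String, String)]

-- First list: the conditions; second list: the optional "quash" conditions.
data BibOpts
  = Select  Fields Fields
  | Include Fields Fields
  | Exclude Fields Fields

-- True when the entry has the given field with exactly the given value.
matches :: Entry -> (String, String) -> Bool
matches e (k, v) = lookup k e == Just v

-- Decide whether an entry stays in the bibliography.
keep :: BibOpts -> Entry -> Bool
keep opts e = case opts of
    Select  fs q -> all (matches e) fs       && not (quashed q)
    Include fs q -> any (matches e) fs       && not (quashed q)
    Exclude fs q -> not (any (matches e) fs) && not (quashed q)
  where
    -- An entry is quashed when it matches every quash condition;
    -- an empty quash list quashes nothing (assumption).
    quashed q = not (null q) && all (matches e) q
```

Since "all p []" is True, just like "and []", "keep (Select [] [])"
accepts every entry, while "keep (Include [] [])" accepts none.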

I know this is not what I anticipated a few emails ago: I must confess
I hadn't read the linked stuff very carefully before posting... the
list you are referring to would permit a mixture of 'select',
'include', and 'exclude' commands, which would lead to unpredictable
results (at least this was my perception when trying to implement it
and before going back to Frank's documentation).

That should represent what you proposed:

<references src="mybib.bib">
  <select type="book" categories="1990s" />
  <quash author="Smith" />
</references>

But:

<references src="mybib.bib">
  <select type="book" categories="1990s" />
  <exclude type="article-journal" />
  <quash author="Smith" />
</references>

Would not be permitted )and only use the 'select' element would bee
used in my implementation).

Does it make sense?

Andrea


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Re: Re: textual citation
       [not found]                                                                             ` <20101127201014.GF32527-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
@ 2010-11-27 20:18                                                                               ` Andrea Rossato
  2010-11-28  2:22                                                                               ` John MacFarlane
  1 sibling, 0 replies; 71+ messages in thread
From: Andrea Rossato @ 2010-11-27 20:18 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On Sat, Nov 27, 2010 at 09:10:14PM +0100, Andrea Rossato wrote:
> Would not be permitted )and only use the 'select' element would bee
> used in my implementation).

"would not be permitted (and only the 'select' element would be used in
my implementation)".


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Re: textual citation
       [not found]                                                                             ` <20101127200931.GA4421-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2010-11-27 20:39                                                                               ` Andrea Rossato
       [not found]                                                                                 ` <20101127203907.GH32527-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
  2010-11-28 13:03                                                                               ` Andrea Rossato
  1 sibling, 1 reply; 71+ messages in thread
From: Andrea Rossato @ 2010-11-27 20:39 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On Sat, Nov 27, 2010 at 12:09:31PM -0800, John MacFarlane wrote:
> +++ John MacFarlane [Nov 27 10 11:32 ]:
> 
>     Reference [1]  says blah. Reference [1]  says blah. Reference [1]  says
>     blah. Reference [1]  says blah.
> 
> It seems that there are too many spaces after the [1]. In addition,
> the locator & suffix are not put anywhere -- is that how it's supposed
> to work in this style?

I'll take a look, since the extra spaces indeed seem to be a citeproc bug.

The locator, like every other rendered element, is rendered only when
the style includes it. In ieee.csl there are only citation numbers,
which must even be collapsed. It is not possible to retain a suffix or
a prefix when collapsing references.

So, I think this is the way that kind of style is supposed to work.
This is the way I coded it. But I must admit I'm not so sure and I
need to investigate a bit further.

Andrea


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: textual citation
       [not found]                                                                             ` <20101127201014.GF32527-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
  2010-11-27 20:18                                                                               ` Andrea Rossato
@ 2010-11-28  2:22                                                                               ` John MacFarlane
       [not found]                                                                                 ` <20101128022210.GA6819-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  1 sibling, 1 reply; 71+ messages in thread
From: John MacFarlane @ 2010-11-28  2:22 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Andrea Rossato [Nov 27 10 21:10 ]:
> On Sat, Nov 27, 2010 at 11:32:32AM -0800, John MacFarlane wrote:
> > +++ Andrea Rossato [Nov 27 10 19:58 ]:
> > > 
> > > I'm attaching a patch to introduce a new parameter for the citeproc
> > > > function: I implemented the output filtering stuff, so if we want
> > > multiple bibliographies and bibliographic entry filtering we only need
> > > to implement the pandoc side.
> > 
> > Could you document these a bit more?
> > 
> > I thought that the citeproc-js version we were looking at had
> > four variants, and allowed them to be combined in one filter.
> > (So, I was imagining that ProcOpts would hold a list of
> > BibOpts.)  Also, I wasn't sure why each BibOpt had two
> > association lists as parameters...  Finally, I gather that
> > procOpts is a kind of neutral default, but I didn't really
> > understand why "Select [] []" would, in effect, select
> > everything...
> > 
> 
> I tried to strictly follow the citeproc-js API:
> http://gsl-nagoya-u.net/http/pub/citeproc-doc.html#selective-output
> 
> BibOpt is an "object that may contain one of the objects 'select',
> 'include' or 'exclude', and optionally an additional quash object".
> 
> As you know:
> 
>    and [] = True
> 
> "select" must match all fields: this is why "Select [] []" matches
> everything.
> 
> The second (optional) list of every BibOpt constructor is the optional
> "quash" object (a list of (field, value) tuples).
> 
> I know this is not what I anticipated a few emails ago: I must confess
> I hadn't read the linked stuff very carefully before posting... the
> list you are referring to would permit a mixture of 'select',
> 'include', and 'exclude' commands, which would lead to unpredictable
> results (at least this was my perception when trying to implement it
> and before going back to Frank's documentation).
> 
> That should represent what you proposed:
> 
> <references src="mybib.bib">
>   <select type="book" categories="1990s" />
>   <quash author="Smith" />
> </references>
> 
> But:
> 
> <references src="mybib.bib">
>   <select type="book" categories="1990s" />
>   <exclude type="article-journal" />
>   <quash author="Smith" />
> </references>
> 
> Would not be permitted )and only use the 'select' element would bee
> used in my implementation).
> 
> Does it make sense?

That helps, but I'm still not certain about some things.

I had thought we could use this mechanism to get the functionality
of natbib's \nocite command, which includes an entry in the bibliography
even though it's not cited in the text.  It looks to me, on cursory
inspection, that your options can only prune down the list of actually
cited works, not add uncited works. Am I mistaken?

Also, although I can see the value of imitating citeproc-js on this,
the API seems unnatural to me.  Basically you've given AND,
OR, and NOR with select, include, and exclude.  That's not enough
power, so quash is added.  But wouldn't it be better to implement a
little query language (as I did in yst)?  Restrictions could be
combined freely using NOT, AND, and OR.  That would be both more
flexible and more perspicuous, I think.
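
As a rough sketch of what such a query language might look like (this
is not yst's actual syntax; the Query type, its constructors and eval
are hypothetical names, and entries are assumed to be simple
field/value lists):

```haskell
-- An entry is modelled as a list of (field, value) pairs (assumption).
type Entry = [(String, String)]

-- A tiny query language: atomic field tests combined with NOT, AND, OR.
data Query
  = Is String String   -- field has exactly this value
  | Not Query
  | And Query Query
  | Or  Query Query

-- Evaluate a query against a single entry.
eval :: Query -> Entry -> Bool
eval (Is k v)  e = lookup k e == Just v
eval (Not q)   e = not (eval q e)
eval (And a b) e = eval a e && eval b e
eval (Or a b)  e = eval a e || eval b e

-- "Books from the 1990s, except those by Smith": the select + quash
-- example from earlier in the thread, written as one query.
nineties :: Query
nineties =
  And (Is "type" "book")
      (And (Is "categories" "1990s") (Not (Is "author" "Smith")))
```

With this shape, select, include and exclude become ordinary
combinations of And, Or and Not, and no separate quash is needed.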

To sum up, my questions are:

1.  Can these commands be used to include works from the bibliography
file that are not actually cited in the text, or do they just pare down
the list of works cited?

2.  Would it be worth implementing a more general and perspicuous
query language, or is it important to imitate citeproc-js?

John


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Re: Re: textual citation
       [not found]                                                                                 ` <20101127203907.GH32527-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
@ 2010-11-28 12:57                                                                                   ` Andrea Rossato
  0 siblings, 0 replies; 71+ messages in thread
From: Andrea Rossato @ 2010-11-28 12:57 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 1080 bytes --]

On Sat, Nov 27, 2010 at 09:39:07PM +0100, Andrea Rossato wrote:
> On Sat, Nov 27, 2010 at 12:09:31PM -0800, John MacFarlane wrote:
> > +++ John MacFarlane [Nov 27 10 11:32 ]:
> > 
> >     Reference [1]  says blah. Reference [1]  says blah. Reference [1]  says
> >     blah. Reference [1]  says blah.
> > 
> > It seems that there are too many spaces after the [1]. In addition,
> > the locator & suffix are not put anywhere -- is that how it's supposed
> > to work in this style?
> 
> I'll take a look, since the extra spaces indeed seem to be a citeproc bug.

That was not a citeproc bug. That's a pandoc bug! Introduced by me, I
must admit.

Here's the fix.
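
In outline, the fix emits the separating space only when something
follows the in-text label. A simplified sketch of that idea, using a
stand-in Inline type rather than pandoc's own and a hypothetical helper
name (joinTextual):

```haskell
-- Stand-in for pandoc's inline elements (assumption: only what we need).
data Inline = Str String | Space
  deriving (Eq, Show)

-- Join the in-text label with the rest of the rendered citation,
-- adding a space only if there is something left to render.
joinTextual :: [Inline] -> [Inline] -> [Inline]
joinTextual label rest
  | null rest = label
  | otherwise = label ++ [Space] ++ rest
```

With an empty remainder, as in a purely numeric style, no trailing
space is added after the label.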

Andrea



[-- Attachment #2: 0049-check-if-we-actually-need-a-space.patch --]
[-- Type: text/plain, Size: 1182 bytes --]

From 424038ad54345f919011ac423fb51aa016cf65eb Mon Sep 17 00:00:00 2001
From: Andrea Rossato <andrea.rossato-/Q1r7N5in3P/wltNWqQaag@public.gmane.org>
Date: Sun, 28 Nov 2010 13:53:15 +0100
Subject: [PATCH 49/49] check if we actually need a space

---
 src/Text/Pandoc/Biblio.hs |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/src/Text/Pandoc/Biblio.hs b/src/Text/Pandoc/Biblio.hs
index efaafd7..ceecb57 100644
--- a/src/Text/Pandoc/Biblio.hs
+++ b/src/Text/Pandoc/Biblio.hs
@@ -71,7 +71,10 @@ processCite s cs (i:is)
       addNt t x = if null x then [] else [Cite t $ renderPandoc s x]
       process t = case M.lookup t cs of
                     Just  x -> if isTextualCitation t && x /= []
-                               then renderPandoc s [head x] ++ [Space] ++ addNt t (tail x)
+                               then renderPandoc s [head x] ++
+                                    if tail x /= []
+                                    then [Space] ++ addNt t (tail x)
+                                    else []
                                else [Cite t $ renderPandoc s x]
                     Nothing -> [Str ("Error processing " ++ show t)]
 

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Re: textual citation
       [not found]                                                                             ` <20101127200931.GA4421-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  2010-11-27 20:39                                                                               ` Andrea Rossato
@ 2010-11-28 13:03                                                                               ` Andrea Rossato
       [not found]                                                                                 ` <20101128130345.GK32527-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
  1 sibling, 1 reply; 71+ messages in thread
From: Andrea Rossato @ 2010-11-28 13:03 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On Sat, Nov 27, 2010 at 12:09:31PM -0800, John MacFarlane wrote:
> PS. It would be good if you could look over the test norms --
> 
> tests/markdown-citations.chicago-author-date.html
> tests/markdown-citations.ieee.html
> tests/markdown-citations.mhra.html
> 
> You can just view them in a browser (setting encoding to UTF-8).
> I'm pretty sure the first is right, but I'm not so sure about the rest,
> because I don't know those styles.  I have some doubts.  In ieee:
> 
>     Reference [1]  says blah. Reference [1]  says blah. Reference [1]  says
>     blah. Reference [1]  says blah.
> 
> It seems that there are too many spaces after the [1] [...]

PS. I broke your tests: now numeric range joins are rendered with an
en-dash instead of a hyphen.

Andrea


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: textual citation
       [not found]                                                                                 ` <20101128130345.GK32527-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
@ 2010-11-28 16:16                                                                                   ` John MacFarlane
       [not found]                                                                                     ` <20101128161612.GB29510-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  2010-11-28 18:10                                                                                   ` John MacFarlane
  1 sibling, 1 reply; 71+ messages in thread
From: John MacFarlane @ 2010-11-28 16:16 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Andrea Rossato [Nov 28 10 14:03 ]:
> On Sat, Nov 27, 2010 at 12:09:31PM -0800, John MacFarlane wrote:
> > PS. It would be good if you could look over the test norms --
> > 
> > tests/markdown-citations.chicago-author-date.html
> > tests/markdown-citations.ieee.html
> > tests/markdown-citations.mhra.html
> > 
> > You can just view them in a browser (setting encoding to UTF-8).
> > I'm pretty sure the first is right, but I'm not so sure about the rest,
> > because I don't know those styles.  I have some doubts.  In ieee:
> > 
> >     Reference [1]  says blah. Reference [1]  says blah. Reference [1]  says
> >     blah. Reference [1]  says blah.
> > 
> > It seems that there are too many spaces after the [1] [...]
> 
> PS. I broke your tests: now numeric range joins are rendered with an
> en-dash instead of a hyphen.

I applied your fix and updated the tests to use en-dashes.
Even so, there are quite a few failing tests, which you can see for
yourself.

I also removed a few spurious double spaces from the tests, which
causes even more failures.  Here's the diff, so you can see where
the extra spaces were sneaking in...

John

diff --git a/tests/markdown-citations.chicago-author-date.html b/tests/markdown-
index 8f01bc9..0543449 100644
--- a/tests/markdown-citations.chicago-author-date.html
+++ b/tests/markdown-citations.chicago-author-date.html
@@ -25,7 +25,7 @@
 ><h1 id="references"
 >References</h1
 ><p
->Doe,  John. 2005. <em
+>Doe, John. 2005. <em
   >First Book</em
   >. Cambridge: Cambridge University Press.</p
 ><p
@@ -33,7 +33,7 @@
   >Journal of Generic Studies</em
   > 6: 33–34.</p
 ><p
->Doe,  John, and Jenny Roe. 2007. Why Water Is Wet. In <em
+>Doe, John, and Jenny Roe. 2007. Why Water Is Wet. In <em
   >Third Book</em
   >, ed. Sam Smith. Oxford: Oxford University Press.</p
 ><div class="footnotes"
@@ -53,4 +53,4 @@
       ></li
     ></ol
   ></div

diff --git a/tests/markdown-citations.ieee.html b/tests/markdown-citations.ieee.
index 7c7d811..f076a46 100644
--- a/tests/markdown-citations.ieee.html
+++ b/tests/markdown-citations.ieee.html
@@ -27,15 +27,15 @@
 ><p
 >[1] J. Doe, <em
   >First Book</em
-  >,  Cambridge: Cambridge University Press, 2005.</p
+  >, Cambridge: Cambridge University Press, 2005.</p
 ><p
 >[2] J. Doe, “Article”, <em
   >Journal of Generic Studies</em
-  >,  vol. 6, 2006, pp. 33–34.</p
+  >, vol. 6, 2006, pp. 33–34.</p
 ><p
 >[3] J. Doe and J. Roe, “Why Water Is Wet”, <em
   >Third Book</em
-  >, Smith,  S., Ed.,  Oxford: Oxford University Press, 2007.</p
+  >, Smith, S., Ed., Oxford: Oxford University Press, 2007.</p
 ><div class="footnotes"
 ><hr
    /><ol
@@ -53,4 +53,4 @@
       ></li
     ></ol
   ></div

diff --git a/tests/markdown-citations.mhra.html b/tests/markdown-citations.mhra.
index 29c69d1..2964ed8 100644
--- a/tests/markdown-citations.mhra.html
+++ b/tests/markdown-citations.mhra.html
@@ -57,7 +57,7 @@
 ><h1 id="references"
 >References</h1
 ><p
->Doe,  John, ‘Article’, <em
+>Doe, John, ‘Article’, <em
   >Journal of Generic Studies</em
   >, 6 (2006), 33–34.</p
 ><p
@@ -65,7 +65,7 @@
   >First Book</em
   > (Cambridge: Cambridge University Press, 2005).</p
 ><p
->Doe,  John, and Jenny Roe, ‘Why Water Is Wet’, in <em
+>Doe, John, and Jenny Roe, ‘Why Water Is Wet’, in <em
   >Third Book</em
   >, ed by Sam Smith (Oxford: Oxford University Press, 2007).</p
 ><div class="footnotes"






^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: textual citation
       [not found]                                                                                 ` <20101128130345.GK32527-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
  2010-11-28 16:16                                                                                   ` John MacFarlane
@ 2010-11-28 18:10                                                                                   ` John MacFarlane
       [not found]                                                                                     ` <20101128181002.GA30854-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  1 sibling, 1 reply; 71+ messages in thread
From: John MacFarlane @ 2010-11-28 18:10 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Andrea Rossato [Nov 28 10 14:03 ]:
> On Sat, Nov 27, 2010 at 12:09:31PM -0800, John MacFarlane wrote:
> > PS. It would be good if you could look over the test norms --
> > 
> > tests/markdown-citations.chicago-author-date.html
> > tests/markdown-citations.ieee.html
> > tests/markdown-citations.mhra.html
> > 
> > You can just view them in a browser (setting encoding to UTF-8).
> > I'm pretty sure the first is right, but I'm not so sure about the rest,
> > because I don't know those styles.  I have some doubts.  In ieee:
> > 
> >     Reference [1]  says blah. Reference [1]  says blah. Reference [1]  says
> >     blah. Reference [1]  says blah.
> > 
> > It seems that there are too many spaces after the [1] [...]
> 
> PS. I broke your tests: now numeric range joins are rendered with an
> en-dash instead of a hyphen.

I've changed the tests to use markdown output instead of HTML;
this is a lot easier to inspect and see where tests fail.

I think there are still some mistakes in the test norms
(markdown-citations.*.txt).  It would be good if you could take
a look at these.  In particular, should "ed" have a period
after it in mhra?

John


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Re: textual citation
       [not found]                                                                                     ` <20101128161612.GB29510-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2010-11-29 16:10                                                                                       ` Andrea Rossato
       [not found]                                                                                         ` <20101129161059.GC20563-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Andrea Rossato @ 2010-11-29 16:10 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On Sun, Nov 28, 2010 at 08:16:12AM -0800, John MacFarlane wrote:
> ->Doe,  John, and Jenny Roe. 2007. Why Water Is Wet. In <em
> +>Doe, John, and Jenny Roe. 2007. Why Water Is Wet. In <em

My fault (not spotted by the test-suite though). Fixed.

> diff --git a/tests/markdown-citations.ieee.html b/tests/markdown-citations.ieee.
> index 7c7d811..f076a46 100644
> --- a/tests/markdown-citations.ieee.html

> -  >,  Cambridge: Cambridge University Press, 2005.</p
> +  >, Cambridge: Cambridge University Press, 2005.</p

> -  >,  vol. 6, 2006, pp. 33–34.</p
> +  >, vol. 6, 2006, pp. 33–34.</p

> -  >, Smith,  S., Ed.,  Oxford: Oxford University Press, 2007.</p
> +  >, Smith, S., Ed., Oxford: Oxford University Press, 2007.</p


This is a bug in the ieee style and I'm really puzzled. It seems like
CSL processors should get rid of extra spaces, thus permitting the kind
of stupid bugs this style shows. Or at least this is what I intended
and coded. I've contacted the CSL guys about that:

http://sourceforge.net/mailarchive/forum.php?thread_name=20101129153947.GB20563%40eeepc.istitutocolli.org&forum_name=xbiblio-devel

> diff --git a/tests/markdown-citations.mhra.html b/tests/markdown-citations.mhra.
> index 29c69d1..2964ed8 100644
> --- a/tests/markdown-citations.mhra.html

> ->Doe,  John, ‘Article’, <em
> +>Doe, John, ‘Article’, <em

same as above (my fault and fixed)

andrea




^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Re: textual citation
       [not found]                                                                                     ` <20101128181002.GA30854-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2010-11-29 16:12                                                                                       ` Andrea Rossato
  0 siblings, 0 replies; 71+ messages in thread
From: Andrea Rossato @ 2010-11-29 16:12 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On Sun, Nov 28, 2010 at 10:10:03AM -0800, John MacFarlane wrote:
> In particular, should "ed" have a period after it in mhra?

no. this is the relevant style part:

   <macro name="editor-translator">
      <group delimiter=", ">
         <names variable="editor" delimiter=", ">
            <label form="verb-short" text-case="lowercase" suffix=" " strip-periods="true"/>
            <name and="text" delimiter=", " delimiter-precedes-last="never"/>
         </names>
         <choose>
            <if variable="author editor" match="any">
               <names variable="translator" delimiter=", ">
                  <label form="verb-short" text-case="lowercase" suffix=" " strip-periods="true"/>
                  <name and="text" delimiter=", " delimiter-precedes-last="never"/>
               </names>
            </if>
         </choose>
      </group>
   </macro>

"ed" is the <label> element and stands for "edited"; it must be
lowercase and without periods (strip-periods="true").
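
A small sketch of how such a label could be rendered, assuming the
locale's short verb form for "editor" is "ed. by" (an assumption,
though consistent with the "ed by Sam Smith" in the mhra test output).
The function renderLabel is a hypothetical name, and it always
lowercases because the macro sets text-case="lowercase":

```haskell
import Data.Char (toLower)

-- Render a label term: lowercase it, optionally strip periods,
-- then append the suffix given in the style.
renderLabel :: Bool -> String -> String -> String
renderLabel stripPeriods suffix term = map toLower stripped ++ suffix
  where
    stripped
      | stripPeriods = filter (/= '.') term
      | otherwise    = term
```

Under these assumptions, renderLabel True " " "ed. by" yields "ed by "
with no period, which is what the mhra macro asks for.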

Andrea


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Re: textual citation
       [not found]                                                                                 ` <20101128022210.GA6819-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2010-12-01 11:28                                                                                   ` Andrea Rossato
       [not found]                                                                                     ` <20101201112806.GH10338-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Andrea Rossato @ 2010-12-01 11:28 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Sorry it took me so long to come back to this, but I have spent some
time reading and debugging citation styles lately...

On Sat, Nov 27, 2010 at 06:22:10PM -0800, John MacFarlane wrote:
> +++ Andrea Rossato [Nov 27 10 21:10 ]:
> > That should represent what you proposed:
> > 
> > <references src="mybib.bib">
> >   <select type="book" categories="1990s" />
> >   <quash author="Smith" />
> > </references>
> > 
> > But:
> > 
> > <references src="mybib.bib">
> >   <select type="book" categories="1990s" />
> >   <exclude type="article-journal" />
> >   <quash author="Smith" />
> > </references>
> > 
> > Would not be permitted )and only use the 'select' element would bee
> > used in my implementation).
> > 
> > Does it make sense?
> 
> That helps, but I'm still not certain about some things.
> 
> I had thought we could use this mechanism to get the functionality
> of natbib's \nocite command, which includes an entry in the bibliography
> even though it's not cited in the text.  It looks to me, on cursory
> inspection, that your options can only prune down the list of actually
> cited works, not add uncited works. Am I mistaken?

No, you are right. As far as I know there is no way to include in the
bibliography a work which has not been cited. The issue is a bit
troublesome because it breaks the possibility of switching among
citation styles: what happens to a \nocite command in a purely numeric
style like ieee?

> Also, although I can see the value of imitating citeproc-js on this,
> the API seems unnatural to me.  Basically you've given AND,
> OR, and NOR with select, include, and exclude.  That's not enough
> power, so quash is added.  But wouldn't it be better to implement a
> little query language (as I did in yst)?  Restrictions could be
> combined freely using NOT, AND, and OR.  That would be both more
> flexible and more perspicuous, I think.
> 
> To sum up, my questions are:
> 
> 1.  Can these commands be used to include works from the bibliography
> file that are not actually cited in the text, or do they just pare down
> the list of works cited?

No, as I said.

> 2.  Would it be worth implementing a more general and perspicuous
> query language, or is it important to imitate citeproc-js?

It is not important to imitate citeproc-js here. That API is not
included in the CSL specification and we are totally free to create a
new one that fits our needs best. I imitated citeproc-js just because
I thought you didn't want to include this feature in the next release
but I still wanted to test it against the test-suite.

Andrea


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Re: Re: textual citation
       [not found]                                                                                         ` <20101129161059.GC20563-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
@ 2010-12-01 13:06                                                                                           ` Andrea Rossato
       [not found]                                                                                             ` <20101201130603.GJ10338-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Andrea Rossato @ 2010-12-01 13:06 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On Mon, Nov 29, 2010 at 05:10:59PM +0100, Andrea Rossato wrote:
> On Sun, Nov 28, 2010 at 08:16:12AM -0800, John MacFarlane wrote:
> > diff --git a/tests/markdown-citations.ieee.html b/tests/markdown-citations.ieee.
> > index 7c7d811..f076a46 100644
> > --- a/tests/markdown-citations.ieee.html
> 
> > -  >,  Cambridge: Cambridge University Press, 2005.</p
> > +  >, Cambridge: Cambridge University Press, 2005.</p
> 
> > -  >,  vol. 6, 2006, pp. 33–34.</p
> > +  >, vol. 6, 2006, pp. 33–34.</p
> 
> > -  >, Smith,  S., Ed.,  Oxford: Oxford University Press, 2007.</p
> > +  >, Smith, S., Ed., Oxford: Oxford University Press, 2007.</p
> 
> 
> This is a bug in the ieee style and I'm really puzzled. It seems like
> CSL processors should get rid of extra spaces, thus permitting the kind
> of stupid bugs this style shows. Or at least this is what I intended
> and coded. I've contacted the CSL guys about that:
> 
> http://sourceforge.net/mailarchive/forum.php?thread_name=20101129153947.GB20563%40eeepc.istitutocolli.org&forum_name=xbiblio-devel

I do not know if you are an xbiblio-devel subscriber and/or had the
time and the opportunity to have a look at the thread my post
generated in the CSL community.

If so, I'd like to know your opinion, at least with regard to what
pandoc's approach should be.

Andrea




^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: textual citation
       [not found]                                                                                     ` <20101201112806.GH10338-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
@ 2010-12-01 15:52                                                                                       ` John MacFarlane
  0 siblings, 0 replies; 71+ messages in thread
From: John MacFarlane @ 2010-12-01 15:52 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Andrea Rossato [Dec 01 10 12:28 ]:
> Sorry it took me so long to come back to this, but I have spent some
> time reading and debugging citation styles lately...
> 
> On Sat, Nov 27, 2010 at 06:22:10PM -0800, John MacFarlane wrote:
> > +++ Andrea Rossato [Nov 27 10 21:10 ]:
> > > That should represent what you proposed:
> > > 
> > > <references src="mybib.bib">
> > >   <select type="book" categories="1990s" />
> > >   <quash author="Smith" />
> > > </references>
> > > 
> > > But:
> > > 
> > > <references src="mybib.bib">
> > >   <select type="book" categories="1990s" />
> > >   <exclude type="article-journal" />
> > >   <quash author="Smith" />
> > > </references>
> > > 
> > > Would not be permitted (and only the 'select' element would be
> > > used in my implementation).
> > > 
> > > Does it make sense?
> > 
> > That helps, but I'm still not certain about some things.
> > 
> > I had thought we could use this mechanism to get the functionality
> > of natbib's \nocite command, which includes an entry in the bibliography
> > even though it's not cited in the text.  It looks to me, on cursory
> > inspection, that your options can only prune down the list of actually
> > cited works, not add uncited works. Am I mistaken?
> 
> No, you are right. As far as I know there is no way to include in the
> bibliography a work which has not been cited. The issue is a bit
> troublesome because it breaks the possibility to change among citation
> styles: what happens to a \nocite command in a purely numeric style
> like ieee?

Let me get clear about the problem here. \nocite in natbib puts the entry
in the bibliography without putting a citation in the text.  In
principle, it should be possible to do this with numeric styles;
there would just be a number in the bibliography that doesn't appear
in the text.  Is there a reason it couldn't work this way?

On the other hand, suppressing an entry that IS cited from the
bibliography would cause problems with the numerical method.
There'd be a number with no corresponding entry in the bibliography.
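
To make the two cases concrete, here is a minimal sketch in Haskell. It is not citeproc-hs code: the names numberEntries, numbersInText and danglingNumbers are hypothetical, and it assumes a numeric style that numbers entries in order of first citation, with uncited (\nocite-style) entries appended at the end.

```haskell
import Data.List (nub)

-- Citation keys, e.g. "item1" (hypothetical representation).
type Key = String

-- Number the bibliography: cited keys in order of first citation,
-- followed by uncited keys that were added \nocite-style.
numberEntries :: [Key] -> [Key] -> [(Int, Key)]
numberEntries cited uncited = zip [1 ..] (nub (cited ++ uncited))

-- Numbers that actually appear in the running text.
numbersInText :: [Key] -> [(Int, Key)] -> [Int]
numbersInText cited bib = [n | (n, k) <- bib, k `elem` cited]

-- Numbers that appear in the text but would have no bibliography
-- entry if the given cited keys were suppressed from the bibliography.
danglingNumbers :: [Key] -> [Key] -> [(Int, Key)] -> [Int]
danglingNumbers cited suppressed bib =
  [n | (n, k) <- bib, k `elem` cited, k `elem` suppressed]
```

With cited keys "a" and "b" and an uncited key "c", the bibliography gets numbers 1 to 3 and only 1 and 2 occur in the text, which is harmless. Suppressing "b" instead leaves number 2 in the text with nothing to point at.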


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: textual citation
       [not found]                                                                                             ` <20101201130603.GJ10338-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
@ 2010-12-01 16:17                                                                                               ` John MacFarlane
       [not found]                                                                                                 ` <20101201161702.GD3038-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: John MacFarlane @ 2010-12-01 16:17 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Andrea Rossato [Dec 01 10 14:06 ]:
> On Mon, Nov 29, 2010 at 05:10:59PM +0100, Andrea Rossato wrote:
> > On Sun, Nov 28, 2010 at 08:16:12AM -0800, John MacFarlane wrote:
> > > diff --git a/tests/markdown-citations.ieee.html b/tests/markdown-citations.ieee.
> > > index 7c7d811..f076a46 100644
> > > --- a/tests/markdown-citations.ieee.html
> > 
> > > -  >,  Cambridge: Cambridge University Press, 2005.</p
> > > +  >, Cambridge: Cambridge University Press, 2005.</p
> > 
> > > -  >,  vol. 6, 2006, pp. 33–34.</p
> > > +  >, vol. 6, 2006, pp. 33–34.</p
> > 
> > > -  >, Smith,  S., Ed.,  Oxford: Oxford University Press, 2007.</p
> > > +  >, Smith, S., Ed., Oxford: Oxford University Press, 2007.</p
> > 
> > 
> > This is a bug in the ieee style and I'm really puzzled. It seems like
> > CSL processors should get rid of extra-spaces thus permitting the kind
> > of stupid bugs this style shows. Or at least this was what I intended
> > and coded. I've contacted the CSL guys on that:
> > 
> > http://sourceforge.net/mailarchive/forum.php?thread_name=20101129153947.GB20563%40eeepc.istitutocolli.org&forum_name=xbiblio-devel
> 
> I do not know if you are a xbiblio-devel subscriber and/or had the
> time and the opportunity to have a look at the thread my post
> generated in the CSL community.
> 
> If yes I'd like to know your opinion, at least with regards to what
> pandoc's approach should be.

Interesting thread.  Although I lean towards your view that the
processor shouldn't cover up errors in the styles, I also see Frank's
point:  the buggy styles are already being used by lots of people,
and it would break things to change how citeproc-js handles them.

If there's to be a transition to the stricter behavior, it seems
it would have to come in phases:  first introduce a testing
version of citeproc-js that is strict, PLUS a tool or test suite
to check for problems in existing styles.  After people had some
time to clean up existing styles, perhaps the strict behavior could
be made standard.

For our purposes, I think the pragmatic thing to do would be to
imitate citeproc-js and collapse double spaces, so that people could
use the existing styles.  Is this a difficult thing to do in
citeproc-hs?  (Note that in Text.Pandoc.Shared there is already
a function normalizeSpaces, which could simply be called on the
final list of Inlines.*)  One could develop a different tool
to make evident spaces bugs in the styles, until the spec was
clarified one way or the other.

John




^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: textual citation
       [not found]                                                                                                 ` <20101201161702.GD3038-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2010-12-01 20:55                                                                                                   ` dsanson
       [not found]                                                                                                     ` <8f044663-b02a-45bd-b299-60fef03bf457-n9fKM5ssUrqdjmvXPhoLGFYGCWtFR9XvQQ4Iyu8u01E@public.gmane.org>
  2010-12-03  5:57                                                                                                   ` John MacFarlane
  1 sibling, 1 reply; 71+ messages in thread
From: dsanson @ 2010-12-01 20:55 UTC (permalink / raw)
  To: pandoc-discuss

Perhaps a switch to shift between a "strict" behavior and a
"forgiving" behavior. This might encourage user discovery of
unintended spaces in existing styles, while still allowing us to use
the styles as-is without endless headaches. Eventually, if the styles
get cleaned up, the "forgiving" behavior could be dropped. And if the
CSL community decides instead to leave the styles as-is, the "strict"
behavior could be dropped. At the moment, I'd think pandoc should
default to asking citeproc-hs to be "forgiving" but provide a cli
switch to ask it to be "strict".

One added benefit: I suspect the pandoc/citeproc-hs user community is
smaller and more tech-savvy, on average, than the zotero/mendeley user
community. By using the "strict" option, we'd be able to start finding
and reporting style bugs without flooding the CSL mailing lists with
confused complaints.

David


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: textual citation
       [not found]                                                                                                 ` <20101201161702.GD3038-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  2010-12-01 20:55                                                                                                   ` dsanson
@ 2010-12-03  5:57                                                                                                   ` John MacFarlane
       [not found]                                                                                                     ` <20101203055730.GA24661-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  1 sibling, 1 reply; 71+ messages in thread
From: John MacFarlane @ 2010-12-03  5:57 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ John MacFarlane [Dec 01 10 08:17 ]:
> +++ Andrea Rossato [Dec 01 10 14:06 ]:
> > On Mon, Nov 29, 2010 at 05:10:59PM +0100, Andrea Rossato wrote:
> > > On Sun, Nov 28, 2010 at 08:16:12AM -0800, John MacFarlane wrote:
> > > > diff --git a/tests/markdown-citations.ieee.html b/tests/markdown-citations.ieee.
> > > > index 7c7d811..f076a46 100644
> > > > --- a/tests/markdown-citations.ieee.html
> > > 
> > > > -  >,  Cambridge: Cambridge University Press, 2005.</p
> > > > +  >, Cambridge: Cambridge University Press, 2005.</p
> > > 
> > > > -  >,  vol. 6, 2006, pp. 33–34.</p
> > > > +  >, vol. 6, 2006, pp. 33–34.</p
> > > 
> > > > -  >, Smith,  S., Ed.,  Oxford: Oxford University Press, 2007.</p
> > > > +  >, Smith, S., Ed., Oxford: Oxford University Press, 2007.</p
> > > 
> > > 
> > > This is a bug in the ieee style and I'm really puzzled. It seems like
> > > CSL processors should get rid of extra-spaces thus permitting the kind
> > > of stupid bugs this style shows. Or at least this was what I intended
> > > and coded. I've contacted the CSL guys on that:
> > > 
> > > http://sourceforge.net/mailarchive/forum.php?thread_name=20101129153947.GB20563%40eeepc.istitutocolli.org&forum_name=xbiblio-devel
> > 
> > I do not know if you are a xbiblio-devel subscriber and/or had the
> > time and the opportunity to have a look at the thread my post
> > generated in the CSL community.
> > 
> > If yes I'd like to know your opinion, at least with regards to what
> > pandoc's approach should be.
> 
> Interesting thread.  Although I lean towards your view that the
> processor shouldn't cover up errors in the styles, I also see Frank's
> point:  the buggy styles are already being used by lots of people,
> and it would break things to change how citeproc-js handles them.
> 
> If there's to be a transition to the stricter behavior, it seems
> it would have to come in phases:  first introduce a testing
> version of citeproc-js that is strict, PLUS a tool or test suite
> to check for problems in existing styles.  After people had some
> time to clean up existing styles, perhaps the strict behavior could
> be made standard.
> 
> For our purposes, I think the pragmatic thing to do would be to
> imitate citeproc-js and collapse double spaces, so that people could
> use the existing styles.  Is this a difficult thing to do in
> citeproc-hs?  (Note that in Text.Pandoc.Shared there is already
> a function normalizeSpaces, which could simply be called on the
> final list of Inlines.*)  One could develop a different tool
> to make evident spaces bugs in the styles, until the spec was
> clarified one way or the other.

PS. Here's a patch to citeproc-hs that collapses adjacent spaces
in pandoc output:

hunk ./src/Text/CSL/Output/Pandoc.hs 131
-    | Str " "   <- i  = Space          : cleanStrict is
+    | is_space i      = Space          : cleanStrict (dropWhile is_space is)
hunk ./src/Text/CSL/Output/Pandoc.hs 135
+  where is_space Space    = True
+        is_space (Str "") = True
+        is_space _        = False

When I applied this patch, all the pandoc test cases passed.
I know you may object to this on philosophical grounds, though.
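
For readers without the citeproc-hs source at hand, the effect of the patch can be sketched in isolation. This is a minimal sketch, not the actual cleanStrict: Inline here is a two-constructor stand-in for pandoc's type, collapseSpaces and isBlank are hypothetical names, and treating space-only strings as blank (in addition to Space and Str "", as in the patch) is an assumption.

```haskell
-- Cut-down stand-in for pandoc's Inline type (assumption: the real
-- type has many more constructors).
data Inline = Str String | Space
  deriving (Eq, Show)

-- An element counts as blank if it is an explicit Space or a string
-- that is empty or consists only of space characters.
isBlank :: Inline -> Bool
isBlank Space   = True
isBlank (Str s) = all (== ' ') s

-- Replace every run of adjacent blank elements with a single Space.
collapseSpaces :: [Inline] -> [Inline]
collapseSpaces [] = []
collapseSpaces (i : is)
  | isBlank i = Space : collapseSpaces (dropWhile isBlank is)
  | otherwise = i : collapseSpaces is
```

Loaded in GHCi, collapseSpaces applied to a list with a Space followed by Str " " yields a list with a single Space at that position.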

John




^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: textual citation
       [not found]                                                                                                     ` <20101203055730.GA24661-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2010-12-03  6:09                                                                                                       ` John MacFarlane
       [not found]                                                                                                         ` <20101203060948.GA24736-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  2010-12-03 14:11                                                                                                       ` Andrea Rossato
  1 sibling, 1 reply; 71+ messages in thread
From: John MacFarlane @ 2010-12-03  6:09 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ John MacFarlane [Dec 02 10 21:57 ]:
> +++ John MacFarlane [Dec 01 10 08:17 ]:
> > +++ Andrea Rossato [Dec 01 10 14:06 ]:
> > > On Mon, Nov 29, 2010 at 05:10:59PM +0100, Andrea Rossato wrote:
> > > > On Sun, Nov 28, 2010 at 08:16:12AM -0800, John MacFarlane wrote:
> > > > > diff --git a/tests/markdown-citations.ieee.html b/tests/markdown-citations.ieee.
> > > > > index 7c7d811..f076a46 100644
> > > > > --- a/tests/markdown-citations.ieee.html
> > > > 
> > > > > -  >,  Cambridge: Cambridge University Press, 2005.</p
> > > > > +  >, Cambridge: Cambridge University Press, 2005.</p
> > > > 
> > > > > -  >,  vol. 6, 2006, pp. 33–34.</p
> > > > > +  >, vol. 6, 2006, pp. 33–34.</p
> > > > 
> > > > > -  >, Smith,  S., Ed.,  Oxford: Oxford University Press, 2007.</p
> > > > > +  >, Smith, S., Ed., Oxford: Oxford University Press, 2007.</p
> > > > 
> > > > 
> > > > This is a bug in the ieee style and I'm really puzzled. It seems like
> > > > CSL processors should get rid of extra-spaces thus permitting the kind
> > > > of stupid bugs this style shows. Or at least this was what I intended
> > > > and coded. I've contacted the CSL guys on that:
> > > > 
> > > > http://sourceforge.net/mailarchive/forum.php?thread_name=20101129153947.GB20563%40eeepc.istitutocolli.org&forum_name=xbiblio-devel
> > > 
> > > I do not know if you are a xbiblio-devel subscriber and/or had the
> > > time and the opportunity to have a look at the thread my post
> > > generated in the CSL community.
> > > 
> > > If yes I'd like to know your opinion, at least with regards to what
> > > pandoc's approach should be.
> > 
> > Interesting thread.  Although I lean towards your view that the
> > processor shouldn't cover up errors in the styles, I also see Frank's
> > point:  the buggy styles are already being used by lots of people,
> > and it would break things to change how citeproc-js handles them.
> > 
> > If there's to be a transition to the stricter behavior, it seems
> > it would have to come in phases:  first introduce a testing
> > version of citeproc-js that is strict, PLUS a tool or test suite
> > to check for problems in existing styles.  After people had some
> > time to clean up existing styles, perhaps the strict behavior could
> > be made standard.
> > 
> > For our purposes, I think the pragmatic thing to do would be to
> > imitate citeproc-js and collapse double spaces, so that people could
> > use the existing styles.  Is this a difficult thing to do in
> > citeproc-hs?  (Note that in Text.Pandoc.Shared there is already
> > a function normalizeSpaces, which could simply be called on the
> > final list of Inlines.*)  One could develop a different tool
> > to make evident spaces bugs in the styles, until the spec was
> > clarified one way or the other.
> 
> PS. Here's a patch to citeproc-hs that collapses adjacent spaces
> in pandoc output:
> 
> hunk ./src/Text/CSL/Output/Pandoc.hs 131
> -    | Str " "   <- i  = Space          : cleanStrict is
> +    | is_space i      = Space          : cleanStrict (dropWhile is_space is)
> hunk ./src/Text/CSL/Output/Pandoc.hs 135
> +  where is_space Space    = True
> +        is_space (Str "") = True
> +        is_space _        = False
> 
> When I applied this patch, all the pandoc test cases passed.
> I know you may object to this on philosophical grounds, though.

I found another odd case with the mhra style.

    pandoc --biblio tests/biblio.bib --csl tests/mhra.csl -t markdown
    @item3 says...
    ^D
    John Doe and Jenny Roeed by Sam Smith[^1] says...
    
    Doe, John, and Jenny Roe, ‘Why Water Is Wet’, in *Third Book*, ed
    by Sam Smith (Oxford: Oxford University Press, 2007).
    
    [^1]:
        ‘Why Water Is Wet’, in *Third Book*,  (Oxford: Oxford University
        Press, 2007).

The odd part is:  "John Doe and Jenny Roeed by Sam Smith[^1]"
Presumably "ed by Sam Smith" should not be there...

Is this a bug in the style or in citeproc?

John




^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: textual citation
       [not found]                                                                                                         ` <20101203060948.GA24736-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2010-12-03  6:45                                                                                                           ` John MacFarlane
  2010-12-03 14:18                                                                                                           ` Andrea Rossato
  2010-12-07 17:27                                                                                                           ` Andrea Rossato
  2 siblings, 0 replies; 71+ messages in thread
From: John MacFarlane @ 2010-12-03  6:45 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ John MacFarlane [Dec 02 10 22:09 ]:
> +++ John MacFarlane [Dec 02 10 21:57 ]:
> > +++ John MacFarlane [Dec 01 10 08:17 ]:
> > > +++ Andrea Rossato [Dec 01 10 14:06 ]:
> > > > On Mon, Nov 29, 2010 at 05:10:59PM +0100, Andrea Rossato wrote:
> > > > > On Sun, Nov 28, 2010 at 08:16:12AM -0800, John MacFarlane wrote:
> > > > > > diff --git a/tests/markdown-citations.ieee.html b/tests/markdown-citations.ieee.
> > > > > > index 7c7d811..f076a46 100644
> > > > > > --- a/tests/markdown-citations.ieee.html
> > > > > 
> > > > > > -  >,  Cambridge: Cambridge University Press, 2005.</p
> > > > > > +  >, Cambridge: Cambridge University Press, 2005.</p
> > > > > 
> > > > > > -  >,  vol. 6, 2006, pp. 33–34.</p
> > > > > > +  >, vol. 6, 2006, pp. 33–34.</p
> > > > > 
> > > > > > -  >, Smith,  S., Ed.,  Oxford: Oxford University Press, 2007.</p
> > > > > > +  >, Smith, S., Ed., Oxford: Oxford University Press, 2007.</p
> > > > > 
> > > > > 
> > > > > This is a bug in the ieee style and I'm really puzzled. It seems like
> > > > > CSL processors should get rid of extra-spaces thus permitting the kind
> > > > > of stupid bugs this style shows. Or at least this was what I intended
> > > > > and coded. I've contacted the CSL guys on that:
> > > > > 
> > > > > http://sourceforge.net/mailarchive/forum.php?thread_name=20101129153947.GB20563%40eeepc.istitutocolli.org&forum_name=xbiblio-devel
> > > > 
> > > > I do not know if you are a xbiblio-devel subscriber and/or had the
> > > > time and the opportunity to have a look at the thread my post
> > > > generated in the CSL community.
> > > > 
> > > > If yes I'd like to know your opinion, at least with regards to what
> > > > pandoc's approach should be.
> > > 
> > > Interesting thread.  Although I lean towards your view that the
> > > processor shouldn't cover up errors in the styles, I also see Frank's
> > > point:  the buggy styles are already being used by lots of people,
> > > and it would break things to change how citeproc-js handles them.
> > > 
> > > If there's to be a transition to the stricter behavior, it seems
> > > it would have to come in phases:  first introduce a testing
> > > version of citeproc-js that is strict, PLUS a tool or test suite
> > > to check for problems in existing styles.  After people had some
> > > time to clean up existing styles, perhaps the strict behavior could
> > > be made standard.
> > > 
> > > For our purposes, I think the pragmatic thing to do would be to
> > > imitate citeproc-js and collapse double spaces, so that people could
> > > use the existing styles.  Is this a difficult thing to do in
> > > citeproc-hs?  (Note that in Text.Pandoc.Shared there is already
> > > a function normalizeSpaces, which could simply be called on the
> > > final list of Inlines.*)  One could develop a different tool
> > > to make evident spaces bugs in the styles, until the spec was
> > > clarified one way or the other.
> > 
> > PS. Here's a patch to citeproc-hs that collapses adjacent spaces
> > in pandoc output:
> > 
> > hunk ./src/Text/CSL/Output/Pandoc.hs 131
> > -    | Str " "   <- i  = Space          : cleanStrict is
> > +    | is_space i      = Space          : cleanStrict (dropWhile is_space is)
> > hunk ./src/Text/CSL/Output/Pandoc.hs 135
> > +  where is_space Space    = True
> > +        is_space (Str "") = True
> > +        is_space _        = False
> > 
> > When I applied this patch, all the pandoc test cases passed.
> > I know you may object to this on philosophical grounds, though.
> 
> I found another odd case with the mhra style.
> 
>     pandoc --biblio tests/biblio.bib --csl tests/mhra.csl -t markdown
>     @item3 says...
>     ^D
>     John Doe and Jenny Roeed by Sam Smith[^1] says...
>     
>     Doe, John, and Jenny Roe, ‘Why Water Is Wet’, in *Third Book*, ed
>     by Sam Smith (Oxford: Oxford University Press, 2007).
>     
>     [^1]:
>         ‘Why Water Is Wet’, in *Third Book*,  (Oxford: Oxford University
>         Press, 2007).
> 
> The odd part is:  "John Doe and Jenny Roeed by Sam Smith[^1]"
> Presumably "ed by Sam Smith" should not be there...

More precisely, it should be in the footnote, not in the main
text.  But maybe this is intentional?




^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Re: textual citation
       [not found]                                                                                                     ` <20101203055730.GA24661-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  2010-12-03  6:09                                                                                                       ` John MacFarlane
@ 2010-12-03 14:11                                                                                                       ` Andrea Rossato
       [not found]                                                                                                         ` <20101203141139.GD14815-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
  1 sibling, 1 reply; 71+ messages in thread
From: Andrea Rossato @ 2010-12-03 14:11 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On Thu, Dec 02, 2010 at 09:57:30PM -0800, John MacFarlane wrote:
> +++ John MacFarlane [Dec 01 10 08:17 ]:
> > For our purposes, I think the pragmatic thing to do would be to
> > imitate citeproc-js and collapse double spaces, so that people could
> > use the existing styles.  Is this a difficult thing to do in
> > citeproc-hs?  (Note that in Text.Pandoc.Shared there is already
> > a function normalizeSpaces, which could simply be called on the
> > final list of Inlines.*)  One could develop a different tool
> > to make evident spaces bugs in the styles, until the spec was
> > clarified one way or the other.
> 
> PS. Here's a patch to citeproc-hs that collapses adjacent spaces
> in pandoc output:
> 
> hunk ./src/Text/CSL/Output/Pandoc.hs 131
> -    | Str " "   <- i  = Space          : cleanStrict is
> +    | is_space i      = Space          : cleanStrict (dropWhile is_space is)
> hunk ./src/Text/CSL/Output/Pandoc.hs 135
> +  where is_space Space    = True
> +        is_space (Str "") = True
> +        is_space _        = False
> 
> When I applied this patch, all the pandoc test cases passed.
> I know you may object to this on philosophical grounds, though.

This is actually the way citeproc-hs works. I'm working on some
improvements in the output (formatting and quotation flip-flopping),
and while partially recoding it I recently introduced a bug: spaces
are not normalized when feeding pandoc, but they are when producing
the output for the test-suite.

This is how I discovered the problem, and why the correct fix is not
the patch you are proposing but the one I'm attaching.

My objections to your patch are actually practical: I think that
preventing the implementation from covering up bugs should reward us
with better styles.

But, as you and David suggested, probably the best path to take is to
normalize spaces but also support the strict behavior. That requires a
new parameter for renderPandoc (or two versions of the same function)
and a new command line option. I think this could also be the solution
that could be adopted in the CSL specification, even though Frank came
up with a proposal which seems very interesting to me - spaces could
not be used in affix attributes but only in delimiters.
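
One possible shape for such a parameter, as a rough sketch only: SpaceMode, renderSpaces and doubledSpaces are hypothetical names, not part of citeproc-hs or pandoc, and Inline is a cut-down stand-in for pandoc's type.

```haskell
import Data.List (group)

-- Cut-down stand-in for pandoc's Inline type (assumption).
data Inline = Str String | Space
  deriving (Eq, Show)

-- Hypothetical switch between the two behaviors discussed here.
data SpaceMode = Forgiving | Strict
  deriving (Eq, Show)

-- Forgiving collapses runs of Space into one; Strict leaves the
-- output untouched so that style bugs stay visible.
renderSpaces :: SpaceMode -> [Inline] -> [Inline]
renderSpaces Strict    = id
renderSpaces Forgiving = collapse
  where
    collapse (Space : rest) = Space : collapse (dropWhile (== Space) rest)
    collapse (x : rest)     = x : collapse rest
    collapse []             = []

-- Count runs of two or more adjacent Spaces, which a strict mode
-- could report as probable style bugs.
doubledSpaces :: [Inline] -> Int
doubledSpaces xs = length [() | (Space : _ : _) <- group xs]
```

A command line option would then only need to choose the SpaceMode value passed down; the count from doubledSpaces could feed a warning in strict mode.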

Andrea


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Re: textual citation
       [not found]                                                                                                         ` <20101203060948.GA24736-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  2010-12-03  6:45                                                                                                           ` John MacFarlane
@ 2010-12-03 14:18                                                                                                           ` Andrea Rossato
  2010-12-07 17:27                                                                                                           ` Andrea Rossato
  2 siblings, 0 replies; 71+ messages in thread
From: Andrea Rossato @ 2010-12-03 14:18 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On Thu, Dec 02, 2010 at 10:09:48PM -0800, John MacFarlane wrote:
> +++ John MacFarlane [Dec 02 10 21:57 ]:
> > +++ John MacFarlane [Dec 01 10 08:17 ]:
> > > +++ Andrea Rossato [Dec 01 10 14:06 ]:
> > > > On Mon, Nov 29, 2010 at 05:10:59PM +0100, Andrea Rossato wrote:
> > > > > On Sun, Nov 28, 2010 at 08:16:12AM -0800, John MacFarlane wrote:
> > > > > > diff --git a/tests/markdown-citations.ieee.html b/tests/markdown-citations.ieee.
> > > > > > index 7c7d811..f076a46 100644
> > > > > > --- a/tests/markdown-citations.ieee.html
> > > > > 
> > > > > > -  >,  Cambridge: Cambridge University Press, 2005.</p
> > > > > > +  >, Cambridge: Cambridge University Press, 2005.</p
> > > > > 
> > > > > > -  >,  vol. 6, 2006, pp. 33–34.</p
> > > > > > +  >, vol. 6, 2006, pp. 33–34.</p
> > > > > 
> > > > > > -  >, Smith,  S., Ed.,  Oxford: Oxford University Press, 2007.</p
> > > > > > +  >, Smith, S., Ed., Oxford: Oxford University Press, 2007.</p
> > > > > 
> > > > > 
> > > > > This is a bug in the ieee style and I'm really puzzled. It seems like
> > > > > CSL processors should get rid of extra-spaces thus permitting the kind
> > > > > of stupid bugs this style shows. Or at least this was what I intended
> > > > > and coded. I've contacted the CSL guys on that:
> > > > > 
> > > > > http://sourceforge.net/mailarchive/forum.php?thread_name=20101129153947.GB20563%40eeepc.istitutocolli.org&forum_name=xbiblio-devel
> > > > 
> > > > I do not know if you are a xbiblio-devel subscriber and/or had the
> > > > time and the opportunity to have a look at the thread my post
> > > > generated in the CSL community.
> > > > 
> > > > If yes I'd like to know your opinion, at least with regards to what
> > > > pandoc's approach should be.
> > > 
> > > Interesting thread.  Although I lean towards your view that the
> > > processor shouldn't cover up errors in the styles, I also see Frank's
> > > point:  the buggy styles are already being used by lots of people,
> > > and it would break things to change how citeproc-js handles them.
> > > 
> > > If there's to be a transition to the stricter behavior, it seems
> > > it would have to come in phases:  first introduce a testing
> > > version of citeproc-js that is strict, PLUS a tool or test suite
> > > to check for problems in existing styles.  After people had some
> > > time to clean up existing styles, perhaps the strict behavior could
> > > be made standard.
> > > 
> > > For our purposes, I think the pragmatic thing to do would be to
> > > imitate citeproc-js and collapse double spaces, so that people could
> > > use the existing styles.  Is this a difficult thing to do in
> > > citeproc-hs?  (Note that in Text.Pandoc.Shared there is already
> > > a function normalizeSpaces, which could simply be called on the
> > > final list of Inlines.*)  One could develop a different tool
> > > to make evident spaces bugs in the styles, until the spec was
> > > clarified one way or the other.
> > 
> > PS. Here's a patch to citeproc-hs that collapses adjacent spaces
> > in pandoc output:
> > 
> > hunk ./src/Text/CSL/Output/Pandoc.hs 131
> > -    | Str " "   <- i  = Space          : cleanStrict is
> > +    | is_space i      = Space          : cleanStrict (dropWhile is_space is)
> > hunk ./src/Text/CSL/Output/Pandoc.hs 135
> > +  where is_space Space    = True
> > +        is_space (Str "") = True
> > +        is_space _        = False
> > 
> > When I applied this patch, all the pandoc test cases passed.
> > I know you may object to this on philosophical grounds, though.
> 
> I found another odd case with the mhra style.
> 
>     pandoc --biblio tests/biblio.bib --csl tests/mhra.csl -t markdown
>     @item3 says...
>     ^D
>     John Doe and Jenny Roeed by Sam Smith[^1] says...
>     
>     Doe, John, and Jenny Roe, ‘Why Water Is Wet’, in *Third Book*, ed
>     by Sam Smith (Oxford: Oxford University Press, 2007).
>     
>     [^1]:
>         ‘Why Water Is Wet’, in *Third Book*,  (Oxford: Oxford University
>         Press, 2007).
> 
> The odd part is:  "John Doe and Jenny Roeed by Sam Smith[^1]"
> Presumably "ed by Sam Smith" should not be there...
> 
> Is this a bug in the style or in citeproc?


It seems to be a citeproc-hs bug, something I had not thought about when
coding the suppress-author option. "ed by Sam Smith" should not be
there.

I'll take care of it.

Andrea




^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Re: Re: textual citation
       [not found]                                                                                                         ` <20101203141139.GD14815-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
@ 2010-12-03 14:19                                                                                                           ` Andrea Rossato
       [not found]                                                                                                             ` <20101203141953.GF14815-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
  2010-12-03 15:30                                                                                                           ` John MacFarlane
  1 sibling, 1 reply; 71+ messages in thread
From: Andrea Rossato @ 2010-12-03 14:19 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 2172 bytes --]

On Fri, Dec 03, 2010 at 03:11:39PM +0100, Andrea Rossato wrote:
> On Thu, Dec 02, 2010 at 09:57:30PM -0800, John MacFarlane wrote:
> > +++ John MacFarlane [Dec 01 10 08:17 ]:
> > > For our purposes, I think the pragmatic thing to do would be to
> > > imitate citeproc-js and collapse double spaces, so that people could
> > > use the existing styles.  Is this a difficult thing to do in
> > > citeproc-hs?  (Note that in Text.Pandoc.Shared there is already
> > > a function normalizeSpaces, which could simply be called on the
> > > final list of Inlines.*)  One could develop a different tool
> > > to make evident spaces bugs in the styles, until the spec was
> > > clarified one way or the other.
> > 
> > PS. Here's a patch to citeproc-hs that collapses adjacent spaces
> > in pandoc output:
> > 
> > hunk ./src/Text/CSL/Output/Pandoc.hs 131
> > -    | Str " "   <- i  = Space          : cleanStrict is
> > +    | is_space i      = Space          : cleanStrict (dropWhile is_space is)
> > hunk ./src/Text/CSL/Output/Pandoc.hs 135
> > +  where is_space Space    = True
> > +        is_space (Str "") = True
> > +        is_space _        = False
> > 
> > When I applied this patch, all the pandoc test cases passed.
> > I know you may object to this on philosophical grounds, though.
> 
> This is actually the way citeproc-hs works. Since I'm working on some
> improvements in the output (formatting and quotation flip-flopping),
> when partially recoding I recently introduced a bug so that spaces
> are not normalized when feeding pandoc - but they are when producing
> the output for the test-suite.
> 
> This is why I discovered the problem and that the correct fix is not
> the patch you are proposing but the one I'm attaching.

the patch...



[-- Attachment #2: ieee.patch --]
[-- Type: text/plain, Size: 908 bytes --]

diff --git a/tests/ieee.csl b/tests/ieee.csl
index 2e0af17..af57495 100644
--- a/tests/ieee.csl
+++ b/tests/ieee.csl
@@ -49,7 +49,7 @@
       </choose>
    </macro>
    <macro name="publisher">
-      <text variable="publisher-place" suffix=": " prefix=" "/>
+      <text variable="publisher-place" suffix=": "/>
       <text variable="publisher" suffix=", "/>
       <date variable="issued">
          <date-part name="year"/>
@@ -115,7 +115,7 @@
                <group delimiter=", ">
                   <text macro="title"/>
                   <text variable="container-title" font-style="italic"/>
-                  <text variable="volume" prefix=" vol. "/>
+                  <text variable="volume" prefix="vol. "/>
                   <date variable="issued">
                      <date-part name="month" form="short" suffix=". " strip-periods="true"/>
                      <date-part name="year"/>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: textual citation
       [not found]                                                                                                         ` <20101203141139.GD14815-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
  2010-12-03 14:19                                                                                                           ` Andrea Rossato
@ 2010-12-03 15:30                                                                                                           ` John MacFarlane
  1 sibling, 0 replies; 71+ messages in thread
From: John MacFarlane @ 2010-12-03 15:30 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Andrea Rossato [Dec 03 10 15:11 ]:
> On Thu, Dec 02, 2010 at 09:57:30PM -0800, John MacFarlane wrote:
> > +++ John MacFarlane [Dec 01 10 08:17 ]:
> > > For our purposes, I think the pragmatic thing to do would be to
> > > imitate citeproc-js and collapse double spaces, so that people could
> > > use the existing styles.  Is this a difficult thing to do in
> > > citeproc-hs?  (Note that in Text.Pandoc.Shared there is already
> > > a function normalizeSpaces, which could simply be called on the
> > > final list of Inlines.*)  One could develop a different tool
> > > to make evident spaces bugs in the styles, until the spec was
> > > clarified one way or the other.
> > 
> > PS. Here's a patch to citeproc-hs that collapses adjacent spaces
> > in pandoc output:
> > 
> > hunk ./src/Text/CSL/Output/Pandoc.hs 131
> > -    | Str " "   <- i  = Space          : cleanStrict is
> > +    | is_space i      = Space          : cleanStrict (dropWhile is_space is)
> > hunk ./src/Text/CSL/Output/Pandoc.hs 135
> > +  where is_space Space    = True
> > +        is_space (Str "") = True
> > +        is_space _        = False
> > 
> > When I applied this patch, all the pandoc test cases passed.
> > I know you may object to this on philosophical grounds, though.
> 
> This is actually the way citeproc-hs works. Since I'm working on some
> improvements in the output (formatting and quotation flip-flopping),
> when partially recoding I recently introduced a bug so that spaces
> are not normalized when feeding pandoc - but they are when producing
> the output for the test-suite.
> 
> This is why I discovered the problem and that the correct fix is not
> the patch you are proposing but the one I'm attaching.

No patch was attached!

> My objections to your patch are actually practical: I think that
> not letting the implementation cover up style bugs should reward us
> with better styles.

I've been thinking about this, and I'm coming around to the opposing
view, I think.  Given that it's very easy to collapse adjacent spaces,
why should it be considered a defect in a style if it has an extra
space?  Why not just make it part of the spec that adjacent spaces
are collapsed?
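
As a rough illustration of how little code "collapse adjacent spaces"
needs, here is a minimal sketch. It is not the citeproc-hs code: the
Inline type below is a two-constructor stand-in for pandoc's real one,
and the names isBlank and collapseSpaces are made up for this example.

```haskell
-- Simplified stand-in for pandoc's Inline type (assumption: the real
-- type has many more constructors).
data Inline = Str String | Space
  deriving (Eq, Show)

-- Hypothetical helper: an element counts as whitespace if it is a
-- Space or a Str holding exactly one blank.
isBlank :: Inline -> Bool
isBlank Space     = True
isBlank (Str " ") = True
isBlank _         = False

-- Replace every run of whitespace elements with a single Space.
collapseSpaces :: [Inline] -> [Inline]
collapseSpaces [] = []
collapseSpaces (i:is)
  | isBlank i = Space : collapseSpaces (dropWhile isBlank is)
  | otherwise = i : collapseSpaces is
```

With this sketch, [Str "a", Space, Str " ", Space, Str "b"] becomes
[Str "a", Space, Str "b"], so a style with a stray prefix=" " next to a
delimiter would render the same as a clean one.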

> But, as you and David suggested, probably the best path to take is to
> normalize spaces but also support the strict behavior. That requires a
> new parameter for renderPandoc (or two versions of the same function)
> and a new command line option. I think this could also be the solution
> that could be adopted in the CSL specification, even though Frank came
> up with a proposal which seems very interesting to me - spaces could
> not be used in affix attributes but only in delimiters.

I must admit, I'm reluctant to clutter pandoc with another command-line
option unless I see a strong reason for it. If the option is only useful for
debugging CSL styles, and hence for CSL developers, I'd rather see a separate
specialized tool (which could of course use the pandoc library).

John


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: textual citation
       [not found]                                                                                                             ` <20101203141953.GF14815-j4W6CDmL7uNdAaE8spi6tJZpQXiuRcL9@public.gmane.org>
@ 2010-12-03 15:40                                                                                                               ` John MacFarlane
       [not found]                                                                                                                 ` <20101203154032.GB28210-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: John MacFarlane @ 2010-12-03 15:40 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Andrea Rossato [Dec 03 10 15:19 ]:
> On Fri, Dec 03, 2010 at 03:11:39PM +0100, Andrea Rossato wrote:
> > On Thu, Dec 02, 2010 at 09:57:30PM -0800, John MacFarlane wrote:
> > > +++ John MacFarlane [Dec 01 10 08:17 ]:
> > > > For our purposes, I think the pragmatic thing to do would be to
> > > > imitate citeproc-js and collapse double spaces, so that people could
> > > > use the existing styles.  Is this a difficult thing to do in
> > > > citeproc-hs?  (Note that in Text.Pandoc.Shared there is already
> > > > a function normalizeSpaces, which could simply be called on the
> > > > final list of Inlines.*)  One could develop a different tool
> > > > to make evident spaces bugs in the styles, until the spec was
> > > > clarified one way or the other.
> > > 
> > > PS. Here's a patch to citeproc-hs that collapses adjacent spaces
> > > in pandoc output:
> > > 
> > > hunk ./src/Text/CSL/Output/Pandoc.hs 131
> > > -    | Str " "   <- i  = Space          : cleanStrict is
> > > +    | is_space i      = Space          : cleanStrict (dropWhile is_space is)
> > > hunk ./src/Text/CSL/Output/Pandoc.hs 135
> > > +  where is_space Space    = True
> > > +        is_space (Str "") = True
> > > +        is_space _        = False
> > > 
> > > When I applied this patch, all the pandoc test cases passed.
> > > I know you may object to this on philosophical grounds, though.
> > 
> > This is actually the way citeproc-hs works. Since I'm working on some
> > improvements in the output (formatting and quotation flip-flopping),
> > when partially recoding I recently introduced a bug so that spaces
> > are not normalized when feeding pandoc - but they are when producing
> > the output for the test-suite.
> > 
> > This is why I discovered the problem and that the correct fix is not
> > the patch you are proposing but the one I'm attaching.
> 
> the patch...

The intent of my patch was to make it possible to use pandoc/citeproc with
existing CSL styles. Your patch doesn't do that; it requires users to use
a special ieee.csl -- and what if similar problems arise for other styles?
Won't we be getting lots of mail on this list saying "Hey, I'm trying to
use my zotero styles with pandoc, and I'm getting all these weird
spacing bugs!"  Do we really want to have to reply, "Here's a patch to the
style?" So again, I'm wondering -- given that it's easy to change
pandoc/citeproc so that the features of ieee.csl that you changed don't cause
any problems, should these features really be considered bugs?

I liked the suggestion on the xbiblio list that if style authors wanted to
include multiple adjacent spaces, they should use a unicode nonbreaking space
character. This is consistent with how pandoc works in general.

So, I think my view is that the best solution would be to change citeproc-hs
along the lines of my earlier patch, and convince Bruce to change the citeproc
spec to say that "adjacent spaces will be collapsed in output." I'm open
to being persuaded otherwise, though, and of course, this  solution
would require cooperation from Bruce.

John


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Re: textual citation
       [not found]                                                                                                                 ` <20101203154032.GB28210-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2010-12-07 10:30                                                                                                                   ` Andrea Rossato
       [not found]                                                                                                                     ` <20101207103034.GA22516-u31zCTIHpvLVI6Gt0zCidg@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Andrea Rossato @ 2010-12-07 10:30 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On Fri, Dec 03, 2010 at 07:40:32AM -0800, John MacFarlane wrote:
> The intent of my patch was to make it possible to use pandoc/citeproc with
> existing CSL styles. Your patch doesn't do that; it requires users to use
> a special ieee.csl -- and what if similar problems arise for other styles?
> Won't we be getting lots of mail on this list saying "Hey, I'm trying to
> use my zotero styles with pandoc, and I'm getting all these weird
> spacing bugs!"  Do we really want to have to reply, "Here's a patch to the
> style?" So again, I'm wondering -- given that it's easy to change
> pandoc/citeproc so that the features of ieee.csl that you changed don't cause
> any problems, should these features really be considered bugs?
> 
> I liked the suggestion on the xbiblio list that if style authors wanted to
> include multiple adjacent spaces, they should use a unicode nonbreaking space
> character. This is consistent with how pandoc works in general.
> 
> So, I think my view is that the best solution would be to change citeproc-hs
> along the lines of my earlier patch, and convince Bruce to change the citeproc
> spec to say that "adjacent spaces will be collapsed in output." I'm open
> to being persuaded otherwise, though, and of course, this  solution
> would require cooperation from Bruce.

I understand the difference between my patch and yours. Actually, as I
said, yours should be:

hunk ./src/Text/CSL/Output/Pandoc.hs 141
-    | otherwise        = clean' s [i] ++ clean s is
+    | otherwise        = clean' s (i  :  clean s is)

which also fixes the "punctuation-in-quote" option (I told you this
was a regression).
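
To make the difference between the two lines of that hunk concrete,
here is a minimal sketch on plain strings. It assumes a pairwise rule
that needs to see two neighbouring elements; fixHead, cleanOld and
cleanNew are hypothetical names, and the real clean' in
Text.CSL.Output.Pandoc takes other arguments and does more.

```haskell
-- Hypothetical pairwise rule: drop one of two adjacent blanks at the
-- head of the list.
fixHead :: [String] -> [String]
fixHead (" " : " " : rest) = " " : rest
fixHead xs                 = xs

-- Shape of the old line: the rule only ever sees a singleton list, so
-- it can never match a pair and the input comes back unchanged.
cleanOld :: [String] -> [String]
cleanOld []     = []
cleanOld (i:is) = fixHead [i] ++ cleanOld is

-- Shape of the new line: the rule sees the element together with the
-- already cleaned tail, so adjacent pairs are visible to it.
cleanNew :: [String] -> [String]
cleanNew []     = []
cleanNew (i:is) = fixHead (i : cleanNew is)
```

On ["a", " ", " ", " ", "b"] cleanOld returns the list untouched, while
cleanNew returns ["a", " ", "b"]. Any rule that looks at an element and
its neighbour depends on the second shape.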

When running the test-suite you'll indeed see, now:

|NORM| [2] J. Doe, “Article”, *Journal of Generic Studies*, vol. 6, 2006, pp. 33-34.
|TEST| [2] J. Doe, “Article,” *Journal of Generic Studies*, vol. 6, 2006, pp. 33-34.

This is due to the fact that in locales-en-US.xml there is:
  <style-options punctuation-in-quote="true"/>
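
The effect of that option on the two lines above can be sketched on a
plain String. This is only an illustration under simplifying
assumptions (the real processor works on formatted output, not raw
strings); punctuationInQuote and closeQuote are names invented here.

```haskell
-- U+201D, the right double quotation mark used in the output above.
closeQuote :: Char
closeQuote = '\8221'

-- Move a comma or period that directly follows a closing quote to the
-- position just before it.
punctuationInQuote :: String -> String
punctuationInQuote (q : c : rest)
  | q == closeQuote && c `elem` ",." = c : q : punctuationInQuote rest
punctuationInQuote (c : rest) = c : punctuationInQuote rest
punctuationInQuote []         = []
```

Applied to the NORM line, the comma after the closing quote of the
title moves inside it, which gives the TEST form.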

Now, I think we are quite aware that, if we decide to enforce a
stricter policy, we need to fix the styles rather than answer
people with a patch. Is it worth the effort? I don't know, as I
have repeatedly said.

I understand that using the Zotero styles as they stand would be the
easiest solution. I wonder if laziness should guide CSL design
principles, though. CSL, which is not a document format but a
programming language for formatting citations, offers plenty of
facilities for dealing with this kind of issue, or with low-quality
input data (you will see that a huge number of Zotero styles will
produce extra commas and extra spaces when rendering, for instance, an
article-journal reference type which lacks the container-title
variable). Zotero styles are not using them because Zotero used to use
CSL as a scripting language (we had to wait for citeproc-js to see a
real and complete implementation).

Anyway, I pushed the fix, so Zotero styles will look like good styles,
now.

Nonetheless I keep wondering if a style in a test-suite (citeproc-test
or the pandoc tests) should be buggy and invalid (ieee.csl pretends
to be in the "numeric" class even though the class attribute only
allows "note" and "in-text" values). This is the reason why all this
started. Since there seems to be no consensus on the path to take,
I'll just shut up.

Andrea




^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: textual citation
       [not found]                                                                                                                     ` <20101207103034.GA22516-u31zCTIHpvLVI6Gt0zCidg@public.gmane.org>
@ 2010-12-07 16:30                                                                                                                       ` John MacFarlane
  0 siblings, 0 replies; 71+ messages in thread
From: John MacFarlane @ 2010-12-07 16:30 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Andrea Rossato [Dec 07 10 11:30 ]:
> On Fri, Dec 03, 2010 at 07:40:32AM -0800, John MacFarlane wrote:
> > The intent of my patch was to make it possible to use pandoc/citeproc with
> > existing CSL styles. Your patch doesn't do that; it requires users to use
> > a special ieee.csl -- and what if similar problems arise for other styles?
> > Won't we be getting lots of mail on this list saying "Hey, I'm trying to
> > use my zotero styles with pandoc, and I'm getting all these weird
> > spacing bugs!"  Do we really want to have to reply, "Here's a patch to the
> > style?" So again, I'm wondering -- given that it's easy to change
> > pandoc/citeproc so that the features of ieee.csl that you changed don't cause
> > any problems, should these features really be considered bugs?
> > 
> > I liked the suggestion on the xbiblio list that if style authors wanted to
> > include multiple adjacent spaces, they should use a unicode nonbreaking space
> > character. This is consistent with how pandoc works in general.
> > 
> > So, I think my view is that the best solution would be to change citeproc-hs
> > along the lines of my earlier patch, and convince Bruce to change the citeproc
> > spec to say that "adjacent spaces will be collapsed in output." I'm open
> > to being persuaded otherwise, though, and of course, this  solution
> > would require cooperation from Bruce.
> 
> I understand the difference between my patch and yours. Actually, as I
> said, yours should be:
> 
> hunk ./src/Text/CSL/Output/Pandoc.hs 141
> -    | otherwise        = clean' s [i] ++ clean s is
> +    | otherwise        = clean' s (i  :  clean s is)
> 
> which also fixes the "punctuation-in-quote" option (I told you this
> was a regression).
> 
> When running the test-suite you'll indeed see, now:
> 
> |NORM| [2] J. Doe, “Article”, *Journal of Generic Studies*, vol. 6, 2006, pp. 33-34.
> |TEST| [2] J. Doe, “Article,” *Journal of Generic Studies*, vol. 6, 2006, pp. 33-34.

Good, I've updated the test suite accordingly.

> This is due to the fact that in locales-en-US.xml there is:
>   <style-options punctuation-in-quote="true"/>
> 
> Now, I think we are quite aware that, if we decide to enforce a
> stricter policy, we need to fix the styles rather than answer
> people with a patch. Is it worth the effort? I don't know, as I
> have repeatedly said.
> 
> I understand that using the Zotero styles as they stand would be the
> easiest solution. I wonder if laziness should guide CSL design
> principles, though. CSL, which is not a document format but a
> programming language for formatting citations, offers plenty of
> facilities for dealing with this kind of issue, or with low-quality
> input data (you will see that a huge number of Zotero styles will
> produce extra commas and extra spaces when rendering, for instance, an
> article-journal reference type which lacks the container-title
> variable). Zotero styles are not using them because Zotero used to use
> CSL as a scripting language (we had to wait for citeproc-js to see a
> real and complete implementation).

I agree with you on the general principle of making it easier to
spot bugs. It's a bad idea to make bugs less evident in normal cases,
if they will still emerge in corner cases.

However, with pandoc I want to be pragmatic. People are going to
need styles to use. So, either

(a) pandoc should work with the styles in the CSL 1.0 repository Bruce
is going to put up, or

(b) we should maintain a repository of "fixed" styles that will work
with pandoc.

(a) seems much easier.  And (a) is compatible with continuing to push
the other parties involved (e.g. Frank and Bruce) to fix the styles,
so that the processors can be made stricter.  That's why I'm in
favor of (a) for now...

> Nonetheless I keep wondering if a style in a test-suite (citeproc-test
> or the pandoc tests) should be buggy and invalid (ieee.csl pretends
> to be in the "numeric" class even though the class attribute only
> allows "note" and "in-text" values). This is the reason why all this
> started. Since there seems to be no consensus on the path to take,
> I'll just shut up.

I just applied your patch to ieee.csl.  If you want to send further
patches (e.g. with the "numeric" class), I can apply them too.
Of course, it would be best if you could get the Zotero version
fixed as well...

John




^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: textual citation
       [not found]                                                                                                     ` <8f044663-b02a-45bd-b299-60fef03bf457-n9fKM5ssUrqdjmvXPhoLGFYGCWtFR9XvQQ4Iyu8u01E@public.gmane.org>
@ 2010-12-07 16:35                                                                                                       ` John MacFarlane
       [not found]                                                                                                         ` <20101207163551.GC13385-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: John MacFarlane @ 2010-12-07 16:35 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ dsanson [Dec 01 10 12:55 ]:
> Perhaps a switch to shift between a "strict" behavior and a
> "forgiving" behavior. This might encourage user discovery of
> unintended spaces in existing styles, while still allowing us to use
> the styles as-is without endless headaches. Eventually, if the styles
> get cleaned up, the "forgiving" behavior could be dropped. And if the
> CSL community decides instead to leave the styles as-is, the "strict"
> behavior could be dropped. At the moment, I'd think pandoc should
> default to asking citeproc-hs to be "forgiving" but provide a cli
> switch to ask it to be "strict".
> 
> One added benefit: I suspect the pandoc/citeproc-hs user community is
> smaller and more tech-savvy, on average, than the zotero/mendeley user
> community. By using the "strict" option, we'd be able to start finding
> and reporting style bugs without flooding the CSL mailing lists with
> confused complaints.
> 
> David

Perhaps this would be a useful feature, while this issue is in flux.

Andrea:  If you decide to add a 'strict' parameter to 'citeproc',
I'm willing to add a --strict-csl switch to pandoc that sets it to
True.  I'll leave it up to your judgement.

John


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Re: textual citation
       [not found]                                                                                                         ` <20101203060948.GA24736-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  2010-12-03  6:45                                                                                                           ` John MacFarlane
  2010-12-03 14:18                                                                                                           ` Andrea Rossato
@ 2010-12-07 17:27                                                                                                           ` Andrea Rossato
  2 siblings, 0 replies; 71+ messages in thread
From: Andrea Rossato @ 2010-12-07 17:27 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On Thu, Dec 02, 2010 at 10:09:48PM -0800, John MacFarlane wrote:
> I found another odd case with the mhra style.
> 
>     pandoc --biblio tests/biblio.bib --csl tests/mhra.csl -t markdown
>     @item3 says...
>     ^D
>     John Doe and Jenny Roeed by Sam Smith[^1] says...
>     
>     Doe, John, and Jenny Roe, ‘Why Water Is Wet’, in *Third Book*, ed
>     by Sam Smith (Oxford: Oxford University Press, 2007).
>     
>     [^1]:
>         ‘Why Water Is Wet’, in *Third Book*,  (Oxford: Oxford University
>         Press, 2007).
> 
> The odd part is:  "John Doe and Jenny Roeed by Sam Smith[^1]"
> Presumably "ed by Sam Smith" should not be there...

I fixed this. The fix was a bit troublesome, though: now
suppress-author and author-in-text will work only with actual authors
and not with editors and/or translators.

A different approach would be to check if there is an author and, if
not, fall back to editors or translators. The problem is that with
this approach the result could be unpredictable.
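
The two approaches can be sketched as follows. This is a simplified
model, not the citeproc-hs data types: the Reference record and the
functions namesStrict and namesFallback are hypothetical.

```haskell
-- Hypothetical, stripped-down reference record.
data Reference = Reference
  { authors     :: [String]
  , editors     :: [String]
  , translators :: [String]
  }

-- Current behaviour as described above: only actual authors count.
namesStrict :: Reference -> [String]
namesStrict = authors

-- Alternative: use the first non-empty list among authors, editors
-- and translators, in that order.
namesFallback :: Reference -> [String]
namesFallback r =
  case filter (not . null) [authors r, editors r, translators r] of
    (names : _) -> names
    []          -> []
```

For an edited volume with no author and Sam Smith as editor,
namesStrict gives an empty list while namesFallback gives
["Sam Smith"]. Which names end up in the text then depends on what
each entry happens to contain, which is the unpredictability
mentioned above.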

Andrea




^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Re: textual citation
       [not found]                                                                                                         ` <20101207163551.GC13385-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2010-12-09 11:23                                                                                                           ` Andrea Rossato
       [not found]                                                                                                             ` <20101209112343.GD28254-u31zCTIHpvLVI6Gt0zCidg@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Andrea Rossato @ 2010-12-09 11:23 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On Tue, Dec 07, 2010 at 08:35:51AM -0800, John MacFarlane wrote:
> +++ dsanson [Dec 01 10 12:55 ]:
> > Perhaps a switch to shift between a "strict" behavior and a
> > "forgiving" behavior. This might encourage user discovery of
> > unintended spaces in existing styles, while still allowing us to use
> > the styles as-is without endless headaches. Eventually, if the styles
> > get cleaned up, the "forgiving" behavior could be dropped. And if the
> > CSL community decides instead to leave the styles as-is, the "strict"
> > behavior could be dropped. At the moment, I'd think pandoc should
> > default to asking citeproc-hs to be "forgiving" but provide a cli
> > switch to ask it to be "strict".
> > 
> > One added benefit: I suspect the pandoc/citeproc-hs user community is
> > smaller and more tech-savvy, on average, than the zotero/mendeley user
> > community. By using the "strict" option, we'd be able to start finding
> > and reporting style bugs without flooding the CSL mailing lists with
> > confused complaints.
> > 
> > David
> 
> Perhaps this would be a useful feature, while this issue is in flux.
> 
> Andrea:  If you decide to add a 'strict' parameter to 'citeproc',
> I'm willing to add a --strict-csl switch to pandoc that sets it to
> True.  I'll leave it up to your judgement.

I'm puzzled: I agree with you when you say we should be very careful
in adding command line options to pandoc. Moreover I'm coming to think
that pandoc is not a tool for debugging styles. My idea is to add a
tool to the citeproc-hs installation for running tests (from the
test-suite) and maybe debugging styles. I think this would be more
appropriate.

Moreover I think this issue should be addressed by the CSL
specification: --strict-csl is a bit misleading, since there is no
relaxed csl I'm aware of...;-)

On the other hand I'm really open to different approaches, so if you
think this is the path to take it's really just a matter of a few
keystrokes to get a patch that does the job.

Andrea


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: textual citation
       [not found]                                                                                                             ` <20101209112343.GD28254-u31zCTIHpvLVI6Gt0zCidg@public.gmane.org>
@ 2010-12-09 16:29                                                                                                               ` John MacFarlane
  0 siblings, 0 replies; 71+ messages in thread
From: John MacFarlane @ 2010-12-09 16:29 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Andrea Rossato [Dec 09 10 12:23 ]:
> On Tue, Dec 07, 2010 at 08:35:51AM -0800, John MacFarlane wrote:
> > +++ dsanson [Dec 01 10 12:55 ]:
> > > Perhaps a switch to shift between a "strict" behavior and a
> > > "forgiving" behavior. This might encourage user discovery of
> > > unintended spaces in existing styles, while still allowing us to use
> > > the styles as-is without endless headaches. Eventually, if the styles
> > > get cleaned up, the "forgiving" behavior could be dropped. And if the
> > > CSL community decides instead to leave the styles as-is, the "strict"
> > > behavior could be dropped. At the moment, I'd think pandoc should
> > > default to asking citeproc-hs to be "forgiving" but provide a cli
> > > switch to ask it to be "strict".
> > > 
> > > One added benefit: I suspect the pandoc/citeproc-hs user community is
> > > smaller and more tech-savvy, on average, than the zotero/mendeley user
> > > community. By using the "strict" option, we'd be able to start finding
> > > and reporting style bugs without flooding the CSL mailing lists with
> > > confused complaints.
> > > 
> > > David
> > 
> > Perhaps this would be a useful feature, while this issue is in flux.
> > 
> > Andrea:  If you decide to add a 'strict' parameter to 'citeproc',
> > I'm willing to add a --strict-csl switch to pandoc that sets it to
> > True.  I'll leave it up to your judgement.
> 
> I'm puzzled: I agree with you when you say we should be very careful
> in adding command line options to pandoc. Moreover I'm coming to think
> that pandoc is not a tool for debugging styles. My idea is to add a
> tool to the citeproc-hs installation for running tests (from the
> test-suite) and maybe debugging styles. I think this would be more
> appropriate.

I agree.  I was just being open to the other point of view.

> Moreover I think this issue should be addressed by the CSL
> specification: --strict-csl is a bit misleading, since there is no
> relaxed csl I'm aware of...;-)
>
> On the other hand I'm really open to different approaches, so if you
> think this is the path to take it's really just a matter of a few
> keystrokes to get a patch that does the job.

No, I think having a testing tool with citeproc-hs is the best
way to go.

John


^ permalink raw reply	[flat|nested] 71+ messages in thread

