From: John MacFarlane
Newsgroups: gmane.text.pandoc
Subject: Re: pandoc/citeproc issues: multiple bibliographies, nocite, citeonly
Date: Tue, 30 Nov 2010 19:25:56 -0800
Message-ID: <20101201032556.GA28952@protagoras.phil.berkeley.edu>

+++ Nathan Gass [Nov 30 10 00:45 ]:
> >>One possibility would be a special attribute on a header:
> >>
> >># Works cited {.bibliography src="mybib.json"}
> >>
> >># References {.bibliography src="foo.bib" include="item2,item3"
> >>    omit="item4" if-year="1999" only-if-type="primary"}
>
> I like this, especially as this syntax would allow the addition of
> other features without adding new syntax later on.
>
> >PS. I'm sorely tempted to put off implementing these complexities
> >til later, and release a simple version of pandoc/citeproc that just
> >constructs a bibliography of works cited in the document and puts
> >them at the end, more or less the way it does now.
>
> +1 one to that
>
> The only problem I see with that plan is the following:
> The sensible default now would be to always add a bibliography.
> On the other hand, if we have a syntax to include the bibliography
> where ever we want, the better default would be to not include any
> bibliography per default.
>
> But I think an appropriate warning for future incompatible changes
> would be enough in this case, as the documents will be easy enough
> to convert.

I'd like to avoid having documents work now and then fail later,
even if they're easy to convert.

Abstracting from the details, and from the question of whether
we're going to implement fancy filtering now, the main questions
we need to decide are:

1. whether the bibliography file should be specified in the text
   or as a command-line option. (I think having both possibilities
   would be confusing -- it should be one or the other, I think.)

2. whether multiple bibliographies should be allowed, or just one
   bibliography at the end.

The current implementation answers 1 by "command-line option."

If we want the bibliography file to be specified in the text, then
we need

(a) a syntax for specifying this in the file

(b) some architectural changes to account for the fact that the
    bibliography file won't be known until after the markdown parser
    has been called. (More on this below.)

For (a), I currently favor either the XML syntax or the attribute
syntax after a header that would serve as the header for the
bibliography. Example:

    # References { bib="mybib.bib" }

(Though this forces you to have a header before the bibliography,
as the XML doesn't.)

This would also go well with multiple bibliographies (question 2).
And it could be extended to allow filtering when this is available
(it might already be available, in the latest darcs citeproc):

    # Primary sources { bib="mybib.bib" include-type="primary" }

Another alternative (and the only way to go if we don't allow
multiple bibliographies) would be to add a metadata block and put
the bibliography filename there.
I'm probably going to add a metadata block at some point, but it
would be nice to avoid doing it just yet, as it would require many
more decisions.

If we allow multiple bibliographies, we'll also need to change the
code in Text.Pandoc.Biblio, but I think this should be fairly
straightforward.

I guess I'm tempted to:

- allow multiple bibliographies
- specify the source file in the markdown text, as above

I'm not positive this will work, though. As I mentioned, there are
technical hurdles to having the bibliography file specified in the
text. The markdown reader itself can't do IO, so it can't read the
file. So we can't read the bibliography (or even verify that it can
be found) until we've parsed the markdown source. That means that we
can't check potential citations as we parse, to see if they're in
the bibliography; instead, we have to parse everything that might be
a citation as a citation.

But what if we have a citation with id "foo" (from "@foo" in the
text), but the bibliography contains no corresponding item? Then we
need to reinsert the literal text. It's easy enough to generate
"@foo" from a Cite inline, but what if the Cite was generated by
parsing latex + bibtex? This gets complicated.

I did think of one solution, which I'd like to get Andrea's feedback
on (as it might require citeproc changes). Currently when we parse a
citation, we just leave the [Inline] part empty; this gets filled in
by citeproc when it processes the citations with the bibliography.
My suggestion: instead of leaving it empty, fill in the [Inline]
part of the Cite with the literal text that should be included in
the document if the citation isn't found in the database. This way,
if citeproc doesn't find the item, it can simply leave the Cite
alone (rather than raising an error). That seems simple; it just
adds a bit of complexity to the parsers, which have to generate the
"replacement text" when they parse a Cite.

Thoughts?

John
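
To make the fallback suggestion above concrete, here is a rough sketch. It is written in Python rather than pandoc's Haskell so that it stays self-contained, and everything in it is an assumption for illustration: the `Cite` class, `parse_citation`, `resolve`, and `render` are hypothetical stand-ins (not pandoc or citeproc API), and the bibliography is just a dict from citation id to already-formatted text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Cite:
    """Toy stand-in for a Cite inline: a citation id plus its inline content."""
    key: str
    inlines: List[str] = field(default_factory=list)


def parse_citation(source_text: str) -> Cite:
    """Parse '@foo' into a Cite, keeping the literal source text as fallback."""
    key = source_text[1:] if source_text.startswith("@") else source_text
    # Instead of leaving the inline part empty, store the replacement text.
    return Cite(key=key, inlines=[source_text])


def resolve(cite: Cite, bibliography: Dict[str, str]) -> Cite:
    """Fill in the formatted citation if the id is known; else leave the Cite alone."""
    formatted = bibliography.get(cite.key)
    if formatted is None:
        return cite  # no error: the parser's fallback text is still in place
    return Cite(key=cite.key, inlines=[formatted])


def render(cite: Cite) -> str:
    """Concatenate the inline content for output."""
    return "".join(cite.inlines)


# Hypothetical bibliography, only known after parsing has finished.
bibliography = {"item1": "(Doe 1999)"}

found = render(resolve(parse_citation("@item1"), bibliography))
missing = render(resolve(parse_citation("@foo"), bibliography))
print(found)
print(missing)
```

In this toy version the citation step never needs to know which reader produced the Cite. A LaTeX reader would presumably store its own source form as the replacement text, which is the point of pushing that work into the parsers.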