From: Makaken Affe
Newsgroups: gmane.text.pandoc
Subject: Re: How to extract all citation keys from a document
Date: Wed, 5 Jun 2013 12:12:40 -0700 (PDT)

Thanks for the answer and suggestions. As my citation ids have a known format, it's just a simple grep at the command line:

    egrep -o '@[A-Za-z]+[0-9]+[a-z]*' paper.md

fiddlosopher wrote:

> Sorry, there's currently no way to do this.  '@foo' is ambiguous in
> pandoc markdown.  It could be a reference to an example list, or a plain
> string '@foo', or a citation.

I just want to extract potential citation ids, so some false-positives are ok. However it would be nice at least to exclude obvious cases, such as @foo inside code blocks. Anyway, a workflow is also possible with the simple regexp hack:

#. extract possible citation ids
#. remove blacklisted ids
#. get references and report which ids have not been found
#. manually adjust blacklist (false positives)
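
The four steps could be scripted roughly as follows. This is only a minimal sketch: it assumes the references live in a BibTeX file, and the file names (`paper.md`, `blacklist.txt`, `refs.bib`) and function names are hypothetical. Only the id pattern is taken from the grep command above.

```shell
#!/bin/sh
# Sketch of the four-step workflow. All file and function names are
# hypothetical; the id pattern is the one from the grep command above.

# Step 1: list candidate citation ids (without the leading '@'),
# sorted and de-duplicated.
candidate_ids() {
    grep -Eo '@[A-Za-z]+[0-9]+[a-z]*' "$1" | sed 's/^@//' | sort -u
}

# Step 2 helper: print the stdin lines that do NOT occur as a whole line
# in the file given as $1. A missing or empty file filters nothing.
drop_listed() {
    awk -v list="$1" '
        BEGIN { while ((getline line < list) > 0) listed[line] = 1 }
        !($0 in listed)
    '
}

# Step 3 helper (assumption: BibTeX input): print the entry keys, i.e. the
# text between "@type{" and the first comma on an entry's opening line.
bib_keys() {
    sed -n 's/^[[:space:]]*@[A-Za-z]*[{(][[:space:]]*\([^,[:space:]]*\),.*$/\1/p' "$1"
}

# Steps 1-3 combined: report ids that are neither blacklisted nor defined
# in the bibliography.
# usage: missing_ids paper.md blacklist.txt refs.bib
missing_ids() {
    keys=$(mktemp) || return 1
    bib_keys "$3" > "$keys"
    candidate_ids "$1" | drop_listed "$2" | drop_listed "$keys"
    rm -f "$keys"
}

# Step 4 stays manual: review the report and append the false positives
# to the blacklist file, one id per line.
```

After sourcing the script, `missing_ids paper.md blacklist.txt refs.bib` prints one unresolved id per line. Ids that only occur inside code blocks are still reported by this sketch, so they would have to go on the blacklist in step 4.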

Cheers!
