From: Philipp Gesang
Newsgroups: gmane.comp.tex.context
Subject: Re: Bibliography, unicode strings, @ELECTRONIC, sorting and bibtex
Date: Tue, 18 Sep 2012 17:34:10 +0200
Message-ID: <20120918153410.GA10513@phlegethon>
References: <20120918122858.265bbad1@homerow> <20120918142533.24fb2c53@homerow> <20120918132854.GB9686@phlegethon> <20120918161945.5eeb0139@homerow>
In-Reply-To: <20120918161945.5eeb0139@homerow>
To: mailing list for ConTeXt users
X-License: "CC-BY-SA 3.0"

·········
> 2012-09-18 Philipp Gesang:
>
> > [0] http://www.mail-archive.com/ntg-context@ntg.nl/msg62855.html
>
> Thanks for the link. Since I usually don't deal much with
> different bibliography styles I tend to skip those threads.
>
> > > And BibTeX is used since it understands the semantics of
> > > bib files, although a pure ConTeXt/Lua solution would be possible.
> > > Without BibTeX this functionality would be missing since no one is
> > > willing to implement a parser for .bib databases.
> >
> > Context happens to have such a parser, written in Lua. Probably
> > the best one around:
> >
> > ·······································································
> > \starttext
> > \startluacode
> > local db = bibtex.new()
> > bibtex.load(db, "filename.bib")
> > table.print(db)
> > \stopluacode
> > \stoptext
>
> Interesting, I didn't know that.
> But the values are only parsed, not
> interpreted. That means the only thing left for BibTeX to do is
> to interpret the ugly “author” field?

From my bibliography (this assumes authors are separated by
“ and ”; *warning*: embarrassingly ugly code ahead):

·······································································
-- Assumed to be defined earlier in the module (not shown in this mail):
--   local P, S, C, Ct, V = lpeg.P, lpeg.S, lpeg.C, lpeg.Ct, lpeg.V
--   local lpegmatch = lpeg.match
--   local stringfind, tableremove, tableconcat =
--     string.find, table.remove, table.concat

-- adapted from Roberto
-- www.inf.puc-rio.br/~roberto/lpeg.html
function citator.split (s, sep)
  if type(sep) == "string" then sep = P(sep) end
  local elem = C((1 - sep)^0)
  local p = Ct(elem * (sep * elem)^0)
  return lpegmatch(p, s)
end
local split = citator.split

-- Return a list of authors' names from a string separated by "and".
local _p_spaces = S" \n\t\v"^1
local _p_and = _p_spaces * P"and" * _p_spaces
function citator.get_author_list (rawaut)
  if not stringfind(rawaut, "and") then
    return { rawaut }
  end
  return split(rawaut, _p_and)
end
local get_author_list = citator.get_author_list

do
  local wl = P{
    [1] = "words",
    left = P"{",
    right = P"}",
    space = P" ",
    tabs = S"\v\t",
    eol = P"\n",
    whitespace = V"space" + V"tabs" + V"eol",
    inbrace = V"left" * (1 - V"right")^1 * V"right",
    other = (1 - V"inbrace" - V"whitespace")^1,
    elm = V"inbrace" + V"other",
    words = Ct((V"whitespace"^0 * C(V"elm"))^0)
  }

  -- Takes a string and splits it into words, returning a list of words.
  function citator.get_word_list(s)
    return lpegmatch(wl, s)
  end
end
local get_word_list = citator.get_word_list

-- from http://osdir.com/ml/lua@bazar2.conectiva.com.br/2009-12/msg00910.html
do
  local space = S" \t\v\n"
  local nospace = 1 - space
  local ptrim = space^0 * C((space^0 * nospace^1)^0)
  function citator.strip (s)
    return lpegmatch(ptrim, s)
  end
end

-- Return the formatted author field for one author string.
-- The last word is taken as the surname and moved to the front.
function citator.reverse_one_author (rawaut, form)
  local listaut = get_word_list(rawaut)
  local formaut, tmpaut = "", {}
  if (#listaut > 1) then
    -- ipairs instead of next: the order of the words matters here
    for i,j in ipairs(listaut) do
      listaut[i] = citator.strip(j)
    end
    local lastname = listaut[#listaut] .. ","
    tableremove(listaut, #listaut)
    tmpaut[#tmpaut+1] = lastname
    for i,j in ipairs(listaut) do
      tmpaut[#tmpaut+1] = j
    end
    -- note: the result starts with a space
    for i,j in ipairs(tmpaut) do
      formaut = formaut .. " " .. j
    end
  else
    formaut = listaut[1]
  end
  return formaut
end
local reverse_one_author = citator.reverse_one_author

-- Take a string of authors' names rawaut and return a list that is built
-- according to the global citator.cite_inv_author.
-- ‘resultformat’: if it has the value ‘string’ then the function will
-- return a string instead of a table.
function citator.format_author_list (rawaut, resultformat)
  warn("author list", rawaut)
  local max = citator.compress_authors -- , default=3
  local authorlist = get_author_list(rawaut)
  local cnt = 1
  local tmplist = {}
  local citestyle = citator.styles[citator.cite_style] or fancy2
  local etal = citestyle.cite_etal_string

  repeat
    if cnt == 1 then
      if citator.cite_author_form == "allinv" or
         citator.cite_author_form == "firstinv" then
        tmplist[#tmplist+1] = reverse_one_author(authorlist[cnt])
        warn("num: "..cnt, authorlist[cnt])
      else -- don’t reverse anything
        tmplist[#tmplist+1] = authorlist[cnt]
      end
    elseif cnt > max then
      tmplist[#tmplist+1] = etal
      break
    else
      warn("num: "..cnt, authorlist[cnt])
      if citator.cite_author_form == "allinv" then
        tmplist[#tmplist+1] = reverse_one_author(authorlist[cnt])
      elseif citator.cite_author_form == "firstinv" then
        tmplist[#tmplist+1] = citestyle.cite_author_separator
        tmplist[#tmplist+1] = authorlist[cnt]
      else
        tmplist[#tmplist+1] = citestyle.cite_author_separator
        tmplist[#tmplist+1] = authorlist[cnt]
      end
    end
    cnt = cnt + 1
  until authorlist[cnt] == nil
  warn(#tmplist, tmplist[1])
  if resultformat == "string" then
    return tableconcat(tmplist)
  end
  return tmplist
end
local format_author_list = citator.format_author_list
·······································································

As you can see, all I have to offer is spaghetti :P And the
formatting rules for names (the fields author, bookauthor,
translator, editor, bookeditor, commentator, etc. pp.) are by no
means everything that bibtex handles.
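
For illustration, the name handling above boils down to roughly the
following. This is only a minimal sketch in plain Lua (string patterns
instead of LPeg). The names split_authors, split_words and invert_name
are made up for this example and belong neither to ConTeXt nor to the
citator code above. It assumes names are written “First Last” and
separated by “ and ”; it does not handle the “Last, First” form, and an
“and” inside braces would still be split.

```lua
-- Hypothetical, dependency-free sketch; not the citator module itself.

-- Split a raw author field on the word "and" surrounded by whitespace.
local function split_authors(raw)
  local authors, rest = {}, raw
  while true do
    local s, e = rest:find("%s+and%s+")
    if not s then break end
    authors[#authors + 1] = rest:sub(1, s - 1)
    rest = rest:sub(e + 1)
  end
  authors[#authors + 1] = rest
  return authors
end

-- Split one name into words, keeping a leading {braced group} as one word.
local function split_words(name)
  local words, pos = {}, 1
  while true do
    pos = name:find("%S", pos)            -- skip whitespace
    if not pos then break end
    local stop
    if name:sub(pos, pos) == "{" then
      stop = name:find("}", pos, true) or #name
    else
      stop = (name:find("%s", pos) or #name + 1) - 1
    end
    words[#words + 1] = name:sub(pos, stop)
    pos = stop + 1
  end
  return words
end

-- Turn "First Middle Last" into "Last, First Middle".
-- A name consisting of a single word (or braced group) is returned as is.
local function invert_name(name)
  local words = split_words(name)
  if #words < 2 then return words[1] or "" end
  local last = table.remove(words)
  return last .. ", " .. table.concat(words, " ")
end

local authors = split_authors(
  "Donald E. Knuth and Leslie Lamport and {Free Software Foundation}")
for i, author in ipairs(authors) do
  print(i, invert_name(author))
end
```

The sketch already shows where the naive approach stops working:
invert_name("Ludwig van Beethoven") yields “Beethoven, Ludwig van”,
whereas BibTeX itself would treat the lowercase “van” as part of the
surname.
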
The hard part is the
formatting of entries according to cite style (apa etc.) and
method (short, number, full). Then strings (ibidem, et al.) need to
respect i18n. Sorting of the bib has to take place on a certain set
of fields in a certain order depending on whether the entry has
an author field or only an editor or both ... and then there is
the problem with names in general:

    http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/

I don’t want to be spreading pessimism, but these problems are
easily underestimated.

Philipp

>
> Marco

-- 
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments

___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________