ntg-context - mailing list for ConTeXt users
From: Philipp Gesang <gesang@stud.uni-heidelberg.de>
To: mailing list for ConTeXt users <ntg-context@ntg.nl>
Subject: Re: Bibliography, unicode strings, @ELECTRONIC, sorting and bibtex
Date: Tue, 18 Sep 2012 17:34:10 +0200
Message-ID: <20120918153410.GA10513@phlegethon>
In-Reply-To: <20120918161945.5eeb0139@homerow>



···<date: 2012-09-18, Tuesday>···<from: Marco Patzer>···

> 2012-09-18 Philipp Gesang <gesang@stud.uni-heidelberg.de>:
> 
> > [0] http://www.mail-archive.com/ntg-context@ntg.nl/msg62855.html
> 
> Thanks for the link. Since I usually don't deal much with
> different bibliography styles I tend to skip those threads.
> 
> > >            And BibTeX is used since it understands the semantics of
> > > bib files, although a pure ConTeXt/Lua solution would be possible.
> > > Without BibTeX this functionality would be missing since no one is
> > > willing to implement a parser for .bib databases.
> > 
> > Context happens to have such a parser, written in Lua. Probably
> > the best one around:
> > 
> > ·······································································
> > \starttext
> >   \startluacode
> >     local db = bibtex.new()
> >     bibtex.load(db, "filename.bib")
> >     table.print(db)
> >   \stopluacode
> > \stoptext
> 
> Interesting, I didn't know that. But the values are only parsed, not
> interpreted. That means the only thing left for BibTeX to do is
> to interpret the ugly “author” field?

From my bibliography code (this assumes authors are separated by
“ and ”; *warning*: embarrassingly ugly code ahead):

·······································································
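-- Preamble (assumed; inside ConTeXt the lpeg library is already loaded):
local lpeg = lpeg or require("lpeg")
local P, S, V, C, Ct = lpeg.P, lpeg.S, lpeg.V, lpeg.C, lpeg.Ct
local lpegmatch = lpeg.match
local stringfind, tableremove, tableconcat = string.find, table.remove, table.concat
citator = citator or {}

-- adapted from Roberto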
-- www.inf.puc-rio.br/~roberto/lpeg.html
function citator.split (s, sep)
  if type(sep) == "string" then
    sep = P(sep)
  end
  local elem = C((1 - sep)^0)
  local p = Ct(elem * (sep * elem)^0)
  return lpegmatch(p, s)
end
local split = citator.split

-- Return a list of authors' names from a string separated by "and".
local _p_spaces = S" \n\t\v"^1
local _p_and    = _p_spaces * P"and" * _p_spaces
function citator.get_author_list (rawaut)
    if not stringfind(rawaut, "and") then return { rawaut } end
    return split(rawaut, _p_and)
end
local get_author_list = citator.get_author_list

do
    local wl = P{
        [1] = "words",

        left  = P"{",  right = P"}",
        space = P" ",  tabs  = S"\v\t",
        eol   = P"\n", whitespace = V"space" + V"tabs" + V"eol",

        inbrace = V"left" * (1 - V"right")^1 * V"right",
        other = (1 - V"inbrace" - V"whitespace")^1,

        elm = V"inbrace" + V"other",

        words = Ct((V"whitespace"^0 * C(V"elm"))^0)
    }

    -- Takes a string and splits it into words, returning a list of words.
    function citator.get_word_list(s)
        return lpegmatch(wl, s)
    end
end
local get_word_list = citator.get_word_list

-- from http://osdir.com/ml/lua@bazar2.conectiva.com.br/2009-12/msg00910.html
do
    local space = S" \t\v\n"
    local nospace = 1 - space
    local ptrim = space^0 * C((space^0 * nospace^1)^0)
    function citator.strip (s)
        return lpegmatch(ptrim, s)
    end
end


-- Return the formatted author field for one author string.
function citator.reverse_one_author (rawaut, form)
    local         listaut = get_word_list(rawaut)
    local formaut, tmpaut = "", {}
    if (#listaut > 1) then
        for i,j in next, listaut do
            listaut[i] = citator.strip(j)
        end
        local lastname = listaut[#listaut] .. ","
        tableremove(listaut, #listaut)
        tmpaut[#tmpaut+1] = lastname
        for i,j in next, listaut  do tmpaut[#tmpaut+1] = j end
        for i,j in next, tmpaut   do formaut = formaut .. " " .. j end
    else
        formaut = listaut[1]
    end
    return formaut
end
local reverse_one_author = citator.reverse_one_author

-- Take a string of authors' names rawaut and return a list that is built
-- according to the setting citator.cite_author_form.
-- <string> ‘resultformat’: if it has the value ‘string’ then the function will
-- return a string instead of a table.
function citator.format_author_list (rawaut, resultformat)
    warn("author list", rawaut)
    local max        = citator.compress_authors -- <int>, default=3
    local authorlist = get_author_list(rawaut)
    local cnt = 1
    local tmplist = {}
    local citestyle = citator.styles[citator.cite_style] or fancy2
    local etal      = citestyle.cite_etal_string
    repeat
        if cnt == 1 then
            if citator.cite_author_form == "allinv"   or
               citator.cite_author_form == "firstinv" then
                tmplist[#tmplist+1] = reverse_one_author(authorlist[cnt])
                warn("num: "..cnt, authorlist[cnt])
            else -- don’t reverse anything
                tmplist[#tmplist+1] = authorlist[cnt]
            end
        elseif cnt > max then
            tmplist[#tmplist+1] = etal
            break
        else
            warn("num: "..cnt, authorlist[cnt])
            if citator.cite_author_form == "allinv" then
                tmplist[#tmplist+1] = reverse_one_author(authorlist[cnt])
            else -- "firstinv" and the default: separator, then the name as is
                tmplist[#tmplist+1] = citestyle.cite_author_separator
                tmplist[#tmplist+1] = authorlist[cnt]
            end
        end
        cnt = cnt + 1
    until authorlist[cnt] == nil
    warn(#tmplist, tmplist[1])
    if resultformat == "string" then
        return tableconcat(tmplist)
    end
    return tmplist
end
local format_author_list = citator.format_author_list
·······································································

As you can see, all I have to offer is spaghetti :P And the
formatting rules for names (the fields author, bookauthor,
translator, editor, bookeditor, commentator, etc.) are by no
means everything that bibtex handles.
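
One example of what is missing: the code above only copes with
names written as “First Last”, while bib files also use the
“Last, First” form. A rough sketch in plain Lua, assuming nothing
beyond those two forms; split_name is a hypothetical helper, not
part of my module, and it ignores brace groups, “von” particles
and “Jr” parts:

```lua
-- Hypothetical helper (not part of the code above): split one name into
-- "first" and "last", accepting both "First Last" and "Last, First".
local function split_name(raw)
    -- trim surrounding whitespace
    raw = raw:match("^%s*(.-)%s*$")
    -- comma form: everything before the first comma is the last name
    local last, first = raw:match("^([^,]+),%s*(.*)$")
    if last then
        return { first = first, last = last:match("^(.-)%s*$") }
    end
    -- plain form: the final word is taken as the last name
    first, last = raw:match("^(.*)%s+(%S+)$")
    if last then
        return { first = first, last = last }
    end
    -- single word, e.g. "Aristotle"
    return { first = "", last = raw }
end

local n = split_name("Knuth, Donald E.")
print(n.last .. ", " .. n.first)
```

Even this much already guesses wrong for multi-word last names in
the plain form, which is exactly the kind of case bibtex has rules for.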

The hard part is the formatting of entries according to cite
style (apa etc.) and method (short, number, full). Then strings
(ibidem, et al.) need to respect i18n. Sorting of the bib has to
take place on a certain set of fields in a certain order
depending on whether the entry has an author field or only an
editor or both ... and then there is the problem with names in
general:
http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/
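
To illustrate only the sorting point: a minimal sketch in plain
Lua, assuming entries are plain tables with lowercase field names.
The fallback order (author, then editor, then title), the names
sort_name and before, and the sample entries are all made up for
this example and are not what bibtex or any cite style prescribes:

```lua
-- Hypothetical sketch: pick the name an entry is sorted on.
local function sort_name(entry)
    -- fall back from author to editor to title
    local name = entry.author or entry.editor or entry.title or ""
    -- string.lower works on bytes; real collation needs language rules
    return string.lower(name)
end

-- Comparison function: sort name first, then year as a tie-breaker.
local function before(a, b)
    local na, nb = sort_name(a), sort_name(b)
    if na ~= nb then return na < nb end
    return (a.year or 0) < (b.year or 0)
end

-- made-up sample data
local bib = {
    { tag = "b", editor = "Zimmer, A.", title = "Collected Essays", year = 1999 },
    { tag = "c", title = "Anonymous Pamphlet", year = 1750 },
    { tag = "a", author = "Knuth, Donald E.", title = "The TeXbook", year = 1984 },
}
table.sort(bib, before)
for _, e in ipairs(bib) do print(e.tag, sort_name(e)) end
```

The byte-wise comparison is the weak spot here: accented characters
and language-specific ordering are not handled at all.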

I don’t want to spread pessimism, but these problems are
easily underestimated.

Philipp




> 
> 
> Marco
> 
> ___________________________________________________________________________________
> If your question is of interest to others as well, please add an entry to the Wiki!
> 
> maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
> webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
> archive  : http://foundry.supelec.fr/projects/contextrev/
> wiki     : http://contextgarden.net
> ___________________________________________________________________________________

-- 
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments

