From: John MacFarlane
Newsgroups: gmane.text.pandoc
Subject: Re: Reference IDs in XML output
Date: Sun, 14 Mar 2021 14:18:03 -0700
References: <877dmc9l7b.fsf@zeitkraut.de>
Reply-To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To: Albert Krewinkel, pandoc-discuss
In-Reply-To: <877dmc9l7b.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>

If I recall, we have a similar issue in the LaTeX writer, since labels
in LaTeX are limited in what they can contain. We use this function to
map an identifier to a label:

toLabel :: PandocMonad m => Text -> LW m Text
toLabel z = go `fmap` stringToLaTeX URLString z
 where
   go = T.concatMap $ \x -> case x of
     _ | (isLetter x || isDigit x) && isAscii x -> T.singleton x
       | x `elemText` "_-+=:;." -> T.singleton x
       | otherwise -> T.pack $ "ux" <> printf "%x" (ord x)

Maybe something like this could work?

Albert Krewinkel writes:

> There is a small problem which I noticed lately: citation keys are used
> as part of the id of the respective reference item; e.g., if a citation
> has `@misc{foo, ...}` then the bibliography entry has id="ref-foo". This
> can be a problem when generating XML output, as the citation keys may
> contain characters which are not allowed in XML names.
> E.g., BibTeX allows slashes as part of the identifier, but those are
> illegal in an `id` attribute, leading to the generation of invalid XML
> documents. As far as I can see, this affects JATS, TEI, HTML4, and
> EPUB2. The HTML5 standard is less restrictive, so EPUB3 is unaffected.
>
> I'd like to fix the problem, but am not sure where and how.
>
> - Where: in each affected writer, or in citeproc?
> - How: by removing the offending characters, or by using a different
>   scheme to generate reference identifiers? Numbering, hashing, …?
>   Do we check for duplicates, or can we assume that identifiers with
>   prefix "ref-" are reserved for pandoc?
>
> The more I think about this, the more questions I have and by now I'm
> overthinking it. Any help to get me back to the ground is appreciated.
>
>
> --
> Albert Krewinkel
> GPG: 8eed e3e2 e8c5 6f18 81fe e836 388d c0b2 1f63 1124
>
> -- 
> You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/877dmc9l7b.fsf%40zeitkraut.de.

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/m2ft0x8mes.fsf%40MacBook-Pro.hsd1.ca.comcast.net.
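
For illustration, the escaping idea from the LaTeX writer's toLabel
could be carried over to XML ids roughly as follows. This is only a
sketch under assumptions, not code from pandoc: `toXmlId` and `isSafe`
are hypothetical names, plain String is used instead of Text so that
only base is needed, and the set of characters kept as-is (ASCII
letters and digits plus `_`, `-`, `.`) is a deliberately conservative
subset of what XML names allow. It also assumes the id keeps its
"ref-" prefix, so the first character is already a valid name start.

```haskell
module Main where

import Data.Char (isAlphaNum, isAscii, ord)
import Numeric (showHex)

-- | Characters passed through unchanged: a conservative subset of the
-- characters permitted in XML names (an assumption of this sketch).
isSafe :: Char -> Bool
isSafe c = (isAscii c && isAlphaNum c) || c `elem` "_-."

-- | Hypothetical helper, not part of pandoc: replace every unsafe
-- character by "ux" followed by its code point in lowercase hex,
-- mirroring the fallback branch of toLabel.
toXmlId :: String -> String
toXmlId = concatMap escape
  where
    escape c
      | isSafe c  = [c]
      | otherwise = "ux" ++ showHex (ord c) ""

main :: IO ()
main = mapM_ (putStrLn . toXmlId) ["ref-foo", "ref-foo/bar"]
```

Running it prints ref-foo and ref-fooux2fbar. Note that the mapping is
not injective: a key that already contains the literal text "ux2f"
yields the same id as one containing a slash, so the question about
checking for duplicates remains open with this scheme.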