Nick, John, check out the documentation and code for
Unicode::Collate::Locale, which I linked, for a lucid description of what
seems to be the official syntax for locale tags, and an algorithm to handle
them that progressively falls back to something more general depending on
what is (or is not) included in the tag.

On Wed, 7 Apr 2021 at 11:36, 'Nick Bart' via pandoc-discuss
<pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org> wrote:

> Bastien, BJP - many thanks, that’s helpful. Still, the main practical
> question, I guess, is whether the default sort order the "new" pandoc
> generates for French - either with or without the "optFrenchAccents"
> modification - is acceptable from the point of view of a native speaker of
> French or not, and if not, what you would suggest instead.
>
> As to multiple collations, I commented earlier:
>
> > ... I tend to think that the default collation (which usually seems to
> > follow the most recent rules for a given language) would usually be
> > sufficient.
>
> That being said, it seems that most of the information (in
> https://github.com/jgm/unicode-collation/tree/main/data) and, I assume,
> infrastructure for supporting different collation systems for a given
> language is in place already, so the following might be worth a try:
>
> pandoc is relying on IETF BCP 47 language tags anyway
> [https://tools.ietf.org/rfc/bcp/bcp47.txt].
>
> A number of locale attributes contained in the Common Locale Data
> Repository (CLDR), including those pertaining to collation, can be
> expressed as extensions to "simple" language tags of the form "en-US".
>
> IETF BCP 47 Extension U (Unicode Locale) is described in RFC 6067
> [https://tools.ietf.org/html/rfc6067]. Relevant quote:
>
> > For example, the language tag "de-DE-u-attr-co-phonebk" consists of:
> >
> > o The base language tag "de-DE" (German as used in Germany), exactly as
> >   defined by [BCP47] using subtags from the IANA Language Subtag Registry.
> >
> > o The singleton 'u', identifying this extension.
> >
> > o The attribute 'attr', which is an example for illustration (no
> >   attributes were defined at the time this document was published).
> >
> > o The keyword 'co-phonebk', consisting of the key 'co' (Collation) and
> >   the type 'phonebk' (Phonebook collation order).
>
> On IETF BCP 47 extensions, see also
> https://www.w3.org/International/articles/language-tags/#extension.
>
> So if this does not appear too difficult, it might provide a lot of
> additional flexibility if pandoc were to support the particular subset of
> "Extension U" strings pertaining to collation, i.e., those starting with
> "-u-co-", in pandoc's "lang" metadata field or command line argument. (In
> the absence of such a string, pandoc should of course use the default
> collation order.)
>
> --
> You received this message because you are subscribed to the Google Groups
> "pandoc-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/pandoc-discuss/fkSA06gm5QfCBknaCRunOSZwTsdOX6DMRGx0IQVOs9yszm16IeaCsTwX_cV-nhZ1kQ0LDEkxylV4IKJzSuiZbkjx3HSyD2NLgJTkW9DQB6U%3D%40protonmail.com

--
To view this discussion on the web visit
https://groups.google.com/d/msgid/pandoc-discuss/CADAJKhAQhmzFhxTMFmJYSBjWb_wU%2Bi1dJnPCVREngwpO8zXdsg%40mail.gmail.com.
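
PS. To make the two ideas in this thread concrete, here is a rough sketch in Python (not Haskell, and not pandoc's or unicode-collation's actual code). It pulls the "co" collation type out of a BCP 47 tag with a "-u-" extension, following the structure in the RFC 6067 quote, and then builds a simple "drop the last subtag" fallback list for the base tag. The function names `split_collation` and `fallback_chain` are made up for illustration, and the fallback is only a simplified stand-in for what Unicode::Collate::Locale does, not a reimplementation of it.

```python
from typing import List, Optional, Tuple


def split_collation(tag: str) -> Tuple[str, Optional[str]]:
    """Split a BCP 47 tag into (base tag, collation type or None).

    Hypothetical helper: only the 'co' key of Extension U is examined.
    """
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]
    # One-character subtags after the first position start an extension.
    singletons = [i for i, s in enumerate(lowered) if i > 0 and len(s) == 1]
    if not singletons:
        return tag, None
    base = "-".join(subtags[:singletons[0]])
    collation = None
    for n, start in enumerate(singletons):
        if lowered[start] == "x":
            break  # private use: everything after 'x' is opaque
        if lowered[start] != "u":
            continue  # some other extension; skip it
        end = singletons[n + 1] if n + 1 < len(singletons) else len(lowered)
        ext = lowered[start + 1:end]
        # Keys have exactly two characters; attributes and types have 3 to 8,
        # so a subtag equal to "co" can only be the collation key.
        if "co" in ext:
            k = ext.index("co") + 1
            parts = []
            while k < len(ext) and len(ext[k]) > 2:
                parts.append(ext[k])
                k += 1
            collation = "-".join(parts) or None
        break
    return base, collation


def fallback_chain(base: str) -> List[str]:
    """List the base tag followed by progressively more general tags."""
    parts = base.split("-")
    return ["-".join(parts[:n]) for n in range(len(parts), 0, -1)]


print(split_collation("de-DE-u-attr-co-phonebk"))  # ('de-DE', 'phonebk')
print(split_collation("en-US"))                    # ('en-US', None)
print(fallback_chain("de-DE"))                     # ['de-DE', 'de']
```

A caller would try each tag in the chain against the available collation data and, as suggested above, use the default collation order when no "co" type is given or nothing matches.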