ntg-context - mailing list for ConTeXt users
* Re: Improved support for Norwegian in ConTeXt
From: Karl Ove Hufthammer @ 2007-02-10 14:10 UTC
  To: Mojca Miklavec; +Cc: dev-context, ntg-context

On Sunday 04 February 2007 18:16, Mojca Miklavec wrote:

>I would suggest that you post some of the questions to the ntg mailing
>list, where more Norwegian users can comment on them.

OK. I'm now crossposting this e-mail to both the dev and the ntg mailing lists.
See my answers to some of your questions below.

>On 2/4/07, Karl Ove Hufthammer wrote:
>> I'm writing this to suggest improvements in ConTeXt's support for the
>> Norwegian languages. ConTeXt already has rudimentary support for
>> Norwegian, but with some problems.
>>
>>
>> Language codes
>> --------------
>>
>> The main problem is that ConTeXt uses the language code 'no' for Norwegian.
>> There actually *is* no written language called 'Norwegian'; Norway has two
>> official written languages, Norwegian Bokmål (ISO 639 language code 'nb')
>> and Norwegian Nynorsk (ISO 639 language code 'nn'). The current
>> definitions for 'no' in ConTeXt are for Norwegian Bokmål. (There is an ISO
>> 639 language code 'no' for Norwegian, but this should usually be used for
>> spoken Norwegian, or perhaps for transcriptions of spoken language.)
>>
>> The language code 'no' should be removed, and be replaced by the two
>> language codes 'nb' and 'nn'.
>
>Although I don't know the exact situation, a few remarks:
>
>- You should probably also provide the correct definitions for calling
>the language (so that one can say \mainlanguage[norwegian], but
>perhaps with what you consider to be the proper language tags). It's
>currently
>
>\installlanguage [norwegian]   [\s!no]
>\installlanguage [norsk]       [\s!no] % bonus switch
>
>You need to fix the two and perhaps add
>\installlanguage [???]       [\s!nb]
>\installlanguage [???]       [\s!nn]

OK. We will need:

\installlanguage [bokmal]   [\s!nb]
\installlanguage [nynorsk]   [\s!nn]

If it is possible to use non-ASCII characters safely, the following would also 
be nice:

\installlanguage [bokmål]   [\s!nb]

>- If you remove [no], older documents might break. I don't know much
>about the situation and the number of users, but can you say which of
>the two language variants [no] should default to? Since the current
>definitions probably point to "nb" (at first glance) - would it
>make sense to use "nb" when one says \mainlanguage[no]?

Yes.

>Perhaps one can issue a warning when the language "no" is selected
>(stating something like "language 'no' is deprecated, please use 'nb'
>for Bokmål or 'nn' for Nynorsk instead").

Yes, that would be the preferred solution. As Hans F. Nordhaug mentioned, 
the 'no' code should be considered deprecated in this context (no pun 
intended).

To sum up, we need the following language codes: nb and nn.
And we need the following mappings:

bokmal --> nb
bokmål --> nb (if possible)
nynorsk --> nn
norsk --> nb (with warning)
norwegian --> nb (with warning)
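
As a rough sketch, the mappings above could look like this in the language
definitions. Note that \definesystemconstant is my assumption for how the
\s!nb and \s!nn constants get defined, and that the deprecation warning
itself is not shown, since I don't know where it would best be hooked in.

\definesystemconstant {nb}
\definesystemconstant {nn}

\installlanguage [bokmal]    [\s!nb]
\installlanguage [nynorsk]   [\s!nn]
\installlanguage [norsk]     [\s!nb] % deprecated alias, should warn
\installlanguage [norwegian] [\s!nb] % deprecated alias, should warn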

>Removing it probably doesn't affect the rest, so if other Norwegian
>users agree to remove it completely, it can still be done, but I would
>suggest that you first ask the author of the original translations and
>the rest of the users on the ntg-context mailing list. Otherwise it can
>still default to one of the two variants (or to a new one if you
>also provide a third alternative for the "spoken language").
>
>> See http://en.wikipedia.org/wiki/Norwegian_language for a (not too good)
>> article on the Norwegian languages.
>>
>> For the record, the language names used in LaTeX/Babel are
>> (unfortunately) 'Norwegian' and 'norsk' for Norwegian Bokmål, and
>> 'nynorsk' for Norwegian Nynorsk, instead of 'bokmal'/'bokmål' and
>> 'nynorsk'. Norwegian Bokmål support was added first, and used up the
>> 'Norwegian' name.
>>
>>
>> Hyphenation
>> -----------
>>
>> The two written languages are quite similar, and the current hyphenation
>> dictionary (nohyphbx) was made to support both. But there are (at least)
>> two words which are put in the hyphenation exceptions for this dictionary
>> because they would have different hyphenation (because of different
>> meaning) in Norwegian Nynorsk and Norwegian Bokmål. These are:
>>
>> attende -- nb: at-ten-de ('eighteenth'),       nn: att-en-de ('back')
>> betre   -- nb: be-tre ('enter'/'set foot on'), nn: bet-re ('better')
>>
>> Would it be possible to have two different hyphenation dictionaries for
>> 'nb' and 'nn', which would only differ in the hyphenation exceptions used
>> for these two words?
>
>This can be done. Hans was complaining about the mess of (naming of)
>Norwegian hyphenation patterns one month ago anyway, I guess that "he
>won't mind" adding yet another fix to the scripts ;)
>
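
Until separate dictionaries exist, a per-document workaround might look like
this. It is only a sketch: it assumes the 'nb' and 'nn' codes proposed above
are installed with patterns loaded, and it relies on TeX storing \hyphenation
exceptions separately for each language.

\language[nb] \hyphenation{at-ten-de be-tre} % Bokmål readings
\language[nn] \hyphenation{att-en-de bet-re} % Nynorsk readings

\mainlanguage[nn] % switch to the language actually used in the document
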
>> Language setup
>> --------------
>>
>> Here is an improved/correct version of the language setup for Norwegian.
>> The setup for 'no' should be removed.
>>
>> \installlanguage
>>   [nn]
>>   [spacing=packed,
>>    lefthyphenmin=2,
>>    righthyphenmin=2,
>>    leftsentence=---,
>>    rightsentence=---,
>>    leftsubsentence=---,
>>    rightsubsentence=---,
>>    leftquote=\upperleftsinglesixquote,
>>    rightquote=\upperrightsingleninequote,
>>    leftquotation=\leftguillemot,
>>    rightquotation=\rightguillemot,
>>    date={day,{.},\ ,month,\ ,year},
>>    state=stop]
>>
>> This is for Norwegian Nynorsk ('nn'), but the same setup is used for
>> Norwegian Bokmål (the values used for 'day' differ, though -- see below).
>>
>> But I am not sure I understand what the four *sentence commands are used
>> for. We usually don't use em-dashes in Norwegian, so the entries look
>> incorrect. If you can explain what the commands are used for, I can supply
>> the correct Norwegian definitions.
>>
>> I also noticed that the Italian definitions use leftspeech, middlespeech
>> and rightspeech commands. What are these used for?
>>
>>
>> Other language-specific settings
>> --------------------------------
>>
>> Norwegian (Bokmål and Nynorsk) differs typographically from English in
>> several other ways. Here are three of them:
>>
>> We don't (usually) use bullets for the first level of unnumbered lists; we
>> use en-dashes.
>>
>> -- Item 1
>> -- Item 2
>> -- Item 3
>>
>> Bullets are commonly seen in documents created by word processors of US
>> origin, and in the documents created by people without proper typographic
>> training, though. It would be nice if ConTeXt could use en-dashes by
>> default for lists in Norwegian text.
>
>The default is to use
>   bullet, dash, star, triangle
>for the four levels of itemization.
>
>If you want to change the behaviour in your document only, all you need to
> do is \definesymbol[1][\endash]
>but I guess that it could be adapted, so that Norwegian documents will
>all use endash by default.
>
>Similar support has already been implemented for Slovenian (to use
>a different set of characters when itemize uses characters).
>
>There are two questions:
>- do other Norwegian users agree to change the default set?
>- what should be the order then? (ie: what character should be used
>for the second level of itemization?)

My suggestion is

\definesymbol[1][{\symbol[dash]}]
\definesymbol[2][{\symbol[star]}]
\definesymbol[3][{\symbol[circle]}]
\definesymbol[4][{\symbol[bullet]}]
\definesymbol[5][{\symbol[triangle]}]

and leave levels 6+ at their defaults. Norwegian people, feel free to comment 
on this. :)
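
A small test file to see the result, assuming the 'nn' language code is
available (with the current setup, \mainlanguage[no] would have to be used
instead). Only the first two levels are exercised here:

\mainlanguage[nn]

\definesymbol[1][{\symbol[dash]}]
\definesymbol[2][{\symbol[star]}]

\starttext
\startitemize
\item Punkt 1
  \startitemize
  \item Underpunkt 1
  \stopitemize
\item Punkt 2
\stopitemize
\stoptext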

>> We don't use full stops in numbered lists. In other words, instead of
>>
>> 1. Item 1
>> 2. Item 2
>> 3. Item 3
>>
>> we write
>>
>> 1  Item 1
>> 2  Item 2
>> 3  Item 3
>
>That's a matter of
>\setupitemize[stopper=]
>
>I don't know how to set that in a language-specific way, but it sounds
>reasonable to me to add it.
>
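
For a single document, your setting can be tried as follows (just a sketch;
the [n] option selects a numbered list):

\setupitemize[stopper=]

\starttext
\startitemize[n]
\item Item 1
\item Item 2
\item Item 3
\stopitemize
\stoptext
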
>> The same holds for numbered headings, both in the main text and in the
>> TOC.
>
>But sections already start with
>   1 Section name
>rather than
>   1. Section name
>by default. (Support for the second case might be improved in the
>future. Or rather: I hope that it will be.)
>
>> Would it be possible to support this by default in ConTeXt?
>>
>> We also use the comma in decimal numbers (3,14 instead of 3.14).
>
>We too. In text this is no problem anyway. Math can be set up in that
>way, but I doubt that it's set up in any language (although it could
>be). This means that you had better write $3{,}14$ instead of
>$3,14$,

OK. I guess this is an adequate solution.

A problem occurs only when people write $3,14$ without thinking, and don't 
notice that the result looks really bad. In LaTeX, there is a package 
ncccomma that defines an 'intelligent' comma to fix this, so that you can use 
the comma as both a decimal separator and a list separator. (The comma in 
ncccomma is much more 'intelligent' than the one in 'icomma', BTW.)
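
To illustrate the difference, here is a minimal sketch. The \mathcode line is
plain TeX rather than anything ConTeXt-specific, and I have not checked how
it interacts with ConTeXt's own math setup:

\starttext
$3,14$ \par    % comma is punctuation: a thin space follows it
$3{,}14$ \par  % braces make the comma an ordinary symbol: no extra space

% Crude alternative: make every math comma ordinary inside a group.
% This also removes the space in lists such as f(x,y), which is why an
% 'intelligent' comma is nicer.
{\mathcode`\,="013B $3,14$ and $f(x,y)$}
\stoptext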

>I don't know about any other consequences, since TeX almost 
>never writes out any calculated floats in the resulting document.
>
>> Norwegian labels
>> ----------------
>>
>> Here are labels for Norwegian (Bokmål and Nynorsk). The old 'no' labels
>> should be removed. The 'nb' ones are taken from the 'no' ones, but with
>> some corrections.
>>
>> Some comments: We don't usually capitalise the first letter in
>> crossreferences. Where one would in English write
>>
>> See Figure 5.22 ...
>>
>> we would write
>>
>> Se figur 5.22 ... (Bokmål)
>> Sjå figur 5.22 ... (Nynorsk)
>
>But when you crossreference, you only get 5.22, you have to write
>"figur" manually (you can set up that perhaps, so that you get
>"figure" attached to the number, but in any case you need to do that
>manually).

OK. No problem then. :)
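
For the record, here is a sketch of writing the word manually with \in. The
reference name fig:demo and the framed box standing in for a real image are
made up for the example:

\starttext
\placefigure[here][fig:demo]{Ein test}{\framed{figur}}

Se \in{figur}[fig:demo]. % the word 'figur' is typed by hand, in lowercase
\stoptext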

>"Figur 5.22" will only be used under the actual image. When
>crossreferencing, we use lowercase too, but under the figure itself I
>think that uppercase is OK, at least for our language (since it's
>the caption of the figure anyway).
>
>> But we would of course write
>>
>> Figur 5.22 viser ...
>> (Figure 5.22 shows ...)
>>
>> The definitions below use a capital first letter. Will this be a problem?
>>
>> I was also unsure about what the 'lines' label should be. The plural of
>> 'line' ('linje') in Norwegian (both 'nb' and 'nn') is 'linjer', but we do
>> not use the plural when referencing more than one line. Where one would
>> write
>>
>> The discussion on lines 5--13 ...
>>
>> in English, we would write
>>
>> Drøftinga på linje 5--13 ...
>>
>> in Norwegian. In other words, we use the singular instead of the plural.
>> The same holds for the other cross-referencing terms ('Figure', 'Table'
>> &c.).
>>
>> Feel free to change the 'lines' label to 'linje' if this makes it work
>> better.
>
>I don't know where exactly this is used, but I assume that it's for
>"List of Figures", "List of Tables". But I don't know exactly, I never
>use those. (I have just translated some of them and I hoped that the
>first one who will consider them wrong will complain ;)

:)

-- 
Karl Ove Hufthammer
E-mail and Jabber: karl@huftis.org
