ntg-context - mailing list for ConTeXt users
* Adding built-in support for Serbian language
@ 2020-10-30 10:31 Ivan Pešić
  2020-10-30 12:42 ` Mojca Miklavec
  0 siblings, 1 reply; 9+ messages in thread
From: Ivan Pešić @ 2020-10-30 10:31 UTC (permalink / raw)
  To: ntg-context

Hello all,
I have recently started using ConTeXt. I found that the distribution
includes a proper (Cyrillic) hyphenation file for the Serbian language,
but complete language support is not yet implemented. Therefore,
I have added what I think is required and tested it by putting the changed
files in my texmf-local; the result looks fine.
Only one thing requires a decision from the development team.
Serbian uses two scripts, Cyrillic and Latin, but ConTeXt language
codes use two letters for identification, so I'm not sure how to
include both scripts.
What I'm sending now is a Cyrillic script implementation, using the code
"sr".

It is trivial to generate a Latin script version of these changes
(completely automatically), once it is decided how to label it.

Best regards,
Ivan
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki     : http://contextgarden.net
___________________________________________________________________________________


* Re: Adding built-in support for Serbian language
  2020-10-30 10:31 Adding built-in support for Serbian language Ivan Pešić
@ 2020-10-30 12:42 ` Mojca Miklavec
  2020-10-30 14:07   ` Hans Hagen
                     ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Mojca Miklavec @ 2020-10-30 12:42 UTC (permalink / raw)
  To: mailing list for ConTeXt users

Dear Ivan,

On Fri, 30 Oct 2020 at 11:32, Ivan Pešić wrote:
>
> Hello all,
> I have recently started using ConTeXt.

Welcome!

> I've found that the distribution
> includes a proper (cyrillic) hyphenation file for Serbian language,

I would say that this needs to be changed/improved.
There's no reason why it wouldn't load both scripts at the same time
(at least for Unicode engines, which is the only thing that's
currently supported anyway).

This is what XeTeX loads, for example:
    https://github.com/hyphenation/tex-hyphen/blob/master/hyph-utf8/tex/generic/hyph-utf8/loadhyph/loadhyph-sr-latn.tex#L25

    \input hyph-sh-latn.tex
    \input hyph-sh-cyrl.tex
That is: it loads both patterns at the same time.

Hans, would you be willing to merge two sets of hyphenation patterns together?
Alternatively maybe we could prepare hyph-sh.pat.txt on the hyph-utf8 side?
I'm actually not sure why we didn't do that already, but maybe it was
because we have two sets of Cyrillic patterns and it has never been
clear-cut which ones to take.
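
As a rough illustration of what a merged pattern file could amount to, here
is a minimal Python sketch (not code from hyph-utf8 or ConTeXt). It assumes
that patterns are whitespace-separated and that `%` starts a comment; the
function names are hypothetical and the sample patterns are made up, not real
entries from hyph-sh-latn or hyph-sh-cyrl. Since the two scripts share no
letters, merging is essentially concatenation with duplicate removal.

```python
def read_patterns(text):
    """Return the patterns in `text`, skipping blank lines and % comments."""
    patterns = []
    for line in text.splitlines():
        line = line.split("%", 1)[0]   # drop TeX-style comments
        patterns.extend(line.split())  # assumed: whitespace-separated patterns
    return patterns


def merge_patterns(*sources):
    """Concatenate pattern sets, keeping the first occurrence of each pattern."""
    seen = set()
    merged = []
    for source in sources:
        for pattern in read_patterns(source):
            if pattern not in seen:
                seen.add(pattern)
                merged.append(pattern)
    return merged


# Made-up sample data, NOT real hyphenation patterns.
latin_sample = """
% Latin script sample
1ba
2r1t
"""
cyrillic_sample = """
% Cyrillic script sample
1ба
2р1т
1ba % repeated Latin pattern, dropped on merge
"""

merged = merge_patterns(latin_sample, cyrillic_sample)
print("\n".join(merged))
```

The order of the inputs is preserved, so the result could be written out as a
single file with the Latin patterns first and the Cyrillic ones after them.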

The author of hyph-sh-[latn|cyrl] says that his patterns should work
universally for multiple languages (they are relatively old), but they
were initially only released for the Latin scripts. Later another
author wanted to have support for Cyrillic script and prepared his own
patterns (I'm no longer sure whether they were partially based on the
other ones) without the Latin alternative.

In Xe(La)TeX and Lua(La)TeX we use the "sh" patterns for both, for
consistency reasons, among others. (You likely want the same word to
be hyphenated in the same way in both scripts).

> but a complete language support is still not implemented. Therefore,
> I've added what I think is required, did some testing by putting changed
> files in my texmf-local, and the result looks fine.

Awesome, thank you.

> There is only one thing that requires a decision from the development team.
> Serbian language uses two scripts: cyrillic and latin. Context language
> codes are using 2 letters for identification. So I'm not sure how to
> include both scripts.

(Unless Hans has plans to transliterate the translations on the fly :)
there should be two independent files. One should use the code sr-latn
and the other one sr-cyrl.

A two-letter code simply doesn't work in this situation, and we should
not try to support only a single script, or even attempt to decide
which one should be the default. Both should be supported equally
well.
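
To make the point about codes concrete, here is a small Python sketch
(an illustration only, not part of ConTeXt) that splits a tag such as
"sr-latn" into a language part and a script part. The function name is
hypothetical. It relies on the common convention that script codes are four
letters long and written with an initial capital ("Latn", "Cyrl"); a bare
"sr" carries no script information at all.

```python
def parse_language_tag(tag):
    """Split a tag such as 'sr-latn' into (language, script).

    `script` is None when the tag does not name a script.
    """
    parts = tag.replace("_", "-").split("-")
    language = parts[0].lower()
    script = None
    # A four-letter alphabetic second part is treated as a script code.
    if len(parts) > 1 and len(parts[1]) == 4 and parts[1].isalpha():
        script = parts[1].title()
    return language, script


print(parse_language_tag("sr-latn"))  # ('sr', 'Latn')
print(parse_language_tag("sr-cyrl"))  # ('sr', 'Cyrl')
print(parse_language_tag("sr"))       # ('sr', None)
```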

> What I'm sending now is a cyrillic script implementation, using the code
> "sr".
>
> It is trivial to generate (completely automatic) latin script version of
> these changes, once it is decided how to label it.

Would you be willing to also prepare the latin one then?
The codes should be sorted out by Hans (potentially with some help),
but we definitely want to use "sr-latn" and "sr-cyrl".

For the longer names there is some more freedom. LaTeX uses "serbianl"
and "serbianc", I think, but I believe we can come up with something
nicer.
Maybe something along the lines of the following?
    \mainlanguage[serbian][script=latn]
or
   \mainlanguage[serbian-latin]
   \mainlanguage[serbian-cyrillic]
No clue, really.

Thank you,
    Mojca

(PS: I would say that adding support for transliteration of the text
from one script to the other would be a really nice feature. Then you
could type your text for a book once and have it typeset in both
versions without any extra effort :)
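
As an illustration of the simpler direction of such a transliteration, here
is a minimal Python sketch that maps Serbian Cyrillic to Latin with a plain
lookup table. It is not code from ConTeXt or from any existing module; the
names are hypothetical. One simplification is labeled in the code: an
uppercase Љ, Њ or Џ always becomes "Lj", "Nj" or "Dž", even inside an
all-caps word where "LJ", "NJ" or "DŽ" would be expected.

```python
# Lowercase Serbian Cyrillic letters and their Latin equivalents.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}


def cyr_to_lat(text):
    """Transliterate Serbian Cyrillic text to Latin, one character at a time."""
    out = []
    for ch in text:
        lower = ch.lower()
        latin = CYR_TO_LAT.get(lower)
        if latin is None:
            out.append(ch)                  # not a Serbian Cyrillic letter
        elif ch != lower:
            # Simplification: uppercase digraphs become 'Lj', 'Nj', 'Dž'.
            out.append(latin.capitalize())
        else:
            out.append(latin)
    return "".join(out)


print(cyr_to_lat("Београд, Љубљана"))  # Beograd, Ljubljana
```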

* Re: Adding built-in support for Serbian language
  2020-10-30 12:42 ` Mojca Miklavec
@ 2020-10-30 14:07   ` Hans Hagen
  2020-10-30 16:45   ` Henri Menke
  2020-11-03 13:10   ` Hans Hagen
  2 siblings, 0 replies; 9+ messages in thread
From: Hans Hagen @ 2020-10-30 14:07 UTC (permalink / raw)
  To: mailing list for ConTeXt users, Mojca Miklavec

On 10/30/2020 1:42 PM, Mojca Miklavec wrote:

> I would say that this needs to be changed/improved.
> There's no reason why it wouldn't load both scripts at the same time
> (at least for Unicode engines, which is the only thing that's
> currently supported anyway).

i'll look into it once i finished some new stuff (in the middle of fit)

> (PS: I would say that adding support for transliteration of the text
> from one script to the other would be a really nice feature. Then you
> could type your text for a book once and have it typeset in both
> versions without any extra effort :)
just gimme the specs ... sounds like some nice distraction for a rainy 
weekend

Hans


-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
        tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------

* Re: Adding built-in support for Serbian language
  2020-10-30 12:42 ` Mojca Miklavec
  2020-10-30 14:07   ` Hans Hagen
@ 2020-10-30 16:45   ` Henri Menke
  2020-11-03 13:10   ` Hans Hagen
  2 siblings, 0 replies; 9+ messages in thread
From: Henri Menke @ 2020-10-30 16:45 UTC (permalink / raw)
  To: ntg-context

> (PS: I would say that adding support for transliteration of the text
> from one script to the other would be a really nice feature. Then you
> could type your text for a book once and have it typeset in both
> versions without any extra effort :)

There is Philipp Gesang's transliterator package:
https://gitlab.com/phgsng/transliterator
https://modules.contextgarden.net/cgi-bin/module.cgi/ruid=199735311/action=view/id=50

Cheers, Henri

* Re: Adding built-in support for Serbian language
  2020-10-30 12:42 ` Mojca Miklavec
  2020-10-30 14:07   ` Hans Hagen
  2020-10-30 16:45   ` Henri Menke
@ 2020-11-03 13:10   ` Hans Hagen
  2 siblings, 0 replies; 9+ messages in thread
From: Hans Hagen @ 2020-11-03 13:10 UTC (permalink / raw)
  To: mailing list for ConTeXt users, Mojca Miklavec

Hi Mojca,

>      \input hyph-sh-latn.tex
>      \input hyph-sh-cyrl.tex
> That is: it loads both patterns at the same time.
> 
> Hans, would you be willing to merge two sets of hyphenation patterns together?
> Alternatively maybe we could prepare hyph-sh.pat.txt on the hyph-utf8 side?
> I'm actually not sure why we didn't do that already, but maybe it was
> because we have two sets of cyrillic patterns and it has never been a
> clear cut which ones to take.

I think that a merged file is the most natural approach (isn't it "sr" 
instead of "sh"?). I can of course add all kinds of code for merging, but 
at some point I guess a merged file will be used anyway.
  Hans



* Re: Adding built-in support for Serbian language
       [not found] <mailman.285.1604070640.1206.ntg-context@ntg.nl>
  2020-10-30 19:18 ` Ivan Pešić
@ 2020-10-30 20:59 ` Ivan Pešić
  1 sibling, 0 replies; 9+ messages in thread
From: Ivan Pešić @ 2020-10-30 20:59 UTC (permalink / raw)
  To: ntg-context

[-- Attachment #1: Type: text/plain, Size: 461 bytes --]


On 30.10.2020 at 16:42, Mojca Miklavec wrote:
> Would you be willing to also prepare the latin one then?
> The codes should be sorted out by Hans (potentially with some help),
> but we definitely want to use "sr-latn" and "sr-cyrl".
Here is the lang-txt.lua diff with the labels transliterated from Serbian
Cyrillic to Latin script.
The other files are essentially unchanged; only the codes need to be sorted out.
The language definition stays the same.

Regards,
Ivan

[-- Attachment #2: Serbian-Latn.7z --]
[-- Type: application/octet-stream, Size: 4492 bytes --]


* Re: Adding built-in support for Serbian language
  2020-10-30 19:18 ` Ivan Pešić
@ 2020-10-30 19:27   ` Hans Hagen
  0 siblings, 0 replies; 9+ messages in thread
From: Hans Hagen @ 2020-10-30 19:27 UTC (permalink / raw)
  To: mailing list for ConTeXt users, Ivan Pešić

On 10/30/2020 8:18 PM, Ivan Pešić wrote:
> Dear Mojca,
> 
> On 30.10.2020 at 19:10, Mojca Miklavec wrote:
>> Would you be willing to also prepare the latin one then?
>> The codes should be sorted out by Hans (potentially with some help),
>> but we definitely want to use "sr-latn" and "sr-cyrl".
> Sure, I will tomorrow create a transliteration to latin script and post
> diffs here.
> What you propose is in fact already used in some other places, I agree
> with you.
>> For the longer names there is some more freedom. LaTeX uses "serbianl"
>> and "serbianc", I think, but I believe we can come up with something
>> nicer.
>> Maybe something along the lines of the following?
>>      \mainlanguage[serbian][script=latn]
>> or
>>     \mainlanguage[serbian-latin]
>>     \mainlanguage[serbian-cyrillic]
>> No clue, really.
>>
>> Thank you,
>>      Mojca
>>
>> (PS: I would say that adding support for transliteration of the text
>> from one script to the other would be a really nice feature. Then you
>> could type your text for a book once and have it typeset in both
>> versions without any extra effort :)
> As for transliteration, cyrillic to latin is one-to-one, straightforward
> with no exceptions.
> A simple table lookup is enough.
> Going from latin to cyrillic, there are some exceptions, but we could
> solve that.
> I can provide Hans with all that is needed.

ok. there's quite some code already present in the core that we can use 
so it's no big deal to do it

also think of additional things you want (some tracing?)

Hans


* Re: Adding built-in support for Serbian language
       [not found] <mailman.285.1604070640.1206.ntg-context@ntg.nl>
@ 2020-10-30 19:18 ` Ivan Pešić
  2020-10-30 19:27   ` Hans Hagen
  2020-10-30 20:59 ` Ivan Pešić
  1 sibling, 1 reply; 9+ messages in thread
From: Ivan Pešić @ 2020-10-30 19:18 UTC (permalink / raw)
  To: ntg-context

Dear Mojca,

On 30.10.2020 at 19:10, Mojca Miklavec wrote:
> Would you be willing to also prepare the latin one then?
> The codes should be sorted out by Hans (potentially with some help),
> but we definitely want to use "sr-latn" and "sr-cyrl".
Sure, tomorrow I will create a transliteration to Latin script and post
the diffs here.
What you propose is in fact already used in some other places; I agree
with you.
> For the longer names there is some more freedom. LaTeX uses "serbianl"
> and "serbianc", I think, but I believe we can come up with something
> nicer.
> Maybe something along the lines of the following?
>     \mainlanguage[serbian][script=latn]
> or
>    \mainlanguage[serbian-latin]
>    \mainlanguage[serbian-cyrillic]
> No clue, really.
>
> Thank you,
>     Mojca
>
> (PS: I would say that adding support for transliteration of the text
> from one script to the other would be a really nice feature. Then you
> could type your text for a book once and have it typeset in both
> versions without any extra effort :)
As for transliteration, cyrillic to latin is one-to-one, straightforward
with no exceptions.
A simple table lookup is enough.
Going from latin to cyrillic, there are some exceptions, but we could
solve that.
I can provide Hans with all that is needed.
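
For the Latin to Cyrillic direction, a possible shape of the solution is
sketched below in Python, purely as an illustration (it is not code from
ConTeXt). It matches the digraphs lj, nj and dž greedily and consults a
word list for the exceptions, where the two letters belong to separate
sounds. The names are hypothetical, the two entries in `EXCEPTIONS` are
examples only and a real list would be longer, and the input is assumed to
use precomposed letters (č, ć, đ, š, ž).

```python
import re

DIGRAPHS = {"lj": "љ", "nj": "њ", "dž": "џ"}
SINGLES = {
    "a": "а", "b": "б", "c": "ц", "č": "ч", "ć": "ћ", "d": "д",
    "đ": "ђ", "e": "е", "f": "ф", "g": "г", "h": "х", "i": "и",
    "j": "ј", "k": "к", "l": "л", "m": "м", "n": "н", "o": "о",
    "p": "п", "r": "р", "s": "с", "š": "ш", "t": "т", "u": "у",
    "v": "в", "z": "з", "ž": "ж",
}
# Example exceptions only: here 'nj' and 'dž' are two separate letters.
EXCEPTIONS = {"injekcija": "инјекција", "nadživeti": "надживети"}


def _word_to_cyr(word):
    exception = EXCEPTIONS.get(word.lower())
    if exception is not None:
        # Only a leading capital is preserved for exception words.
        return exception.capitalize() if word[0].isupper() else exception
    out = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2].lower()
        if pair in DIGRAPHS:               # greedy: prefer the digraph
            cyr, step = DIGRAPHS[pair], 2
        else:                              # unknown characters pass through
            cyr, step = SINGLES.get(word[i].lower(), word[i]), 1
        out.append(cyr.upper() if word[i].isupper() else cyr)
        i += step
    return "".join(out)


def lat_to_cyr(text):
    """Transliterate Serbian Latin text to Cyrillic, word by word."""
    return re.sub(r"\w+", lambda m: _word_to_cyr(m.group()), text)


print(lat_to_cyr("Njegoš, konj, injekcija"))  # Његош, коњ, инјекција
```

Keeping the exceptions as whole words with their precomputed Cyrillic form
keeps the lookup trivial, at the cost of having to list inflected forms
separately.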

Ivan


* Adding built-in support for Serbian language
@ 2020-10-30 10:35 Ivan Pešić
  0 siblings, 0 replies; 9+ messages in thread
From: Ivan Pešić @ 2020-10-30 10:35 UTC (permalink / raw)
  To: ntg-context

[-- Attachment #1: Type: text/plain, Size: 68 bytes --]

Apologies, I forgot to attach the file :$

Here it is.

Ivan


[-- Attachment #2: serbian.7z --]
[-- Type: application/octet-stream, Size: 5212 bytes --]


End of thread (newest message: 2020-11-03 13:10 UTC)

Thread overview: 9+ messages
2020-10-30 10:31 Adding built-in support for Serbian language Ivan Pešić
2020-10-30 12:42 ` Mojca Miklavec
2020-10-30 14:07   ` Hans Hagen
2020-10-30 16:45   ` Henri Menke
2020-11-03 13:10   ` Hans Hagen
2020-10-30 10:35 Ivan Pešić
     [not found] <mailman.285.1604070640.1206.ntg-context@ntg.nl>
2020-10-30 19:18 ` Ivan Pešić
2020-10-30 19:27   ` Hans Hagen
2020-10-30 20:59 ` Ivan Pešić
