ntg-context - mailing list for ConTeXt users
* Some Ethiopic examples (hyphenation/breaking) in ConTeXt
@ 2011-05-06 20:00 Mojca Miklavec
  2011-05-07 11:20 ` Hans Hagen
  0 siblings, 1 reply; 6+ messages in thread
From: Mojca Miklavec @ 2011-05-06 20:00 UTC
  To: mailing list for ConTeXt users; +Cc: Adam McCollum

[-- Attachment #1: Type: text/plain, Size: 1659 bytes --]

Dear Hans,

We were originally preparing the example for XeTeX (which behaves very
weirdly anyway) and I would like to know how to typeset Ethiopic text in
ConTeXt.

The basic requirements are:

- Words may be split after any character (character = syllable; it's
in the range "1200-"139F), but not before word/sentence dividers. (We
have hyphenation patterns, but one could just as well use some other
mechanism to break.)

- "1361 is the word divider and "1362 is the sentence divider.

- One doesn't use spaces when writing.

- In the output one should get something like a space (of approximately
the same width) before and after each word/sentence divider, except
that the "space" before the divider should not be breakable; I strongly
suspect that the amount of space before/after dividers depends on the
font being used, but I may be wrong.

- Text should be nicely justified (I wonder if microtypography would
also help).
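
The requirements above can be sketched in Python as follows. This is only
an illustration of the intended logic, not ConTeXt code: the names
`is_syllable` and `mark_breaks` are made up for the example, a zero-width
space stands in for "break allowed here", and every non-divider character
in "1200-"139F is treated as a syllable, which is a simplification.

```python
# Illustrative sketch; none of these names exist in ConTeXt.
NBSP = "\u00a0"  # no-break space: keeps a divider attached to the preceding syllable
ZWSP = "\u200b"  # zero-width space: used here as an invisible break opportunity
DIVIDERS = {"\u1361", "\u1362"}  # word divider, sentence divider


def is_syllable(ch):
    """True for a character in "1200-"139F that is not a divider (simplified)."""
    return 0x1200 <= ord(ch) <= 0x139F and ch not in DIVIDERS


def mark_breaks(text):
    """Insert break opportunities after syllables and spacing around dividers."""
    out = []
    for i, ch in enumerate(text):
        nxt = text[i + 1:i + 2]  # next character, or "" at the end of the text
        if ch in DIVIDERS:
            # Unbreakable space before the divider, ordinary breakable space after it.
            out.append(NBSP + ch + " ")
        elif is_syllable(ch) and nxt != "" and is_syllable(nxt):
            # A break is allowed between two syllables, never before a divider.
            out.append(ch + ZWSP)
        else:
            out.append(ch)
    return "".join(out)
```

The equal-width spacing and the font dependence mentioned above are not
modelled here; that part would have to be handled by the typesetting engine.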

I'm attaching a sample text that does approximately what I expect it
to do, but I would like to avoid active characters, make the space
before and after divider of equal size and I'm not sure what is the
most appropriate approach in ConTeXt. The example also leaves a bit
too much whitespace after dividers that end the line.

Here's the font used in the example:
   http://scripts.sil.org/AbyssinicaSIL_Download

Thanks a lot,
    Mojca

PS: In char-def.lua see

 [0x1361]={
  category="po",
  description="ETHIOPIC WORDSPACE",
  direction="l",
  linebreak="ba",
  unicodeslot=0x1361,
 },

where linebreak="ba" means "break after" or "allow break after this
character". But I guess that ConTeXt ignores those meanings at the
moment.
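
For comparison, some of these properties can be looked up outside ConTeXt
with Python's standard `unicodedata` module. This is just a sketch: the
helper name `describe` is made up, and as far as I know the module does
not expose the line-break class, so the "ba" value itself is not checked.

```python
import unicodedata


def describe(codepoint):
    """Return (name, general category, bidi class) for a code point."""
    ch = chr(codepoint)
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),       # "Po" corresponds to category="po" above
        unicodedata.bidirectional(ch),  # "L" corresponds to direction="l" above
    )


for cp in (0x1361, 0x1362):
    print(f"U+{cp:04X}", *describe(cp))
```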

[-- Attachment #2: context-geez.tex --]
[-- Type: application/x-tex, Size: 1596 bytes --]

[-- Attachment #3: lang-mul-ethi.lua --]
[-- Type: application/octet-stream, Size: 237 bytes --]

return {
	["metadata"]={
		["mnemonic"]="mul-ethi",
		["source"]="hyph-mul-ethi",
	},
	["patterns"]={
		["data"]=io.loaddata(resolvers.findfile("lang-mul-ethi.pat")),
		["minhyphenmin"]=1,
		["minhyphenmax"]=1,
	},
	["version"]="0.1",
}

[-- Attachment #4: lang-mul-ethi.pat --]
[-- Type: application/octet-stream, Size: 2538 bytes --]

1ሀ1
1ሁ1
1ሂ1
1ሃ1
1ሄ1
1ህ1
1ሆ1
1ሇ1
1ለ1
1ሉ1
1ሊ1
1ላ1
1ሌ1
1ል1
1ሎ1
1ሏ1
1ሐ1
1ሑ1
1ሒ1
1ሓ1
1ሔ1
1ሕ1
1ሖ1
1ሗ1
1መ1
1ሙ1
1ሚ1
1ማ1
1ሜ1
1ም1
1ሞ1
1ሟ1
1ሠ1
1ሡ1
1ሢ1
1ሣ1
1ሤ1
1ሥ1
1ሦ1
1ሧ1
1ረ1
1ሩ1
1ሪ1
1ራ1
1ሬ1
1ር1
1ሮ1
1ሯ1
1ሰ1
1ሱ1
1ሲ1
1ሳ1
1ሴ1
1ስ1
1ሶ1
1ሷ1
1ሸ1
1ሹ1
1ሺ1
1ሻ1
1ሼ1
1ሽ1
1ሾ1
1ሿ1
1ቀ1
1ቁ1
1ቂ1
1ቃ1
1ቄ1
1ቅ1
1ቆ1
1ቇ1
1ቈ1
1ቊ1
1ቋ1
1ቌ1
1ቍ1
1ቐ1
1ቑ1
1ቒ1
1ቓ1
1ቔ1
1ቕ1
1ቖ1
1ቘ1
1ቚ1
1ቛ1
1ቜ1
1ቝ1
1በ1
1ቡ1
1ቢ1
1ባ1
1ቤ1
1ብ1
1ቦ1
1ቧ1
1ቨ1
1ቩ1
1ቪ1
1ቫ1
1ቬ1
1ቭ1
1ቮ1
1ቯ1
1ተ1
1ቱ1
1ቲ1
1ታ1
1ቴ1
1ት1
1ቶ1
1ቷ1
1ቸ1
1ቹ1
1ቺ1
1ቻ1
1ቼ1
1ች1
1ቾ1
1ቿ1
1ኀ1
1ኁ1
1ኂ1
1ኃ1
1ኄ1
1ኅ1
1ኆ1
1ኇ1
1ኈ1
1ኊ1
1ኋ1
1ኌ1
1ኍ1
1ነ1
1ኑ1
1ኒ1
1ና1
1ኔ1
1ን1
1ኖ1
1ኗ1
1ኘ1
1ኙ1
1ኚ1
1ኛ1
1ኜ1
1ኝ1
1ኞ1
1ኟ1
1አ1
1ኡ1
1ኢ1
1ኣ1
1ኤ1
1እ1
1ኦ1
1ኧ1
1ከ1
1ኩ1
1ኪ1
1ካ1
1ኬ1
1ክ1
1ኮ1
1ኯ1
1ኰ1
1ኲ1
1ኳ1
1ኴ1
1ኵ1
1ኸ1
1ኹ1
1ኺ1
1ኻ1
1ኼ1
1ኽ1
1ኾ1
1ዀ1
1ዂ1
1ዃ1
1ዄ1
1ዅ1
1ወ1
1ዉ1
1ዊ1
1ዋ1
1ዌ1
1ው1
1ዎ1
1ዏ1
1ዐ1
1ዑ1
1ዒ1
1ዓ1
1ዔ1
1ዕ1
1ዖ1
1ዘ1
1ዙ1
1ዚ1
1ዛ1
1ዜ1
1ዝ1
1ዞ1
1ዟ1
1ዠ1
1ዡ1
1ዢ1
1ዣ1
1ዤ1
1ዥ1
1ዦ1
1ዧ1
1የ1
1ዩ1
1ዪ1
1ያ1
1ዬ1
1ይ1
1ዮ1
1ዯ1
1ደ1
1ዱ1
1ዲ1
1ዳ1
1ዴ1
1ድ1
1ዶ1
1ዷ1
1ዸ1
1ዹ1
1ዺ1
1ዻ1
1ዼ1
1ዽ1
1ዾ1
1ዿ1
1ጀ1
1ጁ1
1ጂ1
1ጃ1
1ጄ1
1ጅ1
1ጆ1
1ጇ1
1ገ1
1ጉ1
1ጊ1
1ጋ1
1ጌ1
1ግ1
1ጎ1
1ጏ1
1ጐ1
1ጒ1
1ጓ1
1ጔ1
1ጕ1
1ጘ1
1ጙ1
1ጚ1
1ጛ1
1ጜ1
1ጝ1
1ጞ1
1ጟ1
1ጠ1
1ጡ1
1ጢ1
1ጣ1
1ጤ1
1ጥ1
1ጦ1
1ጧ1
1ጨ1
1ጩ1
1ጪ1
1ጫ1
1ጬ1
1ጭ1
1ጮ1
1ጯ1
1ጰ1
1ጱ1
1ጲ1
1ጳ1
1ጴ1
1ጵ1
1ጶ1
1ጷ1
1ጸ1
1ጹ1
1ጺ1
1ጻ1
1ጼ1
1ጽ1
1ጾ1
1ጿ1
1ፀ1
1ፁ1
1ፂ1
1ፃ1
1ፄ1
1ፅ1
1ፆ1
1ፇ1
1ፈ1
1ፉ1
1ፊ1
1ፋ1
1ፌ1
1ፍ1
1ፎ1
1ፏ1
1ፐ1
1ፑ1
1ፒ1
1ፓ1
1ፔ1
1ፕ1
1ፖ1
1ፗ1
1ፘ1
1ፙ1
1ፚ1
1ᎀ1
1ᎁ1
1ᎂ1
1ᎃ1
1ᎄ1
1ᎅ1
1ᎆ1
1ᎇ1
1ᎈ1
1ᎉ1
1ᎊ1
1ᎋ1
1ᎌ1
1ᎍ1
1ᎎ1
1ᎏ1
1ⶀ1
1ⶁ1
1ⶂ1
1ⶃ1
1ⶄ1
1ⶅ1
1ⶆ1
1ⶇ1
1ⶈ1
1ⶉ1
1ⶊ1
1ⶋ1
1ⶌ1
1ⶍ1
1ⶎ1
1ⶏ1
1ⶐ1
1ⶑ1
1ⶒ1
1ⶓ1
1ⶔ1
1ⶕ1
1ⶖ1
1ⶠ1
1ⶡ1
1ⶢ1
1ⶣ1
1ⶤ1
1ⶥ1
1ⶦ1
1ⶨ1
1ⶩ1
1ⶪ1
1ⶫ1
1ⶬ1
1ⶭ1
1ⶮ1
1ⶰ1
1ⶱ1
1ⶲ1
1ⶳ1
1ⶴ1
1ⶵ1
1ⶶ1
1ⶸ1
1ⶹ1
1ⶺ1
1ⶻ1
1ⶼ1
1ⶽ1
1ⶾ1
1ⷀ1
1ⷁ1
1ⷂ1
1ⷃ1
1ⷄ1
1ⷅ1
1ⷆ1
1ⷈ1
1ⷉ1
1ⷊ1
1ⷋ1
1ⷌ1
1ⷍ1
1ⷎ1
1ⷐ1
1ⷑ1
1ⷒ1
1ⷓ1
1ⷔ1
1ⷕ1
1ⷖ1
1ⷘ1
1ⷙ1
1ⷚ1
1ⷛ1
1ⷜ1
1ⷝ1
1ⷞ1
2፡1
2።1
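
A pattern list of this shape can be generated rather than typed by hand.
The Python sketch below is an assumption about how such a file could be
produced, not the script actually used: the ranges are read off the list
above, and the result may differ from the attachment depending on the
Unicode version of the Python build. In pattern notation odd digits allow
a break and even digits forbid one, so `1X1` allows breaks on both sides
of a syllable, while `2X1` forbids a break before a divider and allows one
after it.

```python
import unicodedata

# Assumed ranges, taken from the first and last entries of each run in the list.
RANGES = [(0x1200, 0x135A), (0x1380, 0x138F), (0x2D80, 0x2DDE)]
DIVIDERS = (0x1361, 0x1362)


def make_patterns():
    """Return one pattern per assigned letter, followed by the divider patterns."""
    lines = []
    for first, last in RANGES:
        for cp in range(first, last + 1):
            ch = chr(cp)
            # Skip the unassigned gaps inside the blocks (category "Cn").
            if unicodedata.category(ch) == "Lo":
                lines.append(f"1{ch}1")
    for cp in DIVIDERS:
        lines.append(f"2{chr(cp)}1")
    return lines


if __name__ == "__main__":
    print("\n".join(make_patterns()))
```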

[-- Attachment #5: Type: text/plain, Size: 485 bytes --]

___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________


* Re: Some Ethiopic examples (hyphenation/breaking) in ConTeXt
  2011-05-06 20:00 Some Ethiopic examples (hyphenation/breaking) in ConTeXt Mojca Miklavec
@ 2011-05-07 11:20 ` Hans Hagen
  2011-05-07 14:19   ` Mojca Miklavec
  0 siblings, 1 reply; 6+ messages in thread
From: Hans Hagen @ 2011-05-07 11:20 UTC
  To: mailing list for ConTeXt users; +Cc: Adam McCollum, Mojca Miklavec

On 6-5-2011 10:00, Mojca Miklavec wrote:

> We were originally preparing the example for XeTeX (which behaves very
> weirdly anyway) and I would like to know how to typeset Ethiopic text in
> ConTeXt.

Let's forget about XeTeX then. It's not that complex to add to MkIV, as
we have mechanisms in place for it.

What is the otf language / script code?

> The basic requirements are:
>
> - Words may be split after any character (character = syllable; it's
> in the range "1200-"139F), but not before word/sentence dividers. (We
> have hyphenation patterns, but one could just as well use some other
> mechanism to break.)
>
> - "1361 is the word divider and "1362 is the sentence divider.
>
> - One doesn't use spaces when writing.

Like in cjk.

> - In output one should get something like space (approximately the
> same width) before and something like space after word/sentence
> divider, except that the "space" before divider should not be
> breakable; I highly suspect that the amount of space before/after
> dividers depends on the font being used, but I may be wrong.

so let's visualize that:

[1200][1200][1200][1361][1200][1200][1200][1362][1200][1200][1200]

valid breakpoints:

[1200]
[1200]
[1200][nbsp][1200]
[1200]
[1200][nbsp][1200]
[1200]
[1200]

Is that okay? How about spaces in the input (end of lines introduce them)?

> - Text should be nicely justified (I wonder if microtypography would also help).

That is independent of the logic.

> I'm attaching a sample text that does approximately what I expect it
> to do, but I would like to avoid active characters, make the space
> before and after divider of equal size and I'm not sure what is the
> most appropriate approach in ConTeXt. The example also leaves a bit
> too much whitespace after dividers that end the line.

Nothing attached.

Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
     tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
                                              | www.pragma-pod.nl
-----------------------------------------------------------------

* Re: Some Ethiopic examples (hyphenation/breaking) in ConTeXt
  2011-05-07 11:20 ` Hans Hagen
@ 2011-05-07 14:19   ` Mojca Miklavec
  2011-05-07 15:59     ` Arthur Reutenauer
  2011-05-08 17:43     ` Hans Hagen
  0 siblings, 2 replies; 6+ messages in thread
From: Mojca Miklavec @ 2011-05-07 14:19 UTC
  To: Hans Hagen; +Cc: mailing list for ConTeXt users, Adam McCollum

On Sat, May 7, 2011 at 13:20, Hans Hagen wrote:
> On 6-5-2011 10:00, Mojca Miklavec wrote:
>
> What is the otf language / script code?

Ethi for script and AMH for language (but language should probably not
be needed).

>> - In output one should get something like space (approximately the
>> same width) before and something like space after word/sentence
>> divider, except that the "space" before divider should not be
>> breakable; I highly suspect that the amount of space before/after
>> dividers depends on the font being used, but I may be wrong.
>
> so let's visualize that:
>
> [1200][1200][1200][1361][1200][1200][1200][1362][1200][1200][1200]
>
> valid breakpoints:
>
> [1200]
> [1200]
> [1200][nbsp][1200]
> [1200]
> [1200][nbsp][1200]
> [1200]
> [1200]
>
> Is that okay?

No, it should be:

[1200]
[1200]
[1200][nbsp][1361]
[1200]
[1200]
[1200][nbsp][1362]
[1200]
[1200]
[1200]

Word delimiters should be displayed.
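
The corrected listing can be reproduced with a few lines of Python. This
is only a sketch of the grouping rule (a divider stays glued to the
preceding syllable with a non-breaking space, and every other character
starts a new unit); the function name `break_units` is made up for the
example.

```python
DIVIDERS = {0x1361, 0x1362}  # word divider, sentence divider


def break_units(codepoints):
    """Group code points into units between which a line break is allowed."""
    units = []
    for cp in codepoints:
        if cp in DIVIDERS and units:
            # No break before a divider: attach it to the previous unit.
            units[-1] += f"[nbsp][{cp:04X}]"
        else:
            units.append(f"[{cp:04X}]")
    return units


sample = [0x1200] * 3 + [0x1361] + [0x1200] * 3 + [0x1362] + [0x1200] * 3
print("\n".join(break_units(sample)))
```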

> How about spaces in the input (end of lines introduce them)?

Adam?

My guess would be that they might not use end-of-lines except when
they want to start a new paragraph, but I may well be wrong. If
there are end-of-lines, they should probably be ignored: no extra
space should be introduced (unless there are two, so that a new
paragraph is started).

But Adam should correct me.

In fact there are two different writing paradigms. One uses word
separators and the other uses spaces. My guess is that the second one
might have arisen in the modern era due to poor computer support. (If
they are using spaces, they have at least a chance that words break in
text editors and web browsers, but I may be wrong. Wikipedia uses
spaces, for example, but all old books use separators.)

Anyway: in case that one uses the second paradigm (use spaces instead
of word separators), the end of line should be treated as a normal
space and writing should be no different than for any other European
language in Latin script.
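
The end-of-line handling guessed at above could look like this in Python.
It is a sketch of the two paradigms only, under the assumption that the
guess is right; the function name `normalize_newlines` and the
`uses_spaces` flag are made up for the example.

```python
import re


def normalize_newlines(text, uses_spaces):
    """Split text into paragraphs and join the lines of each paragraph.

    uses_spaces=False: separator paradigm, a single end-of-line is ignored.
    uses_spaces=True:  space paradigm, a single end-of-line acts as a space.
    """
    # An empty line (two end-of-lines in a row) starts a new paragraph.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joiner = " " if uses_spaces else ""
    return [
        joiner.join(line.strip() for line in paragraph.splitlines())
        for paragraph in paragraphs
    ]
```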

> Nothing attached.

There was an attachment originally (see
http://article.gmane.org/gmane.comp.tex.context/68230), but maybe your
spam filter didn't like the Ethiopic spam.

(My roommate was just robbed/scammed in Ethiopia last week; no wonder
that even spam filters put the mails in the same category as Nigerian
scams :)

Mojca

* Re: Some Ethiopic examples (hyphenation/breaking) in ConTeXt
  2011-05-07 14:19   ` Mojca Miklavec
@ 2011-05-07 15:59     ` Arthur Reutenauer
  2011-05-08 17:43     ` Hans Hagen
  1 sibling, 0 replies; 6+ messages in thread
From: Arthur Reutenauer @ 2011-05-07 15:59 UTC
  To: mailing list for ConTeXt users
  Cc: mailing list for ConTeXt users, Hans Hagen, Adam McCollum

> Ethi for script and AMH for language (but language should probably not
> be needed).

  Indeed, as the same behaviour can be expected for several different languages using the same script.

    Arthur

* Re: Some Ethiopic examples (hyphenation/breaking) in ConTeXt
  2011-05-07 14:19   ` Mojca Miklavec
  2011-05-07 15:59     ` Arthur Reutenauer
@ 2011-05-08 17:43     ` Hans Hagen
  2011-05-08 18:13       ` Arthur Reutenauer
  1 sibling, 1 reply; 6+ messages in thread
From: Hans Hagen @ 2011-05-08 17:43 UTC
  To: Mojca Miklavec; +Cc: mailing list for ConTeXt users, Adam McCollum

On 7-5-2011 4:19, Mojca Miklavec wrote:

> In fact there are two different writing paradigms. One uses word
> separators and the other uses spaces. My guess is that the second one
> might have arisen in the modern era due to poor computer support. (If
> they are using spaces, they have at least a chance that words break in
> text editors and web browsers, but I may be wrong. Wikipedia uses
> spaces, for example, but all old books use separators.)

So what are the rules for mixing languages/scripts then?

[ethi] [latn] [ethi]

> (My roommate was just robbed/scammed in Ethiopia last week; no wonder
> that even spam filters put the mails in the same category as Nigerian
> scams :)

I recently installed language blocking on the routers ... at some point
I think it will become 'block all' except 'a few countries'.

Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
     tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
                                              | www.pragma-pod.nl
-----------------------------------------------------------------

* Re: Some Ethiopic examples (hyphenation/breaking) in ConTeXt
  2011-05-08 17:43     ` Hans Hagen
@ 2011-05-08 18:13       ` Arthur Reutenauer
  0 siblings, 0 replies; 6+ messages in thread
From: Arthur Reutenauer @ 2011-05-08 18:13 UTC
  To: mailing list for ConTeXt users; +Cc: Adam McCollum, Mojca Miklavec

> So what are the rules for mixing languages/scripts then?

  I'm not sure there are any rules.  There might be typographical traditions, though.

> I recently installed language blocking on the routers ... at some point I think it will become 'block all' except 'a few countries'.

  There are scammers in every country, unfortunately.  In London there is a well-known scam aimed at people looking for a flat to rent: adverts that look like genuine offers at first turn out to be a trick to cheat you out of your money.  I made contact with some of them when I first moved here, but stopped talking to them very soon because what they told me didn't make any sense.  I feel uncomfortable when things don't make sense :-)

    Arthur 