ntg-context - mailing list for ConTeXt users
* ASCII input - non ASCII output
From: Sava Maksimović @ 2017-11-07 12:48 UTC (permalink / raw)
  To: ntg-context


Hi,

Is there a way in ConTeXt to define a mapping, inside the TeX system itself,
for some ASCII *text* input (in the source .tex file)?

For example, if I put the two ASCII characters "dj" in the .tex file, can I
get the Cyrillic character "ђ" in the .pdf?
And so on: for input b, v, g, d, ... get output б, в, г, д, ...

Or, more generally, to define for every Unicode letter/string the way
it should be read.

This would benefit users of non-ASCII languages, because then they would not
need to switch keyboard layouts all the time between commands, math input and
text input.

In LaTeX, the fontenc package (specifically the OT2 encoding) does this.

Minimal example:

\starttext

a, b, v, g, d, dj, e, zh, z

\stoptext

should produce

a, б, в, г, д, ђ, е, ж, з

Best regards,
Sava Maksimovic (Сава Максимовић :) )


___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki     : http://contextgarden.net
___________________________________________________________________________________


* Re: ASCII input - non ASCII output
From: Mojca Miklavec @ 2017-11-07 13:10 UTC (permalink / raw)
  To: mailing list for ConTeXt users

Dear Sava,

On 7 November 2017 at 13:48, Sava Maksimović wrote:
> Hi,
>
> Is there a way in ConTeXt to define a mapping, inside the TeX system itself,
> for some ASCII text input (in the source .tex file)?
>
> For example, if I put the two ASCII characters "dj" in the .tex file, can I
> get the Cyrillic character "ђ" in the .pdf?
> And so on: for input b, v, g, d, ... get output б, в, г, д, ...
>
> Or, more generally, to define for every Unicode letter/string the way
> it should be read.

ConTeXt can do that with some additional tricks (font features) in Lua
(I don't know the code by heart, but I assume someone else will answer
that).

But you'll have to wrap all the text that you want transliterated in
blocks, so instead of having to switch the keyboard, you'll likely
have to type additional commands.

(Maybe it would work satisfactorily without having to switch too often,
but I would probably not want to do that and would prefer to go for
Unicode.)

> This would benefit users of non-ASCII languages, because then they would not
> need to switch keyboard layouts all the time between commands, math input and
> text input.

Keep in mind that you could in principle also translate the command
names, so that you could use (excuse me, it's probably grammatically
incorrect):

    \почнитекст
    a, б, в, г, д, ђ, е, ж, з
    \завршитекст

> In LaTeX, the fontenc package (specifically the OT2 encoding) does this.
>
> Minimal example:
>
> \starttext
>
> a, b, v, g, d, dj, e, zh, z
>
> \stoptext
>
> should produce
>
> a, б, в, г, д, ђ, е, ж, з

Just curious: why do you use "zh" instead of "ž"?

Mojca

* Re: ASCII input - non ASCII output
From: Thomas A. Schmitz @ 2017-11-07 19:38 UTC (permalink / raw)
  To: mailing list for ConTeXt users

On 11/07/2017 01:48 PM, Sava Maksimović wrote:
> Is there a way in ConTeXt to define a mapping, inside the TeX system itself,
> for some ASCII *text* input (in the source .tex file)?
>
> For example, if I put the two ASCII characters "dj" in the .tex file, can I
> get the Cyrillic character "ђ" in the .pdf?
> And so on: for input b, v, g, d, ... get output б, в, г, д, ...
>
> Or, more generally, to define for every Unicode letter/string the way
> it should be read.
>
> This would benefit users of non-ASCII languages, because then they would not
> need to switch keyboard layouts all the time between commands, math input and
> text input.
>
When MkIV was in its infancy, Hans helped me write something like
this for my Greek module. It basically applies a Lua string.gsub to the
input to produce and typeset UTF-8 output. But I pretty soon gave up
using it. We're in the twenty-first century, and this sort of trickery
really is not needed any more. And, as Mojca has said, you would have to
delimit your text, because you don't want your ConTeXt commands to be
transliterated as well. I can send you the relevant code if you want,
and you could adapt it to your case. But I would advise against it. In
the long run, changing keyboards is less hassle than this sort of
semi-solution to an obsolete problem.
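
A minimal sketch of the substitution idea described above, written in
Python rather than Lua so that it runs outside ConTeXt. It is not the
code from the Greek module. The mapping table is a deliberately
incomplete assumption built from the examples in this thread, and the
\starttranslit ... \stoptranslit delimiters are hypothetical names
invented for the illustration; they are not ConTeXt commands.

```python
import re

# Hypothetical, incomplete ASCII-to-Cyrillic table based on the examples
# in this thread (b -> б, dj -> ђ, zh -> ж, ...).
ASCII_TO_CYRILLIC = {
    "dj": "ђ", "zh": "ж",
    "a": "\u0430",  # Cyrillic а, which looks like Latin a
    "e": "\u0435",  # Cyrillic е, which looks like Latin e
    "b": "б", "v": "в", "g": "г", "d": "д", "z": "з",
}

# Try the longest keys first, so that "dj" wins over "d" followed by "j".
_KEYS = sorted(ASCII_TO_CYRILLIC, key=len, reverse=True)
_PATTERN = re.compile("|".join(re.escape(k) for k in _KEYS))

# Hypothetical delimiters marking the text that may be transliterated.
_BLOCK = re.compile(r"\\starttranslit(.*?)\\stoptranslit", re.DOTALL)


def transliterate(text: str) -> str:
    """Replace every key of the table found in text; leave the rest alone."""
    return _PATTERN.sub(lambda m: ASCII_TO_CYRILLIC[m.group(0)], text)


def transliterate_blocks(source: str) -> str:
    """Transliterate only the delimited blocks and drop the delimiters."""
    return _BLOCK.sub(lambda m: transliterate(m.group(1)), source)


print(transliterate("a, b, v, g, d, dj, e, zh, z"))
# prints: а, б, в, г, д, ђ, е, ж, з
```

Anything outside a delimited block, such as \starttext, passes through
transliterate_blocks unchanged. Commands placed inside a block would
still be rewritten, and every "dj" becomes "ђ" even where d and j are
meant as separate letters, which is part of why this approach is
fragile.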

> In LaTeX, the fontenc package (specifically the OT2 encoding) does this.

Yes, LaTeX stays firmly in the 1970s. But the world has moved on.

Thomas

* Re: ASCII input - non ASCII output
From: Ulrike Fischer @ 2017-11-08 10:00 UTC (permalink / raw)
  To: ntg-context

On Tue, 7 Nov 2017 20:38:14 +0100, Thomas A. Schmitz wrote:

>> In LaTeX, the fontenc package (specifically the OT2 encoding) does this.
 
> Yes, LaTeX stays firmly in the 1970s. But the world has moved on.

And LaTeX has moved on too. You can use LuaTeX and UTF-8 input with
it without problems, and for the Unicode engines a Unicode
font encoding and OpenType fonts are the default in the kernel. That
you still *can* use special font encodings to mimic transliteration,
8-bit engines and Type 1 fonts with LaTeX doesn't mean that you
*have* to use them. You have the choice.

-- 
Ulrike Fischer 
http://www.troubleshooting-tex.de/


* Re: ASCII input - non ASCII output
From: Mojca Miklavec @ 2017-11-08 10:34 UTC (permalink / raw)
  To: mailing list for ConTeXt users

On 7 November 2017 at 20:38, Thomas A. Schmitz wrote:
>
> When MkIV was in its infancy, Hans helped me write something like this
> for my Greek module. It basically applies a Lua string.gsub to the input to
> produce and typeset UTF-8 output.

This would be done with font features now. So it would in fact not be
applied at input (which is more dangerous), but rather before the
characters end up in the PDF, which makes a lot more sense anyway.

The documentation is here, but there should be plenty more examples:
    http://pragma-ade.com/general/manuals/fonts-mkiv.pdf

> I can send you the relevant code if you want, and you could adapt it
> to your case.

I'm pretty sure it's outdated if it was written in the infancy of MkIV.

> But I would advise against it. In the long run, changing
> keyboards is less hassle than this sort of semi-solution to an obsolete
> problem.

One reason why I would not abandon it *immediately* for Serbian is
that Serbian can actually be written in both scripts and there are
straightforward rules for transliteration (it's not exactly one-to-one
character, but those "dj"s should be easy enough to handle, in
particular because there are also all the required digraphs in Unicode
- or maybe more difficult exactly because of that). The fun part is
that they cannot decide which script to use themselves :), so you end
up with schoolmates using different scripts for their lecture notes.
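
A minimal sketch of the digraph point above, in Python rather than as a
ConTeXt font feature. It assumes lowercase Serbian Latin input, and the
function and table names are made up for the illustration. The Unicode
digraph characters (U+01C6, U+01C9, U+01CC) have compatibility
decompositions, so NFKC normalization turns them into the ordinary
two-letter sequences before the table lookup.

```python
import unicodedata

# Two-letter sequences that correspond to a single Cyrillic letter.
PAIRS = {"dž": "џ", "lj": "љ", "nj": "њ"}

# Single letters, lowercase only to keep the sketch short.
LETTERS = {
    "a": "а", "b": "б", "v": "в", "g": "г", "d": "д", "đ": "ђ",
    "e": "е", "ž": "ж", "z": "з", "i": "и", "j": "ј", "k": "к",
    "l": "л", "m": "м", "n": "н", "o": "о", "p": "п", "r": "р",
    "s": "с", "t": "т", "ć": "ћ", "u": "у", "f": "ф", "h": "х",
    "c": "ц", "č": "ч", "š": "ш",
}


def latin_to_cyrillic(text: str) -> str:
    """Transliterate lowercase Serbian Latin text to Cyrillic."""
    # NFKC maps the single-code-point digraphs to plain letter pairs,
    # so "ǉ" (U+01C9) and "lj" are handled by the same table entry.
    text = unicodedata.normalize("NFKC", text)
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in PAIRS:
            out.append(PAIRS[pair])
            i += 2
        else:
            out.append(LETTERS.get(text[i], text[i]))
            i += 1
    return "".join(out)


print(latin_to_cyrillic("ljubav"))        # prints: љубав
print(latin_to_cyrillic("\u01c9ubav"))    # prints: љубав
```

This also shows the "more difficult" side: after normalization the
sketch can no longer tell a typed digraph from two separate letters, so
a word such as "injekcija" or "nadživeti", where the letters belong to
different parts of the word, would come out wrong without an exception
list. NFKC also rewrites other compatibility characters, which may not
be wanted in a real document.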

I'm still not arguing that this is the most brilliant idea, but I can
totally imagine a Serbian professor wanting to "auto-generate" a
Cyrillic version of his book on top of the Latin edition with
close-to-zero extra effort.

Greek, in contrast, hardly makes any sense when written in the Latin alphabet.

Mojca

* Re: ASCII input - non ASCII output
From: Thomas A. Schmitz @ 2017-11-08 14:36 UTC (permalink / raw)
  To: mailing list for ConTeXt users

On 11/08/2017 11:34 AM, Mojca Miklavec wrote:
> I'm still not arguing that this is the most brilliant idea, but I can
> totally imagine a Serbian professor wanting to "auto-generate" a
> Cyrillic version of his book on top of the Latin edition with
> close-to-zero extra effort.

Ok, I can see that this may be a convenient way of producing different
output from the same source; I wasn't aware of this (and I was somewhat
provocative about LaTeX, of course :-). From a conceptual point of
view, it still feels a bit hackish to do these things on the font level,
because they are not/should not be tied to specific fonts: you'd have
to rewrite your features or goodies or whatever they are called now for
every font you want to use (and you may run into a number of funny
inconsistencies in character names or even Unicode slots).

Thomas

* Re: ASCII input - non ASCII output
From: Mojca Miklavec @ 2017-11-08 15:09 UTC (permalink / raw)
  To: mailing list for ConTeXt users

On 8 November 2017 at 15:36, Thomas A. Schmitz wrote:
> On 11/08/2017 11:34 AM, Mojca Miklavec wrote:
>>
>> I'm still not arguing that this is the most brilliant idea, but I can
>> totally imagine a Serbian professor wanting to "auto-generate" a
>> Cyrillic version of his book on top of the Latin edition with
>> close-to-zero extra effort.
>
> Ok, I can see that this may be a convenient way of producing different
> output from the same source; I wasn't aware of this (and I was somewhat
> provocative about LaTeX, of course :-). From a conceptual point of view, it
> still feels a bit hackish to do these things on the font level, because they
> are not/should not be tied to specific fonts: you'd have to rewrite your
> features or goodies or whatever they are called now for every font you want
> to use (and you may run into a number of funny inconsistencies in character
> names or even Unicode slots).

Now for a bit of off-topic-ness.

Trivia. (Ignoring the attempts to make our own national keyboard) we
are using the "Croatian" keyboard (which is probably the same as the
Serbian layout), which has all the relevant-for-TeX keys ({}[]\) on the
third plane (alt-gr+<something>), but I learnt computer programming on a
US keyboard and preferred using the US layout to those strange keys in
the third plane. In computer programming there's basically never the need
to use non-ASCII characters. And in writing texts in my native language
there's no need to use those strange backslashes, so life was mostly
good until I started using TeX in UTF-8. Back then I was basically
switching the keyboard a couple of times per sentence (if not per
word) and somewhat hated typing any TeX in my native language for that
reason. Then I switched to Dvorak and made myself a special layout.
Now I have all the special keys from the US keyboard easily accessible
and all those strange non-ASCII characters on the third plane (alt-gr-C
to get "Č"). That works much better for me now. So at least I know the
pain of constantly having to switch layouts. Nevertheless I would
still say that it makes more sense to put some effort into getting nice
UTF-8 documents. (Except, again, giving Serbian a bit of an exception
due to the fact that the document would still be valid and perfectly
readable in its Latin form.)

One could argue in the other direction as well. It should be pretty
straightforward to "transliterate" all ConTeXt commands into Cyrillic
(ok, I have no clue what people usually do with q, x, y, w, ... but
I'm sure there's a solution for that as well) and simply use English
commands in Cyrillic script to simplify typing :) :) :)

Mojca

* Re: ASCII input - non ASCII output
From: Henri @ 2017-11-09  3:15 UTC (permalink / raw)
  To: ntg-context

How about using the "Transliterator" module by Philipp Gesang?
https://modules.contextgarden.net/cgi-bin/module.cgi/ruid=6004710974/action=view/id=50
It comes with TeX Live and ConTeXt standalone.


Thread overview: 8+ messages
2017-11-07 12:48 ASCII input - non ASCII output Sava Maksimović
2017-11-07 13:10 ` Mojca Miklavec
2017-11-07 19:38 ` Thomas A. Schmitz
2017-11-08 10:00   ` Ulrike Fischer
2017-11-08 10:34   ` Mojca Miklavec
2017-11-08 14:36     ` Thomas A. Schmitz
2017-11-08 15:09       ` Mojca Miklavec
2017-11-09  3:15         ` Henri