ntg-context - mailing list for ConTeXt users
* UTF conversion via Lua
@ 2012-02-10 10:22 Procházka Lukáš Ing. - Pontex s. r. o.
  2012-02-10 10:26 ` Thomas A. Schmitz
  2012-02-10 10:57 ` Philipp Gesang
  0 siblings, 2 replies; 22+ messages in thread
From: Procházka Lukáš Ing. - Pontex s. r. o. @ 2012-02-10 10:22 UTC (permalink / raw)
  To: ConTeXt

Hello,

I have many files with ASCII encoding; this encoding must be kept because these files are also processed by another program.

When I work with them in ConTeXt, I need to convert them to UTF.

Does Lua (in ConTeXt scope) offer a transformation function or a table of chars [ASCII-code] -> [UTF-code] or anything to provide the conversion?

Something like:

\startluacode
   local str = loadFile("a.txt") -- ASCII coded

   str = context.ASCII2UTF(str) -- Or something like this
\stopluacode

Best regards,

Lukas


-- 
Ing. Lukáš Procházka [mailto:LPr@pontex.cz]
Pontex s. r. o.      [mailto:pontex@pontex.cz] [http://www.pontex.cz]
Bezová 1658
147 14 Praha 4

Tel: +420 244 062 238
Fax: +420 244 461 038

___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________



* Re: UTF conversion via Lua
  2012-02-10 10:22 UTF conversion via Lua Procházka Lukáš Ing. - Pontex s. r. o.
@ 2012-02-10 10:26 ` Thomas A. Schmitz
  2012-02-10 10:57 ` Philipp Gesang
  1 sibling, 0 replies; 22+ messages in thread
From: Thomas A. Schmitz @ 2012-02-10 10:26 UTC (permalink / raw)
  To: mailing list for ConTeXt users

On 02/10/12 11:22, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
> Hello,
>
> I have many files with ASCII encoding; this encoding must be kept as
> these files are processed also by another program.
>
> When I work with them in ConTeXt, I need to convert them to UTF.
>
> Does Lua (in ConTeXt scope) offer a transformation function or a table
> of chars [ASCII-code] -> [UTF-code] or anything to provide the conversion?
>
> Something like:
>
> \startluacode
>    local str = loadFile("a.txt") -- ASCII coded
>
>    str = context.ACSII2UTF(str) -- Or something like this
> \stopluacode
>
> Best regards,
>
> Lukas

Have a look at tex/texmf-context/scripts/context/lua/mtx-babel.lua. 
That's a converter Hans wrote a while ago for a similar problem I had. I 
don't know if it still works out of the box, but it should help you get 
an idea what you could do.

Thomas

* Re: UTF conversion via Lua
  2012-02-10 10:22 UTF conversion via Lua Procházka Lukáš Ing. - Pontex s. r. o.
  2012-02-10 10:26 ` Thomas A. Schmitz
@ 2012-02-10 10:57 ` Philipp Gesang
  2012-02-10 11:11   ` Procházka Lukáš Ing. - Pontex s. r. o.
  2012-02-10 11:13   ` UTF conversion via Lua (now with attachment) Procházka Lukáš Ing. - Pontex s. r. o.
  1 sibling, 2 replies; 22+ messages in thread
From: Philipp Gesang @ 2012-02-10 10:57 UTC (permalink / raw)
  To: mailing list for ConTeXt users


[-- Attachment #1.1: Type: text/plain, Size: 1898 bytes --]

On 2012-02-10 11:22, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
> Hello,
> 
> I have many files with ASCII encoding; this encoding must be kept as these files are processed also by another program.
> 
> When I work with them in ConTeXt, I need to convert them to UTF.

Not needed, as every ASCII string is a valid UTF-8 string:
   “The UTF encoding has several good properties. By far the most
    important is that a byte in the ASCII range 0-127 represents
    itself in UTF. Thus UTF is backward compatible with ASCII.”
    http://doc.cat-v.org/plan_9/4th_edition/papers/utf
You can use them in LuaTeX without further conversion.
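
A quick way to tell whether a file really is plain ASCII is to look for any byte above 127. The following is only a minimal sketch in plain Lua; the helper name is_ascii is made up for illustration and is not part of ConTeXt or LuaTeX.

```lua
-- Hypothetical helper: true when the string contains only bytes 0-127,
-- i.e. when it is already valid UTF-8 and needs no conversion.
local function is_ascii(str)
  -- "[\128-\255]" matches any single byte outside the ASCII range
  return str:find("[\128-\255]") == nil
end

print(is_ascii("abc"))      -- plain ASCII text
print(is_ascii("abc\158"))  -- contains byte 158, so not ASCII
```

If such a check fails, the file is in some 8-bit encoding and the encoding has to be known before it can be converted.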

Regards
Philipp


> 
> Does Lua (in ConTeXt scope) offer a transformation function or a table of chars [ASCII-code] -> [UTF-code] or anything to provide the conversion?
> 
> Something like:
> 
> \startluacode
>   local str = loadFile("a.txt") -- ASCII coded
> 
>   str = context.ACSII2UTF(str) -- Or something like this
> \stopluacode
> 
> Best regards,
> 
> Lukas
> 
> 

-- 
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments

[-- Attachment #1.2: Type: application/pgp-signature, Size: 198 bytes --]


* Re: UTF conversion via Lua
  2012-02-10 10:57 ` Philipp Gesang
@ 2012-02-10 11:11   ` Procházka Lukáš Ing. - Pontex s. r. o.
  2012-02-10 11:14     ` luigi scarso
                       ` (2 more replies)
  2012-02-10 11:13   ` UTF conversion via Lua (now with attachment) Procházka Lukáš Ing. - Pontex s. r. o.
  1 sibling, 3 replies; 22+ messages in thread
From: Procházka Lukáš Ing. - Pontex s. r. o. @ 2012-02-10 11:11 UTC (permalink / raw)
  To: mailing list for ConTeXt users

... Well, my information was not correct.

There are characters > 127 in the file, like "ř", "š"...

Each char = 1 byte, and as I'm using Windows with CP 1250, the characters are displayed correctly.

But I have a problem loading them into ConTeXt.

I need to convert the bytes > 127 to UTF-8 sequences that ConTeXt accepts.
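
To make the byte-level task concrete, here is a rough sketch in plain Lua of what such a conversion does. It assumes a small hand-written table covering only a few Czech letters; the names cp1250, utf8char and cp1250_to_utf8 are invented for this illustration, and a real solution would need the full CP 1250 table.

```lua
-- Partial CP 1250 byte -> Unicode code point table (illustration only).
local cp1250 = {
  [0x9A] = 0x0161, -- š
  [0x9E] = 0x017E, -- ž
  [0xE8] = 0x010D, -- č
  [0xF8] = 0x0159, -- ř
  [0xFD] = 0x00FD, -- ý
}

-- Encode a code point below U+0800 as one or two UTF-8 bytes.
local function utf8char(cp)
  if cp < 0x80 then
    return string.char(cp)
  end
  return string.char(0xC0 + math.floor(cp / 64), 0x80 + cp % 64)
end

-- Replace every byte above 127 by its UTF-8 sequence; bytes missing
-- from the partial table become "?" so that gaps are easy to spot.
local function cp1250_to_utf8(str)
  return (str:gsub("[\128-\255]", function(ch)
    local cp = cp1250[ch:byte()]
    return cp and utf8char(cp) or "?"
  end))
end

print(cp1250_to_utf8("abc\158\253"))
```

Bytes 0-127 pass through untouched, which is the ASCII compatibility mentioned earlier in the thread.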

@Thomas:

The table looks nice but there are no entries for CP 1250 to UTF conversion.

I prepared some tables: character conversion and removal of diacritics (see the attachment);
maybe it would be handy to include them in ConTeXt somehow.

Best regards,

Lukas


On Fri, 10 Feb 2012 11:57:32 +0100, Philipp Gesang <gesang@stud.uni-heidelberg.de> wrote:

> On 2012-02-10 11:22, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
>> Hello,
>>
>> I have many files with ASCII encoding; this encoding must be kept as these files are processed also by another program.
>>
>> When I work with them in ConTeXt, I need to convert them to UTF.
>
> Not needed, as every ASCII string is a valid UTF8  string:
>    “The UTF encoding has several good properties. By far the most
>     important is that a byte in the ASCII range 0-127 represents
>     itself in UTF. Thus UTF is backward compatible with ASCII.”
>     http://doc.cat-v.org/plan_9/4th_edition/papers/utf
> You can use them in Luatex without further conversion.
>
> Regards
> Philipp
>
>
>>
>> Does Lua (in ConTeXt scope) offer a transformation function or a table of chars [ASCII-code] -> [UTF-code] or anything to provide the conversion?
>>
>> Something like:
>>
>> \startluacode
>>   local str = loadFile("a.txt") -- ASCII coded
>>
>>   str = context.ACSII2UTF(str) -- Or something like this
>> \stopluacode
>>
>> Best regards,
>>
>> Lukas
>>
>>
>


-- 
Ing. Lukáš Procházka [mailto:LPr@pontex.cz]
Pontex s. r. o.      [mailto:pontex@pontex.cz] [http://www.pontex.cz]
Bezová 1658
147 14 Praha 4

Tel: +420 244 062 238
Fax: +420 244 461 038


* Re: UTF conversion via Lua (now with attachment)
  2012-02-10 10:57 ` Philipp Gesang
  2012-02-10 11:11   ` Procházka Lukáš Ing. - Pontex s. r. o.
@ 2012-02-10 11:13   ` Procházka Lukáš Ing. - Pontex s. r. o.
  1 sibling, 0 replies; 22+ messages in thread
From: Procházka Lukáš Ing. - Pontex s. r. o. @ 2012-02-10 11:13 UTC (permalink / raw)
  To: mailing list for ConTeXt users

[-- Attachment #1: Type: text/plain, Size: 2758 bytes --]

... Well, my information was not correct.

There are characters > 127 in the file, like "ř", "š"...

Each char = 1 byte, and as I'm using Windows with CP 1250, the characters are displayed correctly.

But I have a problem loading them into ConTeXt.

I need to convert the bytes > 127 to UTF-8 sequences that ConTeXt accepts.

@Thomas:

The table looks nice but there are no entries for CP 1250 to UTF conversion.

I prepared some tables: character conversion and removal of diacritics (see the attachment);
maybe it would be handy to include them in ConTeXt somehow.

Best regards,

Lukas


On Fri, 10 Feb 2012 11:57:32 +0100, Philipp Gesang <gesang@stud.uni-heidelberg.de> wrote:

> On 2012-02-10 11:22, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
>> Hello,
>>
>> I have many files with ASCII encoding; this encoding must be kept as these files are processed also by another program.
>>
>> When I work with them in ConTeXt, I need to convert them to UTF.
>
> Not needed, as every ASCII string is a valid UTF8  string:
>    “The UTF encoding has several good properties. By far the most
>     important is that a byte in the ASCII range 0-127 represents
>     itself in UTF. Thus UTF is backward compatible with ASCII.”
>     http://doc.cat-v.org/plan_9/4th_edition/papers/utf
> You can use them in Luatex without further conversion.
>
> Regards
> Philipp
>
>
>>
>> Does Lua (in ConTeXt scope) offer a transformation function or a table of chars [ASCII-code] -> [UTF-code] or anything to provide the conversion?
>>
>> Something like:
>>
>> \startluacode
>>   local str = loadFile("a.txt") -- ASCII coded
>>
>>   str = context.ACSII2UTF(str) -- Or something like this
>> \stopluacode
>>
>> Best regards,
>>
>> Lukas
>>
>>
>


-- 
Ing. Lukáš Procházka [mailto:LPr@pontex.cz]
Pontex s. r. o.      [mailto:pontex@pontex.cz] [http://www.pontex.cz]
Bezová 1658
147 14 Praha 4

Tel: +420 244 062 238
Fax: +420 244 461 038

[-- Attachment #2: Cz2UTF.lua --]
[-- Type: application/octet-stream, Size: 1804 bytes --]


* Re: UTF conversion via Lua
  2012-02-10 11:11   ` Procházka Lukáš Ing. - Pontex s. r. o.
@ 2012-02-10 11:14     ` luigi scarso
  2012-02-13 11:42       ` Ulrike Fischer
  2012-02-10 11:15     ` Wolfgang Schuster
  2012-02-10 11:30     ` Philipp Gesang
  2 siblings, 1 reply; 22+ messages in thread
From: luigi scarso @ 2012-02-10 11:14 UTC (permalink / raw)
  To: mailing list for ConTeXt users

2012/2/10 Procházka Lukáš Ing. - Pontex s. r. o. <LPr@pontex.cz>:
> ... Well, my information was not correct.
>
> There are characters > 127 in the file, like "ř", "š"...
>
> Each char = 1 byte, and as I'm using Windows with CP 1250, the characters
> are displayed correctly.
>
> But I have problem loading them into ConTeXt.
>
> I need to convert the bytes > 127 to UTF sequence, which would be acceptable
> by ConTeXt.
>
> @Thomas:
>
> The table looks nice but there are no entries for CP 1250 to UTF conversion.
>
> I prepared some tables: character conversion and removal of diacritics (see
> the attachment);
> maybe it would be handful to include them into ConTeXt somehow.
>
> Best regards,
>
> Lukas

To avoid confusion:
If you mean ASCII with code range 0-127, there is no need for conversion;
if you mean ASCII with code range 0-255 *and* ISO-8859-1 (Latin 1)
encoding, there is no need for conversion;
otherwise you need to specify an encoding (e.g. CP 1250).


From wikipedia
"""
Unicode and the ISO/IEC 10646 Universal Character Set (UCS) have a
much wider array of characters, and their various encoding forms have
begun to supplant ISO/IEC 8859 and ASCII rapidly in many environments.
While ASCII is limited to 128 characters, Unicode and the UCS support
more characters by separating the concepts of unique identification
(using natural numbers called code points) and encoding (to 8-, 16- or
32-bit binary formats, called UTF-8, UTF-16 and UTF-32).
To allow backward compatibility, the 128 ASCII and 256 ISO-8859-1
(Latin 1) characters are assigned Unicode/UCS code points that are the
same as their codes in the earlier standards. Therefore, ASCII can be
considered a 7-bit encoding scheme for a very small subset of
Unicode/UCS, and, conversely, the UTF-8 encoding forms are
binary-compatible with ASCII for code points below 128, meaning all
ASCII is valid UTF-8. The other encoding forms resemble ASCII in how
they represent the first 128 characters of Unicode, but use 16 or 32
bits per character, so they require conversion for compatibility.
(similarly UCS-2 is upwards compatible with UTF-16)
"""
If you have iconv, converting between encodings is easy; you can always
call it as an external program with os.execute(cmd).
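
As a sketch of that idea, the command line can be assembled in Lua and handed to os.execute. The function name iconv_command and the file names are assumptions made for this example, the quoting is deliberately naive, and the call only works where an iconv binary is on the search path.

```lua
-- Hypothetical helper: build an iconv command line for os.execute.
-- File names are wrapped in double quotes; names that themselves
-- contain quotes are not handled by this simple sketch.
local function iconv_command(infile, outfile, from, to)
  return string.format('iconv -f %s -t %s "%s" > "%s"',
                       from, to, infile, outfile)
end

local cmd = iconv_command("a.txt", "a-utf8.txt", "CP1250", "UTF-8")
print(cmd)
-- Uncomment to actually run the conversion (requires iconv):
-- os.execute(cmd)
```

The converted copy can then be read by ConTeXt while the original 8-bit file stays untouched for the other program.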

-- 
luigi

* Re: UTF conversion via Lua
  2012-02-10 11:11   ` Procházka Lukáš Ing. - Pontex s. r. o.
  2012-02-10 11:14     ` luigi scarso
@ 2012-02-10 11:15     ` Wolfgang Schuster
  2012-02-10 11:32       ` Procházka Lukáš Ing. - Pontex s. r. o.
  2012-02-10 11:40       ` Procházka Lukáš Ing. - Pontex s. r. o.
  2012-02-10 11:30     ` Philipp Gesang
  2 siblings, 2 replies; 22+ messages in thread
From: Wolfgang Schuster @ 2012-02-10 11:15 UTC (permalink / raw)
  To: mailing list for ConTeXt users


Am 10.02.2012 um 12:11 schrieb Procházka Lukáš Ing. - Pontex s. r. o.:

> ... Well, my information was not correct.
> 
> There are characters > 127 in the file, like "ř", "š"...
> 
> Each char = 1 byte, and as I'm using Windows with CP 1250, the characters are displayed correctly.
> 
> But I have problem loading them into ConTeXt.
> 
> I need to convert the bytes > 127 to UTF sequence, which would be acceptable by ConTeXt.
> 
> @Thomas:
> 
> The table looks nice but there are no entries for CP 1250 to UTF conversion.
> 
> I prepared some tables: character conversion and removal of diacritics (see the attachment);
> maybe it would be handful to include them into ConTeXt somehow.

Why don’t you let ConTeXt do the conversion:

\starttext

this is something in utf8

\startregime[cp1250]
\input filewithcp1250encoding
\stopregime

more text encoded in utf8

\stoptext

Wolfgang

* Re: UTF conversion via Lua
  2012-02-10 11:11   ` Procházka Lukáš Ing. - Pontex s. r. o.
  2012-02-10 11:14     ` luigi scarso
  2012-02-10 11:15     ` Wolfgang Schuster
@ 2012-02-10 11:30     ` Philipp Gesang
  2 siblings, 0 replies; 22+ messages in thread
From: Philipp Gesang @ 2012-02-10 11:30 UTC (permalink / raw)
  To: mailing list for ConTeXt users


[-- Attachment #1.1: Type: text/plain, Size: 3938 bytes --]

On 2012-02-10 12:11, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
> ... Well, my information was not correct.
> 
> There are characters > 127 in the file, like "ř", "š"...
> 
> Each char = 1 byte, and as I'm using Windows with CP 1250, the characters are displayed correctly.

So it wasn’t ASCII after all ;-) No problem, just use iconv:

     iconv -f CP1250 -t UTF8 infile > outfile

I do this a lot with movie subtitles …

Hth, Philipp


PS: If you still insist on converting at the Lua end only, then
    your starting point might be “regi-cp1250.lua” in the
    ConTeXt base/ directory.




> 
> But I have problem loading them into ConTeXt.
> 
> I need to convert the bytes > 127 to UTF sequence, which would be acceptable by ConTeXt.
> 
> @Thomas:
> 
> The table looks nice but there are no entries for CP 1250 to UTF conversion.
> 
> I prepared some tables: character conversion and removal of diacritics (see the attachment);
> maybe it would be handful to include them into ConTeXt somehow.
> 
> Best regards,
> 
> Lukas
> 
> 
> On Fri, 10 Feb 2012 11:57:32 +0100, Philipp Gesang <gesang@stud.uni-heidelberg.de> wrote:
> 
> >On 2012-02-10 11:22, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
> >>Hello,
> >>
> >>I have many files with ASCII encoding; this encoding must be kept as these files are processed also by another program.
> >>
> >>When I work with them in ConTeXt, I need to convert them to UTF.
> >
> >Not needed, as every ASCII string is a valid UTF8  string:
> >   “The UTF encoding has several good properties. By far the most
> >    important is that a byte in the ASCII range 0-127 represents
> >    itself in UTF. Thus UTF is backward compatible with ASCII.”
> >    http://doc.cat-v.org/plan_9/4th_edition/papers/utf
> >You can use them in Luatex without further conversion.
> >
> >Regards
> >Philipp
> >
> >
> >>
> >>Does Lua (in ConTeXt scope) offer a transformation function or a table of chars [ASCII-code] -> [UTF-code] or anything to provide the conversion?
> >>
> >>Something like:
> >>
> >>\startluacode
> >>  local str = loadFile("a.txt") -- ASCII coded
> >>
> >>  str = context.ACSII2UTF(str) -- Or something like this
> >>\stopluacode
> >>
> >>Best regards,
> >>
> >>Lukas
> >>
> >>
> >
> 
> 

-- 
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments

[-- Attachment #1.2: Type: application/pgp-signature, Size: 198 bytes --]


* Re: UTF conversion via Lua
  2012-02-10 11:15     ` Wolfgang Schuster
@ 2012-02-10 11:32       ` Procházka Lukáš Ing. - Pontex s. r. o.
  2012-02-10 12:25         ` Wolfgang Schuster
  2012-02-10 11:40       ` Procházka Lukáš Ing. - Pontex s. r. o.
  1 sibling, 1 reply; 22+ messages in thread
From: Procházka Lukáš Ing. - Pontex s. r. o. @ 2012-02-10 11:32 UTC (permalink / raw)
  To: mailing list for ConTeXt users

[-- Attachment #1: Type: text/plain, Size: 2275 bytes --]

... \enableregime - nice idea!

Despite this, I'm still unable to make the example work:

---- Test.mkiv
\enableregime[cp1250]

\starttext
   \startluacode
     function loadFile(fn)
       local fh = assert(io.open(fn, "r"))
       local str = fh:read("*all")

       fh:close()

       return str
     end

     context.startregime{"cp1250"}
       context(loadFile("a.txt"))
     context.stopregime()
   \stopluacode
\stoptext
----

Where's the problem?

Lukas


On Fri, 10 Feb 2012 12:15:29 +0100, Wolfgang Schuster <schuster.wolfgang@googlemail.com> wrote:

>
> Am 10.02.2012 um 12:11 schrieb Procházka Lukáš Ing. - Pontex s. r. o.:
>
>> ... Well, my information was not correct.
>>
>> There are characters > 127 in the file, like "ř", "š"...
>>
>> Each char = 1 byte, and as I'm using Windows with CP 1250, the characters are displayed correctly.
>>
>> But I have problem loading them into ConTeXt.
>>
>> I need to convert the bytes > 127 to UTF sequence, which would be acceptable by ConTeXt.
>>
>> @Thomas:
>>
>> The table looks nice but there are no entries for CP 1250 to UTF conversion.
>>
>> I prepared some tables: character conversion and removal of diacritics (see the attachment);
>> maybe it would be handful to include them into ConTeXt somehow.
>
> Why don’t you let do context the conversion:
>
> \starttext
>
> this is something in utf8
>
> \startregime[cp1250]
> \input filewithcp1250encoding
> \stopregime
>
> more text encoded in utf8
>
> \stoptext
>
> Wolfgang


-- 
Ing. Lukáš Procházka [mailto:LPr@pontex.cz]
Pontex s. r. o.      [mailto:pontex@pontex.cz] [http://www.pontex.cz]
Bezová 1658
147 14 Praha 4

Tel: +420 244 062 238
Fax: +420 244 461 038

[-- Attachment #2: a.txt --]
[-- Type: text/plain, Size: 9 bytes --]

abc
žý

[-- Attachment #3: Test.mkiv --]
[-- Type: application/octet-stream, Size: 332 bytes --]

\enableregime[cp1250]

\starttext
  \startluacode
    function loadFile(fn)
      local fh = assert(io.open(fn, "r"))
      local str = fh:read("*all")

      fh:close()

      return str
    end

    context.startregime{"cp1250"}
      context(loadFile("a.txt"))
    context.stopregime()
  \stopluacode
\stoptext


* Re: UTF conversion via Lua
  2012-02-10 11:15     ` Wolfgang Schuster
  2012-02-10 11:32       ` Procházka Lukáš Ing. - Pontex s. r. o.
@ 2012-02-10 11:40       ` Procházka Lukáš Ing. - Pontex s. r. o.
  1 sibling, 0 replies; 22+ messages in thread
From: Procházka Lukáš Ing. - Pontex s. r. o. @ 2012-02-10 11:40 UTC (permalink / raw)
  To: mailing list for ConTeXt users

One more note -

On Fri, 10 Feb 2012 12:15:29 +0100, Wolfgang Schuster <schuster.wolfgang@googlemail.com> wrote:

> Why don’t you let do context the conversion:
>
> \starttext
>
> this is something in utf8
>
> \startregime[cp1250]
> \input filewithcp1250encoding
> \stopregime
>
> more text encoded in utf8
>
> \stoptext

I cannot \input the file as this is not a valid ConTeXt source.

I do (at least) a "%" -> "\%" conversion;
that's why I need to use Lua to load the file into a string.
To keep it simple, I left the conversion step out of the sample I sent previously.
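
For reference, the "%" -> "\%" step could look roughly like this in plain Lua. This is only a guess at the omitted step; the function name escape_percent is invented here.

```lua
-- Hypothetical helper: put a backslash in front of every percent sign
-- so TeX does not treat the rest of the line as a comment.
local function escape_percent(str)
  -- "%%" matches a literal "%" in the pattern; in the replacement,
  -- "\\" is a backslash and "%%" is again a literal "%".
  return (str:gsub("%%", "\\%%"))
end

print(escape_percent("50% off, 100% sure"))
```

The outer parentheses drop the second return value of gsub (the match count), so the function returns only the converted string.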

Lukas


-- 
Ing. Lukáš Procházka [mailto:LPr@pontex.cz]
Pontex s. r. o.      [mailto:pontex@pontex.cz] [http://www.pontex.cz]
Bezová 1658
147 14 Praha 4

Tel: +420 244 062 238
Fax: +420 244 461 038


* Re: UTF conversion via Lua
  2012-02-10 11:32       ` Procházka Lukáš Ing. - Pontex s. r. o.
@ 2012-02-10 12:25         ` Wolfgang Schuster
  2012-02-10 13:15           ` Procházka Lukáš Ing. - Pontex s. r. o.
  2012-02-16 11:13           ` Procházka Lukáš Ing. - Pontex s. r. o.
  0 siblings, 2 replies; 22+ messages in thread
From: Wolfgang Schuster @ 2012-02-10 12:25 UTC (permalink / raw)
  To: mailing list for ConTeXt users


Am 10.02.2012 um 12:32 schrieb Procházka Lukáš Ing. - Pontex s. r. o.:

> ... \enableregime - nice idea!
> 
> Despite this, I'm still not able to make work the example:
> 
> ---- Test.mkiv
> \enableregime[cp1250]
> 
> \starttext
>  \startluacode
>    function loadFile(fn)
>      local fh = assert(io.open(fn, "r"))
>      local str = fh:read("*all")
> 
>      fh:close()
> 
>      return str
>    end
> 
>    context.startregime{"cp1250"}
>      context(loadFile("a.txt"))
>    context.stopregime()
>  \stopluacode
> \stoptext
> ----
> 
> Where's the problem?


I don't know, but it works when you use “regimes.translate” in your code. It would be
better, though, to ask Hans for a function in the commands namespace that you can use.

\starttext

\startluacode

function loadFile(fn)
    local fh = assert(io.open(fn, "r"))
    local str = fh:read("*all")
    fh:close()
    str = regimes.translate(str,"cp1250")
    context(str)
end

loadFile("a.txt")

\stopluacode

\stoptext

Wolfgang

* Re: UTF conversion via Lua
  2012-02-10 12:25         ` Wolfgang Schuster
@ 2012-02-10 13:15           ` Procházka Lukáš Ing. - Pontex s. r. o.
  2012-02-10 15:08             ` Hans Hagen
  2012-02-16 11:13           ` Procházka Lukáš Ing. - Pontex s. r. o.
  1 sibling, 1 reply; 22+ messages in thread
From: Procházka Lukáš Ing. - Pontex s. r. o. @ 2012-02-10 13:15 UTC (permalink / raw)
  To: mailing list for ConTeXt users

> Dunno but it works when you use “regimes.translate” in your code but it’s better
> to ask Hans for a function in the commands namespace which you can use.
>
> \starttext
>
> \startluacode
>
> function loadFile(fn)
>     local fh = assert(io.open(fn, "r"))
>     local str = fh:read("*all")
>     fh:close()
>     str = regimes.translate(str,"cp1250")
>     context(str)
> end
>
> loadFile("a.txt")
>
> \stopluacode
>
> \stoptext
>
> Wolfgang

Thank you, Wolfgang.

Your code works perfectly and does exactly what I need.

Best regards,

Lukas


-- 
Ing. Lukáš Procházka [mailto:LPr@pontex.cz]
Pontex s. r. o.      [mailto:pontex@pontex.cz] [http://www.pontex.cz]
Bezová 1658
147 14 Praha 4

Tel: +420 244 062 238
Fax: +420 244 461 038


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UTF conversion via Lua
  2012-02-10 13:15           ` Procházka Lukáš Ing. - Pontex s. r. o.
@ 2012-02-10 15:08             ` Hans Hagen
  0 siblings, 0 replies; 22+ messages in thread
From: Hans Hagen @ 2012-02-10 15:08 UTC (permalink / raw)
  To: mailing list for ConTeXt users

On 10-2-2012 14:15, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
>> Dunno but it works when you use “regimes.translate” in your code but
>> it’s better
>> to ask Hans for a function in the commands namespace which you can use.
>>
>> \starttext
>>
>> \startluacode
>>
>> function loadFile(fn)
>> local fh = assert(io.open(fn, "r"))
>> local str = fh:read("*all")
>> fh:close()
>> str = regimes.translate(str,"cp1250")
>> context(str)
>> end
>>
>> loadFile("a.txt")
>>
>> \stopluacode
>>
>> \stoptext
>>
>> Wolfgang
>
> Thank you, Wolfgang.
>
> Your code works perfectly and does exactly what I need.

As a one-liner ...

function document.MyLoadFile(name)
    context(regimes.translate(io.loaddata(resolvers.findfile(name)),"cp1250"))
end

(resolvers will look up in the tree if needed)

Hans


-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
     tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
                                              | www.pragma-pod.nl
-----------------------------------------------------------------

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UTF conversion via Lua
  2012-02-10 11:14     ` luigi scarso
@ 2012-02-13 11:42       ` Ulrike Fischer
  2012-02-13 13:14         ` luigi scarso
  0 siblings, 1 reply; 22+ messages in thread
From: Ulrike Fischer @ 2012-02-13 11:42 UTC (permalink / raw)
  To: ntg-context

Am Fri, 10 Feb 2012 12:14:15 +0100 schrieb luigi scarso:

> if you mean ASCII with coderange 0-255 *and*  ISO-8859-1 (Latin 1)
> encoding there is no need to conversion;

This is not true. You are mixing up unicode positions and utf8
encoding.

E.g. "ä" has the same position in Unicode and Latin-1 (decimal 228, hex
E4). But its UTF-8 code consists of 16 bits (1100001110100100, hex
c3a4) while its Latin-1 code is 8 bits long (11100100).
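
The difference between code point and encoding can be checked in a few lines of plain Lua. This is only a minimal sketch, not anything ConTeXt provides: the helper name `latin1_to_utf8` is made up for illustration, and it assumes the input really is ISO-8859-1, where every byte value equals its Unicode code point.

```lua
-- Sketch: re-encode an ISO-8859-1 (Latin-1) byte string as UTF-8.
-- latin1_to_utf8 is a hypothetical helper, not a ConTeXt function.
local function latin1_to_utf8(s)
    return (s:gsub("[\128-\255]", function(c)
        local b = c:byte()
        -- Code points 0x80..0xFF need two UTF-8 bytes: 110xxxxx 10xxxxxx
        return string.char(0xC0 + math.floor(b / 64), 0x80 + b % 64)
    end))
end

local latin1  = string.char(0xE4)        -- "ä" in Latin-1: a single byte
local encoded = latin1_to_utf8(latin1)   -- the same character in UTF-8

print(#latin1, #encoded)                             -- 1   2
print(string.format("%02x %02x", encoded:byte(1, 2))) -- c3 a4
```

ASCII bytes (0 to 127) pass through unchanged, which is why pure ASCII files need no conversion at all.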


-- 
Ulrike Fischer 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UTF conversion via Lua
  2012-02-13 11:42       ` Ulrike Fischer
@ 2012-02-13 13:14         ` luigi scarso
  0 siblings, 0 replies; 22+ messages in thread
From: luigi scarso @ 2012-02-13 13:14 UTC (permalink / raw)
  To: mailing list for ConTeXt users

On Mon, Feb 13, 2012 at 12:42 PM, Ulrike Fischer <news3@nililand.de> wrote:
> Am Fri, 10 Feb 2012 12:14:15 +0100 schrieb luigi scarso:
>
>> if you mean ASCII with coderange 0-255 *and*  ISO-8859-1 (Latin 1)
>> encoding there is no need to conversion;
>
> This is not true. You are mixing up unicode positions and utf8
> encoding.
>
> E.g. "ä" has the same position in Unicode and Latin-1 (decimal 228, hex
> E4). But its UTF-8 code consists of 16 bits (1100001110100100, hex
> c3a4) while its Latin-1 code is 8 bits long (11100100).
Ah yes, you are right: I made the implicit assumption that his
file was already UTF-8 encoded.
I have been using only UTF-8 for a long time and had almost forgotten about
! String contains an invalid utf-8 sequence.

system          > tex > error on line 10 in file t1.txt: String
contains an invalid utf-8 sequence ...


(I believe  he met the error during the next tries because he wrote
> I cannot \input the file as this is not a valid ConTeXt source.
)
What I meant was, as I wrote below,
"To allow backward compatibility, the 128 ASCII and 256 ISO-8859-1
(Latin 1) characters are assigned Unicode/UCS code points that are the
same as their codes in the earlier standards"
and this is true only for ISO-8859-1.

-- 
luigi

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UTF conversion via Lua
  2012-02-10 12:25         ` Wolfgang Schuster
  2012-02-10 13:15           ` Procházka Lukáš Ing. - Pontex s. r. o.
@ 2012-02-16 11:13           ` Procházka Lukáš Ing. - Pontex s. r. o.
  2012-02-16 12:08             ` Hans Hagen
  1 sibling, 1 reply; 22+ messages in thread
From: Procházka Lukáš Ing. - Pontex s. r. o. @ 2012-02-16 11:13 UTC (permalink / raw)
  To: mailing list for ConTeXt users

Hello,

one more question -

- does "regimes.translate" allow translating between arbitrary encodings, or only from the specified one to the current one?

----
   str = regimes.translate(str, "cp1250") -- = Translate from "cp1250" to the current encoding (UTF) (or always to UTF?)
----

I'm looking for something like:

----
   src_enc = "utf8"
   tgt_enc = "cp1250"

   str = regimes.translate(str, src_enc, tgt_enc)
----

Any idea?

Best regards,

Lukas


On Fri, 10 Feb 2012 13:25:40 +0100, Wolfgang Schuster <schuster.wolfgang@googlemail.com> wrote:

>     str = regimes.translate(str,"cp1250")


-- 
Ing. Lukáš Procházka [mailto:LPr@pontex.cz]
Pontex s. r. o.      [mailto:pontex@pontex.cz] [http://www.pontex.cz]
Bezová 1658
147 14 Praha 4

Tel: +420 244 062 238
Fax: +420 244 461 038


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UTF conversion via Lua
  2012-02-16 11:13           ` Procházka Lukáš Ing. - Pontex s. r. o.
@ 2012-02-16 12:08             ` Hans Hagen
  2012-02-16 13:15               ` UTF conversion via Lua (renaming attachments to .scr_) Procházka Lukáš Ing. - Pontex s. r. o.
       [not found]               ` <op.v9rwejuttpjj8f@lpr>
  0 siblings, 2 replies; 22+ messages in thread
From: Hans Hagen @ 2012-02-16 12:08 UTC (permalink / raw)
  To: mailing list for ConTeXt users

On 16-2-2012 12:13, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
> Hello,
>
> one more question -
>
> - does "regimes.translate" allow to translate between arbitrary
> encodings or only from the specified to the current one?

no, although it's no big deal to provide that (of course there is then 
the matter of utf being more complete than the target)

> ----
> str = regimes.translate(str, "cp1250") -- = Translate from "cp1250" to
> the current encoding (UTF) (or always to UTF?)
> ----
>
> I'm looking for something like:
>
> ----
> src_enc = "utf8"
> tgt_enc = "cp1250"
>
> str = regimes.translate(str, src_enc, tgt_enc)
> ----
>
> Any idea?

is there a reason not to stick to utf?

Hans


-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
     tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
                                              | www.pragma-pod.nl
-----------------------------------------------------------------

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UTF conversion via Lua (renaming attachments to .scr_)
  2012-02-16 12:08             ` Hans Hagen
@ 2012-02-16 13:15               ` Procházka Lukáš Ing. - Pontex s. r. o.
       [not found]               ` <op.v9rwejuttpjj8f@lpr>
  1 sibling, 0 replies; 22+ messages in thread
From: Procházka Lukáš Ing. - Pontex s. r. o. @ 2012-02-16 13:15 UTC (permalink / raw)
  To: Hans Hagen; +Cc: ConTeXt

[-- Attachment #1: Type: text/plain, Size: 1785 bytes --]

On Thu, 16 Feb 2012 13:08:09 +0100, Hans Hagen <pragma@wxs.nl> wrote:

> no, although it's no big deal to provide that (of course there is then
> the matter of utf being more complete than the target)
>
>> ----
>> src_enc = "utf8"
>> tgt_enc = "cp1250"
>>
>> str = regimes.translate(str, src_enc, tgt_enc)
>> ----
>
> is there a reason not to stick to utf?
>
> Hans

Well - I'm working with a .cld document (with UTF encoding). There are some functions which typeset texts. And there is also a part which creates a .scr file.

.Scr files are sequences of AutoCAD commands - their contents are passed directly to AutoCAD command prompt.

When AutoCAD is creating a text entity, it reads the input stream (in our case: the .scr file) BYTE-PER-BYTE. When bytes represent a text to be drawn, unknown bytes (= bytes that don't have any graphical representation in AutoCAD font file ("shape" file in AutoCAD's terminology)) are shown as "?".

Of course, valid representation of language-specific-characters (like "čřž..." in Czech) requires an appropriate .shx (= "shape compiled") file.

Anyway, when AutoCAD is to write "č", it requires just ONE BYTE to be passed to it; so 2-byte UTF representation gives bad result (= "??").

So back to the origin, when I call the .cld's function that writes a command to the .scr file, I need to convert a UTF string back to CP 1250.

Would it be possible to provide this?

NB: There are two examples of .scr files; CP1250.scr works well in AutoCAD, the latter draws "????ST" instead of "ČÁST".
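
The byte-level difference between the two attachments can be reproduced in plain Lua. The following is only a sketch under stated assumptions: the table `utf8_to_cp1250` and the helper `to_cp1250` are hypothetical names, and the table covers just the two non-ASCII characters of the example (Č is byte 0xC8 and Á is byte 0xC1 in CP1250). Input is assumed to be valid UTF-8.

```lua
-- Hypothetical, deliberately partial table: UTF-8 sequence -> CP1250 byte.
local utf8_to_cp1250 = {
    ["\196\140"] = "\200", -- Č (U+010C) -> 0xC8
    ["\195\129"] = "\193", -- Á (U+00C1) -> 0xC1
}

-- Replace every multi-byte UTF-8 sequence by its single CP1250 byte,
-- or by the fallback character when the table has no entry for it.
local function to_cp1250(s, fallback)
    fallback = fallback or "?"
    return (s:gsub("[\194-\244][\128-\191]+", function(ch)
        return utf8_to_cp1250[ch] or fallback
    end))
end

-- The command from the attachments, with "ČÁST" written as UTF-8 bytes.
local command = "_.TEXT _ST ROMAND _J _R _NON 14.48,0.0925 0.007 0.0 "
             .. "\196\140\195\129ST"
local converted = to_cp1250(command)

-- Two 2-byte sequences became two single bytes.
print(#command - #converted)
```

When such a string is written to the .scr file, opening the file in binary mode (`io.open(name, "wb")`) keeps the bytes untouched on Windows.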

Kind regards,

Lukas


-- 
Ing. Lukáš Procházka [mailto:LPr@pontex.cz]
Pontex s. r. o.      [mailto:pontex@pontex.cz] [http://www.pontex.cz]
Bezová 1658
147 14 Praha 4

Tel: +420 244 062 238
Fax: +420 244 461 038

[-- Attachment #2: UTF.scr_ --]
[-- Type: application/octet-stream, Size: 60 bytes --]

_.TEXT _ST ROMAND _J _R _NON 14.48,0.0925 0.007 0.0 ČÁST

[-- Attachment #3: CP1250.scr_ --]
[-- Type: application/octet-stream, Size: 58 bytes --]

_.TEXT _ST ROMAND _J _R _NON 14.48,0.0925 0.007 0.0 ÈÁST


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UTF conversion via Lua (renaming attachments to .scr_)
       [not found]               ` <op.v9rwejuttpjj8f@lpr>
@ 2012-02-16 22:56                 ` Hans Hagen
  2012-02-17  8:09                   ` UTF conversion via Lua Procházka Lukáš Ing. - Pontex s. r. o.
  0 siblings, 1 reply; 22+ messages in thread
From: Hans Hagen @ 2012-02-16 22:56 UTC (permalink / raw)
  To: "Procházka Lukáš Ing. - Pontex s. r. o.",
	mailing list for ConTeXt users

On 16-2-2012 14:14, Procházka Lukáš Ing. - Pontex s. r. o. wrote:

> Would it be possible to provide this?

I'll provide:

regimes.toregime('8859-1',"abcde Ä","?")

but you'll have to test and wikify it.

Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
     tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
                                              | www.pragma-pod.nl
-----------------------------------------------------------------


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UTF conversion via Lua
  2012-02-16 22:56                 ` Hans Hagen
@ 2012-02-17  8:09                   ` Procházka Lukáš Ing. - Pontex s. r. o.
  2012-02-17  8:19                     ` Hans Hagen
  0 siblings, 1 reply; 22+ messages in thread
From: Procházka Lukáš Ing. - Pontex s. r. o. @ 2012-02-17  8:09 UTC (permalink / raw)
  To: ConTeXt

Hello Hans,

thank you for the extension; I've tested and it works perfectly.

On Thu, 16 Feb 2012 23:56:44 +0100, Hans Hagen <pragma@wxs.nl> wrote:

> regimes.toregime('8859-1',"abcde Ä","?")
>
> but you'll have to test and wikify it.

I'm going to wikify it -

- I suppose:

	regimes.toregime(<target-regime>, <text-to-convert>, <third-arg>)

so the question is: what is <third-arg> used for?

Maybe as the default character when the UTF code cannot be mapped to <target-regime>?

(It didn't happen in my case, so I can only guess what <third-arg> is for.)

Best regards,

Lukas


> Hans


-- 
Ing. Lukáš Procházka [mailto:LPr@pontex.cz]
Pontex s. r. o.      [mailto:pontex@pontex.cz] [http://www.pontex.cz]
Bezová 1658
147 14 Praha 4

Tel: +420 244 062 238
Fax: +420 244 461 038



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UTF conversion via Lua
  2012-02-17  8:09                   ` UTF conversion via Lua Procházka Lukáš Ing. - Pontex s. r. o.
@ 2012-02-17  8:19                     ` Hans Hagen
  2012-02-17  9:20                       ` Procházka Lukáš Ing. - Pontex s. r. o.
  0 siblings, 1 reply; 22+ messages in thread
From: Hans Hagen @ 2012-02-17  8:19 UTC (permalink / raw)
  To: mailing list for ConTeXt users

On 17-2-2012 09:09, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
> Hello Hans,
>
> thank you for the extension; I've tested and it works perfectly.
>
> On Thu, 16 Feb 2012 23:56:44 +0100, Hans Hagen <pragma@wxs.nl> wrote:
>
>> regimes.toregime('8859-1',"abcde Ä","?")
>>
>> but you'll have to test and wikify it.
>
> I'm going to wikify it -
>
> - I suppose:
>
> regimes.toregime(<target-regime>, <text-to-convert>, <third-arg>)
>
> so the question is: what is <third-arg> used for?
>
> Maybe as default character when the UTF code cannot be mapped to
> <target-regime>?

yes

> (It didn't happen in my case, so I can just estimate what <third-arg> is
> for.)

then you should make a test for it (just take some chinese character and 
see if it becomes a ?)
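
The suggested test can be sketched outside ConTeXt in plain Lua. This is an illustration of the fallback idea only, not the implementation behind regimes.toregime: `utf8_to_latin1` is a hypothetical helper that handles just ISO-8859-1 as target and assumes valid UTF-8 input. Non-ASCII characters are written as decimal byte escapes so the example does not depend on the encoding of the source file.

```lua
-- Sketch: UTF-8 -> ISO-8859-1 with a fallback for unmappable characters.
-- utf8_to_latin1 is a hypothetical stand-in, not a ConTeXt function.
local function utf8_to_latin1(s, fallback)
    fallback = fallback or "?"
    -- One lead byte followed by its continuation bytes = one character.
    return (s:gsub("[^\128-\191][\128-\191]*", function(ch)
        local b1, b2 = ch:byte(1, 2)
        if b1 < 0x80 then
            return ch                                   -- ASCII: unchanged
        elseif #ch == 2 and b1 >= 0xC2 and b1 <= 0xC3 then
            -- U+0080..U+00FF: rebuild the code point, emit it as one byte
            return string.char((b1 - 0xC0) * 64 + (b2 - 0x80))
        end
        return fallback                                 -- not in Latin-1
    end))
end

local a_umlaut = "\195\132"      -- "Ä" (U+00C4) in UTF-8
local chinese  = "\228\184\173"  -- "中" (U+4E2D) in UTF-8

print(utf8_to_latin1("abcde " .. a_umlaut, "?") == "abcde \196") -- true
print(utf8_to_latin1(chinese, "?"))                              -- ?
```

A character outside the target encoding collapses to the fallback, which is the behaviour the third argument is meant to control.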

Hans


-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
     tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
                                              | www.pragma-pod.nl
-----------------------------------------------------------------


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UTF conversion via Lua
  2012-02-17  8:19                     ` Hans Hagen
@ 2012-02-17  9:20                       ` Procházka Lukáš Ing. - Pontex s. r. o.
  0 siblings, 0 replies; 22+ messages in thread
From: Procházka Lukáš Ing. - Pontex s. r. o. @ 2012-02-17  9:20 UTC (permalink / raw)
  To: ConTeXt

On Fri, 17 Feb 2012 09:19:16 +0100, Hans Hagen <pragma@wxs.nl> wrote:

>> On Thu, 16 Feb 2012 23:56:44 +0100, Hans Hagen <pragma@wxs.nl> wrote:
>>
>>> regimes.toregime('8859-1',"abcde Ä","?")
>>>
>>> but you'll have to test and wikify it.

Wikified - http://wiki.contextgarden.net/Encodings_and_Regimes#Conversion_between_encodings.

Lukas


-- 
Ing. Lukáš Procházka [mailto:LPr@pontex.cz]
Pontex s. r. o.      [mailto:pontex@pontex.cz] [http://www.pontex.cz]
Bezová 1658
147 14 Praha 4

Tel: +420 244 062 238
Fax: +420 244 461 038



^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2012-02-17  9:20 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-02-10 10:22 UTF conversion via Lua Procházka Lukáš Ing. - Pontex s. r. o.
2012-02-10 10:26 ` Thomas A. Schmitz
2012-02-10 10:57 ` Philipp Gesang
2012-02-10 11:11   ` Procházka Lukáš Ing. - Pontex s. r. o.
2012-02-10 11:14     ` luigi scarso
2012-02-13 11:42       ` Ulrike Fischer
2012-02-13 13:14         ` luigi scarso
2012-02-10 11:15     ` Wolfgang Schuster
2012-02-10 11:32       ` Procházka Lukáš Ing. - Pontex s. r. o.
2012-02-10 12:25         ` Wolfgang Schuster
2012-02-10 13:15           ` Procházka Lukáš Ing. - Pontex s. r. o.
2012-02-10 15:08             ` Hans Hagen
2012-02-16 11:13           ` Procházka Lukáš Ing. - Pontex s. r. o.
2012-02-16 12:08             ` Hans Hagen
2012-02-16 13:15               ` UTF conversion via Lua (renaming attachments to .scr_) Procházka Lukáš Ing. - Pontex s. r. o.
     [not found]               ` <op.v9rwejuttpjj8f@lpr>
2012-02-16 22:56                 ` Hans Hagen
2012-02-17  8:09                   ` UTF conversion via Lua Procházka Lukáš Ing. - Pontex s. r. o.
2012-02-17  8:19                     ` Hans Hagen
2012-02-17  9:20                       ` Procházka Lukáš Ing. - Pontex s. r. o.
2012-02-10 11:40       ` Procházka Lukáš Ing. - Pontex s. r. o.
2012-02-10 11:30     ` Philipp Gesang
2012-02-10 11:13   ` UTF conversion via Lua (now with attachment) Procházka Lukáš Ing. - Pontex s. r. o.
