* Re: UTF conversion via Lua
2012-02-10 11:11 ` Procházka Lukáš Ing. - Pontex s. r. o.
@ 2012-02-10 11:14 ` luigi scarso
2012-02-13 11:42 ` Ulrike Fischer
2012-02-10 11:15 ` Wolfgang Schuster
2012-02-10 11:30 ` Philipp Gesang
2 siblings, 1 reply; 22+ messages in thread
From: luigi scarso @ 2012-02-10 11:14 UTC (permalink / raw)
To: mailing list for ConTeXt users
2012/2/10 Procházka Lukáš Ing. - Pontex s. r. o. <LPr@pontex.cz>:
> ... Well, my information was not correct.
>
> There are characters > 127 in the file, like "ř", "š"...
>
> Each char = 1 byte, and as I'm using Windows with CP 1250, the characters
> are displayed correctly.
>
> But I have problem loading them into ConTeXt.
>
> I need to convert the bytes > 127 to UTF sequence, which would be acceptable
> by ConTeXt.
>
> @Thomas:
>
> The table looks nice but there are no entries for CP 1250 to UTF conversion.
>
> I prepared some tables: character conversion and removal of diacritics (see
> the attachment);
> maybe it would be handful to include them into ConTeXt somehow.
>
> Best regards,
>
> Lukas
To avoid confusion:
If you mean ASCII with code range 0-127, there is no need for conversion;
if you mean ASCII with code range 0-255 *and* ISO-8859-1 (Latin 1)
encoding, there is no need for conversion;
otherwise you need to specify an encoding (e.g. CP 1250).
From Wikipedia:
"""
Unicode and the ISO/IEC 10646 Universal Character Set (UCS) have a
much wider array of characters, and their various encoding forms have
begun to supplant ISO/IEC 8859 and ASCII rapidly in many environments.
While ASCII is limited to 128 characters, Unicode and the UCS support
more characters by separating the concepts of unique identification
(using natural numbers called code points) and encoding (to 8-, 16- or
32-bit binary formats, called UTF-8, UTF-16 and UTF-32).
To allow backward compatibility, the 128 ASCII and 256 ISO-8859-1
(Latin 1) characters are assigned Unicode/UCS code points that are the
same as their codes in the earlier standards. Therefore, ASCII can be
considered a 7-bit encoding scheme for a very small subset of
Unicode/UCS, and, conversely, the UTF-8 encoding forms are
binary-compatible with ASCII for code points below 128, meaning all
ASCII is valid UTF-8. The other encoding forms resemble ASCII in how
they represent the first 128 characters of Unicode, but use 16 or 32
bits per character, so they require conversion for compatibility.
(similarly UCS-2 is upwards compatible with UTF-16)
"""
If you have iconv, converting between encodings is easy; you can always
call it as an external program with os.execute(cmd).
--
luigi
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!
maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage : http://www.pragma-ade.nl / http://tex.aanhet.net
archive : http://foundry.supelec.fr/projects/contextrev/
wiki : http://contextgarden.net
___________________________________________________________________________________
* Re: UTF conversion via Lua
2012-02-10 11:14 ` luigi scarso
@ 2012-02-13 11:42 ` Ulrike Fischer
2012-02-13 13:14 ` luigi scarso
0 siblings, 1 reply; 22+ messages in thread
From: Ulrike Fischer @ 2012-02-13 11:42 UTC (permalink / raw)
To: ntg-context
On Fri, 10 Feb 2012 12:14:15 +0100, luigi scarso wrote:
> if you mean ASCII with code range 0-255 *and* ISO-8859-1 (Latin 1)
> encoding, there is no need for conversion;
This is not true. You are mixing up Unicode code points and the UTF-8
encoding.
E.g. "ä" has the same position in Unicode and Latin-1 (decimal 228, hex
E4). But its UTF-8 encoding consists of 16 bits (1100001110100100, hex
C3A4), while its Latin-1 encoding is 8 bits long (11100100).
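The difference between code point and encoded bytes described above can be checked with a short sketch. Python is used here purely for illustration and is not part of the thread; the variable names are arbitrary.

```python
# Code point versus encoded bytes for "ä" (U+00E4).
ch = "\u00e4"

code_point = ord(ch)             # the position shared by Latin-1 and Unicode
latin1 = ch.encode("latin-1")    # a single byte
utf8 = ch.encode("utf-8")        # two bytes

# Bit pattern of the UTF-8 form, for comparison with the figures above.
utf8_bits = format(int.from_bytes(utf8, "big"), "016b")

# A lone Latin-1 byte above 127 is not a valid UTF-8 sequence.
try:
    latin1.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False

print(code_point, latin1.hex(), utf8.hex(), utf8_bits, latin1_is_valid_utf8)
```

The failed decode at the end is the same situation that makes a UTF-8-only reader reject a Latin-1 or CP 1250 file.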
--
Ulrike Fischer
* Re: UTF conversion via Lua
2012-02-13 11:42 ` Ulrike Fischer
@ 2012-02-13 13:14 ` luigi scarso
0 siblings, 0 replies; 22+ messages in thread
From: luigi scarso @ 2012-02-13 13:14 UTC (permalink / raw)
To: mailing list for ConTeXt users
On Mon, Feb 13, 2012 at 12:42 PM, Ulrike Fischer <news3@nililand.de> wrote:
> On Fri, 10 Feb 2012 12:14:15 +0100, luigi scarso wrote:
>
>> if you mean ASCII with code range 0-255 *and* ISO-8859-1 (Latin 1)
>> encoding, there is no need for conversion;
>
> This is not true. You are mixing up Unicode code points and the UTF-8
> encoding.
>
> E.g. "ä" has the same position in Unicode and Latin-1 (decimal 228, hex
> E4). But its UTF-8 encoding consists of 16 bits (1100001110100100, hex
> C3A4), while its Latin-1 encoding is 8 bits long (11100100).
Ah yes, you are right. I made the implicit assumption that his
file was already UTF-8 encoded.
I have been using only UTF-8 for a long time and had almost forgotten about
! String contains an invalid utf-8 sequence.
system > tex > error on line 10 in file t1.txt: String contains an invalid utf-8 sequence ...
(I believe he hit this error in his later attempts, because he wrote
> I cannot \input the file as this is not a valid ConTeXt source.
)
What I meant was, as I quoted in my previous message,
"To allow backward compatibility, the 128 ASCII and 256 ISO-8859-1
(Latin 1) characters are assigned Unicode/UCS code points that are the
same as their codes in the earlier standards"
and this is true only for ISO-8859-1.
--
luigi
* Re: UTF conversion via Lua
2012-02-10 11:11 ` Procházka Lukáš Ing. - Pontex s. r. o.
2012-02-10 11:14 ` luigi scarso
@ 2012-02-10 11:15 ` Wolfgang Schuster
2012-02-10 11:32 ` Procházka Lukáš Ing. - Pontex s. r. o.
2012-02-10 11:40 ` Procházka Lukáš Ing. - Pontex s. r. o.
2012-02-10 11:30 ` Philipp Gesang
2 siblings, 2 replies; 22+ messages in thread
From: Wolfgang Schuster @ 2012-02-10 11:15 UTC (permalink / raw)
To: mailing list for ConTeXt users
On 10.02.2012 at 12:11, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
> ... Well, my information was not correct.
>
> There are characters > 127 in the file, like "ř", "š"...
>
> Each char = 1 byte, and as I'm using Windows with CP 1250, the characters are displayed correctly.
>
> But I have problem loading them into ConTeXt.
>
> I need to convert the bytes > 127 to UTF sequence, which would be acceptable by ConTeXt.
>
> @Thomas:
>
> The table looks nice but there are no entries for CP 1250 to UTF conversion.
>
> I prepared some tables: character conversion and removal of diacritics (see the attachment);
> maybe it would be handful to include them into ConTeXt somehow.
Why don’t you let ConTeXt do the conversion:
\starttext
this is something in utf8
\startregime[cp1250]
\input filewithcp1250encoding
\stopregime
more text encoded in utf8
\stoptext
Wolfgang
* Re: UTF conversion via Lua
2012-02-10 11:15 ` Wolfgang Schuster
@ 2012-02-10 11:32 ` Procházka Lukáš Ing. - Pontex s. r. o.
2012-02-10 12:25 ` Wolfgang Schuster
2012-02-10 11:40 ` Procházka Lukáš Ing. - Pontex s. r. o.
1 sibling, 1 reply; 22+ messages in thread
From: Procházka Lukáš Ing. - Pontex s. r. o. @ 2012-02-10 11:32 UTC (permalink / raw)
To: mailing list for ConTeXt users
[-- Attachment #1: Type: text/plain, Size: 2275 bytes --]
... \enableregime - nice idea!
Despite this, I'm still not able to make the example work:
---- Test.mkiv
\enableregime[cp1250]
\starttext
\startluacode
function loadFile(fn)
  local fh = assert(io.open(fn, "r"))
  local str = fh:read("*all")

  fh:close()

  return str
end

context.startregime{"cp1250"}
context(loadFile("a.txt"))
context.stopregime()
\stopluacode
\stoptext
----
Where's the problem?
Lukas
On Fri, 10 Feb 2012 12:15:29 +0100, Wolfgang Schuster <schuster.wolfgang@googlemail.com> wrote:
>
> On 10.02.2012 at 12:11, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
>
>> ... Well, my information was not correct.
>>
>> There are characters > 127 in the file, like "ř", "š"...
>>
>> Each char = 1 byte, and as I'm using Windows with CP 1250, the characters are displayed correctly.
>>
>> But I have problem loading them into ConTeXt.
>>
>> I need to convert the bytes > 127 to UTF sequence, which would be acceptable by ConTeXt.
>>
>> @Thomas:
>>
>> The table looks nice but there are no entries for CP 1250 to UTF conversion.
>>
>> I prepared some tables: character conversion and removal of diacritics (see the attachment);
>> maybe it would be handful to include them into ConTeXt somehow.
>
> Why don’t you let do context the conversion:
>
> \starttext
>
> this is something in utf8
>
> \startregime[cp1250]
> \input filewithcp1250encoding
> \stopregime
>
> more text encoded in utf8
>
> \stoptext
>
> Wolfgang
--
Ing. Lukáš Procházka [mailto:LPr@pontex.cz]
Pontex s. r. o. [mailto:pontex@pontex.cz] [http://www.pontex.cz]
Bezová 1658
147 14 Praha 4
Tel: +420 244 062 238
Fax: +420 244 461 038
[-- Attachment #2: a.txt --]
[-- Type: text/plain, Size: 9 bytes --]
abc
ý
[-- Attachment #3: Test.mkiv --]
[-- Type: application/octet-stream, Size: 332 bytes --]
\enableregime[cp1250]
\starttext
\startluacode
function loadFile(fn)
local fh = assert(io.open(fn, "r"))
local str = fh:read("*all")
fh:close()
return str
end
context.startregime{"cp1250"}
context(loadFile("a.txt"))
context.stopregime()
\stopluacode
\stoptext
* Re: UTF conversion via Lua
2012-02-10 11:32 ` Procházka Lukáš Ing. - Pontex s. r. o.
@ 2012-02-10 12:25 ` Wolfgang Schuster
2012-02-10 13:15 ` Procházka Lukáš Ing. - Pontex s. r. o.
2012-02-16 11:13 ` Procházka Lukáš Ing. - Pontex s. r. o.
0 siblings, 2 replies; 22+ messages in thread
From: Wolfgang Schuster @ 2012-02-10 12:25 UTC (permalink / raw)
To: mailing list for ConTeXt users
On 10.02.2012 at 12:32, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
> ... \enableregime - nice idea!
>
> Despite this, I'm still not able to make work the example:
>
> ---- Test.mkiv
> \enableregime[cp1250]
>
> \starttext
> \startluacode
> function loadFile(fn)
> local fh = assert(io.open(fn, "r"))
> local str = fh:read("*all")
>
> fh:close()
>
> return str
> end
>
> context.startregime{"cp1250"}
> context(loadFile("a.txt"))
> context.stopregime()
> \stopluacode
> \stoptext
> ----
>
> Where's the problem?
I don't know, but it works when you use “regimes.translate” in your code. It would
be better, though, to ask Hans for a function in the commands namespace that you can use.
\starttext
\startluacode
function loadFile(fn)
  local fh = assert(io.open(fn, "r"))
  local str = fh:read("*all")
  fh:close()
  str = regimes.translate(str,"cp1250")
  context(str)
end

loadFile("a.txt")
\stopluacode
\stoptext
Wolfgang
* Re: UTF conversion via Lua
2012-02-10 12:25 ` Wolfgang Schuster
@ 2012-02-10 13:15 ` Procházka Lukáš Ing. - Pontex s. r. o.
2012-02-10 15:08 ` Hans Hagen
2012-02-16 11:13 ` Procházka Lukáš Ing. - Pontex s. r. o.
1 sibling, 1 reply; 22+ messages in thread
From: Procházka Lukáš Ing. - Pontex s. r. o. @ 2012-02-10 13:15 UTC (permalink / raw)
To: mailing list for ConTeXt users
> Dunno but it works when you use “regimes.translate” in your code but it’s better
> to ask Hans for a function in the commands namespace which you can use.
>
> \starttext
>
> \startluacode
>
> function loadFile(fn)
> local fh = assert(io.open(fn, "r"))
> local str = fh:read("*all")
> fh:close()
> str = regimes.translate(str,"cp1250")
> context(str)
> end
>
> loadFile("a.txt")
>
> \stopluacode
>
> \stoptext
>
> Wolfgang
Thank you, Wolfgang.
Your code works perfectly and does exactly what I need.
Best regards,
Lukas
--
Ing. Lukáš Procházka [mailto:LPr@pontex.cz]
Pontex s. r. o. [mailto:pontex@pontex.cz] [http://www.pontex.cz]
Bezová 1658
147 14 Praha 4
Tel: +420 244 062 238
Fax: +420 244 461 038
* Re: UTF conversion via Lua
2012-02-10 13:15 ` Procházka Lukáš Ing. - Pontex s. r. o.
@ 2012-02-10 15:08 ` Hans Hagen
0 siblings, 0 replies; 22+ messages in thread
From: Hans Hagen @ 2012-02-10 15:08 UTC (permalink / raw)
To: mailing list for ConTeXt users
On 10-2-2012 14:15, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
>> Dunno but it works when you use “regimes.translate” in your code but
>> it’s better
>> to ask Hans for a function in the commands namespace which you can use.
>>
>> \starttext
>>
>> \startluacode
>>
>> function loadFile(fn)
>> local fh = assert(io.open(fn, "r"))
>> local str = fh:read("*all")
>> fh:close()
>> str = regimes.translate(str,"cp1250")
>> context(str)
>> end
>>
>> loadFile("a.txt")
>>
>> \stopluacode
>>
>> \stoptext
>>
>> Wolfgang
>
> Thank you, Wolfgang.
>
> Your code works perfectly and does exactly what I need.
As a one-liner ...
function document.MyLoadFile(name)
  context(regimes.translate(io.loaddata(resolvers.findfile(name)),"cp1250"))
end
(resolvers will look the file up in the TeX tree if needed)
Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
| www.pragma-pod.nl
-----------------------------------------------------------------
* Re: UTF conversion via Lua
2012-02-10 12:25 ` Wolfgang Schuster
2012-02-10 13:15 ` Procházka Lukáš Ing. - Pontex s. r. o.
@ 2012-02-16 11:13 ` Procházka Lukáš Ing. - Pontex s. r. o.
2012-02-16 12:08 ` Hans Hagen
1 sibling, 1 reply; 22+ messages in thread
From: Procházka Lukáš Ing. - Pontex s. r. o. @ 2012-02-16 11:13 UTC (permalink / raw)
To: mailing list for ConTeXt users
Hello,
one more question:
does "regimes.translate" allow translating between arbitrary encodings, or only from the specified one to the current one?
----
str = regimes.translate(str, "cp1250") -- = Translate from "cp1250" to the current encoding (UTF) (or always to UTF?)
----
I'm looking for something like:
----
src_enc = "utf8"
tgt_enc = "cp1250"
str = regimes.translate(str, src_enc, tgt_enc)
----
Any idea?
Best regards,
Lukas
On Fri, 10 Feb 2012 13:25:40 +0100, Wolfgang Schuster <schuster.wolfgang@googlemail.com> wrote:
> str = regimes.translate(str,"cp1250")
--
Ing. Lukáš Procházka [mailto:LPr@pontex.cz]
Pontex s. r. o. [mailto:pontex@pontex.cz] [http://www.pontex.cz]
Bezová 1658
147 14 Praha 4
Tel: +420 244 062 238
Fax: +420 244 461 038
* Re: UTF conversion via Lua
2012-02-16 11:13 ` Procházka Lukáš Ing. - Pontex s. r. o.
@ 2012-02-16 12:08 ` Hans Hagen
2012-02-16 13:15 ` UTF conversion via Lua (renaming attachments to .scr_) Procházka Lukáš Ing. - Pontex s. r. o.
[not found] ` <op.v9rwejuttpjj8f@lpr>
0 siblings, 2 replies; 22+ messages in thread
From: Hans Hagen @ 2012-02-16 12:08 UTC (permalink / raw)
To: mailing list for ConTeXt users
On 16-2-2012 12:13, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
> Hello,
>
> one more question -
>
> - does "regimes.translate" allow to translate between arbitrary
> encodings or only from the specified to the current one?
no, although it's no big deal to provide that (of course there is then
the matter of utf being more complete than the target)
> ----
> str = regimes.translate(str, "cp1250") -- = Translate from "cp1250" to
> the current encoding (UTF) (or always to UTF?)
> ----
>
> I'm looking for something like:
>
> ----
> src_enc = "utf8"
> tgt_enc = "cp1250"
>
> str = regimes.translate(str, src_enc, tgt_enc)
> ----
>
> Any idea?
is there a reason not to stick to utf?
Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
| www.pragma-pod.nl
-----------------------------------------------------------------
* Re: UTF conversion via Lua (renaming attachments to .scr_)
2012-02-16 12:08 ` Hans Hagen
@ 2012-02-16 13:15 ` Procházka Lukáš Ing. - Pontex s. r. o.
[not found] ` <op.v9rwejuttpjj8f@lpr>
1 sibling, 0 replies; 22+ messages in thread
From: Procházka Lukáš Ing. - Pontex s. r. o. @ 2012-02-16 13:15 UTC (permalink / raw)
To: Hans Hagen; +Cc: ConTeXt
[-- Attachment #1: Type: text/plain, Size: 1785 bytes --]
On Thu, 16 Feb 2012 13:08:09 +0100, Hans Hagen <pragma@wxs.nl> wrote:
> no, although it's no big deal to provide that (of course there is then
> the matter of utf being more complete than the target)
>
>> ----
>> src_enc = "utf8"
>> tgt_enc = "cp1250"
>>
>> str = regimes.translate(str, src_enc, tgt_enc)
>> ----
>
> is there a reason not to stick to utf?
>
> Hans
Well - I'm working with a .cld document (with UTF encoding). There are some functions which typeset texts. And there is also a part which creates a .scr file.
.Scr files are sequences of AutoCAD commands - their contents are passed directly to AutoCAD command prompt.
When AutoCAD creates a text entity, it reads the input stream (in our case, the .scr file) byte by byte. When the bytes represent a text to be drawn, unknown bytes (i.e. bytes that have no graphical representation in the AutoCAD font file, a "shape" file in AutoCAD's terminology) are shown as "?".
Of course, a valid representation of language-specific characters (like "čřž..." in Czech) requires an appropriate .shx ("compiled shape") file.
Anyway, when AutoCAD is to write "č", it requires just ONE BYTE to be passed to it, so the 2-byte UTF representation gives a bad result ("??").
So, back to the original question: when I call the .cld function that writes a command to the .scr file, I need to convert a UTF string back to CP 1250.
Would it be possible to provide this?
NB: Attached are two example .scr files; CP1250.scr works well in AutoCAD, while UTF.scr draws "????ST" instead of "ČÁST".
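The byte counts behind the "????ST" result can be reproduced with a small sketch. Python is used only for illustration; the model of the reader (every byte above 127 without a glyph becomes "?") is an assumption based on the description above, not AutoCAD's actual code.

```python
# "ČÁST" written with escapes so the source file's own encoding does not matter.
text = "\u010c\u00c1ST"

cp1250_bytes = text.encode("cp1250")   # one byte per character
utf8_bytes = text.encode("utf-8")      # two bytes each for "Č" and "Á"

# Assumed model of a byte-by-byte reader that has no glyph for bytes >= 0x80.
shown = "".join(chr(b) if b < 0x80 else "?" for b in utf8_bytes)

# How the CP 1250 bytes look when a viewer wrongly treats them as Latin-1.
as_latin1 = cp1250_bytes.decode("latin-1")

print(cp1250_bytes.hex(), utf8_bytes.hex(), shown, as_latin1)
```

The two extra bytes in the UTF-8 form match the 60-byte versus 58-byte sizes of the two attachments, and the Latin-1 reading explains why the CP 1250 attachment is displayed as "ÈÁST" in the archive.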
Kind regards,
Lukas
--
Ing. Lukáš Procházka [mailto:LPr@pontex.cz]
Pontex s. r. o. [mailto:pontex@pontex.cz] [http://www.pontex.cz]
Bezová 1658
147 14 Praha 4
Tel: +420 244 062 238
Fax: +420 244 461 038
[-- Attachment #2: UTF.scr_ --]
[-- Type: application/octet-stream, Size: 60 bytes --]
_.TEXT _ST ROMAND _J _R _NON 14.48,0.0925 0.007 0.0 ČÁST
[-- Attachment #3: CP1250.scr_ --]
[-- Type: application/octet-stream, Size: 58 bytes --]
_.TEXT _ST ROMAND _J _R _NON 14.48,0.0925 0.007 0.0 ÈÁST
[parent not found: <op.v9rwejuttpjj8f@lpr>]
* Re: UTF conversion via Lua (renaming attachments to .scr_)
[not found] ` <op.v9rwejuttpjj8f@lpr>
@ 2012-02-16 22:56 ` Hans Hagen
2012-02-17 8:09 ` UTF conversion via Lua Procházka Lukáš Ing. - Pontex s. r. o.
0 siblings, 1 reply; 22+ messages in thread
From: Hans Hagen @ 2012-02-16 22:56 UTC (permalink / raw)
To: "Procházka Lukáš Ing. - Pontex s. r. o.",
mailing list for ConTeXt users
On 16-2-2012 14:14, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
> Would it be possible to provide this?
I'll provide:
regimes.toregime('8859-1',"abcde Ä","?")
but you'll have to test and wikify it.
Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
| www.pragma-pod.nl
-----------------------------------------------------------------
* Re: UTF conversion via Lua
2012-02-16 22:56 ` Hans Hagen
@ 2012-02-17 8:09 ` Procházka Lukáš Ing. - Pontex s. r. o.
2012-02-17 8:19 ` Hans Hagen
0 siblings, 1 reply; 22+ messages in thread
From: Procházka Lukáš Ing. - Pontex s. r. o. @ 2012-02-17 8:09 UTC (permalink / raw)
To: ConTeXt
Hello Hans,
thank you for the extension; I've tested and it works perfectly.
On Thu, 16 Feb 2012 23:56:44 +0100, Hans Hagen <pragma@wxs.nl> wrote:
> regimes.toregime('8859-1',"abcde Ä","?")
>
> but you'll have to test and wikify it.
I'm going to wikify it.
I suppose the signature is:
regimes.toregime(<target-regime>, <text-to-convert>, <third-arg>)
So the question is: what is <third-arg> used for?
Maybe as the default character when a UTF code cannot be mapped to <target-regime>?
(It didn't happen in my case, so I can only guess what <third-arg> is for.)
Best regards,
Lukas
> Hans
--
Ing. Lukáš Procházka [mailto:LPr@pontex.cz]
Pontex s. r. o. [mailto:pontex@pontex.cz] [http://www.pontex.cz]
Bezová 1658
147 14 Praha 4
Tel: +420 244 062 238
Fax: +420 244 461 038
* Re: UTF conversion via Lua
2012-02-17 8:09 ` UTF conversion via Lua Procházka Lukáš Ing. - Pontex s. r. o.
@ 2012-02-17 8:19 ` Hans Hagen
2012-02-17 9:20 ` Procházka Lukáš Ing. - Pontex s. r. o.
0 siblings, 1 reply; 22+ messages in thread
From: Hans Hagen @ 2012-02-17 8:19 UTC (permalink / raw)
To: mailing list for ConTeXt users
On 17-2-2012 09:09, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
> Hello Hans,
>
> thank you for the extension; I've tested and it works perfectly.
>
> On Thu, 16 Feb 2012 23:56:44 +0100, Hans Hagen <pragma@wxs.nl> wrote:
>
>> regimes.toregime('8859-1',"abcde Ä","?")
>>
>> but you'll have to test and wikify it.
>
> I'll going to wikify it -
>
> - I supppose:
>
> regimes.toregime(<target-regime>, <text-to-convert>, <third-arg>)
>
> so question - what is the <third-argment> used for?
>
> Maybe as default character when the UTF code cannot be mapped to
> <target-regime>?
yes
> (It didn't happen in my case, so I can just estimate what <third-arg> is
> for.)
Then you should make a test for it (just take some Chinese character and
see if it becomes a ?)
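The kind of test suggested here can be sketched outside ConTeXt as follows. This is only a Python imitation of the behaviour confirmed above: `to_regime` is a hypothetical helper whose argument order mirrors the call in this thread, and the codec names are Python's, not ConTeXt regime names.

```python
def to_regime(target, text, fallback="?"):
    """Encode text into the 8-bit encoding `target`, substituting `fallback`
    for every character the target cannot represent (hypothetical helper)."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(target)
        except UnicodeEncodeError:
            out += fallback.encode(target)
    return bytes(out)

# "Ä" exists in Latin-1, so nothing is replaced.
latin1 = to_regime("iso-8859-1", "abcde \u00c4")

# The Chinese character U+4E2D has no slot in CP 1250 and becomes "?".
with_cjk = to_regime("cp1250", "\u010c\u00c1ST \u4e2d")

print(latin1, with_cjk)
```

A corresponding check inside ConTeXt would pass a string containing such a character to regimes.toregime and compare the result by eye or by byte values.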
Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
| www.pragma-pod.nl
-----------------------------------------------------------------
* Re: UTF conversion via Lua
2012-02-10 11:15 ` Wolfgang Schuster
2012-02-10 11:32 ` Procházka Lukáš Ing. - Pontex s. r. o.
@ 2012-02-10 11:40 ` Procházka Lukáš Ing. - Pontex s. r. o.
1 sibling, 0 replies; 22+ messages in thread
From: Procházka Lukáš Ing. - Pontex s. r. o. @ 2012-02-10 11:40 UTC (permalink / raw)
To: mailing list for ConTeXt users
One more note -
On Fri, 10 Feb 2012 12:15:29 +0100, Wolfgang Schuster <schuster.wolfgang@googlemail.com> wrote:
> Why don’t you let do context the conversion:
>
> \starttext
>
> this is something in utf8
>
> \startregime[cp1250]
> \input filewithcp1250encoding
> \stopregime
>
> more text encoded in utf8
>
> \stoptext
I cannot \input the file as this is not a valid ConTeXt source.
I do (at least) a "%" -> "\%" conversion, which is why I need to use Lua
to load the file into a string;
the conversion step was removed from the sample I sent previously, to keep it simple.
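The preprocessing described here (decode the legacy bytes, then escape "%") might look roughly like this. It is a Python sketch for illustration only; `escape_percent` is a hypothetical helper, and it does not handle percent signs that are already escaped.

```python
def escape_percent(text):
    """Turn every "%" into "\\%" so TeX does not treat it as a comment start
    (hypothetical helper; input that is already escaped is not detected)."""
    return text.replace("%", "\\%")

raw = b"50 % \xe8\n"                # byte E8 is "č" in CP 1250
decoded = raw.decode("cp1250")      # now a proper Unicode string
prepared = escape_percent(decoded)

print(prepared)
```

Because "%" is an ASCII character and CP 1250 is a single-byte encoding, the escaping and the decoding could be done in either order.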
Lukas
--
Ing. Lukáš Procházka [mailto:LPr@pontex.cz]
Pontex s. r. o. [mailto:pontex@pontex.cz] [http://www.pontex.cz]
Bezová 1658
147 14 Praha 4
Tel: +420 244 062 238
Fax: +420 244 461 038
* Re: UTF conversion via Lua
2012-02-10 11:11 ` Procházka Lukáš Ing. - Pontex s. r. o.
2012-02-10 11:14 ` luigi scarso
2012-02-10 11:15 ` Wolfgang Schuster
@ 2012-02-10 11:30 ` Philipp Gesang
2 siblings, 0 replies; 22+ messages in thread
From: Philipp Gesang @ 2012-02-10 11:30 UTC (permalink / raw)
To: mailing list for ConTeXt users
[-- Attachment #1.1: Type: text/plain, Size: 3938 bytes --]
On 2012-02-10 12:11, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
> ... Well, my information was not correct.
>
> There are characters > 127 in the file, like "ř", "š"...
>
> Each char = 1 byte, and as I'm using Windows with CP 1250, the characters are displayed correctly.
So it wasn’t ASCII after all ;-) No problem, just use iconv:
iconv -f CP1250 -t UTF8 infile > outfile
I do this a lot with movie subtitles …
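For readers without iconv, the same file conversion can be sketched in Python. The function name and the temporary file names are made up for the example; the sample bytes are chosen to resemble the small test file from this thread.

```python
import os
import tempfile

def cp1250_to_utf8(src_path, dst_path):
    """Read src_path as CP 1250 bytes and write the same text as UTF-8."""
    with open(src_path, "rb") as src:
        raw = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(raw.decode("cp1250").encode("utf-8"))

with tempfile.TemporaryDirectory() as tmp:
    infile = os.path.join(tmp, "infile")
    outfile = os.path.join(tmp, "outfile")
    with open(infile, "wb") as f:
        f.write(b"abc\n\xfd\n")     # byte FD is "ý" in CP 1250
    cp1250_to_utf8(infile, outfile)
    with open(outfile, "rb") as f:
        converted = f.read()

print(converted)
```

Working in binary mode keeps line endings untouched, so only the characters above 127 change.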
Hth, Philipp
PS: If you still insist on converting at the Lua end only, then
your starting point might be “regi-cp1250.lua” in the
ConTeXt base/ directory.
>
> But I have problem loading them into ConTeXt.
>
> I need to convert the bytes > 127 to UTF sequence, which would be acceptable by ConTeXt.
>
> @Thomas:
>
> The table looks nice but there are no entries for CP 1250 to UTF conversion.
>
> I prepared some tables: character conversion and removal of diacritics (see the attachment);
> maybe it would be handful to include them into ConTeXt somehow.
>
> Best regards,
>
> Lukas
>
>
> On Fri, 10 Feb 2012 11:57:32 +0100, Philipp Gesang <gesang@stud.uni-heidelberg.de> wrote:
>
> >On 2012-02-10 11:22, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
> >>Hello,
> >>
> >>I have many files with ASCII encoding; this encoding must be kept as these files are processed also by another program.
> >>
> >>When I work with them in ConTeXt, I need to convert them to UTF.
> >
> >Not needed, as every ASCII string is a valid UTF8 string:
> > “The UTF encoding has several good properties. By far the most
> > important is that a byte in the ASCII range 0-127 represents
> > itself in UTF. Thus UTF is backward compatible with ASCII.”
> > http://doc.cat-v.org/plan_9/4th_edition/papers/utf
> >You can use them in Luatex without further conversion.
> >
> >Regards
> >Philipp
> >
> >
> >>
> >>Does Lua (in ConTeXt scope) offer a transformation function or a table of chars [ASCII-code] -> [UTF-code] or anything to provide the conversion?
> >>
> >>Something like:
> >>
> >>\startluacode
> >> local str = loadFile("a.txt") -- ASCII coded
> >>
> >> str = context.ACSII2UTF(str) -- Or something like this
> >>\stopluacode
> >>
> >>Best regards,
> >>
> >>Lukas
> >>
> >>
> >>--
> >>Ing. Lukáš Procházka [mailto:LPr@pontex.cz]
> >>Pontex s. r. o. [mailto:pontex@pontex.cz] [http://www.pontex.cz]
> >>Bezová 1658
> >>147 14 Praha 4
> >>
> >>Tel: +420 244 062 238
> >>Fax: +420 244 461 038
> >>
> >
>
>
> --
> Ing. Lukáš Procházka [mailto:LPr@pontex.cz]
> Pontex s. r. o. [mailto:pontex@pontex.cz] [http://www.pontex.cz]
> Bezová 1658
> 147 14 Praha 4
>
> Tel: +420 244 062 238
> Fax: +420 244 461 038
>
--
() ascii ribbon campaign - against html e-mail
/\ www.asciiribbon.org - against proprietary attachments