* [TUHS] Canonical Historic Approach to iconv(1)
From: segaloco via TUHS @ 2024-11-27 18:56 UTC
To: The Eunuchs Hysterical Society
A project I'm working on needs to store UTF-8 Japanese kana text in source files for readability, but then run those source files through tools that are only guaranteed to support single-byte code points, with something mapping the UTF-8 code points to single-byte code points in the destination execution environment. After a bit of futzing, I've landed on iconv(1), as defined by the Single UNIX Specification, to push this character-mapping concern to the front of my pipelines. It is working well thus far and insulates the utilities down-pipe from needing multi-byte support (I'm looking at you, Apple).
I started thumbing through my old manuals and noted that iconv(1) is not a historic utility, rather, SUS picked it up from HP-UX along the way.
Was there any older utility or set of practices for converting files between character encodings besides the ASCII/EBCDIC stuff in dd(1)? As I understand it, iconv(1) is just recognizing sequences of bytes, mapping them to a symbolic name, then emitting them in the complementary series of bytes assigned to that symbolic name in a second charmap file. This sounds like a simple filter operation that could be done in a few other ways. I'm curious if any particular approach was relatively ubiquitous, or if this was an exercise largely left to the individual and so solutions were wide and varied? My tool chain doesn't need to work on historic UNIX, but it would be cool to understand how to make it work on the least common denominator.
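The filter described above (byte sequence to symbolic name, then symbolic name to the second charmap's bytes) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular iconv(1) implementation works: the function names parse_charmap and convert are made up, the two tiny charmaps and their single-byte values are hypothetical, and only the \xHH form of charmap encodings is handled.

```python
import re

# Hypothetical, heavily trimmed charmaps in the POSIX charmap style.
# The symbolic names and the single-byte values in DST_CHARMAP are
# illustrative assumptions, not copied from any system's charmap files.
SRC_CHARMAP = r"""
CHARMAP
<A>        \x41
<ka>       \xe3\x82\xab
<na>       \xe3\x83\x8a
<newline>  \x0a
END CHARMAP
"""

DST_CHARMAP = r"""
CHARMAP
<A>        \x41
<ka>       \xb6
<na>       \xc5
<newline>  \x0a
END CHARMAP
"""


def parse_charmap(text):
    # Return {symbolic_name: bytes} for entries between CHARMAP and
    # END CHARMAP. Simplification: only \xHH encodings are understood.
    table = {}
    in_body = False
    for line in text.splitlines():
        line = line.strip()
        if line == "CHARMAP":
            in_body = True
        elif line == "END CHARMAP":
            in_body = False
        elif in_body and line.startswith("<"):
            parts = line.split()
            if len(parts) >= 2:
                hex_pairs = re.findall(r"\\x([0-9A-Fa-f]{2})", parts[1])
                table[parts[0]] = bytes(int(h, 16) for h in hex_pairs)
    return table


def convert(data, src, dst):
    # Invert the source charmap: byte sequence -> symbolic name.
    by_bytes = {seq: name for name, seq in src.items()}
    longest = max(len(seq) for seq in by_bytes)
    out = bytearray()
    i = 0
    while i < len(data):
        # Try the longest candidate first so multi-byte sequences win.
        for n in range(min(longest, len(data) - i), 0, -1):
            name = by_bytes.get(data[i:i + n])
            if name is not None and name in dst:
                out += dst[name]
                i += n
                break
        else:
            raise ValueError(f"no mapping for input at byte offset {i}")
    return bytes(out)
```

Calling convert(data, parse_charmap(SRC_CHARMAP), parse_charmap(DST_CHARMAP)) on UTF-8 input yields the single-byte stream. Input with no entry in both tables raises an error rather than being passed through silently, which is a design choice of this sketch.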
- Matt G.
* [TUHS] Re: Canonical Historic Approach to iconv(1)
From: Henry Bent @ 2024-11-27 19:08 UTC
To: segaloco; +Cc: The Eunuchs Hysterical Society
On Wed, 27 Nov 2024 at 13:56, segaloco via TUHS <tuhs@tuhs.org> wrote:
> I started thumbing through my old manuals and noted that iconv(1) is not a
> historic utility, rather, SUS picked it up from HP-UX along the way.
>
I see iconv(1) (and iconv(5)) in the SVR4 sources, but I don't see any
references to HP there - what manpages are you looking at?
-Henry
* [TUHS] Re: Canonical Historic Approach to iconv(1)
From: segaloco via TUHS @ 2024-11-28 0:07 UTC
To: The Eunuchs Hysterical Society
On Wednesday, November 27th, 2024 at 11:08 AM, Henry Bent <henry.r.bent@gmail.com> wrote:
> On Wed, 27 Nov 2024 at 13:56, segaloco via TUHS <tuhs@tuhs.org> wrote:
>
> > I started thumbing through my old manuals and noted that iconv(1) is not a historic utility, rather, SUS picked it up from HP-UX along the way.
>
>
> I see iconv(1) (and iconv(5)) in the SVR4 sources, but I don't see any references to HP there - what manpages are you looking at?
>
> -Henry
My mistake, the HP-UX reference was for iconv(3), not iconv(1). The source is the current issue of POSIX, Issue 8 (2024). Indeed iconv(1) is in the SVR4 manuals but only supporting system-provided charmaps. Additionally, while in the SVR4 manuals, I didn't spot it on first pass through SVID Issue 3 which is the SVR4-era issue. It looks like specifying local charmap files was added to the spec in IEEE 1003.1-2004:
> Issue 6
> This utility has been rewritten to align with the IEEE P1003.2b draft standard. Specifically, the ability to use charmap files for conversion has been added.
- Matt G.
* [TUHS] Re: Canonical Historic Approach to iconv(1)
From: Greg A. Woods @ 2024-11-28 0:57 UTC
To: The Unix Heritage Society mailing list
At Thu, 28 Nov 2024 00:07:56 +0000, segaloco via TUHS <tuhs@tuhs.org> wrote:
Subject: [TUHS] Re: Canonical Historic Approach to iconv(1)
>
> My mistake, the HP-UX reference was for iconv(3), not iconv(1). The
> source is the current issue of POSIX, Issue 8 (2024). Indeed iconv(1)
> is in the SVR4 manuals but only supporting system-provided charmaps.
> Additionally, while in the SVR4 manuals, I didn't spot it on first
> pass through SVID Issue 3 which is the SVR4-era issue.
It is documented in the System V Interface Definition, Fourth Edition,
Volume 2, pages 211-212.
There's no mention of Unicode or ISO 10646 of course -- they're too new,
at least to be in any system standards by that time! After all, SVID-4
was only getting caught up with POSIX 1003.1-1990. UTF-8 didn't even
show up more widely until early 1993, and wasn't in force in the IETF
until 1998, so you can't really expect SVID-4 (even in 1995) to use it
for a 1990-based standard.
--
Greg A. Woods <gwoods@acm.org>
Kelowna, BC +1 250 762-7675 RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com> Avoncote Farms <woods@avoncote.ca>