* [9fans] ext2srv understands only 7bit ASCII file names?
From: slash @ 2011-10-13 11:15 UTC
To: Fans of the OS Plan 9 from Bell Labs
I have some files on an external ext2 drive that have whitespace and
umlauts (ä, ö) in them. trfs took care of the whitespace. But ext2srv
presents umlauts as a question mark symbol (�) and won't let me access
the file (error: file does not exist).
Where is the problem? These files show correctly in linux.
As a workaround I can certainly boot that other OS and rename the
files. It's just every time I see that penguin I get a rash.
* Re: [9fans] ext2srv understands only 7bit ASCII file names?
From: dexen deVries @ 2011-10-13 11:37 UTC
To: Fans of the OS Plan 9 from Bell Labs
On Thursday 13 of October 2011 13:15:57 slash wrote:
> I have some files on an external ext2 drive that have whitespace and
> umlauts (ä, ö) in them. trfs took care of the whitespace. But ext2srv
> presents umlauts as a question mark symbol (�) and won't let me access
> the file (error: file does not exist).
i believe -- but i am not sure! -- that linux stores and reads names on
ext2/3/4 without any conversion between filesystem and I/O syscalls like
open(). if you have iso8859-1 or similar single-byte locale on linux, your
ext2 contains iso8859-1 encoded filenames.
in contrast, for those filesystems that always store file names in UTF-16 or
similar (NTFS, FAT32 with LFN, Joliet extension of ISO9660 etc.), there's the
`iocharset' mount option that converts between on-disk UTF-16 and the encoding
seen by I/O syscalls like open(). normally you set it to match your locale
settings. but for ext2/3/4, literally anything goes.
you'd need to convert the pathnames, either one-time on disk or upon every r/o
access (yuck!).
it may be sensible to use only UTF8 locale on linux, like LANG=en_US.utf8, but
that'll not update names stored in ext2/3/4 filesystem automagically. it's just
about interpretation.
again, that's what i believe, but i dunno how to verify that. any ideas?
--
dexen deVries
[[[↓][→]]]
http://xkcd.com/732/
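One way to check the claim above is to look at the raw bytes of a file name and test whether they form valid UTF-8. The following is a minimal sketch in portable C, not code from ext2srv or trfs; the names `classify_name` and `enum name_kind` are made up for illustration. A name that fails the check was most likely written under a single-byte locale such as ISO 8859-1, although the test cannot say which one.

```c
#include <stddef.h>

/* Hypothetical classification of a raw file name as stored on disk. */
enum name_kind { NAME_ASCII, NAME_UTF8, NAME_NOT_UTF8 };

/*
 * Classify the n bytes at s.
 * NAME_ASCII:    every byte is below 0x80.
 * NAME_UTF8:     well-formed UTF-8 with at least one multi-byte sequence.
 * NAME_NOT_UTF8: malformed as UTF-8, e.g. a lone Latin-1 byte such as 0xE4.
 */
enum name_kind classify_name(const unsigned char *s, size_t n)
{
    enum name_kind kind = NAME_ASCII;
    size_t i = 0;

    while (i < n) {
        unsigned char c = s[i];
        size_t extra, k;
        unsigned long cp, min;

        if (c < 0x80) {            /* plain ASCII byte */
            i++;
            continue;
        }
        if ((c & 0xE0) == 0xC0) {
            extra = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return NAME_NOT_UTF8;  /* stray continuation or invalid lead byte */
        }
        if (n - i <= extra)
            return NAME_NOT_UTF8;  /* sequence runs past the end of the name */
        for (k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return NAME_NOT_UTF8;  /* expected a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        /* Reject overlong forms, surrogates, and values beyond U+10FFFF. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return NAME_NOT_UTF8;
        kind = NAME_UTF8;
        i += extra + 1;
    }
    return kind;
}
```

On Linux the bytes could come from the `d_name` field returned by `readdir()`, which reports directory entries of ext2/3/4 without any locale conversion.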
* Re: [9fans] ext2srv understands only 7bit ASCII file names?
From: erik quanstrom @ 2011-10-13 13:20 UTC
To: 9fans
On Thu Oct 13 07:38:54 EDT 2011, dexen.devries@gmail.com wrote:
> On Thursday 13 of October 2011 13:15:57 slash wrote:
> > I have some files on an external ext2 drive that have whitespace and
> > umlauts (ä, ö) in them. trfs took care of the whitespace. But ext2srv
> > presents umlauts as a question mark symbol (�) and won't let me access
> > the file (error: file does not exist).
>
> i believe -- but i am not sure! -- that linux stores and reads names on
> ext2/3/4 without any conversion between filesystem and I/O syscalls like
> open(). if you have iso8859-1 or similar single-byte locale on linux, your
> ext2 contains iso8859-1 encoded filenames.
correct.
if you know what the charset on disk is, you could probably hack ext2fs
into translating names. or (less hacky) you could write a transliterating fs,
or add this to trfs' duties.
i don't know if this is helpful, but if you use p9p tools you will always get utf8,
without any oddness. it used to be easier because the system tools weren't
trying so hard to break utf-8. it used to just all work. ymmv with a utf-8
locale. i found it messed up some scripts because the beauty of locale is that
you just can't count on the format of anything.
- erik
* Re: [9fans] ext2srv understands only 7bit ASCII file names?
From: Russ Cox @ 2011-10-13 13:22 UTC
To: Fans of the OS Plan 9 from Bell Labs
On Oct 13, 2011 at 07:16, slash wrote:
> I have some files on an external ext2 drive that have whitespace and
> umlauts (ä, ö) in them. trfs took care of the whitespace. But ext2srv
> presents umlauts as a question mark symbol (�) and won't let me access
> the file (error: file does not exist).
>
> Where is the problem? These files show correctly in linux.
The names are probably encoded in latin-1, as dexen said.
One option is to change your Linux locale and rename all your files.
Another is to change ext2srv to interpret disk names as Latin-1
if given a flag (say, -1). A third, and perhaps the easiest,
is to use trfs to translate between UTF-8 names and Latin-1 names.
I say perhaps because it is possible that the kernel will reject
the Latin-1 as being malformed UTF-8, but I think the odds are
good that it will just let it through.
Russ
* Re: [9fans] ext2srv understands only 7bit ASCII file names?
From: slash @ 2011-10-13 14:28 UTC
To: Fans of the OS Plan 9 from Bell Labs
> if you know what the charset on disk is, you could probably hack ext2fs
> into translating names. or (less hacky) you could write a transliterating fs,
> or add this to trfs' duties.
Thank you. So now I know ext2srv is not doing any file name conversion. Good.
Say I wanted to add the following capability to trfs: convert latin-1
ä and ö into their utf equivalents. I guess I would just follow the
example of whitespace handling etc in trfs.c and recompile. Now, where
is the latin-1 code table again...
* Re: [9fans] ext2srv understands only 7bit ASCII file names?
From: Russ Cox @ 2011-10-13 15:20 UTC
To: Fans of the OS Plan 9 from Bell Labs
> example of whitespace handling etc in trfs.c and recompile. Now, where
> is the latin-1 code table again...
latin-1 bytes 00-FF turn into unicode runes 00-FF.
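Because each Latin-1 byte is numerically equal to its Unicode code point, the translation from an on-disk Latin-1 name to a UTF-8 name needs no lookup table. A minimal sketch in portable C follows; it is not the code from trfs.c, and the function name `latin1_to_utf8` and its buffer convention are assumptions made for this example.

```c
#include <stddef.h>

/*
 * Convert n Latin-1 bytes at src into NUL-terminated UTF-8 in dst.
 * Returns the number of bytes written, not counting the NUL,
 * or (size_t)-1 if dst (dstlen bytes) is too small.
 */
size_t latin1_to_utf8(const unsigned char *src, size_t n,
                      unsigned char *dst, size_t dstlen)
{
    size_t i, o = 0;

    for (i = 0; i < n; i++) {
        unsigned char c = src[i];   /* the byte value is the code point */

        if (c < 0x80) {
            /* 00-7F: identical in Latin-1, ASCII and UTF-8 */
            if (o + 1 >= dstlen)
                return (size_t)-1;
            dst[o++] = c;
        } else {
            /* 80-FF: two-byte UTF-8 form 110xxxxx 10xxxxxx */
            if (o + 2 >= dstlen)
                return (size_t)-1;
            dst[o++] = (unsigned char)(0xC0 | (c >> 6));
            dst[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    if (o >= dstlen)
        return (size_t)-1;          /* no room for the terminating NUL */
    dst[o] = '\0';
    return o;
}
```

The output can be up to twice as long as the input, so a destination buffer of 2*n+1 bytes is always sufficient.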
* Re: [9fans] ext2srv understands only 7bit ASCII file names?
From: slash.9fans @ 2011-10-13 15:25 UTC
To: Fans of the OS Plan 9 from Bell Labs
> latin-1 bytes 00-FF turn into unicode runes 00-FF.
Then why doesn't it Just Work? Now I am confused (again).
* Re: [9fans] ext2srv understands only 7bit ASCII file names?
From: erik quanstrom @ 2011-10-13 15:30 UTC
To: 9fans
On Thu Oct 13 11:27:00 EDT 2011, slash.9fans@gmail.com wrote:
>
> > latin-1 bytes 00-FF turn into unicode runes 00-FF.
>
> Then why doesn't it Just Work? Now I am confused (again).
unicode codepoints (runes) are abstract. we need to deal with encodings.
the encoding utf-8 uses is not a single byte for anything above 0x7f.
so essentially the encoding phase would be name[i] = (uchar)r. the decoding phase
would be r = (Rune)name[i].
i think you can see the decoding in action in upas/fs.
but you can probably write the code faster.
- erik
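The opposite direction, turning a UTF-8 name supplied by the client back into the Latin-1 bytes stored on disk, can be sketched as follows. This is a minimal illustration in portable C rather than Plan 9 C (which would use chartorune and the Rune type), and it is not the code from trfs.latin1; the function name `utf8_to_latin1` is hypothetical. Runes above 0xFF have no Latin-1 form, so this sketch reports an error for them instead of truncating.

```c
#include <stddef.h>

/*
 * Convert n bytes of UTF-8 at src into NUL-terminated Latin-1 in dst.
 * Returns the number of bytes written, not counting the NUL, or
 * (size_t)-1 if the input is malformed, contains a code point above
 * 0xFF, or does not fit in dst (dstlen bytes).
 */
size_t utf8_to_latin1(const unsigned char *src, size_t n,
                      unsigned char *dst, size_t dstlen)
{
    size_t i = 0, o = 0;

    while (i < n) {
        unsigned char c = src[i];
        unsigned long r;            /* decoded code point ("rune") */

        if (c < 0x80) {
            r = c;
            i += 1;
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < n &&
                   (src[i + 1] & 0xC0) == 0x80) {
            /* Lead bytes C2 and C3 cover exactly U+0080..U+00FF. */
            r = ((unsigned long)(c & 0x1F) << 6) | (src[i + 1] & 0x3F);
            i += 2;
        } else {
            return (size_t)-1;      /* malformed, or not representable */
        }
        if (o + 1 >= dstlen)
            return (size_t)-1;
        dst[o++] = (unsigned char)r;    /* name[i] = (uchar)r */
    }
    if (o >= dstlen)
        return (size_t)-1;          /* no room for the terminating NUL */
    dst[o] = '\0';
    return o;
}
```

A translating file server would apply this to names arriving in walk and create requests, and apply the Latin-1 to UTF-8 direction to names returned in directory reads.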
* Re: [9fans] ext2srv understands only 7bit ASCII file names?
From: slash @ 2011-10-16 13:19 UTC
To: Fans of the OS Plan 9 from Bell Labs
> unicode codepoints (runes) are abstract. we need to deal with encodings.
> the encoding utf-8 uses is not a single byte for anything above 0x7f.
> so essentially the encoding phase would be name[i] = (uchar)r. the decoding phase
> would be r = (Rune)name[i].
Thank you. I modified trfs.c and wrote trfs.latin1 which does this.
Now I can do:
disk/partfs /dev/sdU7.0/data
disk/fdisk -p /dev/sdXX/data >/dev/sdXX/ctl
ext2srv -r -f /dev/sdXX/linux
trfs.latin1 /srv/ext2
mount /srv/trfs /n/ext2
cd /n/ext2
dircp . $home
and get no errors.