9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] ext2srv understands only 7bit ASCII file names?
@ 2011-10-13 11:15 slash
  2011-10-13 11:37 ` dexen deVries
  2011-10-13 13:22 ` Russ Cox
  0 siblings, 2 replies; 9+ messages in thread
From: slash @ 2011-10-13 11:15 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

I have some files on an external ext2 drive that have whitespace and
umlauts (ä, ö) in them. trfs took care of the whitespace. But ext2srv
presents umlauts as a question mark symbol (�) and won't let me access
the file (error: file does not exist).

Where is the problem? These files show correctly in linux.

As a workaround I can certainly boot that other OS and rename the
files. It's just every time I see that penguin I get a rash.




* Re: [9fans] ext2srv understands only 7bit ASCII file names?
  2011-10-13 11:15 [9fans] ext2srv understands only 7bit ASCII file names? slash
@ 2011-10-13 11:37 ` dexen deVries
  2011-10-13 13:20   ` erik quanstrom
  2011-10-13 13:22 ` Russ Cox
  1 sibling, 1 reply; 9+ messages in thread
From: dexen deVries @ 2011-10-13 11:37 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Thursday 13 of October 2011 13:15:57 slash wrote:
> I have some files on an external ext2 drive that have whitespace and
> umlauts (ä, ö) in them. trfs took care of the whitespace. But ext2srv
> presents umlauts as a question mark symbol (�) and won't let me access
> the file (error: file does not exist).

i believe -- but i am not sure! -- that linux stores and reads names on 
ext2/3/4 without any conversion between filesystem and I/O syscalls like 
open(). if you have iso8859-1 or similar single-byte locale on linux, your 
ext2 contains iso8859-1 encoded filenames.

by contrast, for those filesystems that always store file names in UTF-16 or 
similar (NTFS, FAT32 with LFN, Joliet extension of ISO9660 etc.), there's the 
`iocharset' mount option that converts between on-disk UTF-16 and I/O syscalls 
like open(). normally you set it to match your locale settings. but for 
ext2/3/4, literally anything goes: the bytes are stored as given.

you'd need to convert the pathnames, either one-time on disk or upon every r/o 
access (yuck!).

it may be sensible to use only UTF8 locale on linux, like LANG=en_US.utf8, but 
that'll not update names stored in ext2/3/4 filesystem automagically. it's just 
about interpretation.

again, that's what i believe, but i dunno how to verify that. any ideas?

-- 
dexen deVries

[[[↓][→]]]

http://xkcd.com/732/
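
One way to check this on the Linux side is to look at the raw bytes of each
directory entry rather than at how a terminal renders them. The sketch below
is an illustration, not something from the thread: hexname and dumpdir are
hypothetical helper names, and it assumes a POSIX system with <dirent.h>.
A name stored as Latin-1 shows ä as the single byte e4, while a name stored
as UTF-8 shows it as the pair c3 a4.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Format the bytes of name as space-separated hex pairs in out.
 * Returns the number of bytes of name that fit and were formatted. */
size_t
hexname(const char *name, char *out, size_t outsz)
{
	size_t i, n = strlen(name), used = 0;

	if (outsz == 0)
		return 0;
	out[0] = '\0';
	for (i = 0; i < n; i++) {
		/* each byte needs at most 3 chars (" e4") plus the NUL */
		if (used + 4 > outsz)
			break;
		used += (size_t)snprintf(out + used, outsz - used, "%s%02x",
		    i ? " " : "", (unsigned)(unsigned char)name[i]);
	}
	return i;
}

/* Print the raw bytes of every entry in dir, one entry per line.
 * Returns -1 if dir cannot be opened, 0 otherwise. */
int
dumpdir(const char *dir)
{
	DIR *d = opendir(dir);
	struct dirent *e;
	char hex[3 * 256 + 1];	/* room for a 255-byte name */

	if (d == NULL)
		return -1;
	while ((e = readdir(d)) != NULL) {
		hexname(e->d_name, hex, sizeof hex);
		printf("%s\n", hex);
	}
	closedir(d);
	return 0;
}
```

Calling dumpdir on the directory where the ext2 drive is mounted prints one
line of hex per entry, which is enough to tell the two encodings apart.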




* Re: [9fans] ext2srv understands only 7bit ASCII file names?
  2011-10-13 11:37 ` dexen deVries
@ 2011-10-13 13:20   ` erik quanstrom
  2011-10-13 14:28     ` slash
  0 siblings, 1 reply; 9+ messages in thread
From: erik quanstrom @ 2011-10-13 13:20 UTC (permalink / raw)
  To: 9fans

On Thu Oct 13 07:38:54 EDT 2011, dexen.devries@gmail.com wrote:
> On Thursday 13 of October 2011 13:15:57 slash wrote:
> > I have some files on an external ext2 drive that have whitespace and
> > umlauts (ä, ö) in them. trfs took care of the whitespace. But ext2srv
> > presents umlauts as a question mark symbol (�) and won't let me access
> > the file (error: file does not exist).
>
> i believe -- but i am not sure! -- that linux stores and reads names on
> ext2/3/4 without any conversion between filesystem and I/O syscalls like
> open(). if you have iso8859-1 or similar single-byte locale on linux, your
> ext2 contains iso8859-1 encoded filenames.

correct.

if you know what the charset on disk is, you could probably hack ext2fs
into translating names.  or (less hacky) you could write a transliterating fs,
or add this to trfs' duties.

i don't know if this is helpful, but if you use p9p tools you will always get utf8,
without any oddness.  it used to be easier because the system tools weren't
trying so hard to break utf-8.  it used to just all work.  ymmv with a utf-8
locale.  i found it messed up some scripts because the beauty of locale is that
you just can't count on the format of anything.

- erik




* Re: [9fans] ext2srv understands only 7bit ASCII file names?
  2011-10-13 11:15 [9fans] ext2srv understands only 7bit ASCII file names? slash
  2011-10-13 11:37 ` dexen deVries
@ 2011-10-13 13:22 ` Russ Cox
  1 sibling, 0 replies; 9+ messages in thread
From: Russ Cox @ 2011-10-13 13:22 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Oct 13, 2011 at 07:16, slash wrote:
> I have some files on an external ext2 drive that have whitespace and
> umlauts (ä, ö) in them. trfs took care of the whitespace. But ext2srv
> presents umlauts as a question mark symbol (�) and won't let me access
> the file (error: file does not exist).
> 
> Where is the problem? These files show correctly in linux.

The names are probably encoded in Latin-1, as dexen said.
One option is to change your Linux locale and rename all your files.
Another is to change ext2srv to interpret disk names as Latin-1
if given a flag (say, -1).  A third, and perhaps the easiest,
is to use trfs to translate between UTF-8 names and Latin-1 names.
I say perhaps because it is possible that the kernel will reject
the Latin-1 as being malformed UTF-8, but I think the odds are
good that it will just let it through.

Russ
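
The question-mark glyph follows from the bytes themselves: a Latin-1 ä is the
lone byte 0xE4, which in UTF-8 would have to be the start of a three-byte
sequence. A minimal structural check makes that visible. This is a simplified
sketch, with validutf8 as a hypothetical name; it is not the check the Plan 9
kernel or ext2srv performs, and it does not reject every overlong or
surrogate encoding.

```c
#include <stddef.h>

/* Return 1 if the n bytes at s have well-formed UTF-8 structure
 * (valid lead bytes followed by the right number of continuation
 * bytes), 0 otherwise. Simplified: some overlong and surrogate
 * forms are not rejected. */
int
validutf8(const unsigned char *s, size_t n)
{
	size_t i = 0, k, need;

	while (i < n) {
		unsigned char c = s[i++];

		if (c < 0x80)
			need = 0;	/* ASCII */
		else if (c >= 0xC2 && c <= 0xDF)
			need = 1;	/* two-byte sequence */
		else if (c >= 0xE0 && c <= 0xEF)
			need = 2;	/* three-byte sequence */
		else if (c >= 0xF0 && c <= 0xF4)
			need = 3;	/* four-byte sequence */
		else
			return 0;	/* stray continuation or invalid lead byte */
		for (k = 0; k < need; k++, i++)
			if (i >= n || (s[i] & 0xC0) != 0x80)
				return 0;
	}
	return 1;
}
```

A Latin-1 name such as the bytes e4 2e 74 78 74 fails this check at the first
byte, while the UTF-8 spelling c3 a4 2e 74 78 74 passes.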


* Re: [9fans] ext2srv understands only 7bit ASCII file names?
  2011-10-13 13:20   ` erik quanstrom
@ 2011-10-13 14:28     ` slash
  2011-10-13 15:20       ` Russ Cox
  0 siblings, 1 reply; 9+ messages in thread
From: slash @ 2011-10-13 14:28 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> if you know what the charset on disk is, you could probably hack ext2fs
> into translating names.  or (less hacky) you could write a transliterating fs,
> or add this to trfs' duties.

Thank you. So now I know ext2srv is not doing any file name conversion. Good.

Say I wanted to add the following capability to trfs: convert latin-1
ä and ö into their utf equivalents. I guess I would just follow the
example of whitespace handling etc in trfs.c and recompile. Now, where
is the latin-1 code table again...




* Re: [9fans] ext2srv understands only 7bit ASCII file names?
  2011-10-13 14:28     ` slash
@ 2011-10-13 15:20       ` Russ Cox
  2011-10-13 15:25         ` slash.9fans
  0 siblings, 1 reply; 9+ messages in thread
From: Russ Cox @ 2011-10-13 15:20 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> example of whitespace handling etc in trfs.c and recompile. Now, where
> is the latin-1 code table again...

latin-1 bytes 00-FF turn into unicode runes 00-FF.


* Re: [9fans] ext2srv understands only 7bit ASCII file names?
  2011-10-13 15:20       ` Russ Cox
@ 2011-10-13 15:25         ` slash.9fans
  2011-10-13 15:30           ` erik quanstrom
  0 siblings, 1 reply; 9+ messages in thread
From: slash.9fans @ 2011-10-13 15:25 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


> latin-1 bytes 00-FF turn into unicode runes 00-FF.

Then why doesn't it Just Work? Now I am confused (again).





* Re: [9fans] ext2srv understands only 7bit ASCII file names?
  2011-10-13 15:25         ` slash.9fans
@ 2011-10-13 15:30           ` erik quanstrom
  2011-10-16 13:19             ` slash
  0 siblings, 1 reply; 9+ messages in thread
From: erik quanstrom @ 2011-10-13 15:30 UTC (permalink / raw)
  To: 9fans

On Thu Oct 13 11:27:00 EDT 2011, slash.9fans@gmail.com wrote:
>
> > latin-1 bytes 00-FF turn into unicode runes 00-FF.
>
> Then why doesn't it Just Work? Now I am confused (again).

unicode codepoints (runes) are abstract.  we need to deal with encodings.
the encoding utf-8 uses is not a single byte for anything above 0x7f.
so essentially the encoding phase would be name[i] = (uchar)r.  the decoding phase
would be r = (Rune)name[i].

i think you can see the decoding in action in upas/fs.
but you can probably write the code faster.

- erik
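
Spelled out in portable C, the two phases might look like the sketch below.
The function names are hypothetical, it uses plain unsigned char buffers
instead of Plan 9's Rune type, and it is not the code from trfs or upas/fs.
Decoding treats each disk byte as the rune of the same value and emits its
UTF-8 form; encoding goes the other way and fails for any rune above 0xFF,
since Latin-1 cannot represent it.

```c
#include <stddef.h>

/* Decode: Latin-1 bytes (as stored on disk) to NUL-terminated UTF-8.
 * Returns the number of bytes written, or -1 if out is too small. */
long
latin1toutf8(const unsigned char *in, size_t n, unsigned char *out, size_t outsz)
{
	size_t i, o = 0;

	for (i = 0; i < n; i++) {
		unsigned r = in[i];	/* r = (Rune)name[i] */

		if (r < 0x80) {
			if (o + 1 >= outsz)
				return -1;
			out[o++] = (unsigned char)r;
		} else {
			/* runes 80..FF take two bytes in UTF-8 */
			if (o + 2 >= outsz)
				return -1;
			out[o++] = (unsigned char)(0xC0 | (r >> 6));
			out[o++] = (unsigned char)(0x80 | (r & 0x3F));
		}
	}
	if (o >= outsz)
		return -1;
	out[o] = '\0';
	return (long)o;
}

/* Encode: UTF-8 to NUL-terminated Latin-1 bytes.
 * Returns the number of bytes written, or -1 if out is too small,
 * the input is malformed, or a rune does not fit in Latin-1. */
long
utf8tolatin1(const unsigned char *in, size_t n, unsigned char *out, size_t outsz)
{
	size_t i = 0, o = 0;

	while (i < n) {
		unsigned r = in[i++];

		if (r >= 0x80) {
			/* only lead bytes C2 and C3 encode runes 80..FF */
			if ((r != 0xC2 && r != 0xC3) || i >= n ||
			    (in[i] & 0xC0) != 0x80)
				return -1;
			r = ((r & 0x1F) << 6) | (in[i++] & 0x3F);
		}
		if (o + 1 >= outsz)
			return -1;
		out[o++] = (unsigned char)r;	/* name[i] = (uchar)r */
	}
	if (o >= outsz)
		return -1;
	out[o] = '\0';
	return (long)o;
}
```

With these, the Latin-1 byte e4 becomes the UTF-8 pair c3 a4 on the way out
of the disk, and c3 a4 becomes e4 again when a name is looked up.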




* Re: [9fans] ext2srv understands only 7bit ASCII file names?
  2011-10-13 15:30           ` erik quanstrom
@ 2011-10-16 13:19             ` slash
  0 siblings, 0 replies; 9+ messages in thread
From: slash @ 2011-10-16 13:19 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> unicode codepoints (runes) are abstract.  we need to deal with encodings.
> the encoding utf-8 uses is not a single byte for anything above 0x7f.
> so essentially the encoding phase would be name[i] = (uchar)r.  the decoding phase
> would be r = (Rune)name[i].

Thank you. I modified trfs.c and wrote trfs.latin1 which does this.
Now I can do:

disk/partfs /dev/sdU7.0/data
disk/fdisk -p /dev/sdXX/data >/dev/sdXX/ctl
ext2srv -r -f /dev/sdXX/linux
trfs.latin1 /srv/ext2
mount /srv/trfs /n/ext2
cd /n/ext2
dircp . $home

and get no errors.




