From mboxrd@z Thu Jan 1 00:00:00 1970 From: dexen deVries To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net> Date: Thu, 13 Oct 2011 13:37:43 +0200 User-Agent: KMail/1.13.6 (Linux/3.1.0-rc9-l40+; KDE/4.5.5; x86_64; ; ) References: In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <201110131337.48830.dexen.devries@gmail.com> Subject: Re: [9fans] ext2srv understands only 7bit ASCII file names? Topicbox-Message-UUID: 3735b0e4-ead7-11e9-9d60-3106f5b1d025 On Thursday 13 of October 2011 13:15:57 slash wrote: > I have some files on an external ext2 drive that have whitespace and > umlauts (=C3=A4, =C3=B6) in them. trfs took care of the whitespace. But e= xt2srv > presents umlauts as a question mark symbol (=EF=BF=BD) and won't let me a= ccess > the file (error: file does not exist). i believe -- but i am not sure! -- that linux stores and reads names on=20 ext2/3/4 without any conversion between filesystem and I/O syscalls like=20 open(). if you have iso8859-1 or similar single-byte locale on linux, your= =20 ext2 contains iso8859-1 encoded filenames. to the contrary, for thos filesystems that always store file names in UTF-1= 6 or=20 similar (NTFS, FAT32 with LFN, Jolliet extension of ISO9660 etc.), there's= =20 `iocharset' mount option that converts between on-disk UTF-16 and I/O sysca= lls=20 like open(). normally you set it to match your locale settings. but for=20 ext2/3/4, anything goes literally, literally. you'd need to convert the pathnames, either one-time on disk or upon every = r/o=20 access (yuck!). it may be sensible to use only UTF8 locale on linux, like LANG=3Den_US.utf8= , but=20 that'll not update names stored in ext2/3/4 filesystem automagically. it's = just=20 about interpretation. again, that's what i believe, but i dunno how to verify that. any ideas? =2D-=20 dexen deVries [[[=E2=86=93][=E2=86=92]]] http://xkcd.com/732/
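
a rough sketch of the one-time on-disk conversion mentioned above, in Python, to be run on the linux side. it assumes the names really are iso8859-1, which is unverified. the function names (to_utf8_name, plan_renames, convert_tree) and the default encoding are made up for illustration and are not part of ext2srv or any existing tool. it works on byte paths, so it sees the names exactly as stored, and it defaults to a dry run.

```python
import os


def to_utf8_name(raw: bytes, source_encoding: str = "iso8859-1") -> bytes:
    """Return the name re-encoded as UTF-8.

    Names that already decode as UTF-8 (including plain ASCII) are
    returned unchanged. This is a heuristic: a few iso8859-1 byte
    sequences are also valid UTF-8 and would be left alone.
    """
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # ASSUMPTION: anything that is not UTF-8 is in source_encoding.
        return raw.decode(source_encoding).encode("utf-8")


def plan_renames(root: bytes, source_encoding: str = "iso8859-1"):
    """Yield (old_path, new_path) byte pairs, deepest entries first."""
    # A bytes root makes os.walk return raw, undecoded names.
    # topdown=False visits children before their parent directory,
    # so renaming a directory never invalidates paths still to come.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            fixed = to_utf8_name(name, source_encoding)
            if fixed != name:
                yield os.path.join(dirpath, name), os.path.join(dirpath, fixed)


def convert_tree(root: bytes, dry_run: bool = True,
                 source_encoding: str = "iso8859-1") -> None:
    """Print the planned renames; perform them only if dry_run is False."""
    for old, new in plan_renames(root, source_encoding):
        print(repr(old), "->", repr(new))
        if not dry_run:
            os.rename(old, new)
```

the dry run doubles as a check of the theory: calling convert_tree(b"/mnt/ext2") (hypothetical mount point) prints the raw bytes of every name that is not valid UTF-8. an ä stored as the single byte \xe4 would point to iso8859-1, while \xc3\xa4 would mean the name is already UTF-8 and the problem is elsewhere.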