I wrote something up: https://orib.dev/9hack1.html. If I forgot
anything, feel free to complain.
> This allows not only better input methods, but also things like keyboard shortcut handlers, as-you-type spellchecking, and other neat toys, demos, and tools.
Volume keys?
> As expected, there was pontificating about the next version of 9p,
> and how to make it handle latency better. Ideas were floated and
> rejected, and the difficulties were explored. While no solid
> conclusions were reached, there are a few things we know are
> essential to the making the next version better.
Every time I see this discussed, I never see 9P.ix (code is in branch
NxM of harvey-os) mentioned (either as something to adopt or with
a reason why not to adopt). Just saying...
> But this wasn't the only set of ideas that was discussed.
> Lots of thought was put into the directions to go with
> VMX and Unix integration,
Throwing an idea out because I've never seen anyone
else say it: the best possible linuxemu is Linux itself,
running as User-Mode Linux, which ought to be possible
with Linux syscall emulation (again, the code is in NIX),
except not really because there's no ptrace and UML is
centred around that.
Networking for UML is PPP, which 9front already provides, and the
filesystem is just a directory (like linuxemu, and devfs in 9vx), so
actually running the binary is the only hurdle (I would think).
please elaborate. this is your chance.
sl
> On Aug 18, 2022, at 6:53 PM, Stuart Morrow <morrow.stuart@gmail.com> wrote:
>
>
>>
>> As expected, there was pontificating about the next version of 9p,
>> and how to make it handle latency better. Ideas were floated and
>> rejected, and the difficulties were explored. While no solid
>> conclusions were reached, there are a few things we know are
>> essential to the making the next version better.
>
> Every time I see this discussed, I never see 9P.ix (code is in branch
> NxM of harvey-os) mentioned (either as something to adopt or with
> a reason why not to adopt). Just saying...
>
On Thu, Aug 18, 2022 at 11:52:16PM +0100, Stuart Morrow wrote:
> Every time I see this discussed, I never see 9P.ix (code is in branch
> NxM of harvey-os) mentioned (either as something to adopt or with
> a reason why not to adopt). Just saying...
Where in the tree is it? Their commit history is a disaster area and
there doesn't seem to be any explicit reference to it, nor documentation
regarding it.
khm
Quoth Stuart Morrow <morrow.stuart@gmail.com>:
> Throwing an idea out because I've never seen anyone
> else say it: the best possible linuxemu is Linux itself,
> running as User-Mode Linux, which ought to be possible
> with Linux syscall emulation (again, the code is in NIX),
> except not really because there's no ptrace and UML is
> centred around that.
Fuck that noise. It's easier to do hardware emulation than
linux syscall emulation. And it's more generally useful.
On 19/08/2022, Stanley Lieber <sl@stanleylieber.com> wrote:
> please elaborate. this is your chance.
Chance for what? To make the case for it? I don't even know if it's
good, why would I make the case for it? All I'm saying is that it should
be considered, not that it should be adopted. The code already exists,
and it's written by smart people; I would think it deserves a chance.
On 19/08/2022, Kurt H Maier <khm@sciops.net> wrote:
> Where in the tree is it? Their commit history is a disaster area and
> there doesn't seem to be any explicit reference to it, nor documentation
> regarding it.
Errrrr, devmnt?
On 19/08/2022, ori@eigenstate.org <ori@eigenstate.org> wrote:
> Fuck that noise. It's easier to do hardware emulation than
> linux syscall emulation. And it's more generally useful.
ok.
But you'd only need the syscalls that UML actually uses.
And VMX doesn't run on my X301 even though virtualisation
exists on Core 2 Duo (and is/was enabled in the BIOS).
And people run 9front on hardware that doesn't have virtualisation
at all.
Plan 9 as Lguest host?
Quoth Stuart Morrow <morrow.stuart@gmail.com>:
> On 19/08/2022, ori@eigenstate.org <ori@eigenstate.org> wrote:
> > Fuck that noise. It's easier to do hardware emulation than
> > linux syscall emulation. And it's more generally useful.
>
> ok.
>
> But you'd only need the syscalls that UML actually uses.
>
> And VMX doesn't run on my X301 even though virtualisation
> exists on Core 2 Duo (and is/was enabled in the BIOS).
>
> And people run 9front on hardware that doesn't have virtualisation
> at all.
>
> Plan 9 as Lguest host?
you're welcome to work on it if you think it's a good
approach; I'm fairly certain it's a waste of time, but
feel free to prove me wrong.
On 8/18/22 17:33, Stuart Morrow wrote:
> On 19/08/2022, Kurt H Maier <khm@sciops.net> wrote:
>> Where in the tree is it? Their commit history is a disaster area and
>> there doesn't seem to be any explicit reference to it, nor documentation
>> regarding it.
>
> Errrrr, devmnt?

https://github.com/fjballest/nixMarkIV/blob/master/port/devmnt.c

I think you're talking about this? Based on the context you seem
to be talking about it in.
On Fri, Aug 19, 2022 at 12:37:04AM +0100, Stuart Morrow wrote:
> And VMX doesn't run on my X301 even though virtualisation
> exists on Core 2 Duo (and is/was enabled in the BIOS).

VMX requires a Westmere or newer Intel processor. The earliest
generation of Thinkpad X series laptop to support this is the X230, a
machine otherwise roundly inferior to the X301. Core 2 Duo chips
support the VT-d instructions but not EPT or unrestricted guests.

> And people run 9front on hardware that doesn't have virtualisation
> at all.

With the possible exception of the Ryzen line, hardware too old to
support virtualization is unlikely to run linuxemu at an acceptable
speed. As for User-Mode Linux, it's single-cpu and doesn't support
anything but x86-based processors either. (Sure, it works on Itanium
and PowerPC, but 9front doesn't.)

The syscalls problem isn't that we don't know which ones to implement,
it's that Linux creates and deprecates them at a rate faster than
anyone can keep up with. I am positive this is the case worldwide, not
just for us -- otherwise, things like Docker on OS X would not
virtualize an entire OS just to make use of Linux containers. The
engineering cost to virtualize a platform and boot Linux on it is
lower than the engineering cost to keep up with whatever the hell the
Linux horde throws at the kernel this week.

khm
Quoth Jacob Moody <moody@mail.posixcafe.org>:
> On 8/18/22 17:33, Stuart Morrow wrote:
> > On 19/08/2022, Kurt H Maier <khm@sciops.net> wrote:
> >> Where in the tree is it? Their commit history is a disaster area and
> >> there doesn't seem to be any explicit reference to it, nor documentation
> >> regarding it.
> >
> > Errrrr, devmnt?
>
> https://github.com/fjballest/nixMarkIV/blob/master/port/devmnt.c
>
> I think you're talking about this? Based on the context you seem
> to be talking about it in.

see also:

https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.466.4876&rep=rep1&type=pdf
On Fri Aug 19 02:04:48 +0200 2022, khm@sciops.net wrote:
> On Fri, Aug 19, 2022 at 12:37:04AM +0100, Stuart Morrow wrote:
> > And VMX doesn't run on my X301 even though virtualisation
> > exists on Core 2 Duo (and is/was enabled in the BIOS).
>
> VMX requires a Westmere or newer Intel processor. The earliest
> generation of Thinkpad X series laptop to support this is the X230, a
> machine otherwise roundly inferior to the X301. Core 2 Duo chips
> support the VT-d instructions but not EPT or unrestricted guests.
Are you sure about the x230? I've run vmx on x220 and w520
(sandybridge) and I see references to westmere for t*10 and x201. Am
I confusing things?
Thanks,
qwx
On Fri, Aug 19, 2022 at 02:23:53AM +0200, qwx@sciops.net wrote:
> On Fri Aug 19 02:04:48 +0200 2022, khm@sciops.net wrote:
> > On Fri, Aug 19, 2022 at 12:37:04AM +0100, Stuart Morrow wrote:
> > > And VMX doesn't run on my X301 even though virtualisation
> > > exists on Core 2 Duo (and is/was enabled in the BIOS).
> >
> > VMX requires a Westmere or newer Intel processor. The earliest
> > generation of Thinkpad X series laptop to support this is the X230, a
> > machine otherwise roundly inferior to the X301. Core 2 Duo chips
> > support the VT-d instructions but not EPT or unrestricted guests.
>
> Are you sure about the x230? I've run vmx on x220 and w520
> (sandybridge) and I see references to westmere for t*10 and x201. Am
> I confusing things?
>
> Thanks,
> qwx
The X230 line in the sand is one we hashed out in IRC back when there
was much confusion regarding which SKUs Intel blessed with which
features at which times. There might be the odd *20-series or older
machine that managed to get a 'good' processor, but the X230 is the
first generation that *for sure* has the features. The further back in
time you go, the more of a crap shoot it is. I tend to just claim X230
as the earliest as it greatly simplifies the issue.
All of this stuff is moot on a Core 2 Duo system, as there were no
products based on that architecture to support all the features needed
by vmx(1).
khm
> The X230 line in the sand is one we hashed out in IRC back when there
> was much confusion regarding which SKUs Intel blessed with which
> features at which times. There might be the odd *20-series or older
> machine that managed to get a 'good' processor, but the X230 is the
> first generation that *for sure* has the features. The further back in
> time you go, the more of a crap shoot it is. I tend to just claim X230
> as the earliest as it greatly simplifies the issue.
I see, that's what confused me. Thanks!
Cheers,
qwx
Having worked on cinap's linuxemu about 10 years ago, trying to catch up
with Linux, I think you are absolutely correct here. Sad but true.
> The engineering cost to
> virtualize a platform and boot Linux on it is lower than the engineering
> cost to keep up with whatever the hell the Linux horde throws at the
> kernel this week.
>
> khm
On 19/08/2022, Jacob Moody <moody@mail.posixcafe.org> wrote:
> https://github.com/fjballest/nixMarkIV/blob/master/port/devmnt.c
>
> I think you're talking about this? Based on the context you seem
> to be talking about it in.

Yeah, I guess so? I said NxM rather than NIX Mark IV because I
recall seeing Creepy[1] and the 9P.ix Fossil[2] in there. But it seems
there's no sign of support for it in the kernel (/sys/src/nxm). Weird.

[1] I was right: https://github.com/rminnich/NxM/tree/master/sys/src/cmd/creepy
[2] I was not right: https://github.com/rminnich/NxM/blob/master/sys/src/cmd/fossil/9p.c#L1133
On 8/19/22 05:34, Stuart Morrow wrote:
> On 19/08/2022, Jacob Moody <moody@mail.posixcafe.org> wrote:
>> https://github.com/fjballest/nixMarkIV/blob/master/port/devmnt.c
>>
>> I think you're talking about this? Based on the context you seem
>> to be talking about it in.
>
> Yeah, I guess so? I said NxM rather than NIX Mark IV because I
> recall seeing Creepy[1] and the 9P.ix Fossil[2] in there. But it seems
> there's no sign of support for it in the kernel (/sys/src/nxm). Weird.
>
> [1] I was right: https://github.com/rminnich/NxM/tree/master/sys/src/cmd/creepy
> [2] I was not right:
> https://github.com/rminnich/NxM/blob/master/sys/src/cmd/fossil/9p.c#L1133
Ah, thanks. I took some time reading through the paper ori posted.
There were quite a number of interesting ideas there; I appreciate you
mentioning this. I wanted to talk through some of these improvements,
both so my understanding can be corrected and for others who may
wish to discuss them.
Firstly, it seems namec in the kernel was adapted to send
batches of requests in a single burst. At the 9p layer this
manifested as sending multiple dependent requests to the
server, all under the same tag. For example, a wstat involves
the following requests:
walk from the rootfid to the file, getting a new fid
wstat that new fid with the user-provided data
clunk the fid now that it is no longer being used.
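To make the shape of such a batch concrete, here is a toy sketch in C.
The struct and helpers are invented stand-ins for illustration, not the
real Fcall messages or devmnt code:

```c
/* Toy model of dependent 9p requests grouped under one tag.
 * Invented types for illustration; not the real wire format. */
#include <assert.h>

enum { Twalk, Twstat, Tclunk };

typedef struct Req Req;
struct Req {
	int type;	/* Twalk, Twstat, or Tclunk */
	int tag;	/* a shared tag marks the batch as dependent */
	int fid;	/* fid the request operates on */
};

/* build the walk/wstat/clunk triple for a single wstat */
int
batchwstat(Req *b, int tag, int rootfid, int newfid)
{
	b[0] = (Req){Twalk, tag, rootfid};	/* walk from the root, yielding newfid */
	b[1] = (Req){Twstat, tag, newfid};	/* wstat the new fid */
	b[2] = (Req){Tclunk, tag, newfid};	/* clunk it, no longer needed */
	return 3;
}

/* a batch is well formed when every request carries the same tag */
int
sametag(Req *b, int n)
{
	int i;

	for(i = 1; i < n; i++)
		if(b[i].tag != b[0].tag)
			return 0;
	return 1;
}
```

The point is only that the server can recognize the shared tag and
treat the three messages as one dependent unit, aborting the rest if
an earlier one fails.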
These three requests would all be sent together to the server,
under the same tag, to be labeled as dependent. To facilitate
this, devmnt was changed to allow each process to have a number
of RPCs in flight at once. The real change at the protocol
level was this batching, along with moving open-with-truncation
to be just a create instead of an open. The latter is a
seemingly general (but good) bugfix.
As an implementation detail, the new devmnt API was explained
in terms of promises: each RPC request generates a "promise"
that can be blocked on later, once the result is needed.
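A minimal model of that promise shape, with invented names (the real
interface is in devmnt.c, which I haven't reproduced here): issuing an
RPC returns immediately, and the caller only blocks when it needs the
result. Here the "server" is just a deferred function call:

```c
/* Toy model of the promise-style devmnt interface: issue an RPC,
 * get a promise back at once, block on it later.  All names here
 * are invented for illustration. */
#include <assert.h>

typedef struct Promise Promise;
struct Promise {
	int done;		/* has the reply arrived? */
	int result;		/* reply value, valid once done */
	int (*work)(int);	/* deferred RPC, stands in for the server */
	int arg;
};

/* send the request; do not wait for the reply */
Promise
issue(int (*work)(int), int arg)
{
	return (Promise){0, 0, work, arg};
}

/* block until the reply arrives; here we just run the thunk */
int
await(Promise *p)
{
	if(!p->done){
		p->result = p->work(p->arg);
		p->done = 1;
	}
	return p->result;
}

/* sample "RPC" for demonstration */
int
twice(int x)
{
	return 2*x;
}
```

The win is that several promises can be issued back to back and only
awaited at the end, which is exactly what the batching needs.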
For the logistics of batching on the kernel side, this was
accomplished using their 'later' device, essentially a lazy
evaluation of a channel to a mount point. The later device
hands its own Channels back from walks, waiting for the real
use of the fid to be given by userspace. Once it is known
what the user is doing with the file (wstat, stat),
the walk, action, and (possibly) clunk are all sent together.
The later device helps keep this transparent to the rest of the kernel.
The last thing discussed is the read ahead and write behind. Now that
the kernel can submit dependent concurrent 9p messages, we can use them
for actual IO. The ability to read ahead on a mount point was signaled
by the user through the MCACHE mount option, which allowed the kernel to
eagerly fill the cache with readahead calls. The cache maintained an idea
of the last offset of the file known to contain data, and would attempt
to service requests under that offset from within the cache. It interpreted
a short read to indicate EOF.
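Roughly, the cache policy as I understand it, sketched with invented
names and a fixed block size; the real mntcache is more involved:

```c
/* Toy model of the readahead cache policy: remember the highest
 * offset known to hold data, serve reads below it from the cache,
 * and treat a short read from the server as EOF.  Invented names;
 * a simplification, not the real mntcache. */
#include <assert.h>
#include <string.h>

enum { Blksz = 8 };	/* readahead request size in this model */

typedef struct Cache Cache;
struct Cache {
	char data[64];
	long known;	/* bytes known valid, advanced by readahead */
	int eof;	/* a short read from the server marked EOF */
};

/* a readahead reply of n bytes arriving for offset off */
void
fill(Cache *c, long off, char *buf, int n)
{
	memmove(c->data+off, buf, n);
	if(off+n > c->known)
		c->known = off+n;
	if(n < Blksz)	/* short read => end of file */
		c->eof = 1;
}

/* serve a read from the cache; -1 means a miss (go to the server) */
int
cacheread(Cache *c, long off, char *buf, int n)
{
	if(off >= c->known)
		return c->eof ? 0 : -1;
	if(off+n > c->known)
		n = c->known-off;
	memmove(buf, c->data+off, n);
	return n;
}
```

This is also where the synthetic-vs-disk question below matters: this
policy is only safe if the file's contents don't change out from under
the cache.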
The write behind is similar, but requested by the userspace program with
a magic open flag, OBEHIND. This allowed the kernel to submit write RPC
requests without waiting for their responses before returning to
userspace. It was accompanied by a new fdflush system call, for a
userspace program to sync and block on outstanding write RPCs.
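A sketch of the bookkeeping that seems to be implied (the queue counter
and deferred error are my invention; OBEHIND and fdflush are the names
from the paper):

```c
/* Toy model of write behind: writes return at once; any error
 * surfaces later, at flush time.  Invented stand-in, not the
 * actual kernel code. */
#include <assert.h>

typedef struct Wq Wq;
struct Wq {
	int pending;	/* writes issued but not yet acknowledged */
	int err;	/* first error reported by the server */
};

/* queue a write of n bytes; do not wait for the server's reply */
int
writebehind(Wq *q, int n)
{
	if(q->err)
		return -1;	/* an earlier write already failed */
	q->pending++;
	return n;	/* reported as success before it is durable */
}

/* the server acknowledges one outstanding write, possibly with an error */
void
ack(Wq *q, int err)
{
	q->pending--;
	if(err && q->err == 0)
		q->err = err;
}

/* model of fdflush: in the kernel this would sleep until pending
 * reaches 0, then report any deferred error to the caller */
int
wqflush(Wq *q)
{
	assert(q->pending == 0);	/* stand-in for blocking */
	return q->err ? -1 : 0;
}
```

This is where the ugliness discussed below shows up: by the time
wqflush reports the error, the caller's original buffer may be long
gone.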
Not specifically mentioned here (but detailed in the paper) is the rework
that was done to the mount table and to how Chans keep their current path
stored. In general it seemed like this was a simplification following from
their removal of client-submitted ".." walks. But it's a bit hard for me
to fully conceptualize these changes without having dug into the code.
Like I said, a lot of ideas in here. So let's start chewing through them.
The concept of the later device, and the RPC batching in general, mostly
makes sense. Specifically, this 'later' device seems like a great mechanism
for doing this. There is a bit of a shortcoming in how the later device has
to handle channels that are part of a union: if a union is crossed, we can
no longer defer these walks; they have to be evaluated now. Of course the
underlying channel used for the mount point itself could refer to a channel
given by this 'later' device, but the details are fuzzy here.
Of course the batching itself relies on the devmnt changes permitting
multiple in-flight RPC requests by a single process, along with the
server's ability to receive multiple dependent operations concurrently.
These, as a first step, I find hard to disagree with.
The read ahead also sounds nice. When we were discussing at the hackathon,
there was a desire for the kernel to know whether a file was 'synthetic'
(ala /dev/kbd) or a 'disk' file, something exported from cwfs, hjfs,
fossil, etc. We had discussed the server providing this within the QID
type; the approach here seems to infer that an entire mount point contains
'disk' files through the MCACHE option. The issue I take with this is that
the 9p server isn't giving this information; it is being inferred from how
the user mounted it. That seems to put the information in the wrong place:
I think it is a bit strange to expect a user to know whether a file server
is going to serve 'synthetic' files or not. I would much prefer the server
give this type of information itself.
For the write behind, I am not convinced this is the "write" direction. A
magic open flag and a magic system call to sync seem too specific. It
seems it would be difficult for a program to know, given just a path,
whether write behind is appropriate or not. Not to mention that using
fdflush requires quite a few code changes. When ori and I were discussing
these kinds of 'deferred' system calls, our benchmark for design was:
"What would it look like to support this in cat?", and the error handling
here gets ugly. If you are using write behind you can't toss what you
wrote away, because the server could error and you need that data back to
submit the next request. Cat would have to be changed to keep the data it
read() around until it knows for sure that data has been committed by the
file server. To me this just seems like a complicated kernel dance for
doing what a Bio writer can accomplish purely in userspace. The key
difference is that the Bio method only works for sequential writes, not
random writes. But I don't think we're missing much there.
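For comparison, the userspace alternative: a minimal buffered writer in
the spirit of Bio's Bwrite (plain C with invented names, not the real
bio(2) interface), which gets the same batching for sequential writes
with no kernel changes:

```c
/* Minimal userspace write buffering: sequential writes accumulate
 * and go out in large units.  Invented stand-in for illustration;
 * the flush here just counts bytes instead of calling write(2). */
#include <assert.h>
#include <string.h>

enum { Bufsz = 8 };	/* tiny buffer so the example is easy to trace */

typedef struct Bw Bw;
struct Bw {
	char buf[Bufsz];
	int n;		/* bytes currently buffered */
	long flushed;	/* total bytes pushed to the (imaginary) server */
};

/* push the buffered bytes out in one unit; returns total flushed */
long
bwflush(Bw *b)
{
	b->flushed += b->n;	/* a real version would write(2) here */
	b->n = 0;
	return b->flushed;
}

/* buffered sequential write; flushes whenever the buffer fills */
int
bwrite(Bw *b, char *p, int n)
{
	int m, tot;

	for(tot = 0; tot < n; tot += m){
		m = n-tot;
		if(m > Bufsz-b->n)
			m = Bufsz-b->n;
		memmove(b->buf+b->n, p+tot, m);
		b->n += m;
		if(b->n == Bufsz)
			bwflush(b);
	}
	return n;
}
```

The error-handling story is simpler too: the caller finds out about
failures at the flush it issued, while its data is still in hand.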
> the best possible linuxemu is Linux itself, running as User-Mode
> Linux, which ought to be possible with Linux syscall emulation
If one has Linux syscall emulation, then one already has the ability
to run Linux binaries. What would be the advantage of running UML
on top of that?
--
Aram Hăvărneanu
> The syscalls problem isn't that we don't know which ones to implement,
> it's that Linux creates and deprecates them at a rate faster than anyone
> can keep up with. I am positive this is the case worldwide, not just
> for us -- otherwise, things like Docker on OS X would not virtualize an
> entire OS just to make use of Linux containers. The engineering cost to
> virtualize a platform and boot Linux on it is lower than the engineering
> cost to keep up with whatever the hell the Linux horde throws at the
> kernel this week.
>
> khm
Indeed. Even Microsoft gave up emulating a Linux kernel in WSL and
instead virtualizes a Linux kernel in Hyper-V in WSL2.
I see no point trying to hammer on that giant mess and instead prefer
the effort be put into improving vmx and adding AMD-V support. Then
run whatever dumpster fire you want in the VM.
On 19/08/2022, Aram Hăvărneanu <aram.h@mgk.ro> wrote:
> If one has Linux syscall emulation, then one already has the ability
> to run Linux binaries. What would be the advantage of running UML
> on top of that?
Linux kernel modules, Linux filesystems... oh yeah, and you would only
need the ability to run *one* Linux binary (UML itself), in particular one
that isn't that much a moving target, although the stuff it links against
probably would still be.
> On 20 Aug 2022, at 05:52, Thaddeus Woskowiak <tswoskowiak@gmail.com> wrote:
>
>> The syscalls problem isn't that we don't know which ones to implement,
>> it's that Linux creates and deprecates them at a rate faster than anyone
>> can keep up with. I am positive this is the case worldwide, not just
>> for us -- otherwise, things like Docker on OS X would not virtualize an
>> entire OS just to make use of Linux containers. The engineering cost to
>> virtualize a platform and boot Linux on it is lower than the engineering
>> cost to keep up with whatever the hell the Linux horde throws at the
>> kernel this week.
>>
>> khm
>
> Indeed. Even Microsoft gave up emulating a Linux kernel in WSL and
> instead virtualize a Linux kernel in Hyper-V in WSL2.
To be fair, that was largely due to the impedance mismatch between Linux’
Unix-style syscalls and the NT kernel model causing performance problems.
That said, operating systems appear to have adapted to emulated hardware
better than applications have adapted to emulated operating systems.
d
> On Aug 19, 2022, at 2:08 AM, Steve Simon <steve@quintile.net> wrote:
>
> having worked on cinaps linux emu about 10 years ago, trying to catch up with linux I think you are absolutely correct here. sad but true.
>
>> The engineering cost to
>> virtualize a platform and boot Linux on it is lower than the engineering
>> cost to keep up with whatever the hell the Linux horde throws at the
>> kernel this week.
>>
>> khm
Would bsdemu be an easier target? Open, Free, or Net?
Quoth Xiao-Yong Jin <meta.jxy@gmail.com>:
>
> > On Aug 19, 2022, at 2:08 AM, Steve Simon <steve@quintile.net> wrote:
> >
> > having worked on cinaps linux emu about 10 years ago, trying to catch up with linux I think you are absolutely correct here. sad but true.
> >
> >> The engineering cost to
> >> virtualize a platform and boot Linux on it is lower than the engineering
> >> cost to keep up with whatever the hell the Linux horde throws at the
> >> kernel this week.
> >>
> >> khm
>
> Would bsdemu be an easier target? Open, Free, or Net?
Try it and see -- though OpenBSD regularly changes syscalls,
and doesn't guarantee ABI compatibility at all between versions.
> oh yeah, and you would only
> need the ability to run *one* Linux binary (UML itself),
And you wouldn't be emulating sockets, in particular, to do so.
Show us the code.

All this talk is a waste of time.

--
Aram Hăvărneanu
On Mon, Aug 22, 2022 at 10:54:35AM -0700, Aram Hăvărneanu wrote:
> Show us the code.
>
> All this talk is a waste of time.
>
> --
> Aram Hăvărneanu
the code is in the repo and ori was directly asked to produce all this
talk
Quoth ori@eigenstate.org:
> Quoth Jacob Moody <moody@mail.posixcafe.org>:
> > On 8/18/22 17:33, Stuart Morrow wrote:
> > > On 19/08/2022, Kurt H Maier <khm@sciops.net> wrote:
> > >> Where in the tree is it? Their commit history is a disaster area and
> > >> there doesn't seem to be any explicit reference to it, nor documentation
> > >> regarding it.
> > >
> > > Errrrr, devmnt?
> >
> > https://github.com/fjballest/nixMarkIV/blob/master/port/devmnt.c
> >
> > I think you're talking about this? Based on the context you seem
> > to be talking about it in.

It's also in his nix.markII repo. Doing a diff of devmnt.c found in
nix.markII and 9front's latest devmnt.c doesn't look too wild, but I
have yet to look at the other affected files mentioned in Nemo's
article.

> see also:
>
> https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.466.4876&rep=rep1&type=pdf

https://lsub.org/books-papers/ also has some related texts (including
the one ori mentioned).

I just saw https://linus.schreibt.jetzt/posts/qemu-9p-performance.html
on hackernews, which reminded me of this thread.