* Re: [9fans] A case of Mail locking up
[not found] <65B38D2B147CDBB9D2C2C8896965B064@0x80.stream>
@ 2025-06-21 15:37 ` ori
0 siblings, 0 replies; 3+ messages in thread
From: ori @ 2025-06-21 15:37 UTC (permalink / raw)
To: 9fans
First off, this is specific to 9front mail, so you may want to try
the 9front list.
Second, I'm a bit confused; all of our plumb file descriptors
are opened OREAD (except for plumbsendfd, which was unused).
We don't write to any ports.
The plumb ports we use:
seemail: a message has come in or changed state
showmail: we want to open a message
sendmail: we want to compose a message
send: (unused, we don't send plumb messages)
As far as I can tell, we don't send any messages from
the paths called from mbflush (the function that Put
invokes).
The most useful thing you could do would be to capture
stacks from all of the Mail procs that are hanging,
and send them; see lstk(1) or acid.
Quoth Nicola Girardi via 9fans <9fans@9fans.net>:
> Hi all,
>
> I have a curious problem with Mail hanging when attempting to write to
> the seemail port (that's according to ratrace) after marking a message
> deleted and middle-clicking Put.
>
> AFAICS, on the one hand, the Put causes a receive in mbmain (on the
> Cevent channel), which causes a write to plumbsend, which should cause
> a read on the seemail proc, which would try to send a message to the
> Cseemail channel, but that can't be received as mbmain is busy
> already; so I'd be tempted to think this may be a deadlock scenario.
> On the other hand, though, this can't be the explanation as such a
> deadlock would've been reported by anyone ever deleting a message… so
> I'm more inclined to think I'm doing something wrong. Which brings me
> to:
>
> My setup:
>
> - 9front (not the latest, but I've seen this behavior for ages; I just
> didn't have the knowledge to even start troubleshooting then)
> - upas/fs launched in lib/profile, after plumber, before rio
> - acme started from riostart
>
> Just in case it's relevant, though I wouldn't think so:
> - booted as QEMU amd64 instance
> - used via drawterm
> - a bespoke unusual 9P fs as the root
>
> Lastly, and what makes me most curious, is that if I change my setup
> to this:
>
> - add -s to the upas/fs command in lib/profile
> - use Local mount /srv/upasfs.ng /mail/fs in acme
> - use Mail in acme
>
> such lock-up does not happen.
>
> I'm grateful if anyone can help dissipate my confusion. :-)
>
------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T92af154d081c9c25-Mb2be9c89a9810300e7a2420e
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription
* Re: [9fans] A case of Mail locking up
2025-06-21 22:42 ` ori
@ 2025-06-22 8:09 ` Nicola Girardi via 9fans
0 siblings, 0 replies; 3+ messages in thread
From: Nicola Girardi via 9fans @ 2025-06-22 8:09 UTC (permalink / raw)
To: 9fans
Quoth ori@eigenstate.org:
> Interesting. What's the plumber up to?
Bingo! The plumber was checking whether /mail/fs/.git was a
directory. This was due to a plumbing rule I use to visualize commits,
which, at the time of the test, read:
    type is text
    arg isdir $wdir/.git
    data matches '^[0-9a-f]+$'
    plumb start window -dx 1024 -dy 1280 rc -c '''cd '$wdir' ; git/export '$0' | vdiff'''
I swapped the isdir and data tests so this rule won't match seemail
messages, and the lock up doesn't happen anymore. And it explains why
mounting /srv/upasfs.ng in acme's namespace entailed different
behavior. In that case, /mail/fs wasn't in the plumber namespace so
the isdir test would not interact with upas/fs. Glad to understand a
bit more what's happening.
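The effect of swapping the two tests can be sketched with a toy model
in Python. This is not plumber code; it only assumes, per plumb(6),
that the patterns of a rule are tried in order and evaluation stops at
the first failure. The function names, the message dictionary and the
data value "/mail/fs/mbox/9" are made up for illustration.

```python
import re

HEX = re.compile(r'^[0-9a-f]+$')

def type_is_text(msg, stat_dir):
    return msg["type"] == "text"

def arg_isdir_git(msg, stat_dir):
    # Stands in for "arg isdir $wdir/.git": the only step that
    # touches the file system (here, the one served by upas/fs).
    return stat_dir(msg["wdir"] + "/.git")

def data_matches_hex(msg, stat_dir):
    return HEX.match(msg["data"]) is not None

def paths_checked(patterns, msg):
    """Evaluate patterns in order; return the paths that were stat'ed."""
    checked = []

    def stat_dir(path):
        checked.append(path)
        return False  # assumption: there is no .git directory there

    for pattern in patterns:
        if not pattern(msg, stat_dir):
            break  # first failing pattern ends evaluation of this rule
    return checked

# Hypothetical seemail-like message; the data value is an assumption.
seemail = {"src": "mailfs", "dst": "seemail", "wdir": "/mail/fs",
           "type": "text", "data": "/mail/fs/mbox/9"}

original = [type_is_text, arg_isdir_git, data_matches_hex]
swapped = [type_is_text, data_matches_hex, arg_isdir_git]

print(paths_checked(original, seemail))  # ['/mail/fs/.git']
print(paths_checked(swapped, seemail))   # []
```

With the original order the model reaches the isdir step and looks at
/mail/fs/.git; with the swapped order the data test fails first and the
file system is never consulted.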
This debugging session also highlighted a hidden assumption that I
had, that if a plumbing message had a destination port, it would go to
whoever has that port open for reading, and no rules would be matched.
I see now in plumb(6) that sending the message to the port is the
fallback if no rule matches, quite the opposite of what I'd assumed!
The behavior I was expecting would be ensured by changing the basic
plumbing to the below (just tested). This seems more sensible to me.
    dst is 'seemail'
    plumb to seemail
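A rough Python model of the difference, assuming only what plumb(6)
says about ordering: rule sets are tried in sequence, the first one
whose patterns all match wins, and delivery to the dst port is the
fallback when nothing matches. The route function, the rule tuples and
the action labels are hypothetical names for this sketch, not plumber
internals.

```python
import re

def route(rule_sets, msg):
    """Return (action, names of the rule sets that were evaluated)."""
    evaluated = []
    for name, patterns, action in rule_sets:
        evaluated.append(name)
        if all(pattern(msg) for pattern in patterns):
            return action, evaluated
    # Fallback: no rule set matched, so deliver to the port in dst, if any.
    if msg.get("dst"):
        return "plumb to " + msg["dst"], evaluated
    return None, evaluated

# Simplified stand-in for the commit-viewing rule (no file system access).
git_rule = ("git",
            [lambda m: m["type"] == "text",
             lambda m: re.match(r'^[0-9a-f]+$', m["data"]) is not None],
            "start commit viewer")

# Stand-in for a dedicated rule placed ahead of everything else.
seemail_rule = ("seemail",
                [lambda m: m["dst"] == "seemail"],
                "plumb to seemail")

# Hypothetical message; the data value is an assumption.
msg = {"dst": "seemail", "type": "text", "data": "/mail/fs/mbox/9"}

print(route([git_rule], msg))                # ('plumb to seemail', ['git'])
print(route([seemail_rule, git_rule], msg))  # ('plumb to seemail', ['seemail'])
```

Both configurations deliver to seemail in the end, but only the second
one does so without evaluating any unrelated rule on the way.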
There were other hidden assumptions in my original interpretation of
the syscall and stack traces. (a) I thought Mail would not be able to
receive the seemail message while in doevent(), but I now see that was
wrong; it would receive the message just fine, in its dedicated proc.
Then it would be blocked on sending to the seemail channel until
doevent() returned, but that's fine in this upas+Mail scenario. (So,
I guess, a properly timed seemail message between the acme Put and
upas's seemail message would cause a similar lock up.) (b) Lastly, I'd
assumed the plumber to be just passing through the bytes, I had
completely abstracted it out!
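Put together, the lockup looks like a cycle in a wait-for graph. The
Python sketch below encodes my reading of the traces: Mail blocked
writing to /mail/fs/ctl, upas/fs blocked writing to the plumber, and
the plumber blocked on a file served by upas/fs. The find_cycle helper
and the dictionaries are illustrative assumptions, not anything taken
from the sources.

```python
def find_cycle(waits_for, start):
    """Follow 'X waits for Y' edges from start; return the cycle, if any."""
    seen = []
    node = start
    while node in waits_for:
        if node in seen:
            return seen[seen.index(node):]
        seen.append(node)
        node = waits_for[node]
    return None  # the chain ended at a process that is not blocked

# Plumber shares a namespace with /mail/fs: the isdir test reaches upas/fs.
hung = {
    "Mail": "upas/fs",     # write of "delete mbox 9" to /mail/fs/ctl
    "upas/fs": "plumber",  # write of the seemail message
    "plumber": "upas/fs",  # isdir check on /mail/fs/.git
}

# /mail/fs mounted only in acme's namespace: the plumber never asks upas/fs.
working = {
    "Mail": "upas/fs",
    "upas/fs": "plumber",
}

print(find_cycle(hung, "Mail"))     # ['upas/fs', 'plumber']
print(find_cycle(working, "Mail"))  # None
```

Mail itself is not part of the cycle; it is just stuck behind it, which
matches the stack traces showing it waiting in mbflush().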
Thanks for the nudges Ori, much appreciated.
--
Nico
------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T92af154d081c9c25-Mbc0fb0d2275b9701c1100550
* Re: [9fans] A case of Mail locking up
[not found] <A3C02B3EDEE108AA9867CB114B863D2E@0x80.stream>
@ 2025-06-21 22:42 ` ori
2025-06-22 8:09 ` Nicola Girardi via 9fans
0 siblings, 1 reply; 3+ messages in thread
From: ori @ 2025-06-21 22:42 UTC (permalink / raw)
To: 9fans
Quoth Nicola Girardi via 9fans <9fans@9fans.net>:
> Quoth Nicola Girardi via 9fans <9fans@9fans.net>:
> > The process stuck writing to the seemail port is indeed upas/fs:
> >
> > 312 fs Pwrite 21a04c 5 0x44c160/"mailfs.seemail./mail/fs.text.filetype=mail.sender=commits@git.9f" 205 -1
> >
> > This makes sense, because the process that needs to receive the
> > message is Mail, which is itself stuck, here are the last few lines
> > from its trace:
> >
> > 403 Mail Open 21ac1c 0x40e000/"/mail/fs/ctl" 0x1 = 16 "" 173579477431 173584571672
> > [... omitted two successful writes to other fids ...]
> > 403 Mail Pwrite 20ace8 16 0x472c78/"delete.mbox.9" 13 -1
> >
> > which suggests that Mail is waiting for upas/fs to receive the message
> > written to the control file, so they're deadlocked; I'll test this hypothesis.
> >
> > For comparison, here's what the trace would look like when Mail does
> > not get stuck:
> >
> > 415 Mail Pwrite 20ace8 16 0x472c78/"delete.mbox.6" 13 -1 = 13 "" 163054398681 163274619092
> > 415 Mail Close 21ac3a 16 = 0 "" 163275175643 163275183465
> >
> > > The most useful thing you could do would be to capture
> > > stacks from all of the Mail procs that are hanging,
> > > and send them; see lstk(1) or acid.
> >
> > Okay, thanks for the pointers, Ori. I'll send this reply while I have a
> > working Mail+upas/fs pair! and will check stack traces later.
>
> AFAICS the stack traces support that theory. Mbmain() processes both
> seemail messages (from upas/fs) and events (from acme). Maybe those
> could be different procs?
>
> acid: lstk() # Mail
> pwrite(a0=0x10)+0xe /sys/src/libc/9syscall/pwrite.s:6
> write(buf=0x472c78,n=0x200000000d)+0x27 /sys/src/libc/9sys/write.c:7
> _fmtFdFlush(f=0x472d78)+0x3a /sys/src/libc/fmt/vfprint.c:15
> vfprint(args=0x472e20,fmt=0x405d7a)+0x6f /sys/src/libc/fmt/vfprint.c:31
> fprint(fmt=0x405d7a)+0x22 /sys/src/libc/fmt/fprint.c:13
> mbflush()+0x15a /sys/src/cmd/upas/Mail/mbox.c:733
> doevent(ev=0x40d460)+0x11f /sys/src/cmd/upas/Mail/mbox.c:990
> mbmain(cmd=0x40ee60)+0x1fc /sys/src/cmd/upas/Mail/mbox.c:1039
> launcheramd64(arg=0x40ee60,f=0x20270b)+0x10 /sys/src/libthread/amd64.c:11
> 0xfefefefefefefefe ?file?:0
>
> And here's upas/fs sending to the seemail port in the proc that
> processes the commands:
>
> acid: lstk() # upas/fs
> pwrite(a0=0x5)+0xe /sys/src/libc/9syscall/pwrite.s:6
> write(buf=0x44a7a0,n=0x7fff000000c4)+0x27 /sys/src/libc/9sys/write.c:7
> myplumbsend(fd=0x5,m=0x7fffffffc7b0)+0x47 /sys/src/cmd/upas/fs/mbox.c:1573
> mailplumb(m=0x455370,mb=0x4387f0)+0x3d4 /sys/src/cmd/upas/fs/mbox.c:1650
> syncmbox(mb=0x4387f0,doplumb=0x1)+0x177 /sys/src/cmd/upas/fs/mbox.c:105
> delmessages(ac=0x2,av=0x7fffffffcb90)+0x115 /sys/src/cmd/upas/fs/mbox.c:1107
> rwrite(f=0x1ad)+0x285 /sys/src/cmd/upas/fs/fs.c:1238
> io()+0x1e9 /sys/src/cmd/upas/fs/fs.c:1436
> main(argc=0x0,argv=0x7fffffffef70)+0x31a /sys/src/cmd/upas/fs/fs.c:353
> _callmain+0x38 /sys/src/libc/9sys/callmain.c:21
>
Interesting. What's the plumber up to?
------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T92af154d081c9c25-M7ac62968df77959af15c7bcc