9fans - fans of the OS Plan 9 from Bell Labs
 help / color / mirror / Atom feed
From: Jacob Moody <moody@posixcafe.org>
To: 9fans@9fans.net
Subject: Re: [9fans] syscall silently kill processes
Date: Fri, 17 Jun 2022 13:28:57 -0600	[thread overview]
Message-ID: <03831b77-a3a4-5be2-aabc-7b3c2997c4b1@posixcafe.org> (raw)
In-Reply-To: <ba26123ccd161fc31564fd1f8f275bbf74bd876b.camel@gmail.com>

On 6/17/22 12:48, andrey100100100@gmail.com wrote:
> В Пт, 17/06/2022 в 10:11 -0600, Jacob Moody пишет:
>> On 6/17/22 09:06, andrey100100100@gmail.com wrote:
>>> В Пт, 17/06/2022 в 08:11 -0600, Jacob Moody пишет:
>>>> On 6/17/22 07:46, Thaddeus Woskowiak wrote:
>>>>> I believe threadnotify() should be called from threadmain() to
>>>>> properly register the handler in the rendez group
>>>>
>>>> This is incorrect, according to thread(2):
>>>>
>>>> "The thread library depends on all procs
>>>> being in the same rendezvous group"
>>>
>>>
>>> From sleep(2):
>>>
>>>     Alarm causes an alarm note (see notify(2)) to be sent to the
>>>     invoking process after the number of milliseconds given by
>>>     the argument.
>>>
>>> Mean to be sent only to the invoking process, NOT to the process
>>> group.
>>
>> Yes this is correct, If I implied otherwise I apologize. My point
>> with
>> pointing out that excerpt is that groups likely had nothing to do
>> with this.
>>
>>>>
>>>> The issue here is that your note handler has to call noted,
>>>> you are returning from the handler without actually resuming the
>>>> program.
>>>> You either need to call noted(NCONT) to resume execution or
>>>> noted(NDFLT)
>>>> to stop execution.
>>>>
>>>> An excerpt from notify(2):
>>>>
>>>> "A notification handler must finish either by exiting the
>>>> program or by calling noted; if the handler returns the
>>>> behavior is undefined and probably erroneous."
>>>>
>>>> So you are indeed observing undefined behavior.
>>>>
>>>
>>> With:
>>>
>>> ------------------------------------
>>> static int
>>> handler_alarm(void *, char *msg)
>>> {
>>>         if(strstr(msg, "alarm")){
>>>                 noted(NCONT);
>>>                 return 1;
>>>         }
>>>
>>>         return 0;
>>> }
>>> ------------------------------------
>>> result the same:
>>>
>>> cpu% 6.out | grep end | wc -l
>>>      33
>>>
>>>
>>> And noted(NCONT) may be needed, when process recieved many (2 and
>>> more)
>>> notes at once.
>>>
>>> May be something wrong  with interrupted an incomplete  system
>>> call?
>>
>> You _always_ should call either noted(NCONT) or noted(NDFLT).
> 
> But from atnotify(2) (section 'Atnotify'):
> 
>                                                   When the system
>           posts a note to the process, each handler registered with
>           atnotify is called with arguments as described above until
>           one of the handlers returns non-zero.  Then noted is called
>           with argument NCONT.  If no registered function returns
>           non-zero, atnotify calls noted with argument NDFLT.
> 
> from /sys/src/libc/port/atnotify.c :
> 
> --------------------------------
> static
> void
> notifier(void *v, char *s)
> {
>         int i;
> 
>         for(i=0; i<NFN; i++)
>                 if(onnot[i] && ((*onnot[i])(v, s))){
>                         noted(NCONT);
>                         return;
>                 }
>         noted(NDFLT);
> }
> --------------------------------
> 
> Seems like noted() call not needed in user code.
> 

Oh look at that, my apologies. That's the whole difference
between just notify() and atnotify(), how did I miss that.

>> But you are correct in that this wasn't the exact issue. I poked
>> around with the code a bit. I rewrote it to just use
>> fork(), and I got all 80 "end" messages. 
> 
> Yes, with fork() is working:
> 
> ------------------------------------------------------------------
> #include <u.h>
> #include <libc.h>
> 
> static int
> handler_alarm(void *, char *msg)
> {
>         if(strstr(msg, "alarm"))
>                 return 1;
> 
>         return 0;
> }
> 
> static void
> proc_udp(void *)
> {
>         char resp[512];
>         char req[] = "request";
>         int fd;
> 
>         atnotify(handler_alarm, 1);
> 
>         if((fd = dial("udp!185.157.221.201!5678", nil, nil, nil)) >=
> 0){
>                 if(write(fd, req, strlen(req)) == strlen(req)){
>                         fprint(1, "start\n");
>                         alarm(2000);
>                         read(fd, resp, sizeof(resp));
>                         alarm(0);
>                         fprint(1, "end\n");
>                 }
>                 close(fd);
>         }
> 
> }
> 
> void
> main(int argc, char *argv[])
> {
>         for(int i = 0; i < 80; i++){
>                 switch(fork()){
>                 case -1:
>                         sysfatal("fork: %r");
>                 case 0:
>                         proc_udp(nil);
>                         exits(nil);
>                 }
>         }
> 
>         sleep(5000);
>         exits(nil);
> }
> ------------------------------------------------------------------
> 
> cpu% 6.out | grep end | wc -l
>      80
> 
> But with rfork(RFPROC|RFMEM|RFNOWAIT) (the same, how in proccreate)
> not:
> 
> cpu% 6.out | grep end | wc -l
>       6
> 
> strange...
> 
>> So I suspected
>> libthread had some arbitrary limit:
>>
>> #define NFN             33
>> #define ERRLEN  48
>> typedef struct Note Note;
>> struct Note
>> {
>>         Lock            inuse;
>>         Proc            *proc;          /* recipient */
>>         char            s[ERRMAX];      /* arg2 */
>> };
>>
>> static Note     notes[128];
>> static Note     *enotes = notes+nelem(notes);
>> static int              (*onnote[NFN])(void*, char*);
>> static int              onnotepid[NFN];
>> static Lock     onnotelock;
>>
>> int
>> threadnotify(int (*f)(void*, char*), int in)
>> {
>>         int i, topid;
>>         int (*from)(void*, char*), (*to)(void*, char*);
>>
>>         if(in){
>>                 from = nil;
>>                 to = f;
>>                 topid = _threadgetproc()->pid;
>>         }else{
>>                 from = f;
>>                 to = nil;
>>                 topid = 0;
>>         }
>>         lock(&onnotelock);
>>         for(i=0; i<NFN; i++)
>>                 if(onnote[i]==from){
>>                         onnote[i] = to;
>>                         onnotepid[i] = topid;
>>                         break;
>>                 }
>>         unlock(&onnotelock);
>>         return i<NFN;
>> }
>>
>> That
>>
>> #define NFN 33
>>
>> seems like the culprit. Looks like if you checked
>> the return value of threadnotify you would have seen
>> your notes handler was not registered.
> 
> Very impotant note about the return value of threadnotify.
> Thanks. My mistake.
> So it seemed to me that the processes silently fall.
> 
>>
>> Now as to why this limit is so low, I am not sure. Perhaps
>> it should be bumped up.
>>
> 
> Funny, in /sys/src/libc/port/atnotify.c:4 the same limit:
> 
> #define NFN     33
> 
> 
> But in case of using fork() this limit does not affect.
> Maybe becose RFMEM not set and and there can be 33 handlers per child.

This is exactly it, when you fork with RFMEM you share that global function
array. Thus all of your children are competing for one of those 33 slots.
Without it, it is exactly as you say, each child gets its own array.

Thanks for doing some digging in to the code in libc. I would have completely missed that.
Either way it makes sense to me that these limits should be bumped.

>>
>> Thanks,
>> moody
>>
>>
> 
> 
> I did not find other way to interrupt the stalled system call
> in a program, other than to send a signal to the process.
> 
> Maybe there is a better way...

You are correct in thinking alarm() is how you'd interrupt a system
call. That is to say, I am not aware of a better way of doing it right now.

But I think you're correct in wanting something a bit better.
I've never looked at code using alarm lovingly. Perhaps its
time to kick some ideas around for a better way of signaling
io timeouts other then alarm(). To me, it makes sense to tie
them to the fd itself rather then the process as a whole.
Perhaps through the /fd/*ctl interface?

> 
> Many thanks Jacob Moody and Skip Tavakkolian for showing me the light.

Thank you for using the system and taking some time to report and dig
in to bugs :D

I also wanted to tell you that 9front does have its own mailing
list for bugs and patches. If you are actively working with the
system you might find the discussions there useful.

http://lists.9front.org/


Thanks,
moody

------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/Tfa6823048ad90a21-M5173390655895122a91157bf
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription

  reply	other threads:[~2022-06-17 19:30 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-17  9:37 andrey100100100
2022-06-17 13:46 ` Thaddeus Woskowiak
2022-06-17 14:11   ` Jacob Moody
2022-06-17 14:39     ` Thaddeus Woskowiak
2022-06-17 15:06     ` andrey100100100
2022-06-17 16:08       ` Skip Tavakkolian
2022-06-17 16:11         ` Skip Tavakkolian
2022-06-17 16:16           ` Skip Tavakkolian
2022-06-17 17:42             ` adr
2022-06-17 16:11       ` Jacob Moody
2022-06-17 18:48         ` andrey100100100
2022-06-17 19:28           ` Jacob Moody [this message]
2022-06-17 21:15           ` adr
2022-06-18  6:40             ` andrey100100100
2022-06-18  8:37               ` adr
2022-06-18  9:22                 ` adr
2022-06-18 12:53                   ` Jacob Moody
2022-06-18 22:03                     ` andrey100100100
2022-06-19  5:54                     ` adr
2022-06-19  6:13                       ` Jacob Moody
2022-06-18 22:22                   ` andrey100100100
2022-06-18 16:57                 ` andrey100100100
2022-06-19  2:40                   ` adr
2022-06-19  5:01                     ` adr
2022-06-19  8:52                       ` andrey100100100
2022-06-19 10:32                         ` adr
2022-06-19 11:40                           ` andrey100100100
2022-06-19 12:01                             ` andrey100100100
2022-06-19 15:10                           ` andrey100100100
2022-06-19 16:41                             ` adr
2022-06-19 21:22                               ` andrey100100100
2022-06-19 21:26                                 ` andrey100100100
2022-06-20  4:41                                 ` adr
2022-06-20  5:39                                   ` andrey100100100
2022-06-20  5:59                                   ` adr
2022-06-20 15:56                                     ` andrey100100100
2022-06-20 22:29                                       ` Skip Tavakkolian
2022-06-21  7:07                                         ` andrey100100100
2022-06-21 11:26                                           ` adr
2022-06-21 13:03                                             ` andrey100100100
2022-06-21 13:22                                               ` adr
2022-06-28 15:28                                                 ` adr
2022-06-28 16:43                                                   ` ori
2022-06-28 18:19                                                   ` adr
2022-06-28 18:28                                                     ` adr
2022-06-28 19:09                                                   ` andrey100100100
2022-06-28 19:42                                                     ` adr
2022-06-29 13:14                                                       ` adr
2022-06-21 13:47                                             ` andrey100100100
2022-06-21  7:22                                         ` adr

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=03831b77-a3a4-5be2-aabc-7b3c2997c4b1@posixcafe.org \
    --to=moody@posixcafe.org \
    --cc=9fans@9fans.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).