Re: [9fans] thread confusion

9fans - fans of the OS Plan 9 from Bell Labs

* Re: [9fans] thread confusion
@ 2005-09-21 14:44 Fco. J. Ballesteros
  2005-09-21 15:05 ` Axel Belinfante
  0 siblings, 1 reply; 16+ messages in thread
From: Fco. J. Ballesteros @ 2005-09-21 14:44 UTC (permalink / raw)
  To: 9fans

AFAIK, you must call threadnotify() to install a handler for your note.
If you don't do that, your process is killed (which is what you are
seeing right now).

You should probably install a handler that says 'it's ok, got the note'.
Use threadnotify to do this. 
I understand that you are interested in the "side effect" of interrupting
the I/O call.

It's funny, anyway, because I had the same problem a few days ago;
I had to abort a connection to a `Broken-maybe' file server. I tried not
to use interrupts, but I had nevertheless decided to wrap the call like
this:

alarm(x)
read()
alarm(0)

After I let Russ know, he (once more) suggested that I not use
interrupts and that I read the Alef paper (which I had read before, btw).
However, after thinking twice about it, I was able to avoid the interrupts.
(The process is kept there, it will sooner or later abort due to a broken
connection).

Thus, excuse me for suggesting this again ;-), have you tried not to use
interrupts? In your case, if "the other end" decides to give up, can't it
let you know so you could shutdown and restart in a clean way?

hth

* Re: [9fans] thread confusion
  2005-09-21 14:44 [9fans] thread confusion Fco. J. Ballesteros
@ 2005-09-21 15:05 ` Axel Belinfante
  2005-09-21 15:42   ` Russ Cox
  0 siblings, 1 reply; 16+ messages in thread
From: Axel Belinfante @ 2005-09-21 15:05 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> Thus, excuse me for suggesting this again ;-), have you tried not to use
> interrupts? In your case, if "the other end" decides to give up, can't it
> let you know so you could shutdown and restart in a clean way?

I have tried not to use interrupts.
I know when the other end wants to restart.

My essential problem seems to be to 'shutdown' this pending library call -
it will not time out by itself -- if indeed it does not react
to me closing the pipe, it will stay there forever.
But maybe I did something wrong there.

I'll think things through again (as I wrote: back to the drawing board).

Thanks for your reaction.

Axel.

* Re: [9fans] thread confusion
  2005-09-21 15:05 ` Axel Belinfante
@ 2005-09-21 15:42   ` Russ Cox
  2005-09-21 20:25     ` Axel Belinfante
  0 siblings, 1 reply; 16+ messages in thread
From: Russ Cox @ 2005-09-21 15:42 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> My essential problem seems to be to 'shutdown' this pending library call -
> it will not time out by itself -- if indeed it does not react
> to me closing the pipe, it will stay there forever.
> But maybe I did something wrong there.

You have to install a note handler, as Nemo said.
But leave the proc alone and figure out the pipe close
bug instead.  The most likely problem is that you haven't
actually closed the other side of the pipe completely.
For example, maybe you have forked a child who inherited
a copy of that fd, and that child is holding the pipe up.
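
A small sketch of the "somebody else still holds the pipe" case, in
POSIX C as an analog only (this is not Plan 9 code, and the function
names are made up). The extra reference is made with dup() to keep the
example in one process; a forked child that inherited the descriptor
would hold the pipe open in the same way.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Returns 1 if a read on fd would return right now (data or EOF),
 * 0 if it would block. */
int read_would_return(int fd)
{
	struct pollfd pfd;

	pfd.fd = fd;
	pfd.events = POLLIN;
	pfd.revents = 0;
	return poll(&pfd, 1, 0) > 0;
}

/* Hypothetical demo: a second descriptor for the write end keeps the
 * reader from seeing EOF. Returns 1 if things behave as described. */
int eof_needs_all_writers_closed(void)
{
	int p[2], extra, blocked_before, ready_after;
	char c;
	ssize_t n;

	if (pipe(p) < 0)
		return 0;
	extra = dup(p[1]);	/* the "forgotten" reference */
	assert(extra >= 0);

	close(p[1]);		/* close the obvious write end */
	blocked_before = !read_would_return(p[0]);	/* still no EOF */

	close(extra);		/* close the last write reference */
	ready_after = read_would_return(p[0]);		/* EOF now */
	n = read(p[0], &c, 1);	/* returns 0: end of file */

	close(p[0]);
	return blocked_before && ready_after && n == 0;
}
```

The reader only sees end of file once every descriptor for the other
end is gone, which is why the first thing to check is who else might
still have one.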

Programming with notes or signals is asking for trouble.
Always.  I'm sorry that threadint/threadkill are in the
library.

Russ

* Re: [9fans] thread confusion
  2005-09-21 15:42   ` Russ Cox
@ 2005-09-21 20:25     ` Axel Belinfante
  2005-09-21 20:32       ` Axel Belinfante
                         ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Axel Belinfante @ 2005-09-21 20:25 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> But leave the proc alone and figure out the pipe close
> bug instead.  The most likely problem is that you haven't
> actually closed the other side of the pipe completely.
> For example, maybe you have forked a child who inherited
> a copy of that fd, and that child is holding the pipe up.

the problem seems to be that at both ends of the pipe
a read is 'hanging'.
(one from tlsClient, and one from my own pipe reader proc)
If I in this situation close (both ends of) the pipe,
it doesn't work.

I have tried to mimic the situation in a small
test program, and there I found:

if I first close one randomly chosen end of the pipe,
and then do a zero-length write at the other end and
then close that end, it works.
It also works if I first close one end of the pipe,
and then do the zero-length write and close at the
other end.

is this a correct procedure, or would another be preferred?

> Programming with notes or signals is asking for trouble.

I do use a timer, by having a proc that repeatedly sleeps
and decrements a counter, and when the counter reaches
zero it sends a (nil) timeout message on a channel.
in alts I not only wait for the io channels but
also for the timer timeout channel.

the question is how to start and reset the timer.
I have been thinking about using channels for that
too, but that seems deadlock prone: how to avoid
the case where I want to send a reset message to
the timer when the timer wants to send an expiration
message to me?
could I work around that by having a timer thread
in the same process with the main thread(s) that use it,
and a separate clock proc (process) that does regular
sleeps to generate regular tick messages?
the timer thread forever does an alt to
 - receive timer start message (contains timer and timeout value)
 - receive timer reset message (contains timer)
 - receive tick message, triggers decrement of counters
   and sending of timeout message if counter reaches zero
the clock proc forever sleeps and sends tick messages
to the timer thread.
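
To make the three message kinds concrete, here is a rough sketch of the
bookkeeping such a timer thread could do, in plain C with the channel
plumbing left out. The names (Timers, timer_start, timer_reset,
timer_tick), the fixed table size, and the reading of "reset" as
"cancel" are all assumptions made for the sketch.

```c
#include <assert.h>

enum { NTIMER = 8 };	/* assumed fixed number of timers */

/* remaining[i] > 0: timer i is running with that many ticks left.
 * remaining[i] == 0: timer i is idle. */
typedef struct Timers {
	int remaining[NTIMER];
} Timers;

/* Handle a "start" message: arm timer id for nticks ticks. */
void timer_start(Timers *t, int id, int nticks)
{
	assert(id >= 0 && id < NTIMER && nticks > 0);
	t->remaining[id] = nticks;
}

/* Handle a "reset" message: disarm timer id.
 * Harmless if the timer is idle or has already expired. */
void timer_reset(Timers *t, int id)
{
	assert(id >= 0 && id < NTIMER);
	t->remaining[id] = 0;
}

/* Handle a "tick" message: decrement every running timer.
 * Returns a bitmask of the timers that expired on this tick; the
 * caller would send one timeout message per set bit. */
unsigned timer_tick(Timers *t)
{
	unsigned expired = 0;
	int i;

	for (i = 0; i < NTIMER; i++)
		if (t->remaining[i] > 0 && --t->remaining[i] == 0)
			expired |= 1u << i;
	return expired;
}
```

Because all three handlers run in the one thread that owns the table, a
reset that arrives just after an expiry only finds an idle timer. The
sketch does not address whether sending the timeout message can block.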

Axel.

* Re: [9fans] thread confusion
  2005-09-21 20:25     ` Axel Belinfante
@ 2005-09-21 20:32       ` Axel Belinfante
  2005-09-21 20:37       ` Russ Cox
  2005-09-26 18:40       ` rog
  2 siblings, 0 replies; 16+ messages in thread
From: Axel Belinfante @ 2005-09-21 20:32 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

To avoid repeating myself too much:

if I first close one randomly chosen end of the pipe,
and then do a zero-length write at the other end and
then close that end, it works.
It also works if I first do the zero-length write and close
at one end and then do the close at the other end

is this a correct procedure, or would another be preferred?

Axel.

* Re: [9fans] thread confusion
  2005-09-21 20:25     ` Axel Belinfante
  2005-09-21 20:32       ` Axel Belinfante
@ 2005-09-21 20:37       ` Russ Cox
  2005-09-21 22:34         ` Axel Belinfante
  2005-09-26 18:40       ` rog
  2 siblings, 1 reply; 16+ messages in thread
From: Russ Cox @ 2005-09-21 20:37 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> if I first close one randomly chosen end of the pipe,
> and then do a zero-length write at the other end and
> then close that end, it works.

By "it works" I assume you mean you get a zero-length
read out the other end.  But that's because you did a
zero-length write, not because the pipe is signaling EOF.

> It also works if I first close one end of the pipe,
> and then do the zero-length write and close at the
> other end.

Pipes are symmetric so this is good.

> is this a correct procedure, or would another be preferred?

What doesn't work?  Can you post a small test program
that doesn't make a blocked read fail when the other
end of the pipe is closed?  Again, it sounds like you're
not closing all the references to one end of the pipe.
If multiple programs have references to a pipe end,
they *all* need to close them.  Make sure that the
proc running tlsClient doesn't have a reference too.

> the question is how to start and reset the timer.

It depends how granular this timer is.  If we're talking
about something large like seconds, then it is reasonable
to have the timer proc just poll the channel with nbrecvp
for new work or cancellations after it ticks off each second.

Your alternate approach, with a tick stream, is also reasonable.

No matter which you use, the return channels that the timer
proc writes to should be buffered so that the timer proc never
blocks writing to them.

Russ

* Re: [9fans] thread confusion
  2005-09-21 20:37       ` Russ Cox
@ 2005-09-21 22:34         ` Axel Belinfante
  2005-09-21 22:44           ` Russ Cox
  0 siblings, 1 reply; 16+ messages in thread
From: Axel Belinfante @ 2005-09-21 22:34 UTC (permalink / raw)
  To: Russ Cox, Fans of the OS Plan 9 from Bell Labs

[-- Attachment #1: Type: text/plain, Size: 1085 bytes --]

> Again, it sounds like you're
> not closing all the references to one end of the pipe.
> If multiple programs have references to a pipe end,
> they *all* need to close them.  Make sure that the
> proc running tlsClient doesn't have a reference too.

Just checking: I assume by 'references' you mean
file descriptors in the process' fd table?

ehm... I just realized:
I create the procs with proccreate,
so all procs share the same references,
and when I close them in the main proc,
they go away in the other ones as well,
or so it seems, at least according to
cat /proc/*/fd.
nevertheless, without the zero-write
before one of the closes they just keep
hanging in the read, even when their fd
table no longer shows the pipe file
descriptors in any of the fd tables.

If I don't share the fd table,
and just give each of the sub processes
their own end of the pipe, I'll never be
able to close those from within those
processes, because they are both locked
up in a read (essentially waiting for each other).

I have attached a silly program.

Axel.


[-- Attachment #2: Type: text/plain, Size: 3820 bytes --]

#include <u.h>
#include <libc.h>
#include <thread.h>
enum {
	STACK = 16*2048,
};

typedef struct State {
	int id;
	int fd, tobeclosed;
	Channel *c;
} State;

static void
subproc(void *arg)
{
	State *m;
	char buf[1024];
	int n;

	m = arg;
	print("subproc tid=%d tobeclosed=%d\n", threadid(), m->tobeclosed);
	sleep(15000);
//	print(" subproc tid=%d pid=%d\n", m->id, threadpid(m->id));
//	close(m->tobeclosed);
//	sleep(15000);

	print("subproc writing fd=%d\n", m->fd);	/* n is not set yet */
	n = write(m->fd, buf, 5);
	print("subproc written fd=%d n=%d\n", m->fd, n);
	while((n = read(m->fd, buf, sizeof(buf))) > 0)
		print("subproc fd=%d] read n=%d\n", m->fd, n);
	print("subproc eof fd=%d n=%d\n", m->fd, n);
	sleep(15000);
	sendul(m->c, 0);
	print("exiting subproc tid=%d pid=%d\n", m->id, threadpid(m->id));
	threadexits(nil);
}

static void
mainproc(void *arg)
{
	State *m;
	char buf[1024];
	int n;

	m = arg;
	print("mainproc tid=%d tobeclosed=%d\n", threadid(), m->tobeclosed);
	sleep(15000);
//	print(" mainproc tid=%d pid=%d\n", m->id, threadpid(m->id));
//	close(m->tobeclosed);
//	sleep(15000);

	print("mainproc writing fd=%d\n", m->fd);	/* n is not set yet */
	n = write(m->fd, buf, 7);
	print("mainproc written fd=%d n=%d\n", m->fd, n);

	while((n = read(m->fd, buf, sizeof(buf))) > 0)
		print("mainproc fd=%d n=%d\n", m->fd, n);
	print("mainproc  fd=%d eof n=%d\n", m->fd, n);
	sleep(15000);
	sendul(m->c, 0);
	print("exiting mainproc tid=%d pid=%d\n", m->id, threadpid(m->id));
	threadexits(nil);
}

void
threadmain(int argc, char *argv[])
{
	State mainState, *m;
	State subState, *s;
	char buf[256];
	int i, j, k, l, n, N, maineof, subeof, ret, hang;
	int p[2];
	Alt a[] = {
	/*	 c			v		op   */
		{nil ,	nil,	CHANRCV},
		{nil,	nil,	CHANRCV},
		{nil,			nil,	CHANEND},
	};

	hang = 0;		//change to 1 to hang
	N = 1;

	m = &mainState;
	s = &subState;
	print("threadmain tid=%d pid=%d\n", threadid(), threadpid(threadid()));
	memset(m, 0, sizeof(State));
	m->c = chancreate(sizeof(ulong), 0);	/* written with sendul() */
	a[0].c = m->c;
	s->c = chancreate(sizeof(ulong), 0);
	a[1].c = s->c;

	for (l=0; l < N; l++) {
		print("for %d\n", l);
		if (pipe(p) < 0) {
			fprint(2, "pipe failed: %r\n");
			threadexitsall("pipe failed");
		}
		m->fd = p[0];
		m->tobeclosed = p[1];
		s->fd = p[1];
		s->tobeclosed = p[0];
	
	
//		m->id = procrfork(mainproc, m, STACK, RFFDG);
		m->id = proccreate(mainproc, m, STACK);
		print("started mainproc tid=%d pid=%d\n", m->id, threadpid(m->id));

//		s->id = procrfork(subproc, s, STACK, RFFDG);
		s->id = proccreate(subproc, s, STACK);
		print("started subproc tid=%d pid=%d\n", s->id, threadpid(s->id));

		sleep(60000);
		print("manthread after sleep\n");

		i = 0;
		j = 1;
		print("threadmain closing %d\n", p[j]);
		close(p[j]);
		print("threadmain closed %d\n", p[j]);

		if (!hang) {
			print("threadmain writing zero to %d\n", p[i]);
			n = write(p[i], buf, 0);
			print("threadmain write %d returns %d\n", p[i], n);
			sleep(15000);
		}

		print("threadmain closing %d\n", p[i]);
		close(p[i]);
		print("threadmain closed %d\n", p[i]);
		sleep(15000);

		maineof = 0;
		subeof = 0;
		while(!maineof || !subeof) {
			print("threadmain while alt maineof=%d subeof=%d\n", maineof, subeof);
			switch(ret = alt(a)){
			case 0:
				print("main eof\n");
				maineof = 1;
				break;
			case 1:
				print("sub eof\n");
				subeof = 1;
				break;
			default:
				print("should not happen ret=%d\n", ret);
				sysfatal("should not happen");
			}
		}
		print("threadmain while alt done\n");
	
	//	print("threadmain threadint mainid\n");
	//	threadint(m->mainid);
	//	print("threadmain done threadint mainid\n");

	}
	print("threadmain exiting\n");

	threadexits(nil);
}

* Re: [9fans] thread confusion
  2005-09-21 22:34         ` Axel Belinfante
@ 2005-09-21 22:44           ` Russ Cox
  0 siblings, 0 replies; 16+ messages in thread
From: Russ Cox @ 2005-09-21 22:44 UTC (permalink / raw)
  To: Axel Belinfante; +Cc: Fans of the OS Plan 9 from Bell Labs

It appears that your program, at its core, is doing this:

void
readproc(void *v)
{
    int fd;
    char buf[100];

    fd = (int)v;
    read(fd, buf, sizeof buf);
}

void
threadmain(int argc, char **argv)
{
    int p[2];

    pipe(p);
    proccreate(readproc, (void*)p[0], 8192);
    proccreate(readproc, (void*)p[1], 8192);
    close(p[0]);
    /* and here you expect the first readproc to be done */
    close(p[1]);
    /* and here the second */
}

Each read call is holding up a reference to its channel
inside the kernel, so that even though you've closed the fd
and removed the ref from the fd table, there is still a reference
to each side of the pipe in the form of the process blocked
on the read.

I've never been sure whether the implicit ref held during
the system call is good behavior, but it's hard to change.

In your case, writing 0 (or anything) makes the read
finish, releasing the last ref to the underlying pipe when
the system call finishes, and then everything cleans up
as expected.  So you've found your workaround, and now
we understand why it works.

Russ

* Re: [9fans] thread confusion
  2005-09-21 20:25     ` Axel Belinfante
  2005-09-21 20:32       ` Axel Belinfante
  2005-09-21 20:37       ` Russ Cox
@ 2005-09-26 18:40       ` rog
  2005-09-26 18:52         ` Russ Cox
  2005-09-27 10:12         ` Axel Belinfante
  2 siblings, 2 replies; 16+ messages in thread
From: rog @ 2005-09-26 18:40 UTC (permalink / raw)
  To: 9fans

> I do use a timer, by having a proc that repeatedly sleeps
> and decrements a counter, and when the counter reaches
> zero it sends a (nil) timeout message on a channel.
> in alts I not only wait for the io channels but
> also for the timer timeout channel.
> 
> the question is how to start and reset the timer.
> I have been thinking about using channels for that
> too, but that seems deadlock prone: how to avoid
> the case where I want to send a reset message to
> the timer when the timer wants to send an expiration
> message to me?

i think this is a good question.

i've found writing time-based code using the threads library to be
quite awkward.  it seems to me that there may be room for an extension
to help with writing this kind of code.

the difficulty with the plan 9 thread library (and with Limbo too) is
that sleep(2) exists in a different universe to channels, so one has
to use a separate process to bridge the gap.

but when this thread is sleeping, it's not possible to communicate
with it, so one needs another thread to act as an intermediary, and
one has to design the interface carefully - if possible one doesn't
want a separate process and thread for each thread that wishes to wait
for a little while.

this implies multiplexing access to the sleeping process between many
other threads, which, depending on the kind of access required
(one-shot?  repeating event?  no-faster-than?)  starts to make things
quite complex.

i haven't yet seen a nicely designed interface that starts to make
this kind of thing as easy as i think it could be.

Occam had "timer" variables which could be used like:

	TIMER time
	INT start
	SEQ
		time ? start
		ALT
			inputch ? val
				... no timeout, do something
			time ? AFTER start + 3 * Tickspersecond
				... time out after three seconds; do something else

this seems to me to be a nice solution - as long as this is
sufficiently lightweight, it's then easy to leverage this to build up
whatever other timer mechanisms one requires.

here are some attributes i'd like to see in a timing mechanism for the
thread library (however implemented):

	useful in many different scenarios.

	overhead and latency comparable with use of regular channels.

	capable of dealing with the full range of interval requests (sub-millisecond upwards).

	does not soak up CPU time when unused.

	reasonable accuracy.

	easy, robust and non-error-prone to use.

is something like this possible?

* Re: [9fans] thread confusion
  2005-09-26 18:40       ` rog
@ 2005-09-26 18:52         ` Russ Cox
  2005-09-26 19:20           ` rog
  2005-09-27 10:12         ` Axel Belinfante
  1 sibling, 1 reply; 16+ messages in thread
From: Russ Cox @ 2005-09-26 18:52 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

What if we did without the timer variable?

  alt {
  x = <-c => print("%d\n", x);
  timeout 1 => print("1 second passed\n");
  }

You could model the timer variable easily enough:

  t := time()+10;
  for(;;){
    alt {
    x = <-c => print("%d\n", x);
    timeout t-time() => print("10 seconds passed\n"); break;
    }
  }

which do you think would be more common?  I think the former,
hence the dropping of explicit timer variables.
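
The alt/timeout syntax above is a proposal, not something the thread
library provides. For comparison only, the nearest thing in portable C
is a wait that takes its timeout as an argument, as poll() does. The
sketch below is a POSIX analog with a made-up helper name
(wait_readable); a file descriptor stands in for the channel.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: wait until fd is readable or timeout_ms
 * milliseconds pass, whichever comes first.
 * Returns 1 if fd is readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd;
	int r;

	assert(timeout_ms >= 0);
	pfd.fd = fd;
	pfd.events = POLLIN;
	pfd.revents = 0;
	r = poll(&pfd, 1, timeout_ms);	/* the timeout is part of the wait itself */
	if (r < 0)
		return -1;
	return r > 0;
}
```

As in the first alt example, the caller states the timeout where it
waits and needs no separate timer variable or helper process.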

Once you've figured out what a good interface is, implementing
it is subtle and difficult to get right.  But it only needs to be done
right once and then everyone benefits.  Channel communication
is complicated too under the hood, but it's still a good abstraction.

Russ

* Re: [9fans] thread confusion
  2005-09-26 18:52         ` Russ Cox
@ 2005-09-26 19:20           ` rog
  0 siblings, 0 replies; 16+ messages in thread
From: rog @ 2005-09-26 19:20 UTC (permalink / raw)
  To: 9fans

> What if we did without the timer variable?

sure.  as you say, it's pretty much equivalent.  the only thing that
troubles me is that the Occam version uses absolute timestamps, where
yours are relative.

when dealing with short intervals of time, given that scheduling is
somewhat arbitrary, the difference between:

	# do something every microsecond (relative timeout)
	for(;;){
		alt{
		timeout 1 =>
			# do something
		}
	}

and:

	# do something every microsecond (absolute timeout)
	t := now();
	for(;;){
		alt{
		timeout t =>
			# do something;
			t++;
		}
	}

might be significant.  the former version allows errors to accumulate,
where the latter does not.  having calculated the timeout necessary
for the alt, you never know exactly when it is going to actually start;
a relative timeout is inevitably inaccurate.
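
A back-of-the-envelope simulation of the two loops, in plain C with a
simulated clock. It assumes every wake-up is late by the same fixed
latency and that the latency is smaller than the period; the function
names are made up for the sketch.

```c
#include <assert.h>

/* Simulated clock, in microseconds. Each wake-up is assumed to happen
 * `latency` microseconds after the requested time. */

/* Relative timeouts: each wait is "period from whenever I woke up". */
long last_wake_relative(long period, long latency, int n)
{
	long now = 0;
	int i;

	assert(period > 0 && latency >= 0);
	for (i = 0; i < n; i++)
		now = now + period + latency;	/* lateness carries over */
	return now;
}

/* Absolute timeouts: each wait is "until t", and t advances by period. */
long last_wake_absolute(long period, long latency, int n)
{
	long t = 0, now = 0;
	int i;

	assert(period > 0 && latency >= 0 && latency < period);
	for (i = 0; i < n; i++) {
		t += period;		/* next deadline ignores past lateness */
		now = t + latency;	/* woken a little after the deadline */
	}
	return now;
}
```

With a period of 1000, a latency of 50 and 10 iterations, the relative
loop finishes at 10500 and the absolute loop at 10050: the relative
error grows with the number of iterations, the absolute one stays at
one latency.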

i think that might be one of the reasons why the transputer folks (who
thought quite hard about things) chose absolute over relative timeouts.

> Once you've figured out what a good interface is, implementing
> it is subtle and difficult to get right.  But it only needs to be done
> right once and then everyone benefits.  Channel communication
> is complicated too under the hood, but it's still a good abstraction.

i agree totally. this was my motivation in posting.

* Re: [9fans] thread confusion
  2005-09-26 18:40       ` rog
  2005-09-26 18:52         ` Russ Cox
@ 2005-09-27 10:12         ` Axel Belinfante
  1 sibling, 0 replies; 16+ messages in thread
From: Axel Belinfante @ 2005-09-27 10:12 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> > I do use a timer, by having a proc that repeatedly sleeps
> > and decrements a counter, and when the counter reaches
> > zero it sends a (nil) timeout message on a channel.
> > in alts I not only wait for the io channels but
> > also for the timer timeout channel.
> > 
> > the question is how to start and reset the timer.
> > I have been thinking about using channels for that
> > too, but that seems deadlock prone: how to avoid
> > the case where I want to send a reset message to
> > the timer when the timer wants to send an expiration
> > message to me?
> 
> i think this is a good question.
> 
> i've found writing time-based code using the threads library to be
> quite awkward.  it seems to me that there may be room for an extension
> to help with writing this kind of code.
> 
> the difficulty with the plan 9 thread library (and with Limbo too) is
> that sleep(2) exists in a different universe to channels, so one has
> to use a separate process to bridge the gap.
> 
> but when this thread is sleeping, it's not possible to communicate
> with it, so one needs another thread to act as an intermediary, and
> one has to design the interface carefully - if possible one doesn't
> want a separate process and thread for each thread that wishes to wait
> for a little while.

I have seen the followup discussion after this post and like
the idea of support for this in the thread library.
Indeed the accuracy may/will be higher than what I'm using
right now (but for my use it is not really an issue, I guess).

This is just to share the approach I have taken after
my initial posts on the topic.

(after some bad experience) I've abandoned the idea
of a timer process that only delivers expiration messages,
and with which one communicates to start and cancel timers.
Instead I'm using (something like) the etimer(2) approach.
I have now a proc that regularly sends ticks over a channel
using non-blocking sends, and decrement timers and check
for expiration in the alt.
Hmm... thinking on it while writing this, I suppose
that tickproc could just as well use blocking sends.

void
tickproc(void *v)
{
  for(;;) {
    sleep(tickTime);
    nbsend((Channel*)v, nil);
  }
}

use of ticks:

  ...
  t = ticksToWait;
  done = 0;
  while(!done)
    switch(alt(a)){
    case iochannel:
      done = 1;
      ... other handling ...
      break;
    ...
    case tickchannel:
      t--;
      if (t == 0) {
        done = 1;
        ... handle timeout ...
      }
    }
  ...

This seems to get the job done in
a less complex and more clean way
than what I was using before.

Axel.

* Re: [9fans] thread confusion
@ 2005-09-21 15:48 Fco. J. Ballesteros
  0 siblings, 0 replies; 16+ messages in thread
From: Fco. J. Ballesteros @ 2005-09-21 15:48 UTC (permalink / raw)
  To: 9fans

:  Programming with notes or signals is asking for trouble.
:  Always.  I'm sorry that threadint/threadkill are in the
:  library.

Nothing that can't be solved using Cut. :-)
Apart from aquarela and execnet
how many programs are using this?

* [9fans] thread confusion
@ 2005-09-21 13:53 Fco. J. Ballesteros
  2005-09-21 14:32 ` Axel Belinfante
  0 siblings, 1 reply; 16+ messages in thread
From: Fco. J. Ballesteros @ 2005-09-21 13:53 UTC (permalink / raw)
  To: 9fans

:  A/The main weird thing is that after doing threadint on a thread
:  (created with proccreate) which presumably is hanging in a read
:  sometimes(?) the process just disappears without leaving a trace,
:  even though it is packed with syslog calls
:  (of which only the first part gets executed).

Did you call threadnotify()? Put a print there (the handler)
to see what's going on.

Also, forwarding advice Russ gave some time ago :-), read the Alef paper
and don't use interrupts at all.

BTW, if you are debugging, use threadsetname() and then use ps -a.

Let me know if I can help somehow.

* Re: [9fans] thread confusion
  2005-09-21 13:53 Fco. J. Ballesteros
@ 2005-09-21 14:32 ` Axel Belinfante
  0 siblings, 0 replies; 16+ messages in thread
From: Axel Belinfante @ 2005-09-21 14:32 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> :  A/The main weird thing is that after doing threadint on a thread
> :  (created with proccreate) which presumably is hanging in a read
> :  sometimes(?) the process just disappears without leaving a trace,
> :  even though it is packed with syslog calls
> :  (of which only the first part gets executed).
> 
> Did you call threadnotify()?

No, threadint() (see below)

> Put a print there (the handler) to see what's going on.
lots of syslog already (instead of print - does that matter?)

> Also, forwarding advice Russ gave some time ago :-), read the Alef paper
> and don't use interrupts at all.

Agreed. I tried to do that (not use interrupts).

The problem is that in this particular thread/proc I call
library routine tlsClient which may hang in a read from a pipe
of which I am holding the other end (to tunnel the messages).
It may happen that 'the other end' decides to give up
on a TLS handshake in progress and start a new TLS handshake,
in which case I have to 'clean up' my side of the handshake
in progress.
This is where sometimes things go wrong:
it seems that just closing my end of the pipe
is not sufficient to get tlsClient out of the read.

That's why I tried to resort to threadint:
	"Threadint interrupts a thread that is blocked
	 in a channel operation or system call"
(although now the question is what it means to be interrupted)

Probably I should instead investigate how/why closing
(my end of) the pipe is not sufficient.

Would not be surprised if I'm making a mistake that's
so trivial/basic that I'm just overlooking it :-(

(in the mean time my piece of code/administration to
 deal with all this is starting to live a life of its
 own so just getting it right and simple would be good.
 back to the drawing board, I guess...)

Axel.

* [9fans] thread confusion
@ 2005-09-21 13:25 Axel Belinfante
  0 siblings, 0 replies; 16+ messages in thread
From: Axel Belinfante @ 2005-09-21 13:25 UTC (permalink / raw)
  To: 9fans

I've been wrestling with the thread library trying
to do resource management in my 802.1x thingy.

This has proven to be harder than I envisioned -
I guess I've been spending more time trying to get this right
(I want this right since it may run as a daemon for a long time,
 doing a new tls handshake every 20 minutes or even more often)
than I think I've spent on the protocol/state machine part :-(

One reason may be that I'm still not very experienced with thread(2).
Another may be that weird things are happening.

A/The main weird thing is that after doing threadint on a thread
(created with proccreate) which presumably is hanging in a read
sometimes(?) the process just disappears without leaving a trace,
even though it is packed with syslog calls
(of which only the first part gets executed).

Actually, it does leave a trace, because when invoking
acid on the main process and doing threads() or stacks()
acid complains that setproc cannot read /proc/XXX/mem
where XXX presumably was the pid of the disappeared process.
(this seems to suggest that also the thread administration
 was not aware of the thread/process dying?)

This is hard to track since I'm not really able to reproduce it,
though I may be able to detect it when it happened.

Any ideas of what might be going on here, or how to debug this?

Another weird thing (bug?) is that threadpid always returns -1
(I started looking a bit into this, but maybe someone in the know
 sees the problem immediately)

(furthermore thread(2) and thread.h seem to be inconsistent
 regarding return values of (e.g.) threadint*, threadkill*
 but this is not hard to fix; I could/can submit a patch)

Axel.


Thread overview: 16+ messages
2005-09-21 14:44 [9fans] thread confusion Fco. J. Ballesteros
2005-09-21 15:05 ` Axel Belinfante
2005-09-21 15:42   ` Russ Cox
2005-09-21 20:25     ` Axel Belinfante
2005-09-21 20:32       ` Axel Belinfante
2005-09-21 20:37       ` Russ Cox
2005-09-21 22:34         ` Axel Belinfante
2005-09-21 22:44           ` Russ Cox
2005-09-26 18:40       ` rog
2005-09-26 18:52         ` Russ Cox
2005-09-26 19:20           ` rog
2005-09-27 10:12         ` Axel Belinfante
  -- strict thread matches above, loose matches on Subject: below --
2005-09-21 15:48 Fco. J. Ballesteros
2005-09-21 13:53 Fco. J. Ballesteros
2005-09-21 14:32 ` Axel Belinfante
2005-09-21 13:25 Axel Belinfante