9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] a pair nec bugs
@ 2011-05-20  1:30 erik quanstrom
  2011-05-20  5:05 ` [9fans] a pair nsec bugs erik quanstrom
  2011-05-20 10:43 ` [9fans] a pair nec bugs ron minnich
  0 siblings, 2 replies; 19+ messages in thread
From: erik quanstrom @ 2011-05-20  1:30 UTC (permalink / raw)
  To: 9fans

i've been having trouble with nsec.  i've been dealing with an application
that has ~128 processes.  it does not use the thread library, but these
processes all share memory.  unfortunately, this results in wild thrashing
of nsec(2) because there are more shared-memory processes than slots
in the fds[] table in /sys/src/libc/nsec.c.  a call to nsec can cause delays
of up to 1s.

that isn't the real motivation, though.  the real motivation is that
recently a few processes were added that need to have their own fd tables,
and this causes nsec() to enter an infinite loop.

in looking at the code, there seemed to be a number of ways to fix one
particular problem, but the given algorithm seems resistant to a generally
correct solution.

it seemed to me that the _privates (see exec(2)) array was the way to go.
process-private memory will never be falsely shared between processes
that don't share fd tables.  i kept the global fd to keep from opening
/dev/bintime too many times.

in the case of this application, instead of opening /dev/bintime afresh on
nearly every call, it is now opened just 3 times.

one note is that while i'm aware of privalloc(2), i didn't use it.  the
implementation doesn't appear correct for shared-memory procs.
i think there are two issues:
- locking is unnecessary.  the only preemptable unit of execution is
a process and each process is guaranteed to have its own instance
of _privates and _nprivates.
- for shared-memory procs, we will run out of privates because
the static privinit will be falsely shared.  privinit should be replaced
by using a private entry.

i've attached my proposed solution.

- erik

----

#include <u.h>
#include <libc.h>

extern	void	**_privates;
extern	int	_nprivates;

	int	fd	= -1;

/*
 * BUG: this is chosen by fiat and without coordination.
 * privalloc(2) does not appear safe in a shared-memory
 * environment.
 */
#define	Fd	((int*)_privates[0])

static uvlong order = 0x0001020304050607ULL;

static void
be2vlong(vlong *to, uchar *f)
{
	uchar *t, *o;
	int i;

	t = (uchar*)to;
	o = (uchar*)&order;
	for(i = 0; i < sizeof order; i++)
		t[o[i]] = f[i];
}

static int*
getfd(void)
{
	if(Fd != nil)
		return Fd;
	return &fd;
}

static void
reopen(int *fd)
{
	*fd = open("/dev/bintime", OREAD|OCEXEC);
}

vlong
nsec(void)
{
	int *p;
	uchar b[8];
	vlong t;

	p = getfd();
	if(*p == -1)
		reopen(p);
	if(pread(*p, b, sizeof b, 0) != sizeof b){
		if(p != Fd){
			p = malloc(sizeof *p);
			if(p == nil)
				return 0;
			_privates[0] = p;
		}
		reopen(p);
		if(pread(*p, b, sizeof b, 0) != sizeof b)
			return 0;
	}
	be2vlong(&t, b);
	return t;
}



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [9fans] a pair nsec bugs
  2011-05-20  1:30 [9fans] a pair nec bugs erik quanstrom
@ 2011-05-20  5:05 ` erik quanstrom
  2011-05-20 10:43 ` [9fans] a pair nec bugs ron minnich
  1 sibling, 0 replies; 19+ messages in thread
From: erik quanstrom @ 2011-05-20  5:05 UTC (permalink / raw)
  To: 9fans

there.  fixed that for me.

- erik




* Re: [9fans] a pair nec bugs
  2011-05-20  1:30 [9fans] a pair nec bugs erik quanstrom
  2011-05-20  5:05 ` [9fans] a pair nsec bugs erik quanstrom
@ 2011-05-20 10:43 ` ron minnich
  2011-05-20 10:52   ` roger peppe
  2011-05-20 12:47   ` erik quanstrom
  1 sibling, 2 replies; 19+ messages in thread
From: ron minnich @ 2011-05-20 10:43 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

I think the growing complexity of nsec() shows that the file model
doesn't work in all cases ... the thing starts to look a bit overly
complex to me. The fact that it fails due to the size of a static fd
array is also a warning flag.

I think a better interface would be one in which you read two variables:
/dev/nsecoffset
/dev/nsecdivisor

You could then apply these to the output of cycles():
nsec = (cycles()/divisor)-offset

to get time.
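
As a rough sketch of that arithmetic, the conversion could look like the
fragment below. The function name and the use of unsigned 64-bit integers
are assumptions made for illustration, not part of any proposed interface;
the divisor is taken to be in cycles per nanosecond and the offset in
nanoseconds.

```c
#include <stdint.h>

/*
 * Hypothetical helper: apply the divisor and offset that would be
 * read from the proposed /dev/nsecdivisor and /dev/nsecoffset files
 * to a raw cycle count.  Integer division truncates.
 */
uint64_t
cycles2nsec(uint64_t cycles, uint64_t divisor, uint64_t offset)
{
	if(divisor == 0)
		return 0;	/* no usable calibration */
	return cycles/divisor - offset;
}
```

With a made-up divisor of 3 and offset of 100, a count of 3000 cycles
maps to 900.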

nsec gives you precision but dropping into a system call to read time
tosses away any hope of accuracy you might achieve. Simply put, nsec()
is not correctly named :-)

ron




* Re: [9fans] a pair nec bugs
  2011-05-20 10:43 ` [9fans] a pair nec bugs ron minnich
@ 2011-05-20 10:52   ` roger peppe
  2011-05-20 10:57     ` ron minnich
  2011-05-20 12:47   ` erik quanstrom
  1 sibling, 1 reply; 19+ messages in thread
From: roger peppe @ 2011-05-20 10:52 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


Or one file containing the two numbers. But perhaps they change...
On 20 May 2011 11:45, "ron minnich" <rminnich@gmail.com> wrote:
> I think the growing complexity of nsec() shows that the file model
> doesn't work in all cases ... the thing starts to look a bit overly
> complex to me. The fact that it fails due to the size of a static fd
> array is also a warning flag.
>
> I think a better interface would be one in which you read two variables:
> /dev/nsecoffset
> /dev/nsecdivisor
>
> You could then apply these to the output of cycles():
> nsec = (cycles()/divisor)-offset
>
> to get time.
>
> nsec gives you precision but dropping into a system call to read time
> tosses away any hope of accuracy you might achieve. Simply put, nsec()
> is not correctly named :-)
>
> ron
>


* Re: [9fans] a pair nec bugs
  2011-05-20 10:52   ` roger peppe
@ 2011-05-20 10:57     ` ron minnich
  0 siblings, 0 replies; 19+ messages in thread
From: ron minnich @ 2011-05-20 10:57 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, May 20, 2011 at 3:52 AM, roger peppe <rogpeppe@gmail.com> wrote:
> Or one file containing the two numbers. But perhaps they change...
sadly, for the tsc, they do.

But there are supposed to be counters in there that don't suck.

ron




* Re: [9fans] a pair nec bugs
  2011-05-20 10:43 ` [9fans] a pair nec bugs ron minnich
  2011-05-20 10:52   ` roger peppe
@ 2011-05-20 12:47   ` erik quanstrom
  2011-05-20 19:03     ` ron minnich
  1 sibling, 1 reply; 19+ messages in thread
From: erik quanstrom @ 2011-05-20 12:47 UTC (permalink / raw)
  To: 9fans

On Fri May 20 06:44:51 EDT 2011, rminnich@gmail.com wrote:
> I think the growing complexity of nsec() shows that the file model
> doesn't work in all cases ... the thing starts to look a bit overly
> complex to me. The fact that it fails due to the size of a static fd
> array is also a warning flag.

hey, ron.  did you just tl;dr my post? ☺  you've posted this opinion
before, but i don't see how it relates to my implementation.
there are no arrays, the algorithm is pretty straightforward, and
it should work in all cases.  if this is not the case, i'd be interested to hear.

> You could then apply these to the output of cycles():
> nsec = (cycles()/divisor)-offset

this is orthogonal to the problem i'm solving.  you will
still need to solve the problem of shared memory with nonshared
fd tables regardless.

- erik




* Re: [9fans] a pair nec bugs
  2011-05-20 12:47   ` erik quanstrom
@ 2011-05-20 19:03     ` ron minnich
  2011-05-20 19:16       ` erik quanstrom
  2011-05-21  3:27       ` erik quanstrom
  0 siblings, 2 replies; 19+ messages in thread
From: ron minnich @ 2011-05-20 19:03 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

I did not read your post as carefully as I should, but I still have
concerns when people use stuff like nsec() in the quest of accuracy.
Sorry, I was sort of reacting to another thread in the Go list where
somebody is using gettimeofday() and seems to think the nsec field has
meaning :-)

We don't have good clocks.

ron




* Re: [9fans] a pair nec bugs
  2011-05-20 19:03     ` ron minnich
@ 2011-05-20 19:16       ` erik quanstrom
  2011-05-21  3:27       ` erik quanstrom
  1 sibling, 0 replies; 19+ messages in thread
From: erik quanstrom @ 2011-05-20 19:16 UTC (permalink / raw)
  To: 9fans

On Fri May 20 15:04:48 EDT 2011, rminnich@gmail.com wrote:
> I did not read your post as carefully as I should, but I still have
> concerns when people use stuff like nsec() in the quest of accuracy.
> Sorry, I was sort of reacting to another thread in the Go list where
> somebody is using gettimeofday() and seems to think the nsec field has
> meaning :-)

no problems.  i've since realized there's a bug (which was probably the
same bug that resulted in the really complicated current implementation)
that i've reintroduced.

the good news is the solution is to remove code.  :-)  i'll post a more
complete solution in a bit.

> We don't have good clocks.

i totally agree with this, but in x86 land, it's worth remembering
things have gotten much better since the bad old p4 days.  all
recent processors do have a stable tsc.

i'm doing a little something like you suggest in /dev/irqalloc.
fields 3 and 4 are number of calls and cumulative number of
cycles.  you can divide by cpumhz (in /dev/cputype) to get time
in µs.  the silly division is because ns just isn't good enough
resolution.  ether0 takes about 3.03µs/call.

; awk '$3>0' /dev/irqalloc
         50          18           7140587794       16282733189984 lapic    clock
         65           1               141628           1482204961 ioapic   kbd
         73          10            209046583        2149493918790 msi      ether0
         97          10                  896             64993576 ioapic   usbohci
         97          10                  896             64993576 ioapic   usbohci
         97          10                  896             64993576 ioapic   usbohci
         97          10                  896             64993576 ioapic   usbohci
         97          10                  896             64993576 ioapic   starport-pex2s.0C040000
        121          11                    9               239651 msi      sdE (ahci)
        129          12              1972430          20894050761 ioapic   kbdaux
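
To make the division concrete, here is a small sketch of the per-call
arithmetic. The function name is made up for illustration, and the
3400 MHz clock rate used in the note after the code is an assumed value,
since the thread does not show the contents of /dev/cputype. Dividing a
cycle count by a rate in MHz (cycles per microsecond) yields microseconds.

```c
/*
 * Hypothetical helper: average cost of one call in microseconds,
 * given fields 3 (calls) and 4 (cumulative cycles) of a
 * /dev/irqalloc line and the clock rate in MHz.
 */
double
usecpercall(double calls, double cycles, double mhz)
{
	if(calls <= 0 || mhz <= 0)
		return 0;
	return cycles/calls/mhz;
}
```

For the ether0 line, usecpercall(209046583, 2149493918790, 3400) comes to
roughly 3.02, which is about 10282 cycles per interrupt at the assumed
clock rate.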

- erik




* Re: [9fans] a pair nec bugs
  2011-05-20 19:03     ` ron minnich
  2011-05-20 19:16       ` erik quanstrom
@ 2011-05-21  3:27       ` erik quanstrom
  2011-05-21  9:59         ` Charles Forsyth
  1 sibling, 1 reply; 19+ messages in thread
From: erik quanstrom @ 2011-05-21  3:27 UTC (permalink / raw)
  To: 9fans

here's an improved version.  the previous version had a problem with
shared memory but unshared fd tables in this situation:
	p0:
		nsec()	-> fd 3
		...
	p1:
		open()	-> fd 3
		nsec()	-> fail

one potential modification is to ditch the malloc and allocate
two privalloc entries.

one interesting bit that is not documented in the privalloc(2) man
page is that privalloc() allocations are shared and inherited when
a process forks, but the memory spaces are not.  thus a privalloc
for each new pid would be an error.

it would require a modification to kexit() to change this.  but it might
make sense, because sharing privalloc entries seems exactly the
opposite of what privalloc is supposed to be doing.

- erik

---

#include <u.h>
#include <libc.h>
#include <tos.h>

typedef	struct	Nfd	Nfd;
struct Nfd {
	int	pid;
	int	fd;
};

static	void	**nsecpriv;
#define	Fd	((Nfd*)nsecpriv[0])


static uvlong order = 0x0001020304050607ULL;

static void
be2vlong(vlong *to, uchar *f)
{
	uchar *t, *o;
	int i;

	t = (uchar*)to;
	o = (uchar*)&order;
	for(i = 0; i < sizeof order; i++)
		t[o[i]] = f[i];
}

static Nfd*
getfd(void)
{
	Nfd *p;

	if(nsecpriv != nil && Fd->pid == _tos->pid)
		return Fd;
	if(nsecpriv == nil){
		/*
		 * privalloc allocates slots on a shared
		 * basis, even though the memory slots
		 * themselves are proc-private.
		 */
		nsecpriv = privalloc();
		if(nsecpriv == nil)
			return nil;
		*nsecpriv = p = malloc(sizeof *p);
		if(p == nil)
			return nil;
	}else
		p = *nsecpriv;
	p->fd = -1;
	return p;
}

vlong
nsec(void)
{
	uchar b[8];
	vlong t;
	Nfd *p;

	if((p = getfd()) == nil)
		return 0;
	if(p->fd == -1){
		p->fd = open("/dev/bintime", OREAD|OCEXEC);
		p->pid = _tos->pid;
	}
	if(pread(p->fd, b, sizeof b, 0) == sizeof b){
		be2vlong(&t, b);
		return t;
	}
	return 0;
}




* Re: [9fans] a pair nec bugs
  2011-05-21  3:27       ` erik quanstrom
@ 2011-05-21  9:59         ` Charles Forsyth
  2011-05-21 12:16           ` erik quanstrom
  2011-05-21 21:50           ` erik quanstrom
  0 siblings, 2 replies; 19+ messages in thread
From: Charles Forsyth @ 2011-05-21  9:59 UTC (permalink / raw)
  To: 9fans

the most straightforward fix for nsec
is to change it back to do the obvious thing: open the file,
read in the time data, close the file and return the value.

we might like to add the cache back in, since a grep of /sys/src/cmd
suggests that it might be useful to do that.
most programs were fine with a statically-allocated
file descriptor, closed on exec, and using pread not read
to avoid the shared offset. complications arose with
forks when file descriptors were rearranged, sometimes with an extra twist when
the forks share data apart from the stack. programs do those things
by explicit calls.

we've had several attempts at trying to guess in the library's
guts when the cached value(s) have gone wrong, but they haven't worked
because there are too many cases and the library function can't detect them precisely
if at all. also, using the current process as the cache key probably isn't right:
often it's fine for the file descriptor to be shared by a group;
on the other hand, if each process has its own file descriptor,
there's no way to tell when the descriptors have been rearranged.
nor can the program currently tell the library what it has done.

file descriptors are just one form of shared state.
is it possible to devise a call or pair of calls
to manage library state such as shared file descriptors and static values
for a process or group of related processes?  it seems tricky,
and probably overly elaborate, since a memo about
existence of some state to invalidate is itself state.

perhaps it is better for each relevant library module to expose the
existence of its cached state through a function call to invalidate it
(or otherwise manage it explicitly) when needed.
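
One way to read this suggestion is sketched here, in POSIX C rather than
Plan 9 C so that it can be tried outside Plan 9. The names timesetpath,
timefd and timeinvalidate are invented for the example and are not an
existing libc interface; the point is only that the module owning a
cached descriptor also exports a call to discard it.

```c
#include <fcntl.h>
#include <unistd.h>

static const char *timepath = "/dev/bintime";	/* Plan 9 path; override elsewhere */
static int cachedfd = -1;

/* hypothetical: choose the file to cache a descriptor for */
void
timesetpath(const char *path)
{
	timepath = path;
}

/* return the cached descriptor, opening the file on first use */
int
timefd(void)
{
	if(cachedfd < 0)
		cachedfd = open(timepath, O_RDONLY);
	return cachedfd;
}

/*
 * hypothetical: forget the cached descriptor.  the caller says whether
 * the descriptor still belongs to this module (doclose != 0) or has
 * already been closed or replaced by the program (doclose == 0).
 */
void
timeinvalidate(int doclose)
{
	if(doclose && cachedfd >= 0)
		close(cachedfd);
	cachedfd = -1;
}
```

A program that closes or rearranges descriptors itself would call
timeinvalidate(0) afterwards, so that the next timefd() reopens the file
instead of reusing a stale number.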

syslog, times and truerand(!) also would benefit.
of all of them so far, only times(2) really is per-process.




* Re: [9fans] a pair nec bugs
  2011-05-21  9:59         ` Charles Forsyth
@ 2011-05-21 12:16           ` erik quanstrom
  2011-05-21 21:50           ` erik quanstrom
  1 sibling, 0 replies; 19+ messages in thread
From: erik quanstrom @ 2011-05-21 12:16 UTC (permalink / raw)
  To: 9fans

> we've had several attempts at trying to guess in the library's
> guts when the cached value(s) have gone wrong, but they haven't worked
> because there are too many cases and the library function can't detect them precisely
> if at all. also, using the current process as the cache key probably isn't right:
> often it's fine for the file descriptor to be shared by a group;
> on the other hand, if each process has its own file descriptor,
> there's no way to tell when the descriptors have been rearranged.
> nor can the program currently tell the library what it has done.

while it is a bit clunky, i think nsec() can detect whenever it could
have the wrong fd and it can limit itself to 1 open per pid, which
would seem to be the case where it could make a big difference.
(that's the code i posted.)  it seems oddly like ken's file server's
read-ahead algorithm: start read-ahead when the second block
in a row is read.

it may open too many fds, but i believe it doesn't improperly
use fds, thrash when a table gets full, nor go into an infinite
loop under adverse conditions.

> file descriptors are just one form of shared state.

so are note groups.  the difference is we can get the note group
id from user space.  we can't do this with fd groups.  (there's
really no such concept.)

> syslog, times and truerand(!) also would benefit.
> of all of them so far, only times(2) really is per-process.

i suppose there are a couple of more precise ways of
dealing with this shared state.

1.  we already have an open flag OCEXEC.  wouldn't these
problems also be solved by a flag that closed the fd in
the child when the file descriptor table was copied?

2.  wrapping rfork so that rfork detects RFFDG, and
invalidates the child's cached file descriptors.
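
The second option might look roughly like the sketch below. Everything
in it is hypothetical: Rffdg is a stand-in for the real RFFDG flag,
addfdhook and xrfork are invented names, and the underlying system call
is passed in as a function pointer so that the fragment can be exercised
without forking.

```c
enum {
	Rffdg	= 1<<2,	/* stand-in for Plan 9's RFFDG flag */
	Maxhook	= 8,
};

typedef void (*Fdhook)(void);

static Fdhook hooks[Maxhook];
static int nhooks;

/* hypothetical: a library module registers a routine that drops its cached fds */
int
addfdhook(Fdhook h)
{
	if(nhooks >= Maxhook)
		return -1;
	hooks[nhooks++] = h;
	return 0;
}

/*
 * hypothetical wrapper: rawfork stands in for the underlying system
 * call and returns 0 in the child.  when the child received its own
 * copy of the fd table, run every registered hook there.
 */
int
xrfork(int flags, int (*rawfork)(int))
{
	int i, pid;

	pid = rawfork(flags);
	if(pid == 0 && (flags & Rffdg))
		for(i = 0; i < nhooks; i++)
			hooks[i]();
	return pid;
}

/* demo stubs so the sketch can be exercised without forking */
static int ndropped;
static void dropfd(void) { ndropped++; }
static int fakechild(int flags) { (void)flags; return 0; }
static int fakeparent(int flags) { (void)flags; return 42; }
```

The hooks run only on the child side and only when the flag is present;
the parent's cached state is left alone.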

- erik




* Re: [9fans] a pair nec bugs
  2011-05-21  9:59         ` Charles Forsyth
  2011-05-21 12:16           ` erik quanstrom
@ 2011-05-21 21:50           ` erik quanstrom
  2011-05-21 22:14             ` Charles Forsyth
  1 sibling, 1 reply; 19+ messages in thread
From: erik quanstrom @ 2011-05-21 21:50 UTC (permalink / raw)
  To: 9fans

> syslog, times and truerand(!) also would benefit.
> of all of them so far, only times(2) really is per-process.

and time(2).

- erik




* Re: [9fans] a pair nec bugs
  2011-05-21 21:50           ` erik quanstrom
@ 2011-05-21 22:14             ` Charles Forsyth
  2011-05-22  3:06               ` erik quanstrom
  0 siblings, 1 reply; 19+ messages in thread
From: Charles Forsyth @ 2011-05-21 22:14 UTC (permalink / raw)
  To: 9fans

>and time(2).

i didn't include that because it calls nsec




* Re: [9fans] a pair nec bugs
  2011-05-21 22:14             ` Charles Forsyth
@ 2011-05-22  3:06               ` erik quanstrom
  2011-05-22 13:30                 ` Charles Forsyth
  0 siblings, 1 reply; 19+ messages in thread
From: erik quanstrom @ 2011-05-22  3:06 UTC (permalink / raw)
  To: 9fans

On Sat May 21 18:12:35 EDT 2011, forsyth@terzarima.net wrote:
> >and time(2).
>
> i didn't include that because it calls nsec

time calls an internal function, oldtime(),
that adds another private file descriptor.

i suppose that could be killed off by now.

- erik




* Re: [9fans] a pair nec bugs
  2011-05-22  3:06               ` erik quanstrom
@ 2011-05-22 13:30                 ` Charles Forsyth
  2011-06-02 19:16                   ` erik quanstrom
  0 siblings, 1 reply; 19+ messages in thread
From: Charles Forsyth @ 2011-05-22 13:30 UTC (permalink / raw)
  To: 9fans


>time calls an internal function, oldtime(),
>that adds another private file descriptor.

that's only a fall-back if the call
to nsec doesn't work because you're running on a kernel
that's years and years out of date; so it isn't really
relevant here.


* Re: [9fans] a pair nec bugs
  2011-05-22 13:30                 ` Charles Forsyth
@ 2011-06-02 19:16                   ` erik quanstrom
  2011-06-03  9:52                     ` Charles Forsyth
  0 siblings, 1 reply; 19+ messages in thread
From: erik quanstrom @ 2011-06-02 19:16 UTC (permalink / raw)
  To: 9fans

here's what i settled on for now.  i believe it to
be correct in all cases, but if you are sharing file
descriptors among many processes, you may have
/dev/bintime open many more times than necessary.

i put off solving other instances of fd-group-private
state until later, since the solution in hand is not very
satisfying.

in the future, i believe it might make sense to make
file descriptor groups, environment groups, etc. available
via /dev/fdid to allow for sharing without false sharing.

- erik
----

#include <u.h>
#include <libc.h>
#include <tos.h>

typedef	struct	Nfd	Nfd;
struct Nfd {
	int	pid;
	int	fd;
};

static	void	**nsecpriv;
#define	Fd	((Nfd*)nsecpriv[0])


static uvlong order = 0x0001020304050607ULL;

static void
be2vlong(vlong *to, uchar *f)
{
	uchar *t, *o;
	int i;

	t = (uchar*)to;
	o = (uchar*)&order;
	for(i = 0; i < sizeof order; i++)
		t[o[i]] = f[i];
}

static Nfd*
getfd(void)
{
	Nfd *p;

	if(nsecpriv != nil && Fd->pid == _tos->pid)
		return Fd;
	if(nsecpriv == nil){
		/*
		 * privalloc allocates slots on a shared
		 * basis, even though the memory slots
		 * themselves are proc-private.
		 */
		nsecpriv = privalloc();
		if(nsecpriv == nil)
			return nil;
		*nsecpriv = p = malloc(sizeof *p);
		if(p == nil)
			return nil;
	}else
		p = Fd;
	p->fd = -1;
	return p;
}

vlong
nsec(void)
{
	uchar b[8];
	vlong t;
	Nfd *p;

	if((p = getfd()) == nil)
		return 0;
	if(p->fd == -1){
		p->fd = open("/dev/bintime", OREAD|OCEXEC);
		p->pid = _tos->pid;
	}
	if(pread(p->fd, b, sizeof b, 0) == sizeof b){
		be2vlong(&t, b);
		return t;
	}
	return 0;
}




* Re: [9fans] a pair nec bugs
  2011-06-02 19:16                   ` erik quanstrom
@ 2011-06-03  9:52                     ` Charles Forsyth
  2011-06-04 18:12                       ` adriano
  0 siblings, 1 reply; 19+ messages in thread
From: Charles Forsyth @ 2011-06-03  9:52 UTC (permalink / raw)
  To: 9fans

unfortunately, that one still fails:

h% 8c -w nsec.c
h% 8l nsec.8
h% 8.out
1307094033073099818
1307094033079570483
0

my approach is a little different.

in the library:

vlong
nsec(void)
{
	uchar b[8];
	vlong t;
	int fd;

	fd = open("/dev/bintime", OREAD);
	if(fd < 0)
		return 0;
	t = 0;
	if(pread(fd, b, sizeof(b), 0) == sizeof(b))
		be2vlong(&t, b);
	close(fd);
	return t;
}

what about applications that need to get the nsec frequently?
*in those applications*, i write

extern int timefd;

void
applicationinit(void)
{
	timefd = open("/dev/bintime", OREAD|OCEXEC);
	if(timefd < 0)
		sysfatal("can't open /dev/bintime: %r");
	...
}

vlong
readnsec(void)
{
	uchar b[8];
	vlong t;

	if(pread(timefd, b, sizeof(b), 0) != sizeof(b))
		return 0;	/* or sysfatal as you like */
	be2vlong(&t, b);
	return t;
}

that's the first phase. then since there are a few applications that do that,
it might be better to account for that in the library:

extern	vlong	readnsec(int fd);

it's similar to the above, but takes fd as a parameter.
alternatively you could have a function that unpacked and returned the
other values in bintime as well (clock ticks, and clock frequency), if you're keen.
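
A version that takes the descriptor as a parameter might look like the
sketch below. It is written in POSIX C so that it can be tried outside
Plan 9, and it replaces be2vlong with a shift-based decoder (be2int64, a
name invented here) that does not depend on the host byte order. Only
the first 8 bytes, the nanosecond count, are read.

```c
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>

/* decode 8 big-endian bytes without depending on host byte order */
static int64_t
be2int64(const unsigned char *f)
{
	uint64_t v;
	int i;

	v = 0;
	for(i = 0; i < 8; i++)
		v = v<<8 | f[i];
	return (int64_t)v;
}

/* read the time from a descriptor the application opened and owns */
int64_t
readnsec(int fd)
{
	unsigned char b[8];

	if(pread(fd, b, sizeof b, 0) != (ssize_t)sizeof b)
		return 0;
	return be2int64(b);
}
```

Because the caller owns the descriptor, it also decides when to open,
close or reopen it, and the library keeps no hidden state.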

the original flaw was having a library routine that messed about
with a process's file descriptors without any idea of context,
and worse, stored the file descriptor in a hidden location,
with no way to mark it later as redundant or invalid.
that makes the library function fragile, and you can't fix it from outside
the library. in the failing example above, i wrote:

	print("%llud\n", xnsec());
	print("%llud\n", xnsec());
	for(i = 2; i < 20; i++)
		close(i);
	print("%lld\n", xnsec());

(where xnsec is the most recently proposed version of nsec).
being able to close files seems reasonable to me, but invalidates the
file descriptor that nsec had stashed away. worse, i could open another
file later that had the same index for xnsec to try to read.
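
The reuse hazard is easy to reproduce. The sketch below uses POSIX C,
where open() is specified to return the lowest-numbered unused
descriptor; it is an assumption of this example that Plan 9 hands out
descriptor numbers in a comparable way, as the failure above suggests.
The function name is invented.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * returns 1 if a descriptor number remembered before close()
 * is handed out again by the next open(), 0 if not, -1 on error.
 */
int
fdreused(const char *first, const char *second)
{
	int stale, fresh;

	stale = open(first, O_RDONLY);
	if(stale < 0)
		return -1;
	close(stale);		/* the program tidies up its descriptors */
	fresh = open(second, O_RDONLY);
	if(fresh < 0)
		return -1;
	close(fresh);
	return fresh == stale;
}
```

A library that had stashed the first number would now be reading from
whatever file the second open() named.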

lots of other things could be changed, such as making rfork and close
into functions that wrap _rfork and _close system calls, so they can
notify other library functions to clean up their fd debris, but perhaps
it's better not to create the potential for debris.

other interfaces are possible that make the file descriptor visible
to the application, and any of those would be fine.




* Re: [9fans] a pair nec bugs
  2011-06-03  9:52                     ` Charles Forsyth
@ 2011-06-04 18:12                       ` adriano
  0 siblings, 0 replies; 19+ messages in thread
From: adriano @ 2011-06-04 18:12 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Charles Forsyth wrote:
> what about applications that need to get the nsec frequently?
> *in those applications*, i write
>
>
Hi, all

Last month I had problems in an application made up of 12 threads using a
shared file table. Six of them continuously get the time every 20..100 ms.
The last version of the application had worked perfectly since 2009, until I
changed a few details. From that point on I frequently had a messy file
table and sometimes a crash, with the overall behaviour depending on the
hardware, on the presence of debug print() calls, on the network load, etc.

With /dev/bintime unexpectedly closed, and after viewing the nsec() code, I
suspected a race condition too.

Two weeks ago I slightly modified the application to have separate
(RFFDG) fd tables per thread. This way, in my specific application, the
problem seems to be avoided. All the (15) machines work OK now.

adriano




* Re: [9fans] a pair nec bugs
@ 2011-06-03 13:13 erik quanstrom
  0 siblings, 0 replies; 19+ messages in thread
From: erik quanstrom @ 2011-06-03 13:13 UTC (permalink / raw)
  To: 9fans

> unfortunately, that one still fails:

i was willing to accept the fact that closing random file descriptors
could result in lossage at this point to solve the infinite loop problem.
(and any library function that opens a fd is potentially racing with
any other thread closing random fds.)

it's worse than that, unfortunately.  since there's no close-on-exit
flag, a program that makes a habit of forking with a shared fd table
quickly hits the 5000 fd limit.  kfs is a quick casualty.

i do like your readnsec(int fd), but for some reason i'd rather not admit
there's a fd in there.  one is tempted to paw through /proc/pid/fd.  :-)

- erik



