9fans - fans of the OS Plan 9 from Bell Labs
 help / color / mirror / Atom feed
* [9fans] imagereclaim()
@ 2014-03-01 21:46 cinap_lenrek
  2014-03-01 21:55 ` erik quanstrom
  0 siblings, 1 reply; 18+ messages in thread
From: cinap_lenrek @ 2014-03-01 21:46 UTC (permalink / raw)
  To: 9fans

the imagereclaim algorithm is wrong.

	/*
	 * All the pages with images backing them are at the
	 * end of the list (see putpage) so start there and work
	 * backward.
	 */
	for(p = palloc.tail; p && p->image && n<1000; p = p->prev) {
		if(p->ref == 0 && canlock(p)) {
			if(p->ref == 0) {
				n++;
				uncachepage(p);
			}
			unlock(p);
		}
	}

1) the uncachepage() call breaks the invariant that pages
cached in a image are at the end of the page freelist. this
can cause a gab of uncached pages in a run of cached pages
from the end of the freelist. imagereclaim() will be unable
to reclaim the images refered by the pages after the gab.

in 9front, we unchain the page and put it back on the head
of the freelist after uncachepage().

2) there needs to a check to skip pages with
p->imagge->notext != 0 in case you use the mount cache
(port/cache.c). trying to uncache a page on a notext
image will not reclaim any image structures.

3) bonus: when you run a program from ramfs, the
ramfs will stay arround for quite a while even after
the program has long exited and the ramfs has been
unmounted. the reason is that we still cache the pages
of the program keeping the image arround. the image
has a Chan* reference to the original text file. until
the image is reclaimed and the channel closed, the mount
will stay arround preventing the fileserver (ramfs) from
exiting.

one can avoid keeping the chan reference by separately
tracking the number of page references to an image.
so when image->ref == image->pgref, we know that all
references to a image are from the page freelist.
in that case, one can close the channel in the image.
when the image gets attached again (attachimage()),
we take a new reference to the supplied channel and
stick it on the image.

use the channels mountid and quid to check for
file equality as mountid's are not reused within the
same kernel. no need to keep the channel arround.

i suspect the bigboy hack (which increases the conf.nimage
parameter to 2000 in pc/main.c confinit() for cpu servers)
might be partially due to this imagereclaim() bug trying
to cure the symptom.

--
cinap



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [9fans] imagereclaim()
  2014-03-01 21:46 [9fans] imagereclaim() cinap_lenrek
@ 2014-03-01 21:55 ` erik quanstrom
  2014-03-01 22:22   ` cinap_lenrek
  0 siblings, 1 reply; 18+ messages in thread
From: erik quanstrom @ 2014-03-01 21:55 UTC (permalink / raw)
  To: 9fans

On Sat Mar  1 16:51:29 EST 2014, cinap_lenrek@felloff.net wrote:
> the imagereclaim algorithm is wrong.

yes.  and so is duppage().  charles has some ideas here.

- erik



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [9fans] imagereclaim()
  2014-03-01 21:55 ` erik quanstrom
@ 2014-03-01 22:22   ` cinap_lenrek
  2014-03-01 22:52     ` erik quanstrom
  2014-03-01 23:09     ` Charles Forsyth
  0 siblings, 2 replies; 18+ messages in thread
From: cinap_lenrek @ 2014-03-01 22:22 UTC (permalink / raw)
  To: 9fans

not sure what you mean. the page passed to duppage() is not
on the freelist. and the new page duppage() allocates and
caches is chained on the tail as far as i can see. so
the invariant required by imagereclaim() holds. or are
you refering to a different problem?

--
cinap



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [9fans] imagereclaim()
  2014-03-01 22:22   ` cinap_lenrek
@ 2014-03-01 22:52     ` erik quanstrom
  2014-03-01 23:05       ` cinap_lenrek
  2014-03-01 23:09     ` Charles Forsyth
  1 sibling, 1 reply; 18+ messages in thread
From: erik quanstrom @ 2014-03-01 22:52 UTC (permalink / raw)
  To: 9fans

On Sat Mar  1 17:24:03 EST 2014, cinap_lenrek@felloff.net wrote:
> not sure what you mean. the page passed to duppage() is not
> on the freelist. and the new page duppage() allocates and
> caches is chained on the tail as far as i can see. so
> the invariant required by imagereclaim() holds. or are
> you refering to a different problem?

duppage races with itself, potentially putting thousands of pages
on the free list because for whatever reason it puts the new page
back, not the old one.

but it's hard to see the carbuncle for the pimples.

- erik



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [9fans] imagereclaim()
  2014-03-01 22:52     ` erik quanstrom
@ 2014-03-01 23:05       ` cinap_lenrek
  2014-03-01 23:11         ` erik quanstrom
  2014-03-01 23:14         ` Charles Forsyth
  0 siblings, 2 replies; 18+ messages in thread
From: cinap_lenrek @ 2014-03-01 23:05 UTC (permalink / raw)
  To: 9fans

how is this possible? the old page is locked during the call
to duppage(). so duppage() will not run concurrently on the
same page.

--
cinap



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [9fans] imagereclaim()
  2014-03-01 22:22   ` cinap_lenrek
  2014-03-01 22:52     ` erik quanstrom
@ 2014-03-01 23:09     ` Charles Forsyth
  1 sibling, 0 replies; 18+ messages in thread
From: Charles Forsyth @ 2014-03-01 23:09 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

[-- Attachment #1: Type: text/plain, Size: 261 bytes --]

On 1 March 2014 22:22, <cinap_lenrek@felloff.net> wrote:

> or are
> you refering to a different problem?
>

duppage isn't necessary. neither, come to that, is the page hash table.
i do without them both, and without a few other things, like the pager.

[-- Attachment #2: Type: text/html, Size: 646 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [9fans] imagereclaim()
  2014-03-01 23:05       ` cinap_lenrek
@ 2014-03-01 23:11         ` erik quanstrom
  2014-03-01 23:14         ` Charles Forsyth
  1 sibling, 0 replies; 18+ messages in thread
From: erik quanstrom @ 2014-03-01 23:11 UTC (permalink / raw)
  To: 9fans

On Sat Mar  1 18:07:04 EST 2014, cinap_lenrek@felloff.net wrote:
> how is this possible? the old page is locked during the call
> to duppage(). so duppage() will not run concurrently on the
> same page.

what happens if !canlock(&pga) is true and ∴ p is unlocked.

- erik



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [9fans] imagereclaim()
  2014-03-01 23:05       ` cinap_lenrek
  2014-03-01 23:11         ` erik quanstrom
@ 2014-03-01 23:14         ` Charles Forsyth
  2014-03-02  1:21           ` cinap_lenrek
  1 sibling, 1 reply; 18+ messages in thread
From: Charles Forsyth @ 2014-03-01 23:14 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

[-- Attachment #1: Type: text/plain, Size: 672 bytes --]

On 1 March 2014 23:05, <cinap_lenrek@felloff.net> wrote:

> how is this possible? the old page is locked during the call
> to duppage(). so duppage() will not run concurrently on the
> same page.
>

there's a race to decide whether to do it, and what happens to the page
once it has. it reaches back into the page load code.
you can easily end up with many copies of the same page. worse, duppage
calls a putmmu/mmuput,
which in some implementations calls newpage, which can lead to deadlock,
since duppage has the conch.
the whole thing is meant well, but misconceived. i can't even remember the
details since it's so long since i
cut out all that stuff.

[-- Attachment #2: Type: text/html, Size: 1151 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [9fans] imagereclaim()
  2014-03-01 23:14         ` Charles Forsyth
@ 2014-03-02  1:21           ` cinap_lenrek
  2014-03-02  3:14             ` erik quanstrom
  0 siblings, 1 reply; 18+ messages in thread
From: cinap_lenrek @ 2014-03-02  1:21 UTC (permalink / raw)
  To: 9fans

i checked the code on sources again and found the issue.
the lookpage() in 9front is different. we lock the page from
the image cache hash chain and check if the page is still
cached and for the right disk address under the lock.

but if it changed or got uncached by duppage(), we unlock
and *retry* instead of giving up the search.

so duppage() appears atomic on 9front, but on labs, a process
already commited to use that page in lookpage() will get a
uncached page after the lock() succeeds and miss the (potential)
new page put in by duppage().

but what you describe is possible with pio() in general.

a process decides to demand load a page in pio() when lookpage()
doesnt return a cached page.

if multiple processes compete on loading the same page while its not in
the cache, they will indeed do so. in the case that they share the
segment, it will get deduplicated after the load and the page of the
first process finishing the load wins, the other pages will be put back
on the freelist (uncached). if they use distinct segments,
they (the processes) will all cache ther pages in the image and the
image will end up with multiple pages caching the same disk address
of the file.

it would be nice if the SP9SSS could speak out in detail on how they
do page caching differently than classic plan9.

--
cinap



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [9fans] imagereclaim()
  2014-03-02  1:21           ` cinap_lenrek
@ 2014-03-02  3:14             ` erik quanstrom
  2014-03-02  3:55               ` cinap_lenrek
  0 siblings, 1 reply; 18+ messages in thread
From: erik quanstrom @ 2014-03-02  3:14 UTC (permalink / raw)
  To: 9fans

odd coincidence.  this just happened:

no panlic:o cpou0:p cpu0:s de
adlock/abandoned lock 0xfffffffff21547f0 key 0xfffffffff215483c pc 0xfffffffff01ce2bd proc 1893 held by pc 0xfffffffff01c

acid; src(0xfffffffff01ce2bd)
/sys/src/nix/port/page.c:552
 547		for(f = pghash(daddr); f; f = f->hash){
 548			if(f->image == i && f->daddr == daddr){
 549				unlock(&pga.hashlock);
 550
 551				lock(&pga);
>552				lock(f);
 553				if(f->image != i || f->daddr != daddr){
 554					unlock(f);
 555					unlock(&pga);
 556					return 0;
 557				}

acid; src(0xfffffffff01cbc49)
/sys/src/nix/port/fault.c:148
 143				(*pg)->modref |= PG_REF;
 144				break;
 145			}
 146
 147			lkp = *pg;
>148			lock(lkp);
 149
 150			ref = lkp->ref;
 151			if(ref > 1) {
 152				unlock(lkp);
 153



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [9fans] imagereclaim()
  2014-03-02  3:14             ` erik quanstrom
@ 2014-03-02  3:55               ` cinap_lenrek
  2014-03-02  4:05                 ` erik quanstrom
  2014-03-02  9:48                 ` Charles Forsyth
  0 siblings, 2 replies; 18+ messages in thread
From: cinap_lenrek @ 2014-03-02  3:55 UTC (permalink / raw)
  To: 9fans

checked nix/port/page.c. your duppage() is wrong.

	/* don't dup pages with no image */
	if(p->ref == 0 || p->image == nil || p->image->notext)
		return 0;

	/*
	 *  normal lock ordering is to call
	 *  lock(&pga) before lock(p).
	 *  To avoid deadlock, we have to drop
	 *  our locks and try again.
	 */
	if(!canlock(&pga)){
		unlock(p);
		if(up)
			sched();
		lock(p);
		goto retry;
	}

you need to check p->ref != 1 instead of p->ref == 0. the page
passed to duppage() is still cached. after you unlock(p), someone
can come in and take a reference to the page from the image
cache (lookpage()), making p->ref > 1 once you get the lock back.

put an assert or print in there after the if(!canlock(&pga){} block
to check p->ref.

when this happens, the caller to duppage() (fixfault) must not
modify the page or use it in his procs segment (on copy on write)
but make a copy for itself because the other processes that
grabbed the reference is (commited to) reading it (outside of the
page lock of course, so you already lost when p->ref != 1).

--
cinap



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [9fans] imagereclaim()
  2014-03-02  3:55               ` cinap_lenrek
@ 2014-03-02  4:05                 ` erik quanstrom
  2014-03-02  4:30                   ` cinap_lenrek
  2014-03-02  9:48                 ` Charles Forsyth
  1 sibling, 1 reply; 18+ messages in thread
From: erik quanstrom @ 2014-03-02  4:05 UTC (permalink / raw)
  To: 9fans

On Sat Mar  1 22:56:25 EST 2014, cinap_lenrek@felloff.net wrote:
> checked nix/port/page.c. your duppage() is wrong.
>
> 	/* don't dup pages with no image */
> 	if(p->ref == 0 || p->image == nil || p->image->notext)
> 		return 0;
>
> 	/*
> 	 *  normal lock ordering is to call
> 	 *  lock(&pga) before lock(p).
> 	 *  To avoid deadlock, we have to drop
> 	 *  our locks and try again.
> 	 */
> 	if(!canlock(&pga)){
> 		unlock(p);
> 		if(up)
> 			sched();
> 		lock(p);
> 		goto retry;
> 	}
>
> you need to check p->ref != 1 instead of p->ref == 0. the page
> passed to duppage() is still cached. after you unlock(p), someone
> can come in and take a reference to the page from the image
> cache (lookpage()), making p->ref > 1 once you get the lock back.
>
> put an assert or print in there after the if(!canlock(&pga){} block
> to check p->ref.

why do i need an assert or print after the unlock?  it should go back
through the loop and notice p->ref != 1.

- erik



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [9fans] imagereclaim()
  2014-03-02  4:05                 ` erik quanstrom
@ 2014-03-02  4:30                   ` cinap_lenrek
  0 siblings, 0 replies; 18+ messages in thread
From: cinap_lenrek @ 2014-03-02  4:30 UTC (permalink / raw)
  To: 9fans

nono. i ment to suggest that you put a print/asser after the whole if
block in case you do not belive what i said could can happen. and
once convinced, change the p->ref == 0 to p->ref != 1 and fix
fixfault() :-)

--
cinap



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [9fans] imagereclaim()
  2014-03-02  3:55               ` cinap_lenrek
  2014-03-02  4:05                 ` erik quanstrom
@ 2014-03-02  9:48                 ` Charles Forsyth
  2014-03-02 18:46                   ` cinap_lenrek
  1 sibling, 1 reply; 18+ messages in thread
From: Charles Forsyth @ 2014-03-02  9:48 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

[-- Attachment #1: Type: text/plain, Size: 611 bytes --]

On 2 March 2014 03:55, <cinap_lenrek@felloff.net> wrote:

> checked nix/port/page.c. your duppage() is wrong.


It isn't needed at all. When a cached page is written, it's trying hard to
replace the page in the cache by a new copy,
to return the previously cached page. Instead, I copy the cached page and
return the copy, which is what it already
does in another instance. In the current version, not yet available (but
soon), I do away with the page hash
table entirely, so there's nowhere for the page hash chains to accumulate.
Page is smaller and simpler, with
a reference count but no lock.

[-- Attachment #2: Type: text/html, Size: 1031 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [9fans] imagereclaim()
  2014-03-02  9:48                 ` Charles Forsyth
@ 2014-03-02 18:46                   ` cinap_lenrek
  2014-03-02 18:53                     ` Charles Forsyth
  0 siblings, 1 reply; 18+ messages in thread
From: cinap_lenrek @ 2014-03-02 18:46 UTC (permalink / raw)
  To: 9fans

you'r right. the smartness of duppage() isnt really neccesary. we
can just leave the cache alone. when memory is low, newpage() will
uncache pages for us.

fixfault():
		...
		lkp = *pg;
		lock(lkp);
		if(lkp->ref == 0)
			panic("fault %#p ref == 0", lkp);
		if(lkp->ref == 1 && lkp->image == nil) {
			unlock(lkp);
		} else if(lkp->image == &swapimage && (lkp->ref + swapcount(lkp->daddr)) == 1) {
			uncachepage(lkp);
			unlock(lkp);
		} else {
			unlock(lkp);
			new = newpage(0, &s, addr);
			if(s == 0)
				return -1;
			*pg = new;
			copypage(lkp, *pg);
			putpage(lkp);
		}
		mmuphys = PPN((*pg)->pa) | PTEWRITE | PTEVALID;
		(*pg)->modref = PG_MOD|PG_REF;
		break;

--
cinap



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [9fans] imagereclaim()
  2014-03-02 18:46                   ` cinap_lenrek
@ 2014-03-02 18:53                     ` Charles Forsyth
  2014-03-02 22:59                       ` Anthony Martin
  0 siblings, 1 reply; 18+ messages in thread
From: Charles Forsyth @ 2014-03-02 18:53 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

[-- Attachment #1: Type: text/plain, Size: 381 bytes --]

On 2 March 2014 18:46, <cinap_lenrek@felloff.net> wrote:

> you'r right. the smartness of duppage() isnt really neccesary. we
> can just leave the cache alone. when memory is low, newpage() will
> uncache pages for us.
>

I also ripped out all the swap stuff. Either I'm on a machine with at last
a Gb or two of RAM, or it's embedded
and there's no paging disk, or both.

[-- Attachment #2: Type: text/html, Size: 783 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [9fans] imagereclaim()
  2014-03-02 18:53                     ` Charles Forsyth
@ 2014-03-02 22:59                       ` Anthony Martin
  2014-03-03  1:00                         ` erik quanstrom
  0 siblings, 1 reply; 18+ messages in thread
From: Anthony Martin @ 2014-03-02 22:59 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Charles Forsyth <charles.forsyth@gmail.com> once said:
> I also ripped out all the swap stuff. Either I'm on a machine with at last
> a Gb or two of RAM, or it's embedded
> and there's no paging disk, or both.

What do you do when you're out of memory? Panic?
Do you still have something like the kickpager in
nix?

  Anthony



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [9fans] imagereclaim()
  2014-03-02 22:59                       ` Anthony Martin
@ 2014-03-03  1:00                         ` erik quanstrom
  0 siblings, 0 replies; 18+ messages in thread
From: erik quanstrom @ 2014-03-03  1:00 UTC (permalink / raw)
  To: 9fans

On Sun Mar  2 18:01:39 EST 2014, ality@pbrane.org wrote:
> Charles Forsyth <charles.forsyth@gmail.com> once said:
> > I also ripped out all the swap stuff. Either I'm on a machine with at last
> > a Gb or two of RAM, or it's embedded
> > and there's no paging disk, or both.
>
> What do you do when you're out of memory? Panic?
> Do you still have something like the kickpager in
> nix?

nix just panics.  the old swapping code, for starters,
assumed a single page size.  again, a little pimple on
a carbuncle.

- erik



^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2014-03-03  1:00 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-03-01 21:46 [9fans] imagereclaim() cinap_lenrek
2014-03-01 21:55 ` erik quanstrom
2014-03-01 22:22   ` cinap_lenrek
2014-03-01 22:52     ` erik quanstrom
2014-03-01 23:05       ` cinap_lenrek
2014-03-01 23:11         ` erik quanstrom
2014-03-01 23:14         ` Charles Forsyth
2014-03-02  1:21           ` cinap_lenrek
2014-03-02  3:14             ` erik quanstrom
2014-03-02  3:55               ` cinap_lenrek
2014-03-02  4:05                 ` erik quanstrom
2014-03-02  4:30                   ` cinap_lenrek
2014-03-02  9:48                 ` Charles Forsyth
2014-03-02 18:46                   ` cinap_lenrek
2014-03-02 18:53                     ` Charles Forsyth
2014-03-02 22:59                       ` Anthony Martin
2014-03-03  1:00                         ` erik quanstrom
2014-03-01 23:09     ` Charles Forsyth

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).