9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] duppage
@ 2012-03-13 23:55 erik quanstrom
  0 siblings, 0 replies; 29+ messages in thread
From: erik quanstrom @ 2012-03-13 23:55 UTC (permalink / raw)
  To: 9fans

with panic's duppage fix, i get hits.

Tue Mar 13 17:31:34: minooka# duppage: p->ref 2 != 1
Tue Mar 13 17:31:34: duppage: p->ref 2 != 1
Tue Mar 13 17:31:34: duppage: p->ref 3 != 1
Tue Mar 13 17:31:34: duppage: p->ref 2 != 1
Tue Mar 13 17:31:34: duppage: p->ref 2 != 1
Tue Mar 13 17:31:34: duppage: p->ref 2 != 1
Tue Mar 13 17:31:35: duppage: p->ref 2 != 1
Tue Mar 13 17:31:35: duppage: p->ref 2 != 1
Tue Mar 13 17:31:35: duppage: p->ref 2 != 1
Tue Mar 13 17:31:35: duppage: p->ref 2 != 1
Tue Mar 13 17:31:35: duppage: p->ref 2 != 1
Tue Mar 13 17:31:35: duppage: p->ref 2 != 1
Tue Mar 13 17:31:35: duppage: p->ref 2 != 1
Tue Mar 13 17:31:35: duppage: p->ref 2 != 1
Tue Mar 13 17:31:35: duppage: p->ref 2 != 1
Tue Mar 13 17:31:35: duppage: p->ref 2 != 1
Tue Mar 13 17:31:35: duppage: p->ref 2 != 1
Tue Mar 13 17:31:35: duppage: p->ref 3 != 1
Tue Mar 13 17:31:35: duppage: p->ref 2 != 1
Tue Mar 13 17:31:35: duppage: p->ref 2 != 1
Tue Mar 13 17:31:35: duppage: p->ref 3 != 1
Tue Mar 13 17:31:35: duppage: p->ref 2 != 1
Tue Mar 13 17:31:35: duppage: p->ref 2 != 1
Tue Mar 13 17:31:35: duppage: p->ref 3 != 1
Tue Mar 13 17:31:35: duppage: p->ref 2 != 1
Tue Mar 13 17:31:35: duppage: p->ref 2 != 1
Tue Mar 13 17:31:35: duppage: p->ref 2 != 1
Tue Mar 13 17:31:35: duppage: p->ref 3 != 1
Tue Mar 13 17:31:35: duppage: p->ref 2 != 1
Tue Mar 13 17:31:35: duppage: p->ref 2 != 1
Tue Mar 13 17:31:35: duppage: p->ref 3 != 1
Tue Mar 13 17:31:35: duppage: p->ref 2 != 1
Tue Mar 13 17:31:35: duppage: p->ref 2 != 1
Tue Mar 13 17:31:35: duppage: p->ref 2 != 1
Tue Mar 13 17:31:35: duppage: p->ref 2 != 1
Tue Mar 13 17:31:35: duppage: p->ref 2 != 1
Tue Mar 13 17:31:35: duppage: p->ref 2 != 1
Tue Mar 13 17:31:35: duppage: p->ref 2 != 1
Tue Mar 13 17:31:35: duppage: p->ref 2 != 1
Tue Mar 13 17:31:35: duppage: p->ref 2 != 1
Tue Mar 13 17:31:35: duppage: p->ref 2 != 1
Tue Mar 13 17:31:39: duppage: p->ref 2 != 1
Tue Mar 13 17:31:39: duppage: p->ref 2 != 1
Tue Mar 13 17:31:39: duppage: p->ref 2 != 1
Tue Mar 13 17:31:39: duppage: p->ref 2 != 1
Tue Mar 13 17:31:39: duppage: p->ref 2 != 1
Tue Mar 13 17:31:41: duppage: p->ref 2 != 1
Tue Mar 13 17:31:41: duppage: p->ref 2 != 1
Tue Mar 13 17:31:41: duppage: p->ref 2 != 1
Tue Mar 13 17:31:42: duppage: p->ref 2 != 1
Tue Mar 13 17:31:42: duppage: p->ref 2 != 1
Tue Mar 13 17:31:42: duppage: p->ref 2 != 1
Tue Mar 13 17:31:42: duppage: p->ref 2 != 1
Tue Mar 13 17:31:42: duppage: p->ref 2 != 1
Tue Mar 13 17:31:44: duppage: p->ref 2 != 1
Tue Mar 13 17:31:44: duppage: p->ref 2 != 1
Tue Mar 13 17:31:44: duppage: p->ref 2 != 1
Tue Mar 13 17:31:44: duppage: p->ref 2 != 1
Tue Mar 13 17:31:44: duppage: p->ref 2 != 1
Tue Mar 13 17:31:44: duppage: p->ref 2 != 1
Tue Mar 13 17:31:44: duppage: p->ref 2 != 1
Tue Mar 13 17:31:45: duppage: p->ref 2 != 1
Tue Mar 13 17:31:45: duppage: p->ref 2 != 1
Tue Mar 13 17:31:45: duppage: p->ref 2 != 1
Tue Mar 13 17:31:45: duppage: p->ref 2 != 1
Tue Mar 13 17:31:45: duppage: p->ref 3 != 1
Tue Mar 13 17:31:45: duppage: p->ref 2 != 1
Tue Mar 13 17:31:45: duppage: p->ref 2 != 1
Tue Mar 13 17:31:45: duppage: p->ref 2 != 1
Tue Mar 13 17:31:45: duppage: p->ref 2 != 1
Tue Mar 13 17:31:45: duppage: p->ref 2 != 1

- erik




* Re: [9fans] duppage
  2014-06-08 18:15         ` erik quanstrom
  2014-06-08 18:37           ` Charles Forsyth
@ 2014-06-11 18:04           ` erik quanstrom
  1 sibling, 0 replies; 29+ messages in thread
From: erik quanstrom @ 2014-06-11 18:04 UTC (permalink / raw)
  To: 9fans

On Sun Jun  8 14:17:16 EDT 2014, quanstro@quanstro.net wrote:
> On Sun Jun  8 13:55:52 EDT 2014, cinap_lenrek@felloff.net wrote:
> > right. the question is, how did it vanish from the image cache.
>
> i think it is in the image cache, but .ref >1.

perhaps independent of your question,
my assumption is correct, and proven.

the problem was that pagereclaim() only looked through
pga.pgsza[0], but since 2MiB pages were introduced in nix,
no pages could ever be reclaimed this way, since they would
be in pga.pgsza[1].
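A rough user-space model of the scan shows the failure mode. It is hypothetical: MPage, MFreeList and reclaim are invented for illustration and are not the nix kernel's types, and locking and the page-count threshold are left out. When every cached page sits in the 2MiB list, a walk limited to index 0 releases nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a page on a per-size free list. */
typedef struct MPage MPage;
struct MPage {
	MPage	*prev;		/* towards the head of the list */
	int	cached;		/* stands in for p->image != nil */
	int	ref;
};

/* Hypothetical stand-in for one entry of pga.pgsza[]. */
typedef struct MFreeList MFreeList;
struct MFreeList {
	MPage	*tail;
};

/*
 * Walk the first nsizes lists from the tail, uncaching unreferenced
 * cached pages until at least want bytes are released.
 * pgszlg2[lg] is log2 of the page size held in list lg.
 * Returns the number of bytes released.
 */
size_t
reclaim(MFreeList *lists, int nsizes, const int *pgszlg2, size_t want)
{
	size_t sz;
	int lg;
	MPage *p;

	assert(lists != NULL && pgszlg2 != NULL);
	sz = 0;
	for(lg = 0; lg < nsizes; lg++)
		for(p = lists[lg].tail; p != NULL; p = p->prev){
			if(p->cached && p->ref == 0){
				p->cached = 0;
				sz += (size_t)1 << pgszlg2[lg];
			}
			if(sz >= want)
				return sz;
		}
	return sz;
}
```

With the 4KiB list empty and three cached pages in the 2MiB list, reclaim(lists, 1, ...) releases 0 bytes, while reclaim(lists, 2, ...) reaches the 2MiB list and releases pages from its tail.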

i don't think this really addresses your question, or my
original problem though.  the corrected version which i think
should work for any architecture is below.  i intend to
incorporate charles' locking changes later on
(lock(&pga.pgsza[lg]) instead of the whole thing).
i think those changes (in the 9atom /sys/src/9 kernels)
are good stuff.

- erik
---
/*
 * Called from imagereclaim, to try to release Images.
 * The (ignored) argument indicates a preferred image for release.
 */
void
pagereclaim(Image*)
{
	int lg, n;
	usize sz;
	Page *p;

	lock(&pga);

	/*
	 * All the pages with images backing them are at the
	 * end of each list (see putpage) so start there and work
	 * backward.  Scan every page size, not just the smallest,
	 * and stop once more than 5 pages and at least 20MiB are freed.
	 */
	sz = 0;
	n = 0;
	for(lg = 0; lg < m->npgsz; lg++){
		for(p = pga.pgsza[lg].tail; p != nil; p = p->prev){
			if(p->image != nil && p->ref == 0 && canlock(p)){
				/* recheck now that p is locked */
				if(p->ref == 0){
					n++;
					sz += 1<<m->pgszlg2[lg];
					uncachepage(p);
				}
				unlock(p);
			}
			/* leave both loops, not just this page size */
			if(sz >= 20*MiB && n > 5)
				goto done;
		}
	}
done:
	unlock(&pga);
}




* Re: [9fans] duppage
  2014-06-10 13:56                     ` Steve Simon
@ 2014-06-10 15:30                       ` erik quanstrom
  0 siblings, 0 replies; 29+ messages in thread
From: erik quanstrom @ 2014-06-10 15:30 UTC (permalink / raw)
  To: 9fans

On Tue Jun 10 09:58:18 EDT 2014, steve@quintile.net wrote:
> > if a process exits and is then run again, it will always be re-read
> > from storage.  (since channel comparisons factor in to finding
> > an image.)  only if the lifetime overlaps will the cached image be
> > used.
>
> The one place where I can imagine lots of cache hits is when running
> parallel mk jobs, it would be interesting to measure and see how much
> of a win it is.

1.6MB of 6c read vs the theoretical 25MB, according to my measurements.

- erik




* Re: [9fans] duppage
  2014-06-10 12:58                   ` erik quanstrom
  2014-06-10 13:56                     ` Steve Simon
@ 2014-06-10 14:06                     ` cinap_lenrek
  1 sibling, 0 replies; 29+ messages in thread
From: cinap_lenrek @ 2014-06-10 14:06 UTC (permalink / raw)
  To: 9fans

no. my attachimage() compares qid + mountid (which is unique) and
reattaches the passed in channel if image->c was nil. when
a process exits, the segments are released, decrementing ref of
the pages and the images. the image has an additional field pgref where
it counts the number of page references (that is, the number of
references minus the references from segments). in putimage(),
when image->ref == image->pgref, we know that all references
to our image are from the cache only, and that's when we close the image
channel and set image->c to nil. once the image gets attached again,
image->c will be set again as mentioned above.
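The ref/pgref rule can be reduced to a few lines. This is a hypothetical model, not 9front's code: MImage, putimage_model and attachimage_model are invented names, and chanopen stands in for image->c != nil. ref counts every reference (segments plus cached pages) and pgref only the cached-page ones, so ref == pgref means nothing but the cache still holds the image.

```c
#include <assert.h>

/* Hypothetical stand-in for the Image fields described above. */
typedef struct MImage MImage;
struct MImage {
	int	ref;		/* all references: segments plus cached pages */
	int	pgref;		/* references held by cached pages only */
	int	chanopen;	/* stands in for image->c != nil */
};

/* A segment drops its reference, e.g. when a process exits. */
void
putimage_model(MImage *i)
{
	assert(i->ref > 0);
	i->ref--;
	if(i->ref == i->pgref)
		i->chanopen = 0;	/* only the cache is left: drop the channel */
}

/* The image is found again (by qid + mountid) and attached. */
void
attachimage_model(MImage *i)
{
	i->ref++;
	if(!i->chanopen)
		i->chanopen = 1;	/* reattach the caller's channel */
}
```

Starting from one segment reference and two cached-page references, the exit closes the channel while the image stays cached, and the next attach reopens it.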

this is just to avoid holding the channel reference when the image
is only kept around for the cache. so mounts will go away properly
and not wait until the image is reclaimed.

it will find the right image for the channel in any case. nothing
has changed in that regard.

--
cinap




* Re: [9fans] duppage
  2014-06-10 12:58                   ` erik quanstrom
@ 2014-06-10 13:56                     ` Steve Simon
  2014-06-10 15:30                       ` erik quanstrom
  2014-06-10 14:06                     ` cinap_lenrek
  1 sibling, 1 reply; 29+ messages in thread
From: Steve Simon @ 2014-06-10 13:56 UTC (permalink / raw)
  To: 9fans

> if a process exits and is then run again, it will always be re-read
> from storage.  (since channel comparisons factor in to finding
> an image.)  only if the lifetime overlaps will the cached image be
> used.

The one place where I can imagine lots of cache hits is when running
parallel mk jobs, it would be interesting to measure and see how much
of a win it is.

idle Tuesday thoughts.

-Steve




* Re: [9fans] duppage
  2014-06-10  3:53                 ` cinap_lenrek
@ 2014-06-10 12:58                   ` erik quanstrom
  2014-06-10 13:56                     ` Steve Simon
  2014-06-10 14:06                     ` cinap_lenrek
  0 siblings, 2 replies; 29+ messages in thread
From: erik quanstrom @ 2014-06-10 12:58 UTC (permalink / raw)
  To: 9fans

On Mon Jun  9 23:55:00 EDT 2014, cinap_lenrek@felloff.net wrote:
> while you're at it, take a look at the 9front attachimage() code.
> it allows the chan attached to the image to be released when the
> image is not in use. this avoids all these chans and mounts
> being kept around until the image is reclaimed. the problem
> is worked arround in iostats by killing the filesystem process
> once the command exits. you can reproduce by copying a binary
> to a fresh ramfs, executing and then unmount. ramfs will stay
> around because the image cache still holds onto the binary's
> channel.

i noticed that the private channel free queue was dropped in favor
of ccloseq.  good idea.

if i understand correctly, there's another side to this solution.
if a process exits and is then run again, it will always be re-read
from storage.  (since channel comparisons factor in to finding
an image.)  only if the lifetime overlaps will the cached image be
used.

i think it would make more sense to simply reclaim pages with
p->image != nil quicker, if you don't like the current set up.

- erik




* Re: [9fans] duppage
  2014-06-10  1:35               ` erik quanstrom
@ 2014-06-10  3:53                 ` cinap_lenrek
  2014-06-10 12:58                   ` erik quanstrom
  0 siblings, 1 reply; 29+ messages in thread
From: cinap_lenrek @ 2014-06-10  3:53 UTC (permalink / raw)
  To: 9fans

while you're at it, take a look at the 9front attachimage() code.
it allows the chan attached to the image to be released when the
image is not in use. this avoids all these chans and mounts
being kept around until the image is reclaimed. the problem
is worked arround in iostats by killing the filesystem process
once the command exits. you can reproduce by copying a binary
to a fresh ramfs, executing and then unmount. ramfs will stay
around because the image cache still holds onto the binary's
channel.

--
cinap




* Re: [9fans] duppage
  2014-06-09  8:23             ` Charles Forsyth
  2014-06-09 13:21               ` erik quanstrom
@ 2014-06-10  1:35               ` erik quanstrom
  2014-06-10  3:53                 ` cinap_lenrek
  1 sibling, 1 reply; 29+ messages in thread
From: erik quanstrom @ 2014-06-10  1:35 UTC (permalink / raw)
  To: 9fans

On Mon Jun  9 04:25:00 EDT 2014, charles.forsyth@gmail.com wrote:

> On 8 June 2014 19:37, Charles Forsyth <charles.forsyth@gmail.com> wrote:
>
> > On 8 June 2014 19:15, erik quanstrom <quanstro@quanstro.net> wrote:
> >
> >> i think it is in the image cache, but .ref >1.
> >
> >
> > but in that case it will still not pio, but make a local writable copy.
>
>
> in fact ref > 1 is the copy-on-write case and in a sense the usual one,
> where the copy is needed.

i think iostats may confuse the issue a little bit, since
iostats changes the results of some of the tests in attachimage.

- erik




* Re: [9fans] duppage
  2014-06-09  8:23             ` Charles Forsyth
@ 2014-06-09 13:21               ` erik quanstrom
  2014-06-10  1:35               ` erik quanstrom
  1 sibling, 0 replies; 29+ messages in thread
From: erik quanstrom @ 2014-06-09 13:21 UTC (permalink / raw)
  To: 9fans

> On 8 June 2014 19:37, Charles Forsyth <charles.forsyth@gmail.com> wrote:
>
> > On 8 June 2014 19:15, erik quanstrom <quanstro@quanstro.net> wrote:
> >
> >> i think it is in the image cache, but .ref >1.
> >
> >
> > but in that case it will still not pio, but make a local writable copy.
>
>
> in fact ref > 1 is the copy-on-write case and in a sense the usual one,
> where the copy is needed.

i'll get back to this.  after looking at the refcounting and thinking
about how imagealloc interacts with image allocation, the locking
scheme in imagereclaim does not make any sense.

- erik




* Re: [9fans] duppage
  2014-06-08 18:37           ` Charles Forsyth
@ 2014-06-09  8:23             ` Charles Forsyth
  2014-06-09 13:21               ` erik quanstrom
  2014-06-10  1:35               ` erik quanstrom
  0 siblings, 2 replies; 29+ messages in thread
From: Charles Forsyth @ 2014-06-09  8:23 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


On 8 June 2014 19:37, Charles Forsyth <charles.forsyth@gmail.com> wrote:

> On 8 June 2014 19:15, erik quanstrom <quanstro@quanstro.net> wrote:
>
>> i think it is in the image cache, but .ref >1.
>
>
> but in that case it will still not pio, but make a local writable copy.


in fact ref > 1 is the copy-on-write case and in a sense the usual one,
where the copy is needed.



* Re: [9fans] duppage
  2014-06-08 18:15         ` erik quanstrom
@ 2014-06-08 18:37           ` Charles Forsyth
  2014-06-09  8:23             ` Charles Forsyth
  2014-06-11 18:04           ` erik quanstrom
  1 sibling, 1 reply; 29+ messages in thread
From: Charles Forsyth @ 2014-06-08 18:37 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


On 8 June 2014 19:15, erik quanstrom <quanstro@quanstro.net> wrote:

> i think it is in the image cache, but .ref >1.


but in that case it will still not pio, but make a local writable copy.



* Re: [9fans] duppage
  2014-06-08 17:54       ` cinap_lenrek
@ 2014-06-08 18:15         ` erik quanstrom
  2014-06-08 18:37           ` Charles Forsyth
  2014-06-11 18:04           ` erik quanstrom
  0 siblings, 2 replies; 29+ messages in thread
From: erik quanstrom @ 2014-06-08 18:15 UTC (permalink / raw)
  To: 9fans

On Sun Jun  8 13:55:52 EDT 2014, cinap_lenrek@felloff.net wrote:
> right. the question is, how did it vanish from the image cache.

i think it is in the image cache, but .ref >1.

- erik




* Re: [9fans] duppage
  2014-06-08 17:50     ` Charles Forsyth
  2014-06-08 17:53       ` erik quanstrom
@ 2014-06-08 17:54       ` cinap_lenrek
  2014-06-08 18:15         ` erik quanstrom
  1 sibling, 1 reply; 29+ messages in thread
From: cinap_lenrek @ 2014-06-08 17:54 UTC (permalink / raw)
  To: 9fans

right. the question is, how did it vanish from the image cache.

--
cinap




* Re: [9fans] duppage
  2014-06-08 17:50     ` Charles Forsyth
@ 2014-06-08 17:53       ` erik quanstrom
  2014-06-08 17:54       ` cinap_lenrek
  1 sibling, 0 replies; 29+ messages in thread
From: erik quanstrom @ 2014-06-08 17:53 UTC (permalink / raw)
  To: 9fans

On Sun Jun  8 13:51:18 EDT 2014, charles.forsyth@gmail.com wrote:

> On 8 June 2014 18:34, erik quanstrom <quanstro@quanstro.net> wrote:
>
> > well, those are the measurements.  do you think they are misleading?
> >  perhaps
> > with the pio happening in another context?  i haven't hunted this down.
> >
>
> the difference is only how fault makes the copy (easy or hard), there
> shouldn't be any call to pio either way.

unless the image is not cached, or doesn't have 1 reference.

- erik




* Re: [9fans] duppage
  2014-06-08 17:34   ` erik quanstrom
@ 2014-06-08 17:50     ` Charles Forsyth
  2014-06-08 17:53       ` erik quanstrom
  2014-06-08 17:54       ` cinap_lenrek
  0 siblings, 2 replies; 29+ messages in thread
From: Charles Forsyth @ 2014-06-08 17:50 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


On 8 June 2014 18:34, erik quanstrom <quanstro@quanstro.net> wrote:

> well, those are the measurements.  do you think they are misleading?
>  perhaps
> with the pio happening in another context?  i haven't hunted this down.
>

the difference is only how fault makes the copy (easy or hard), there
shouldn't be any call to pio either way.



* Re: [9fans] duppage
  2014-06-08 17:22 ` Charles Forsyth
  2014-06-08 17:34   ` erik quanstrom
@ 2014-06-08 17:49   ` cinap_lenrek
  1 sibling, 0 replies; 29+ messages in thread
From: cinap_lenrek @ 2014-06-08 17:49 UTC (permalink / raw)
  To: 9fans

duppage() causes the freelist to be shuffled differently. without
stuffing cached pages at the freelist tail, the tail accumulates
an uncached "stopper" page, which breaks the invariant of imagereclaim,
which just scans from the tail backwards as long as the pages are
cached.

imagereclaim does not move the pages to the head after uncaching them!
so by default imagereclaim prevents the cached pages before the ones
it reclaimed from ever being reclaimed.

before image reclaim: H UUUUUUUUCCCCCCCCCC T
after image reclaim:  H UUUUUUUUCCCCUUUUUU T
                                    ^- as far as imagereclaim went

with duppage, there are always new cached pages added at the tail.

i suspect once you run out of images, imagereclaim will run constantly and
blow away the little useful image cache you still have, causing additional
reads to page them back in.
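The stopper effect in the diagram can be reproduced with a toy model. It is hypothetical: the free list is a string with the head at index 0 and the tail at the end, 'C' is a cached page and 'U' an uncached one, and reclaim_from_tail is an invented name. As described above, the scan walks back from the tail only while pages are cached and uncaches them in place instead of moving them to the head.

```c
#include <assert.h>
#include <string.h>

/*
 * Uncache at most max cached pages, scanning backward from the tail
 * and stopping at the first uncached page.  Pages stay where they are.
 * Returns how many pages were uncached.
 */
int
reclaim_from_tail(char *list, int max)
{
	int i, n;

	assert(list != NULL && max >= 0);
	n = 0;
	for(i = (int)strlen(list) - 1; i >= 0 && list[i] == 'C' && n < max; i--){
		list[i] = 'U';
		n++;
	}
	return n;
}
```

The first pass turns "UUUUUUUUCCCCCCCCCC" into "UUUUUUUUCCCCUUUUUU"; a second pass reclaims nothing because the tail is now an uncached stopper, so the four remaining cached pages are stranded.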

--
cinap




* Re: [9fans] duppage
  2014-06-08 17:22 ` Charles Forsyth
@ 2014-06-08 17:34   ` erik quanstrom
  2014-06-08 17:50     ` Charles Forsyth
  2014-06-08 17:49   ` cinap_lenrek
  1 sibling, 1 reply; 29+ messages in thread
From: erik quanstrom @ 2014-06-08 17:34 UTC (permalink / raw)
  To: 9fans

> that doesn't make any sense.  duppage copied the page the wrong way
> round (used the image page and put another copy in).  eliminating
> duppage simply copies the page from the image cache instead of using
> that page.  there isn't any i/o in either case.

well, those are the measurements.  do you think they are misleading?  perhaps
with the pio happening in another context?  i haven't hunted this down.

- erik




* Re: [9fans] duppage
  2014-06-08 14:53 erik quanstrom
  2014-06-08 17:15 ` cinap_lenrek
@ 2014-06-08 17:22 ` Charles Forsyth
  2014-06-08 17:34   ` erik quanstrom
  2014-06-08 17:49   ` cinap_lenrek
  1 sibling, 2 replies; 29+ messages in thread
From: Charles Forsyth @ 2014-06-08 17:22 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


On 8 June 2014 15:53, erik quanstrom <quanstro@quanstro.net> wrote:

> i was experimenting a bit with cinap's version of dropping duppage, and for
> the lame build the kernel tests there's quite a bit more i/o
>

that doesn't make any sense. duppage copied the page the wrong way round
(used the image page and put another copy in).
eliminating duppage simply copies the page from the image cache instead of
using that page. there isn't any i/o in either case.



* Re: [9fans] duppage
  2014-06-08 14:53 erik quanstrom
@ 2014-06-08 17:15 ` cinap_lenrek
  2014-06-08 17:22 ` Charles Forsyth
  1 sibling, 0 replies; 29+ messages in thread
From: cinap_lenrek @ 2014-06-08 17:15 UTC (permalink / raw)
  To: 9fans

i get consistent results with iostats for building pc64.
(on amd64)

Opens    Reads  (bytes)   Writes  (bytes) File
  166      192   104916        0        0 /bin/rc
    4       90   343308        0        0 /bin/awk
   37       51    51280        0        0 /bin/echo
   17       43   103786        0        0 /bin/sed
    3       17    51567        0        0 /bin/ls
    3       12    34624        0        0 /bin/grep
   13       24    40114        0        0 /bin/cp
    1        6    19112        0        0 /bin/pwd
    4       15    40658        0        0 /bin/xd
  128      192   263785        0        0 /bin/6c
    5       34   113149        0        0 /bin/6l
    4       27    89551        0        0 /bin/6a
    1       10    33360        0        0 /bin/mkdir
    2        9    22016        0        0 /bin/dd
    4       18    55085        0        0 /bin/strip
    1       17    62301        0        0 /bin/mkpaqfs
    4       14    33797        0        0 /bin/rm
    2        3      121        0        0 /bin/membername
    2       15    46432        0        0 /bin/tr
    1       19    72544        0        0 /bin/ar
    1        4    10992        0        0 /bin/cat
    2       24    86580        0        0 /bin/hoc
    4       26    86433        0        0 /bin/file
    4       13    33439        0        0 /bin/aux/data2s
    1        7    23488        0        0 /bin/date
    1       14    49502        0        0 /bin/size
    1       25    94969        0        0 /bin/mk

i made a small test, running echo 1, 2, 3 and 4 times,
and i get exactly one additional read per exec (which
is the read of the file header); all the other pages
are cached.

with MCACHE mount, it is exactly the same amount of
reads no matter how often i run it. :)

but that's not under load. is your machine starved of memory?

my guess would be that the cached pages are getting uncached
and reused in your case. i remember fixing some bugs
in imagereclaim that could potentially cause this.

but that's all speculation...

there's a statistics struct there that you can peek at
from acid -k and see how often imagereclaim runs between
your test passes.

--
cinap




* [9fans] duppage
@ 2014-06-08 14:53 erik quanstrom
  2014-06-08 17:15 ` cinap_lenrek
  2014-06-08 17:22 ` Charles Forsyth
  0 siblings, 2 replies; 29+ messages in thread
From: erik quanstrom @ 2014-06-08 14:53 UTC (permalink / raw)
  To: 9fans

i was experimenting a bit with cinap's version of dropping duppage, and for
the lame build-the-kernel test there's quite a bit more i/o:

		duppage		no duppage
read		45976291	53366962
rpc		73674		75718

you can see below that without duppage we end up reading 6909416 bytes
from 6c for 136 executions.  6c is only 264450 text+data,
so that's 26 unnecessary reads (6c had already been cached).

the original fares better, reading only 1816296, but that's
still way too much.
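The read counts above are bytes read divided by 6c's text+data size. A trivial helper (fullreads is a hypothetical name) makes the arithmetic explicit: without duppage it comes to about 26.1 full reads of 6c, with duppage about 6.9.

```c
#include <assert.h>

/* Equivalent number of full reads of a binary, from iostats byte counts. */
double
fullreads(long bytesread, long textdata)
{
	assert(textdata > 0);
	return (double)bytesread / (double)textdata;
}
```

fullreads(6909416, 264450) is roughly 26.13 and fullreads(1816296, 264450) roughly 6.87.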

this needs a better algorithm.

- erik

---
without duppage
Opens    Reads  (bytes)   Writes  (bytes) File
    4       43   301801        0        0 /bin/ape/sh
    3       12    61137        0        0 /bin/ls
  155      476  1642711        0        0 /bin/rc
    5       84   602660        0        0 /bin/awk
    9       79   514294        0        0 /bin/6a
  136     1060  6909416        0        0 /bin/6c
    5       50   329397        0        0 /bin/6l
    8       20    66901        0        0 /bin/echo
    3       11    51238        0        0 /bin/xd
    3       13    72700        0        0 /bin/sed
    4       13    54357        0        0 /bin/cp
    4       33   201057        0        0 /bin/file
    4       21   125962        0        0 /bin/strip
    4       12    51024        0        0 /bin/aux/data2s
    2        9    42686        0        0 /bin/rm

with duppage
Opens    Reads  (bytes)   Writes  (bytes) File
    4       31   216169        0        0 /bin/ape/sh
    3       10    50833        0        0 /bin/ls
  155      190   210423        0        0 /bin/rc
    5       57   398636        0        0 /bin/awk
    9       57   357441        0        0 /bin/6a
  136      370  1816296        0        0 /bin/6c
    5       29   178269        0        0 /bin/6l
    8       13    35485        0        0 /bin/echo
    3       11    59756        0        0 /bin/sed
    3        9    40742        0        0 /bin/xd
    4       10    39909        0        0 /bin/cp
    4       18   100617        0        0 /bin/file
    4       12    60634        0        0 /bin/strip
    4        9    37104        0        0 /bin/aux/data2s
    2        8    38150        0        0 /bin/rm




* Re: [9fans] duppage
  2012-03-03  9:58         ` Richard Miller
@ 2012-03-03 19:30           ` cinap_lenrek
  0 siblings, 0 replies; 29+ messages in thread
From: cinap_lenrek @ 2012-03-03 19:30 UTC (permalink / raw)
  To: 9fans

> Now I understand.  Before returning 1 (failure) duppage
> always calls uncachepage first, so no harm is done.
exactly!

> How about submitting a patch(1)?
i think geoff is working on it.

i wanted to verify this first. the code is subtle and
there might be even better ways to fix this.

i modified the pager and swap code in 9front in other ways
too, so making a patch for plan 9 on sources requires me to
do some manual recreation of the fix and bind fakery.

--
cinap




* Re: [9fans] duppage
  2012-02-29 19:50       ` cinap_lenrek
@ 2012-03-03  9:58         ` Richard Miller
  2012-03-03 19:30           ` cinap_lenrek
  0 siblings, 1 reply; 29+ messages in thread
From: Richard Miller @ 2012-03-03  9:58 UTC (permalink / raw)
  To: 9fans

> in 9front, i changed the return type of duppage to void.

Now I understand.  Before returning 1 (failure) duppage
always calls uncachepage first, so no harm is done.

Good analysis.  How about submitting a patch(1)?





* Re: [9fans] duppage
  2012-02-29 19:35     ` Richard Miller
@ 2012-02-29 19:50       ` cinap_lenrek
  2012-03-03  9:58         ` Richard Miller
  0 siblings, 1 reply; 29+ messages in thread
From: cinap_lenrek @ 2012-02-29 19:50 UTC (permalink / raw)
  To: 9fans

fixfault() in fault.c is the only user of duppage().

in 9front, i changed the return type of duppage to void.

fixfault() now looks like this:

		...
		if(lkp->image == &swapimage)
			ref = lkp->ref + swapcount(lkp->daddr);
		else
			ref = lkp->ref;
		if(ref == 1 && lkp->image){
			/* save a copy of the original for the image cache */
			duppage(lkp);
			ref = lkp->ref;	/* recheck: a ref may have been taken while duppage had lkp unlocked */
		}
		unlock(lkp);
		if(ref > 1){
			new = newpage(0, &s, addr);
			if(s == 0)
				return -1;
			*pg = new;
			copypage(lkp, *pg);
			putpage(lkp);
		}
		...
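The essential change is the second read of lkp->ref after duppage() returns. Reduced to a pure function, the decision looks roughly like the sketch below. cowdecision, UseInPlace and CopyPage are hypothetical names; refbefore is lkp->ref (plus swapcount() for swap-backed pages) and refafterdup is what lkp->ref holds once duppage() has returned.

```c
#include <assert.h>

/* Hypothetical outcomes of the copy-on-write decision. */
enum { UseInPlace, CopyPage };

int
cowdecision(int refbefore, int refafterdup, int imagebacked)
{
	int ref;

	assert(refbefore >= 1);
	ref = refbefore;
	if(ref == 1 && imagebacked)
		ref = refafterdup;	/* recheck: do not trust the count from before duppage */
	return ref > 1 ? CopyPage : UseInPlace;
}
```

Without the recheck, cowdecision(1, 2, 1) would keep the page in place even though another process now shares it; with it, the page is copied as in the ordinary ref > 1 case.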

--
cinap




* Re: [9fans] duppage
  2012-02-29 14:14   ` cinap_lenrek
@ 2012-02-29 19:35     ` Richard Miller
  2012-02-29 19:50       ` cinap_lenrek
  0 siblings, 1 reply; 29+ messages in thread
From: Richard Miller @ 2012-02-29 19:35 UTC (permalink / raw)
  To: 9fans

> the problem we have is that we temporarily unlock
> p to acquire the palloc lock, which opens a chance for
> someone to take a ref on p, but duppage doesn't
> recheck after reacquiring the p lock.
>
> if this happens, duppage() has to return and let
> fixfault() make a copy for its segment.

That makes sense to me.

Should we be worried that duppage returns 0 or 1 for
success or failure, but fixfault ignores the return value?





* Re: [9fans] duppage
  2012-02-29 10:31 ` Richard Miller
@ 2012-02-29 14:14   ` cinap_lenrek
  2012-02-29 19:35     ` Richard Miller
  0 siblings, 1 reply; 29+ messages in thread
From: cinap_lenrek @ 2012-02-29 14:14 UTC (permalink / raw)
  To: 9fans

no, this is different.

the "XXX - here's a bug" comment is about the new page that
duppage() makes. it describes a race where the new
page is on both the freelist and the image cache.
what's important is that when someone (lookpage)
locks the page, its refcount and image/daddr are
consistent, which should be the case as far as i can
see.

if you are paranoid, you can just check in newpage(),
auxpage() and duppage() what the refcount of the page
you just grabbed from the freelist is. if it's anything
other than 0, then we hit that XXX bug. (done that,
never triggered)
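That paranoid check can be sketched on a toy singly linked free list. FPage and takefree are hypothetical, not the kernel's palloc; the point is only that any page taken off the free list must have a zero refcount before it is handed out. A kernel would more likely print than abort.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical free-list page. */
typedef struct FPage FPage;
struct FPage {
	FPage	*next;
	int	ref;
};

/* Take the page at the head of the free list and give it one reference. */
FPage*
takefree(FPage **head)
{
	FPage *p;

	p = *head;
	if(p == NULL)
		return NULL;
	assert(p->ref == 0);	/* nonzero would mean the XXX race was hit */
	*head = p->next;
	p->next = NULL;
	p->ref = 1;
	return p;
}
```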

at the point when duppage calls unlock(np), the following
things are possible:

lookpage() could succeed in locking np, finding its ref
to be 0, incrementing its ref and unchaining it
from the freelist (all while holding the palloc lock,
so it won't interfere with newpage() or auxpage()).

if newpage(), auxpage() or duppage() got the lock first,
the page will be uncached, or point to a different
image/daddr (duppage). (with some luck, it might even
point to the same image/daddr, then it just doesn't matter :))

or

someone like newpage() or auxpage() succeeding in locking it,
incrementing its ref and removing it from the image cache. the
cachepage() is done by duppage() while holding the lock on np,
so there's no race.

or

duppage() on another process succeeds in locking np,
removing it from the image cache, filling it with the
contents of the to-be-duped page and putting it back
in the image cache.

all this looks safe to me, but i might be missing something
of course... anyway, that's about all i know about the
XXX comment :)

back to the duppage bug i was talking about...

the bug that i see is that when duppage() is called,
p has to have a ref of 1. the whole duppage approach
will not work if p is shared with other segments
already.

in the normal case of duppage(), when we have p locked,
p->ref is 1, and we call uncachepage(p) before
unlocking p, we are safe.

because even if someone like lookpage finds the page in the
image cache, it has to lock it first. when it succeeds in
locking p, lookpage() will find p->image != i because
duppage called uncachepage(p) before unlocking it. so it won't
get shared ever again.
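The "lock first, then check identity" argument can be sketched in a single-threaded model. LPage, plock, punlock, uncache_locked and lookup_and_ref are hypothetical names, and the lock is a plain flag rather than a real spin lock. The lookup only takes a reference if the page still points at the image once it holds the lock, so a page uncached under the lock can never be picked up again.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page with a lock flag and an image identity. */
typedef struct LPage LPage;
struct LPage {
	int	locked;
	const void	*image;	/* stands in for p->image; NULL once uncached */
	int	ref;
};

static void
plock(LPage *p)
{
	assert(!p->locked);	/* single-threaded model: the lock must be free */
	p->locked = 1;
}

static void
punlock(LPage *p)
{
	assert(p->locked);
	p->locked = 0;
}

/* duppage side: uncache while still holding the page lock. */
void
uncache_locked(LPage *p)
{
	assert(p->locked);
	p->image = NULL;
}

/*
 * lookpage side: the page was reached through the cache, but its
 * identity is only trusted once the lock is held.
 * Returns 1 if a reference was taken.
 */
int
lookup_and_ref(LPage *p, const void *image)
{
	int ok;

	plock(p);
	ok = p->image == image;
	if(ok)
		p->ref++;
	punlock(p);
	return ok;
}
```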

the problem we have is that we temporarily unlock
p to acquire the palloc lock, which opens a chance for
someone to take a ref on p, but duppage doesn't
recheck after reacquiring the p lock.

if this happens, duppage() has to return and let
fixfault() make a copy for its segment.

--
cinap




* Re: [9fans] duppage
  2012-02-25  3:29 cinap_lenrek
  2012-02-25 20:09 ` Charles Forsyth
  2012-02-26  4:13 ` erik quanstrom
@ 2012-02-29 10:31 ` Richard Miller
  2012-02-29 14:14   ` cinap_lenrek
  2 siblings, 1 reply; 29+ messages in thread
From: Richard Miller @ 2012-02-29 10:31 UTC (permalink / raw)
  To: 9fans

> can anyone with an mp system confirm this?

Yes, I've confirmed by experiment that duppage(lkp) can return
with lkp->ref > 1.

Is what you've found related to the "XXX - here's a bug" comment
in duppage?  Maybe that's what really needs fixing.





* Re: [9fans] duppage
  2012-02-25  3:29 cinap_lenrek
  2012-02-25 20:09 ` Charles Forsyth
@ 2012-02-26  4:13 ` erik quanstrom
  2012-02-29 10:31 ` Richard Miller
  2 siblings, 0 replies; 29+ messages in thread
From: erik quanstrom @ 2012-02-26  4:13 UTC (permalink / raw)
  To: 9fans

> a change that rechecks the refcount after calling duppage() in
> fixfault(), and makes a copy as in the ref > 1 case, seems to have
> made the problem go away. (the system has been running for 8 days now)
>
> can anyone with an mp system confirm this?

the description sounds logical, but i haven't seen this.
do you have a good way to replicate this?  it would be
good to keep around for testing.

- erik




* Re: [9fans] duppage
  2012-02-25  3:29 cinap_lenrek
@ 2012-02-25 20:09 ` Charles Forsyth
  2012-02-26  4:13 ` erik quanstrom
  2012-02-29 10:31 ` Richard Miller
  2 siblings, 0 replies; 29+ messages in thread
From: Charles Forsyth @ 2012-02-25 20:09 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

I think we should institute a Sherlock Holmes award at iwp9.
(It wouldn't mean you need to throw yourself off a building.)




* [9fans] duppage
@ 2012-02-25  3:29 cinap_lenrek
  2012-02-25 20:09 ` Charles Forsyth
                   ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: cinap_lenrek @ 2012-02-25  3:29 UTC (permalink / raw)
  To: 9fans

discovered odd behaviour on an mp system.  it was running rchttpd and
werc, and after 3 or 7 days of load, broken sed and grep processes
appeared in the process table.  inspecting the processes with acid
yields a strange picture.  the processes crashed (or aborted themselves)
before any data was read from stdin, just after allocating some memory
(grep uses sbrk() directly, whereas sed uses pool malloc).  in the case
of grep, the global bloc seemed to have been reset to a past value,
and sed's mainmem structure was also inconsistent with reality.

while trying to put the pieces together, something interesting came up.

duppage() is called by fixfault for COW, with a locked, image-backed,
non-shared page.  it makes a new copy for the image cache, and then
removes the page from the image cache.

to do this, it has to allocate a new page from the page allocator,
temporarily unlocking the page.  what we observe is that when duppage
reacquires the page lock, the page's refcount is sometimes >1, meaning
another processor just grabbed that page out of the image cache.
(tested this with a print() and it triggered multiple times right after boot)
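The window can be shown with a single-threaded model. It is hypothetical: RPage, duppage_model and otherlookup are invented names, and the real duppage also takes the palloc lock and fills a new page, which is omitted here. The callback stands in for another processor acting while the page lock is dropped; the count returned after relocking is what fixfault has to recheck.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page: lock flag, refcount, and whether it is still cached. */
typedef struct RPage RPage;
struct RPage {
	int	locked;
	int	ref;
	int	cached;	/* still reachable through the image cache */
};

static void
rlock(RPage *p)
{
	assert(!p->locked);
	p->locked = 1;
}

static void
runlock(RPage *p)
{
	assert(p->locked);
	p->locked = 0;
}

/*
 * Called with p locked and p->ref == 1.  The lock is dropped while a
 * new page would be allocated; window() models another processor.
 * Returns p->ref as seen after the lock is reacquired (p stays locked).
 */
int
duppage_model(RPage *p, void (*window)(RPage*))
{
	assert(p->locked && p->ref == 1);
	runlock(p);
	if(window != NULL)
		window(p);	/* p is unlocked and still cached here */
	rlock(p);
	p->cached = 0;	/* stands in for uncachepage(p) */
	return p->ref;
}

/* Another processor finds p in the image cache and takes a reference. */
void
otherlookup(RPage *p)
{
	rlock(p);
	if(p->cached)
		p->ref++;
	runlock(p);
}
```

With no interference duppage_model returns 1 and the page can be used in place; with otherlookup running in the window it returns 2, which is the shared-page case that must be copied.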

fixfault still assumes the page to be non-shared and inserts it into
the process's page table.

a change that rechecks the refcount after calling duppage() in
fixfault(), and makes a copy as in the ref > 1 case, seems to have
made the problem go away. (the system has been running for 8 days now)

can anyone with an mp system confirm this?

--
cinap






Thread overview: 29+ messages
2012-03-13 23:55 [9fans] duppage erik quanstrom
  -- strict thread matches above, loose matches on Subject: below --
2014-06-08 14:53 erik quanstrom
2014-06-08 17:15 ` cinap_lenrek
2014-06-08 17:22 ` Charles Forsyth
2014-06-08 17:34   ` erik quanstrom
2014-06-08 17:50     ` Charles Forsyth
2014-06-08 17:53       ` erik quanstrom
2014-06-08 17:54       ` cinap_lenrek
2014-06-08 18:15         ` erik quanstrom
2014-06-08 18:37           ` Charles Forsyth
2014-06-09  8:23             ` Charles Forsyth
2014-06-09 13:21               ` erik quanstrom
2014-06-10  1:35               ` erik quanstrom
2014-06-10  3:53                 ` cinap_lenrek
2014-06-10 12:58                   ` erik quanstrom
2014-06-10 13:56                     ` Steve Simon
2014-06-10 15:30                       ` erik quanstrom
2014-06-10 14:06                     ` cinap_lenrek
2014-06-11 18:04           ` erik quanstrom
2014-06-08 17:49   ` cinap_lenrek
2012-02-25  3:29 cinap_lenrek
2012-02-25 20:09 ` Charles Forsyth
2012-02-26  4:13 ` erik quanstrom
2012-02-29 10:31 ` Richard Miller
2012-02-29 14:14   ` cinap_lenrek
2012-02-29 19:35     ` Richard Miller
2012-02-29 19:50       ` cinap_lenrek
2012-03-03  9:58         ` Richard Miller
2012-03-03 19:30           ` cinap_lenrek
