From: cinap_lenrek@gmx.de
To: 9fans@9fans.net
Date: Sat, 25 Feb 2012 04:29:33 +0100
Subject: [9fans] duppage

discovered odd behaviour on an mp system. it was running rchttpd and werc, and after roughly 3 to 7 days of load, broken sed and grep processes appeared in the process table.

inspecting the processes with acid yields a strange picture: they crashed (or aborted themselves) just after allocating some memory, before any data had been read from stdin (grep uses sbrk() directly, whereas sed uses the pool malloc). in the case of grep, the global bloc pointer used by sbrk() seemed to have been reset to an earlier value, and sed's mainmem pool structure was likewise inconsistent with the actual state of its heap.

while trying to put the pieces together, something interesting came up. duppage() is called by fixfault() for copy-on-write with a locked, image-backed, non-shared page. it makes a new copy of the page for the image cache and then removes the original page from the image cache, so the faulting process can keep the original. to do this, it has to allocate a new page from the page allocator, which involves temporarily unlocking the original page.

what we observe is that when duppage() reacquires the page lock, the page's refcount is sometimes > 1, meaning another processor grabbed that page out of the image cache while it was unlocked. (i tested this with a print(), and it triggered multiple times right after boot.) fixfault() still assumes the page is not shared and inserts it into the process's page table, so a page that is now in use elsewhere gets treated as a private copy.

a change to fixfault() that rechecks the refcount after calling duppage() and, if it is > 1, makes a copy just as in the ref > 1 case seems to have made the problem go away. (the system has been running for 8 days now.)

can anyone with an mp system confirm this?

--
cinap
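A minimal, single-threaded C sketch of the race and of the proposed recheck. It assumes heavily simplified, hypothetical stand-ins (Page, alloc_page(), duppage_model(), fixfault_model(), other_cpu_grabs(), fault_once()) rather than the actual kernel code. Locking is not modeled; instead, the other processor's image-cache lookup is a hook called at the point where duppage() drops the page lock, and the recheck flag toggles the fix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; locking is not modeled. */
enum { PGSIZE = 16, NPOOL = 8 };

typedef struct Page {
	int  ref;            /* users of this page */
	int  cached;         /* 1 while the page sits in the image cache */
	char data[PGSIZE];
} Page;

static Page pool[NPOOL];
static int npool;

/* Called where duppage() drops the page lock; NULL means no race. */
static void (*in_window)(Page *);

static Page *
alloc_page(void)
{
	assert(npool < NPOOL);
	Page *p = &pool[npool++];
	memset(p, 0, sizeof *p);
	p->ref = 1;
	return p;
}

/* Another processor finds p in the image cache and takes a reference. */
static void
other_cpu_grabs(Page *p)
{
	if(p->cached)
		p->ref++;
}

/* Model of duppage(): put a copy of p into the image cache, uncache p. */
static void
duppage_model(Page *p)
{
	if(in_window != NULL)
		in_window(p);            /* page lock dropped while allocating */
	Page *np = alloc_page();
	np->ref = 0;                 /* cached copy has no users yet */
	memcpy(np->data, p->data, PGSIZE);
	np->cached = 1;
	p->cached = 0;
}

/* Give the faulting process its own copy and drop its reference to p. */
static Page *
copy_and_release(Page *p)
{
	Page *np = alloc_page();
	memcpy(np->data, p->data, PGSIZE);
	p->ref--;
	return np;
}

/* Model of fixfault()'s COW path; returns the page mapped as private. */
static Page *
fixfault_model(Page *lkp, int recheck)
{
	if(lkp->ref > 1)
		return copy_and_release(lkp);
	if(lkp->cached)
		duppage_model(lkp);
	if(recheck && lkp->ref > 1)  /* the proposed fix: recheck after duppage */
		return copy_and_release(lkp);
	return lkp;                  /* assumed to be private */
}

/* Fault on a fresh cached page (always pool[0]), optionally racing. */
static Page *
fault_once(int recheck, int racing)
{
	npool = 0;
	Page *p = alloc_page();
	p->cached = 1;
	strcpy(p->data, "image data");
	in_window = racing ? other_cpu_grabs : NULL;
	Page *mine = fixfault_model(p, recheck);
	in_window = NULL;
	return mine;
}
```

With racing enabled and no recheck, fault_once() hands back pool[0] even though its refcount is 2, which is the situation described above. With the recheck, the faulting process gets its own copy of the data and pool[0]'s refcount drops back to 1; without a race, the recheck costs nothing and the original page is kept.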