From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org
X-Spam-Level: 
X-Spam-Status: No, score=-1.0 required=5.0 tests=MAILING_LIST_MULTI,
	RCVD_IN_MSPIKE_H2,T_SCC_BODY_TEXT_LINE autolearn=ham
	autolearn_force=no version=3.4.4
Received: (qmail 20491 invoked from network); 3 Sep 2022 18:56:10 -0000
Received: from second.openwall.net (193.110.157.125)
  by inbox.vuxu.org with ESMTPUTF8; 3 Sep 2022 18:56:10 -0000
Received: (qmail 14057 invoked by uid 550); 3 Sep 2022 18:56:07 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
List-ID: <musl.lists.openwall.com>
Reply-To: musl@lists.openwall.com
Received: (qmail 14021 invoked from network); 3 Sep 2022 18:56:06 -0000
MIME-Version: 1.0
Date: Sat, 03 Sep 2022 21:55:53 +0300
From: Alexey Izbyshev <izbyshev@ispras.ru>
To: musl@lists.openwall.com
In-Reply-To: <20220903170846.GA9709@brightrain.aerifal.cx>
References: <20220903001940.185345-1-izbyshev@ispras.ru>
 <20220903044820.GA8625@brightrain.aerifal.cx>
 <9fafafa1a8810b1d30798d499ebbb52f@ispras.ru>
 <20220903170846.GA9709@brightrain.aerifal.cx>
User-Agent: Roundcube Webmail/1.4.4
Message-ID: <73db75ec1992b5d10723668ae2c4038b@ispras.ru>
X-Sender: izbyshev@ispras.ru
Content-Type: text/plain; charset=US-ASCII;
 format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [musl] [PATCH] fix potential ref count overflow in sem_open()

On 2022-09-03 20:08, Rich Felker wrote:
> On Sat, Sep 03, 2022 at 05:28:56PM +0300, Alexey Izbyshev wrote:
>> On 2022-09-03 07:48, Rich Felker wrote:
>> >On Sat, Sep 03, 2022 at 03:19:40AM +0300, Alexey Izbyshev wrote:
>> >>sem_open() attempts to avoid overflow on the future ref count
>> >>increment
>> >>by computing the sum of all ref counts in semtab and checking
>> >>that it's
>> >>not INT_MAX when semtab is locked for slot reservation.  This approach
>> >>suffers from a TOCTTOU: by the time semtab is re-locked after
>> >>opening a
>> >>semaphore, the sum could have already been increased by a concurrent
>> >>sem_open(), so it will overflow on the increment. An individual ref
>> >>count can be overflowed as well if the call happened to open a
>> >>duplicate
>> >>of the only other semaphore.
>> >>
>> >>Moreover, since the overflow avoidance check admits a negative (i.e.
>> >>overflowed) sum, after this state is reached, an individual ref count
>> >>can be incremented until it reaches 1 again, and then sem_close() will
>> >>incorrectly free the semaphore, promoting what could be just a
>> >>signed-overflow UB to a use-after-free.
>> >>
>> >>Fix the overflow check by accounting for the maximum possible
>> >>number of
>> >>concurrent sem_open() calls that have already reserved a slot, but
>> >>haven't incremented the ref count yet.
>> >>---
>> >>Alternatively, we could use the number of currently reserved slots
>> >>instead of SEM_NSEMS_MAX, preserving the ability of individual ref
>> >>counts to reach INT_MAX in non-concurrent scenarios. I'm not sure
>> >>whether it matters, so I'm sending a smaller of the two fixes.
>> >>
>> >>Alexey
>> >>---
>> >> src/thread/sem_open.c | 2 +-
>> >> 1 file changed, 1 insertion(+), 1 deletion(-)
>> >>
>> >>diff --git a/src/thread/sem_open.c b/src/thread/sem_open.c
>> >>index 0ad29de9..611a3f64 100644
>> >>--- a/src/thread/sem_open.c
>> >>+++ b/src/thread/sem_open.c
>> >>@@ -61,7 +61,7 @@ sem_t *sem_open(const char *name, int flags, ...)
>> >> 		if (!semtab[i].sem && slot < 0) slot = i;
>> >> 	}
>> >> 	/* Avoid possibility of overflow later */
>> >>-	if (cnt == INT_MAX || slot < 0) {
>> >>+	if (cnt > INT_MAX - SEM_NSEMS_MAX || slot < 0) {
>> >> 		errno = EMFILE;
>> >> 		UNLOCK(lock);
>> >> 		return SEM_FAILED;
>> >>--
>> >>2.37.2
>> >
>> >Thanks! This is probably acceptable at least relative to the current
>> >behavior, but thinking about it, the current behavior (this whole
>> >logic) doesn't really make sense. If flags are such that a new
>> >semaphore can't be created, the claim that there's no way to handle
>> >failure after opening is false; the operation is side-effect free and
>> >we can just back out. The only case where we can't back out is when
>> >we're creating a new semaphore, and in that case, it will never be
>> >incrementing the refcnt on an existing slot; it will always be a new
>> >slot. And even in this case, I think we could back out if we needed
>> >to, since the file is created initially with a temp name.
>> 
>> Indeed, my patch tried to fix the issue while staying within the
>> wider logic of the current implementation. Since it makes the newly
>> created semaphore visible via link() *before* re-locking semtab, the
>> following scenario could happen:
>> 
>> * thread 1 calls sem_open("/sem", O_CREAT) and proceeds until after
>> link() succeeds
>> * other threads successfully call sem_open("/sem", 0) INT_MAX times
>> * thread 1 re-locks semtab, discovers another slot with the i_no it
>> created, but can't increment the refcnt
>> 
>> So in this case even though thread 1 did create a new semaphore, it
>> still has to use an existing slot and hence can't back out (if it
>> does, it'd look like semaphore creation didn't succeed, but opening
>> the semaphore did).
>> 
>> If link() and the refcnt increment occur under the same lock, this
>> issue can be avoided, as in the attached draft patch.
>> 
>> There is still a separate spurious failure case when after opening
>> SEM_NSEMS_MAX semaphores an application can't open even an already
>> opened one, but I don't think it's possible to completely fix in
>> case O_CREAT is specified because we can't know whether we need to
>> allocate a new slot or not before link() returns.
>> 
>> Thanks,
>> Alexey
> 
>> diff --git a/src/thread/sem_open.c b/src/thread/sem_open.c
>> index 0ad29de9..7f648c19 100644
>> --- a/src/thread/sem_open.c
>> +++ b/src/thread/sem_open.c
>> @@ -34,7 +34,7 @@ sem_t *sem_open(const char *name, int flags, ...)
>>  	va_list ap;
>>  	mode_t mode;
>>  	unsigned value;
>> -	int fd, i, e, slot, first=1, cnt, cs;
>> +	int fd, i, e, slot, first=1, cs, created=0;
>>  	sem_t newsem;
>>  	void *map;
>>  	char tmp[64];
>> @@ -55,13 +55,8 @@ sem_t *sem_open(const char *name, int flags, ...)
>>  	/* Reserve a slot in case this semaphore is not mapped yet;
>>  	 * this is necessary because there is no way to handle
>>  	 * failures after creation of the file. */
>> -	slot = -1;
>> -	for (cnt=i=0; i<SEM_NSEMS_MAX; i++) {
>> -		cnt += semtab[i].refcnt;
>> -		if (!semtab[i].sem && slot < 0) slot = i;
>> -	}
>> -	/* Avoid possibility of overflow later */
>> -	if (cnt == INT_MAX || slot < 0) {
>> +	for (slot=0; slot<SEM_NSEMS_MAX && semtab[slot].sem; slot++);
>> +	if (slot == SEM_NSEMS_MAX) {
>>  		errno = EMFILE;
>>  		UNLOCK(lock);
>>  		return SEM_FAILED;
>> @@ -93,6 +88,7 @@ sem_t *sem_open(const char *name, int flags, ...)
>>  					goto fail;
>>  				}
>>  				close(fd);
>> +				LOCK(lock);
>>  				break;
>>  			}
>>  			if (errno != ENOENT)
>> @@ -128,9 +124,16 @@ sem_t *sem_open(const char *name, int flags, ...)
>>  			goto fail;
>>  		}
>>  		close(fd);
>> +
>> +		LOCK(lock);
>>  		e = link(tmp, name) ? errno : 0;
>>  		unlink(tmp);
>> -		if (!e) break;
>> +		if (!e) {
>> +			created = 1;
>> +			break;
>> +		}
>> +		UNLOCK(lock);
>> +
>>  		munmap(map, sizeof(sem_t));
>>  		/* Failure is only fatal when doing an exclusive open;
>>  		 * otherwise, next iteration will try to open the
>> @@ -142,13 +145,19 @@ sem_t *sem_open(const char *name, int flags, 
>> ...)
>>  	/* See if the newly mapped semaphore is already mapped. If
>>  	 * so, unmap the new mapping and use the existing one. Otherwise,
>>  	 * add it to the table of mapped semaphores. */
>> -	LOCK(lock);
>> -	for (i=0; i<SEM_NSEMS_MAX && semtab[i].ino != st.st_ino; i++);
>> -	if (i<SEM_NSEMS_MAX) {
>> -		munmap(map, sizeof(sem_t));
>> -		semtab[slot].sem = 0;
>> -		slot = i;
>> -		map = semtab[i].sem;
>> +	if (!created) {
>> +		for (i=0; i<SEM_NSEMS_MAX && semtab[i].ino != st.st_ino; i++);
>> +		if (i<SEM_NSEMS_MAX) {
>> +			munmap(map, sizeof(sem_t));
>> +			if (semtab[i].refcnt == INT_MAX) {
>> +				UNLOCK(lock);
>> +				errno = EMFILE;
>> +				goto fail;
>> +			}
>> +			semtab[slot].sem = 0;
>> +			slot = i;
>> +			map = semtab[i].sem;
>> +		}
>>  	}
>>  	semtab[slot].refcnt++;
>>  	semtab[slot].sem = map;
> 
> One thought: in light of the complexity we're fighting with here, does
> unlocking and relocking just to save some contention from a long
> critical section containing syscalls like open, etc. really make
> sense? If we just held the lock the whole time, none of this would
> matter. It is quite a big critical section, but it's not *blocking*,
> and aside from silly intentional create-and-unlink dances to force
> repeated retries, it's one where everything makes forward progress.
> 
I can't find strong arguments for either approach, but here is what I 
think.

The patch above does roughly the following:

1. Reserve a slot inside a critical section.
2. Do some side-effect free stuff outside of critical sections.
3. link() and complete sem_open() inside a critical section, or goto 2 
if retry is appropriate.

 From the cognitive complexity POV, I wouldn't say it's too hard, and 
it's simpler than before because the refcnt is now managed inside the 
same critical section, so the only cross-section interaction is wrt. the 
slot reservation.

However, before the overhaul in bf258341b717 10 years ago, sem_open() 
did use a single critical section for the part with the loop, and the 
overhaul split it and made it so that almost no syscalls are called 
inside the critical sections. Also, f70375df85d2 mentions preservation 
of the "property of performing the munmap syscall after releasing the 
lock". So if you indeed had a strong reason to avoid syscalls inside 
critical sections, my patch above already goes against it by requiring 
at least link() to be called under the lock, and *maybe* we could just 
complete the circle and put the whole function under the lock as well 
for the sake of simplicity. I don't see a good reason for contention on 
sem_open() within a single process, and you can't have too many 
different open semaphores anyway.

Alexey