From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: string-backed FILEs mess
Date: Wed, 12 Sep 2018 10:02:39 -0400
Message-ID: <20180912140239.GV1878@brightrain.aerifal.cx>
To: musl@lists.openwall.com

While working on the headers/declarations/hidden stuff, I've run
across some issues that should be documented as needing a fix. One big
one is this:

strto* is still setting up a fake FILE with buffer end pointer as
(void*)-1 as a hack to read directly from a string of unknown length.
This is of course nonsensical UB and might lead to actual problems
(signed overflow) in places where rend-rpos is computed.
strtol.c handles that aspect already by "if ((size_t)s > (size_t)-1/2)"
but it's still horribly wrong.

There's a __string_read function sscanf uses that avoids the whole
problem by incrementally measuring string length and copying it to a
real stdio buffer, which also makes ungetc safe, and this is the
obvious *right* thing to do, but it might make strto* unacceptably
slower. I haven't done any measurement.

The other "mostly right" (modulo ungetc not being available then)
approach would be getting rid of the whole current buffer design with
start/end pointers and using indices instead. This would solve a lot
of stupid gratuitous UB in stdio, like (T*)0-(T*)0 being undefined.
It's not clear to me whether it would be more or less efficient. It
would "break" glibc ABI-compat for stdio -- the original reason I used
the pointer-based design -- but that could be fixed by putting
"must-be-null" fields in place of the buffer pointers so that any
glibc code using getc_unlocked/putc_unlocked macros would hit the "no
buffer space" code path and call an actual function. In many ways
that's desirable anyway.

Probably the right next step here is measuring whether just using
__string_read would make anything measurably slower.

Rich
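
For illustration only, here is a minimal C sketch of the "measure the
string incrementally and copy it into a real buffer" idea described
above for __string_read. It is not musl's code: struct str_reader and
the str_reader_* functions are hypothetical names, the 64-byte buffer
size is arbitrary, and the single reserved pushback byte is an
assumption made to show why ungetc becomes safe. It also happens to
track the buffer with indices rather than start/end pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reader state for illustration; not musl's FILE layout. */
struct str_reader {
	const char *src;        /* next source byte not yet copied */
	unsigned char buf[64];  /* real buffer; buf[0] is kept for pushback */
	size_t pos, len;        /* indices into buf rather than pointers */
};

static void str_reader_init(struct str_reader *r, const char *s)
{
	assert(s != NULL);
	r->src = s;
	r->pos = r->len = 1;    /* empty, with the pushback slot free */
}

/* Refill: look at no more than one buffer's worth of the source,
 * stopping at the terminating NUL, then copy what was found. */
static int str_reader_fill(struct str_reader *r)
{
	size_t n = sizeof r->buf - 1;
	const char *z = memchr(r->src, 0, n);
	size_t k = z ? (size_t)(z - r->src) : n;
	if (k == 0) return -1;  /* at the NUL: end of input */
	memcpy(r->buf + 1, r->src, k);
	r->src += k;
	r->pos = 1;
	r->len = 1 + k;
	return 0;
}

/* Returns the next byte, or -1 at end of string. */
static int str_reader_getc(struct str_reader *r)
{
	if (r->pos == r->len && str_reader_fill(r) < 0) return -1;
	return r->buf[r->pos++];
}

/* Pushback writes into the reader's own buffer, never into the
 * caller's string, so it cannot modify or run before the source. */
static int str_reader_ungetc(struct str_reader *r, int c)
{
	if (c < 0 || r->pos == 0) return -1;
	r->buf[--r->pos] = (unsigned char)c;
	return (unsigned char)c;
}
```

The end of the buffer is always a known index into a real array, so
nothing like rend-rpos is ever computed against a made-up end pointer.
Whether the extra scan and copy cost anything measurable for strto* is
exactly the open question; this sketch says nothing about that.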