Date: Sun, 11 Dec 2022 19:12:57 +0000
From: Stephane Chazelas
To: "Jun. T"
Cc: zsh-workers@zsh.org
Subject: Re: [bug] busyloop upon $=var with NULs when $IFS contains both NUL and a byte > 0x7f
Message-ID: <20221211191257.xbzoprtr56qd5577@chazelas.org>
References: <20221118142717.t4elzrigjeizjm6w@chazelas.org>
X-Seq: 51189

2022-11-29 23:27:10 +0900, Jun. T:
[...]
> This not simple to solve.
> The basic question is:
> What should we do if IFS contains invalid characters?
[...]

As also mentioned in the read -d $'\200' thread, in some other
parts of the code, bytes that can't be decoded into characters
in the locale's charmap are decoded as 0xdc00 + byte_value
(see workers/36411 and workers/36415).

So here, in a UTF-8 locale, IFS=$'\200' could be decoded as a
0xdc80 wchar_t, and splitting would then happen only on the 0xdc80
wchar_t's in the input (that is, on 0x80 bytes not found to be part
of a character).

-- 
Stephane
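To illustrate the 0xdc00 + byte_value idea outside of zsh, here is a
minimal Python sketch. It is not zsh's code: it relies on Python's
"surrogateescape" error handler, which happens to use the same mapping
(an undecodable byte b becomes U+DC00 + b). The function name
split_on_ifs is hypothetical, and the sketch ignores zsh's real
word-splitting rules (whitespace merging and so on); it only shows which
bytes would count as separators.

```python
def split_on_ifs(data: bytes, ifs: bytes) -> list[bytes]:
    """Split data on the characters of ifs, comparing decoded characters.

    Hypothetical helper for illustration only. Bytes that are not valid
    UTF-8 are decoded as U+DC00 + byte value (surrogateescape), so a lone
    0x80 byte becomes U+DC80 and never matches a valid character.
    """
    text = data.decode("utf-8", "surrogateescape")
    separators = set(ifs.decode("utf-8", "surrogateescape"))

    fields = []
    current = []
    for ch in text:
        if ch in separators:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))

    # Encode back with the same handler so the escaped bytes round-trip.
    return [f.encode("utf-8", "surrogateescape") for f in fields]


if __name__ == "__main__":
    # 0xC2 0x80 is the valid UTF-8 encoding of U+0080; the lone 0x80 is not
    # part of any character, so only that one acts as a separator.
    print(split_on_ifs(b"a\x80b\xc2\x80c", b"\x80"))
```

With this mapping, the stray 0x80 byte splits the input while the 0x80
that belongs to the two-byte sequence 0xC2 0x80 is left alone, which is
the behaviour suggested above for IFS=$'\200' in a UTF-8 locale.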