From: Karl Dahlke
To: Edbrowse-dev@lists.the-brannons.com
Date: Fri, 30 Dec 2016 14:58:52 -0500
Subject: [Edbrowse-dev] bad byte in NASA's vendor.js

> I think the heart of the matter is that JS_EvaluateScript cannot cope
> with non-ASCII things, even if encoded in UTF8.

Yes, it can, as shown in my previous example.
UTF-8 characters can appear in strings, or even in regular expressions.
This works:

var x = "a b";
alert(x.replace(/ /, "-"));

Here the gap in "a b" and inside / / is a breakspace (U+00A0), not an ordinary space.
It prints out a-b.
It doesn't seem to handle breakspace as whitespace, however, and yet it should,
and apparently other browsers do.

I don't think converting everything to Unicode would change anything.
It would still accept my code fragment above, and likely still barf on breakspace elsewhere.

As a really imperfect solution, I'm thinking about changing every breakspace to a space,
unless all of the following hold:

  - it is in a line of fewer than 200 characters, and
  - the line does not start with //, and
  - it is inside a string wholly contained in that line, using a rather simple "..." criterion.

I'm trying to avoid writing a JS scanner, which is really not a trivial thing,
beyond what lex can handle, mostly because of those cursed regular expressions
that aren't even quoted. They just appear free, and we're supposed to recognize them as such,
even though almost any character can appear between the slashes.
I know about this nightmare because I parsed and ran my own JavaScript in edbrowse version 2.
It's really best left to someone else!
But then again, that someone else isn't handling breakspace properly.

Karl Dahlke
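For comparison, here is a small check to run in an engine that follows the ECMAScript specification, which lists U+00A0 (NO-BREAK SPACE) among its WhiteSpace characters. It assumes a host such as Node.js that provides console.log; it is a sketch for testing engines, not part of edbrowse. In a conforming engine a breakspace works as a token separator, is matched by \s, and is removed by trim().

```javascript
// ECMAScript counts U+00A0 (no-break space) as WhiteSpace, so a
// conforming engine should accept it between tokens.
const NBSP = "\u00a0";

// Source text that uses breakspace where a plain space would normally go.
const src = "return" + NBSP + "1" + NBSP + "+" + NBSP + "2;";

// Compile and run it; a conforming engine returns 3 instead of throwing
// a SyntaxError about an unexpected character.
const result = new Function(src)();
console.log(result); // 3

// The regex class \s and String.prototype.trim() also treat it as whitespace.
console.log(/\s/.test(NBSP)); // true
console.log((NBSP + "x" + NBSP).trim()); // x
```

An engine that throws on the new Function line is showing the breakspace behavior described in this thread.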
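A minimal sketch of the proposed breakspace heuristic, written in JavaScript only because that is the language of the example in the message (edbrowse itself is written in C). The function names fixBreakspace, fixLine and stringRanges are hypothetical. Assumptions: lines are split on "\n"; "starts with //" ignores leading indentation; length is counted in UTF-16 code units; and the "rather simple" string criterion pairs double quotes on the same line, honoring backslash escapes and ignoring single quotes, template literals and regex literals. A breakspace is kept only when it sits inside such a string on a short, non-comment line; every other breakspace becomes a plain space.

```javascript
// Hypothetical sketch of the breakspace heuristic; not edbrowse code.
const NBSP = "\u00a0";

// Return [start, end] index pairs of "..." strings that open and close on
// this line. Backslash escapes inside a string are skipped. Single quotes,
// template literals and regex literals are ignored (the simple criterion).
function stringRanges(line) {
  const ranges = [];
  let start = -1; // index of the currently open quote, or -1
  for (let i = 0; i < line.length; i++) {
    const c = line[i];
    if (c === "\\" && start >= 0) { i++; continue; } // skip escaped char
    if (c !== '"') continue;
    if (start < 0) {
      start = i;
    } else {
      ranges.push([start, i]);
      start = -1;
    }
  }
  return ranges; // an unterminated string at end of line is not included
}

// Keep a breakspace only if the line is short, is not a // comment line,
// and the breakspace lies inside a string wholly contained in the line.
function fixLine(line) {
  if (!line.includes(NBSP)) return line;
  const mayKeep = line.length < 200 && !line.trimStart().startsWith("//");
  const ranges = mayKeep ? stringRanges(line) : [];
  const inString = (i) => ranges.some(([s, e]) => s < i && i < e);
  let out = "";
  for (let i = 0; i < line.length; i++) {
    const c = line[i];
    out += c === NBSP && !inString(i) ? " " : c;
  }
  return out;
}

// Apply the per-line rule to a whole script.
function fixBreakspace(source) {
  return source.split("\n").map(fixLine).join("\n");
}
```

Because the string test is so crude, a quote inside a regular expression literal such as /"/ throws off the pairing for the rest of that line; that is the same problem that makes a real JS scanner hard to write.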