From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from out.smtp-auth.no-ip.com (out.smtp-auth.no-ip.com [8.23.224.60])
    by hurricane.the-brannons.com (Postfix) with ESMTPS id 0A89F77DF6
    for ; Tue, 12 Jan 2016 22:01:01 -0800 (PST)
X-No-IP: carhart.net@noip-smtp
X-Report-Spam-To: abuse@no-ip.com
Received: from carhart.net (unknown [99.52.200.227])
    (Authenticated sender: carhart.net@noip-smtp)
    by smtp-auth.no-ip.com (Postfix) with ESMTPA id 94978400A11
    for ; Tue, 12 Jan 2016 22:01:56 -0800 (PST)
Received: from carhart.net (localhost [127.0.0.1])
    by carhart.net (8.13.8/8.13.8) with ESMTP id u0D61tuR022901
    for ; Tue, 12 Jan 2016 22:01:55 -0800
Received: from localhost (kevin@localhost)
    by carhart.net (8.13.8/8.13.8/Submit) with ESMTP id u0D61tbV022898
    for ; Tue, 12 Jan 2016 22:01:55 -0800
Date: Tue, 12 Jan 2016 22:01:55 -0800 (PST)
From: Kevin Carhart
To: Edbrowse-dev@lists.the-brannons.com
User-Agent: Alpine 2.03 (LRH 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII
Subject: [Edbrowse-dev] regex criteria interpreted as literals
X-BeenThere: edbrowse-dev@lists.the-brannons.com
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Edbrowse Development List
X-List-Received-Date: Wed, 13 Jan 2016 06:01:01 -0000

I was trying to dig into this problem where Sebastian from the commandline
list was trying to read google groups with edbrowse.

There may be a few things going on with google groups, but one of them that
I could isolate as a short example is that they make use of the inline
regular expression style as follows:

And the routine fails because the expression criteria is taken as a literal,
so the error is then "SyntaxError: unterminated regular expression literal"

I know this is very similar to the string contents interpreted as literals
problems from months back, which is now fixed, right?
Maybe this one is harder to deal with because it isn't delimited by quotes?
It gets ambiguous to know what /document.writeln('Subject: ');<" + "/script>");

Note, I made sure my tidy was up to date before trying this. When I say:
tidy -v
I get
HTML Tidy for Linux version 5.1.33

Any idea what can be done here?

thanks
Kevin

From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from out.smtp-auth.no-ip.com (out.smtp-auth.no-ip.com [8.23.224.60])
    by hurricane.the-brannons.com (Postfix) with ESMTPS id A277E77D0D
    for ; Tue, 12 Jan 2016 22:12:47 -0800 (PST)
X-No-IP: carhart.net@noip-smtp
X-Report-Spam-To: abuse@no-ip.com
Received: from carhart.net (unknown [99.52.200.227])
    (Authenticated sender: carhart.net@noip-smtp)
    by smtp-auth.no-ip.com (Postfix) with ESMTPA id E2FF6400A81
    for ; Tue, 12 Jan 2016 22:13:43 -0800 (PST)
Received: from carhart.net (localhost [127.0.0.1])
    by carhart.net (8.13.8/8.13.8) with ESMTP id u0D6Dhhn028570
    for ; Tue, 12 Jan 2016 22:13:43 -0800
Received: from localhost (kevin@localhost)
    by carhart.net (8.13.8/8.13.8/Submit) with ESMTP id u0D6Dh0v028567
    for ; Tue, 12 Jan 2016 22:13:43 -0800
Date: Tue, 12 Jan 2016 22:13:42 -0800 (PST)
From: Kevin Carhart
To: Edbrowse-dev@lists.the-brannons.com
User-Agent: Alpine 2.03 (LRH 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Subject: [Edbrowse-dev] fixing my semantics a little
X-BeenThere: edbrowse-dev@lists.the-brannons.com
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Edbrowse Development List
X-List-Received-Date: Wed, 13 Jan 2016 06:12:47 -0000

Sorry, I think my use of "literal" is backwards but I hope you can tell
what I meant from context. It's this whole cluster of questions around an
actual token with a formal meaning, versus that thing appearing as part of
a string.
Or in this case, something potentially with a formal meaning like <, only
it isn't a piece of an HTML tag, it's expression criteria intended to be
matched, delimited not by quotes but by slashes. And the parser may not
have enough information to differentiate between the situations.

On Tue, 12 Jan 2016, Kevin Carhart wrote:

>
> I was trying to dig into this problem where Sebastian from the commandline
> list was trying to read google groups with edbrowse.
>
> There may be a few things going on with google groups, but one of them that
> I could isolate as a short example is that they make use of the inline
> regular expression style as follows:
>
>
>
> And the routine fails because the expression criteria is taken as a literal,
> so the error is then "SyntaxError: unterminated regular expression literal"
>
> I know this is very similar to the string contents interpreted as literals
> problems from months back, which is now fixed, right? Maybe this one is
> harder to deal with because it isn't delimited by quotes? It gets ambiguous
> to know what / Or should this work?
> Or is it slipping my mind and we talked about the regex syntax back when we
> talked about things like
> document.writeln(", and you don't need to escape within that?
(More of a tidy question but not too far afield I don't think..)
thanks
Kevin

From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from localhost (unknown [IPv6:2602:43:5b6:8a00::1])
    by hurricane.the-brannons.com (Postfix) with ESMTPSA id EE68777AF8;
    Fri, 15 Jan 2016 13:37:29 -0800 (PST)
From: Chris Brannon
To: Kevin Carhart
Cc: Edbrowse-dev@lists.the-brannons.com
References: <87si2190hb.fsf@mushroom.localdomain>
Date: Fri, 15 Jan 2016 13:38:31 -0800
In-Reply-To: (Kevin Carhart's message of "Wed, 13 Jan 2016 18:28:54 -0800 (PST)")
Message-ID: <87fuxy5xrc.fsf_-_@mushroom.localdomain>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
Subject: [Edbrowse-dev] Tidy and various tags was Re: regex criteria interpreted as literals
X-BeenThere: edbrowse-dev@lists.the-brannons.com
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Edbrowse Development List
X-List-Received-Date: Fri, 15 Jan 2016 21:37:30 -0000

Kevin Carhart writes:

> Thanks Chris! Do you think I ought to file this in the requests
> tracker for Tidy, also a question for Geoff if you're around?

Hi Kevin,
Yeah I'd say so.

> If you could share the logic with me of how the parser will
> disambiguate this, if you know, I'm interested.

I think the consensus from the edbrowse side is that if you're in