From: Ted Zlatanov
Newsgroups: gmane.emacs.gnus.general
Subject: Re: Built-in HTML parsing and rendering library
Date: Mon, 06 Sep 2010 08:09:35 -0500
Message-ID: <878w3fj9ww.fsf@lifelogs.com>
References: <87mxrvo5sk.fsf@rimspace.net> <87pqwrjc7b.fsf@lifelogs.com>
To: ding@gnus.org

On Mon, 06 Sep 2010 14:28:10 +0200 Lars Magne Ingebrigtsen wrote:

LMI> Ted Zlatanov writes:
>> It's what Gnome uses so it's pretty good.  Because of the Gnome link it
>> would probably be the easiest one to bring into the Emacs core.

LMI> Yes.  I had a quick peek at the interface, and it seemed to return a
LMI> nice DOM that could probably be exported to Emacs pretty easily as an
LMI> elisp list tree like (:html (:head ...) (:body ...)) etc.

HTML and XML are SGML, which is a crappy Lisp, so yeah :)  Parsing them
with libxml2 would improve many corners of Emacs.

LMI> And since libxml2 is already installed on 99% of Linux machines, linking
LMI> Emacs to it should be no big deal.

Yes.  The patch would be small.  I don't know if the Emacs maintainers
will have objections, but it's kind of weird that no one has proposed it
yet.

LMI> So the question is: If we have the parse tree in Emacs Lisp, would we be
LMI> able to render it quickly enough for it to make sense to use?  I haven't
LMI> really thought about it much, but it strikes me that rendering heavily
LMI> nested tables and the like might be a time-consuming task in a language
LMI> that's as slow as Emacs Lisp.  But it might be fine; I'm not sure at all.

LMI> Is there a component of libxml2 (or some other handy library) that does
LMI> HTML rendering, too?  :-)

These days Mozilla's Gecko is getting less popular.  http://webkit.org/
is really popular and it's LGPL.  I know it's been proposed for Emacs
inclusion before, and I think it's just been general laziness not to
include it.

IMO this is a really deep hole that is measured in man-years of work.
HTML parsing is easy; rendering it is a nightmare compounded by years of
legacy crap.  So I am pessimistic that this is a good use of your time.
If the Emacs project took an interest in this, there would be many more
hackers and users available and it could happen.  Or it could all
devolve into endless arguments about keyboard stickers and DVCS
supremacy.

Ted