From: Katsumi Yamaoka
Newsgroups: gmane.emacs.gnus.general
Subject: Re: numeric entities
Date: Tue, 07 Dec 2010 14:06:20 +0900
Organization: Emacsen advocacy group
To: ding@gnus.org
Xref: news.gmane.org gmane.emacs.gnus.general:74814
User-Agent: Gnus/5.110011 (No Gnus v0.11) Emacs/24.0.50 (gnu/linux)

Lars Magne Ingebrigtsen wrote:
> Katsumi Yamaoka writes:

>> When reading html articles, I sometimes see numeric entities like
>> "&#155;".  Currently `shr' and `gnus-w3m' render it as "\233", but
>> it should be "›", i.e. U+203A (&#8250;).  Here is a conversion table
>> stolen from emacs-w3m (#155 is there as #x9B):
>>
>> (defvar mm-url-extra-numeric-entities

> Isn't this mostly the same as `gnus-article-dumbquotes-map'?  Looks
> somewhat bigger, though.  So perhaps that should be installed, and then
> `article-treat-dumbquotes' could just use that map instead?

`gnus-article-dumbquotes-map' uses only ASCII characters, so it is
still helpful to people who use an old terminal emulator.  It is also
meant for plain text, not html, isn't it?  So if we make
`article-treat-dumbquotes' convert "&#128;" to "€" and the like, it
should do so only in environments that can display such non-ASCII
characters.  OTOH, those who use `shr' or `gnus-w3m' will probably use
a modern terminal or Emacs' own display engine.

>> I can implement it in mm-url.el, where it would take effect for
>> `gnus-w3m', but I hesitate to use it in `mm-shr' before calling
>> `libxml-parse-html-region'.  WDYT?  (IOW, isn't it better to make
>> `libxml-parse-html-region' do it by itself?  It's too much for me,
>> though.)

> I think `libxml-parse-html-region' should just mainly parse what it's
> given, for greater flexibility.  But perhaps `mm-shr' and `gnus-w3m'
> should just convert these automatically -- they never actually make much
> sense.

Done for `mm-shr' and `gnus-w3m'.  If this slows Gnus down, making
`mm-extra-numeric-entities' a char-table may be better.  The same may
apply to `mm-url-html-entities'.
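A minimal sketch of the conversion discussed in this thread, written in Python rather than Emacs Lisp.  It assumes that the emacs-w3m table (elided in the quote above) follows the windows-1252 assignments for code points 128-159.  Rather than copying that table, the sketch derives the mapping from Python's `cp1252' codec.  The names `build_extra_numeric_entities' and `decode_numeric_entities' are hypothetical and are not part of Gnus.  Only references in the table are rewritten; every other entity is left for the HTML parser, in the spirit of converting them before the text reaches `libxml-parse-html-region'.

```python
import re

# Matches decimal (&#155;) and hexadecimal (&#x9B;) numeric character references.
ENTITY_RE = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def build_extra_numeric_entities() -> dict[int, str]:
    """Map code points 128-159 to the characters windows-1252 assigns them.

    Assumption: this approximates the elided emacs-w3m table; it is derived
    from Python's cp1252 codec, not copied from that table.
    """
    table: dict[int, str] = {}
    for code in range(0x80, 0xA0):
        try:
            table[code] = bytes([code]).decode("cp1252")
        except UnicodeDecodeError:
            # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned in windows-1252.
            continue
    return table


def decode_numeric_entities(html: str, table: dict[int, str]) -> str:
    """Replace only the references found in TABLE; leave all others intact."""

    def replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.group(1), match.group(2)
        code = int(hex_digits, 16) if hex_digits is not None else int(dec_digits)
        # Unknown code points keep their original "&#...;" text for the parser.
        return table.get(code, match.group(0))

    return ENTITY_RE.sub(replace, html)


if __name__ == "__main__":
    table = build_extra_numeric_entities()
    print(decode_numeric_entities("&#155; &#x9B; &#128; &#65;", table))
    # prints: › › € &#65;
```

A Python dict gives a constant-time lookup per code point.  That is roughly why a char-table might be preferable to an alist, which `assq' has to search linearly.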