Gnus development mailing list
* numeric entities
From: Katsumi Yamaoka @ 2010-12-06  7:33 UTC
  To: ding

When reading HTML articles, I sometimes see numeric entities like
"&#155;".  Currently `shr' and `gnus-w3m' render it as "\233", but
it should be "›", i.e. U+203A (decimal 8250).  Here is a conversion
table taken from emacs-w3m (&#155; appears there as #x9B):

(defvar mm-url-extra-numeric-entities
  (mapcar
   (lambda (item)
     (cons (car item) (mm-ucs-to-char (cdr item))))
   '((#x80 . #x20AC) (#x82 . #x201A) (#x83 . #x0192) (#x84 . #x201E)
     (#x85 . #x2026) (#x86 . #x2020) (#x87 . #x2021) (#x88 . #x02C6)
     (#x89 . #x2030) (#x8A . #x0160) (#x8B . #x2039) (#x8C . #x0152)
     (#x8E . #x017D) (#x91 . #x2018) (#x92 . #x2019) (#x93 . #x201C)
     (#x94 . #x201D) (#x95 . #x2022) (#x96 . #x2013) (#x97 . #x2014)
     (#x98 . #x02DC) (#x99 . #x2122) (#x9A . #x0161) (#x9B . #x203A)
     (#x9C . #x0153) (#x9E . #x017E) (#x9F . #x0178)))
  "*Alist of extra numeric entities and characters other than ISO 10646.")

I can implement it in mm-url.el, where it would take effect for
`gnus-w3m', but I hesitate to apply it in `mm-shr' before calling
`libxml-parse-html-region'.  WDYT?  (IOW, wouldn't it be better to make
`libxml-parse-html-region' do it by itself?  That is too much for me,
though.)
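
For illustration, the remapping that the table describes could be
sketched roughly as follows.  This is only a sketch under assumptions:
the names `my-extra-numeric-entities', `my-numeric-entity-to-char' and
`my-decode-numeric-entities' are hypothetical and are not part of
mm-url.el or shr, only a few entries of the table are copied, and code
points are used directly instead of going through `mm-ucs-to-char'.

```elisp
;; Hypothetical sketch; none of these names exist in Gnus or Emacs.
(defvar my-extra-numeric-entities
  '((#x80 . #x20AC) (#x85 . #x2026) (#x9B . #x203A))
  "Abbreviated copy of the table above, mapping C1 codes to code points.")

(defun my-numeric-entity-to-char (code)
  "Return the code point for numeric entity CODE.
Codes listed in `my-extra-numeric-entities' are remapped; any other
code is returned unchanged."
  (or (cdr (assq code my-extra-numeric-entities))
      code))

(defun my-decode-numeric-entities (string)
  "Return STRING with decimal and hexadecimal numeric entities decoded."
  (replace-regexp-in-string
   "&#\\(?:\\([0-9]+\\)\\|[xX]\\([0-9a-fA-F]+\\)\\);"
   (lambda (match)
     ;; Inside this function the match data refers to MATCH itself.
     (let* ((dec (match-string 1 match))
            (hex (match-string 2 match))
            (code (my-numeric-entity-to-char
                   (if dec
                       (string-to-number dec)
                       (string-to-number hex 16)))))
       ;; Leave the entity untouched if it is not a valid character.
       (if (characterp code)
           (string code)
         match)))
   string t t))
```

With this, (my-decode-numeric-entities "&#155;") gives a string holding
U+203A rather than the raw byte "\233".  Whether such a pass belongs in
front of `libxml-parse-html-region' is exactly the open question above.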


