From mboxrd@z Thu Jan 1 00:00:00 1970
From: Katsumi Yamaoka
Newsgroups: gmane.emacs.gnus.general
Subject: Re: shr.el: folding Japanese text
Date: Thu, 14 Oct 2010 17:16:25 +0900
Organization: Emacsen advocacy group
References: <8762xcvbsp.fsf@lifelogs.com> <87eibxkxa3.fsf@anar.kanru.info>
To: ding@gnus.org
User-Agent: Gnus/5.110011 (No Gnus v0.11) Emacs/24.0.50 (gnu/linux)
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=

Lars Magne Ingebrigtsen wrote:
>> Yes, I think that should work, but I think using
>> `fill-find-break-point-function-table' instead of "[^\000-\377]" and
>> stuff will probably give you better results when mixing, say, Japanese
>> and Russian.

I see.  It's much smarter.

> I've now tweaked the filling algorithm slightly, and it seems to do the
> right thing in the Han Han group (with Chinese text), but I haven't
> tested it in any groups with mixed European/Japanese text.  What's a
> good test group?

Try the gwene groups whose names contain ".jp.".  For instance:

  gwene.jp.gr.gentoo.gentoojp-news

If it isn't too much trouble, I'd also recommend nnshimbun.el.  It
builds HTML articles from content fetched from web sites.  You don't
have to alter `mm-text-html-renderer'.  My recommendation is:

  M-x gnus-group-make-shimbun-group RET asahi RET rss RET

Cf. (info "(emacs-w3m)Gnus"), (info "(emacs-w3m)Nnshimbun")

> Also, what font(s) should I install on Debian to get a display that has
> the traditional kanji-are-twice-as-wide-as-non-kanji buffer?

I believe such fonts should be there if you've installed all the fonts
Debian distributes. ;-)  I use Fedora and have all the fonts installed.
My favorites are:

  -*-fixed-medium-r-normal-*-16-*-*-*-*-*-iso8859-1
  -*-fixed-medium-r-normal-*-16-*-*-*-*-*-jisx0208.1983-0

BTW, the present shr.el code deletes CJK characters that are at the
end of lines, inserts useless SPC between wide characters, and doesn't
seem to do kinsoku.  So I tried to fix those problems.  A patch
follows.  I think it is near completion.  WDYT?  However, I suspect the
kinsoku configuration has wrong assignments for some Chinese
characters.  I'll ask Handa-san later.

--=-=-=
Content-Type: application/x-gzip
Content-Disposition: attachment; filename=shr.el.patch.gz
Content-Transfer-Encoding: base64

H4sICN67tkwCA3Noci5lbC5wYXRjaACtVU1v2zAMPSe/gsglMhIVcfq5du1y2Hmn3ZYdFJuOhbiS
K8vLtsN++0jJceN061BgRpDAFEk9Pj4yUkpoSneG1a/RcpEuJH/OYbm8Tc9vL9/BbEHPeDabdV7P
ThewuGGni+vOabUCuUxv5ukVzPh3eQ2r1RhGd3fQeOV8c0YvojUVNg1nk8FKNgDRoH96toGorTY+
SZLxjMM/WWhqlSHoBgxijjlssLAOwTpQhUf6ho1DtVObCiErlVMZW62LCZQHXyIFbbUx2mzBFhRR
aYNn5CD2JRoQyuQg8Ikud5hhTm6SMyXwARgHwQRB94l7EBwp+2yyto322poERCoH4DmIH6EcFlDo
qpKFNrkMYGVwlEVrMo6WntFzUIwSfLuM5Yk+LSz/Q+YQgxU+wiKJ6ciYk8FjqBlkSlbqljYNUj/Y
NbzvS00EiwcC1zqHdEdmq/aRKufm7XXuyyR2VBcgjKVW8kFAFjB23ASfAKW7YbI2k2Qsg3WAJD34
Dj07W9166fG7l7WzNZ3+GHSg52zKIAIz4PtbYtcfAnLCR8UoJoso6Twggo9nSR+HVIstJGvgwB3A
n1X6uu5iVUUUXWGryu5fiO6Ei0OvXsHe5R0gDzED3EfNncCE37uKu/kUjXcM5lH5rITJF1j7tfm6
Xk8nwHyH+FPBBCmFLXCxmKcL2gKX6Xx5FbfA6K1qhajW02kM0Ecg3t+/pkFuVlwqhW15rvuZlF2d
G5Xt9srlPbPxhCc8hHCCYaeZxuOcgdi/ZOsPYqM2tqq7Xfa5RIdTUog92lgB3RwaC3uErf5GGvLQ
1mE5HYOIyyqM4JvGPmI5ZTLK4iURwzKnhaIL8+RQrcgsE8qiPfAQNUfFfbRm6oGGEnYkLrtrJVX+
vI+bfw9E2LBoN/VByGw7JKus2aI7Pgkb5q0b8HTWkuOM9LcyYKNjaVCq8B3Yk96QyegqSAXGvMbo
+iBKNE3rUBrcsyuIMG79qP1EWl4v1JyMfwOzDQKFnAcAAA==
--=-=-=--
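
The message names three folding requirements: no CJK character may be dropped at a line end, no space may be inserted between wide characters, and kinsoku must be respected. The sketch below illustrates them. It is not the attached patch and not shr.el's algorithm: it is written in Python rather than Emacs Lisp, the names `fold`, `can_break_before`, `NO_LINE_START`, and `NO_LINE_END` are hypothetical, and the two kinsoku sets are a tiny illustrative sample, not Emacs's kinsoku tables.

```python
import unicodedata

# Illustrative, incomplete kinsoku sets (assumptions, not Emacs's tables).
NO_LINE_START = set("、。，．）」』！？")  # must not begin a line
NO_LINE_END = set("（「『")                # must not end a line


def char_width(ch):
    """Display columns: 2 for East Asian wide/fullwidth characters, else 1."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def can_break_before(text, i):
    """Return True if a line may end between text[i-1] and text[i]."""
    prev, cur = text[i - 1], text[i]
    if cur in NO_LINE_START or prev in NO_LINE_END:
        return False  # kinsoku forbids breaking here
    # Break at spaces, or next to any wide character (no space needed).
    return (cur == " " or prev == " "
            or char_width(cur) == 2 or char_width(prev) == 2)


def fold(text, width):
    """Greedily fold text into lines of at most `width` display columns.

    Only spaces at line boundaries are discarded; every other character
    is kept and nothing is inserted between wide characters.
    """
    lines = []
    start, n = 0, len(text)
    while start < n:
        while start < n and text[start] == " ":
            start += 1  # drop spaces at the start of a line
        if start >= n:
            break
        col, brk, i = 0, None, start
        while i < n:
            if i > start and can_break_before(text, i):
                brk = i  # remember the latest legal break point
            w = char_width(text[i])
            if col + w > width and i > start:
                break  # text[i] does not fit on this line
            col += w
            i += 1
        else:
            lines.append(text[start:].rstrip(" "))  # remainder fits
            break
        end = brk if brk is not None else i  # forced break if none is legal
        lines.append(text[start:end].rstrip(" "))
        start = end
    return lines
```

For example, `fold("あい。うえ", 4)` breaks after "あ" rather than after "あい", because "。" may not start a line. Unlike Emacs's kinsoku, which can also let a character hang past the margin, this sketch only ever moves the break earlier.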