From: David Hansen
Newsgroups: gmane.emacs.devel,gmane.emacs.gnus.general
Subject: Broken XML, xml.el and nnrss.el
Date: Thu, 21 Apr 2005 17:13:23 +0200
Organization: disorganized
Message-ID: <87hdi0rwmk.fsf@robotron.ath.cx>
Original-To: emacs-devel@gnu.org
Cc: ding@gnus.org
User-Agent: Gnus/5.110003 (No Gnus v0.3) Emacs/22.0.50 (gnu/linux)

Hello,

I tried to read the RSS feed at
  http://xxx.pogogeil.de/podcast.php
(a copy, used to produce the backtrace, is at
  http://www.physik.fu-berlin.de/~dhansen/stuff/podcast.rss)
using Gnus and nnrss.

It is obviously broken XML, as the umlauts are encoded using plain HTML
entities (e.g. &uuml; for ü). Still, I find the behavior of the XML
parser a bit weird:

Debugger entered--Lisp error: (wrong-type-argument integerp nil)
  mapconcat(identity ((nil #("chte Andre Rieu die MIch m" 0 21 ... 21 26 ...)) #("bel gerade r" 0 12 (fontified nil)) nil) "")
  (concat (mapconcat (quote identity) (nreverse children) "") (substring string point))
  (cond ((stringp children) (concat children ...)) ((stringp ...) (concat ... ...)) ((null children) string) (t (concat ... ...)))
  (let ((point 0) children end-point) (while (string-match "&\\([^;]*\\);" string point) (setq end-point ...) (let* ... ... ...)) (cond (... ...) (... ...) (... string) (t ...)))
  xml-substitute-special(#("Ich m&ouml;chte Andre Rieu die M&ouml;bel gerade r&uuml;cken" 0 60 (fontified nil)))
  (let* ((pos ...) (string ...)) (setq pos 0) (while (string-match " \n?" string pos) (setq string ...) (setq pos ...)) (xml-substitute-special string))
  xml-parse-string()
  (let ((expansion ...)) (setq children (if ... ... ...)))
  (cond ((looking-at "") (progn (forward-char 2) (nreverse children)) (if (eq ... 62) (progn ... ...) (error "XML: (Well-Formed) Couldn't parse tag: %s" ...)))
  (let* ((node-name ...) (attrs ...) children pos) (when (consp xml-ns) (dolist ... ...)) (setq children (list attrs ...)) (if (looking-at "/>") (progn ... ...) (if ... ... ...)))
  (cond ((looking-at "<\\?") (search-forward "?>") (skip-syntax-forward " ") (xml-parse-tag parse-dtd xml-ns)) ((looking-at "") nil) ((looking-at "[:space:]]+\\)") (goto-char ...) (let* ... ... ... ...)) (t (unless xml-sub-parser ...) (xml-parse-string)))
  (let ((xml-validating-parser ...) (xml-ns ...)) (cond (... ... ... ...) (... ...) (... ...) (... ... nil) (... ...) (... ... ...) (t ... ...)))
  xml-parse-tag(nil nil)
  (let ((tag ...)) (when tag (push tag children)))
  (cond ((looking-at "") (progn (forward-char 2) (nreverse children)) (if (eq ... 62) (progn ... ...) (error "XML: (Well-Formed) Couldn't parse tag: %s" ...)))
  (let* ((node-name ...) (attrs ...) children pos) (when (consp xml-ns) (dolist ... ...)) (setq children (list attrs ...)) (if (looking-at "/>") (progn ... ...) (if ... ... ...)))
  (cond ((looking-at "<\\?") (search-forward "?>") (skip-syntax-forward " ") (xml-parse-tag parse-dtd xml-ns)) ((looking-at "") nil) ((looking-at "[:space:]]+\\)") (goto-char ...) (let* ... ... ... ...)) (t (unless xml-sub-parser ...) (xml-parse-string)))
  (let ((xml-validating-parser ...) (xml-ns ...)) (cond (... ... ... ...) (... ...) (... ...) (... ... nil) (... ...) (... ... ...) (t ... ...)))
  xml-parse-tag(nil nil)
  (let ((tag ...)) (when tag (push tag children)))
  (cond ((looking-at "") (progn (forward-char 2) (nreverse children)) (if (eq ... 62) (progn ... ...) (error "XML: (Well-Formed) Couldn't parse tag: %s" ...)))
  (let* ((node-name ...) (attrs ...) children pos) (when (consp xml-ns) (dolist ... ...)) (setq children (list attrs ...)) (if (looking-at "/>") (progn ... ...) (if ... ... ...)))
  (cond ((looking-at "<\\?") (search-forward "?>") (skip-syntax-forward " ") (xml-parse-tag parse-dtd xml-ns)) ((looking-at "") nil) ((looking-at "[:space:]]+\\)") (goto-char ...) (let* ... ... ... ...)) (t (unless xml-sub-parser ...) (xml-parse-string)))
  (let ((xml-validating-parser ...) (xml-ns ...)) (cond (... ... ... ...) (... ...) (... ...) (... ... nil) (... ...) (... ... ...) (t ... ...)))
  xml-parse-tag(nil nil)
  (let ((tag ...)) (when tag (push tag children)))
  (cond ((looking-at "") (progn (forward-char 2) (nreverse children)) (if (eq ... 62) (progn ... ...) (error "XML: (Well-Formed) Couldn't parse tag: %s" ...)))
  (let* ((node-name ...) (attrs ...) children pos) (when (consp xml-ns) (dolist ... ...)) (setq children (list attrs ...)) (if (looking-at "/>") (progn ... ...) (if ... ... ...)))
  (cond ((looking-at "<\\?") (search-forward "?>") (skip-syntax-forward " ") (xml-parse-tag parse-dtd xml-ns)) ((looking-at "") nil) ((looking-at "[:space:]]+\\)") (goto-char ...) (let* ... ... ... ...)) (t (unless xml-sub-parser ...) (xml-parse-string)))
  (let ((xml-validating-parser ...) (xml-ns ...)) (cond (... ... ... ...) (... ...) (... ...) (... ... nil) (... ...) (... ... ...) (t ... ...)))
  xml-parse-tag(nil nil)
  (cond ((looking-at "<\\?") (search-forward "?>") (skip-syntax-forward " ") (xml-parse-tag parse-dtd xml-ns)) ((looking-at "") nil) ((looking-at "[:space:]]+\\)") (goto-char ...) (let* ... ... ... ...)) (t (unless xml-sub-parser ...) (xml-parse-string)))
  (let ((xml-validating-parser ...) (xml-ns ...)) (cond (... ... ... ...) (... ...) (... ...) (... ... nil) (... ...) (... ... ...) (t ... ...)))
  xml-parse-tag(nil nil)
  (setq result (xml-parse-tag parse-dtd parse-ns))
  (progn (forward-char -1) (setq result (xml-parse-tag parse-dtd parse-ns)) (if (and xml result ...) (error "XML: (Not Well-Formed) Only one root tag allowed") (cond ... ... ...)))
  (if (search-forward "<" nil t) (progn (forward-char -1) (setq result ...) (if ... ... ...)) (goto-char (point-max)))
  (while (not (eobp)) (if (search-forward "<" nil t) (progn ... ... ...) (goto-char ...)))
  (save-excursion (if buffer (set-buffer buffer)) (goto-char (point-min)) (while (not ...) (if ... ... ...)) (if parse-dtd (cons dtd ...) (nreverse xml)))
  (let ((case-fold-search nil) xml result dtd) (save-excursion (if buffer ...) (goto-char ...) (while ... ...) (if parse-dtd ... ...)))
  (progn (set-syntax-table (standard-syntax-table)) (let (... xml result dtd) (save-excursion ... ... ... ...)))
  (unwind-protect (progn (set-syntax-table ...) (let ... ...)) (save-current-buffer (set-buffer buffer) (set-syntax-table table)))
  (let ((table ...) (buffer ...)) (unwind-protect (progn ... ...) (save-current-buffer ... ...)))
  (with-syntax-table (standard-syntax-table) (let (... xml result dtd) (save-excursion ... ... ... ...)))
  (save-restriction (narrow-to-region beg end) (with-syntax-table (standard-syntax-table) (let ... ...)))
  xml-parse-region(1 48808)
  eval((xml-parse-region (point-min) (point-max)))
  eval-expression((xml-parse-region (point-min) (point-max)) nil)
  call-interactively(eval-expression)

`xml-substitute-special' only signals an error if
`xml-validating-parser' is non-nil (line 748); otherwise it sets the
expansion of an unknown entity to nil:

    (when xml-validating-parser
      (error "XML: (Validity) Undefined entity `%s'" this-part))

Because of this it builds a list of lists in line 770:

    (setq children (list expansion prev-part children))

(should it be (append (list expansion prev-part) children)?), which
mapconcat can't handle.
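
To make the difference between the two forms concrete, here is a small stand-alone sketch. It does not reproduce the xml.el code: the function names `my-demo-accumulate' and `my-demo-join' and the sample values are made up for illustration, and only the two accumulation expressions are taken from the text above.

```elisp
;; Hypothetical sample values; the variable names mirror those quoted
;; from `xml-substitute-special', but this is not the xml.el code.
(defun my-demo-accumulate (style)
  "Add two pieces to an existing list of children using STYLE.
STYLE is `nest' for (list ...) or `splice' for (append ...)."
  (let ((children (list "bel gerade " nil)) ; list built by earlier iterations
        (expansion nil)                     ; unknown entity => nil
        (prev-part "r"))
    (if (eq style 'nest)
        (list expansion prev-part children)
      (append (list expansion prev-part) children))))

(defun my-demo-join (pieces)
  "Join PIECES as the failing frame does; return `type-error' on failure."
  (condition-case nil
      (mapconcat #'identity (nreverse pieces) "")
    (wrong-type-argument 'type-error)))

(my-demo-join (my-demo-accumulate 'nest))   ; nested list is rejected
(my-demo-join (my-demo-accumulate 'splice)) ; flat list joins to "bel gerade r"
```

With the nested form, the inner list is handed to the concatenation as if it were a sequence of characters, which is where the type error comes from. With the spliced form, every element is a string or nil, and nil simply contributes nothing.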

I'm not familiar with XML, but I doubt that the current behavior is
intended. I think it should either signal an error when it detects the
unknown entity (I think that's what the XML standard says, but it seems
the real world doesn't conform to the standard that strictly) or
produce some (more or less useful) result.

What about not expanding unknown entities at all?

    (if xml-validating-parser
        (error "XML: (Validity) Undefined entity `%s'" this-part)
      (concat "&" this-part ";"))

David
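
The idea of leaving unknown entities unexpanded can be tried out in isolation with the sketch below. It is a separate toy function, not a patch to xml.el: `my-entity-alist' and `my-substitute-entities' are hypothetical names, the table holds only the five entities XML predefines, and numeric character references are deliberately not handled.

```elisp
;; A stand-alone sketch, not the xml.el implementation.
;; `my-entity-alist' and `my-substitute-entities' are hypothetical names.
(defvar my-entity-alist
  '(("lt" . "<") ("gt" . ">") ("amp" . "&") ("apos" . "'") ("quot" . "\""))
  "The five entities predefined by XML.")

(defun my-substitute-entities (string &optional validating)
  "Expand known entities in STRING and keep unknown ones verbatim.
With VALIDATING non-nil, signal an error on an unknown entity instead."
  (replace-regexp-in-string
   "&\\([^;]*\\);"
   (lambda (match)
     ;; MATCH is the whole reference, e.g. "&uuml;"; strip "&" and ";".
     (let* ((name (substring match 1 -1))
            (entity (assoc name my-entity-alist)))
       (cond (entity (cdr entity))
             (validating
              (error "XML: (Validity) Undefined entity `%s'" name))
             (t match))))               ; unknown: leave "&name;" untouched
   string t t))

(my-substitute-entities "r&uuml;cken &amp; M&ouml;bel")
;; => "r&uuml;cken & M&ouml;bel"
```

Because each reference is replaced by a string in a single pass, no intermediate list is built, so the nil expansion that caused the backtrace never arises in this variant.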