Gnus development mailing list
From: Andrew Cohen <acohen@ust.hk>
To: ding@gnus.org
Subject: Re: View docx/doc documents from Gnus in Docview
Date: Mon, 11 Sep 2023 10:53:59 +0800
Message-ID: <87pm2p5sjc.fsf@ust.hk>
In-Reply-To: <87jzsxogb0.fsf@ericabrahamsen.net>

Sorry for not replying sooner (I am swamped with real work and have
little time for other things). I have had this working for myself, so I
thought I could offer some advice.

First, telling Gnus to use doc-view for these documents is easy: you
need to set the variable `mailcap-user-mime-data', which holds user
overrides for the viewers of various MIME types. Here is an example that
uses doc-view-mode for the MIME types application/vnd.ms-excel and
application/vnd.openxmlformats-officedocument.wordprocessingml.document
(.xls and .docx files), and eww for text/html. Adjust it to your own
needs:

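(setq mailcap-user-mime-data
      '(;; .xls spreadsheets: use doc-view, but only under a window system.
        ((viewer . doc-view-mode)
         (test   . window-system)
         (type   . "application/vnd.ms-excel"))
        ;; .docx documents: same treatment.
        ((viewer . doc-view-mode)
         (test   . window-system)
         (type   . "application/vnd.openxmlformats-officedocument.wordprocessingml.document"))
        ;; HTML parts: render with eww when it is available.
        ((viewer . eww)
         (test   . (fboundp 'eww))
         (type   . "text/html"))))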

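Unfortunately, this won't work properly on its own, because of a
deficiency in doc-view. Doc-view has only a fairly primitive mechanism
for figuring out the type of a document: it combines the file-name
extension (when the buffer is visiting a file) with a check of the
first few bytes of the content. Docx documents are zip archives, and
many other formats (epub, OpenDocument, cbz) are zip archives too, so
when the part is shown in a buffer that isn't visiting a file, doc-view
only sees "some zip file" and treats it as epub (for me, at least). The
right way to fix this is to smarten up doc-view so that it identifies
the file type correctly. This isn't hard, but I don't have time to do it
right now (maybe someone else is willing?). In the meantime you can use
the following hack, which works for me: replace the function
`doc-view-set-doc-type' with the modified version below. Compared with
the stock definition, the key change is the first binding of the
`let*': when the buffer isn't visiting a file, the buffer name is used
in place of the file name, so the extension can still be consulted.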

(defun doc-view-set-doc-type ()
  "Figure out the current document type (`doc-view-doc-type')."
  (let* ((buffer-file-name (or buffer-file-name (buffer-name (current-buffer))))
         (name-types
	 (when buffer-file-name
	   (cdr (assoc-string
                 (file-name-extension buffer-file-name)
                 '(
                   ;; DVI
                   ("dvi" dvi)
                   ;; PDF
                   ("pdf" pdf) ("epdf" pdf)
                   ;; EPUB
                   ("epub" epub)
                   ;; PostScript
                   ("ps" ps) ("eps" ps)
                   ;; DjVu
                   ("djvu" djvu)
                   ;; OpenDocument formats.
                   ("odt" odf) ("ods" odf) ("odp" odf) ("odg" odf)
                   ("odc" odf) ("odi" odf) ("odm" odf) ("ott" odf)
                   ("ots" odf) ("otp" odf) ("otg" odf)
                   ;; Microsoft Office formats (also handled by the odf
                   ;; conversion chain).
                   ("doc" odf) ("docx" odf) ("xls" odf) ("xlsx" odf)
                   ("ppt" odf) ("pps" odf) ("pptx" odf) ("rtf" odf)
                   ;; CBZ
                   ("cbz" cbz)
                   ;; FB2
                   ("fb2" fb2)
                   ;; (Open)XPS
                   ("xps" xps) ("oxps" oxps))
		 t))))
	(content-types
	 (save-excursion
	   (goto-char (point-min))
	   (cond
	    ((looking-at "%!") '(ps))
	    ((looking-at "%PDF") '(pdf))
	    ((looking-at "\367\002") '(dvi))
	    ((looking-at "AT&TFORM") '(djvu))
            ;; The following pattern actually is for recognizing
            ;; zip-archives, so that this same association is used for
            ;; cbz files. This is fine, as cbz files should be handled
            ;; like epub anyway.
            ((looking-at "PK") '(epub odf))))))
    (setq-local
     doc-view-doc-type
     (car (or (nreverse (seq-intersection name-types content-types #'eq))
              (when (and name-types content-types)
                (error "Conflicting types: name says %s but content says %s"
                       name-types content-types))
              name-types content-types
              (error "Cannot determine the document type"))))))
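
If you'd rather not copy the whole function, a possibly less fragile
variant is to apply the same buffer-name fallback as :around advice on
the stock `doc-view-set-doc-type'. This is only a sketch of an
alternative; the name `my-doc-view-use-buffer-name' is made up. The
`with-eval-after-load' wrapper matters for the redefinition above as
well: a replacement evaluated before doc-view.el is loaded gets
silently overwritten by the stock definition when doc-view is loaded
later.

```elisp
;; Hypothetical helper name; same idea as the modified function above.
(defun my-doc-view-use-buffer-name (orig-fun &rest args)
  "Call ORIG-FUN with ARGS, using the buffer name as a stand-in file name.
This only makes a difference when the buffer is not visiting a file."
  ;; `buffer-file-name' is a special variable, so this binding is
  ;; visible inside ORIG-FUN and is undone afterwards.
  (let ((buffer-file-name (or buffer-file-name (buffer-name))))
    (apply orig-fun args)))

;; Install the advice once doc-view.el is loaded (immediately, if it
;; already is), so loading doc-view cannot undo it.
(with-eval-after-load 'doc-view
  (advice-add 'doc-view-set-doc-type :around #'my-doc-view-use-buffer-name))
```

With this in place, a non-file buffer named foo.docx whose content
starts with the zip signature should be classified as odf rather than
epub.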

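As for smartening up doc-view itself, one possible approach is to look
inside the zip archive before guessing. EPUB and OpenDocument files
normally store an uncompressed "mimetype" member first, so its name and
contents appear as plain text near the start of the file, and Office
Open XML files use conventional member names such as word/document.xml,
which are stored uncompressed in the zip headers. The helper below is a
heuristic sketch under those assumptions (its name is hypothetical). It
returns a list in the same shape as content-types in
`doc-view-set-doc-type', so it could stand in for the '(epub odf)
result of the "PK" branch.

```elisp
(defun my-doc-view-zip-content-types ()
  "Guess the doc-view type(s) of a zip-based document in this buffer.
Return nil if the buffer does not start with a zip local file header.
Heuristic sketch only; the function name is hypothetical."
  (let ((case-fold-search nil))
    (save-excursion
      (goto-char (point-min))
      (when (looking-at "PK\003\004")
        (let ((head-end (min (point-max) 200)))
          ;; A failed `search-forward' with NOERROR t leaves point at
          ;; `point-min', so each test below starts from the beginning.
          (cond
           ;; EPUB: the uncompressed first member "mimetype" is
           ;; immediately followed by its contents.
           ((search-forward "mimetypeapplication/epub+zip" head-end t)
            '(epub))
           ;; OpenDocument follows the same "mimetype" convention.
           ((search-forward "mimetypeapplication/vnd.oasis.opendocument"
                            head-end t)
            '(odf))
           ;; Office Open XML (docx/xlsx/pptx): conventional main part
           ;; names; doc-view converts these through its odf chain.
           ((re-search-forward
             "word/document\\.xml\\|xl/workbook\\.xml\\|ppt/presentation\\.xml"
             nil t)
            '(odf))
           ;; Unrecognised zip: keep the current ambiguous answer.
           (t '(epub odf))))))))
```

The final '(epub odf) fallback keeps today's behaviour for archives the
heuristic doesn't recognise.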

-- 
Andrew Cohen




Thread overview: 9+ messages
2023-09-08 16:18 Björn Bidar
2023-09-10 16:20 ` Eric Abrahamsen
2023-09-10 19:50   ` Björn Bidar
2023-09-10 21:43     ` Eric Abrahamsen
2023-09-11  2:53       ` Andrew Cohen [this message]
2023-09-11 17:01         ` Eric Abrahamsen
2023-09-12  0:37           ` Andrew Cohen
2023-09-12 17:59             ` Eric Abrahamsen
2023-09-11 19:01         ` Björn Bidar
