Gnus development mailing list
From: Eric Abrahamsen <eric@ericabrahamsen.net>
To: ding@gnus.org
Subject: Re: View docx/doc documents from Gnus in Docview
Date: Mon, 11 Sep 2023 10:01:15 -0700
Message-ID: <877cowod9g.fsf@ericabrahamsen.net>
In-Reply-To: <87pm2p5sjc.fsf@ust.hk>

Andrew Cohen <acohen@ust.hk> writes:

> Sorry for not replying sooner (I am swamped with real work and have
> little time for other things); I have had this working for myself, so I
> thought I could provide some advice.
>
> Firstly, telling Gnus to use doc-view for these documents is easy: you
> need to modify the variable 'mailcap-user-mime-data (which controls user
> overrides for various MIME types). Here is an example (this will use
> doc-view-mode for the MIME types ms-excel and
> openxmlformats-officedocument.wordprocessingml.document, and eww for
> HTML). You should modify this for your own needs:
>
> (setq mailcap-user-mime-data
>     '(((viewer . doc-view-mode)
>        (test   . window-system)
>        (type . "application/vnd.ms-excel"))
>       ((viewer . doc-view-mode)
>        (test   . window-system)
>        (type . "application/vnd.openxmlformats-officedocument.wordprocessingml.document"))
>       ((viewer . eww)
>        (test   . (fboundp 'eww))
>        (type   . "text/html"))))
>
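
(An aside on the example above: it covers Excel and docx attachments but
not legacy .doc files. Below is a minimal sketch of one more entry,
assuming such attachments are declared as application/msword in the
message's MIME headers. It uses add-to-list rather than setq so that any
entries already present are kept; adjust to your own needs.)

```elisp
(require 'mailcap)

;; Assumption: legacy Word attachments arrive as application/msword.
;; `add-to-list' prepends this entry without discarding existing ones.
(add-to-list 'mailcap-user-mime-data
             '((viewer . doc-view-mode)
               (test   . window-system)
               (type   . "application/msword")))
```
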
> But unfortunately this won't work properly due to a deficiency in
> doc-view. Doc-view has only a fairly primitive mechanism for figuring
> out the type of the document: since docx documents are zip archives,
> and many other file formats are also zip archives, doc-view will
> notice they are zip files and treat them as epub (for me, at least).
> The right way to fix this is to teach doc-view to identify the file
> type correctly. This isn't hard, but I don't have time to do it right
> now (maybe someone else is willing?). In the meantime you can use the
> following hack, which works for me: replace the function
> 'doc-view-set-doc-type with the modified version below.
>
> (defun doc-view-set-doc-type ()
>   "Figure out the current document type (`doc-view-doc-type')."
>   (let* ((buffer-file-name (or buffer-file-name (buffer-name (current-buffer))))
>          (name-types
>           (when buffer-file-name
>             (cdr (assoc-string
>                   (file-name-extension buffer-file-name)
>                   '(
>                     ;; DVI
>                     ("dvi" dvi)
>                     ;; PDF
>                     ("pdf" pdf) ("epdf" pdf)
>                     ;; EPUB
>                     ("epub" epub)
>                     ;; PostScript
>                     ("ps" ps) ("eps" ps)
>                     ;; DjVu
>                     ("djvu" djvu)
>                     ;; OpenDocument formats.
>                     ("odt" odf) ("ods" odf) ("odp" odf) ("odg" odf)
>                     ("odc" odf) ("odi" odf) ("odm" odf) ("ott" odf)
>                     ("ots" odf) ("otp" odf) ("otg" odf)
>                     ;; Microsoft Office formats (also handled by the odf
>                     ;; conversion chain).
>                     ("doc" odf) ("docx" odf) ("xls" odf) ("xlsx" odf)
>                     ("ppt" odf) ("pps" odf) ("pptx" odf) ("rtf" odf)
>                     ;; CBZ
>                     ("cbz" cbz)
>                     ;; FB2
>                     ("fb2" fb2)
>                     ;; (Open)XPS
>                     ("xps" xps) ("oxps" oxps))
>                   t))))
>          (content-types
>           (save-excursion
>             (goto-char (point-min))
>             (cond
>              ((looking-at "%!") '(ps))
>              ((looking-at "%PDF") '(pdf))
>              ((looking-at "\367\002") '(dvi))
>              ((looking-at "AT&TFORM") '(djvu))
>              ;; The following pattern actually is for recognizing
>              ;; zip-archives, so that this same association is used for
>              ;; cbz files. This is fine, as cbz files should be handled
>              ;; like epub anyway.
>              ((looking-at "PK") '(epub odf))))))
>     (setq-local
>      doc-view-doc-type
>      (car (or (nreverse (seq-intersection name-types content-types #'eq))
>               (when (and name-types content-types)
>                 (error "Conflicting types: name says %s but content says %s"
>                        name-types content-types))
>               name-types content-types
>               (error "Cannot determine the document type"))))))
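
For anyone following along, here is my reading of why the file name
matters in the function above. The final step prefers a type that both
the name and the content agree on, and otherwise falls back to whichever
list is non-empty. If the buffer has no usable name (presumably the case
for an attachment buffer), only the content check is left, and for a zip
archive its first candidate is epub. A small sketch of just that
selection step, with the error branches left out; `my/resolve-doc-type'
is a made-up name for illustration, not part of doc-view:

```elisp
(require 'seq)

(defun my/resolve-doc-type (name-types content-types)
  "Mimic the selection step of the quoted function, minus the errors.
NAME-TYPES and CONTENT-TYPES are lists of candidate type symbols."
  (car (or (nreverse (seq-intersection name-types content-types #'eq))
           name-types
           content-types)))

;; Name says odf (e.g. a .docx extension), content says zip archive.
(my/resolve-doc-type '(odf) '(epub odf)) ; => odf

;; No name information at all, content says zip archive.
(my/resolve-doc-type nil '(epub odf))    ; => epub
```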

Thanks for this!

The version of `doc-view-set-doc-type` in master looks almost exactly
like what you've posted here, except for the let* binding of
(buffer-file-name (or buffer-file-name (buffer-name (current-buffer))))
at the top. Could someone have fixed it in the meantime?
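
If that binding really is the only difference, it might be possible to
get the same effect without replacing the whole function, by wrapping
the existing one in :around advice. This is an untested sketch: it
assumes `doc-view-set-doc-type` reads `buffer-file-name` while it runs
(as the quoted version does), and the advice function name is made up.

```elisp
(defun my/doc-view-fall-back-to-buffer-name (orig-fun &rest args)
  "Call ORIG-FUN with `buffer-file-name' defaulting to the buffer name.
The binding only lasts for the duration of the call."
  (let ((buffer-file-name (or buffer-file-name (buffer-name))))
    (apply orig-fun args)))

(advice-add 'doc-view-set-doc-type :around
            #'my/doc-view-fall-back-to-buffer-name)
```

The advice can be removed again with
(advice-remove 'doc-view-set-doc-type #'my/doc-view-fall-back-to-buffer-name).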


