Announcements and discussions for Gnus, the GNU Emacs Usenet newsreader
* How to access HTML DOM/source of MIME part?
@ 2020-06-11 22:36 Tim Landscheidt
  2020-06-11 23:21 ` Eric Abrahamsen
  0 siblings, 1 reply; 9+ messages in thread
From: Tim Landscheidt @ 2020-06-11 22:36 UTC
  To: info-gnus-english

Hi,

I am subscribed to several newsletters that are sent as
multipart/alternative with one part being text/html that
contains (inter alia) a list of links.  I want to write a
command to iterate over those links and prompt for each
whether to call browse-url on it.

Ideally, I would like to use the HTML DOM/source for
that.  How can I access that?

(I wouldn't mind examples of tighter integration with shr
(marking (some) links at parsing, iterating visually over
them), but for starters, parsing the DOM (again) would be
enough for me.)
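One way to do the parsing is sketched below. It rests on several assumptions. Gnus keeps the article's MIME tree in the buffer-local gnus-article-mime-handles of the *Article* buffer. A leaf handle starts with the buffer holding the raw part, and a multipart node starts with its type string (my reading of mm-decode's handle layout). The Emacs build has libxml2. mm-get-part should undo the transfer encoding, and mm-decode-string (from mm-bodies, I believe) should apply the charset. libxml-parse-html-region and dom.el then give the DOM. The my- functions are hypothetical names, not part of Gnus.

```elisp
;; Sketch: pull the text/html part out of the current article and list
;; its links.  Assumes mm-decode's handle layout (see comments below).
(require 'dom)
(require 'mm-decode)
(require 'mm-bodies)
(require 'seq)
(require 'subr-x)

;; Set up by Gnus; declared here so this file loads without Gnus.
(defvar gnus-article-buffer)
(defvar gnus-article-mime-handles)

(defun my-find-html-handle (handles)
  "Return the first text/html leaf handle in the MIME tree HANDLES, or nil."
  (cond ((not (consp handles)) nil)
        ;; A leaf handle starts with the buffer holding the raw part.
        ((bufferp (car handles))
         (and (equal (mm-handle-media-type handles) "text/html") handles))
        ;; A multipart node looks like ("multipart/alternative" PART...).
        ((stringp (car handles))
         (seq-some #'my-find-html-handle (cdr handles)))
        ;; Otherwise treat it as a plain list of handles.
        ((consp (car handles))
         (seq-some #'my-find-html-handle handles))))

(defun my-html-links (html)
  "Return a list of (HREF . TEXT) for each <a href=...> in the string HTML.
Needs an Emacs built with libxml2."
  (with-temp-buffer
    (insert html)
    (let ((dom (libxml-parse-html-region (point-min) (point-max))))
      (delq nil
            (mapcar (lambda (a)
                      (let ((href (dom-attr a 'href)))
                        (and href (cons href (string-trim (dom-texts a))))))
                    (dom-by-tag dom 'a))))))

(defun my-article-html-links ()
  "Return (HREF . TEXT) pairs from the text/html part of the current article."
  (with-current-buffer gnus-article-buffer
    (let ((handle (my-find-html-handle gnus-article-mime-handles)))
      (when handle
        (my-html-links
         ;; `mm-get-part' undoes the transfer encoding (base64, QP);
         ;; the part's charset still has to be applied.
         (mm-decode-string
          (mm-get-part handle)
          (mail-content-type-get (mm-handle-type handle) 'charset)))))))

;; Usage, e.g. from the summary buffer:
;; (dolist (link (my-article-html-links))
;;   (when (y-or-n-p (format "Browse %s? " (cdr link)))
;;     (browse-url (car link))))
```

Because the links come back as a plain list, filtering by href pattern or by position in the newsletter can happen before any prompting.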

TIA,
Tim


_______________________________________________
info-gnus-english mailing list
info-gnus-english@gnu.org
https://lists.gnu.org/mailman/listinfo/info-gnus-english

* Re: How to access HTML DOM/source of MIME part?
  2020-06-11 22:36 How to access HTML DOM/source of MIME part? Tim Landscheidt
@ 2020-06-11 23:21 ` Eric Abrahamsen
  2020-06-12  0:04   ` Tim Landscheidt
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Abrahamsen @ 2020-06-11 23:21 UTC
  To: Tim Landscheidt; +Cc: info-gnus-english

Tim Landscheidt <tim@tim-landscheidt.de> writes:

> Hi,
>
> I am subscribed to several newsletters that are sent as
> multipart/alternative with one part being text/html that
> contains (inter alia) a list of links.  I want to write a
> command to iterate over those links and prompt for each
> whether to call browse-url on it.

This command (if I understand your requirements correctly) is already in
Gnus master, as `gnus-summary-browse-url'. Look for that or, if you're
running an older Emacs, check out here:

https://git.savannah.gnu.org/cgit/emacs.git/tree/lisp/gnus/gnus-sum.el#n9507

* Re: How to access HTML DOM/source of MIME part?
  2020-06-11 23:21 ` Eric Abrahamsen
@ 2020-06-12  0:04   ` Tim Landscheidt
  2020-06-16 15:34     ` Tim Landscheidt
  0 siblings, 1 reply; 9+ messages in thread
From: Tim Landscheidt @ 2020-06-12  0:04 UTC
  To: Eric Abrahamsen; +Cc: info-gnus-english

Eric Abrahamsen <eric@ericabrahamsen.net> wrote:

>> I am subscribed to several newsletters that are sent as
>> multipart/alternative with one part being text/html that
>> contains (inter alia) a list of links.  I want to write a
>> command to iterate over those links and prompt for each
>> whether to call browse-url on it.

> This command (if I understand your requirements correctly) is already in
> Gnus master, as `gnus-summary-browse-url'. Look for that or, if you're
> running an older Emacs, check out here:

> https://git.savannah.gnu.org/cgit/emacs.git/tree/lisp/gnus/gnus-sum.el#n9507

Thanks; AFAICT, my requirements cannot be met by that.

The newsletters I'm thinking about typically have links such
as:

| Header_Link_A
| Header_Link_B

| Item_1_Link_A   Item_1_Link_B   Item_1_Link_C

| Item_2_Link_A   Item_2_Link_B   Item_2_Link_C

| Item_3_Link_A   Item_3_Link_B   Item_3_Link_C

| […]

| Footer_Link_A
| Footer_Link_B
| Footer_Link_C

I want to iterate (only) over Item_1_Link_B, Item_2_Link_B,
Item_3_Link_B, etc.

*But* your pointer gave me the idea that I could iterate
over shr's buttons like gnus-collect-urls does, test if
their URLs match Item_x_Link_B's typical pattern and then
offer to browse them.  This would require that
Item_x_Link_B's pattern is (relatively) stable; I have to
check whether that will work reasonably well.  Thanks!
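The filter-by-pattern idea can be sketched without moving point at all. This assumes shr records each link's target in the shr-url text property, which is the property the code later in this thread reads. Scanning that property with next-single-property-change then yields the link targets in buffer order. This illustrates the idea under that assumption; it is not how gnus-collect-urls itself is written. my-collect-shr-urls is a hypothetical name.

```elisp
(defun my-collect-shr-urls (&optional regexp)
  "Return the distinct `shr-url' values in the current buffer, in order.
If REGEXP is non-nil, keep only the URLs that match it."
  (let ((pos (point-min))
        (urls nil))
    ;; Visit every position where the `shr-url' property changes,
    ;; starting with `point-min' itself in case a link begins there.
    (while pos
      (let ((url (get-text-property pos 'shr-url)))
        (when (and url (or (null regexp) (string-match-p regexp url)))
          (push url urls)))
      ;; Returns nil once the property stays constant to the end.
      (setq pos (next-single-property-change pos 'shr-url)))
    ;; `delete-dups' keeps the first occurrence of each URL.
    (delete-dups (nreverse urls))))

;; Usage (the pattern is a made-up example):
;; (gnus-with-article-buffer
;;   (my-collect-shr-urls "^https://example\\.com/item/"))
```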

Tim

* Re: How to access HTML DOM/source of MIME part?
  2020-06-12  0:04   ` Tim Landscheidt
@ 2020-06-16 15:34     ` Tim Landscheidt
  2020-06-16 18:39       ` Eric Abrahamsen
  0 siblings, 1 reply; 9+ messages in thread
From: Tim Landscheidt @ 2020-06-16 15:34 UTC
  To: Eric Abrahamsen; +Cc: info-gnus-english

I wrote:

> […]

> *But* your pointer gave me the idea that I could iterate
> over shr's buttons like gnus-collect-urls does, test if
> their URLs match Item_x_Link_B's typical pattern and then
> offer to browse them.  This would require that
> Item_x_Link_B's pattern is (relatively) stable; I have to
> check whether that will work reasonably well.  Thanks!

It's not that simple :-(.  For starters, some of my
newsletters shorten all URLs, putting them into the same
format.

But more importantly, with Emacs 26.3, (forward-button 1) in
an HTML mail will always move point to the beginning of the
*Article* buffer because (button-start button) returns
(point-min) for some reason.  (In my use case, I can
probably work around that by calling next-button directly.)

Is this a bug?  What is the best way to create a minimal
reproducible example?
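The next-button workaround mentioned above could look like the following minimal sketch. It assumes the links are buttons carrying a shr-url property. next-button returns an overlay, or a marker for text-property buttons, and the loop restarts each search from the button it just found. It only sidesteps forward-button and does not diagnose its behaviour. my-button-urls is a hypothetical name.

```elisp
(require 'button)

(defun my-button-urls (&optional regexp)
  "Return the `shr-url' values of the buttons after point, in order.
Walks the buttons with `next-button' instead of `forward-button'.
If REGEXP is non-nil, keep only the URLs that match it."
  (let ((pos (point))
        (urls nil)
        button)
    (while (setq button (next-button pos))
      (let ((url (button-get button 'shr-url)))
        (when (and url (or (null regexp) (string-match-p regexp url)))
          (push url urls)))
      ;; `next-button' skips the button at POS, so continuing from this
      ;; button's start moves on to the following one.
      (setq pos (if (overlayp button)
                    (overlay-start button)
                  (marker-position button))))
    (nreverse urls)))
```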

Tim

* Re: How to access HTML DOM/source of MIME part?
  2020-06-16 15:34     ` Tim Landscheidt
@ 2020-06-16 18:39       ` Eric Abrahamsen
  2020-06-16 19:31         ` Lars Ingebrigtsen
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Abrahamsen @ 2020-06-16 18:39 UTC
  To: info-gnus-english

Tim Landscheidt <tim@tim-landscheidt.de> writes:

> I wrote:
>
>> […]
>
>> *But* your pointer gave me the idea that I could iterate
>> over shr's buttons like gnus-collect-urls does, test if
>> their URLs match Item_x_Link_B's typical pattern and then
>> offer to browse them.  This would require that
>> Item_x_Link_B's pattern is (relatively) stable; I have to
>> check whether that will work reasonably well.  Thanks!
>
> It's not that simple :-(.  For starters, some of my
> newsletters shorten all URLs, putting them into the same
> format.
>
> But more importantly, with Emacs 26.3, (forward-button 1) in
> an HTML mail will always move point to the beginning of the
> *Article* buffer because (button-start button) returns
> (point-min) for some reason.  (In my use case, I can
> probably work around that by calling next-button directly.)
>
> Is this a bug?  What is the best way to create a minimal
> reproducible example?

I know it's not helpful, but in quick testing with Emacs master (28) it
works fine. If I display the HTML part of an article, move point into
the article buffer, and run (forward-button 1), point moves correctly to
the first button.

Something to be aware of is that, sometime not too long ago, Lars
re-implemented links in article bodies using widgets instead of buttons.
TBH I don't really know what that means, or what the implications are,
but it's probably good to know.

Eric

* Re: How to access HTML DOM/source of MIME part?
  2020-06-16 18:39       ` Eric Abrahamsen
@ 2020-06-16 19:31         ` Lars Ingebrigtsen
  2020-06-16 19:43           ` Eric Abrahamsen
  0 siblings, 1 reply; 9+ messages in thread
From: Lars Ingebrigtsen @ 2020-06-16 19:31 UTC
  To: Eric Abrahamsen; +Cc: info-gnus-english

Eric Abrahamsen <eric@ericabrahamsen.net> writes:

> Something to be aware of is that, sometime not too long ago, Lars
> re-implemented links in article bodies using widgets instead of buttons.

The other way around -- they used to be widgets, and they're now
buttons.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

* Re: How to access HTML DOM/source of MIME part?
  2020-06-16 19:31         ` Lars Ingebrigtsen
@ 2020-06-16 19:43           ` Eric Abrahamsen
  2020-06-18  2:50             ` Tim Landscheidt
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Abrahamsen @ 2020-06-16 19:43 UTC
  To: info-gnus-english

Lars Ingebrigtsen <larsi@gnus.org> writes:

> Eric Abrahamsen <eric@ericabrahamsen.net> writes:
>
>> Something to be aware of is that, sometime not too long ago, Lars
>> re-implemented links in article bodies using widgets instead of buttons.
>
> The other way around -- they used to be widgets, and they're now
> buttons.

Sure enough, I didn't really know what I was talking about :) But at
least that points to a likely source of Tim's problem: he's trying to
use button commands in a version of Emacs/Gnus that is still using
widgets?

* Re: How to access HTML DOM/source of MIME part?
  2020-06-16 19:43           ` Eric Abrahamsen
@ 2020-06-18  2:50             ` Tim Landscheidt
  2020-06-26  9:30               ` Lars Ingebrigtsen
  0 siblings, 1 reply; 9+ messages in thread
From: Tim Landscheidt @ 2020-06-18  2:50 UTC
  To: Eric Abrahamsen; +Cc: info-gnus-english

Eric Abrahamsen <eric@ericabrahamsen.net> wrote:

>>> Something to be aware of is that, sometime not too long ago, Lars
>>> re-implemented links in article bodies using widgets instead of buttons.

>> The other way around -- they used to be widgets, and they're now
>> buttons.

> Sure enough, I didn't really know what I was talking about :) But at
> least that points to a likely source of Tim's problem: he's trying to
> use button commands in a version of Emacs/Gnus that is still using
> widgets?

Maybe :-).  Anyway, I found a solution for one of my
newsletters.  Its subject states the number of new entries,
and the body has two links per entry (both pointing to the
same URL; the text of one link carries one piece of
information, the text of the other link another), followed
by additional (older) entries:

| (let
|     ((subject (gnus-summary-article-subject)))
|   (if (string-match "^\\([0-9]+\\) new entries$" subject)
|       (let
|           ((number-of-entries-todo (string-to-number (match-string 1 subject))))
|         (gnus-with-article-buffer
|           (article-goto-body)
|           (let
|               ((article-body-start (point))
|                last-field1
|                last-url)
|             (while (> number-of-entries-todo 0)
|               (widget-forward 1)
|               (if (< (point) article-body-start)
|                   (error "Moved past the wrap!"))
|               (let
|                   ((url-at-point (let ((button (button-at (point))))
|                                    ;; Not every widget is also a button.
|                                    (and button (button-get button 'shr-url))))
|                    (widget-label (let
|                                      ((widget-properties (cdr (widget-at))))
|                                    (buffer-substring-no-properties (plist-get widget-properties :from) (plist-get widget-properties :to)))))
|                 (when (and url-at-point
|                            (string-match "^https://domain\\.com/some-prefix/" url-at-point))
|                   (if (not (string= last-url url-at-point))
|                       (setq last-field1 widget-label
|                             last-url url-at-point)
|                     (setq number-of-entries-todo (- number-of-entries-todo 1))
|                     (if (y-or-n-p (format "Browse %s (%s)? " widget-label last-field1))
|                         (browse-url url-at-point)))))))))))

Now:

a) The formula for widget-label feels way too complicated,
   but I did not find a predefined function for that
   purpose.  Did I miss something?

b) I use this code as part of gnus-select-article-hook.
   widget-forward does move point internally, but does not
   update/recenter the display.  Is this due to
   gnus-with-article-buffer?  What is the best way to make
   the *Article* buffer follow point's movement?
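Regarding a): the widget-label formula above reads :from and :to with plist-get on (cdr widget). widget-get is the usual accessor for widget properties and reduces this to a small helper. The sketch below assumes the link is a widget whose :from and :to delimit its text, as widget-convert-button sets them. my-widget-text is a hypothetical name. For text buttons, which Lars says newer Gnus uses for links, button-label should cover the same ground.

```elisp
(require 'wid-edit)

(defun my-widget-text (&optional pos)
  "Return the buffer text covered by the widget at POS (default: point).
Return nil if there is no widget there."
  (let ((widget (widget-at pos)))
    (when widget
      ;; `widget-get' reads the same :from/:to the plist-get formula used.
      (buffer-substring-no-properties (widget-get widget :from)
                                      (widget-get widget :to)))))
```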

TIA,
Tim

* Re: How to access HTML DOM/source of MIME part?
  2020-06-18  2:50             ` Tim Landscheidt
@ 2020-06-26  9:30               ` Lars Ingebrigtsen
  0 siblings, 0 replies; 9+ messages in thread
From: Lars Ingebrigtsen @ 2020-06-26  9:30 UTC
  To: Tim Landscheidt; +Cc: Eric Abrahamsen, info-gnus-english

Tim Landscheidt <tim@tim-landscheidt.de> writes:

> b) I use this code as part of gnus-select-article-hook.
>    widget-forward does move point internally, but does not
>    update/recenter the display.  Is this due to
>    gnus-with-article-buffer?  What is the best way to make
>    the *Article* buffer follow point's movement?

Yes, you should probably set the point with set-window-point or
something like that...
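The set-window-point suggestion can be sketched as follows. When the *Article* window is not the selected one, moving point inside with-current-buffer (which gnus-with-article-buffer wraps, as far as I can tell) changes the buffer's point but not that window's point, so the display does not follow. Copying the position to the window with set-window-point should fix that, and recentering is optional. my-show-position is a hypothetical name.

```elisp
(defun my-show-position (buffer pos &optional recenter-p)
  "Make the window displaying BUFFER show position POS.
Return that window, or nil if BUFFER is not displayed.
With RECENTER-P non-nil, also recenter that window around POS."
  (let ((window (get-buffer-window buffer t)))
    (when window
      ;; Moving point inside `with-current-buffer' changes the buffer's
      ;; point only; a non-selected window keeps its own `window-point'.
      (set-window-point window pos)
      (when recenter-p
        (with-selected-window window
          (recenter)))
      window)))

;; Usage, inside `gnus-with-article-buffer' after each `widget-forward':
;; (my-show-position (current-buffer) (point) t)
```

Since y-or-n-p redisplays before prompting, the *Article* window should then show each candidate link while the question is asked.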

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

Thread overview: 9+ messages
2020-06-11 22:36 How to access HTML DOM/source of MIME part? Tim Landscheidt
2020-06-11 23:21 ` Eric Abrahamsen
2020-06-12  0:04   ` Tim Landscheidt
2020-06-16 15:34     ` Tim Landscheidt
2020-06-16 18:39       ` Eric Abrahamsen
2020-06-16 19:31         ` Lars Ingebrigtsen
2020-06-16 19:43           ` Eric Abrahamsen
2020-06-18  2:50             ` Tim Landscheidt
2020-06-26  9:30               ` Lars Ingebrigtsen