9front - general discussion about 9front
* [9front] htmlfs
@ 2021-08-29 21:05 Philip Silva
  2021-08-30  4:12 ` Amavect
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Philip Silva @ 2021-08-29 21:05 UTC (permalink / raw)
  To: 9front

Speaking of Go and gumbo, I was also wondering if there is some sort of htmlfs. In at least one forum it was mentioned that there is an xmlfs somewhere in contrib, but I couldn't find it. I'd be most curious about how it's structured, given that tags must be ordered. Maybe /mnt/.../html/body/001-div/... or .../body/001/div/...? I'm not sure that would be a nice solution. For the opossum browser I eventually wrote an RPC that takes a CSS selector as input and returns HTML nodes as JSON. (Not really something for a reusable API, I guess, but enough to try some stuff :) Although I'd much prefer using a common/clean interface/fs.)
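
A minimal sketch of the first numbering idea (001-div), for illustration only: every child gets a zero-padded position prefix, so a sorted directory listing comes out in document order. Python and its strict standard-library XML parser are assumptions made for brevity (real-world HTML would need a tolerant parser), numbered_paths is a hypothetical helper, and unlike the example path above this variant also numbers body.

```python
import xml.etree.ElementTree as ET

def numbered_paths(elem, path):
    """Yield (path, element) pairs; each child is prefixed with its position."""
    yield path, elem
    # Iterating an Element visits its children in document order.
    for pos, child in enumerate(elem, start=1):
        yield from numbered_paths(child, f"{path}/{pos:03d}-{child.tag}")

doc = ET.fromstring(
    "<html><body><div>first</div><p>middle</p><div>second</div></body></html>"
)
paths = [p for p, _ in numbered_paths(doc, "/html")]
for p in paths:
    print(p)
```

This prints /html, /html/001-body, /html/001-body/001-div, /html/001-body/002-p and /html/001-body/003-div. The two div elements stay distinguishable, at the cost of names that shift whenever an earlier sibling is inserted or removed.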


* Re: [9front] htmlfs
  2021-08-29 21:05 [9front] htmlfs Philip Silva
@ 2021-08-30  4:12 ` Amavect
  2021-08-31  5:33   ` unobe
  2021-08-31 11:49   ` hiro
  2021-08-31  5:23 ` unobe
  2021-08-31 13:20 ` Pavel Renev
  2 siblings, 2 replies; 18+ messages in thread
From: Amavect @ 2021-08-30  4:12 UTC (permalink / raw)
  To: 9front

Not everything needs to be a file system.
A program still needs to deserialize and load structs.
A 9p fs just doesn't do that.

I think you just want a programming library.

Thanks,
Amavect


* Re: [9front] htmlfs
  2021-08-29 21:05 [9front] htmlfs Philip Silva
  2021-08-30  4:12 ` Amavect
@ 2021-08-31  5:23 ` unobe
  2021-08-31 13:20 ` Pavel Renev
  2 siblings, 0 replies; 18+ messages in thread
From: unobe @ 2021-08-31  5:23 UTC (permalink / raw)
  To: 9front

Quoth Philip Silva <philip.silva@protonmail.com>:
> Speaking of Go and gumbo, I was also wondering if there is some sort of htmlfs. In at least one forum it was mentioned that there is an xmlfs somewhere in contrib, but I couldn't find it. I'd be most curious about how it's structured, given that tags must be ordered. Maybe /mnt/.../html/body/001-div/... or .../body/001/div/...? I'm not sure that would be a nice solution. For the opossum browser I eventually wrote an RPC that takes a CSS selector as input and returns HTML nodes as JSON. (Not really something for a reusable API, I guess, but enough to try some stuff :) Although I'd much prefer using a common/clean interface/fs.)

Is this what you're looking for?

cpu% 9fs sources
post...
cpu% ls -l /n/sources/contrib/steve/libxml*
--rw-r--r-- M 505 bootes sys 10178 Jun  7  2017 /n/sources/contrib/steve/libxml.tbz
cpu%



* Re: [9front] htmlfs
  2021-08-30  4:12 ` Amavect
@ 2021-08-31  5:33   ` unobe
  2021-08-31  9:46     ` jstsmthrgk
  2021-08-31 20:09     ` hiro
  2021-08-31 11:49   ` hiro
  1 sibling, 2 replies; 18+ messages in thread
From: unobe @ 2021-08-31  5:33 UTC (permalink / raw)
  To: 9front

Quoth Amavect <amavect@gmail.com>:
> Not everything needs to be a file system.
> A program still needs to deserialize and load structs.
> A 9p fs just doesn't do that.

That's true, there are benefits to a programming library (namely,
performance).  But doesn't a file system that presents a consistent
interface allow for a choice of programming language and for the
ability to abstract further?  For instance, having xmlfs (if such a
thing existed) would allow for rc programs to do some simple tasks
that need to muck with xml.



* Re: [9front] htmlfs
  2021-08-31  5:33   ` unobe
@ 2021-08-31  9:46     ` jstsmthrgk
  2021-08-31 10:36       ` hiro
  2021-08-31 20:09     ` hiro
  1 sibling, 1 reply; 18+ messages in thread
From: jstsmthrgk @ 2021-08-31  9:46 UTC (permalink / raw)
  To: 9front

There exists a piece of software (for linux) with some interesting ideas regarding xml scriptability: https://manpages.debian.org/unstable/xml2/html2.1.en.html


On 31 August 2021 07:33:25 CEST, unobe@cpan.org wrote:
>Quoth Amavect <amavect@gmail.com>:
>> Not everything needs to be a file system.
>> A program still needs to deserialize and load structs.
>> A 9p fs just doesn't do that.
>
>That's true, there are benefits to a programming library (namely,
>performance).  But doesn't a file system that presents a consistent
>interface allow for a choice of programming language and for the
>ability to abstract further?  For instance, having xmlfs (if such a
>thing existed) would allow for rc programs to do some simple tasks
>that need to muck with xml.
>




* Re: [9front] htmlfs
  2021-08-31  9:46     ` jstsmthrgk
@ 2021-08-31 10:36       ` hiro
  2021-08-31 12:29         ` Steve Simon
  0 siblings, 1 reply; 18+ messages in thread
From: hiro @ 2021-08-31 10:36 UTC (permalink / raw)
  To: 9front

i can confirm, but the man page doesn't represent the ideas.

if you end up using this, you will see that for each element one line
of text is printed, including the full path/name of the value.

it feels a little bit like the output of grep -r, just with xml
hierarchy instead of file paths.
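
A rough sketch of that flattening, in the spirit of html2/xml2 but not a reimplementation of either: one line per text value or attribute, each carrying its full path. Python, the strict standard-library XML parser, the sample document and the helper name flatten are all assumptions for illustration; the real tools cope with messy HTML and their exact output format may differ.

```python
import xml.etree.ElementTree as ET

def flatten(elem, path=""):
    """Yield 'path=value' lines for attributes and non-empty element text."""
    path = f"{path}/{elem.tag}"
    for name, value in elem.attrib.items():
        yield f"{path}/@{name}={value}"
    text = (elem.text or "").strip()
    if text:
        yield f"{path}={text}"
    # Text that follows a child element (elem.tail) is ignored in this sketch.
    for child in elem:
        yield from flatten(child, path)

doc = ET.fromstring(
    "<html><head><title>Walkthrough for Portable Services</title></head>"
    '<body><a href="/blog">home</a></body></html>'
)
lines = list(flatten(doc))
print("\n".join(lines))
```

Because every line is self-describing, a plain grep for head/title is enough to pick out one value, which is what makes the format pleasant in a pipeline.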

i always thought it would be neat to have a real fs instead, allowing
globbing instead of grep to read a specific element.

example:

$ html2 < poettering-Walkthrough\ for\ Portable\ Services.html 2>/dev/null | grep head/title
/html/head/title= Walkthrough for Portable Services

would be cool to instead run:
; htmlfs poettering*html
mounted at /n/htmlfs
; cat /n/htmlfs/*/*/title
Walkthrough for Portable Services
;

cool.
but not necessary for my use case.

A file is useful for separating binary data and strings that contain
newlines, but it just so happens that inside html you can often ignore
newlines, which means that practically the value of any entity should
fit quite well on a line of text as html2/xml2 output it.
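
To make the globbing idea concrete, here is a small sketch of how a pattern like /n/htmlfs/*/*/title would select entries when paths are matched one component at a time, the way a shell does. The file table is invented for illustration and mirrors the hypothetical htmlfs session above; Python and the helper name glob_match are assumptions.

```python
from fnmatch import fnmatchcase

def glob_match(pattern, path):
    """Match component by component, so '*' never crosses a '/'."""
    pparts, parts = pattern.split("/"), path.split("/")
    return len(pparts) == len(parts) and all(
        fnmatchcase(part, pat) for pat, part in zip(pparts, parts)
    )

# Hypothetical contents of the mounted tree: path -> text value.
files = {
    "/n/htmlfs/html/head/title": "Walkthrough for Portable Services",
    "/n/htmlfs/html/body/h1": "Walkthrough for Portable Services",
    "/n/htmlfs/html/body/div/title": "one level too deep to match",
}
hits = [text for path, text in files.items()
        if glob_match("/n/htmlfs/*/*/title", path)]
print(hits)
```

Only the head/title entry is selected: the pattern fixes the depth, which grep on flat output does not do unless the expression is anchored.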

On 8/31/21, jstsmthrgk <jstsmthrgk@jstsmthrgk.eu> wrote:
> There exists a piece of software (for linux) with some interesting ideas
> regarding xml scriptability:
> https://manpages.debian.org/unstable/xml2/html2.1.en.html
>
>
> On 31 August 2021 07:33:25 CEST, unobe@cpan.org wrote:
>>Quoth Amavect <amavect@gmail.com>:
>>> Not everything needs to be a file system.
>>> A program still needs to deserialize and load structs.
>>> A 9p fs just doesn't do that.
>>
>>That's true, there are benefits to a programming library (namely,
>>performance).  But doesn't a file system that presents a consistent
>>interface allow for a choice of programming language and for the
>>ability to abstract further?  For instance, having xmlfs (if such a
>>thing existed) would allow for rc programs to do some simple tasks
>>that need to muck with xml.
>>
>
>
>


* Re: [9front] htmlfs
  2021-08-30  4:12 ` Amavect
  2021-08-31  5:33   ` unobe
@ 2021-08-31 11:49   ` hiro
  2021-08-31 15:42     ` Philip Silva
  1 sibling, 1 reply; 18+ messages in thread
From: hiro @ 2021-08-31 11:49 UTC (permalink / raw)
  To: 9front

> Not everything needs to be a file system.
> A program still needs to deserialize and load structs.

not everything needs to use structs.
esp. if everything is a string.


* Re: [9front] htmlfs
  2021-08-31 10:36       ` hiro
@ 2021-08-31 12:29         ` Steve Simon
  2021-09-01  6:53           ` hiro
  0 siblings, 1 reply; 18+ messages in thread
From: Steve Simon @ 2021-08-31 12:29 UTC (permalink / raw)
  To: 9front

fyi

there is a companion to the libxml referenced earlier, called something like xmlcmds.tbz.

This includes xb (an xml beautifier/indenter) and xml2, which works similarly to the linux command.

-Steve



* Re: [9front] htmlfs
  2021-08-29 21:05 [9front] htmlfs Philip Silva
  2021-08-30  4:12 ` Amavect
  2021-08-31  5:23 ` unobe
@ 2021-08-31 13:20 ` Pavel Renev
  2021-08-31 15:40   ` Philip Silva
  2021-09-02 11:44   ` hiro
  2 siblings, 2 replies; 18+ messages in thread
From: Pavel Renev @ 2021-08-31 13:20 UTC (permalink / raw)
  To: 9front

I have a half-baked DOMfs:
http://git.nsmpr.xyz/domfs/files.html
but it just represents documents as a flat list of numbered nodes (the way rio serves its windows) and their hierarchy is provided through a separate file.

The challenge with xml/html is that, unlike traditional file trees, their elements do not have unique names and are instead addressed by their order. Additionally, an element's attributes often play a bigger role than the text data it contains.
Style can also override the tree hierarchy when it comes to rendering, and when it comes to javascript, programs look up the elements they need via global search by id and usually only care about an element's immediate parent/children.

TL;DR: the tree is a lie. 
Maybe serving html via some kind of database query interface would be better.
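
A minimal sketch of the flat layout described above, not the actual DOMfs code: elements are numbered in document order, the hierarchy lives in a separate parent table, and an id index stands in for the "global search by id" that scripts tend to rely on. Python, the sample document and the names flat_nodes and by_id are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def flat_nodes(root):
    """Number elements in document order; return (nodes, parent) tables."""
    nodes, parent = [], {}

    def visit(elem, parent_no):
        no = len(nodes)          # next free node number
        nodes.append(elem)
        parent[no] = parent_no   # hierarchy kept apart from the node list
        for child in elem:
            visit(child, no)

    visit(root, None)
    return nodes, parent

doc = ET.fromstring(
    '<html><body><div id="nav">menu</div>'
    '<div id="main"><p>text</p></div></body></html>'
)
nodes, parent = flat_nodes(doc)
# Global lookup by id, then only the immediate parent and children.
by_id = {e.get("id"): no for no, e in enumerate(nodes) if e.get("id")}
main = by_id["main"]
children = [no for no, p in parent.items() if p == main]
print(main, parent[main], children)
```

Here the element with id "main" is node 3, its parent is node 1 (body) and its only child is node 4. Nothing in the lookup walks a path from the root, which is the sense in which the tree matters less than it first appears.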


* Re: [9front] htmlfs
  2021-08-31 13:20 ` Pavel Renev
@ 2021-08-31 15:40   ` Philip Silva
  2021-09-02 11:44   ` hiro
  1 sibling, 0 replies; 18+ messages in thread
From: Philip Silva @ 2021-08-31 15:40 UTC (permalink / raw)
  To: 9front

Cool, thanks for sharing! Yes, I was thinking of use cases like connecting a separate JS process for DOM manipulation, or an rc script. For automation, a numbered tree is probably what is actually needed most of the time...

> I have a half-baked DOMfs:
>
> http://git.nsmpr.xyz/domfs/files.html
>
> but it just represents documents as a flat list of numbered nodes (the way rio serves its windows) and their hierarchy is provided through a separate file.


* Re: [9front] htmlfs
  2021-08-31 11:49   ` hiro
@ 2021-08-31 15:42     ` Philip Silva
  0 siblings, 0 replies; 18+ messages in thread
From: Philip Silva @ 2021-08-31 15:42 UTC (permalink / raw)
  To: 9front

True, plain text with files for the fields is much better.

> > Not everything needs to be a file system.
>
> > A program still needs to deserialize and load structs.
>
> not everything needs to use structs.
>
> esp. if everything is a string.


* Re: [9front] htmlfs
  2021-08-31  5:33   ` unobe
  2021-08-31  9:46     ` jstsmthrgk
@ 2021-08-31 20:09     ` hiro
  2021-08-31 22:40       ` Stuart Morrow
  1 sibling, 1 reply; 18+ messages in thread
From: hiro @ 2021-08-31 20:09 UTC (permalink / raw)
  To: 9front

indeed. either a filesystem or a piping process like the one discussed
in previous emails can be used easily from rc

On 8/31/21, unobe@cpan.org <unobe@cpan.org> wrote:
> Quoth Amavect <amavect@gmail.com>:
>> Not everything needs to be a file system.
>> A program still needs to deserialize and load structs.
>> A 9p fs just doesn't do that.
>
> That's true, there are benefits to a programming library (namely,
> performance).  But doesn't a file system that presents a consistent
> interface allow for a choice of programming language and for the
> ability to abstract further?  For instance, having xmlfs (if such a
> thing existed) would allow for rc programs to do some simple tasks
> that need to muck with xml.
>
>


* Re: [9front] htmlfs
  2021-08-31 20:09     ` hiro
@ 2021-08-31 22:40       ` Stuart Morrow
  0 siblings, 0 replies; 18+ messages in thread
From: Stuart Morrow @ 2021-08-31 22:40 UTC (permalink / raw)
  To: 9front

I don't know what you're trying to do, but I'd rather have edbrowse or
webscript[1] than something which requires me to know the inner
workings of the web page.

The browser that rpeppe is looking for in the comments section of [1]
is LAPIS[2].

[1] https://research.swtch.com/webscript
[2] http://www.cs.cmu.edu/~rcm/papers/usenix00/usenix00.html


* Re: [9front] htmlfs
  2021-08-31 12:29         ` Steve Simon
@ 2021-09-01  6:53           ` hiro
  2021-09-01  8:30             ` sirjofri
  0 siblings, 1 reply; 18+ messages in thread
From: hiro @ 2021-09-01  6:53 UTC (permalink / raw)
  To: 9front

and one more random data point:

http://r-36.net/scm/xmlpull/files.html

On 8/31/21, Steve Simon <steve@quintile.net> wrote:
> fyi
>
> there is a companion to libxml referenced earlier, called something like
> xmlcmds.tbz.
>
> This includes xb (an xml beautifier/indenter) and xml2, which works
> similarly to the linux command.
>
> -Steve
>
>


* Re: [9front] htmlfs
  2021-09-01  6:53           ` hiro
@ 2021-09-01  8:30             ` sirjofri
  2021-09-01  9:02               ` kvik
  0 siblings, 1 reply; 18+ messages in thread
From: sirjofri @ 2021-09-01  8:30 UTC (permalink / raw)
  To: hiro


01.09.2021 08:53:51 hiro <23hiro@gmail.com>:
> and one more random data point:
>
> http://r-36.net/scm/xmlpull/files.html

Xmlpull is really limited. I use it for rssfill, it has no CDATA support 
and I assume its parser is not that accurate. It's good for smaller tasks 
though.
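
For context on why missing CDATA support hurts with feeds: RSS items often wrap embedded HTML in a CDATA section so that it needs no escaping. A short sketch of what a parser with CDATA support returns, using Python's standard library purely as an illustration (the feed fragment is made up, and this says nothing about how xmlpull or rssfill work internally):

```python
import xml.etree.ElementTree as ET

item = ET.fromstring(
    "<item><description><![CDATA[<p>1 < 2 & so on</p>]]></description></item>"
)
# The CDATA wrapper is removed and its content arrives as literal text,
# including the unescaped '<' and '&' characters.
text = item.findtext("description")
print(text)
```

A parser without CDATA handling would instead see the raw `<![CDATA[` marker in the stream and either fail or treat the embedded markup as elements.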

sirjofri


* Re: [9front] htmlfs
  2021-09-01  8:30             ` sirjofri
@ 2021-09-01  9:02               ` kvik
  0 siblings, 0 replies; 18+ messages in thread
From: kvik @ 2021-09-01  9:02 UTC (permalink / raw)
  To: 9front

Quoth sirjofri <sirjofri+ml-9front@sirjofri.de>:
> 
> 01.09.2021 08:53:51 hiro <23hiro@gmail.com>:
> > and one more random data point:
> >
> > http://r-36.net/scm/xmlpull/files.html
> 
> Xmlpull is really limited. I use it for rssfill, it has no CDATA support 
> and I assume its parser is not that accurate. It's good for smaller tasks 
> though.
> 
> sirjofri

Also of note are sigrid's xml.c [1] and 9atom's libxml [2]

I haven't used either.

[1] https://git.sr.ht/~ft/snippets/blob/master/xml.h
    https://git.sr.ht/~ft/snippets/blob/master/xml.c
[2] https://github.com/Plan9-Archive/9atom/tree/master/sys/src/libxml



* Re: [9front] htmlfs
  2021-08-31 13:20 ` Pavel Renev
  2021-08-31 15:40   ` Philip Silva
@ 2021-09-02 11:44   ` hiro
  2021-09-02 12:32     ` hiro
  1 sibling, 1 reply; 18+ messages in thread
From: hiro @ 2021-09-02 11:44 UTC (permalink / raw)
  To: 9front

On 8/31/21, Pavel Renev <an2qzavok@gmail.com> wrote:
> I have a half-baked DOMfs:
> http://git.nsmpr.xyz/domfs/files.html
> but it just represents documents as a flat list of numbered nodes (the way
> rio serves its windows) and their hierarchy is provided through a separate
> file.
>
> The challenge with xml/html is that, unlike traditional file trees, their
> elements do not have unique names and are instead addressed by their order.
> Additionally, an element's attributes often play a bigger role than the
> text data it contains.
> Style can also override the tree hierarchy when it comes to rendering, and
> when it comes to javascript, programs look up the elements they need via
> global search by id and usually only care about an element's immediate
> parent/children.
>
> TL;DR: the tree is a lie.
> Maybe serving html via some kind of database query interface would be
> better.
>

why not do like xpath? numbers can signify order.
we don't support javascript anyway, so the tree wouldn't really change
under our feet...
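
A small sketch of positional addressing in that style. Python's standard ElementTree is used only because it implements a small XPath subset that includes 1-based position predicates; the document and variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<html><body><div>first</div><p>middle</p><div>second</div></body></html>"
)
# div[2] is the second <div> child of <body>: positions are 1-based and
# count only siblings with the same tag, so the <p> in between is skipped.
second_div = doc.find("./body/div[2]")
last_div = doc.find("./body/div[last()]")
missing = doc.find("./body/div[3]")
print(second_div.text)
```

The number is enough to tell the two div elements apart, and since nothing rewrites a static document after parsing, the address stays valid for as long as the source file does.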


* Re: [9front] htmlfs
  2021-09-02 11:44   ` hiro
@ 2021-09-02 12:32     ` hiro
  0 siblings, 0 replies; 18+ messages in thread
From: hiro @ 2021-09-02 12:32 UTC (permalink / raw)
  To: 9front

after writing my last email i googled xpath, and realized that i had
only ever seen a subset of its insane complexity (the simple path
notation that includes a way to specify the n-th element of a type,
which i have seen used a lot in practice by adblockers and anything
that needs to scrape content from websites that don't supply
meaningful element names).

conversely, i have indeed seen that some websites randomize their
element names to prevent this kind of javascript-free processing. so
yes, our low effort will not help with websites that really don't want
to be scraped...

On 9/2/21, hiro <23hiro@gmail.com> wrote:
> On 8/31/21, Pavel Renev <an2qzavok@gmail.com> wrote:
>> I have a half-baked DOMfs:
>> http://git.nsmpr.xyz/domfs/files.html
>> but it just represents documents as a flat list of numbered nodes (the
>> way rio serves its windows) and their hierarchy is provided through a
>> separate file.
>>
>> The challenge with xml/html is that, unlike traditional file trees, their
>> elements do not have unique names and are instead addressed by their
>> order. Additionally, an element's attributes often play a bigger role
>> than the text data it contains.
>> Style can also override the tree hierarchy when it comes to rendering,
>> and when it comes to javascript, programs look up the elements they need
>> via global search by id and usually only care about an element's
>> immediate parent/children.
>>
>> TL;DR: the tree is a lie.
>> Maybe serving html via some kind of database query interface would be
>> better.
>>
>
> why not do like xpath? numbers can signify order.
> we don't support javascript anyway, so the tree wouldn't really change
> under our feet...
>
