Speaking of Go and gumbo, I was also wondering if there is some sort of htmlfs. In at least one forum it was mentioned that there is an xmlfs somewhere in contrib, but I couldn't find it. I'd be most curious about how it's structured, given that tags must be ordered. Maybe /mnt/.../html/body/001-div/... or .../body/001/div/...? I'm not sure if that would be a nice solution. For the opossum browser I eventually wrote an rpc that takes a css selector as input and returns the matching html nodes as JSON. (Not really something for a reusable api I guess, but enough to try some stuff :) Although I'd much prefer using a common/clean interface/fs.)
Not everything needs to be a file system.
A program still needs to deserialize and load structs.
A 9p fs just doesn't do that.
I think you just want a programming library.
Thanks,
Amavect
Quoth Philip Silva <philip.silva@protonmail.com>:
> Speaking of Go and gumbo, I was also wondering if there is some sort of htmlfs. In at lest one forum it was mentioned there would be an xmlfs somewhere in contrib but couldn't find it. I'd be most curious about how it's structured also given that tags must be ordered. Maybe /mnt/.../html/body/001-div/... or .../body/001/div/...? I'm not sure if that would be a nice solution. For the opossum browser I eventually wrote an rpc that gets a css selector as input and returns jsons of html nodes. (Not really something for a reusable api I guess, but enough to try some stuff :) Although I'd much prefer using a common/clean interface/fs)
Is this what you're looking for?
cpu% 9fs sources
post...
cpu% ls -l /n/sources/contrib/steve/libxml*
--rw-r--r-- M 505 bootes sys 10178 Jun 7 2017 /n/sources/contrib/steve/libxml.tbz
cpu%
Quoth Amavect <amavect@gmail.com>:
> Not everything needs to be a file system.
> A program still needs to deserialize and load structs.
> A 9p fs just doesn't do that.
That's true, there are benefits to a programming library (namely,
performance). But doesn't a file system that presents a consistent
interface allow for a choice of programming language and for the
ability to abstract further? For instance, having xmlfs (if such a
thing existed) would allow for rc programs to do some simple tasks
that need to muck with xml.
There exists a piece of software (for linux) with some interesting ideas regarding xml scriptability:
https://manpages.debian.org/unstable/xml2/html2.1.en.html
On 31 August 2021 07:33:25 CEST, unobe@cpan.org wrote:
>Quoth Amavect <amavect@gmail.com>:
>> Not everything needs to be a file system.
>> A program still needs to deserialize and load structs.
>> A 9p fs just doesn't do that.
>
>That's true, there are benefits to a programming library (namely,
>performance). But doesn't a file system that presents a consistent
>interface allow for a choice of programming language and for the
>ability to abstract further? For instance, having xmlfs (if such a
>thing existed) would allow for rc programs to do some simple tasks
>that need to muck with xml.
>
i can confirm, but the man page doesn't represent the ideas.
if you end up using this, you will see that for each element one line
of text is printed, including the full path/name of the value.
it feels a little bit like the output of grep -r, just with xml
hierarchy instead of file paths.
i always thought it would be neat to have a real fs instead, allowing
globbing instead of grep to read a specific element.
example:
$ html2 < poettering-Walkthrough\ for\ Portable\ Services.html 2>/dev/null | grep head/title
/html/head/title= Walkthrough for Portable Services
would be cool to instead run:
; htmlfs poettering*html
mounted at /n/htmlfs
; cat /n/htmlfs/*/*/title
Walkthrough for Portable Services
;
cool.
but not necessary for my use case.
A file is useful for separating binary data and strings that contain
newlines, but it just so happens that inside html you can often ignore
newlines, which means that practically the value of any entity should
fit quite well on a line of text as html2/xml2 output it.
On 8/31/21, jstsmthrgk <jstsmthrgk@jstsmthrgk.eu> wrote:
> There exists a piece of software (for linux) with some interesting ideas
> regarding xml scriptability:
> https://manpages.debian.org/unstable/xml2/html2.1.en.html
>
>
> On 31 August 2021 07:33:25 CEST, unobe@cpan.org wrote:
>>Quoth Amavect <amavect@gmail.com>:
>>> Not everything needs to be a file system.
>>> A program still needs to deserialize and load structs.
>>> A 9p fs just doesn't do that.
>>
>>That's true, there are benefits to a programming library (namely,
>>performance). But doesn't a file system that presents a consistent
>>interface allow for a choice of programming language and for the
>>ability to abstract further? For instance, having xmlfs (if such a
>>thing existed) would allow for rc programs to do some simple tasks
>>that need to muck with xml.
>>
>
>
>
> Not everything needs to be a file system.
> A program still needs to deserialize and load structs.
not everything needs to use structs.
esp. if everything is a string.
fyi
there is a companion to libxml referenced earlier, called something like xmlcmds.tbz.
This includes xb (xml beautifier/indenter) and xml2 which works similarly to the linux command.
-Steve
I have a half-baked DOMfs:
http://git.nsmpr.xyz/domfs/files.html
but it just represents documents as a flat list of numbered nodes (the way rio serves its windows) and their hierarchy is provided through a separate file.
The challenge with xml/html is that, unlike traditional file trees, their elements do not have unique names and are instead addressed by their order. Additionally, an element's attributes often play a bigger role than the text data it contains.
Style can also override the tree hierarchy when it comes to rendering, and when it comes to javascript, programs look up the elements they need via a global search by id and usually only care about an element's immediate parent/children.
TL;DR: the tree is a lie.
Maybe serving html via some kind of database query interface would be better.
Cool thanks for sharing! Yes I was thinking of use cases like connecting a separate JS process for dom manipulation or an rc script. For automation a numbered tree is oftentimes probably what is actually needed...
> I have a half-backed DOMfs:
>
> http://git.nsmpr.xyz/domfs/files.html
>
> but it just represents documents as a flat list of numbered nodes (the way rio serves its windows) and their hierarchy is provided through a separate file.
True, plain text with files for the fields is much better.
> > Not everything needs to be a file system.
>
> > A program still needs to deserialize and load structs.
>
> not everything needs to use structs.
>
> esp. if everything is a string.
indeed. either a filesystem or a piping process like discussed in
previous emails can be used easily from rc
On 8/31/21, unobe@cpan.org <unobe@cpan.org> wrote:
> Quoth Amavect <amavect@gmail.com>:
>> Not everything needs to be a file system.
>> A program still needs to deserialize and load structs.
>> A 9p fs just doesn't do that.
>
> That's true, there are benefits to a programming library (namely,
> performance). But doesn't a file system that presents a consistent
> interface allow for a choice of programming language and for the
> ability to abstract further? For instance, having xmlfs (if such a
> thing existed) would allow for rc programs to do some simple tasks
> that need to muck with xml.
>
>
I don't know what you're trying to do, but I'd rather have edbrowse or webscript[1] than something which requires me to know the inner workings of the web page.
The browser that rpeppe is looking for in the comments section of [1] is LAPIS[2].
[1] https://research.swtch.com/webscript
[2] http://www.cs.cmu.edu/~rcm/papers/usenix00/usenix00.html
and one more random data point:
http://r-36.net/scm/xmlpull/files.html
On 8/31/21, Steve Simon <steve@quintile.net> wrote:
> fyi
>
> there is a companion to libxml referenced earlier, called something like
> xmlcmds.tbz.
>
> This includes xb (xml beautifier/indentet) and xml2 which works similarly to
> the linux command.
>
> -Steve
>
>
01.09.2021 08:53:51 hiro <23hiro@gmail.com>:
> and one more random data point:
>
> http://r-36.net/scm/xmlpull/files.html
Xmlpull is really limited. I use it for rssfill; it has no CDATA support
and I assume its parser is not that accurate. It's good for smaller tasks
though.
sirjofri
Quoth sirjofri <sirjofri+ml-9front@sirjofri.de>:
>
> 01.09.2021 08:53:51 hiro <23hiro@gmail.com>:
> > and one more random data point:
> >
> > http://r-36.net/scm/xmlpull/files.html
>
> Xmlpull is really limited. I use it for rssfill, it has no CDATA support
> and I assume its parser is not that accurate. It's good for smaller tasks
> though.
>
> sirjofri
Also of note are sigrid's xml.c [1] and 9atom's libxml [2]
I haven't used either.
[1] https://git.sr.ht/~ft/snippets/blob/master/xml.h
    https://git.sr.ht/~ft/snippets/blob/master/xml.c
[2] https://github.com/Plan9-Archive/9atom/tree/master/sys/src/libxml
On 8/31/21, Pavel Renev <an2qzavok@gmail.com> wrote:
> I have a half-backed DOMfs:
> http://git.nsmpr.xyz/domfs/files.html
> but it just represents documents as a flat list of numbered nodes (the way
> rio serves its windows) and their hierarchy is provided through a separate
> file.
>
> The challenge with xml/html is that unlike traditional file trees their
> elements do not have unique names and instead addressed by their order.
> Additionaly, element's attributes often play bigger role than text data they
> contain.
> Style also can override tree hierarchy when it comes to rendering, and when
> it comes to javascript, programs look up needed elements via global search
> by id and usually only care about element's immediate parent/children.
>
> TL;DR: the tree is a lie.
> Maybe serving html via some kind of database query interface would be
> better.
>
why not do like xpath? numbers can signify order.
we don't support javascript anyway, so the tree wouldn't really change
under our feet...
after writing my last email i googled xpath, and realized that i only
ever got to see a subset of its insane complexity (the simple path
notation that includes a way to specify the n-th element of a type,
which i have seen used a lot in practice by adblockers and anything
that needs to scrape content from websites that don't supply
meaningful element names).
on the contrary, i have indeed seen that some websites randomize their
element names to prevent this kind of javascript-free processing. so
yes, our low effort will not help with websites that really don't want
to be scraped...
On 9/2/21, hiro <23hiro@gmail.com> wrote:
> On 8/31/21, Pavel Renev <an2qzavok@gmail.com> wrote:
>> I have a half-backed DOMfs:
>> http://git.nsmpr.xyz/domfs/files.html
>> but it just represents documents as a flat list of numbered nodes (the
>> way
>> rio serves its windows) and their hierarchy is provided through a
>> separate
>> file.
>>
>> The challenge with xml/html is that unlike traditional file trees their
>> elements do not have unique names and instead addressed by their order.
>> Additionaly, element's attributes often play bigger role than text data
>> they
>> contain.
>> Style also can override tree hierarchy when it comes to rendering, and
>> when
>> it comes to javascript, programs look up needed elements via global
>> search
>> by id and usually only care about element's immediate parent/children.
>>
>> TL;DR: the tree is a lie.
>> Maybe serving html via some kind of database query interface would be
>> better.
>>
>
> why not do like xpath? numbers can signify order.
> we don't support javascript anyway, so the tree wouldn't really change
> under our feet...
>