Hi,

The command edbrowse -d4 -f1 gave me only the following 3 lines:

  imap is not move capable
  SSL connect error in libcurl:
  no mail

Then I changed inport=*993 to inport=^993 and issued the command again:

  curl< * OK [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE IDLE STARTTLS LOGINDISABLED] Dovecot ready.
  curl> A001 CAPABILITY
  curl< * CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE IDLE STARTTLS LOGINDISABLED
  A001 OK Pre-login capabilities listed, post-login capabilities have more.
  imap is not move capable
  Invalid login or password
  no mail

Cleverson
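The second log shows the server advertising both STARTTLS and LOGINDISABLED before login. Under RFC 3501, LOGINDISABLED means the server will not accept the LOGIN command in the current connection state. That may be related to the "Invalid login or password" message, though this is a guess and nothing in the thread confirms it. A minimal Python sketch for pulling the capability list out of such a line; the function name is made up for illustration and is not part of edbrowse or curl:

```python
import re

def greeting_capabilities(line):
    """Return the set of capabilities named in an IMAP greeting or
    untagged CAPABILITY response (hypothetical helper)."""
    # Greeting form: * OK [CAPABILITY a b c] free text
    match = re.search(r"\[CAPABILITY ([^\]]*)\]", line)
    if match:
        return set(match.group(1).split())
    # Untagged response form: * CAPABILITY a b c
    match = re.match(r"\* CAPABILITY (.*)", line)
    return set(match.group(1).split()) if match else set()

greeting = ("* OK [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID "
            "ENABLE IDLE STARTTLS LOGINDISABLED] Dovecot ready.")
caps = greeting_capabilities(greeting)

if "LOGINDISABLED" in caps and "STARTTLS" in caps:
    print("server says LOGIN is disabled and offers STARTTLS")
```

Running the same check on the capabilities reported after a TLS handshake would show whether LOGINDISABLED goes away once the connection is encrypted.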
Hi,

I use a paid e-mail service, cotse.net, which Thunderbird manages well, but edbrowse is having some problem with it. The first lines of the corresponding mail account in my config file are as follows, including my comments on each setting I have tried:

  mail{
  imap
  inserver=mail.cotse.net
  # the following gives ssl connect error on libcurl:
  inport=*993
  # The following gives invalid login or password:
  inport=^993

After that the block goes on normally until the end.

Could it be that edbrowse doesn't accept my password, which was generated by a password manager and contains characters like ^, +, and _?

Best,
Cleverson
Watching my wife on her phone, she calls up an email, reads it, thinks the link is interesting, goes to it, all seamless.

edbrowse isn't like that. We save the mail unformatted, quit, call up regular edbrowse, read the email, browse the email, go to the hyperlink, etc.

There may be times we wish we were more seamless, especially when writing scripts. Do it all in one edbrowse session. But as I play with it a little, I still think what we have today is faster for most things: check email, throw away spam, save important mails, etc. So for now it's just playing, and honestly it hasn't been much code, or I wouldn't be doing it.

In the latest, make an empty buffer, then type something like "imap 1" if block 1 is your imap descriptor. The folders will be in your buffer, with message counts. You can't do anything with them yet; the idea would be to type g and then get a list of envelopes, sort of like directory mode. On an envelope type g to read or manage the email. ^ to get back. You get the idea.

Comments.

Karl Dahlke
Hello,

Recently I proposed an update for src/makefile and the top makefile; both are now committed. I omitted a description for the top makefile change. My mistake, I apologize.

The change is

   all :
  -	cd src ; make
  +	$(MAKE) -C src

   clean :
  -	cd src ; make clean
  +	$(MAKE) -C src clean

Description and rationale.

Change 1: Replace make with the $(MAKE) variable. Why? GNU make expands this variable to the name of the utility that was used to run the makefile. This fixes a naming problem, because some OSes name GNU make "gmake". Moreover, I found a problem when testing edbrowse with automatic build systems (used by distro and OS package builders). Usually they run make with the -j<number of jobs> option to build in parallel. With $(MAKE), the -j setting also reaches the sub-make that runs src/makefile, avoiding warnings and failures. The $(MAKE) variable is documented in <https://www.gnu.org/software/make/manual/html_node/MAKE-Variable.html>.

Change 2: Replace cd src with -C src. It is just syntactic sugar; the -C option is documented in <https://www.gnu.org/software/make/manual/html_node/Recursion.html>.

Best Regards,
Alfonso
On Sun, Feb 12, 2023 at 08:39:22AM +0100, Sebastian Humenda wrote:
> Hi
>
> Karl Dahlke wrote on 11.02.2023, 5:32 -0500:
> >If quickjs were packaged, we would need to change the makefile, as it
> >currently assumes it has been built statically in parallel. In other words,
> >we would simply link to it as we do with curl and readline etc. We might need
>
> Debian installs it to /usr/lib/quickjs/libquickjs.a, if that helps. Maybe you
> were assuming that anyway.
As long as we can find it via pkg-config we can just look for it, perhaps
using a flag of some sort to switch between that and providing a path to a
local build? That'd mean we didn't need to know where it'd be installed if we
were building using the system (read packaged) Quickjs whilst keeping the
ability to use a local version.
Hi

Karl Dahlke wrote on 11.02.2023, 5:32 -0500:
>If quickjs were packaged, we would need to change the makefile, as it
>currently assumes it has been built statically in parallel. In other words,
>we would simply link to it as we do with curl and readline etc. We might need

Debian installs it to /usr/lib/quickjs/libquickjs.a, if that helps. Maybe you
were assuming that anyway.

Cheers
Sebastian
If quickjs were packaged, we would need to change the makefile, as it currently assumes it has been built statically in parallel. In other words, we would simply link to it as we do with curl and readline etc. We might need an environment flag or parameter or something conditional in the makefile, to snag the library from its standard place if it is there, or from a parallel directory if we had to build it. Then we should update the corresponding paragraph in README, on what is going on.

Karl Dahlke
Hi

[please don't CC me as I'm on the list]

Adam Thompson wrote on 11.02.2023, 8:10 +0000:
>No problem. As someone who uses Debian on a daily basis I've been wondering
>how to facilitate a more up-to-date Edbrowse package for a while.

Have you been thinking about stable or about the rolling release versions, i.e. testing that didn't get updated in time? The a11y team is a bit understaffed. Depending on how it goes with QuickJS, I could try to follow edbrowse development along and update the packaging in time. However, for Debian stable, I cannot make any promise for the time being, as this creates additional effort -- but let's see. In any case, please feel free to drop in #debian-a11y or ping me off-list if you need packaging updates.

>That being said, to repeat what I said in my previous email, it's probably
>worth contacting the Quickjs maintainers directly about these concerns as
>they may be able to provide greater reassurance and assistance than we can.

Agreed and on the To-Do-list.

Cheers
Sebastian
On Thu, Feb 09, 2023 at 10:48:00AM +0100, Sebastian Humenda wrote:
> Hi
>
> Adam Thompson wrote on 09.02.2023, 8:13 +0000:
> >On Wed, Feb 08, 2023 at 05:33:03AM -0500, Karl Dahlke wrote:
> >> I don't understand why there would be security concerns with quickjs. It is
> >> a language interpreter. It either works or it doesn't. All the security
> >> concerns fall on edbrowse, which is already packaged in several distros.
> >
> >To provide a little more context, whereas adding an additional interpreter
> >does create an additional package requiring security support, it is no more
> >than any other library as far as its integration with Edbrowse. We're a lot
> >less js-centric in terms of our browsing engine than other browsers and
> >Quickjs is a lot more of a pure interpreter than more browser-integrated js
> >engines, at least that's how it appears.
>
> Thanks for the context and your clarifications.

No problem. As someone who uses Debian on a daily basis I've been wondering
how to facilitate a more up-to-date Edbrowse package for a while.

> My intent has not been to enforce any decision or to criticise what is being
> done. I know that the developer base of Edbrowse is small and I am working in
> similar projects to know the maintenance burden of dependencies. This is
> exactly why I brought this up: understanding the rationale behind the
> decision. However, I still ask for a bit more understanding for the Debian
> view, as the Security team needs to know about QuickJS (among more than 38000
> other packages). QA is taken seriously, so my e-mail is just a step in that
> process :-). I'll take your arguments to the security team and let's see where
> it goes. It might well be that QuickJS is soon in Debian with the arguments
> made.

Makes sense. Apologies if any of the remarks here came across as a lack of understanding.
I've been running Debian in various contexts for about 16 years now and am (at least from a user perspective) aware, and thankful for, the large amount of work that goes into the distribution including on the security front. Obviously consideration needs to be given when adding to that. That being said, to repeat what I said in my previous email, it's probably worth contacting the Quickjs maintainers directly about these concerns as they may be able to provide greater reassurance and assistance than we can. Cheers, Adam.
Hi

Adam Thompson wrote on 09.02.2023, 8:13 +0000:
>On Wed, Feb 08, 2023 at 05:33:03AM -0500, Karl Dahlke wrote:
>> I don't understand why there would be security concerns with quickjs. It is
>> a language interpreter. It either works or it doesn't. All the security
>> concerns fall on edbrowse, which is already packaged in several distros.
>
>To provide a little more context, whereas adding an additional interpreter
>does create an additional package requiring security support, it is no more
>than any other library as far as its integration with Edbrowse. We're a lot
>less js-centric in terms of our browsing engine than other browsers and
>Quickjs is a lot more of a pure interpreter than more browser-integrated js
>engines, at least that's how it appears.

Thanks for the context and your clarifications.

My intent has not been to enforce any decision or to criticise what is being done. I know that the developer base of Edbrowse is small and I am working in similar projects to know the maintenance burden of dependencies. This is exactly why I brought this up: understanding the rationale behind the decision. However, I still ask for a bit more understanding for the Debian view, as the Security team needs to know about QuickJS (among more than 38000 other packages). QA is taken seriously, so my e-mail is just a step in that process :-). I'll take your arguments to the security team and let's see where it goes. It might well be that QuickJS is soon in Debian with the arguments made.

Thanks
Sebastian
On Wed, Feb 08, 2023 at 05:33:03AM -0500, Karl Dahlke wrote:
> I don't understand why there would be security concerns with quickjs. It is
> a language interpreter. It either works or it doesn't. All the security
> concerns fall on edbrowse, which is already packaged in several distros.

To provide a little more context, whereas adding an additional interpreter does create an additional package requiring security support, it is no more than any other library as far as its integration with Edbrowse. We're a lot less js-centric in terms of our browsing engine than other browsers and Quickjs is a lot more of a pure interpreter than more browser-integrated js engines, at least that's how it appears.

> There are very likely security issues with edbrowse, but we don't have the
> staff to track them down. A typical browser has hundreds of programmers
> supporting it, and its plugins and such, we have a couple of volunteers.
> The README file says there are no warranties, if you use edbrowse it's on
> you. This is a typical boilerplate disclaimer. In any case I doubt quickjs
> would be the problem.

As Karl says, the development of Edbrowse is carried out by an extremely small team. That being said, I think it'd actually be nice to have some more interest in the project from security researchers (and yes I'm aware I'm probably signing the project up to more work).

> > seems that QuickJS is not the most actively maintained project.
>
> Well, much more than duktape, which we used before. We had to drop duktape
> because it doesn't even support the es6 features of js, and emails to their
> maintainers went unanswered for months. In other words, duktape can't parse
> most of the js out there at this time.
>
> It is feasible to switch to another.
> The connection to the engine is entirely encapsulated in jseng-quick.c.
> If we wanted to use v8, for example, we would write a jseng-v8.c
> and change the makefile.
> That's what we did when switching from duktape to quick.

And from smjs to duktape. That decision was driven by the fact that smjs is far too integrated with the Firefox ecosystem rather than being developed as an embeddable library. The same used to also be true of v8 which appeared to make various assumptions about how it was being plugged in and was a complete pain to build. This may have changed now (a quick internet search shows it's got its own website and is talked about as a discrete library) but we'd have to be somewhat careful when considering the maintainability of plugging in another js engine even though the code aspects are certainly technically viable. I also wonder if it's worth contacting the Quickjs maintainers if you have concerns about security and ongoing maintenance? Cheers, Adam.
I don't understand why there would be security concerns with quickjs.
It is a language interpreter. It either works or it doesn't.
All the security concerns fall on edbrowse, which is already packaged
in several distros.
There are very likely security issues with edbrowse, but we don't have
the staff to track them down.
A typical browser has hundreds of programmers supporting it, and its
plugins and such, we have a couple of volunteers.
The README file says there are no warranties, if you use edbrowse it's
on you.
This is a typical boilerplate disclaimer.
In any case I doubt quickjs would be the problem.
> seems that QuickJS is not the most actively maintained project.
Well, much more than duktape, which we used before.
We had to drop duktape because it doesn't even support the es6 features
of js,
and emails to their maintainers went unanswered for months.
In other words, duktape can't parse most of the js out there at this
time.
It is feasible to switch to another.
The connection to the engine is entirely encapsulated in jseng-quick.c.
If we wanted to use v8, for example, we would write a jseng-v8.c
and change the makefile.
That's what we did when switching from duktape to quick.
Hope this helps.
Karl Dahlke
Hi all

I have prepared a packaged version of QuickJS for Debian that is a dependency of Edbrowse. However, during that process the question was raised whether such a security-sensitive package would be appropriate to package in Debian. The main point is that this puts an additional burden on the Debian security team, and it seems that QuickJS is not the most actively maintained project.

How specific are the bindings of Edbrowse to QuickJS, and is it feasible to switch to something else? To me, Duktape comes to mind. It looks slightly more maintained and is already in Debian.

Thanks
Sebastian
On Fri, Nov 18, 2022 at 09:18:07PM -0300, Cleverson Casarin Uliana wrote:
> Hi, I was reading the review linked below, and was wondering how Edbrowse
> compares against the very severe author's criteria for a privacy respecting
> browser. I might write to him later to ask him to include edbrowse on his
> review...
We don't make any unsolicited requests and definitely have no
dependencies on Google, Mozilla etc. However the author would class Edbrowse
as a minimalist browser (see the section on such things) and so I suspect
that any such message would meet with a negative response.
Cheers,
Adam.
OK, let's see if the message arrives at this address:

Hi, I was reading the review linked below, and was wondering how Edbrowse compares against the author's very severe criteria for a privacy respecting browser. I might write to him later to ask him to include edbrowse in his review...

https://digdeeper.club/articles/browsers.xhtml

Cleverson
Others have also pointed me to sgml, just, you know, if we want to understand the evolution of things.

> garbage which people wrote (and continue to write) and browsers somehow turn
> into something sane.

Yes tidy did a lot of this for us, I didn't realize how much until I wrote my own html scanner. Ugh. I'm still making tweaks now and then. And yet, my scanner isn't much bigger than the interface code that connected to the tidy library, so there ya go.

> the current direction seems to make sense.

Yes I think so. Thank you. xml as received through xhr is now parsed as xml, and that may make a difference to some websites. We also do some of the cdata parsing and representation, which tidy would not be able to do for us. So this is the right path.

Karl Dahlke
On Wed, Oct 12, 2022 at 08:32:37PM -0400, Karl Dahlke wrote:
> The scanners have huge overlap, and I expect only minor differences, so
> should keep it as one function. All the tag cracking and attribute cracking
> and &element; cracking and building the tree it's all the same. I suspect
> html came first and xml was a direct generalization, by throwing away the
> semantics. For sure one was very quickly on the heels of the other.
I appreciate I'm a little late to this discussion but I think (and some
quick research seems to confirm this) that they're both subsets of SGML. To
be more specific, XML is readable by a generic SGML parser whilst some SGML
(i.e. some HTML constructs) will generate errors in XML parsers. In
addition, as previously noted, XML has no inherent semantics whereas HTML
most definitely does.
To add some more confusion, an attempt was made to apply XML strictness to
HTML, called XHTML. This was, as far as I remember, the thing for a while
until HTML5 came along which (I think) went back to the pure SGML basis of
HTML.
Also, as previously noted, there's all the non-standard (and probably
incorrect in SGML though I've not bothered to read the generic standard)
garbage which people wrote (and continue to write) and browsers somehow turn
into something sane.
As such, I expect there to be quite a bit of overlap and the current
direction seems to make sense. In fact, there are other parsers which have
XML and HTML modes (and not just those used in browsers).
Cheers,
Adam.
The scanners have huge overlap, and I expect only minor differences, so should keep it as one function. All the tag cracking and attribute cracking and &element; cracking and building the tree it's all the same. I suspect html came first and xml was a direct generalization, by throwing away the semantics. For sure one was very quickly on the heels of the other. Karl Dahlke
Karl Dahlke wrote on Wed, Oct 12, 2022 at 06:51:05PM -0400:
> And that's part of my problem.

No worry, thanks for looking into it. I've replied to points individually below but I agree with your assessment.

> xml is more like json

Yes, xml is just a way of writing a tree down. As far as I understand, HTML was built on top of XML but people built "incorrect" websites (for example not closing <p> tags or whatever) and some browsers said it's ok, then people asked why it's not working with other browsers and that became a new standard... But I might be embellishing this.

> * xml should be syntactically correct.

Yes, I think it's ok to just return an error and no parsed tree for xml if we see an error.

> * Bad html should be tolerated in xml (<p><p></p></p>)
> * Should not convert <p> to P upper case

Yes, definitely to both of these.

> * The {cdata{ section we should only pull that out for xml.

I think so, it doesn't look like the html parser in firefox does anything with it, and we've been ignoring it in html all the time, so let's keep ignoring it in html.

Looking a bit more I found some more exceptions for xml, e.g. <!-- comment --> shows up as "#comment {}" in dumptree on firefox, but that might be a detail.

> So for start I might need another global variable, not fond of those but you
> know, or maybe a parameter to htmlScanner(), bool isXML, to say which way we
> are scanning, then rules as above based on isXML.

Yes, xml and html are different enough to warrant some separation there. Since we do not need to interpret xml at all (except cdata that we do not need in html), it might actually be better to fork off to a different function altogether, instead of a global variable? So depending on the DomParser argument (or mime type) we'd either run htmlScanner or xmlScanner? I'm not sure which is easier to do; my line of thinking is that if more differences pop up the code might end up simpler.

> This is an overview but let me know if I have made it to first base, or if I
> am off in left field.

This sounds good, let's try this way. I'm not sure how many sites actually manipulate xml in practice (apart from my work site...), so thank you for spending time on this!

--
Dominique
And that's part of my problem. I read the wikipedia article on it. That increased my knowledge about 450%. I'm sure you know more, so please correct any of what follows, you know, before I write code that does the wrong thing.

* I used to think xml was more than html, an extension of html, but now I think it is less than html. It is more like json. A way to linearly encode a tree of objects. Then people put meaning on top of it as they wish.

* xml should be syntactically correct. This is more like javascript. We should not see in the wild the kind of garbage that we must deal with in html.
  <foo a<b >
  <foo bar="Hello no closing quote>
  <foo bar=at&t>
  <foo><bar></foo></bar>
  I'm reading that that stuff shouldn't happen, so if we only had xml in the wild my scanner would be easier to write, but of course it's mostly html, where all sorts of errors are permitted cause people wrote it by hand in the 90s and made mistakes or were just lazy etc.

* Conversely, some errors in html are semantic not syntax, and should be tolerated in xml.
  <p><p></p></p>
  That's wrong in html, paragraphs inside paragraphs, the second p closes the first, I do it, tidy does it, and so on, but p has no meaning in xml, the p entities, whatever they are, might nest just fine, so <p><p></p></p> is a fine construct that should create a corresponding tree with p as child of p.

* Converting <p> to P upper case because p is a standard html tag, we shouldn't do that in xml, leave p in lower case.

* The {cdata{ section we should only pull that out for xml. I guess we could embed it in an html document and verify it is left alone.

So for start I might need another global variable, not fond of those but you know, or maybe a parameter to htmlScanner(), bool isXML, to say which way we are scanning, then rules as above based on isXML.

Because if we read in xml and I "fix" it, like turning <p><p></p></p> into <P></P>, the "document" would be received differently from the way it was sent, silently, well unless somebody turned dbtags on, which a normal person would never do, so silently, and edbrowse would give the wrong answer and nobody knows why.

This is an overview but let me know if I have made it to first base, or if I am off in left field.

Karl Dahlke
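The rules listed in this message match how a strict XML parser already behaves, which gives a quick way to check expectations. The following Python sketch uses the standard xml.etree.ElementTree module as a stand-in; it says nothing about how the edbrowse scanner is written, and parse_xml is a name made up for this illustration:

```python
import xml.etree.ElementTree as ET

def parse_xml(text):
    """Return the root element, or None if the text is not well-formed XML."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        return None

# Syntax errors from the message: a strict parser rejects each of them.
for bad in ['<foo><bar></foo></bar>',
            '<foo bar=at&t>',
            '<foo bar="Hello no closing quote>']:
    print(repr(bad), "->", parse_xml(bad))

# Nesting that html would "repair" is legal xml, and the tag stays lower case.
nested = parse_xml("<p><p></p></p>")
print(nested.tag, [child.tag for child in nested])

# A CDATA section is delivered as plain text content of its element.
cdata = parse_xml("<a><![CDATA[1 < 2 & 3]]></a>")
print(cdata.text)
```

The three malformed inputs all come back as None, while the nested p elements produce a p root with one p child, which is the tree the message says xml mode should build.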
Hi Karl,

I've done some real world debugging and would like help thinking about how to fix these... Unfortunately it's on an internal (work) site, so I can't share any example unless I try to make one :/

It's using jquery under the hood, so it is something we must try to understand. I'm looking at this (readable) version: https://code.jquery.com/jquery-2.2.4.js

In the order I stumbled upon the problems:

1/ the first is outside that file, but they do getElementsByTagName() to get a form <select> field, then
  var $select = jQuery(select);
and $select.prop('disabled') should apparently be false. It's undefined for us; we can probably just add it to Option in startwindow.js with this.disabled = false? I have not checked yet where that should be true.

2/ next in jquery-2.2.4.js, line 7648, this option.disabled should also be false. It'll probably be fixed with the same fix... But then on the next line we have option.parentNode.disabled. option is elem.options[3], and for edbrowse option.parentNode is elem.options, but for firefox option.parentNode is elem. We probably want to do the same; I'm not sure where we chose the parent.

3/ then Sizzle.attr, line 1446. First something probably minor, but elem.ownerDocument is not defined and on firefox it is 'document'; we probably want to set that in Option creation as well (they setDocument on the item if not, not sure what that does in practice). Then some weird expression I don't understand, but it's undefined and we want that here so it's good, until elem.getAttribute("value") does not return elem.value. That can probably be fixed in shared.js by doing what we do with length and returning this.value directly if name === "value".

4/ another unrelated problem I have is when the site apparently decided to add an onclick event on the td of a table (via jquery data-event / data-handler; not sure how that works but I could reproduce with onclick). You can try it here: https://gaia.codewreck.org/local/tmp/link-td.html

Both "Test 1" and "Test 2" pop up an alert when clicked (they are stylized differently but that doesn't matter, the text is clickable and runs js).

Apparently that can be set for tr as well, in which case the whole row would be clickable anywhere in the row, and I tried for giggles but that applies anywhere up: the table, or even body! Not sure what we'd want to do out of that, but tr and td seem common enough that we want to come up with a way to interact with them (and I need td with href=# for my internal site).

I've pushed some fixes in https://github.com/martinetd/edbrowse master branch, that fix 1/ and 3/ and it seems to work leaving a workaround for parentNode in the site's code, but I'm not quite sure what to do with parentNode and actions at table level.

I've also added two unrelated patches:
- one fixes some memory leaks, there might be some left but I got a few
- the other one fixes more (probably not very useful) gcc warnings, but at least that makes my compilation less verbose

If you prefer to look at these separately I can remove them for now.

Cheers!
--
Dominique
Hello,

My name is Oriol and I've been testing edbrowse. I like it and I would like to help with translations into Spanish, if you would like that. Thank you for everything.

Oriol
Hey, thanks all for weighing in on this.

I got it working, it turns out pyenv, an environment manager for Python, was shadowing the Python version.

tdsr is more or less exactly what I was looking for, which is to read the output of commands and review by line, word, and character. Thanks for developing it, Tyler. (And it works well with Edbrowse.)

Have a good day, everyone.

Patrick

On Fri, Sep 9, 2022, at 6:22 PM, Patrick Smyth wrote:
> Hi Tyler,
>
> Seems I already have it installed:
>
> python3-speechd is already the newest version (0.9.1-4)
>
> Could it be some kind of path issue? I'm not sure how to get the
> python3 on my system to recognize the bindings.
>
> Best,
> Patrick
>
> Tyler Spivey <tspivey@pcdesk.net> writes:
>
> > Try installing the python3-speechd package.
> >
> > On 9/9/2022 11:25 AM, Patrick Smyth wrote:
> >> Hi all,
> >> Apologies if this is a basic or trivial question, but I wanted to ask
> >> about setting up screen readers for the command line on Linux.
> >> I am using Linux Mint (functionally Ubuntu LTS), and while I can use
> >> Orca to read X11 terminals, it's quite slow and annoying to use, and
> >> I'd prefer something specific to the terminal. I'm also pretty happy
> >> with speakup when I drop out of the graphical interface, so not
> >> looking for anything there.
> >> I've tried a couple command-line specific screen readers, and I've had
> >> a lot of trouble getting them to work. The two I've tried recently are
> >> tdsr (https://github.com/tspivey/tdsr) and fenrir
> >> (https://github.com/chrys87/fenrir). Setting aside Fenrir, since the
> >> setup is a lot more involved, when I run tdsr I get the following error
> >> ModuleNotFoundError: No module named 'speechd'
> >> I have speech dispatcher installed with aptitude (apt-get install
> >> speech-dispatcher). I downloaded the speech dispatcher project from
> >> GitHub and tried importing the Python API, but it gives me a circular
> >> import issue. Here's the speech dispatcher repo on GitHub, there's a
> >> clients folder with a Python library:
> >> https://github.com/brailcom/speechd
> >> If people have gotten speech dispatcher for Python or the tdsr screen
> >> reader working, I'd appreciate any guidance. If people are more
> >> familiar with fenrir, I can try to articulate where I'm stuck with
> >> that, but it's significantly more involved as a setup process. And
> >> apologies if edbrowse isn't the place for this kind of question,
> >> though it seems fairly likely some of us are using CLI screen readers
> >> in this community.
> >> Thanks, and hope you have a good end of the week!
> >>
> >> Best,
> >> Patrick
The possible redesign of edbrowse buffer to use link list is, after 3 days of head-down work, closed, for now. I rather forgot about the undo command. So an empty line could consume 40 bytes of ram, and then another 40 on the undo side. The link list design basically doubles the amount of memory consumed. We have to keep everything new and old, and all those next prev pointers on both sides. Just do the math, or actually, I should have done the math first. A minute of thought is worth a megabyte of programming. For the most part, the linear design only adds 8 bytes per line for the undo feature, unless you do something weird like ,s/^.// It sure doesn't double things. Yes the linear design has its disadvantages, we've run into them, don't type g/stuff/ .m-2 on a big file, just don't do it, Henny Youngman. I'll hang on to the linklist stuff for a while, or maybe put it in a branch or something, though it will quickly become out of date and unworkable as edbrowse moves on, cause that's how software works. Karl Dahlke
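The 40-byte figure and the "doubling" argument can be tallied from the guesses given in the thread: about 16 bytes of malloc overhead, 8-byte alignment, 8-byte pointers, and 8 bytes per line for undo in the linear design. The Python sketch below only redoes that arithmetic; the constants are the thread's estimates, not measurements, and the function names are invented for the illustration:

```python
MALLOC_OVERHEAD = 16  # bytes per allocation; a guess from the thread
ALIGNMENT = 8         # allocations are rounded up to a multiple of this
POINTER = 8           # 64-bit pointers
UNDO_POINTER = 8      # linear design: "only adds 8 bytes per line" for undo

def slot_size(length):
    """Round a line length up to the next multiple of ALIGNMENT."""
    return -(-length // ALIGNMENT) * ALIGNMENT

def line_cost(length, pointers):
    """Bytes used by one line: its slot, malloc overhead, and its pointers."""
    return slot_size(length) + MALLOC_OVERHEAD + pointers * POINTER

linked_empty = line_cost(1, 2)   # next and prev pointers
linear_empty = line_cost(1, 1)   # one entry in the pointer array

# With undo: the linked design keeps a second copy of everything,
# the linear design keeps one extra pointer per line.
linked_with_undo = 2 * linked_empty
linear_with_undo = linear_empty + UNDO_POINTER

print("linked list:", linked_empty, "bytes,", linked_with_undo, "with undo")
print("linear:     ", linear_empty, "bytes,", linear_with_undo, "with undo")
```

Under these assumptions a one-byte line costs 40 bytes in the linked design and 80 once the undo copy is counted, against 40 bytes including undo in the linear design.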
One of the driving factors for large or even medium files is the overhead per line. Let's step back and do some math, ok that's too rigorous, let's do some estimating.

Lines are variable length, from one byte to thousands, so they have to be allocated, and there is probably no way to optimize or customize the allocation process. In other words, no special circumstance, and I would be pretty dog gone arrogant to think I could do better than malloc, which has been refined over the past 45 years by the best minds in computer science. So, each line has a malloc overhead. What is it? I looked on the internet and there is no clear answer. I'm gonna guess 16 bytes. Could be more or less. Also chunks are 8 bytes aligned, so if your line is 41 bytes long you're going to get a slot of length 48. That's an average of another 4 bytes per line.

Then there's my representation in edbrowse. In current edbrowse, pointers to lines are stored in an array, so a million line file has an array of a million pointers pointing to the million lines. Simple enough. That adds 4 bytes per line, or 8 bytes per line for 64 bit pointers. In linklist edbrowse, lines are in a linked list and that means two pointers, next and previous, you know the drill.

So to compare, each line has 28 bytes overhead in one version of edbrowse, 36 bytes overhead in the other. An empty line, one byte, could consume 40 bytes of ram in linklist edbrowse. That weighs in favor of linear edbrowse, though not heavily, not a huge difference.

Performance also has many tradeoffs. Something like g/re/ .m-2 is *way* more efficient in linklist. Each matching line: change some pointers and move it two lines back. That is a quadratic explosion in linear edbrowse. Try this:

  r !seq 400000
  g/7$/ .m-2

It's 2 minutes 53 seconds in linear edbrowse, 1 second in linklist, and the former is quadratic in time for larger files. Twice as big, 4 times as slow, etc.

However, if your file has 20 million lines and you ask for line 11382930 there is nothing to do but start at 1 and step through all the links and count until you find the line. I do tricks like remembering where dot is, and the last line displayed, so - just steps back one line, sure, but those are tricks and still random access can be slow. The real question is can we reduce overhead, and I have found no practical way to do so. Store 16 lines per allocated chunk? Tempting, but becomes a nightmare when you delete a line or move a line up or down in the buffer etc. Point to lines on disk by off_t, don't take them into memory unless you are changing them. Tempting, but mini disk reads are a lot of overhead, and it doesn't really save much space, since we still have all those linklist pointers and such. This is a great exercise in memory and performance optimization, and kinda fun, but I'm not making much progress, since lines in a file can be just anything. There's not much to customize or take advantage of here. Karl Dahlke