edbrowse-dev - development list for edbrowse
 help / color / mirror / Atom feed
From: Karl Dahlke <eklhad@comcast.net>
To: edbrowse-dev@edbrowse.org
Subject: Architecture
Date: Wed, 27 Jul 2022 13:47:03 -0400	[thread overview]
Message-ID: <20220627134703.eklhad@comcast.net> (raw)

[-- Attachment #1: Type: text/plain, Size: 1989 bytes --]

Just a note / thought, in case I'm gone and you want to keep edbrowse going, and maybe modify it for the modern age.

Over 20 years ago I chose a design where the lines of a file are stored in an array.
Not literally of course, a line can be of arbitrary length, but the structure that defines a line.
An alternate design is a link list of lines.
If the lines are short, this adds considerable overhead, pointer to next and previous.
Computers didn't have a lot of memory, and we could hold more lines, and larger files, with my simple array -
still within the limits of memory which was about half a gig.
I still think that was a reasonable design circa 2000.

Now apply this to large files, and first, some can't even be represented, with the limits of that array.
For others, delete a line and you are moving millions of lines up in the array.
That's fine for one line, but g/stuff/d is horribly inefficient, quadratic in the size of the file, and prohibitive in some situations.
So I wrote mass delete, mass join, mass read, mass substitute, to address all this, and that's fine,
but at some point we should step back and ask if we should redesign the representation using link lists.
We don't care about the pointer overhead now, we have ram to burn.
And deleting a line - just snap it out.
We could throw away all that specialized mass-operation code.
And large files are easier to manage.
Now some things are slower, like accessing line 937241.
You have to step through 937241 links to get there.
But a computer can do that pretty fast, and it's linear in the size of the file.
There aren't any operations, any more, that become quadratic in the size of the file.
At the end of the day, it's usually the exponent that makes all the difference.

So after 3.8.3, if somebody has nothing better to do,
ask whether a link list redesign is appropriate, and how it might be accomplished.
You know, rebuilding the ship while it is still sailing.

Karl Dahlke

                 reply	other threads:[~2022-07-27 17:47 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220627134703.eklhad@comcast.net \
    --to=eklhad@comcast.net \
    --cc=edbrowse-dev@edbrowse.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).