edbrowse-dev - development list for edbrowse
 help / color / mirror / Atom feed
* [Edbrowse-dev] Object identifiers
@ 2017-07-24  0:52 Karl Dahlke
  2017-07-24  1:04 ` Chris Brannon
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Karl Dahlke @ 2017-07-24  0:52 UTC (permalink / raw)
  To: Edbrowse-dev

[-- Attachment #1: Type: text/plain, Size: 1968 bytes --]

As you know, we use the pointer to the object as the link between that object in the js world and the corresponding html tag in the edbrowse world.
We did this in Mozilla, and naturally carried it across to duktape.
The documentation guarantees these pointers are valid, as long as the object is not garbage collected away.
We've come to learn that's a mighty big if.
I wrote a fair amount of code to make sure we weren't diddling with an object that was thrown away, because you get a seg fault if you do. Boom!
But then there's a bigger problem.
I'm sure this was present in mozilla, but we never got far enough down the road to see it.
tag1 links to object1 by pointer1, and javascript removes object1, and it is thrown away, and creates object2, which happens to use the same pointer1.
It passes pointer1 back to edbrowse, which already has this recorded in tag1.
So now tag1 is linked to object2, which may be a form instead of a div, may have different number of children, etc etc.
It is potentially bad, with unpredictable behavior, and nearly impossible to reproduce.
We went down the pointer path because neither engine gives you an unambiguous handle to the object.
So now I'm writing more code to know that object2 is new, and even if it uses pointer1, which we have in our registry, we should delete that tag and make a new one,
and I already broke edbrowse in a mysterious way, and it's just badbadbad.
So I'm thinking instead, why don't we create our own IDs.
Just a sequence number that we put on every object we care about, and an array to map back from sequence number to the object.
A bit of overhead, but no crashes if we accidentally reference an object that is gone, and no ambiguity - one object per id.
Change jsobjtype from voidd * to int, and systemically change everything else accordingly.
A fair chunk of work, but might be worth it.
I really don't have a better alternative right now. What do you think?

Karl Dahlke

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Edbrowse-dev] Object identifiers
  2017-07-24  0:52 [Edbrowse-dev] Object identifiers Karl Dahlke
@ 2017-07-24  1:04 ` Chris Brannon
  2017-07-24  5:25 ` Dominique Martinet
  2017-07-24  5:43 ` Robert Ransom
  2 siblings, 0 replies; 8+ messages in thread
From: Chris Brannon @ 2017-07-24  1:04 UTC (permalink / raw)
  To: edbrowse-dev

Karl Dahlke <eklhad@comcast.net> writes:

> So I'm thinking instead, why don't we create our own IDs.
> Just a sequence number that we put on every object we care about, and an array to map back from sequence number to the object.
> A bit of overhead, but no crashes if we accidentally reference an object that is gone, and no ambiguity - one object per id.

I like this.  It is the safe approach.  How will objects in the object
ID mapping table be freed?

-- Chris

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Edbrowse-dev] Object identifiers
  2017-07-24  0:52 [Edbrowse-dev] Object identifiers Karl Dahlke
  2017-07-24  1:04 ` Chris Brannon
@ 2017-07-24  5:25 ` Dominique Martinet
  2017-07-24  6:43   ` Karl Dahlke
  2017-07-24  7:03   ` Karl Dahlke
  2017-07-24  5:43 ` Robert Ransom
  2 siblings, 2 replies; 8+ messages in thread
From: Dominique Martinet @ 2017-07-24  5:25 UTC (permalink / raw)
  To: edbrowse-dev

Hi,

Karl Dahlke wrote on Sun, Jul 23, 2017:
> I really don't have a better alternative right now. What do you think?

I'm not quite sure it will help, if the problem is a desynchronisation
between what we think is still linked and what duktape thinks is still
linked then even with an indirect mapping we still will have pointers to
invalid memory, unless you plan on using duktape to handle the map
itself.


I've had a look at duktape documentation and code, and I have an idea
that might be less code.
The duk_create_heap function lets us provide alloc, realloc and free
functions - we use it with duk_create_heap_default which is
duk_create_heap with a lot of NULL parameters but we can override some
of the base functions.

If we overload malloc, realloc and free with just a little wrapper that
tells the main edbrowse process "this address changed" (for realloc) or
"this address is no longer valid" (for free), nothing to do for malloc,
then we will have a safe way to determin a pointer can no longer be
used.

This is not as implementation-generic as making our own ID, but it
sounds like a _lot_ less work and should be safe for us.
I also think most implementations will have a way to use our own
allocation functions too so it might be more reuseable than it looks.


What do you think?

-- 
Dominique

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Edbrowse-dev] Object identifiers
  2017-07-24  0:52 [Edbrowse-dev] Object identifiers Karl Dahlke
  2017-07-24  1:04 ` Chris Brannon
  2017-07-24  5:25 ` Dominique Martinet
@ 2017-07-24  5:43 ` Robert Ransom
  2017-07-24 17:45   ` Adam Thompson
  2 siblings, 1 reply; 8+ messages in thread
From: Robert Ransom @ 2017-07-24  5:43 UTC (permalink / raw)
  To: Karl Dahlke; +Cc: Edbrowse-dev

On 7/23/17, Karl Dahlke <eklhad@comcast.net> wrote:

> Just a sequence number that we put on every object we care about, and an
> array to map back from sequence number to the object.

The traditional approach is to use a 'handle', consisting of
a small array index together with a short check value.
Each array index contains a counter of the number of times
the object at that index has been freed.
When a handle is created, its check value is set to that counter;
when the object associated with a handle is accessed,
the check value must be equal to the counter's current value.

For debugging purposes, a list of every handle created
with a description of the objects each handle referenced
could be stored compressed in memory or logged to a file.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Edbrowse-dev] Object identifiers
  2017-07-24  5:25 ` Dominique Martinet
@ 2017-07-24  6:43   ` Karl Dahlke
  2017-07-24  7:49     ` Dominique Martinet
  2017-07-24  7:03   ` Karl Dahlke
  1 sibling, 1 reply; 8+ messages in thread
From: Karl Dahlke @ 2017-07-24  6:43 UTC (permalink / raw)
  To: edbrowse-dev

[-- Attachment #1: Type: text/plain, Size: 2841 bytes --]

> I'm not quite sure it will help, if the problem is a desynchronisation
> between what we think is still linked and what duktape thinks is still linked

Well if we call upon an object with id 17, and there is no such object, for any reason, we get back eb$idl[17] which will not crash the process, and is probably the null object, which is what we want.
It won't have the properties we look for, and if we try to set properties it won't do anything.
But as Chris pointed out, it will never be garbage collected away, because it always has at least one reference, our registry eb$idl[17].
That's good and bad.
We can mess with extinct objects all we want, because they're still there, but objects accumulate and never go away.
Not until you close the window.
Some web pages perform some kind of update every few seconds, like my example http://www.eklhad.net/async
and these would just accrue objects in js and tags in edbrowse for as long as the page is open.

I don't think my generated id plan is prohibitive, probably 2 or 3 days work to get it functioning.
There are advantages to that plan, and advantages to your free interceptor.
If I don't maintain a registry of objects then they will indeed go away when they should.
I only need wrapper free, because the pointer to the object is valid until it is thrown away.
But is the pointer returned by get_heapptr() the same as the base of the allocated region for that object?
Probably, but we really don't know.
It could be a structure inside the region, so malloc and free would not be the same pointers as get_heapptr and push_heapptr.
Only way to know is to write a stand-alone test program.
We'd sure want to do that before going down that path.
And hope duktape never changed that internal design.

So what to do when a pointer is freed?
Since the heap holds everything, you have to loop through all tags in all open windows in edbrowse,
and if any tag has the pointer, and there should be only one, mark it dead or otherwise delete it.
Well that's not too awful, if JS1 is set and everything is one process.
In the 2 process model it's a pain!
At an unpredictable time while javascript is running, it frees some object and we can see the pointer that is freed.
But we're in the wrong process.
I guess I'd need another side effect, pass the freed pointers back to edbrowse along with the response to whatever js function ran.
Then edbrowse can run its own version of gc as described above.

> I also think most implementations will have a way to use our own
> allocation functions

I don't believe mozilla allowed for this.
In other words, we couldn't have considered this approach last month.

> What do you think?

I tentatively like it - even though I just spent a few hours working on the generated id design...
Oh well.

Karl Dahlke

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Edbrowse-dev] Object identifiers
  2017-07-24  5:25 ` Dominique Martinet
  2017-07-24  6:43   ` Karl Dahlke
@ 2017-07-24  7:03   ` Karl Dahlke
  1 sibling, 0 replies; 8+ messages in thread
From: Karl Dahlke @ 2017-07-24  7:03 UTC (permalink / raw)
  To: edbrowse-dev

[-- Attachment #1: Type: text/plain, Size: 1063 bytes --]

Now I'm thinking about performance, and that could be a problem.
duktape is allocating and freeing zillions of things all the time, most of them not objects.
duk_get_string, duk_push_string, etc.
Edbrowse is the same way if you look at the code.
So pass all those pointers across and scan through all tags in all open windows against all pointers? I don't think so.
No - I'd have to maintain a watch list in the js process.
My own wrapper around get_heapptr() that says, this is an important pointer that will be linked to an edbrowse tag, keep it in a list.
Then our wrapper around free could scan through this list and 99% of the time it wouldn't match so do nothing.
If it does match then remove it from the watch list and pass it over to edbrowse as a side effect.

To do it right, it's not easy peasy; might be about the same effort is generated IDs,
but the free interceptor might still win the day, primarily because it allows garbage collection to continue, and web pages will not accumulate objects second by second by second.

Karl Dahlke

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Edbrowse-dev] Object identifiers
  2017-07-24  6:43   ` Karl Dahlke
@ 2017-07-24  7:49     ` Dominique Martinet
  0 siblings, 0 replies; 8+ messages in thread
From: Dominique Martinet @ 2017-07-24  7:49 UTC (permalink / raw)
  To: edbrowse-dev

Karl Dahlke wrote on Mon, Jul 24, 2017:
> Well if we call upon an object with id 17, and there is no such
> object, for any reason, we get back eb$idl[17] which will not crash
> the process, and is probably the null object, which is what we want.

Ok, so you do use duktape to hold the array, in that case the design is
solid yes; just need to worry about when to release objects if we go
that way as you describe.
I am not quite sure how to do that before we close the window... Will
have a think.


> If I don't maintain a registry of objects then they will indeed go away when they should.
> I only need wrapper free, because the pointer to the object is valid until it is thrown away.
> But is the pointer returned by get_heapptr() the same as the base of the allocated region for that object?

Hm, good point. I think they are by definition.
I just checked and it works.
By the way, realloc is very important too, it happens all the time and
addresses are also reused after realloc, the base pointer changes too.
It happened js_hello_duk so imagine for edbrowse!

> And hope duktape never changed that internal design.

We can only hope, but I think it is a definition of the function but it
is only how I interpret this, they might think differently.



> Since the heap holds everything, you have to loop through all tags in all open windows in edbrowse,
> and if any tag has the pointer, and there should be only one, mark it dead or otherwise delete it.

Right, we might need to keep an internal id just for that, so the main
process only uses our id, and the js process has a table that does id to
pointer?
Or we can allocate a bit more and keep some edbrowse data in the pointer
itself at the start of what we allocated, so free/realloc know what
edbrowse id this is about and can operate quickly without traversal.
(There are nice 'container_of' macros for that)

I had forgotten that the js process cannot message back to edbrowse, but
with this model the id are only ever modified in the js process and the
main process never needs to know if we do not reuse them.

It is a bit of both ideas, and does not need too many messages passed, I
like it.


(taken from irc)
> Browsing jsrt generates 17821 calls to free() - so my concerns about
> performance are justified.

Hm, we cannot have too many id either, but I do not think alloc can tell
if the pointer will be useful or not.
Maybe always give a new number to each malloc but not using every number
to keep a pointer to the heapptr; when we need a heapptr the first time
we get the id back and add it to our table with another edbrowse number.

That way most free will not have anything to do if the alloc number has
not been used.

Something like extra 64 bits at the start.
First bit is used/not used, if not used then free/realloc do nothing.
If used, other 63 bits are our ID and we need to find it in table.



I think this needs a bit more thinking over, so let's not rush this but
we can play with different ideas and see how it works.



> Then edbrowse can run its own version of gc as described above.

I would rather not add a gc algorithm if we can get away without it, one
is bad enough to debug, imagine when something goes wrong if we cannot
tell which gc removed our value!



-- 
Dominique

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Edbrowse-dev] Object identifiers
  2017-07-24  5:43 ` Robert Ransom
@ 2017-07-24 17:45   ` Adam Thompson
  0 siblings, 0 replies; 8+ messages in thread
From: Adam Thompson @ 2017-07-24 17:45 UTC (permalink / raw)
  To: Robert Ransom; +Cc: Karl Dahlke, Edbrowse-dev

On Mon, Jul 24, 2017 at 01:43:40AM -0400, Robert Ransom wrote:
> On 7/23/17, Karl Dahlke <eklhad@comcast.net> wrote:
> 
> > Just a sequence number that we put on every object we care about, and an
> > array to map back from sequence number to the object.
> 
> The traditional approach is to use a 'handle', consisting of
> a small array index together with a short check value.
> Each array index contains a counter of the number of times
> the object at that index has been freed.
> When a handle is created, its check value is set to that counter;
> when the object associated with a handle is accessed,
> the check value must be equal to the counter's current value.

This sounds like what we need here.  It's more work but worth it I think.  I'm not entirely sure how we'll keep the two sets of objects in sync but it's certainly a solvable problem.

> For debugging purposes, a list of every handle created
> with a description of the objects each handle referenced
> could be stored compressed in memory or logged to a file.

Agreed, that'd help a lot.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-07-24 17:44 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-24  0:52 [Edbrowse-dev] Object identifiers Karl Dahlke
2017-07-24  1:04 ` Chris Brannon
2017-07-24  5:25 ` Dominique Martinet
2017-07-24  6:43   ` Karl Dahlke
2017-07-24  7:49     ` Dominique Martinet
2017-07-24  7:03   ` Karl Dahlke
2017-07-24  5:43 ` Robert Ransom
2017-07-24 17:45   ` Adam Thompson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).