edbrowse-dev - development list for edbrowse
 help / color / mirror / Atom feed
* [Edbrowse-dev]  [PATCH] Use the list class from the C++ STL,...
@ 2014-01-30 21:58 Karl Dahlke
  2014-01-30 22:18 ` Chris Brannon
  2014-01-31 12:09 ` Adam Thompson
  0 siblings, 2 replies; 12+ messages in thread
From: Karl Dahlke @ 2014-01-30 21:58 UTC (permalink / raw)
  To: Edbrowse-dev

> htmlTag should probably be a proper class,

I guess I figure when we use C++, for whatever reason, we should use its power.
And selfishly, it's a good way for me to learn it.
But I don't mean to create a lot of work for you either.

> all the char * members need to be replaced with string.

Need to? Really? Right away?
A good idea perhaps but might open up a can of worms.
Like name is assigned the string produced by htmlAttrVal() in format.c,
returning a c string of course,
and yes I suppose this would autocast itself into a c++ string,
but I wonder if there won't be a number of other cross file confusions here.

The cw->tags is a concern for sure.
Maybe could be void *, opaque pointer.
If I remember right I build the html stack and then just move it to cw->tags,
for that session, for the browsed file.
Then use cw->tags later as you interact with the browsed file.
Like filling out web forms, submitting, going to hyperlinks, etc,
all those actions being encapsulated in html.cpp.
Switch sessions and you get the tags for that session via cw->tags,
and so on.
But I should read through the code and make sure it's that simple.

Karl Dahlke

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Edbrowse-dev] [PATCH] Use the list class from the C++ STL,...
  2014-01-30 21:58 [Edbrowse-dev] [PATCH] Use the list class from the C++ STL, Karl Dahlke
@ 2014-01-30 22:18 ` Chris Brannon
  2014-01-31 12:09 ` Adam Thompson
  1 sibling, 0 replies; 12+ messages in thread
From: Chris Brannon @ 2014-01-30 22:18 UTC (permalink / raw)
  To: Edbrowse-dev

Karl Dahlke <eklhad@comcast.net> writes:

> Like name is assigned the string produced by htmlAttrVal() in format.c,
> returning a c string of course,

The only call sites for htmlAttrVal are in html.cpp.  It would be best
to move it there and make it static.  But point well taken.  There are
probably other cross-module issues waiting in the wings.

-- Chris

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Edbrowse-dev] [PATCH] Use the list class from the C++ STL,...
  2014-01-30 21:58 [Edbrowse-dev] [PATCH] Use the list class from the C++ STL, Karl Dahlke
  2014-01-30 22:18 ` Chris Brannon
@ 2014-01-31 12:09 ` Adam Thompson
  2014-01-31 12:35   ` Chris Brannon
  1 sibling, 1 reply; 12+ messages in thread
From: Adam Thompson @ 2014-01-31 12:09 UTC (permalink / raw)
  To: Karl Dahlke; +Cc: Edbrowse-dev

On Thu, Jan 30, 2014 at 04:58:53PM -0500, Karl Dahlke wrote:
> > htmlTag should probably be a proper class,
> 
> I guess I figure when we use C++, for whatever reason, we should use its power.
> And selfishly, it's a good way for me to learn it.
> But I don't mean to create a lot of work for you either.

Yeah, I'm not sure why we'd want to do the full constructor,
destructor etc thing with this, apart from design for design's sake,
which in my opinion is not just unnecessary but actually harms maintainability.
I've seen much c++ code which makes wide use of structs.

> > all the char * members need to be replaced with string.
> 
> Need to? Really? Right away?
> A good idea perhaps but might open up a can of worms.
> Like name is assigned the string produced by htmlAttrVal() in format.c,
> returning a c string of course,
> and yes I suppose this would autocast itself into a c++ string,
> but I wonder if there won't be a number of other cross file confusions here.

No, char * does not autocast into a c++ string.
Thus, whereas we can pull some of this stuff into html.cpp,
we're definitely opening a large can of worms for very little benefit.
The only useful reason I could see for this change is handling nulls in names
but this wouldn't be valid html anyway i suspect.

Cheers,
Adam.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Edbrowse-dev] [PATCH] Use the list class from the C++ STL,...
  2014-01-31 12:09 ` Adam Thompson
@ 2014-01-31 12:35   ` Chris Brannon
  0 siblings, 0 replies; 12+ messages in thread
From: Chris Brannon @ 2014-01-31 12:35 UTC (permalink / raw)
  To: Edbrowse-dev

Adam Thompson <arthompson1990@gmail.com> writes:

> On Thu, Jan 30, 2014 at 04:58:53PM -0500, Karl Dahlke wrote:
>> > htmlTag should probably be a proper class,
>> 
> Yeah, I'm not sure why we'd want to do the full constructor,
> destructor etc thing with this, apart from design for design's sake,

Now that I think about it, you're right.  No real need for all of that.

-- Chris

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Edbrowse-dev] [PATCH] Use the list class from the C++ STL,...
  2014-01-31 18:37 Karl Dahlke
  2014-01-31 18:48 ` Chris Brannon
@ 2014-02-02 13:43 ` Adam Thompson
  1 sibling, 0 replies; 12+ messages in thread
From: Adam Thompson @ 2014-02-02 13:43 UTC (permalink / raw)
  To: Karl Dahlke; +Cc: Edbrowse-dev

On Fri, Jan 31, 2014 at 01:37:31PM -0500, Karl Dahlke wrote:
> > my gcc gives me a bunch of warnings in url.c about using int and size_t
> 
> Mine doesn't, which is odd, but sure make any changes you like.
> 

Which gcc version are you using?

> > Yeah, but then we've still got the cw->tags array to worry about.
> 
> Shouldn't be a problem, if we move tags into jsstate,
> thus cw->jss->tags, as I wrote Chris off-line.
> The only reference elsewhere is some freeTags call in buffers.c,
> but we could just pass cw->jss in that case.
> Then it's all encapsulated in html.cpp, and should be easy.

Yep, sounds good.

> 
> I'd thought about the link list for the lines in a file,
> but it would be a lot of rewrite.
> Wouldn't help me much, I only have 2 gig ram anyways,
> and might even reduce the sizes of files I can edit.
> At this level it is a bit like politics,
> anything we do here will help some people and hurt others.

Perhaps, though I'd say having a hard limit on how long you can run edbrowse
for (due to the lack of line index reuse) is something which should be fixed.
Without using a linked list, I really can't think of a nice way round the issue
without shifting all the lines down to fill the gaps in the array (possibly
once the indices are all gone). This sounds a bit nasty though and I'm not sure
how large the implications of such swapping would be.

Cheers,
Adam.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Edbrowse-dev] [PATCH] Use the list class from the C++ STL,...
  2014-01-31 18:37 Karl Dahlke
@ 2014-01-31 18:48 ` Chris Brannon
  2014-02-02 13:43 ` Adam Thompson
  1 sibling, 0 replies; 12+ messages in thread
From: Chris Brannon @ 2014-01-31 18:48 UTC (permalink / raw)
  To: Edbrowse-dev

Karl Dahlke <eklhad@comcast.net> writes:

> Shouldn't be a problem, if we move tags into jsstate,
> thus cw->jss->tags, as I wrote Chris off-line.

Oh, struct htmlTag is forward-declared in eb.h, so cw->tags is struct
htmlTag **.

-- Chris

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Edbrowse-dev]    [PATCH] Use the list class from the C++ STL,...
@ 2014-01-31 18:37 Karl Dahlke
  2014-01-31 18:48 ` Chris Brannon
  2014-02-02 13:43 ` Adam Thompson
  0 siblings, 2 replies; 12+ messages in thread
From: Karl Dahlke @ 2014-01-31 18:37 UTC (permalink / raw)
  To: Edbrowse-dev

> my gcc gives me a bunch of warnings in url.c about using int and size_t

Mine doesn't, which is odd, but sure make any changes you like.

> Yeah, but then we've still got the cw->tags array to worry about.

Shouldn't be a problem, if we move tags into jsstate,
thus cw->jss->tags, as I wrote Chris off-line.
The only reference elsewhere is some freeTags call in buffers.c,
but we could just pass cw->jss in that case.
Then it's all encapsulated in html.cpp, and should be easy.

I'd thought about the link list for the lines in a file,
but it would be a lot of rewrite.
Wouldn't help me much, I only have 2 gig ram anyways,
and might even reduce the sizes of files I can edit.
At this level it is a bit like politics,
anything we do here will help some people and hurt others.

Karl Dahlke

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Edbrowse-dev] [PATCH] Use the list class from the C++ STL,...
  2014-01-31 13:02 Karl Dahlke
@ 2014-01-31 17:33 ` Adam Thompson
  0 siblings, 0 replies; 12+ messages in thread
From: Adam Thompson @ 2014-01-31 17:33 UTC (permalink / raw)
  To: Karl Dahlke; +Cc: Edbrowse-dev

On Fri, Jan 31, 2014 at 08:02:20AM -0500, Karl Dahlke wrote:
> > The only useful reason I could see for this change is handling nulls in names
> 
> To me the reason to make a change at all, and I grant you it ain't broke,
> is to get rid of all those warning messages.
> In the linux kernel they encourage you to make such changes,
> even in working code,
> and it's surprising how you can compile the kernel, zillions of
> lines of code, and how few warnings there are.
> In contrast js 24 gives pages and pages of warnings,
> so that you can't tell if any of them should be taken seriously.

Agreed, but the change from char * to string wouldn't fix any of the js24 warnings.
Most of them go away with the -Wno-invalid-offsetof in JS_CXXFLAGS (I know this
isn't nice, but they're Mozilla's warnings which we can't fix).

> In that sense Chris' original patch does the trick, and I'm ok with that.

Yeah, switching to the c++ list class makes sense here.

Also, in the spirit of removing warnings,
my gcc gives me a bunch of warnings in url.c about using int and size_t
interchangably. Do you mind if I go ahead and fix these?

> I just thought it might not be much more work
> to put the struct right in the list, rather than struct * with manual allocs.
> And maybe clean up the code a bit.
> Guess that's Chris' call.

Yeah, but then we've still got the cw->tags array to worry about.
I wonder if we could have the stack as a void * (as you suggest)
and then get whatever needs to use it to use c++?
Which file is it actually rendered in?
If it's one of the existing c++ ones then this'd allow us to get rid of the tag
array, and thus the need for all those pointers.
If not, then I don't know.

> As for line count, I may be able to push from 10 million to 100 million,
> but probably not more than that because of the way
> files are represented with arrays inside,
> having 4 byte indexes etc.
> 
> There is another factor wherein you can run out of your 10 million lines
> not because they are really being used,
> but because there is no garbage collector for lines.
> Make a substitution, and of course you have the new line, but I
> have to keep the old one in case of an undo.
> Make another change and I suppose I could get rid of that original line,
> and yes it is freed (no memory leak),
> but the index inside is not reclaimed.
> The indexes march on towards 10 million.
> So after a long edbrowse session you could run into this, even though you don't
> have 10 million lines active now.
> I've been meaning to address this in some fashion for,
> or er um, at least a decade.
> Well probably push the limit to 100 million first, that's more important
> and easier, then think about better gc procedures for the line indexes.

Yeah. Not sure how to do this without getting the equivalent of memory
fragmentation in the lines array! The only way around this would be to switch
from an array of lines to a linked list of them so you could swap out lines as needed.
This'd be less memory-efficient for small numbers of lines but would make
switching lines in and out much easier.
This'd also get rid of the hard limit on lines completely I think (unless I'm
missing something).
> 
> I have played with jsrt for a while but haven't been able to produce
> your seg fault.
> And the one that I had earlier, reproducible,
> went away with js 24.
> I'll keep fishing around to see if I can trip another one.

Thanks.

Cheers,
Adam.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Edbrowse-dev]   [PATCH] Use the list class from the C++ STL,...
@ 2014-01-31 13:02 Karl Dahlke
  2014-01-31 17:33 ` Adam Thompson
  0 siblings, 1 reply; 12+ messages in thread
From: Karl Dahlke @ 2014-01-31 13:02 UTC (permalink / raw)
  To: Edbrowse-dev

> The only useful reason I could see for this change is handling nulls in names

To me the reason to make a change at all, and I grant you it ain't broke,
is to get rid of all those warning messages.
In the linux kernel they encourage you to make such changes,
even in working code,
and it's surprising how you can compile the kernel, zillions of
lines of code, and how few warnings there are.
In contrast js 24 gives pages and pages of warnings,
so that you can't tell if any of them should be taken seriously.
In that sense Chris' original patch does the trick, and I'm ok with that.
I just thought it might not be much more work
to put the struct right in the list, rather than struct * with manual allocs.
And maybe clean up the code a bit.
Guess that's Chris' call.

As for line count, I may be able to push from 10 million to 100 million,
but probably not more than that because of the way
files are represented with arrays inside,
having 4 byte indexes etc.

There is another factor wherein you can run out of your 10 million lines
not because they are really being used,
but because there is no garbage collector for lines.
Make a substitution, and of course you have the new line, but I
have to keep the old one in case of an undo.
Make another change and I suppose I could get rid of that original line,
and yes it is freed (no memory leak),
but the index inside is not reclaimed.
The indexes march on towards 10 million.
So after a long edbrowse session you could run into this, even though you don't
have 10 million lines active now.
I've been meaning to address this in some fashion for,
or er um, at least a decade.
Well probably push the limit to 100 million first, that's more important
and easier, then think about better gc procedures for the line indexes.

I have played with jsrt for a while but haven't been able to produce
your seg fault.
And the one that I had earlier, reproducible,
went away with js 24.
I'll keep fishing around to see if I can trip another one.

Karl Dahlke

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Edbrowse-dev]  [PATCH] Use the list class from the C++ STL,...
@ 2014-01-30 22:25 Karl Dahlke
  0 siblings, 0 replies; 12+ messages in thread
From: Karl Dahlke @ 2014-01-30 22:25 UTC (permalink / raw)
  To: Edbrowse-dev

> The only call sites for htmlAttrVal are in html.cpp. It would be best
> to move it there and make it static.

Yes and perhaps skipHtmlComment() too.

You just have to play around and see what makes sense.

Karl Dahlke

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Edbrowse-dev] [PATCH] Use the list class from the C++ STL,...
  2014-01-30 17:49 Karl Dahlke
@ 2014-01-30 21:23 ` Chris Brannon
  0 siblings, 0 replies; 12+ messages in thread
From: Chris Brannon @ 2014-01-30 21:23 UTC (permalink / raw)
  To: Edbrowse-dev

Karl Dahlke <eklhad@comcast.net> writes:

> but why aren't you using
>
> static list < struct htmlTag >htmlStack;

That's not a bad idea.  There are a few things preventing it.
htmlTag should probably be a proper class, with a constructor,
destructor, copy constructor, and assignment operator.
Also, to make life easier, all the char * members need to be replaced
with string.  I'm working on that, but it'll take a while.

Next, we have an issue of duplication.  We have two data structures that
will hold the same set of tags, one being cw->tags (an array), and the
other being htmlStack.  If we use structs instead of pointers in these,
the data structures can get out of sync when code modifies a tag in just
one of them.  If cw->tags is never constructed while htmlStack is in
use, then this isn't actually a problem.  But I'm not sure if that is
the case.

-- Chris

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Edbrowse-dev]  [PATCH] Use the list class from the C++ STL,...
@ 2014-01-30 17:49 Karl Dahlke
  2014-01-30 21:23 ` Chris Brannon
  0 siblings, 1 reply; 12+ messages in thread
From: Karl Dahlke @ 2014-01-30 17:49 UTC (permalink / raw)
  To: Edbrowse-dev

static list < struct htmlTag *>htmlStack;

It may just be a matter of style, or perhaps evolution
since my lists had to handle pointers, but why aren't you using

static list < struct htmlTag >htmlStack;

An array of the actual structures, rahter than pointers to the structures.
Thus the structures are allocated and freed as you go, by C++,
and I don't have to alloc and free them.
They are "inline", for lack of a better term.
I'm just looking to understand here,
and to learn what these lists can do for us.

Karl Dahlke

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2014-02-02 13:43 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-30 21:58 [Edbrowse-dev] [PATCH] Use the list class from the C++ STL, Karl Dahlke
2014-01-30 22:18 ` Chris Brannon
2014-01-31 12:09 ` Adam Thompson
2014-01-31 12:35   ` Chris Brannon
  -- strict thread matches above, loose matches on Subject: below --
2014-01-31 18:37 Karl Dahlke
2014-01-31 18:48 ` Chris Brannon
2014-02-02 13:43 ` Adam Thompson
2014-01-31 13:02 Karl Dahlke
2014-01-31 17:33 ` Adam Thompson
2014-01-30 22:25 Karl Dahlke
2014-01-30 17:49 Karl Dahlke
2014-01-30 21:23 ` Chris Brannon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).