edbrowse-dev - development list for edbrowse
* [Edbrowse-dev] tidy5 and versions
From: Karl Dahlke @ 2015-08-24 11:54 UTC (permalink / raw)
  To: Edbrowse-dev

We stand on the edge of pushing a change that will require tidy5.
It's cautious: it does nothing except run the html through tidy,
in parallel with everything else we are doing,
and then free the tidy tree when the window is freed.
This is just to get us started, to make sure tidy doesn't seg fault etc.
But it will change the way edbrowse is built,
since we now need another library.
Should we, and I kinda think we should, stamp another version, 3.5.4.2,
before we jump into the tidy pool?
Some work has been done since 3.5.4.1: some bug fixes, some cosmetics,
and the framework for imap, including a simple move/delete interface.
Chris, before you push Kevin's tidy patch, maybe stamp 3.5.4.2.

After we are using tidy to parse html,
and I hope this isn't a long time coming, we may want to jump up to 3.6.

Karl Dahlke


* Re: [Edbrowse-dev] tidy5 and versions
From: Kevin Carhart @ 2015-08-25  0:24 UTC (permalink / raw)
  To: Edbrowse-dev

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1418 bytes --]


And a note to Adam, we've hashed this patch out offlist, but if you have 
any critiques on this, please fire away.  It's just a few lines and 
straightforward, but as a patch-submission newbie I can use the 
multifaceted scrutiny, and would like to know what you think.

Thanks
Kevin


On Mon, 24 Aug 2015, Karl Dahlke wrote:

> We stand on the edge of pushing a change that will require tidy5.
> It's cautious, doesn't do anything except run the html through tidy,
> in parallel with everything else we are doing,
> then free the tidy tree when the window is freed.
> Just to get us started, to make sure tidy doesn't seg fault etc.
> But it will change the way edbrowse is built.
> We now need another library etc.
> Should we, and I kinda think we should, stamp another version, 3.5.4.2,
> before we jump into the tidy pool?
> Some work has been done since 3.5.4.1, some bug fixes, some cosmetics,
> and the framework for imap, including a simple move delete interface.
> Chris before you push Kevin's tidy patch, maybe stamp 3.5.4.2.
>
> After we are using tidy to parse html,
> and I hope this isn't a long time coming, we may want to jump up to 3.6.
>
> Karl Dahlke

--------
Kevin Carhart * 415 225 5306 * The Ten Ninety Nihilists

[-- Attachment #2: Type: TEXT/PLAIN, Size: 3638 bytes --]

diff -Naur 1/edbrowse-master/README 2/edbrowse-master/README
--- 1/edbrowse-master/README	2015-08-23 01:46:57.000000000 -0700
+++ 2/edbrowse-master/README	2015-08-23 21:46:42.783741131 -0700
@@ -73,6 +73,19 @@
 If you have to compile curl from source, be sure to specify
 --ENABLE-VERSION-SYMBOLS (or some such) at the configure script.
 
+Edbrowse now uses the Tidy HTML parser.  So there are a couple
+of things to install for this prerequisite.
+The Tidy compilation process uses cmake.  Please either use your
+package manager to get cmake (for instance, apt-get install cmake),
+or follow the instructions at http://www.cmake.org/download/
+
+Once you have cmake, download the latest Tidy code from:
+https://github.com/htacg/tidy-html5/archive/master.zip
+Unzip and cd to build/cmake
+cmake ../..
+make install
+Now the latest Tidy library will be available to edbrowse.
+
 Finally, you need the Spider Monkey javascript engine from Mozilla.org
 ftp://ftp.mozilla.org/pub/mozilla.org/js/
 Edbrowse 3.5.1 and higher requires Mozilla js version 2.4.
diff -Naur 1/edbrowse-master/src/buffers.c 2/edbrowse-master/src/buffers.c
--- 1/edbrowse-master/src/buffers.c	2015-08-23 01:46:57.000000000 -0700
+++ 2/edbrowse-master/src/buffers.c	2015-08-24 16:02:18.351550150 -0700
@@ -583,6 +583,7 @@
 	nzFree(w->firstURL);
 	nzFree(w->referrer);
 	nzFree(w->baseDirName);
+       tidyRelease(w->tdoc);
 	free(w);
 }				/* freeWindow */
 
diff -Naur 1/edbrowse-master/src/eb.h 2/edbrowse-master/src/eb.h
--- 1/edbrowse-master/src/eb.h	2015-08-23 01:46:57.000000000 -0700
+++ 2/edbrowse-master/src/eb.h	2015-08-23 21:34:37.165011656 -0700
@@ -26,6 +26,7 @@
 #include <stdio.h>
 #include <errno.h>
 #include <fcntl.h>
+#include <tidy.h>
 #include <curl/curl.h>
 #ifdef DOSLIKE
 #include <io.h>
@@ -362,6 +363,7 @@
 	jsobjtype jcx;
 	jsobjtype winobj;
 	jsobjtype docobj;	/* window.document */
+	TidyDoc tdoc;           /* tidy5 html parser */
 	struct DBTABLE *table;	/* if in sqlMode */
 };
 extern struct ebWindow *cw;	/* current window */
diff -Naur 1/edbrowse-master/src/html.c 2/edbrowse-master/src/html.c
--- 1/edbrowse-master/src/html.c	2015-08-23 01:46:57.000000000 -0700
+++ 2/edbrowse-master/src/html.c	2015-08-24 16:17:20.031306748 -0700
@@ -1668,6 +1668,21 @@
 	int nopt;		/* number of options */
 	int intable = 0, inrow = 0;
 	bool tdfirst;
+       int TidyReturnValue; /* for Tidy methods that return
+success/failure */
+
+        // Tidy-related actions on incoming html
+
+        // At the moment, the goal is to get the parser into edbrowse
+        // and be able to call things without detrimental effect to
+        // any existing functionality
+
+        cw->tdoc = tidyCreate();
+printf("In case you wanted to know if this is the version with Tidy, it is\n");
+        // run tidyParseString here, or do something else
+        //TidyReturnValue = tidyParseString (cw->tdoc,html);
+
+        // The use of Tidy ends here ---
 
 	ns = initString(&ns_l);
 	preamble = initString(&preamble_l);
diff -Naur 1/edbrowse-master/src/makefile 2/edbrowse-master/src/makefile
--- 1/edbrowse-master/src/makefile	2015-08-23 21:32:17.459104575 -0700
+++ 2/edbrowse-master/src/makefile	2015-08-24 16:45:40.857553878 -0700
@@ -32,7 +32,7 @@
 # Override JSLIB on the command-line, if your distro uses a different name.
 # E.G., make JSLIB=-lmozjs
 JSLIB = -lmozjs-24
-LDLIBS = -lpcre -lcurl -lreadline -lncurses
+LDLIBS = -lpcre -lcurl -lreadline -lncurses -ltidy
 
 #  Make the dynamically linked executable program by default.
 all: edbrowse edbrowse-js
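
For anyone reviewing the patch above, the ownership pattern it sets up can be sketched in isolation. This is a minimal sketch, not edbrowse code: ParseDoc, docCreate, docRelease, sketchWindow, windowNew, windowRender, windowFree and live_docs are all hypothetical stand-ins, chosen so the example compiles without libtidy. In the patch the corresponding pieces are TidyDoc, tidyCreate(), tidyRelease() and the tdoc member of struct ebWindow. The sketch assumes the window struct is zero-filled when it is allocated, and it releases any earlier document before creating a new one, which matters only if the render step can run more than once for the same window.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for TidyDoc, so the sketch builds without libtidy. */
typedef struct parse_doc { int unused; } *ParseDoc;

/* Number of documents created but not yet released (leak check only). */
int live_docs = 0;

/* Stand-in for tidyCreate(): allocate a document handle. */
ParseDoc docCreate(void)
{
	ParseDoc d = calloc(1, sizeof(*d));
	if (d)
		++live_docs;
	return d;
}

/* Stand-in for tidyRelease(); this version accepts NULL and does nothing. */
void docRelease(ParseDoc d)
{
	if (!d)
		return;
	free(d);
	--live_docs;
}

/* Hypothetical cut-down window; the patch adds tdoc to struct ebWindow. */
struct sketchWindow {
	ParseDoc tdoc;		/* owned by the window */
};

/* Allocate a zero-filled window, so tdoc starts out NULL. */
struct sketchWindow *windowNew(void)
{
	return calloc(1, sizeof(struct sketchWindow));
}

/* Render step: drop any document from an earlier render, then make a new one. */
void windowRender(struct sketchWindow *w)
{
	assert(w != NULL);
	docRelease(w->tdoc);
	w->tdoc = docCreate();
}

/* Counterpart of freeWindow(): release the document, then the window. */
void windowFree(struct sketchWindow *w)
{
	if (!w)
		return;
	docRelease(w->tdoc);
	free(w);
}
```

Calling windowRender() twice and then windowFree() on the same window should leave live_docs at zero. A similar counter may be a cheap way to check the real code for leaked tidy documents.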


* Re: [Edbrowse-dev] tidy5 and versions
From: Chris Brannon @ 2015-08-26 18:06 UTC (permalink / raw)
  To: Edbrowse-dev

Karl Dahlke <eklhad@comcast.net> writes:

> Should we, and I kinda think we should, stamp another version, 3.5.4.2,
> before we jump into the tidy pool?

Yep, I haven't seen any objections, so I'll go ahead and push 3.5.4.2,
along with new static binaries.

-- Chris
