* Re: [Edbrowse-dev] tidy5 and versions
2015-08-24 11:54 [Edbrowse-dev] tidy5 and versions Karl Dahlke
@ 2015-08-25 0:24 ` Kevin Carhart
2015-08-26 18:06 ` Chris Brannon
From: Kevin Carhart @ 2015-08-25 0:24 UTC
To: Edbrowse-dev
[-- Attachment #1: Type: TEXT/PLAIN, Size: 1418 bytes --]
And a note to Adam, we've hashed this patch out offlist, but if you have
any critiques on this, please fire away. It's just a few lines and
straightforward, but as a patch-submission newbie I can use the
multifaceted scrutiny, and would like to know what you think.
Thanks
Kevin
On Mon, 24 Aug 2015, Karl Dahlke wrote:
> We stand on the edge of pushing a change that will require tidy5.
> It's cautious, doesn't do anything except run the html through tidy,
> in parallel with everything else we are doing,
> then free the tidy tree when the window is freed.
> Just to get us started, to make sure tidy doesn't seg fault etc.
> But it will change the way edbrowse is built.
> We now need another library etc.
> Should we, and I kinda think we should, stamp another version, 3.5.4.2,
> before we jump into the tidy pool?
> Some work has been done since 3.5.4.1, some bug fixes, some cosmetics,
> and the framework for imap, including a simple move delete interface.
> Chris before you push Kevin's tidy patch, maybe stamp 3.5.4.2.
>
> After we are using tidy to parse html,
> and I hope this isn't a long time coming, we may want to jump up to 3.6.
>
> Karl Dahlke
>
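The lifetime Karl describes (build the tree when html is parsed, free it when the window is freed) can be sketched roughly as below. This is a minimal illustration, not edbrowse code: `ParserDoc`, `parser_create` and `parser_release` are hypothetical stand-ins for `TidyDoc`, `tidyCreate()` and `tidyRelease()`, used only so the sketch builds without libtidy, and `struct window` is a cut-down stand-in for `struct ebWindow`. It also assumes the parse routine may run more than once per window, in which case the earlier tree is released before a new one is created.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for TidyDoc, tidyCreate() and tidyRelease(),
 * so the sketch compiles without libtidy. They only count live documents. */
typedef struct fake_doc { int unused; } *ParserDoc;
static int live_docs = 0;

static ParserDoc parser_create(void)
{
	ParserDoc d = malloc(sizeof(*d));
	if (d)
		++live_docs;
	return d;
}

static void parser_release(ParserDoc d)
{
	if (!d)
		return;		/* a window that never held html */
	free(d);
	--live_docs;
}

/* Cut-down stand-in for struct ebWindow; only the parser handle is kept. */
struct window {
	ParserDoc tdoc;
};

/* Called each time html is rendered into the window. */
static void window_parse(struct window *w)
{
	parser_release(w->tdoc);	/* drop the tree from an earlier parse */
	w->tdoc = parser_create();
}

/* Mirrors the freeWindow() hunk: release the tree, then the window. */
static void window_free(struct window *w)
{
	parser_release(w->tdoc);
	free(w);
}

/* Parse twice, free once; returns the number of documents still live. */
static int demo_live_docs_after_reparse(void)
{
	struct window *w = calloc(1, sizeof(*w));	/* tdoc starts NULL */
	assert(w != NULL);
	window_parse(w);
	window_parse(w);
	window_free(w);
	return live_docs;
}
```

Under these assumptions `demo_live_docs_after_reparse()` returns 0: the second parse releases the first tree, and freeing the window releases the second. Starting the handle at NULL is what makes it safe to free a window that never parsed any html.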
--------
Kevin Carhart * 415 225 5306 * The Ten Ninety Nihilists
[-- Attachment #2: Type: TEXT/PLAIN, Size: 3638 bytes --]
diff -Naur 1/edbrowse-master/README 2/edbrowse-master/README
--- 1/edbrowse-master/README 2015-08-23 01:46:57.000000000 -0700
+++ 2/edbrowse-master/README 2015-08-23 21:46:42.783741131 -0700
@@ -73,6 +73,19 @@
If you have to compile curl from source, be sure to specify
--ENABLE-VERSION-SYMBOLS (or some such) at the configure script.
+Edbrowse now uses the Tidy HTML parser, so both cmake and the
+Tidy library must be installed before edbrowse can be built.
+The Tidy compilation process uses cmake. Please either use your
+package manager to get cmake (for instance, apt-get install cmake),
+or follow the instructions at http://www.cmake.org/download/
+
+Once you have cmake, download the latest Tidy code from:
+https://github.com/htacg/tidy-html5/archive/master.zip
+Unzip and cd to build/cmake
+cmake ../..
+make install
+Now the latest Tidy library will be available to edbrowse.
+
Finally, you need the Spider Monkey javascript engine from Mozilla.org
ftp://ftp.mozilla.org/pub/mozilla.org/js/
Edbrowse 3.5.1 and higher requires Mozilla js version 2.4.
diff -Naur 1/edbrowse-master/src/buffers.c 2/edbrowse-master/src/buffers.c
--- 1/edbrowse-master/src/buffers.c 2015-08-23 01:46:57.000000000 -0700
+++ 2/edbrowse-master/src/buffers.c 2015-08-24 16:02:18.351550150 -0700
@@ -583,6 +583,7 @@
nzFree(w->firstURL);
nzFree(w->referrer);
nzFree(w->baseDirName);
+ tidyRelease(w->tdoc);
free(w);
} /* freeWindow */
diff -Naur 1/edbrowse-master/src/eb.h 2/edbrowse-master/src/eb.h
--- 1/edbrowse-master/src/eb.h 2015-08-23 01:46:57.000000000 -0700
+++ 2/edbrowse-master/src/eb.h 2015-08-23 21:34:37.165011656 -0700
@@ -26,6 +26,7 @@
#include <stdio.h>
#include <errno.h>
#include <fcntl.h>
+#include <tidy.h>
#include <curl/curl.h>
#ifdef DOSLIKE
#include <io.h>
@@ -362,6 +363,7 @@
jsobjtype jcx;
jsobjtype winobj;
jsobjtype docobj; /* window.document */
+ TidyDoc tdoc; /* tidy5 html parser */
struct DBTABLE *table; /* if in sqlMode */
};
extern struct ebWindow *cw; /* current window */
diff -Naur 1/edbrowse-master/src/html.c 2/edbrowse-master/src/html.c
--- 1/edbrowse-master/src/html.c 2015-08-23 01:46:57.000000000 -0700
+++ 2/edbrowse-master/src/html.c 2015-08-24 16:17:20.031306748 -0700
@@ -1668,6 +1668,21 @@
int nopt; /* number of options */
int intable = 0, inrow = 0;
bool tdfirst;
+ int TidyReturnValue; /* for Tidy methods that return
+ * success/failure */
+
+ // Tidy-related actions on incoming html
+
+ // At the moment, the goal is to get the parser into edbrowse
+ // and be able to call things without detrimental effect to
+ // any existing functionality
+
+ cw->tdoc = tidyCreate();
+ printf("In case you wanted to know if this is the version with Tidy, it is\n");
+ // run tidyParseString here, or do something else
+ //TidyReturnValue = tidyParseString(cw->tdoc, html);
+
+ // The use of Tidy ends here ---
ns = initString(&ns_l);
preamble = initString(&preamble_l);
diff -Naur 1/edbrowse-master/src/makefile 2/edbrowse-master/src/makefile
--- 1/edbrowse-master/src/makefile 2015-08-23 21:32:17.459104575 -0700
+++ 2/edbrowse-master/src/makefile 2015-08-24 16:45:40.857553878 -0700
@@ -32,7 +32,7 @@
# Override JSLIB on the command-line, if your distro uses a different name.
# E.G., make JSLIB=-lmozjs
JSLIB = -lmozjs-24
-LDLIBS = -lpcre -lcurl -lreadline -lncurses
+LDLIBS = -lpcre -lcurl -lreadline -lncurses -ltidy
# Make the dynamically linked executable program by default.
all: edbrowse edbrowse-js