edbrowse-dev - development list for edbrowse
* [Edbrowse-dev]  html unicode translations in edbrowse
@ 2013-12-18 18:45 Karl Dahlke
  2013-12-19 12:20 ` Adam Thompson
  2013-12-21 18:00 ` [Edbrowse-dev] Javascript support MENGUAL Jean-Philippe
  0 siblings, 2 replies; 10+ messages in thread
From: Karl Dahlke @ 2013-12-18 18:45 UTC
  To: Edbrowse-dev

> I use speakup with espeak which seems to handle most things,

As I understand it, it works well with ISO 8859-1,
which covers many western languages,
but that would not include the high unicodes,
so yes that would leave you out in the cold regarding
alpha beta gamma and my other math symbols.
And I do appreciate this feedback; that's why I posted.

On the other side, edbrowse renders these according to my taste,
and in English, hard coded,
so some of my French edbrowse users may not be thrilled with the word alpha.
Who knows how that sounds on a French synthesizer.
So there's no clear right answer here;
maybe we'll just leave edbrowse be for a while until we have
a clear plan, or maybe a switch to turn these on or off.

Karl Dahlke


* Re: [Edbrowse-dev] html unicode translations in edbrowse
  2013-12-18 18:45 [Edbrowse-dev] html unicode translations in edbrowse Karl Dahlke
@ 2013-12-19 12:20 ` Adam Thompson
  2013-12-21 18:00 ` [Edbrowse-dev] Javascript support MENGUAL Jean-Philippe
  1 sibling, 0 replies; 10+ messages in thread
From: Adam Thompson @ 2013-12-19 12:20 UTC
  To: Karl Dahlke; +Cc: Edbrowse-dev

On Wed, Dec 18, 2013 at 01:45:11PM -0500, Karl Dahlke wrote:
> > I use speakup with espeak which seems to handle most things,
> 
> As I understand it, it works well with ISO 8859-1,
> which covers many western languages,
> but that would not include the high unicodes,
> so yes that would leave you out in the cold regarding
> alpha beta gamma and my other math symbols.

Yeah, testing by echoing utf8 in bash,
it totally fails to handle the alpha symbol.
It looks like it can't understand multi-byte codes and thus just interprets
each byte separately.
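
To make the byte-versus-character point concrete, here is a minimal C sketch. It is not taken from espeak, speakup or edbrowse, and the function name utf8_codepoints is made up for illustration. It counts characters by skipping UTF-8 continuation bytes, which is the step a byte-at-a-time reader would be missing. It assumes the input is valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Count code points in a UTF-8 string by skipping continuation bytes
 * (bytes of the form 10xxxxxx). Assumes the input is valid UTF-8.
 * utf8_codepoints is a hypothetical name used only in this sketch. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        /* Only lead bytes and plain ASCII bytes start a new character. */
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

Greek alpha (U+03B1) is stored as the two bytes 0xCE 0xB1, so this function reports one character where a byte-wise reader sees two unrelated ones.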

> And I do appreciate this feedback; that's why I posted.

Thanks.

> On the other side, edbrowse renders these according to my taste,
> and in English, hard coded,
> so some of my French edbrowse users may not be thrilled with the word alpha.
> Who knows how that sounds on a French synthesizer.

Not sure, but I don't imagine it's particularly useful.

> So there's no clear right answer here;
> maybe we'll just leave edbrowse be for a while until we have
> a clear plan, or maybe a switch to turn these on or off.

The switch is a good idea, or some sort of auto-substitution list,
kind of like you're doing with jupiter but in edbrowse?
This'd possibly generalise nicely if it can be added, as I'm forever having to
run substitutions on PDFs and text files to fix things like this.
I'm not sure how that'd fit in the current design though.
Either that or ship an example unutf8 function in the example.ebrc.

This, combined with the ability to have a function run when a document is
loaded (i.e. from a file or html, but not when creating a new buffer),
would handle the current case as well as many more substitutions.
However, this is turning into another feature request which probably needs more
thought.
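
As a rough illustration of the substitution-list-plus-switch idea, here is a C sketch. Nothing in it comes from edbrowse: the table contents, the struct, and the name apply_substitutions are all assumptions, and a real version would presumably load its list from the config file instead of hard coding it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One substitution: a UTF-8 byte sequence and its replacement text.
 * The table below is a hypothetical example, not edbrowse's list. */
struct subst {
    const char *from;
    const char *to;
};

static const struct subst subst_table[] = {
    { "\xCE\xB1", "alpha" },
    { "\xCE\xB2", "beta"  },
    { "\xCE\xB3", "gamma" },
};

/* Copy src into dst (cap bytes, including the terminating NUL).
 * When enabled is nonzero, sequences found in subst_table are replaced;
 * when it is zero, the text is copied through unchanged (the "switch").
 * Returns 0 on success, -1 if dst is too small. */
int apply_substitutions(const char *src, char *dst, size_t cap, int enabled)
{
    size_t used = 0;

    assert(src != NULL && dst != NULL);
    if (cap == 0)
        return -1;

    while (*src != '\0') {
        const char *piece = src;   /* default: copy one byte as-is */
        size_t piece_len = 1;
        size_t advance = 1;

        if (enabled) {
            size_t i;
            for (i = 0; i < sizeof subst_table / sizeof subst_table[0]; i++) {
                size_t flen = strlen(subst_table[i].from);
                if (strncmp(src, subst_table[i].from, flen) == 0) {
                    piece = subst_table[i].to;
                    piece_len = strlen(piece);
                    advance = flen;
                    break;
                }
            }
        }

        if (used + piece_len + 1 > cap)
            return -1;             /* keep room for the NUL */
        memcpy(dst + used, piece, piece_len);
        used += piece_len;
        src += advance;
    }
    dst[used] = '\0';
    return 0;
}
```

Called on a freshly loaded buffer with the switch on, this would turn the bytes for alpha and beta into the words, and with the switch off it would leave the text alone for adapters that handle utf8 themselves.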

Cheers,
Adam.


* [Edbrowse-dev] Javascript support
  2013-12-18 18:45 [Edbrowse-dev] html unicode translations in edbrowse Karl Dahlke
  2013-12-19 12:20 ` Adam Thompson
@ 2013-12-21 18:00 ` MENGUAL Jean-Philippe
  2013-12-22 15:42   ` Chris Brannon
  2014-01-08 12:50   ` Adam Thompson
  1 sibling, 2 replies; 10+ messages in thread
From: MENGUAL Jean-Philippe @ 2013-12-21 18:00 UTC
  To: Karl Dahlke, Edbrowse-dev

Hi.

Can you tell me if any plan exists to enable edbrowse to support
JavaScript from libmozjs version 26? So far, it seems the libmozjs
that edbrowse currently builds against has build failures and runtime
issues. It is a security problem. Do you have any guidelines on this matter?

Regards,


* Re: [Edbrowse-dev] Javascript support
  2013-12-21 18:00 ` [Edbrowse-dev] Javascript support MENGUAL Jean-Philippe
@ 2013-12-22 15:42   ` Chris Brannon
  2013-12-22 16:48     ` Adam Thompson
  2014-01-08 12:50   ` Adam Thompson
  1 sibling, 1 reply; 10+ messages in thread
From: Chris Brannon @ 2013-12-22 15:42 UTC
  To: edbrowse-dev

MENGUAL Jean-Philippe <mengualjeanphi@free.fr> writes:
>
> Can you tell me if any plan exists to enable edbrowse to support
> JavaScript from libmozjs version 26?

Hi.  I don't know anything about libmozjs version 26, but I'll look into
it.  It isn't even packaged for my distro though.  Any idea what is
needed to build with it?

-- Chris


* Re: [Edbrowse-dev] Javascript support
  2013-12-22 15:42   ` Chris Brannon
@ 2013-12-22 16:48     ` Adam Thompson
  2013-12-22 18:36       ` Chris Brannon
  0 siblings, 1 reply; 10+ messages in thread
From: Adam Thompson @ 2013-12-22 16:48 UTC
  To: Chris Brannon; +Cc: edbrowse-dev

Hi,
On Sun, Dec 22, 2013 at 07:42:27AM -0800, Chris Brannon wrote:
> MENGUAL Jean-Philippe <mengualjeanphi@free.fr> writes:
> >
> > Can you tell me if any plan exists to enable edbrowse to support
> > JavaScript from libmozjs version 26?
> 
> Hi.  I don't know anything about libmozjs version 26, but I'll look into
> it.  It isn't even packaged for my distro though.  Any idea what is
> needed to build with it?

According to the Mozilla website, the most recent version is 24 [1].
My initial investigations into building edbrowse against it turned up this
quote from the Mozilla website [2]:
"SpiderMonkey now provides a fully C++ interface,
so embedders relying on embeddability in C projects will have to convert to
C++, or implement their own adapter code."

This, along with my initial attempts to build the current edbrowse with g++,
indicates that this is probably not going to be a small change unless we
implement our own "adapter code" to interface with the engine, and even then I
suspect there'll be quite a lot of work involved.

It's a good point about the old version of the engine used by edbrowse though:
on Debian it's built (I believe) against version 1.85, which is seriously old now.

For what it's worth, my thoughts on this are:
- it's going to be a lot of work to switch to the new Mozilla engine
- we probably should switch to something newer than the current SpiderMonkey version,
  for security reasons as well as to support newer JavaScript language features
- I've yet to find a JavaScript engine which is both mature and provides a pure
  C API (or anything which'd compile with a C compiler)
- we shouldn't go back to a hand-made JavaScript engine

I don't have a massive amount of time (due to university work)
but I may be able to help with development as I've done quite a bit of C and C++.

Cheers,
Adam.
[1] https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey
[2] https://developer.mozilla.org/en-US/docs/SpiderMonkey/24


* Re: [Edbrowse-dev] Javascript support
  2013-12-22 16:48     ` Adam Thompson
@ 2013-12-22 18:36       ` Chris Brannon
  2013-12-22 19:10         ` Adam Thompson
  0 siblings, 1 reply; 10+ messages in thread
From: Chris Brannon @ 2013-12-22 18:36 UTC
  To: edbrowse-dev

Adam Thompson <arthompson1990@gmail.com> writes:

> "SpiderMonkey now provides a fully C++ interface,
> so embedders relying on embeddability in C projects will have to convert to
> C++, or implement their own adapter code."

Hi,
Thank you ever so much for doing the research.

Years ago, we batted around the idea of rewriting edbrowse in C++.
Those string objects would sure be nice, and after all, edbrowse is very
string-intensive.  We also use lists in some places, and we could use
STL's list container instead.  But it never happened.  C++ is a
complicated beast.  The people who work on edbrowse are more comfortable
with C.  I used to be decent with C++, but I've managed to forget most
of what I knew.  So I'm not chomping at the bit to do this.  I wonder
how difficult the adapter code will be?  I assume it'll just be tedious.

Happy Holidays,
-- Chris


* Re: [Edbrowse-dev] Javascript support
  2013-12-22 18:36       ` Chris Brannon
@ 2013-12-22 19:10         ` Adam Thompson
  0 siblings, 0 replies; 10+ messages in thread
From: Adam Thompson @ 2013-12-22 19:10 UTC
  To: Chris Brannon; +Cc: edbrowse-dev

On Sun, Dec 22, 2013 at 10:36:56AM -0800, Chris Brannon wrote:
> Adam Thompson <arthompson1990@gmail.com> writes:
> 
> > "SpiderMonkey now provides a fully C++ interface,
> > so embedders relying on embeddability in C projects will have to convert to
> > C++, or implement their own adapter code."
> 
> Thank you ever so much for doing the research.

That's ok, I've been thinking of looking at this for a while,
though it's not until now that I've had the time.

> Years ago, we batted around the idea of rewriting edbrowse in C++.

Yeah, I noticed it in the todo.

> Those string objects would sure be nice, and after all, edbrowse is very
> string-intensive.  We also use lists in some places, and we could use
> STL's list container instead.  But it never happened.  C++ is a
> complicated beast.  The people who work on edbrowse are more comfortable
> with C.  I used to be decent with C++, but I've managed to forget most
> of what I knew.  So I'm not chomping at the bit to do this.

I also prefer working in C, but have been forced to become fairly decent in C++ in
order to use certain C++-only libraries.
Honestly, although the string objects would be nice in terms of moving code out
of edbrowse, and the same for STL lists,
personally I've never really minded handling this kind of thing in C. Also, as you
said, C++ brings its own set of complications.

> I wonder how difficult the adapter code will be?  I assume it'll just be tedious.

I've got an idea how to do it whilst hopefully minimising the changes.
I think the main files to alter are jsdom.c and jsloc.c.
I'd then provide a jsdom.h and jsloc.h (or something)
which are included in the rest of the program.
This, with a few more changes, should *hopefully* allow the C++ code to be
isolated to these 2 files.
In the makefile, these would then be built with the C++ compiler,
with the rest of the project still built using the C compiler.
This assumes of course that there aren't additional functions in other files
which require bits of the JavaScript library.
If there are, they'd have to be added to the adapter API.

This approach'd also have the nice side-effect that,
if in the future we decide to abandon SpiderMonkey altogether,
we'd only have to change the implementation of the API exposed to edbrowse.
At least that's the theory.
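
To show the shape such an adapter might take, here is a small sketch in C. Every name in it (struct js_engine, js_engine_create and so on) is hypothetical and not part of edbrowse or SpiderMonkey. The top half stands for what a jsdom.h-style header could expose to the C side; the bottom half is a stand-in implementation written in plain C so the sketch compiles on its own, where the real one would be the C++ file that talks to the engine.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- what a jsdom.h-style header might expose (hypothetical names) ---- */
#ifdef __cplusplus
extern "C" {            /* lets a C++ implementation export C-callable symbols */
#endif

/* Opaque handle: the rest of the program never sees engine types. */
struct js_engine;

struct js_engine *js_engine_create(void);
/* Run a script; returns 0 on success, nonzero on failure. */
int js_engine_eval(struct js_engine *e, const char *script);
/* Number of scripts accepted so far (only here to make the stub observable). */
unsigned js_engine_eval_count(const struct js_engine *e);
void js_engine_destroy(struct js_engine *e);

#ifdef __cplusplus
}
#endif

/* ---- stand-in implementation; a real one would live in the C++ file ---- */
struct js_engine {
    unsigned evals;
};

struct js_engine *js_engine_create(void)
{
    return calloc(1, sizeof(struct js_engine));
}

int js_engine_eval(struct js_engine *e, const char *script)
{
    assert(e != NULL);
    if (script == NULL || script[0] == '\0')
        return -1;      /* nothing to run */
    e->evals++;         /* a real adapter would hand the script to the engine here */
    return 0;
}

unsigned js_engine_eval_count(const struct js_engine *e)
{
    assert(e != NULL);
    return e->evals;
}

void js_engine_destroy(struct js_engine *e)
{
    free(e);
}
```

Because callers only hold a pointer to an incomplete struct, swapping the engine later would mean rewriting the implementation half and leaving the declarations, and everything that includes them, untouched.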

Does this sound sensible?  Anything I've missed?

Cheers,
Adam.


* Re: [Edbrowse-dev] Javascript support
  2013-12-21 18:00 ` [Edbrowse-dev] Javascript support MENGUAL Jean-Philippe
  2013-12-22 15:42   ` Chris Brannon
@ 2014-01-08 12:50   ` Adam Thompson
  1 sibling, 0 replies; 10+ messages in thread
From: Adam Thompson @ 2014-01-08 12:50 UTC
  To: MENGUAL Jean-Philippe; +Cc: Edbrowse-dev

Hi,

On Sat, Dec 21, 2013 at 07:00:58PM +0100, MENGUAL Jean-Philippe wrote:
> Can you tell me if any plan exists to enable edbrowse to support
> JavaScript from libmozjs version 26? So far, it seems the libmozjs
> that edbrowse currently builds against has build failures and runtime
> issues. It is a security problem. Do you have any guidelines on this
> matter?

I've been working on getting edbrowse built against a new libmozjs,
but couldn't find the 26 sources. I have, however,
got a version which builds against mozjs 24 (the latest version from the website).

Attempting to build against the experimental libmozjs26d package in Debian
ran into the problem that the jsapi.h under /usr/include/mozjs is a broken
symlink to something in /tmp (I don't know the exact path off the top of my
head, but can re-install the relevant dev package if this isn't already a reported bug).
This problem is present in both the i386 and amd64 packages.
In addition, although the altered version of edbrowse builds against a copy
of mozjs 24 that I compiled myself (using the --enable-optimize and --disable-debug
configure options), building against the Debian-supplied libmozjs24d package
causes the program to segfault. Could you please tell me which configure
options the Debian package uses to compile mozjs 24, so I can check whether
this is an edbrowse issue or something in the Debian package?

Cheers,
Adam.


* Re: [Edbrowse-dev] html unicode translations in edbrowse
  2013-12-18 15:59 [Edbrowse-dev] html unicode translations in edbrowse Karl Dahlke
@ 2013-12-18 17:06 ` Adam Thompson
  0 siblings, 0 replies; 10+ messages in thread
From: Adam Thompson @ 2013-12-18 17:06 UTC
  To: Karl Dahlke; +Cc: acsint, Edbrowse-dev

On Wed, Dec 18, 2013 at 10:59:31AM -0500, Karl Dahlke wrote:
> My jupiter adapter will pronounce unicodes in utf8 in the tty buffer
> according to pronunciations that you can set in the config file.
> Here is an example, the start of Greek.
> 
> u945	alpha
> u946	beta
> u947	gamma
> 
> So when this code appears as 2 bytes in utf8 it is read alpha,
> no matter how it got there.

That sounds like a good idea.

> How did I use to do it?
> The html browser would turn the html code
> &#945; into the word alpha when rendering html.
> See format.c line 1330.
> That works fine as long as I am browsing files from the web,
> or html files that I wrote myself,
> but if alpha beta gamma are in a document, or come from a pdf or some other
> source, well, I am just out of luck.
> You can see at a glance that such things are better handled in the adapter.
> It's a more general and flexible approach.

Again agreed.

> 
> Once the latest version of Jupiter is pushed,
> I may request of Chris that most or all
> of those hard-coded translations in format.c go away,
> and instead you just crank out the unicode that is implied by the html tag.
> It's up to the adapter then to read it properly.

This makes sense as long as the user's adapter does handle utf8.
I use speakup with espeak which seems to handle most things,
but probably not everything, and I've got no idea what those characters would
do to my braille display.

I'm not against the idea, but it may be worth remembering that edbrowse has a
wider user community than those using jupiter,
particularly as there's a Debian package for edbrowse and not for jupiter (at
least not in the main repos).

Also, are you planning to ship an example list of these characters or do users
have to go through the utf8 charset to work out what's what?

Cheers,
Adam.


* [Edbrowse-dev] html unicode translations in edbrowse
@ 2013-12-18 15:59 Karl Dahlke
  2013-12-18 17:06 ` Adam Thompson
  0 siblings, 1 reply; 10+ messages in thread
From: Karl Dahlke @ 2013-12-18 15:59 UTC
  To: Edbrowse-dev, acsint

This is a heads up of where we are headed, quite soon I hope.

My jupiter adapter will pronounce unicodes in utf8 in the tty buffer
according to pronunciations that you can set in the config file.
Here is an example, the start of Greek.

u945	alpha
u946	beta
u947	gamma

So when this code appears as 2 bytes in utf8 it is read alpha,
no matter how it got there.
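
For readers who want to see the mechanics, here is a minimal C sketch of the lookup described above. It is not the jupiter source: the names utf8_decode2 and pronounce, and the hard-coded table standing in for the config file, are assumptions made for illustration. It handles only two-byte sequences, which is enough for the Greek letters in the example.

```c
#include <assert.h>
#include <stddef.h>

/* Pronunciation entries keyed by decimal code point, mirroring the
 * "u945 alpha" config lines. A real adapter would read these from
 * its config file; this table is a stand-in. */
struct pron {
    unsigned int cp;
    const char *word;
};

static const struct pron pron_table[] = {
    { 945, "alpha" },
    { 946, "beta"  },
    { 947, "gamma" },
};

/* Decode a two-byte UTF-8 sequence starting at s.
 * Returns the code point, or 0 if s does not start with a valid
 * two-byte sequence (including overlong encodings). */
unsigned int utf8_decode2(const unsigned char *s)
{
    assert(s != NULL);
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        unsigned int cp = ((unsigned int)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return cp >= 0x80 ? cp : 0;
    }
    return 0;
}

/* Look up the word to speak for a code point; NULL if none is configured. */
const char *pronounce(unsigned int cp)
{
    size_t i;

    for (i = 0; i < sizeof pron_table / sizeof pron_table[0]; i++) {
        if (pron_table[i].cp == cp)
            return pron_table[i].word;
    }
    return NULL;
}
```

The bytes 0xCE 0xB1 decode to 945, which the table maps to the word alpha, regardless of whether those bytes came from a web page, a pdf or a text file.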

How did I use to do it?
The html browser would turn the html code
&#945; into the word alpha when rendering html.
See format.c line 1330.
That works fine as long as I am browsing files from the web,
or html files that I wrote myself,
but if alpha beta gamma are in a document, or come from a pdf or some other
source, well, I am just out of luck.
You can see at a glance that such things are better handled in the adapter.
It's a more general and flexible approach.

Once the latest version of Jupiter is pushed,
I may request of Chris that most or all
of those hard-coded translations in format.c go away,
and instead you just crank out the unicode that is implied by the html tag.
It's up to the adapter then to read it properly.
It's mostly deleting code that I'm happy to get rid of,
so should be no trouble.
The real test will be reading my math pages,
which are full of Greek letters etc.
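
As a sketch of what "crank out the unicode" could look like, the C below turns a decimal reference such as &#945; into its UTF-8 bytes. It is an illustration only and not the code in format.c; the names utf8_encode and entity_to_utf8 are invented here, and it handles decimal references only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Encode a code point as UTF-8 into out (room for 4 bytes).
 * Returns the number of bytes written, or 0 for an invalid code point. */
size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                    /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Convert a decimal character reference like "&#945;" to UTF-8.
 * Returns the number of bytes written to out, or 0 if ent is not a
 * well-formed decimal reference. */
size_t entity_to_utf8(const char *ent, unsigned char *out)
{
    char *end;
    unsigned long cp;

    assert(ent != NULL && out != NULL);
    if (ent[0] != '&' || ent[1] != '#')
        return 0;
    if (ent[2] < '0' || ent[2] > '9')
        return 0;                        /* require at least one digit */
    cp = strtoul(ent + 2, &end, 10);
    if (*end != ';')
        return 0;
    return utf8_encode(cp, out);
}
```

With this in place the browser side emits the two bytes for alpha and leaves the choice of spoken word, and of language, to whatever adapter reads the buffer.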

Thanks.

Karl Dahlke
