edbrowse-dev - development list for edbrowse
* [Edbrowse-dev] html unicode translations in edbrowse
@ 2013-12-18 15:59 Karl Dahlke
  2013-12-18 17:06 ` Adam Thompson
From: Karl Dahlke @ 2013-12-18 15:59 UTC
  To: Edbrowse-dev, acsint

This is a heads up of where we are headed, quite soon I hope.

My jupiter adapter will pronounce unicodes in utf8 in the tty buffer
according to pronunciations that you can set in the config file.
Here is an example, the start of Greek.

u945	alpha
u946	beta
u947	gamma

So when this code appears as 2 bytes in utf8 it is read alpha,
no matter how it got there.
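
A minimal sketch of the lookup described above. It assumes config lines have the form `u` plus a decimal code point, whitespace, then the spoken word, as in the Greek example. The names `load_pronunciations` and `speak_text` are hypothetical; this is not Jupiter's actual code.

```python
def load_pronunciations(config_text):
    """Parse lines such as 'u945<TAB>alpha' into {945: 'alpha'}."""
    table = {}
    for line in config_text.splitlines():
        fields = line.split(None, 1)  # code point, then the spoken word(s)
        if (len(fields) == 2 and fields[0].startswith("u")
                and fields[0][1:].isdigit()):
            table[int(fields[0][1:])] = fields[1].strip()
    return table


def speak_text(raw_bytes, table):
    """Decode UTF-8 bytes and replace configured code points with words."""
    text = raw_bytes.decode("utf-8", errors="replace")
    pieces = []
    for ch in text:
        word = table.get(ord(ch))
        # Pad the word with spaces so it does not run into adjacent text.
        pieces.append(f" {word} " if word else ch)
    # Collapse the extra whitespace that the padding introduces.
    return " ".join("".join(pieces).split())


config = "u945\talpha\nu946\tbeta\nu947\tgamma\n"
table = load_pronunciations(config)
buf = "x = \u03b1 + \u03b2".encode("utf-8")  # alpha arrives as the 2 bytes CE B1
print(speak_text(buf, table))  # x = alpha + beta
```

Collapsing whitespace is a simplification for speech output. An adapter working on a live tty buffer would likely translate incrementally instead.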

How did I use to do it?
The html browser would turn the html code
&alpha; into the word alpha when rendering html.
See format.c line 1330
That works fine as long as I am browsing files from the web,
or html files that I wrote myself,
but if alpha beta gamma are in a document or from pdf or some other
source, well, I am just out of luck.
You can see at a glance that such things are better handled in the adapter.
It's a more general and flexible approach.

Once the latest version of Jupiter is pushed,
I may request of Chris that most or all
of those hard-coded translations in format.c go away,
and instead you just crank out the unicode that is implied by the html tag.
It's up to the adapter then to read it properly.
It's mostly deleting code that I'm happy to get rid of,
so should be no trouble.
The real test will be reading my math pages,
which are full of Greek letters, etc.
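
The proposed change can be sketched as follows. The sketch uses Python's standard `html.unescape` in place of edbrowse's own entity handling in format.c, which it does not attempt to reproduce. `render_old`, `render_new`, and the `OLD_WORDS` table are hypothetical names for illustration.

```python
import html

# Hypothetical stand-in for the old approach: hard-coded English words.
OLD_WORDS = {"&alpha;": "alpha", "&beta;": "beta", "&gamma;": "gamma"}


def render_old(fragment):
    """Replace each known entity with a fixed English word."""
    for entity, word in OLD_WORDS.items():
        fragment = fragment.replace(entity, word)
    return fragment


def render_new(fragment):
    """Emit the character each entity stands for; the adapter speaks it."""
    return html.unescape(fragment)


src = "&alpha; + &beta;"
print(render_old(src))                  # alpha + beta
print(render_new(src))                  # α + β
print(render_new(src).encode("utf-8"))  # b'\xce\xb1 + \xce\xb2'
```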

Thanks.

Karl Dahlke

* Re: [Edbrowse-dev] html unicode translations in edbrowse
  2013-12-18 15:59 [Edbrowse-dev] html unicode translations in edbrowse Karl Dahlke
@ 2013-12-18 17:06 ` Adam Thompson
From: Adam Thompson @ 2013-12-18 17:06 UTC
  To: Karl Dahlke; +Cc: acsint, Edbrowse-dev

On Wed, Dec 18, 2013 at 10:59:31AM -0500, Karl Dahlke wrote:
> My jupiter adapter will pronounce unicodes in utf8 in the tty buffer
> according to pronunciations that you can set in the config file.
> Here is an example, the start of Greek.
> 
> u945	alpha
> u946	beta
> u947	gamma
> 
> So when this code appears as 2 bytes in utf8 it is read alpha,
> no matter how it got there.

That sounds like a good idea.

> How did I use to do it?
> The html browser would turn the html code
> &alpha; into the word alpha when rendering html.
> See format.c line 1330
> That works fine as long as I am browsing files from the web,
> or html files that I wrote myself,
> but if alpha beta gamma are in a document or from pdf or some other
> source, well, I am just out of luck.
> You can see at a glance that such things are better handled in the adapter.
> It's a more general and flexible approach.

Again agreed.

> 
> Once the latest version of Jupiter is pushed,
> I may request of Chris that most or all
> of those hard-coded translations in format.c go away,
> and instead you just crank out the unicode that is implied by the html tag.
> It's up to the adapter then to read it properly.

This makes sense as long as the user's adapter does handle utf8.
I use speakup with espeak which seems to handle most things,
but probably not everything, and I've got no idea what those characters would
do to my braille display.

I'm not against the idea, but it may be worth remembering that edbrowse has a
wider user community than those using jupiter,
particularly as there's a Debian package for edbrowse and not for jupiter (at
least not in the main repos).

Also, are you planning to ship an example list of these characters or do users
have to go through the utf8 charset to work out what's what?
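
One possible answer, sketched under assumptions: a starter list in the `uNNN<TAB>word` format could be generated from Unicode character names instead of being worked out by hand. Python's standard `unicodedata.name` returns names such as GREEK SMALL LETTER ALPHA; taking the last word of the name is a crude heuristic of this sketch, and the result would still need hand editing (many math symbols do not end in a speakable word). `starter_list` is a hypothetical helper.

```python
import unicodedata


def starter_list(first, last):
    """Emit 'uNNN<TAB>word' lines for code points first..last inclusive."""
    lines = []
    for cp in range(first, last + 1):
        name = unicodedata.name(chr(cp), "")  # "" if the code point is unnamed
        if name:
            # Heuristic: use the last word of the name, lowercased.
            lines.append(f"u{cp}\t{name.split()[-1].lower()}")
    return "\n".join(lines)


print(starter_list(945, 947))
```

For the lowercase Greek letters, alpha through omega, the range would be 945 to 969.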

Cheers,
Adam.

* Re: [Edbrowse-dev] html unicode translations in edbrowse
  2013-12-18 18:45 Karl Dahlke
@ 2013-12-19 12:20 ` Adam Thompson
From: Adam Thompson @ 2013-12-19 12:20 UTC
  To: Karl Dahlke; +Cc: Edbrowse-dev

On Wed, Dec 18, 2013 at 01:45:11PM -0500, Karl Dahlke wrote:
> > I use speakup with espeak which seems to handle most things,
> 
> As I understand it, it works well with 8859-1,
> which covers many western languages,
> but that would not include the high unicodes,
> so yes that would leave you out in the cold regarding
> alpha beta gamma and my other math symbols.

Yeah, testing by echoing utf8 in bash,
it totally fails to handle the alpha symbol.  It looks like it can't understand multi-byte codes and thus just interprets each byte.
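
A short worked example of the byte-by-byte reading described here. UTF-8 encodes alpha as two bytes, and a reader that assumes ISO 8859-1 sees two unrelated characters. That speakup/espeak treats input as 8859-1 is Karl's understanding, quoted above, not something this sketch confirms.

```python
alpha = "\u03b1"                 # GREEK SMALL LETTER ALPHA, code point 945
raw = alpha.encode("utf-8")      # two bytes: 0xCE 0xB1
misread = raw.decode("latin-1")  # an 8859-1 reader takes each byte alone

print(list(raw))  # [206, 177]
print(misread)    # Î± (two unrelated characters)

try:
    alpha.encode("latin-1")
except UnicodeEncodeError:
    print("alpha has no code in 8859-1")
```
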
> And I do appreciate this feedback; that's why I posted.
Thanks.

> On the other side, edbrowse renders these according to my taste,
> and in English, hard-coded,
> so some of my French edbrowse users may not be thrilled with the word alpha.
> Who knows how that sounds on a French synthesizer.

Not sure, but I don't imagine it's particularly useful.

> So there's no clear right answer here;
> maybe we'll just leave edbrowse be for a while until we have
> a clear plan, or maybe a switch to turn these on or off.

The switch is a good idea, or some sort of auto-substitution list, kind of like you're doing with jupiter but in edbrowse?
This'd possibly generalise nicely if it can be added, as I'm forever having to
run substitutions on pdfs and text files to fix things like this.
I'm not sure how that'd fit in the current design though.
Either that or ship an example unutf8 function in the example.ebrc.

This, combined with the ability to have a function run when a document is loaded (i.e. from a file or html, but not when creating a new buffer) would handle the current case as well as many more substitutions.
However, this is turning into another feature request, which probably needs more thought.
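
A rough sketch of the edbrowse-side idea discussed here: an on/off switch plus a user-supplied substitution list, applied once when a document is loaded from a file or from html but not to a new empty buffer. Nothing here is an existing edbrowse feature; `Substitutions`, `enabled`, and `on_document_load` are invented names, and table keys are assumed to be single characters (a requirement of `str.maketrans` with a dict).

```python
from dataclasses import dataclass, field


@dataclass
class Substitutions:
    """Hypothetical edbrowse-side switch plus substitution list."""
    enabled: bool = True                       # the proposed on/off switch
    table: dict = field(default_factory=dict)  # single character -> text

    def on_document_load(self, lines, source):
        """Rewrite a newly loaded buffer.

        Only buffers read from a file or rendered from html are changed;
        a new empty buffer created by the user is left alone.
        """
        if not self.enabled or source not in ("file", "html"):
            return lines
        trans = str.maketrans(self.table)
        return [line.translate(trans) for line in lines]


english = Substitutions(table={"\u03b1": "alpha", "\u03b2": "beta"})
print(english.on_document_load(["\u03b1 and \u03b2"], "html"))  # ['alpha and beta']

off = Substitutions(enabled=False, table=english.table)
print(off.on_document_load(["\u03b1"], "file"))  # ['α']
```

Because the table is user-supplied, a French user could load French words instead of the hard-coded English ones.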

Cheers,
Adam.

* [Edbrowse-dev]  html unicode translations in edbrowse
@ 2013-12-18 18:45 Karl Dahlke
  2013-12-19 12:20 ` Adam Thompson
From: Karl Dahlke @ 2013-12-18 18:45 UTC
  To: Edbrowse-dev

> I use speakup with espeak which seems to handle most things,

As I understand it, it works well with 8859-1,
which covers many western languages,
but that would not include the high unicodes,
so yes that would leave you out in the cold regarding
alpha beta gamma and my other math symbols.
And I do appreciate this feedback; that's why I posted.

On the other side, edbrowse renders these according to my taste,
and in English, hard-coded,
so some of my French edbrowse users may not be thrilled with the word alpha.
Who knows how that sounds on a French synthesizer.
So there's no clear right answer here;
maybe we'll just leave edbrowse be for a while until we have
a clear plan, or maybe a switch to turn these on or off.

Karl Dahlke
