edbrowse-dev - development list for edbrowse
* [Edbrowse-dev] andTranslate
@ 2014-02-26 13:06 Karl Dahlke
  2014-02-26 14:04 ` Chris Brannon
  0 siblings, 1 reply; 4+ messages in thread
From: Karl Dahlke @ 2014-02-26 13:06 UTC (permalink / raw)
  To: Edbrowse-dev

There is a function in format.c called andTranslate().
It takes meta-characters like &whatever; in html and turns it into
the symbol whatever.
A common example is &lt; for the less than sign,
because a bare less than sign is the beginning of an html tag.
Every literal less than sign has to be encoded in this way.
Thus &lt; becomes <
I turn it into the character <, not the words less than or some such thing,
because every screen reader and every adapter will read the less than sign,
as you want it read, in your language.
I don't want to mess with that.
But the higher unicodes I sometimes turn into words, English words,
unfortunately hard coded in format.c,
because screen readers may not know what to do with those unicodes.
On the other hand, more and more readers are configurable,
to render these high unicodes as you wish,
and I take that power away from the user by translating them into my own
words in format.c.

I propose that andTranslate turn every &whatever; symbol into its utf8
equivalent, and that's all.
Beyond this however, you could have in your .ebrc config file lines like

&#947 gamma

This would override the simple utf8 translation.
It would let you put in your own words if your screen reader or system
simply doesn't handle those unicodes well.
Or if you are dumping formatted html to text and would rather have it in words.
What do you think?
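
A minimal sketch of the proposal, not code taken from format.c. It assumes the overrides have already been parsed out of .ebrc into a small table; struct override, encode_utf8() and translate_codepoint() are hypothetical names used only for illustration. A code point with an override yields the user's words, and anything else yields its plain utf8 encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical override entry, as might be parsed from a config
 * line such as "&#947 gamma". Not an existing edbrowse structure. */
struct override {
	unsigned int codepoint;
	const char *words;
};

/* Encode one code point as UTF-8 into out (room for 5 bytes).
 * Returns the number of bytes written, 0 if the code point is invalid. */
static size_t encode_utf8(unsigned int cp, char *out)
{
	size_t n = 0;
	if (cp < 0x80) {
		out[n++] = (char)cp;
	} else if (cp < 0x800) {
		out[n++] = (char)(0xC0 | (cp >> 6));
		out[n++] = (char)(0x80 | (cp & 0x3F));
	} else if (cp < 0x10000) {
		if (cp < 0xD800 || cp > 0xDFFF) {	/* skip surrogates */
			out[n++] = (char)(0xE0 | (cp >> 12));
			out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
			out[n++] = (char)(0x80 | (cp & 0x3F));
		}
	} else if (cp <= 0x10FFFF) {
		out[n++] = (char)(0xF0 | (cp >> 18));
		out[n++] = (char)(0x80 | ((cp >> 12) & 0x3F));
		out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
		out[n++] = (char)(0x80 | (cp & 0x3F));
	}
	out[n] = '\0';
	return n;
}

/* Return the user's words if the table has an entry for cp,
 * otherwise the UTF-8 encoding written into buf (room for 5 bytes). */
static const char *translate_codepoint(unsigned int cp,
	const struct override *table, size_t count, char *buf)
{
	for (size_t i = 0; i < count; ++i)
		if (table[i].codepoint == cp)
			return table[i].words;
	encode_utf8(cp, buf);
	return buf;
}
```

With a one-entry table for 947, translate_codepoint(947, ...) returns "gamma"; with an empty table it returns the two utf8 bytes 0xCE 0xB3.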

Of course this qualifies as a new feature, and I need not jump into it now.
We should probably continue with bug fixes and the debian confusion.
I am very disappointed that they aren't helping us out here.
We're doing 95% of the work, and they can't come forward
with some information on how they build their libraries etc??
Well that's another story I guess.

Karl Dahlke


* Re: [Edbrowse-dev] andTranslate
  2014-02-26 13:06 [Edbrowse-dev] andTranslate Karl Dahlke
@ 2014-02-26 14:04 ` Chris Brannon
  0 siblings, 0 replies; 4+ messages in thread
From: Chris Brannon @ 2014-02-26 14:04 UTC (permalink / raw)
  To: Edbrowse-dev

Karl Dahlke <eklhad@comcast.net> writes:

> I propose that andTranslate turn every &whatever; symbol into its utf8
> equivalent, and that's all.

I like this idea, but I think we have a problem since we
support both source documents and output in ISO-8859-1.
Here's a scenario.
The user's console charset is ISO-8859-1.  We receive a source document
that is also ISO-8859-1.  It is never converted to UTF-8.  If
andTranslate starts inserting UTF-8 into the formatted text, we'll be
mixing character sets.
In order to use UTF-8 in andTranslate, I think we need to expect that
the user's console is always UTF-8 capable.  All formatted text needs to
be UTF-8.  Maybe that's not an unreasonable expectation in 2014.  UTF-8
has been around since 1993, and the unibyte charsets should have gone
the way of the dinosaur years ago.
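
To make the mixing problem concrete, here is a small sketch (not edbrowse code; latin1_to_utf8() is a hypothetical helper) of what converting an ISO-8859-1 buffer to UTF-8 involves. Every byte at or above 0x80 becomes two bytes, so a raw ISO-8859-1 byte sitting next to UTF-8 inserted by andTranslate would not be valid UTF-8 on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Convert a NUL-terminated ISO-8859-1 string to UTF-8.
 * out must hold at least 2 * strlen(in) + 1 bytes.
 * Returns the number of bytes written, not counting the NUL. */
static size_t latin1_to_utf8(const char *in, char *out)
{
	size_t n = 0;
	for (const unsigned char *p = (const unsigned char *)in; *p; ++p) {
		if (*p < 0x80) {
			/* ASCII is identical in both encodings */
			out[n++] = (char)*p;
		} else {
			/* 0x80..0xFF map to U+0080..U+00FF: two UTF-8 bytes */
			out[n++] = (char)(0xC0 | (*p >> 6));
			out[n++] = (char)(0x80 | (*p & 0x3F));
		}
	}
	out[n] = '\0';
	return n;
}
```

For example, the four-byte ISO-8859-1 string "caf" followed by 0xE9 comes out as five bytes, ending in 0xC3 0xA9.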

Allowing the user to configure the translations would also be a nice
touch!

-- Chris


* Re: [Edbrowse-dev] andTranslate
  2014-02-26 14:10 Karl Dahlke
@ 2014-02-26 17:17 ` Adam Thompson
  0 siblings, 0 replies; 4+ messages in thread
From: Adam Thompson @ 2014-02-26 17:17 UTC (permalink / raw)
  To: Karl Dahlke; +Cc: Edbrowse-dev


On Wed, Feb 26, 2014 at 09:10:01AM -0500, Karl Dahlke wrote:
> Yeah, if the user's console is iso 8859-1, I would embed the unicode if less than 256,
> or just put a question mark if higher and untranslated in the config file.
> I, like you, think this is not a big problem;
> almost everyone is utf8 by now.

I hope so as otherwise some of the js stuff's going to do odd things.
I remember this came up when we were first playing with mozjs 24 and the
decision was that it shouldn't be a problem for js strings (intended for console output) so we may as well
take the same approach here.

I like the ability to translate things in the config file.
I almost wonder if it wouldn't be nicer to have some more generic utf8
translation mechanism available (haven't really thought how to handle input),
perhaps available as a toggle.
This'd help those of us who use edbrowse on odd consoles (or ones which claim to be utf8 but won't display it) or with screenreaders
which don't do utf8. It'd be kind of nice to tell edbrowse to 
translate the utf8 in whatever I'm reading into whatever form I've
told it to use with a single command, rather than a set of substitutions.

I know I can sit down with a utf8 table and set up all the substitutions as a
function, but if we're going to do the mechanism anyway I wonder if we can move
it from format.c to somewhere else.

This way the mechanism would be:
- Do all the mandatory html translations (&lt; &gt; etc) as now in format.c
- put any utf8 in the output
- if console is iso8859 translate according to the existing rules and the config file
- if not and the user wants utf8 translation (broken console,
screen reader or whatever) perform just the config file driven part of the translation
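
The four steps above can be sketched as a small decision function. This is only an illustration of the ordering described in the list; the enum values and plan_steps() are hypothetical names, and nothing like them is claimed to exist in edbrowse.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags, one per step in the list above. */
enum step {
	STEP_HTML_ENTITIES  = 1 << 0,	/* mandatory &lt; &gt; etc */
	STEP_EMIT_UTF8      = 1 << 1,	/* put any utf8 in the output */
	STEP_EXISTING_RULES = 1 << 2,	/* current iso8859 console rules */
	STEP_CONFIG_RULES   = 1 << 3	/* config file driven translation */
};

/* Decide which steps run for a given console and user preference. */
static unsigned int plan_steps(bool console_is_latin1,
	bool user_wants_translation)
{
	unsigned int steps = STEP_HTML_ENTITIES | STEP_EMIT_UTF8;

	if (console_is_latin1)
		steps |= STEP_EXISTING_RULES | STEP_CONFIG_RULES;
	else if (user_wants_translation)
		steps |= STEP_CONFIG_RULES;
	return steps;
}
```

On an iso8859 console the toggle makes no difference in this sketch, since the config file rules are applied there anyway.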

If I remember rightly the non-utf8 console step is performed for all files at
the moment, and I'd like to put the utf8 but broken step as a command to be run
on non-html files as well.

As you say though this is a potentially large change and should probably wait
till the next release (or at least till we've thought through everything).

Karl or Chris, is there any chance one (or both)
of you could send an email to our debian contact as I've got nothing back from my email.

Cheers,
Adam.
PS: I'll try emailing them again when I get time



* [Edbrowse-dev]  andTranslate
@ 2014-02-26 14:10 Karl Dahlke
  2014-02-26 17:17 ` Adam Thompson
  0 siblings, 1 reply; 4+ messages in thread
From: Karl Dahlke @ 2014-02-26 14:10 UTC (permalink / raw)
  To: Edbrowse-dev

Yeah, if the user's console is iso 8859-1, I would embed the unicode if less than 256,
or just put a question mark if higher and untranslated in the config file.
I, like you, think this is not a big problem;
almost everyone is utf8 by now.
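
A rough sketch of that fallback for an iso 8859-1 console, under the assumption that the config lookup has already happened and its result (or NULL) is passed in. latin1_render() is a hypothetical name, and the order of the checks is one reading of the rule above, not settled behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Render one code point for an ISO-8859-1 console.
 * override_words is the user's translation from the config file,
 * or NULL if there is none. buf needs room for 2 bytes. */
static const char *latin1_render(unsigned int cp,
	const char *override_words, char *buf)
{
	if (cp < 256) {
		/* fits in one ISO-8859-1 byte: embed it directly */
		buf[0] = (char)cp;
		buf[1] = '\0';
		return buf;
	}
	if (override_words)
		return override_words;
	/* higher and untranslated: question mark */
	buf[0] = '?';
	buf[1] = '\0';
	return buf;
}
```

So 233 comes out as the single byte 0xE9, while 947 comes out as "?" unless the config file supplies something like "gamma".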

Karl Dahlke

