To: Edbrowse-dev@lists.the-brannons.com, acsint@lists.the-brannons.com
From: Karl Dahlke
Date: Wed, 18 Dec 2013 10:59:31 -0500
Message-ID: <20131118105931.eklhad@comcast.net>
Subject: [Edbrowse-dev] html unicode translations in edbrowse

This is a heads up of where we are headed, quite soon I hope.
My Jupiter adapter will pronounce unicode characters that appear as utf8 in the tty buffer, according to pronunciations that you can set in the config file. Here is an example, the start of Greek.

u945 alpha
u946 beta
u947 gamma

So when this code appears as 2 bytes in utf8, it is read as alpha, no matter how it got there.

How did I use to do it? The html browser would turn the html code &alpha; into the word alpha when rendering html. See format.c line 1330.

That works fine as long as I am browsing files from the web, or html files that I wrote myself, but if alpha beta gamma are in a document, or come from a pdf or some other source, well, I am just out of luck. You can see at a glance that such things are better handled in the adapter. It's a more general and flexible approach.

Once the latest version of Jupiter is pushed, I may request of Chris that most or all of those hard-coded translations in format.c go away, and instead you just crank out the unicode that is implied by the html tag. It's then up to the adapter to read it properly. It's mostly deleting code that I'm happy to get rid of, so it should be no trouble. The real test will be reading my math pages, which are full of Greek letters etc.

Thanks.

Karl Dahlke
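
Illustration only, not code from Jupiter or edbrowse: a minimal Python sketch of the division of labor described above, assuming the config lines have the "u945 alpha" form shown in the message. The function names parse_pronunciations and speak are hypothetical. The browser side only decodes the html code to the unicode character; the adapter side looks the character up in the table, so the same lookup works however the character reached the buffer.

```python
import html

def parse_pronunciations(config_text):
    """Parse lines like 'u945 alpha' into {945: 'alpha'} (assumed format)."""
    table = {}
    for line in config_text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].startswith("u"):
            continue  # skip blank or unrelated config lines
        try:
            table[int(parts[0][1:])] = parts[1].strip()
        except ValueError:
            continue  # 'u' not followed by a decimal code point
    return table

def speak(text, table):
    """Replace each character that has a configured word; keep the rest."""
    out = []
    for ch in text:
        word = table.get(ord(ch))
        out.append(f" {word} " if word else ch)
    # Collapse the padding spaces added around substituted words.
    return " ".join("".join(out).split())

CONFIG = """\
u945 alpha
u946 beta
u947 gamma
"""

table = parse_pronunciations(CONFIG)

# Browser side: emit the unicode implied by the html code, nothing more.
rendered = html.unescape("&alpha; + &beta;")

# Adapter side: the pronunciation comes from the table, not the browser.
print(speak(rendered, table))
```

Because the lookup is keyed on the code point, text pasted from a pdf or typed directly would be read the same way as text that came from an html page.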