* Greek in luatex
From: Thomas A. Schmitz @ 2007-09-01 10:56 UTC
To: mailing list for ConTeXt users

Hi all,

I've been experimenting with my Greek stuff in luatex, and I think
I'm making nice progress. Things pretty much work with Unicode input,
and as soon as the kerning problem is solved, I'm very optimistic.
Two questions came up for me; I assume the answers are
straightforward, but couldn't find anything:

1. How can I remap single characters? Let's say that we have a
Unicode character in the input stream that maps to 0x03c3, but I want
it remapped to 0x3f2, how can this be achieved?

2. Similarly: if I want to support the legacy input method babel, I
need to remap the input stream to the Greek characters (question 1)
and also need to feed the font some ligature rules, such as: the
combination >a needs to be combined into the character 0x1f00. What
would be the syntax and the way to do this?

All best

Thomas

___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : https://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________

* Re: Greek in luatex
From: Thomas A. Schmitz @ 2007-09-11 6:47 UTC
To: mailing list for ConTeXt users

OK, the message below didn't get too many responses, so maybe I can
rephrase my questions in a more precise manner:

1. For otftotfm, there's the "unicoding" command where you can
replace a character in a certain slot with another Unicode character,
so you could say

    unicoding "A = uni03D1"

Is anything like this possible in luatex?

2. I see this code in font-otf.lua:

    fonts.otf.features.data.tex = {
        { "endash",        "hyphen hyphen" },
        { "emdash",        "hyphen hyphen hyphen" },
        { "quotedblleft",  "quoteleft quoteleft" },
        { "quotedblright", "quoteright quoteright" },
        { "quotedblleft",  "grave grave" },
        { "quotedblright", "quotesingle quotesingle" },
        { "quotedblbase",  "comma comma" }
    }

and this list is used in the function

    function fonts.initializers.base.otf.texligatures(tfm,value)

How is it possible to write a similar list and function for just a
single font or fonts in a specific typescript?

Thanks a lot!

Thomas

On Sep 1, 2007, at 12:56 PM, Thomas A. Schmitz wrote:

> Hi all,
>
> I've been experimenting with my Greek stuff in luatex, and I think
> I'm making nice progress. Things pretty much work with Unicode input,
> and as soon as the kerning problem is solved, I'm very optimistic.
> Two questions came up for me; I assume the answers are
> straightforward, but couldn't find anything:
>
> 1. How can I remap single characters? Let's say that we have a
> Unicode character in the input stream that maps to 0x03c3, but I want
> it remapped to 0x3f2, how can this be achieved?
>
> 2. Similarly: if I want to support the legacy input method babel, I
> need to remap the input stream to the Greek characters (question 1)
> and also need to feed the font some ligature rules, such as: the
> combination >a needs to be combined into the character 0x1f00. What
> would be the syntax and the way to do this?
>
> All best
>
> Thomas

* Re: Greek in luatex
From: Hans Hagen @ 2007-09-11 10:12 UTC
To: mailing list for ConTeXt users

Thomas A. Schmitz wrote:
> OK, the message below didn't get too many responses, so maybe I can
> rephrase my questions in a more precise manner:
>
> 1. For otftotfm, there's the "unicoding" command where you can
> replace a character in a certain slot with another unicode character,
> so you could say
> unicoding "A = uni03D1"
> Is anything like this possible in luatex?

sure, but it depends a bit on what level ... font driven or not

if you have open type fonts, you can add features on the fly ...

    \starttext

    \installfontfeature[otf][verb]

    \definefontfeature
      [test]
      [mode=node,language=dflt,script=latn,
       verb=yes,featurefile=verbose-digits.fea]

    {\font\test=name:lmroman10regular*test at 20pt \test 1 2 3 4}

    \ctxlua{characters.context.show(\number"00AB)}

    \stoptext

this replaces 1 by one and 2 by two ... the file verbose-digits.fea is in
the distribution and is an example of a fontforge specification file

> 2. I see this code in font-otf.lua:
> fonts.otf.features.data.tex = {
> { "endash", "hyphen hyphen" },
> { "emdash", "hyphen hyphen hyphen" },
> { "quotedblleft", "quoteleft quoteleft" },
> { "quotedblright", "quoteright quoteright" },
> { "quotedblleft", "grave grave" },
> { "quotedblright", "quotesingle quotesingle" },
> { "quotedblbase", "comma comma" }
> }

that's ligatures and there for backward compatibility

(hm, makes me wonder if it makes more sense to do that using feature
files)

> and this list is used in the function function
> fonts.initializers.base.otf.texligatures(tfm,value)
>
> How is it possible to write a similar list and function for just a
> single font or fonts in a specific typescript?

in principle you can add lua code in typescripts and then register that
as a feature (so, texligatures or tlig is one of them, as is lineheight)

it all depends on how generic things are; we can think of features like

    remap=name-of-remap-vector

(keep in mind that this operates on node lists then; recoding the input,
i.e. regimes, is done differently)

so .. just write down detailed specs -)

Hans

-----------------------------------------------------------------
                                          Hans Hagen | PRAGMA ADE
              Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
     tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
                                             | www.pragma-pod.nl
-----------------------------------------------------------------
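
At the character level, the single-character remapping asked about in the first message (U+03C3 to U+03F2) amounts to a table lookup from code point to code point. The following is only a sketch in plain Python, not ConTeXt or LuaTeX code: the names `REMAP` and `remap` are invented for illustration, and it makes no claim about how a `remap=...` feature or a regime would actually be implemented.

```python
# Hypothetical remap vector: source code point -> replacement code point.
# Only one entry is shown; a real vector would list every pair needed.
REMAP = {
    0x03C3: 0x03F2,  # GREEK SMALL LETTER SIGMA -> GREEK LUNATE SIGMA SYMBOL
}


def remap(text, vector=REMAP):
    """Return text with every code point listed in vector replaced.

    Code points that are not in the vector pass through unchanged.
    """
    return text.translate(vector)


if __name__ == "__main__":
    word = "\u03c3\u03bf\u03c6\u03cc\u03c2"  # sigma omicron phi omicron-tonos final-sigma
    print(remap(word))
```

Note that the final sigma (U+03C2) is left alone here because it is not in the table; whether it should also become a lunate sigma is a decision for whoever writes the vector.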

* Re: Greek in luatex
From: Arthur Reutenauer @ 2007-09-13 1:15 UTC
To: mailing list for ConTeXt users

	Hello Thomas,

	I was waiting for someone else to answer your questions because I
had no clue how to address them even if I was interested; but now I do,
thanks to Hans' reply:

	For your general problem you need to define a new regime that will
map each relevant character sequence to the corresponding Unicode
character. That is, you inform ConTeXt that the character stream it sees
is actually a way of coding another set of characters and that it can
forget the original stream. This treatment should be done before any sort
of font property intervenes, because it does not depend on the
appearance of the typeset text. That's what regimes are for.

	Now I turn to Hans to give us guidelines on how to define an
advanced regime in Mark IV: Hans, what we need here is to replace
sequences of characters by other characters, so the mapping is not
one-to-one and it's more complicated than simple regimes defined by a
table lookup; but I guess all we have to do is write a lua function that
we could plug into the input stream reading routine (just like other
regimes work).

	As far as the rest of Hans' reply is concerned (Opentype features and
such), I would like to add that it is a very interesting and fascinating
thing to do, but definitely not what you want here, for a lot of reasons:
Opentype features can be used to alter the appearance of the text, but
not the nature of the characters themselves.
That is, if you did the transformation of your input stream at the font
level, you would actually tell ConTeXt that you are handling Latin
characters with a special appearance (that the font takes care of), so
for example, the underlying text in a PDF would be a stream of Latin
characters, and copying-and-pasting would yield Latin characters, not
Greek. That is not what you want here: you want your "a" to be understood
as "alpha" and your "less-than grave-accent w vertical-bar" to be
considered an "omega with dasia, varia and subscribed iota".

	Nor should you think of these transformations as a collection of
ligatures (which act at the font level), but rather as a text encoding,
just like UTF-8 is an encoding of the Unicode characters: in UTF-8 the
byte sequence "hexadecimal byte E1, hexadecimal byte BC, hexadecimal byte
80" is the coding for the Unicode character U+1F00 GREEK SMALL LETTER
ALPHA WITH PSILI, and in the Babel input scheme for Ancient Greek the
same character is encoded with the byte sequence "hexadecimal byte 3E
[ASCII '>'], hexadecimal byte 61 [ASCII 'a']".

	Of course in the past, these transformations were handled at the font
level and sequences like "> a" were actually ligatures, because that was
all we had (and copy-pasting from a PDF was, mostly, doomed to fail); but
we should not persist in that use now that we can treat them as real
Unicode characters.
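
The view of the Babel scheme as a text encoding can be tried out outside TeX. The sketch below is plain Python and only an illustration: the table `BABEL` is a made-up five-entry excerpt, the function name `babel_to_unicode` is invented, and nothing here corresponds to an existing ConTeXt regime. It decodes by trying the longest matching sequence first, which is what a many-to-one mapping needs, and it leaves unknown characters untouched.

```python
# Tiny, hypothetical excerpt of a Babel-to-Unicode table (not the full scheme).
BABEL = {
    ">a": "\u1f00",    # alpha with psili
    "<a": "\u1f01",    # alpha with dasia
    "<`w|": "\u1fa3",  # omega with dasia, varia and ypogegrammeni
    "a": "\u03b1",     # alpha
    "w": "\u03c9",     # omega
}


def babel_to_unicode(source, table=BABEL):
    """Decode Babel-style ASCII input, longest matching sequence first."""
    longest = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(source):
        # Try the longest candidate that still fits, then shorter ones.
        for size in range(min(longest, len(source) - i), 0, -1):
            chunk = source[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            # No table entry starts here: copy the character unchanged.
            out.append(source[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    decoded = babel_to_unicode(">a")
    # The same character, U+1F00, under two different encodings:
    print(decoded.encode("utf-8").hex())  # UTF-8 bytes
    print(">a".encode("ascii").hex())     # Babel bytes
```

Because the result is a real U+1F00 rather than a glyph substituted for Latin characters, anything downstream (hyphenation, PDF text extraction) sees Greek text.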

	As for your other question in your original message from September
1st (remapping single characters, for example U+03C3 to U+03F2), I have
to say first that I'm not very comfortable commenting on it since I'm not
quite sure what the issues are here; it may be that you have a simple
variant of some character, and this you should handle at font level (some
glyph being transformed into some other one); but if I am to judge by the
very example you gave, I would deem this should be a part of your input
regime: indeed, if every sigma is to be mapped to lunate sigma, then it
probably means that the lunate sigmas are part of your character stream
(even if you didn't input them directly). But I really can't give any
general advice here, especially because I don't actually know what a
lunate sigma really is ;-) You would have to decide for yourself as a
specialist of Greek if you're dealing with really different characters or
simple font variants; in the former case you should handle the
transformation as a part of your regime; in the latter, by defining a
font feature like Hans demonstrated.

	But for now, as long as it is understood that font tricks aren't the
general solution for the problem at stake, I would like to demonstrate
that it is still possible to do everything at font level :-)

	If you have a look at the attached greek-babel.tex (and the features
definition file greek-babel.fea) you will see that (almost) everything
is taken care of using Opentype substitutions. You need Bosporos and
GFS Baskerville to compile the file; by the way, the line with GFS
Baskerville is a further proof that you shouldn't handle the
transformation at font level: can you explain why it doesn't work here?

	As a complement, I also attach the Perl script which I wrote to
generate the .fea file.

	Arthur

[-- Attachment #2: greek-babel.tex --]
[-- Type: text/x-tex, Size: 1161 bytes --]

% For Thomas Schmitz.
% Define a new Opentype feature to replace new Babel input scheme and use it
% with some polytonic Greek fonts
% Not quite complete; some rhos with breathings and accents are missing from
% the .fea file (where are they?) and the final sigma isn't accounted for.

\installfontfeature[otf][grbl]

\definefontfeature
  [greek-babel]
  [mode=node,language=dflt,script=latn,
   grbl=yes,featurefile=greek-babel.fea]

\font\grbask=name:GFSBaskerville*greek-babel at 20pt
\font\bosphoros=name:BosporosU*greek-babel at 20pt

\starttext

\catcode`\~=11

\bosphoros Peis'istratis m'en o>~un >egkateg'hrase t~h| >arq~h| ka`i
>ap'ejane nos'hsas >ep`i Fil'onew >'rqontos, af' o<~ou m`en kat'esth t`o
pr~wton t'urannos >'eth tri'akonta ka`i tr'ia Bi'wsas, <`a d' >en t~h|
>arq~h| di'emeinen <enos d'eonta e>'ikosi; >'efeuge g`ap t`a loip`a.

% Don't do that!
\grbask Peis'istratis m'en o>~un >egkateg'hrase t~h| >arq~h| ka`i
>ap'ejane nos'hsas >ep`i Fil'onew >'rqontos, af' o<~ou m`en kat'esth t`o
pr~wton t'urannos >'eth tri'akonta ka`i tr'ia Bi'wsas, <`a d' >en t~h|
>arq~h| di'emeinen <enos d'eonta e>'ikosi; >'efeuge g`ap t`a loip`a.

\stoptext

[-- Attachment #3: greek-babel.fea --]
[-- Type: text/plain, Size: 8471 bytes --]

# An Opentype feature to replace the Babel input scheme
# Not quite complete; some rhos with breathings and accents are missing (where
# are they?) and the final sigma isn't accounted for.

lookup GreekBabelLookupSimple {
  lookupflag 0 ;
  sub a by alpha ;   sub b by beta ;    sub g by gamma ;   sub d by delta ;
  sub e by epsilon ; sub z by zeta ;    sub h by eta ;     sub j by theta ;
  sub i by iota ;    sub k by kappa ;   sub l by lambda ;  sub m by mu ;
  sub n by nu ;      sub x by xi ;      sub o by omicron ; sub p by pi ;
  sub r by rho ;     sub c by sigmafinal ; sub s by sigma ; sub t by tau ;
  sub u by upsilon ; sub f by phi ;     sub q by chi ;     sub y by psi ;
  sub w by omega ;
  sub A by Alpha ;   sub B by Beta ;    sub G by Gamma ;   sub D by Delta ;
  sub E by Epsilon ; sub Z by Zeta ;    sub H by Eta ;     sub J by Theta ;
  sub I by Iota ;    sub K by Kappa ;   sub L by Lambda ;  sub M by Mu ;
  sub N by Nu ;      sub X by Xi ;      sub O by Omicron ; sub P by Pi ;
  sub R by Rho ;     sub C by uni03C2 ; sub S by Sigma ;   sub T by Tau ;
  sub U by Upsilon ; sub F by Phi ;     sub Q by Chi ;     sub Y by Psi ;
  sub W by Omega ;
  sub semicolon by periodcentered ;
} GreekBabelLookupSimple ;

lookup GreekBabelLookupMultiple {
  lookupflag 1 ;
  # sub s 'space by sigmafinal ;

  # Note: uni1F16, uni1F17, uni1F1E, uni1F1F, uni1F46, uni1F47, uni1F4E,
  # uni1F4F, uni1F5C and uni1F5E below are unassigned in Unicode, like the
  # two commented-out targets uni1F58 and uni1F5A.

  # Psili (>) with optional accent
  sub greater a by uni1F00 ; sub greater A by uni1F08 ;
  sub greater e by uni1F10 ; sub greater E by uni1F18 ;
  sub greater h by uni1F20 ; sub greater H by uni1F28 ;
  sub greater i by uni1F30 ; sub greater I by uni1F38 ;
  sub greater o by uni1F40 ; sub greater O by uni1F48 ;
  sub greater u by uni1F50 ;
  # sub greater U by uni1F58 ;
  sub greater w by uni1F60 ; sub greater W by uni1F68 ;
  sub greater grave a by uni1F02 ; sub greater grave A by uni1F0A ;
  sub greater grave e by uni1F12 ; sub greater grave E by uni1F1A ;
  sub greater grave h by uni1F22 ; sub greater grave H by uni1F2A ;
  sub greater grave i by uni1F32 ; sub greater grave I by uni1F3A ;
  sub greater grave o by uni1F42 ; sub greater grave O by uni1F4A ;
  sub greater grave u by uni1F52 ;
  # sub greater grave U by uni1F5A ;
  sub greater grave w by uni1F62 ; sub greater grave W by uni1F6A ;
  sub greater quotesingle a by uni1F04 ; sub greater quotesingle A by uni1F0C ;
  sub greater quotesingle e by uni1F14 ; sub greater quotesingle E by uni1F1C ;
  sub greater quotesingle h by uni1F24 ; sub greater quotesingle H by uni1F2C ;
  sub greater quotesingle i by uni1F34 ; sub greater quotesingle I by uni1F3C ;
  sub greater quotesingle o by uni1F44 ; sub greater quotesingle O by uni1F4C ;
  sub greater quotesingle u by uni1F54 ; sub greater quotesingle U by uni1F5C ;
  sub greater quotesingle w by uni1F64 ; sub greater quotesingle W by uni1F6C ;
  sub greater asciitilde a by uni1F06 ; sub greater asciitilde A by uni1F0E ;
  sub greater asciitilde e by uni1F16 ; sub greater asciitilde E by uni1F1E ;
  sub greater asciitilde h by uni1F26 ; sub greater asciitilde H by uni1F2E ;
  sub greater asciitilde i by uni1F36 ; sub greater asciitilde I by uni1F3E ;
  sub greater asciitilde o by uni1F46 ; sub greater asciitilde O by uni1F4E ;
  sub greater asciitilde u by uni1F56 ; sub greater asciitilde U by uni1F5E ;
  sub greater asciitilde w by uni1F66 ; sub greater asciitilde W by uni1F6E ;

  # Dasia (<) with optional accent
  sub less a by uni1F01 ; sub less A by uni1F09 ;
  sub less e by uni1F11 ; sub less E by uni1F19 ;
  sub less h by uni1F21 ; sub less H by uni1F29 ;
  sub less i by uni1F31 ; sub less I by uni1F39 ;
  sub less o by uni1F41 ; sub less O by uni1F49 ;
  sub less u by uni1F51 ; sub less U by uni1F59 ;
  sub less w by uni1F61 ; sub less W by uni1F69 ;
  sub less grave a by uni1F03 ; sub less grave A by uni1F0B ;
  sub less grave e by uni1F13 ; sub less grave E by uni1F1B ;
  sub less grave h by uni1F23 ; sub less grave H by uni1F2B ;
  sub less grave i by uni1F33 ; sub less grave I by uni1F3B ;
  sub less grave o by uni1F43 ; sub less grave O by uni1F4B ;
  sub less grave u by uni1F53 ; sub less grave U by uni1F5B ;
  sub less grave w by uni1F63 ; sub less grave W by uni1F6B ;
  sub less quotesingle a by uni1F05 ; sub less quotesingle A by uni1F0D ;
  sub less quotesingle e by uni1F15 ; sub less quotesingle E by uni1F1D ;
  sub less quotesingle h by uni1F25 ; sub less quotesingle H by uni1F2D ;
  sub less quotesingle i by uni1F35 ; sub less quotesingle I by uni1F3D ;
  sub less quotesingle o by uni1F45 ; sub less quotesingle O by uni1F4D ;
  sub less quotesingle u by uni1F55 ; sub less quotesingle U by uni1F5D ;
  sub less quotesingle w by uni1F65 ; sub less quotesingle W by uni1F6D ;
  sub less asciitilde a by uni1F07 ; sub less asciitilde A by uni1F0F ;
  sub less asciitilde e by uni1F17 ; sub less asciitilde E by uni1F1F ;
  sub less asciitilde h by uni1F27 ; sub less asciitilde H by uni1F2F ;
  sub less asciitilde i by uni1F37 ; sub less asciitilde I by uni1F3F ;
  sub less asciitilde o by uni1F47 ; sub less asciitilde O by uni1F4F ;
  sub less asciitilde u by uni1F57 ; sub less asciitilde U by uni1F5F ;
  sub less asciitilde w by uni1F67 ; sub less asciitilde W by uni1F6F ;

  # Accent only, lowercase
  sub grave a by uni1F70 ; sub quotesingle a by uni1F71 ;
  sub grave e by uni1F72 ; sub quotesingle e by uni1F73 ;
  sub grave h by uni1F74 ; sub quotesingle h by uni1F75 ;
  sub grave i by uni1F76 ; sub quotesingle i by uni1F77 ;
  sub grave o by uni1F78 ; sub quotesingle o by uni1F79 ;
  sub grave u by uni1F7A ; sub quotesingle u by uni1F7B ;
  sub grave w by uni1F7C ; sub quotesingle w by uni1F7D ;

  # Accent only, uppercase
  sub grave A by uni1FBA ; sub quotesingle A by uni1FBB ;
  sub grave E by uni1FC8 ; sub quotesingle E by uni1FC9 ;
  sub grave H by uni1FCA ; sub quotesingle H by uni1FCB ;
  sub grave I by uni1FDA ; sub quotesingle I by uni1FDB ;
  sub grave U by uni1FEA ; sub quotesingle U by uni1FEB ;
  sub grave W by uni1FFA ; sub quotesingle W by uni1FFB ;

  # Breathing, optional accent and subscribed iota (|)
  sub greater a bar by uni1F80 ; sub greater A bar by uni1F88 ;
  sub greater h bar by uni1F90 ; sub greater H bar by uni1F98 ;
  sub greater w bar by uni1FA0 ; sub greater W bar by uni1FA8 ;
  sub greater grave a bar by uni1F82 ; sub greater grave A bar by uni1F8A ;
  sub greater grave h bar by uni1F92 ; sub greater grave H bar by uni1F9A ;
  sub greater grave w bar by uni1FA2 ; sub greater grave W bar by uni1FAA ;
  sub greater quotesingle a bar by uni1F84 ; sub greater quotesingle A bar by uni1F8C ;
  sub greater quotesingle h bar by uni1F94 ; sub greater quotesingle H bar by uni1F9C ;
  sub greater quotesingle w bar by uni1FA4 ; sub greater quotesingle W bar by uni1FAC ;
  sub greater asciitilde a bar by uni1F86 ; sub greater asciitilde A bar by uni1F8E ;
  sub greater asciitilde h bar by uni1F96 ; sub greater asciitilde H bar by uni1F9E ;
  sub greater asciitilde w bar by uni1FA6 ; sub greater asciitilde W bar by uni1FAE ;
  sub less a bar by uni1F81 ; sub less A bar by uni1F89 ;
  sub less h bar by uni1F91 ; sub less H bar by uni1F99 ;
  sub less w bar by uni1FA1 ; sub less W bar by uni1FA9 ;
  sub less grave a bar by uni1F83 ; sub less grave A bar by uni1F8B ;
  sub less grave h bar by uni1F93 ; sub less grave H bar by uni1F9B ;
  sub less grave w bar by uni1FA3 ; sub less grave W bar by uni1FAB ;
  sub less quotesingle a bar by uni1F85 ; sub less quotesingle A bar by uni1F8D ;
  sub less quotesingle h bar by uni1F95 ; sub less quotesingle H bar by uni1F9D ;
  sub less quotesingle w bar by uni1FA5 ; sub less quotesingle W bar by uni1FAD ;
  sub less asciitilde a bar by uni1F87 ; sub less asciitilde A bar by uni1F8F ;
  sub less asciitilde h bar by uni1F97 ; sub less asciitilde H bar by uni1F9F ;
  sub less asciitilde w bar by uni1FA7 ; sub less asciitilde W bar by uni1FAF ;

  # Subscribed iota without breathing. The omega forms are at
  # U+1FF2..U+1FF7; U+1FD2..U+1FD7 are iotas with dialytika or perispomeni.
  sub grave a bar by uni1FB2 ;
  sub a bar by uni1FB3 ;
  sub quotesingle a bar by uni1FB4 ;
  sub grave h bar by uni1FC2 ;
  sub h bar by uni1FC3 ;
  sub quotesingle h bar by uni1FC4 ;
  sub grave w bar by uni1FF2 ;
  sub w bar by uni1FF3 ;
  sub quotesingle w bar by uni1FF4 ;

  # Perispomeni without breathing
  sub asciitilde a by uni1FB6 ; sub asciitilde a bar by uni1FB7 ;
  sub asciitilde h by uni1FC6 ; sub asciitilde h bar by uni1FC7 ;
  sub asciitilde w by uni1FF6 ; sub asciitilde w bar by uni1FF7 ;

  # Rhos
  sub greater r by uni1FE4 ;
  sub less r by uni1FE5 ;
  sub less R by uni1FEC ;
} GreekBabelLookupMultiple ;

feature grbl {
  script DFLT ;
  language dflt ;
  lookup GreekBabelLookupMultiple ;
  lookup GreekBabelLookupSimple ;

  script latn;
  language dflt ;
  lookup GreekBabelLookupMultiple ;
  lookup GreekBabelLookupSimple ;
} grbl ;

[-- Attachment #4: greek-babel.pdf --] [--
Type: application/pdf, Size: 12656 bytes --]

[-- Attachment #5: babelify --]
[-- Type: text/plain, Size: 4965 bytes --]

#!/usr/bin/perl -W

# Outputs GSUB rules for replacing Babel-inputted greek characters with their
# Unicode value.
# In Adobe Feature Language, suitable for use in Fontlab's .fea files.

use strict ;
use utf8 ;

# Character types: breathings, accents, vowels
# The void string is considered an accent for convenience with breathings
my %charmask ;
my $charshift = 8 ;
my @breathings = ('greater', 'less') ;
my @accents = ('', 'grave', 'quotesingle', 'asciitilde') ;
my @vowels = ('a', 'e', 'h', 'i', 'o', 'u', 'w') ;

# Unicode masks for characters with breathings
$charmask{''} = 0 ;
$charmask{'greater'} = 0 ;
$charmask{'less'} = 1 ;
$charmask{'grave'} = 2 ;
$charmask{'quotesingle'} = 4 ;
$charmask{'asciitilde'} = 6 ;
$charmask{'a'} = 0x1F00 ;
$charmask{'e'} = 0x1F10 ;
$charmask{'h'} = 0x1F20 ;
$charmask{'i'} = 0x1F30 ;
$charmask{'o'} = 0x1F40 ;
$charmask{'u'} = 0x1F50 ;
$charmask{'w'} = 0x1F60 ;

# Local variables
my $breathing ;
my $accent ;
my $vowel ;
my $uchar ;

# First the U+1F00–U+1F6F sequence: breathing accent vowel
# We compile the Unicode code points by simply ORing the mask of each element
# Note that some of these characters actually don't exist!
# But it was easier this way (we can always edit the output afterward)
foreach $breathing (@breathings)
{
  foreach $accent (@accents)
  {
    foreach $vowel (@vowels)
    {
      # Space cadet input scheme ;-)
      $uchar = $charmask{$breathing} | $charmask{$accent} | $charmask{$vowel} ;
      printf "sub $breathing $accent $vowel by uni%04X ;\n", $uchar ;
      # Uppercase characters: the same shifted 8.
      $uchar = $charmask{$breathing} | $charmask{$accent} | $charmask{$vowel}
             | $charshift ;
      printf "sub $breathing $accent %s by uni%04X ;\n", uc($vowel), $uchar ;
    }
  }
}

# The U+1F7x range: lowercase vowels with only one accent.
# I have no idea why Unicode decided to put them there ... (especially seen as
# the uppercase vowels are somewhere else, and in an even more clumsy
# arrangement).
# We have to change the masks
$charmask{'grave'} = 0 ;
$charmask{'quotesingle'} = 1 ;
$charmask{'a'} = 0x1F70 ;
$charmask{'e'} = 0x1F72 ;
$charmask{'h'} = 0x1F74 ;
$charmask{'i'} = 0x1F76 ;
$charmask{'o'} = 0x1F78 ;
$charmask{'u'} = 0x1F7A ;
$charmask{'w'} = 0x1F7C ;

foreach $vowel (@vowels)
{
  foreach $accent ('grave', 'quotesingle')
  {
    $uchar = $charmask{$accent} | $charmask{$vowel} ;
    printf "sub $accent $vowel by uni%04X ;\n", $uchar ;
  }
}

# As announced before, the uppercase counterparts of these 14 characters are in
# a delightfully crappy mess. Simply output them one by one.
print "sub grave A by uni1FBA ;\n" ;
print "sub quotesingle A by uni1FBB ;\n" ;
print "sub grave E by uni1FC8 ;\n" ;
print "sub quotesingle E by uni1FC9 ;\n" ;
print "sub grave H by uni1FCA ;\n" ;
print "sub quotesingle H by uni1FCB ;\n" ;
print "sub grave I by uni1FDA ;\n" ;
print "sub quotesingle I by uni1FDB ;\n" ;
print "sub grave U by uni1FEA ;\n" ;
print "sub quotesingle U by uni1FEB ;\n" ;
print "sub grave W by uni1FFA ;\n" ;
print "sub quotesingle W by uni1FFB ;\n" ;

# U+1F80–U+1FAF: characters with subscribed iotas and breathings.
# We have to change the masks once again.
$charmask{'grave'} = 2 ;
$charmask{'quotesingle'} = 4 ;
$charmask{'a'} = 0x1F80 ;
$charmask{'h'} = 0x1F90 ;
$charmask{'w'} = 0x1FA0 ;

foreach $breathing (@breathings)
{
  foreach $accent (@accents)
  {
    foreach $vowel ('a', 'h', 'w') # Only these three vowels!
    {
      $uchar = $charmask{$breathing} | $charmask{$accent} | $charmask{$vowel} ;
      printf "sub $breathing $accent $vowel bar by uni%04X ;\n", $uchar ;
      # Uppercase counterparts
      $uchar = $charmask{$breathing} | $charmask{$accent} | $charmask{$vowel}
             | $charshift ;
      printf "sub $breathing $accent %s bar by uni%04X ;\n", uc($vowel), $uchar ;
    }
  }
}

# And finally, the characters with subscribed iotas but without breathings.
# Only nine of them, write them one by one.
# (The omega forms are at U+1FF2..U+1FF4; U+1FD2..U+1FD3 are iotas.)
print "sub grave a bar by uni1FB2 ;\n" ;
print "sub a bar by uni1FB3 ;\n" ;
print "sub quotesingle a bar by uni1FB4 ;\n" ;
print "sub grave h bar by uni1FC2 ;\n" ;
print "sub h bar by uni1FC3 ;\n" ;
print "sub quotesingle h bar by uni1FC4 ;\n" ;
print "sub grave w bar by uni1FF2 ;\n" ;
print "sub w bar by uni1FF3 ;\n" ;
print "sub quotesingle w bar by uni1FF4 ;\n" ;

# And some more with perispomeni ...
# (Omega with perispomeni is U+1FF6/U+1FF7; U+1FD6/U+1FD7 are iotas.)
print "sub asciitilde a by uni1FB6 ;\n" ;
print "sub asciitilde a bar by uni1FB7 ;\n" ;
print "sub asciitilde h by uni1FC6 ;\n" ;
print "sub asciitilde h bar by uni1FC7 ;\n" ;
print "sub asciitilde w by uni1FF6 ;\n" ;
print "sub asciitilde w bar by uni1FF7 ;\n" ;

# Rhos
print "sub greater r by uni1FE4 ;\n" ;
print "sub less r by uni1FE5 ;\n" ;
print "sub less R by uni1FEC ;\n" ;

# We leave some over but that should already be useful. Enjoy!
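
The mask arithmetic used by the babelify script for the U+1F00 to U+1F6F range can be checked independently of any font. The sketch below redoes that one loop in plain Python and uses the standard `unicodedata` module to list which of the generated code points are unassigned (general category `Cn`), which is what the remark "some of these characters actually don't exist" refers to. The function names `code_point` and `unassigned_targets` are invented for this illustration, and the exact result depends on the Unicode version bundled with the Python interpreter.

```python
import unicodedata

# Same masks as the first loop of the Perl script (U+1F00..U+1F6F).
BREATHING = {"greater": 0, "less": 1}
ACCENT = {"": 0, "grave": 2, "quotesingle": 4, "asciitilde": 6}
VOWEL = {"a": 0x1F00, "e": 0x1F10, "h": 0x1F20, "i": 0x1F30,
         "o": 0x1F40, "u": 0x1F50, "w": 0x1F60}
UPPERCASE = 8  # uppercase forms sit 8 positions after the lowercase ones


def code_point(breathing, accent, vowel, upper=False):
    """OR the three masks together, plus 8 for an uppercase vowel."""
    cp = BREATHING[breathing] | ACCENT[accent] | VOWEL[vowel]
    return cp | UPPERCASE if upper else cp


def unassigned_targets():
    """Return the generated code points that Unicode leaves unassigned."""
    missing = []
    for breathing in BREATHING:
        for accent in ACCENT:
            for vowel in VOWEL:
                for upper in (False, True):
                    cp = code_point(breathing, accent, vowel, upper)
                    if unicodedata.category(chr(cp)) == "Cn":
                        missing.append(cp)
    return sorted(missing)


if __name__ == "__main__":
    for cp in unassigned_targets():
        print("uni%04X" % cp)
```

A check like this could be used to filter the generated rules before they are written to the .fea file, instead of editing the output by hand afterwards.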

* Re: Greek in luatex
From: Taco Hoekwater @ 2007-09-13 7:03 UTC
To: mailing list for ConTeXt users

Arthur Reutenauer wrote:
> 	Hello Thomas,
>
> 	I was waiting for someone else to answer your questions because I
> had no clue how to address them even if I was interested; but now I do,
> thanks to Hans' reply:
>
> 	For your general problem you need to define a new regime that will
> map each relevant character sequence to the corresponding Unicode
> character. That is, you inform ConTeXt that the character stream it sees
> is actually a way of coding another set of characters and that it can
> forget the original stream. This treatment should be done before any sort
> of font property intervenes, because it does not depend on the
> appearance of the typeset text. That's what regimes are for.

Yes, except that we need a more powerful version (almost like OTPs) if
we want to handle transcriptions properly. The vital point is that it
should operate on tokens, not on nodes. I am not sure if Hans already
has a hook there that can be extended.

> 	If you have a look at the attached greek-babel.tex (and the features
> definition file greek-babel.fea) you will see that (almost) everything
> is taken care of using Opentype substitutions. You need Bosporos and
> GFS Baskerville to compile the file; by the way, the line with GFS
> Baskerville is a further proof that you shouldn't handle the
> transformation at font level: can you explain why it doesn't work here?

Possibly because a single one of the glyphs has a different name in
GFS Baskerville, or because a previous gsub rule has e.g. replaced
F;i; => Fi;

(your own gsub rules are always executed last, after everything
defined by the font)

As you say, .fea's are definitely not the right way to handle this,
even if they would work flawlessly.

* Re: Greek in luatex
From: Arthur Reutenauer @ 2007-09-13 10:24 UTC
To: mailing list for ConTeXt users

> Yes, except that we need a more powerful version (almost like OTPs) if
> we want to handle transcriptions properly. The vital point is that it
> should operate on tokens, not on nodes.

	Yes, sure. OTP would work fine here, but I thought Mark IV had already
something handy.

> Possibly because a single one of the glyphs has a different name in
> GFS Baskerville, or because a previous gsub rule has e.g. replaced
> F;i; => Fi;

	No, simply because GFS Baskerville has no glyphs for Latin
characters, so they're dropped by the token reader and can't be
transformed afterwards!

	Arthur
* Re: Greek in luatex
From: Taco Hoekwater @ 2007-09-13 11:38 UTC
To: mailing list for ConTeXt users

Arthur Reutenauer wrote:
>> Yes, except that we need a more powerful version (almost like OTPs) if
>> we want to handle transcriptions properly. The vital point is that it
>> should operate on tokens, not on nodes.
>
> Yes, sure. OTP would work fine here, but I thought Mark IV had already
> something handy.

I played a bit, see attachment. Surely Hans will want to improve on this
interface, so don't patch any of the core files just now.

Best wishes,
Taco

[-- Attachment #2: tokfilter.tex --]
[-- Type: text/x-tex, Size: 1751 bytes --]

% engine=luatex

%D First a hack to the core. two changes:
%D * don't force end_cs to be \relax
%D * don't remove end_cs from the input stream

\ctxlua{
function collectors.install(tag,end_cs)
    collectors.data[tag] = { }
    local data   = collectors.data[tag]
    local call   = token.command_id("call")
    local endcs  = token.csname_id(end_cs)
    local expand = collectors.registered
    local get    = token.get_next
    while true do
        local t = get()
        local a, b = t[1], t[3]
        if b == endcs then
            tex.print('\\' ..end_cs)
            return
        elseif a == call and expand[b] then
            token.expand()
        else
            data[\string#data+1] = t
        end
    end
end
}

%D a small extension to the core interface, to have a
%D nice wrapper around the lua code

\ctxlua {
function collectors.handle(tag,handle)
    collectors.data[tag] = handle(collectors.data[tag])
end
}

\def\handletokens[#1][#2]{\ctxlua{collectors.handle("#1",#2)}}

%D Here starts the document-specific code

%D Start capturing tokens in the buffer named 'babel', stop
%D at \stopbabel

\def\startbabel
  {\ctxlua{collectors.install("babel", "stopbabel")}}

%D The lua mutation function. str is a table containing the captured
%D tokens, each itself a three-item table (this is explained in the
%D luatex manual)

\ctxlua {
function convert_babel(str)
    local t = { }
    for k,v in ipairs(str) do
        t[\string#t+1] = tokens.other('*')
        t[\string#t+1] = v
    end
    return t
end
}

%D convert the tokens using that lua function, then
%D flush the result

\def\stopbabel
  {\handletokens[babel][convert_babel]
   \flushtokens[babel]}

\starttext
\startbabel%
some stuff here
\stopbabel
\stoptext

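The heart of convert_babel in the attachment above is an interleaving loop: every captured token is preceded by an extra '*' token. A minimal plain-Lua sketch of just that loop follows, assuming ordinary one-character strings as stand-ins for the three-item token tables. The helper name interleave is hypothetical and not part of the ConTeXt core, and nothing here touches the token or collectors libraries.

```lua
-- Sketch only: plain strings stand in for LuaTeX token tables.
-- interleave() is a hypothetical helper, not a ConTeXt or LuaTeX function.
local function interleave(list, marker)
    local out = { }
    for _, v in ipairs(list) do
        out[#out+1] = marker   -- convert_babel inserts tokens.other('*') here
        out[#out+1] = v        -- followed by the original token, unchanged
    end
    return out
end

local result = interleave({ "s", "o", "m", "e" }, "*")
print(table.concat(result))   -- *s*o*m*e
```

A real mutation function for Greek would replace or merge entries instead of merely inserting a marker, but it would have the same shape: take the captured list, return a new list.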
* Re: Greek in luatex
From: Thomas A. Schmitz @ 2007-09-13 12:54 UTC
To: mailing list for ConTeXt users

On Sep 13, 2007, at 1:38 PM, Taco Hoekwater wrote:

> Arthur Reutenauer wrote:
>>> Yes, except that we need a more powerful version (almost like
>>> OTPs) if
>>> we want to handle transcriptions properly. The vital point is
>>> that it
>>> should operate on tokens, not on nodes.
>> Yes, sure. OTP would work fine here, but I thought Mark IV had
>> already
>> something handy.
>
> I played a bit, see attachment. Surely Hans will want to improve on
> this interface, so don't patch any of the core files just now.
>
> Best wishes,
> Taco

Taco,

it almost feels like today's my birthday - thanks again! Will look at
it more closely soonish!

Best

Thomas

* Re: Greek in luatex
From: Arthur Reutenauer @ 2007-09-13 18:36 UTC
To: mailing list for ConTeXt users

> I played a bit, see attachment. Surely Hans will want to improve on this
> interface, so don't patch any of the core files just now.

Fantastic!

Now I played a bit with your file myself, and compared with the
behaviour of an OTP which has the same action: you can see that macro
arguments between square brackets are preserved by the OTP, whereas your
function (obviously) converts everything unconditionally. How difficult
would it be to program the same behaviour, that is, make
collectors.handle pass to convert_babel only contiguous ranges of
characters that are situated outside matching brackets?

[-- Attachment #2: tokfilter_otp.tex --]
[-- Type: text/x-tex, Size: 2247 bytes --]

% engine=luatex

%D First a hack to the core. two changes:
%D * don't force end_cs to be \relax
%D * don't remove end_cs from the input stream

\ctxlua{
function collectors.install(tag,end_cs)
    collectors.data[tag] = { }
    local data   = collectors.data[tag]
    local call   = token.command_id("call")
    local endcs  = token.csname_id(end_cs)
    local expand = collectors.registered
    local get    = token.get_next
    while true do
        local t = get()
        local a, b = t[1], t[3]
        if b == endcs then
            tex.print('\\' ..end_cs)
            return
        elseif a == call and expand[b] then
            token.expand()
        else
            data[\string#data+1] = t
        end
    end
end
}

%D a small extension to the core interface, to have a
%D nice wrapper around the lua code

\ctxlua {
function collectors.handle(tag,handle)
    collectors.data[tag] = handle(collectors.data[tag])
end
}

\def\handletokens[#1][#2]{\ctxlua{collectors.handle("#1",#2)}}

%D Here starts the document-specific code

%D Start capturing tokens in the buffer named 'babel', stop
%D at \stopbabel

\def\startbabel
  {\ctxlua{collectors.install("babel", "stopbabel")}}

%D The lua mutation function. str is a table containing the captured
%D tokens, each itself a three-item table (this is explained in the
%D luatex manual)

\ctxlua {
function convert_babel(str)
    local t = { }
    for k,v in ipairs(str) do
        t[\string#t+1] = tokens.other('*')
        t[\string#t+1] = v
    end
    return t
end
}

%D convert the tokens using that lua function, then
%D flush the result

\def\stopbabel
  {\handletokens[babel][convert_babel]
   \flushtokens[babel]}

\usetypescript[palatino][ec]

\starttext

\section{With Taco's \type{\startbabel}}

\startbabel%
some stuff here
\blank[medium]
some other stuff
\switchtobodyfont[palatino]
\subsection{More stuff}
stuff stuff stuff
\stopbabel

\blank[big]

\section{With an \tt OTP}

% Do the same as convert_babel, with a simple OTP (stars.otp)
\ocp\stars=stars
\ocplist\StarsOCP=\addbeforeocplist1\stars\nullocplist
\pushocplist\StarsOCP

some stuff here
\blank[medium]
some other stuff
\switchtobodyfont[palatino]
\subsection{More stuff}
stuff stuff stuff

\stoptext

[-- Attachment #3: stars.ocp --]
[-- Type: application/octet-stream, Size: 60 bytes --]

[-- Attachment #4: stars.otp --]
[-- Type: application/vnd.oasis.opendocument.presentation-template, Size: 58 bytes --]

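The behaviour asked about here (hand the converter only the contiguous runs that lie outside matching square brackets) can be illustrated on plain strings. The sketch below is an assumption-laden toy, not a patch for collectors.handle: it works on ASCII characters rather than TeX tokens, the names map_outside_brackets and star are hypothetical, and it does nothing about control sequence names such as \blank, which a real token filter would also have to leave alone.

```lua
-- Sketch only: operates on a string, byte by byte, not on LuaTeX tokens.
-- Applies f to each contiguous run of characters outside [...] and copies
-- everything inside matching brackets through unchanged.
local function map_outside_brackets(s, f)
    local out, run, depth = { }, { }, 0
    local function flush()
        if #run > 0 then
            out[#out+1] = f(table.concat(run))
            run = { }
        end
    end
    for ch in s:gmatch(".") do
        if ch == "[" then
            flush()                 -- the run before the bracket is complete
            depth = depth + 1
            out[#out+1] = ch
        elseif ch == "]" and depth > 0 then
            depth = depth - 1
            out[#out+1] = ch
        elseif depth > 0 then
            out[#out+1] = ch        -- inside brackets: keep as is
        else
            run[#run+1] = ch        -- outside brackets: collect for f
        end
    end
    flush()
    return table.concat(out)
end

-- Same effect as convert_babel: put a star in front of every character.
local function star(run)
    return (run:gsub(".", "*%0"))
end

print(map_outside_brackets("ab[cd]ef", star))   -- *a*b[cd]*e*f
```

Collecting runs first, rather than converting character by character, matters once the converter has to look at neighbouring characters, as a babel-style transliteration does.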
* Re: Greek in luatex
From: Hans Hagen @ 2007-09-13 18:49 UTC
To: mailing list for ConTeXt users

Arthur Reutenauer wrote:
>> I played a bit, see attachment. Surely Hans will want to improve on this
>> interface, so don't patch any of the core files just now.
>
> Fantastic!
>
> Now I played a bit with your file myself, and compared with the
> behaviour of an OTP which has the same action: you can see that macro
> arguments between square brackets are preserved by the OTP, whereas your
> function (obviously) converts everything unconditionally. How difficult
> would it be to program the same behaviour, that is, make
> collectors.handle pass to convert_babel only contiguous ranges of
> characters that are situated outside matching brackets?

i'll wrap taco's macro up a bit

however, dealing with things like \blank[whatever] is not trivial

(1) we need to prevent expansion (register feature)
(2) but sometimes we need to expand
(3) and not all commands are treated the same

this is why otp-like things are suboptimal

also, a proper toks handling mechanism should look at its neighbours

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74
www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------

* Re: Greek in luatex
From: Hans Hagen @ 2007-09-13 19:24 UTC
To: mailing list for ConTeXt users

Arthur Reutenauer wrote:

... greek ... greek ...

new beta

\defineremapper[babelgreek]

\remapcharacter[babelgreek][`a]{\alpha}
\remapcharacter[babelgreek][`b]{\beta}
\remapcharacter[babelgreek][`c]{\gamma}
\remapcharacter[babelgreek][`d]{OEPS}

\starttext

[\startbabelgreek
a b c some stuff here \blank[big]
oeps b d
\stopbabelgreek]

[\babelgreek{some stuff here}]

\stoptext

i can think of a more clever mechanism (have some ideas) but not now
(in the middle of something else)

for arthur ... [] are skipped

for mojca ... this beta also fixes your accent problem (if she's in the
mood for source browsing ... interesting solution)

for luigi ... working on a variant xml parser ... now loading 40 meg in
5 seconds

for taco ... i made your example into a configurable one

Hans

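The idea behind a remapper like the one announced above is a per-character lookup table: characters with an entry are replaced, all others pass through. The following plain-Lua sketch shows that idea on a string. It is not how \remapcharacter is implemented in the beta (the source of that is not shown in this thread); the table babelgreek and the function remap are hypothetical names, and the bracket skipping Hans mentions is left out.

```lua
-- Sketch only: a character remap table applied with string.gsub.
-- The entries mirror the \remapcharacter lines in the message above.
local babelgreek = {
    a = "\\alpha",
    b = "\\beta",
    c = "\\gamma",
    d = "OEPS",
}

local function remap(s, map)
    -- With a table as replacement, gsub keeps the original character
    -- whenever the table has no entry for it.
    return (s:gsub(".", map))
end

print(remap("oeps b d", babelgreek))   -- oeps \beta OEPS
```

Because the replacement is looked up per character, this sketch cannot express multi-character input such as >a; that needs a filter that looks at neighbouring characters or tokens.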
* Re: Greek in luatex
From: Arthur Reutenauer @ 2007-09-13 19:45 UTC
To: mailing list for ConTeXt users

> for arthur ... [] are skipped

Thanks! I guess there's more to it and token filtering is not the only
way to do it, but it's still great.

Arthur

* Re: Greek in luatex
From: Hans Hagen @ 2007-09-13 20:20 UTC
To: mailing list for ConTeXt users

Arthur Reutenauer wrote:
>> for arthur ... [] are skipped
>
> Thanks! I guess there's more to it and token filtering is not the only
> way to do it, but it's still great.

indeed, also, it's important to look fresh at these things and forget
about how we do things now, else we replace hack with hack

Hans

* Re: Greek in luatex
From: Arthur Reutenauer @ 2007-09-14 0:24 UTC
To: mailing list for ConTeXt users

> indeed, also, it's important to look fresh at these things and forget
> about how we do things now, else we replace hack with hack

Sure, of course. I only thought this was a nice way of handling things
but I'm not settled on that.

Arthur

* Re: Greek in luatex
From: Thomas A. Schmitz @ 2007-09-13 20:38 UTC
To: mailing list for ConTeXt users

On Sep 13, 2007, at 9:45 PM, Arthur Reutenauer wrote:

> Thanks! I guess there's more to it and token filtering is not the
> only
> way to do it, but it's still great.
>
> Arthur

Oh boy... I'm afraid I lost you there.

Hans, your remapper looks just like the thing I'd need for my Greek
stuff. Right now, there appears to be a slight problem with the pdfs I
produce with this code: on my system (OS X), they freeze or crash most
pdf viewers (Adobe Reader can handle them; Preview, TeXShop and pdfview
all crash or freeze).

Arthur, I also played with your fontfeatures. Most of the substitutions
work, but there were a couple of problems that I just couldn't resolve,
especially regarding the characters with an iota subscript:
combinations involving accents and breathing (such as >~h|) were
remapped correctly; the pure vowel + iota (h|) was not remapped. I
guess I will wait till the dust settles a bit and you tell me which way
is the best to pursue.

Taco, one question: Hans mentioned that "wide" postscript fonts via afm
were not supported yet. Does that mean that type 1 fonts with a unicode
encoding do not work yet?

Thanks so much, all best

Thomas

* Re: Greek in luatex
From: Hans Hagen @ 2007-09-13 21:05 UTC
To: mailing list for ConTeXt users

Thomas A. Schmitz wrote:

> Taco, one question: Hans mentioned that "wide" postscript fonts via afm
> were not supported yet. Does that mean that type 1 fonts
> with a unicode encoding do not work yet?

the latest mkiv works ok with wide fonts, the latest luatex also, but
best wait till the beginning of next week when all subsetting issues
are resolved

Hans

* Re: Greek in luatex
From: Taco Hoekwater @ 2007-09-13 21:52 UTC
To: mailing list for ConTeXt users

Hans Hagen wrote:
> Thomas A. Schmitz wrote:
>
>> Taco, one question: Hans mentioned that "wide" postscript fonts via afm
>> were not supported yet. Does that mean that type 1 fonts
>> with a unicode encoding do not work yet?
>
> the latest mkiv works ok with wide fonts, the latest luatex also, but
> best wait till the beginning of next week when all subsetting issues
> are resolved

Like the man says.

Best wishes,
Taco

PS It is amazing how Hans manages to answer questions addressed to me
before I even see them! All ntg-context mail arrives completely out of
order and hours late, today.

* Re: Greek in luatex
From: Arthur Reutenauer @ 2007-09-15 23:22 UTC
To: Mailing list for ConTeXt users

> there were a couple of problems that I just
> couldn't resolve, especially regarding the characters with an iota
> subscript:

Indeed. This is a problem with the Fontforge code applying the GSUB
features: the 'grbl' feature is defined by two lookups, one being a
list of single substitutions (h -> eta) and the other a list of
ligature substitutions (h bar -> eta with iota subscript). Now, since
the latter has to take precedence to avoid conflicts, I explicitly put
it before the other, but it seems that Fontforge ignores this and
applies the list of single substitutions first (this is confirmed by
the cache file BosporosU@greek-babel.tma, where the lookup with the
single substitutions, called "GreekBabelLookupSimple", appears first in
the gsub table). Note that this doesn't happen for substitutions
*inside* a lookup (so things like "greater eta bar" and "eta bar" don't
conflict, since they're both ligature substitutions and I put the
former first in the list, and the substitutions are correctly applied).

As far as I understand, this behaviour is actually compliant with the
Opentype specifications and is quite widespread among typesetting
engines, so it is not (only) Fontforge's fault; but, needless to say,
it is nevertheless annoying. (In more crude terms: Opentype does not
specify anything in that respect, so manufacturers of typesetting
software can do whatever they want ...)

Thomas: to solve the problem at hand, you can try the new feature file
I send along with a small test (I simply define a new feature that is
to be applied after 'grbl', to deal specifically with the iota
subscripts).

Arthur

[-- Attachment #2: greek-babel-extended.fea --]
[-- Type: text/plain, Size: 8742 bytes --]

# An Opentype feature to replace the Babel input scheme
# Not quite complete; some rhos with breathings and accents are missing (where
# are they?) and the final sigma isn't accounted for.

lookup GreekBabelLookupSimple {
  lookupflag 0 ;
  sub a by alpha ;
  sub b by beta ;
  sub g by gamma ;
  sub d by delta ;
  sub e by epsilon ;
  sub z by zeta ;
  sub h by eta ;
  sub j by theta ;
  sub i by iota ;
  sub k by kappa ;
  sub l by lambda ;
  sub m by mu ;
  sub n by nu ;
  sub x by xi ;
  sub o by omicron ;
  sub p by pi ;
  sub r by rho ;
  sub c by sigmafinal ;
  sub s by sigma ;
  sub t by tau ;
  sub u by upsilon ;
  sub f by phi ;
  sub q by chi ;
  sub y by psi ;
  sub w by omega ;
  sub A by Alpha ;
  sub B by Beta ;
  sub G by Gamma ;
  sub D by Delta ;
  sub E by Epsilon ;
  sub Z by Zeta ;
  sub H by Eta ;
  sub J by Theta ;
  sub I by Iota ;
  sub K by Kappa ;
  sub L by Lambda ;
  sub M by Mu ;
  sub N by Nu ;
  sub X by Xi ;
  sub O by Omicron ;
  sub P by Pi ;
  sub R by Rho ;
  sub C by Uni03C2 ;
  sub S by Sigma ;
  sub T by Tau ;
  sub U by Upsilon ;
  sub F by Phi ;
  sub Q by Chi ;
  sub Y by Psi ;
  sub W by Omega ;
  sub semicolon by periodcentered ;
} GreekBabelLookupSimple ;

lookup GreekBabelLookupMultiple {
  lookupflag 1 ;
# sub s 'space by sigmafinal ;
  sub greater a by uni1F00 ;
  sub greater A by uni1F08 ;
  sub greater e by uni1F10 ;
  sub greater E by uni1F18 ;
  sub greater h by uni1F20 ;
  sub greater H by uni1F28 ;
  sub greater i by uni1F30 ;
  sub greater I by uni1F38 ;
  sub greater o by uni1F40 ;
  sub greater O by uni1F48 ;
  sub greater u by uni1F50 ;
# sub greater U by uni1F58 ;
  sub greater w by uni1F60 ;
  sub greater W by uni1F68 ;
  sub greater grave a by uni1F02 ;
  sub greater grave A by uni1F0A ;
  sub greater grave e by uni1F12 ;
  sub greater grave E by uni1F1A ;
  sub greater grave h by uni1F22 ;
  sub greater grave H by uni1F2A ;
  sub greater grave i by uni1F32 ;
  sub greater grave I by uni1F3A ;
  sub greater grave o by uni1F42 ;
  sub greater grave O by uni1F4A ;
  sub greater grave u by uni1F52 ;
# sub greater grave U by uni1F5A ;
  sub greater grave w by uni1F62 ;
  sub greater grave W by uni1F6A ;
  sub greater quotesingle a by uni1F04 ;
  sub greater quotesingle A by uni1F0C ;
  sub greater quotesingle e by uni1F14 ;
  sub greater quotesingle E by uni1F1C ;
  sub greater quotesingle h by uni1F24 ;
  sub greater quotesingle H by uni1F2C ;
  sub greater quotesingle i by uni1F34 ;
  sub greater quotesingle I by uni1F3C ;
  sub greater quotesingle o by uni1F44 ;
  sub greater quotesingle O by uni1F4C ;
  sub greater quotesingle u by uni1F54 ;
  sub greater quotesingle U by uni1F5C ;
  sub greater quotesingle w by uni1F64 ;
  sub greater quotesingle W by uni1F6C ;
  sub greater asciitilde a by uni1F06 ;
  sub greater asciitilde A by uni1F0E ;
  sub greater asciitilde e by uni1F16 ;
  sub greater asciitilde E by uni1F1E ;
  sub greater asciitilde h by uni1F26 ;
  sub greater asciitilde H by uni1F2E ;
  sub greater asciitilde i by uni1F36 ;
  sub greater asciitilde I by uni1F3E ;
  sub greater asciitilde o by uni1F46 ;
  sub greater asciitilde O by uni1F4E ;
  sub greater asciitilde u by uni1F56 ;
  sub greater asciitilde U by uni1F5E ;
  sub greater asciitilde w by uni1F66 ;
  sub greater asciitilde W by uni1F6E ;
  sub less a by uni1F01 ;
  sub less A by uni1F09 ;
  sub less e by uni1F11 ;
  sub less E by uni1F19 ;
  sub less h by uni1F21 ;
  sub less H by uni1F29 ;
  sub less i by uni1F31 ;
  sub less I by uni1F39 ;
  sub less o by uni1F41 ;
  sub less O by uni1F49 ;
  sub less u by uni1F51 ;
  sub less U by uni1F59 ;
  sub less w by uni1F61 ;
  sub less W by uni1F69 ;
  sub less grave a by uni1F03 ;
  sub less grave A by uni1F0B ;
  sub less grave e by uni1F13 ;
  sub less grave E by uni1F1B ;
  sub less grave h by uni1F23 ;
  sub less grave H by uni1F2B ;
  sub less grave i by uni1F33 ;
  sub less grave I by uni1F3B ;
  sub less grave o by uni1F43 ;
  sub less grave O by uni1F4B ;
  sub less grave u by uni1F53 ;
  sub less grave U by uni1F5B ;
  sub less grave w by uni1F63 ;
  sub less grave W by uni1F6B ;
  sub less quotesingle a by uni1F05 ;
  sub less quotesingle A by uni1F0D ;
  sub less quotesingle e by uni1F15 ;
  sub less quotesingle E by uni1F1D ;
  sub less quotesingle h by uni1F25 ;
  sub less quotesingle H by uni1F2D ;
  sub less quotesingle i by uni1F35 ;
  sub less quotesingle I by uni1F3D ;
  sub less quotesingle o by uni1F45 ;
  sub less quotesingle O by uni1F4D ;
  sub less quotesingle u by uni1F55 ;
  sub less quotesingle U by uni1F5D ;
  sub less quotesingle w by uni1F65 ;
  sub less quotesingle W by uni1F6D ;
  sub less asciitilde a by uni1F07 ;
  sub less asciitilde A by uni1F0F ;
  sub less asciitilde e by uni1F17 ;
  sub less asciitilde E by uni1F1F ;
  sub less asciitilde h by uni1F27 ;
  sub less asciitilde H by uni1F2F ;
  sub less asciitilde i by uni1F37 ;
  sub less asciitilde I by uni1F3F ;
  sub less asciitilde o by uni1F47 ;
  sub less asciitilde O by uni1F4F ;
  sub less asciitilde u by uni1F57 ;
  sub less asciitilde U by uni1F5F ;
  sub less asciitilde w by uni1F67 ;
  sub less asciitilde W by uni1F6F ;
  sub grave a by uni1F70 ;
  sub quotesingle a by uni1F71 ;
  sub grave e by uni1F72 ;
  sub quotesingle e by uni1F73 ;
  sub grave h by uni1F74 ;
  sub quotesingle h by uni1F75 ;
  sub grave i by uni1F76 ;
  sub quotesingle i by uni1F77 ;
  sub grave o by uni1F78 ;
  sub quotesingle o by uni1F79 ;
  sub grave u by uni1F7A ;
  sub quotesingle u by uni1F7B ;
  sub grave w by uni1F7C ;
  sub quotesingle w by uni1F7D ;
  sub grave A by uni1FBA ;
  sub quotesingle A by uni1FBB ;
  sub grave E by uni1FC8 ;
  sub quotesingle E by uni1FC9 ;
  sub grave H by uni1FCA ;
  sub quotesingle H by uni1FCB ;
  sub grave I by uni1FDA ;
  sub quotesingle I by uni1FDB ;
  sub grave U by uni1FEA ;
  sub quotesingle U by uni1FEB ;
  sub grave W by uni1FFA ;
  sub quotesingle W by uni1FFB ;
  sub greater a bar by uni1F80 ;
  sub greater A bar by uni1F88 ;
  sub greater h bar by uni1F90 ;
  sub greater H bar by uni1F98 ;
  sub greater w bar by uni1FA0 ;
  sub greater W bar by uni1FA8 ;
  sub greater grave a bar by uni1F82 ;
  sub greater grave A bar by uni1F8A ;
  sub greater grave h bar by uni1F92 ;
  sub greater grave H bar by uni1F9A ;
  sub greater grave w bar by uni1FA2 ;
  sub greater grave W bar by uni1FAA ;
  sub greater quotesingle a bar by uni1F84 ;
  sub greater quotesingle A bar by uni1F8C ;
  sub greater quotesingle h bar by uni1F94 ;
  sub greater quotesingle H bar by uni1F9C ;
  sub greater quotesingle w bar by uni1FA4 ;
  sub greater quotesingle W bar by uni1FAC ;
  sub greater asciitilde a bar by uni1F86 ;
  sub greater asciitilde A bar by uni1F8E ;
  sub greater asciitilde h bar by uni1F96 ;
  sub greater asciitilde H bar by uni1F9E ;
  sub greater asciitilde w bar by uni1FA6 ;
  sub greater asciitilde W bar by uni1FAE ;
  sub less a bar by uni1F81 ;
  sub less A bar by uni1F89 ;
  sub less h bar by uni1F91 ;
  sub less H bar by uni1F99 ;
  sub less w bar by uni1FA1 ;
  sub less W bar by uni1FA9 ;
  sub less grave a bar by uni1F83 ;
  sub less grave A bar by uni1F8B ;
  sub less grave h bar by uni1F93 ;
  sub less grave H bar by uni1F9B ;
  sub less grave w bar by uni1FA3 ;
  sub less grave W bar by uni1FAB ;
  sub less quotesingle a bar by uni1F85 ;
  sub less quotesingle A bar by uni1F8D ;
  sub less quotesingle h bar by uni1F95 ;
  sub less quotesingle H bar by uni1F9D ;
  sub less quotesingle w bar by uni1FA5 ;
  sub less quotesingle W bar by uni1FAD ;
  sub less asciitilde a bar by uni1F87 ;
  sub less asciitilde A bar by uni1F8F ;
  sub less asciitilde h bar by uni1F97 ;
  sub less asciitilde H bar by uni1F9F ;
  sub less asciitilde w bar by uni1FA7 ;
  sub less asciitilde W bar by uni1FAF ;
  sub grave a bar by uni1FB2 ;
  sub quotesingle a bar by uni1FB4 ;
  sub grave h bar by uni1FC2 ;
  sub quotesingle h bar by uni1FC4 ;
  sub grave w bar by uni1FF2 ;
  sub quotesingle w bar by uni1FF4 ;
  sub asciitilde a by uni1FB6 ;
  sub asciitilde a bar by uni1FB7 ;
  sub asciitilde h by uni1FC6 ;
  sub asciitilde h bar by uni1FC7 ;
  sub asciitilde w by uni1FF6 ;
  sub asciitilde w bar by uni1FF7 ;
  sub greater r by uni1FE4 ;
  sub less r by uni1FE5 ;
  sub less R by uni1FEC ;
} GreekBabelLookupMultiple ;

lookup GreekBabel2LookupMultiple {
  lookupflag 1 ;
  sub alpha bar by uni1FB3 ;
  sub eta bar by uni1FC3 ;
  sub omega bar by uni1FF3 ;
} GreekBabel2LookupMultiple ;

feature grbl {
  script DFLT ;
     language dflt ;
      lookup GreekBabelLookupMultiple ;
      lookup GreekBabelLookupSimple ;
  script latn;
     language dflt ;
      lookup GreekBabelLookupMultiple ;
      lookup GreekBabelLookupSimple ;
} grbl ;

feature grb2 {
  script DFLT ;
     language dflt ;
      lookup GreekBabel2LookupMultiple ;
  script latn;
     language dflt ;
      lookup GreekBabel2LookupMultiple ;
} grb2 ;

[-- Attachment #3: subscribed_iotas.tex --]
[-- Type: text/x-tex, Size: 378 bytes --]

% For Thomas Schmitz.
% Deal with subscribed iotas

\installfontfeature[otf][grbl]
\installfontfeature[otf][grb2]

\definefontfeature
  [greek-babel]
  [mode=node,language=dflt,script=latn,
   grbl=yes,grb2=yes,featurefile=greek-babel-extended.fea]

\font\bosphoros=name:BosporosU*greek-babel at 20pt

\starttext
\catcode`\~=11 \catcode`\|=11
\bosphoros a| h| w|
\stoptext

* Re: Greek in luatex
From: Taco Hoekwater @ 2007-09-16 6:56 UTC
To: Mailing list for ConTeXt users

Arthur Reutenauer wrote:
> As far as I understand, this
> behaviour is actually compliant with the Opentype specifications and is
> quite widespread among typesetting engines and so it is not (only)
> Fontforge's fault; but, needless to say, it is nevertheless annoying.
> (In more crude terms: Opentype does not specify anything in that respect,
> so manufacturers of typesetting software can do whatever they want ...)

The specification says that lookups should be applied in LookupList
order. Feature files don't have an explicit ordering command, but that
does not mean that ordering should be irrelevant.

So I think this is a bug in the version of fontforge I am using in
luatex. I will do some testing.

Best wishes,
Taco

* Re: Greek in luatex
From: Taco Hoekwater @ 2007-09-16 8:22 UTC
To: Mailing list for ConTeXt users

Hi guys,

Try this ordering:

> lookup GreekBabelLookupMultiple {
> ...
> } GreekBabelLookupMultiple ;
>
> lookup GreekBabelLookupSimple {
> ...
> } GreekBabelLookupSimple ;
>

Best wishes,
Taco

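Why the order of the two lookups matters for input such as h| can be shown with a toy model in plain Lua. The sketch assumes lookups are applied one after another over the whole glyph run, in the order given, which is the LookupList behaviour discussed in this thread. The functions apply_single and apply_pairs, and the two small tables, are hypothetical stand-ins: the tables hold only an excerpt of the feature file, and the pair lookup handles two-glyph ligatures only. No real font engine or Fontforge code is involved.

```lua
-- Sketch only: glyph runs are arrays of glyph names.
local simple    = { h = "eta", a = "alpha", w = "omega" }   -- excerpt of the single substitutions
local ligatures = { ["h bar"] = "uni1FC3" }                 -- illustrative two-glyph ligature

-- Single substitution: replace each glyph that has an entry.
local function apply_single(glyphs, map)
    local out = { }
    for i, g in ipairs(glyphs) do
        out[i] = map[g] or g
    end
    return out
end

-- Ligature substitution for two-glyph sequences.
local function apply_pairs(glyphs, map)
    local out, i = { }, 1
    while i <= #glyphs do
        local nxt = glyphs[i+1]
        local lig = nxt and map[glyphs[i] .. " " .. nxt]
        if lig then
            out[#out+1] = lig
            i = i + 2
        else
            out[#out+1] = glyphs[i]
            i = i + 1
        end
    end
    return out
end

local input = { "h", "bar" }

-- Simple lookup first: "h" has already become "eta", so "h bar" never matches.
local simple_first = apply_pairs(apply_single(input, simple), ligatures)
-- Ligature lookup first: the pair is consumed before the single substitution runs.
local ligature_first = apply_single(apply_pairs(input, ligatures), simple)

print(table.concat(simple_first, " "))     -- eta bar
print(table.concat(ligature_first, " "))   -- uni1FC3
```

The extra grb2 feature in Arthur's file works around the first case from the other side: it adds rules written in terms of the already substituted glyph ("eta bar"), so they still match after the simple lookup has run.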
* Re: Greek in luatex 2007-09-16 8:22 ` Taco Hoekwater @ 2007-09-16 13:01 ` Thomas A. Schmitz 2007-09-16 23:12 ` Hans Hagen 2007-09-16 13:08 ` Arthur Reutenauer 1 sibling, 1 reply; 31+ messages in thread
From: Thomas A. Schmitz @ 2007-09-16 13:01 UTC
To: mailing list for ConTeXt users

Hi Arthur, Taco,

you're my heroes! Changing the order of the lookup tables in the .fea file actually took care of the problem. Thanks for looking into this, now I get the results I was expecting; every substitution is applied to the font! Once the initial lookup has been done, this is reasonably fast, too, so I like it. I'm eagerly waiting for the new release next week to see if this works with copy-and-paste from pdfs. So this appears to be one way to deal with ASCII input a la babel. Easy to implement, but fails on fonts that don't have the glyphs for the Latin characters.

One trivial question: when I want to experiment with feature files, the cached instance of the font seems to be in the way. Only after deleting the current luatex-cache, regenerating it and recompiling the format do I get proper results. Is there an easier/faster way to do this?

Will now go on and experiment some more, especially with type1/afm-based fonts.

Thanks a lot, best wishes

Thomas

On Sep 16, 2007, at 10:22 AM, Taco Hoekwater wrote:

> Hi guys,
>
> Try this ordering:
>
>> lookup GreekBabelLookupMultiple {
>> ...
>> } GreekBabelLookupMultiple ;
>>
>> lookup GreekBabelLookupSimple {
>> ...
>> } GreekBabelLookupSimple ;
* Re: Greek in luatex 2007-09-16 13:01 ` Thomas A. Schmitz @ 2007-09-16 23:12 ` Hans Hagen 0 siblings, 0 replies; 31+ messages in thread
From: Hans Hagen @ 2007-09-16 23:12 UTC
To: mailing list for ConTeXt users

Thomas A. Schmitz wrote:

> Hi Arthur, Taco,
>
> you're my heroes! Changing the order of the lookup tables in the .fea
> file actually took care of the problem. Thanks for looking into this,
> now I get the results I was expecting; every substitution is applied
> to the font! Once the initial lookup has been done, this is
> reasonably fast, too, so I like it. I'm eagerly waiting for the new
> release next week to see if this works with copy-and-paste from pdfs.
> So this appears to be one way to deal with ASCII input a la babel.
> Easy to implement, but fails on fonts that don't have the glyphs for
> the Latin characters.

arthur mentions the final sigma in the fea file .. can be a (part of) feature too (like fina)

> One trivial question: when I want to experiment with feature files,
> the cached instance of the font seems to be in the way. Only after
> deleting the current luatex-cache, regenerating it and recompiling
> the format do I get proper results. Is there an easier/faster way to
> do this?

bumping the version number of the otf handler will force this, but this is a bad idea; also, caching is fast because no file checking has to be done, so deleting cached files (just the one you test) is the price you pay when developing a font (fea) file

btw, the fea file can be part of the distribution (but we need to think of a naming scheme)

Hans
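Deleting only the cached data of the font under test, as suggested above, could be scripted along these lines. This is a rough sketch under loud assumptions: the thread does not say where the cache lives or how cached files are named, so `cache_dir` must be supplied by the user, and matching on the file stem is a guess. `purge_font_cache` is a hypothetical helper, not part of ConTeXt.

```python
from pathlib import Path


def purge_font_cache(cache_dir, font_stem):
    """Delete cached files whose name starts with font_stem (case-insensitive).

    Both the cache location and the naming scheme are assumptions; check
    what your installation actually writes before running this.
    """
    removed = []
    # Materialise the listing first so we do not delete while walking.
    for path in list(Path(cache_dir).rglob("*")):
        if path.is_file() and path.stem.lower().startswith(font_stem.lower()):
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

A call such as `purge_font_cache("/path/to/luatex-cache", "bosporosu")` would then leave the cached data of all other fonts in place, so only the font being developed is reloaded.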
* Re: Greek in luatex 2007-09-16 8:22 ` Taco Hoekwater 2007-09-16 13:01 ` Thomas A. Schmitz @ 2007-09-16 13:08 ` Arthur Reutenauer 2007-09-16 13:44 ` Thomas A. Schmitz 1 sibling, 1 reply; 31+ messages in thread
From: Arthur Reutenauer @ 2007-09-16 13:08 UTC
To: Mailing list for ConTeXt users

> Try this ordering:

Yes, it works. So Fontforge is sensitive to the order in which the lookups are defined in the file? Interesting ...

Thomas, you can try this but I have made a mistake in the Unicode code for omega with subscribed iota: it should be 1FF3 and not 1FD3.

Arthur
* Re: Greek in luatex 2007-09-16 13:08 ` Arthur Reutenauer @ 2007-09-16 13:44 ` Thomas A. Schmitz 0 siblings, 0 replies; 31+ messages in thread
From: Thomas A. Schmitz @ 2007-09-16 13:44 UTC
To: Mailing list for ConTeXt users

On Sep 16, 2007, at 3:08 PM, Arthur Reutenauer wrote:

> Yes, it works. So Fontforge is sensitive to the order in which the
> lookups are defined in the file? Interesting ...
>
> Thomas, you can try this but I have made a mistake in the Unicode code
> for omega with subscribed iota: it should be 1FF3 and not 1FD3.
>
> Arthur

Yep, I had already fixed that (and also replied to Taco's message, the context list is again a bit out of order today). Arthur, while we're at it: could you try and insert this line into the fea-file:

    sub quotedbl quotesingle i by un1FD3 ;

Whenever I try anything like this with the quotedbl character (which produces some ligatures), I get this error:

    </Users/tas/texmf/fonts/opentype/greek/bosporos/BosporosU.otf
    !luaTeX error (file /Users/tas/texmf/fonts/opentype/greek/bosporos/BosporosU.otf): Unexpected error: 255 != 256
    ==> Fatal error occurred, no output PDF file produced!

(Or similar errors with other fonts.) The mechanism for the single dieresis works:

    sub quotedbl i by uni03CA ;

but nothing with quotedbl + something else. Do you have any idea what triggers this error?

Best

Thomas
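Substitution rules of the shape quoted in this message can be generated from a table rather than typed by hand, which is roughly what the Perl script mentioned elsewhere in the thread is said to do. The following is a minimal sketch in Python, not a port of that script: `fea_rule`, `fea_lookup` and the small `GLYPH_NAMES` table are hypothetical, and letters are assumed to use their own character as glyph name.

```python
# Glyph names for the ASCII punctuation used in Babel-style input.
GLYPH_NAMES = {
    ">": "greater",
    "<": "less",
    '"': "quotedbl",
    "'": "quotesingle",
}


def glyph_name(ch):
    """Return the glyph name for one input character (letters map to themselves)."""
    return GLYPH_NAMES.get(ch, ch)


def fea_rule(sequence, codepoint):
    """Build one 'sub ... by uniXXXX ;' line for an input sequence."""
    names = " ".join(glyph_name(c) for c in sequence)
    return f"sub {names} by uni{codepoint:04X} ;"


def fea_lookup(name, table):
    """Wrap the rules for a whole table in a named lookup block."""
    lines = [f"lookup {name} {{"]
    lines += ["  " + fea_rule(seq, cp) for seq, cp in table.items()]
    lines.append(f"}} {name} ;")
    return "\n".join(lines)


print(fea_lookup("GreekBabelLookupMultiple", {'"i': 0x03CA, ">a": 0x1F00}))
```

Generating the code point with `:04X` formatting avoids hand-typed glyph names, which is where slips such as a wrong hexadecimal digit tend to creep in.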
* Re: Greek in luatex 2007-09-15 23:22 ` Arthur Reutenauer 2007-09-16 6:56 ` Taco Hoekwater 2007-09-16 8:22 ` Taco Hoekwater @ 2007-09-17 8:48 ` Hans Hagen 2 siblings, 0 replies; 31+ messages in thread
From: Hans Hagen @ 2007-09-17 8:48 UTC
To: Mailing list for ConTeXt users

Hi Arthur and Thomas,

i've put the greek file in the distribution (fea path), do we also need this babel stuff for "u and such?

we should start thinking about a set of predefined features

Hans
* Re: Greek in luatex 2007-09-13 7:03 ` Taco Hoekwater 2007-09-13 10:24 ` Arthur Reutenauer @ 2007-09-13 17:42 ` Hans Hagen 1 sibling, 0 replies; 31+ messages in thread
From: Hans Hagen @ 2007-09-13 17:42 UTC
To: mailing list for ConTeXt users

Taco Hoekwater wrote:

> Yes, except that we need a more powerful version (almost like OTPs) if
> we want to handle transcriptions properly. The vital point is that it
> should operate on tokens, not on nodes. I am not sure if Hans already
> has a hook there that can be extended.

there are hooks, but i want to avoid token processing as much as possible because it's slow (so, unlike with nodes, it definitely cannot be done on all the data); i must give it some thought ..

Hans
* Re: Greek in luatex 2007-09-13 1:15 ` Arthur Reutenauer 2007-09-13 7:03 ` Taco Hoekwater @ 2007-09-13 9:45 ` Thomas A. Schmitz 2007-09-13 10:49 ` Arthur Reutenauer 2007-09-13 17:51 ` Hans Hagen 1 sibling, 2 replies; 31+ messages in thread
From: Thomas A. Schmitz @ 2007-09-13 9:45 UTC
To: mailing list for ConTeXt users

Hi Arthur,

first of all: thank you so much for your time and your expertise! Your reply and your scripts really make things a lot clearer for me; this is a huge step forward! I'll have to experiment and think more about it, here's just a few reactions to some of your remarks:

On Sep 13, 2007, at 3:15 AM, Arthur Reutenauer wrote:

> Hello Thomas,
>
> I was waiting for someone else to answer your questions because I
> had no clue how to address them even if I was interested; but now I do,
> thanks to Hans' reply:
>
> For your general problem you need to define a new regime that will
> map each relevant character sequence to the corresponding Unicode
> character. That is, you inform ConTeXt that the character stream it sees
> is actually a way of coding another set of characters and that it can
> forget the original stream. This treatment should be done before any sort
> of font property intervenes, because it does not depend on the
> appearance of the typeset text. That's what regimes are for.

I agree that this would probably be the cleanest solution: since luatex has unicode support, map everything to the corresponding Unicode characters. This would also make hyphenation easier to achieve.

> Now I turn up to Hans to give us guidelines on how to define an
> advanced regime in Mark IV: Hans, what we need here is to replace
> sequences of characters by other characters, so the mapping is not
> one-to-one and it's more complicated than simple regimes defined by a
> table lookup; but I guess all we have to do is write a lua function that
> we could plug into the input stream reading routine (just like other
> regimes work).
>
> As far as the rest of Hans' reply is concerned (Opentype features and
> such), I would like to add that it is a very interesting and fascinating
> thing to do, but definitely not what you want here, for a lot of
> reasons: Opentype features can be used to alter the appearance of the
> text, but not the nature of characters themselves. That is, if you did
> the transformation of your input stream at the font level, you would
> actually tell ConTeXt that you are handling Latin characters with a
> special appearance (that the font takes care of), so for example, the
> underlying text in a PDF would be a stream of Latin characters, and
> copying-and-pasting would yield Latin characters, not Greek.

The question of copy-and-paste is one of the big mysteries, and I have no clue why it works in some cases, but not in others. Right now, on my system (OS X 10.4), only Adobe Reader 8.0 does copy-paste correctly, and it does it correctly no matter if I use babel or Unicode input. Never touch a running system: I just take this as some sort of divine favor and leave it at that...

> That is
> not what you want here: you want your "a" to be understood as "alpha"
> and your "less-than acute-sign w vertical-bar" to be considered an
> "omega with dasia, varia and subscribed iota". Nor should you think of
> these transformations as a collection of ligatures (which act at the
> font level), but rather as a text encoding, just like UTF-8 is an
> encoding of the Unicode characters: in UTF-8 the byte sequence
> "hexadecimal byte E1, hexadecimal byte BC, hexadecimal byte 80" is the
> coding for the Unicode character U+1F00 GREEK SMALL LETTER ALPHA WITH PSILI,
> and in the Babel input scheme for Ancient Greek the same character is
> encoded with the byte sequence "hexadecimal byte 3C [ASCII '<'],
> hexadecimal byte 61 [ASCII 'a']".

Yes, that's crystal clear. It would also take care of another problem: in the input stream, you know exactly which character sequence translates to what. On the font level, legacy fonts sometimes have their own ideas about where to put certain glyphs.

> Of course in the past, these transformations were handled at the font
> level and sequences like "< a" were actually ligatures, because that was
> all we had (and copypasting from a PDF was, mostly, doomed to fail); but
> we should not persist in that use now we can treat them as real Unicode
> characters.

Well yes, but see above.

> As for your other question in your original message from September 1st
> (remapping single characters, for example U+03C3 to U+03F2), I have to
> say first that I'm not very comfortable commenting on it since I'm not
> quite sure what the issues are here; it may be that you have a simple
> variant of some character, and this you should handle at font level
> (some glyph being transformed into some other one); but if I am to judge
> by the very example you gave, I would deem this should be a part of your
> input regime: indeed, if every sigma is to be mapped to lunate sigma,
> then it probably means that the lunate sigmas are part of your character
> stream (even if you didn't input it directly). But I really can't give
> any general advice here, especially because I don't actually know what a
> lunate sigma really is ;-) You would have to decide for yourself as a
> specialist of Greek if you're dealing with really different characters
> or simple font variants; in the former case you should handle the
> transformation as a part of your regime; in the latter, by defining a
> font feature like Hans demonstrated.

I guess that different sorts of users would respond differently. In Unicode, there's a different slot for some alternate characters, so the Unicode standard really considers them different characters. For the classicist, a sigma is a sigma, and the fact that it can be rendered as a "lunate" or a "normal" sigma is irrelevant. For me, this makes more sense, so I would support this on the font level.

> But for now, as long as it is understood that font tricks aren't the
> general solution for the problem at stake, I would like to demonstrate
> that it is still possible to do everything at font level :-)
>
> If you have a look at the attached greek-babel.tex (and the features
> definition file greek-babel.fea) you will see that (almost) everything
> is taken care of using Opentype substitutions. You need Bosporos and
> GFS Baskerville to compile the file; by the way, the line with GFS
> Baskerville is a further proof that you shouldn't handle the
> transformation at font level: can you explain why it doesn't work here?
> As a complement, I also attach the Perl script which I wrote to generate
> the .fea file.

Wonderful! I will look carefully at these files. I've been playing with perl and python all day yesterday for another problem, so I'm very much looking forward to studying your script.

Thanks so much, and all best

Thomas
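The "regime" idea discussed in this message, rewriting input sequences into Unicode characters before any font is involved, can be sketched as a longest-match replacement over the input text. This is an illustration only, not ConTeXt's regime mechanism: the table is a tiny assumed excerpt, `babel_to_unicode` is a hypothetical name, and `>a` is taken to stand for U+1F00 as in the first message of the thread (the quoted passage writes `<`, so the breathing marks should be checked against the actual Babel convention).

```python
# Assumed excerpt of a Babel-style transcription table.
BABEL_TO_UNICODE = {
    ">a": "\u1f00",  # alpha with psili
    "a": "\u03b1",   # alpha
    "w": "\u03c9",   # omega
}


def babel_to_unicode(text, table=BABEL_TO_UNICODE):
    """Replace transcription sequences by Unicode characters, longest match first."""
    longest = max(len(key) for key in table)
    out, i = [], 0
    while i < len(text):
        for n in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in table:
                out.append(table[chunk])
                i += n
                break
        else:
            # Not part of the transcription: pass the character through.
            out.append(text[i])
            i += 1
    return "".join(out)


converted = babel_to_unicode(">aw")
# The UTF-8 encoding of U+1F00, as quoted in the message: E1 BC 80.
utf8_bytes = "\u1f00".encode("utf-8")
```

Because the result is ordinary Unicode text, hyphenation patterns and PDF text extraction see Greek characters rather than Latin ones, which is the point made in the quoted passage.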
* Re: Greek in luatex 2007-09-13 9:45 ` Thomas A. Schmitz @ 2007-09-13 10:49 ` Arthur Reutenauer 2007-09-13 12:51 ` Thomas A. Schmitz 2007-09-13 14:25 ` Taco Hoekwater 2007-09-13 17:51 ` Hans Hagen 1 sibling, 2 replies; 31+ messages in thread
From: Arthur Reutenauer @ 2007-09-13 10:49 UTC
To: Mailing list for ConTeXt users

> Right
> now, on my system (OS X 10.4), only Adobe Reader 8.0 does copy-paste
> correctly, and it does it correctly no matter if I use babel or
> Unicode input.

You mean with LuaTeX? Copypasting isn't supported yet in LuaTeX so it's no surprise that it wouldn't work (for me Adobe Reader and Preview fail in two different ways). As for pdfTeX I leave that to Taco and others to answer.

But hyphenation is another important issue, maybe even clearer.

> I guess that different sorts of users would respond differently. In
> Unicode, there's a different slot for some alternate characters, so
> the Unicode standard really considers them different characters.

Actually, now I think about it, the name for U+03F2 has "symbol" in it, and that's a clear indication that the character is intended for "technical use", not for inputting Greek text; so your choice is consistent with the intent of the Standard.

> Wonderful! I will look carefully at these files. I've been playing
> with perl and python all day yesterday for another problem, so I'm
> very much looking forward to studying your script.

Somewhere in the middle of writing it, I realized that I should have written it in Lua :-) It wouldn't have been much different.

Arthur
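The remark about the name of U+03F2 can be checked with Python's standard `unicodedata` module, and the one-to-one remapping asked about at the start of the thread (U+03C3 to U+03F2) is a plain character translation. The sketch below shows the remapping at the text level only; whether it belongs there or in the font is exactly what the thread debates. `SIGMA_TO_LUNATE` and `to_lunate` are hypothetical names.

```python
import unicodedata

# One-to-one remapping of small sigma to the lunate sigma symbol.
SIGMA_TO_LUNATE = str.maketrans({"\u03c3": "\u03f2"})


def to_lunate(text):
    """Replace every U+03C3 in text by U+03F2."""
    return text.translate(SIGMA_TO_LUNATE)


sigma_name = unicodedata.name("\u03c3")
lunate_name = unicodedata.name("\u03f2")
print(sigma_name)
print(lunate_name)
```

The second name ends in SYMBOL, the first does not, which is the distinction the message draws on.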
* Re: Greek in luatex 2007-09-13 10:49 ` Arthur Reutenauer @ 2007-09-13 12:51 ` Thomas A. Schmitz 2007-09-13 14:25 ` Taco Hoekwater 1 sibling, 0 replies; 31+ messages in thread
From: Thomas A. Schmitz @ 2007-09-13 12:51 UTC
To: Mailing list for ConTeXt users

On Sep 13, 2007, at 12:49 PM, Arthur Reutenauer wrote:

> You mean with LuaTeX? Copypasting isn't supported yet in LuaTeX so
> it's no surprise that it wouldn't work (for me Adobe Reader and Preview
> fail in two different ways). As for pdfTeX I leave that to Taco and
> others to answer.
>
> But hyphenation is another important issue, maybe even clearer.

Yes, I meant in pdfTeX, sorry for being imprecise.

> Actually, now I think about it, the name for U+03F2 has "symbol" in
> it, and that's a clear indication that the character is intended for
> "technical use", not for inputting Greek text; so your choice is
> consistent with the intent of the Standard.

OK, good to hear that. I now realize that much of the stuff that I hacked together for use with pdfTeX worked by dumb luck; with luaTeX, I'll be forced to adhere to standards more closely. I guess that's a good thing...

> Somewhere in the middle of writing it, I realized that I should have
> written it in Lua :-) It wouldn't have been much different.

Yes, I'm hoping to look into lua as well.

Thanks so much!

Thomas
* Re: Greek in luatex 2007-09-13 10:49 ` Arthur Reutenauer 2007-09-13 12:51 ` Thomas A. Schmitz @ 2007-09-13 14:25 ` Taco Hoekwater 1 sibling, 0 replies; 31+ messages in thread
From: Taco Hoekwater @ 2007-09-13 14:25 UTC
To: Mailing list for ConTeXt users

Arthur Reutenauer wrote:

>> Right
>> now, on my system (OS X 10.4), only Adobe Reader 8.0 does copy-paste
>> correctly, and it does it correctly no matter if I use babel or
>> Unicode input.
>
> You mean with LuaTeX? Copypasting isn't supported yet in LuaTeX so
> it's no surprise that it wouldn't work (for me Adobe Reader and Preview
> fail in two different ways). As for pdfTeX I leave that to Taco and
> others to answer.

The next luatex release will finally have support for cut&paste when using opentype and truetype fonts. In pdftex, cut&paste for traditional type1 fonts was already present, and that will continue to work as it did (at least for the immediate future).

Best wishes,
Taco
* Re: Greek in luatex 2007-09-13 9:45 ` Thomas A. Schmitz 2007-09-13 10:49 ` Arthur Reutenauer @ 2007-09-13 17:51 ` Hans Hagen 1 sibling, 0 replies; 31+ messages in thread
From: Hans Hagen @ 2007-09-13 17:51 UTC
To: mailing list for ConTeXt users

Thomas A. Schmitz wrote:

>> For your general problem you need to define a new regime that will
>> map each relevant character sequence to the corresponding Unicode
>> character. That is, you inform ConTeXt that the character stream it sees
>> is actually a way of coding another set of characters and that it can
>> forget the original stream. This treatment should be done before any sort
>> of font property intervenes, because it does not depend on the
>> appearance of the typeset text. That's what regimes are for.

regimes are a solution, but what solution is best depends on the input stream ... whole document? partial document? also written to external files?

eventually everything can become unicode (private areas) and as such travel through the system; or we can misuse virtual fonts ...

>> we could plug into the input stream reading routine (just like other
>> regimes work).

there are mechanisms for that (because that's what i played a lot with last year); there was (maybe even is) a mechanism for chained processing of input etc

>> actually tell ConTeXt that you are handling Latin characters with a
>> special appearance (that the font takes care of), so for example, the
>> underlying text in a PDF would be a stream of Latin characters, and
>> copying-and-pasting would yield Latin characters, not Greek.

not entirely true ... we can (and do) intercept the node stream ... ok, at that point we're dealing with a font/char pair, but we can change the char (or node) to whatever we like ... depends on the problem

> The question of copy-and-paste is one of the big mysteries, and I
> have no clue why it works in some cases, but not in others. Right
> now, on my system (OS X 10.4), only Adobe Reader 8.0 does copy-paste
> correctly, and it does it correctly no matter if I use babel or
> Unicode input. Never touch a running system: I just take this as
> some sort of divine favor and leave it at that...

that's a matter of associating tounicode points, of course, no unicode means no copy/paste -)

>> That is
>> not what you want here: you want your "a" to be understood as "alpha"
>> and your "less-than acute-sign w vertical-bar" to be considered an
>> "omega with dasia, varia and subscribed iota". Nor should you think of
>> these transformations as a collection of ligatures (which act at the
>> font level), but rather as a text encoding, just like UTF-8 is an
>> encoding of the Unicode characters: in UTF-8 the byte sequence
>> "hexadecimal byte E1, hexadecimal byte BC, hexadecimal byte 80" is the
>> coding for the Unicode character U+1F00 GREEK SMALL LETTER ALPHA WITH PSILI,
>> and in the Babel input scheme for Ancient Greek the same character is
>> encoded with the byte sequence "hexadecimal byte 3C [ASCII '<'],
>> hexadecimal byte 61 [ASCII 'a']".
>
> Yes, that's crystal clear. It would also take care of another
> problem: in the input stream, you know exactly which character
> sequence translates to what. On the font level, legacy fonts
> sometimes have their own ideas about where to put certain glyphs.

depends ... the input char becomes a node, now, if (probably controlled by attributes) a certain char is seen (say 'a') and you want it to be an alpha, well, we can change that char then in the node

>> Of course in the past, these transformations were handled at the font
>> level and sequences like "< a" were actually ligatures, because that was
>> all we had (and copypasting from a PDF was, mostly, doomed to fail); but
>> we should not persist in that use now we can treat them as real Unicode
>> characters.

those hard coded mechanisms were indeed not sufficient

Hans
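The node-level alternative described in this message, changing the character stored in a node when an attribute says the text is Greek, can be pictured with a toy model. This is purely conceptual and does not use the LuaTeX node library: `Glyph`, its `greek` flag (a stand-in for an attribute) and `remap_nodes` are all hypothetical, and the mapping table is an assumed excerpt.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    font: int            # font identifier of the node
    char: int            # character code stored in the node
    greek: bool = False  # stand-in for an attribute marking Greek transcription


# Assumed excerpt: Latin transcription letters and their Greek code points.
LATIN_TO_GREEK = {ord("a"): 0x03B1, ord("w"): 0x03C9}


def remap_nodes(nodes, table=LATIN_TO_GREEK):
    """Rewrite the char of flagged nodes in place; unflagged nodes are untouched."""
    for node in nodes:
        if node.greek and node.char in table:
            node.char = table[node.char]
    return nodes


nodes = remap_nodes([Glyph(1, ord("a"), greek=True), Glyph(1, ord("a"))])
```

Since the node then carries a Greek code point, later stages see Greek text even though the input was ASCII; the flag keeps ordinary Latin text in the same paragraph from being rewritten.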