Good evening, Jannis!

On 2011-11-01 17:16, Jan Heinen wrote:
> Today I wrote the function "ConvertToConteXt" which converts special
> ConTeXt-characters. You can see it below.

The data necessary for converting HTML entities is already in ConTeXt;
have a look at “char-ent.lua” if you are interested. Based on this you
could write the deentitizer (== your “html_entity_decode” function?)
plus the character handler in “pure” ConTeXt (no PHP needed) as follows:

··· deent.cld ···················································
thirddata             = thirddata or { }
thirddata.myfunctions = thirddata.myfunctions or { }
local myfunctions     = thirddata.myfunctions

local entities         = characters.entities -- entity name -> character (char-ent.lua)
local utf8byte         = unicode.utf8.byte
local lpegmatch        = lpeg.match
local P, R, S, Cs      = lpeg.P, lpeg.R, lpeg.S, lpeg.Cs
local fmt, stringupper = string.format, string.upper

do
  -- templates for the replacement text; the braces delimit the number
  local s_hex = [[{\char"%s}]]
  local s_dec = [[{\char%s}]]

  -- &#x7d; -> {\char"7D}
  local replace_hex_entity = function (hexnum)
    return fmt(s_hex, stringupper(hexnum))
  end

  -- &#125; -> {\char125}
  local replace_dec_entity = function (decimal)
    return fmt(s_dec, decimal)
  end

  -- named entity -> code point of the character listed in characters.entities
  local replace_named_entity = function (name)
    return fmt(s_dec, utf8byte(entities[name]))
  end

  -- literal character with special meaning -> its code point
  local replace_unsafe = function (char)
    return fmt(s_dec, utf8byte(char))
  end

  --local backslash    = P[[\]]
  --local escaped      = backslash / "" * 1
  local semicolon      = P";"
  local ucase_letter   = R"AZ"
  local lcase_letter   = R"az"
  local decimal_digit  = R"09"
  local decimal_number = decimal_digit^1
  local hex_digit      = decimal_digit + R"AF" + R"af"
  local hex_number     = hex_digit^1
  local entity_char    = ucase_letter + lcase_letter + decimal_digit
  local entity_chars   = entity_char^1

  -- the delimiters (&#x, &#, &, ;) are dropped, the payload is replaced
  local entity = (P"&#x" / "") * (hex_number     / replace_hex_entity)   * (semicolon / "")
               + (P"&#"  / "") * (decimal_number / replace_dec_entity)   * (semicolon / "")
               + (P"&"   / "") * (entity_chars   / replace_named_entity) * (semicolon / "")

  local unsafe = S[[{}\$~%]] / replace_unsafe

  --local p_characters = Cs((escaped + unsafe + entity + 1)^0)
  local p_characters = Cs((unsafe + entity + 1)^0)

  myfunctions.convert_to_context = function (str)
    return lpegmatch(p_characters, str)
  end
end

--- Testing ...
local someinput = [[ a º b • c ° d B e Ł f } g } h &non-well-formed; i { j } k { l \ m $ n + o - p ^ q _ r @ s ` t ~ u ! v % w « x ]]

context.starttext()
context(myfunctions.convert_to_context(someinput))
context.stoptext()
·································································

> 3. Did I forget to convert a character?

Most of the characters you substituted have no special semantics in the
first place; the code above escapes only the “unsafe” ones ({ } \ $ ~ %).

Philipp

> Before I put it into contextgarden.net ...
> 1. ... please test it.
> 2. You see three characters, where I don't know the code-number
>    \char??? for ConTeXt. Do you know them?
> 3. Did I forget to convert a character?
>
> Regards
> Jannis
>
>
>
> function ConvertToConteXt ( $xstring ) {
>     /* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>      *
>      * author: Jörg Kopp
>      * www.dr-kopp.com
>      * 01.11.2011
>      *
>      * Convert special ConTeXt-characters with php
>      * Works with PHP5
>      *
>      * Call it with the string you want to convert ...
>      *   ConvertToConteXt ($xstring);
>      *
>      * ... and you get back the converted string
>      *
>      * e.g.:
>      * Input:
>      *   $string = "My root-Directory: /home/hans";
>      *   $string = ConvertToConteXt ( $string );
>      *
>      * Output/Return:
>      *   $string = "My root\\char45Directory\\char58 \\char47home\\char47hans";
>      *
>      * When you write this into a file ...
>      *   file_put_contents ( "example.tex", "My root\\char45Directory\\char58 \\char47home\\char47hans", FILE_APPEND );
>      *
>      * ... you will find the following in example.tex:
>      *   My root\char45Directory\char58 \char47home\char47hans
>      *
>      * And when you compile example.tex with ConTeXt
>      *   context example.tex
>      *
>      * you can read the following in the resulting example.pdf:
>      *   My root-Directory: /home/hans
>      *
>      * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
>
>     $xstring = html_entity_decode ( $xstring ); // convert HTML-entities into normal characters
>     // The backslash has to be replaced first; otherwise the backslashes
>     // inserted by the replacements below would be replaced as well.
>     $xstring = str_replace ( "\\", "\\char92", $xstring ); // backslash
>     $xstring = str_replace ( "!", "\\char33", $xstring ); // exclamation mark
>     $xstring = str_replace ( "\"", "\\char34", $xstring ); // quotation mark
>     $xstring = str_replace ( "#", "\\char35", $xstring ); // number sign
>     $xstring = str_replace ( "$", "\\char36", $xstring ); // dollar sign
>     $xstring = str_replace ( "%", "\\char37", $xstring ); // percent sign
>     $xstring = str_replace ( "&", "\\char38", $xstring ); // ampersand
>     $xstring = str_replace ( "'", "\\char39", $xstring ); // apostrophe
>     $xstring = str_replace ( "(", "\\char40", $xstring ); // left parenthesis
>     $xstring = str_replace ( ")", "\\char41", $xstring ); // right parenthesis
>     $xstring = str_replace ( "*", "\\char42", $xstring ); // asterisk
>     $xstring = str_replace ( "+", "\\char43", $xstring ); // plus sign
>     $xstring = str_replace ( ",", "\\char44", $xstring ); // comma
>     $xstring = str_replace ( "-", "\\char45", $xstring ); // hyphen
>     $xstring = str_replace ( ".", "\\char46", $xstring ); // period
>     $xstring = str_replace ( "/", "\\char47", $xstring ); // slash
>     $xstring = str_replace ( ":", "\\char58", $xstring ); // colon
>     $xstring = str_replace ( ";", "\\char59", $xstring ); // semicolon
>     $xstring = str_replace ( "<", "\\char60", $xstring ); // less-than
>     $xstring = str_replace ( "=", "\\char61", $xstring ); // equals sign
>     $xstring = str_replace ( ">", "\\char62", $xstring ); // greater-than
>     $xstring = str_replace ( "?", "\\char63", $xstring ); // question mark
>     $xstring = str_replace ( "@", "\\char64", $xstring ); // at sign
>     $xstring = str_replace ( "[", "\\char91", $xstring ); // left square bracket
>     $xstring = str_replace ( "]", "\\char93", $xstring ); // right square bracket
>     $xstring = str_replace ( "^", "\\char94", $xstring ); // caret
>     $xstring = str_replace ( "_", "\\char95", $xstring ); // underscore
>     //$xstring = str_replace ( "°", "\\char", $xstring ); // degree sign <------ missing
>     $xstring = str_replace ( "`", "\\char96", $xstring ); // grave accent
>     $xstring = str_replace ( "{", "\\char123", $xstring ); // left curly brace
>     $xstring = str_replace ( "|", "\\char124", $xstring ); // vertical bar
>     $xstring = str_replace ( "}", "\\char125", $xstring ); // right curly brace
>     $xstring = str_replace ( "~", "\\char126", $xstring ); // tilde
>     //$xstring = str_replace ( "•", "\\char", $xstring ); // ? <------ missing
>     //$xstring = str_replace ( "º", "\\char", $xstring ); // ? <------ missing
>
>     return $xstring;
> }
>
>
> ___________________________________________________________________________________
> If your question is of interest to others as well, please add an entry to the Wiki!
>
> maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
> webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
> archive  : http://foundry.supelec.fr/projects/contextrev/
> wiki     : http://contextgarden.net
> ___________________________________________________________________________________

-- 
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments