Hi list,
I am trying to parse an RSS feed using OCaml-RSS. It relies on XML-Light, which does not support CDATA sections, so I added support to the ocamllex-based lexer as follows:
let ends_sq = [^']']* ']'
let ends_sq_sq = ends_sq ([^']'] ends_sq)* ']'+
let ends_sq_sq_ang = ends_sq_sq ([^'>'] ends_sq_sq)* '>'
or, with the definitions expanded inline:
let ends_sq_sq_ang = (([^']']*']') ([^']'] ([^']']*']'))* ']'+) ([^'>'] (([^']']*']') ([^']'] ([^']']*']'))* ']'+))* '>'
rule token = parse
[...]
| "<![CDATA[" (ends_sq_sq_ang as data)
[...]
Here ends_sq_sq_ang is supposed to match a string that ends at the first "]]>" and may otherwise contain ] and > characters. The ']'+ parts are there to handle the "]]]>" case. If I give it an input like "foo]]]>bar]]>" (note the extra square bracket after foo), ocamllex matches the whole input instead of just "foo]]]>" as I would expect. Micmatch, given the same regexp, does the right thing.
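To make the expected result concrete, here is a small plain-OCaml sketch of what I want the rule to accept: everything up to and including the first "]]>". The name cdata_prefix is made up for this example and is not part of XML-Light or OCaml-RSS. It only pins down the intended behaviour and is not meant as a replacement for the lexer rule.

```ocaml
(* Hypothetical helper for illustration only: return the prefix of [s]
   up to and including the first "]]>", or None if [s] has no "]]>". *)
let cdata_prefix (s : string) : string option =
  let n = String.length s in
  let rec scan i =
    if i + 3 > n then None
    else if s.[i] = ']' && s.[i + 1] = ']' && s.[i + 2] = '>' then
      (* The terminator starts at index i, so the match is s[0 .. i+2]. *)
      Some (String.sub s 0 (i + 3))
    else scan (i + 1)
  in
  scan 0

let () =
  (* The input from this post, with the extra ']' after "foo". *)
  match cdata_prefix "foo]]]>bar]]>" with
  | Some m -> print_endline m
  | None -> print_endline "no match"
```

On the input above this yields "foo]]]>", which is the match I expected from the ocamllex rule.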
I have probably done something stupid and am embarrassing myself by advertising it to the list, but I did check it carefully. Any idea why this doesn't work? Thanks,
Jake