Hi list,

I am trying to parse an RSS feed with OCaml-RSS. It uses XML-Light, which does not support CDATA sections, so I added support to the ocamllex-based lexer as follows:

  let ends_sq = [^']']* ']'
  let ends_sq_sq = ends_sq ([^']'] ends_sq)* ']'+
  let ends_sq_sq_ang = ends_sq_sq ([^'>'] ends_sq_sq)* '>'

or expanded:

  let ends_sq_sq_ang = (([^']']*']') ([^']'] ([^']']*']'))* ']'+) ([^'>'] (([^']']*']') ([^']'] ([^']']*']'))* ']'+))* '>'

  rule token = parse
  [...]
          | "<![CDATA[" (ends_sq_sq_ang as data)
  [...]

Here ends_sq_sq_ang is supposed to match a string that ends at the first ]]> but may otherwise contain ] and > characters. (The ']'+ bits are supposed to handle the "]]]>" case.) If I give it an input like "foo]]]>bar]]>" (note the extra square bracket after foo), ocamllex matches the whole input instead of just "foo]]]>" as I would expect. Micmatch, given the same regexp, matches only "foo]]]>".
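
To make the expected behaviour concrete, here is a small plain-OCaml sketch of what I want the pattern to return. It does not use ocamllex or XML-Light, and cdata_prefix is just a hypothetical helper name I made up for illustration: it returns the shortest prefix of its argument that ends in "]]>".

```ocaml
(* Hypothetical helper, not part of XML-Light or OCaml-RSS: return the
   shortest prefix of [s] that ends in "]]>", or None if "]]>" never
   occurs in [s]. *)
let cdata_prefix (s : string) : string option =
  let n = String.length s in
  let rec scan i =
    if i + 3 > n then None
    else if String.sub s i 3 = "]]>" then Some (String.sub s 0 (i + 3))
    else scan (i + 1)
  in
  scan 0

let () =
  match cdata_prefix "foo]]]>bar]]>" with
  | Some p -> print_endline p   (* prints foo]]]> *)
  | None -> print_endline "no terminator found"
```

That is, for the input above I would like the lexer to stop after "foo]]]>" and leave "bar]]>" to be lexed as ordinary text.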

I have probably done something stupid and am embarrassing myself by advertising it to the list, but I did check it carefully. Any idea why this doesn't work? Thanks,

Jake