caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] Rope.of_string looses characters
@ 2011-07-21  0:13 pjfrey
  2011-07-21  2:37 ` Edgar Friendly
  0 siblings, 1 reply; 2+ messages in thread
From: pjfrey @ 2011-07-21  0:13 UTC (permalink / raw)
  To: caml-list

I am encountering a very baffling problem with Batteries:
BatRope.of_string looses characters when converting strings longer then
about 250 characters. This is so hard to imagine that I have written a
small demo program:

  let filename =  (try ((Sys.argv.(1))) with _ -> "chinese.txt") in
  let size = int_of_string (try ((Sys.argv.(2))) with _ -> "0") in
		(*  read some file into a string; this works fine *)
  let fd_in = BatFile.open_in filename in
  let fs = BatFile.size_of filename in
  let file_size = if size = 0 then fs else (min size fs) in
  let filestring = String.create file_size in
  let _ = ignore (BatIO.really_input fd_in filestring 0 file_size) in
  BatIO.close_in fd_in;
	  (* convert string to rope and back; rope MISSES CHARACTERS *)
  let rope = BatRope.of_string filestring in
  let rLen = BatRope.length rope in
  let reconverted = BatRope.to_string rope in
  let sLen = String.length reconverted in printf
   "len text:%i BatRope.of_string len:%i BatRope.to_string len:%i in" 
	   file_size		     rLen		      sLen;
  printf"Number of missing characters after BatRope.to_string:%i
in" (file_size-sLen);
  let x = filestring in let size' = sLen in let x' = reconverted in
  let rec show ix dropped lastError =
    if (ix - dropped ) = size' then () 
    else begin
      if x.[ix] <> x'.[ix - dropped] then begin
	printf"%i,%i\t%3i\n" ix dropped (ix - lastError); 
	show (succ ix) (succ dropped) ix
      end else 
	(* printf"%c" x.[ix];  *)
	show (succ ix) dropped lastError
    end  
  in show 0 0 0

./a.out ISO-8859-1 10000
len text:5000 BatRope.of_string len:4981 BatRope.to_string len:4981
Number of missing characters after BatRope.to_string:19
256,0	256
513,1	257
770,2	257
1030,3	260
1284,4	254
1548,5	264
1802,6	254
2057,7	255
2312,8	255
2569,9	257
2826,10 257
3083,11 257
3340,12 257
3597,13 257
3860,14 263
4111,15 251
4368,16 257
4625,17 257
4882,18 257

The list above shows the indexes of the dropped characters, the count of
dropped characters and the difference in the index between occurences of
misses.
With chinese text it looses entire characters so the list above looks
like:
738,0	738
739,1	  1
740,2	  1
1400,3	660
1401,4	  1
1402,5	  1
2024,6	622
2653,7	629
3362,8	709
3363,9	  1
3364,10   1
... thus it looses complete 3-byte characters, as expected.

It appears that BatRope.of_string splits the file into chunks, that are
not correctly re_assembled, but that would affect a lot of other code...

Please, somebody tell me I am imagining things and there is a silly
error in above code.

Peter Frey

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [Caml-list] Rope.of_string looses characters
  2011-07-21  0:13 [Caml-list] Rope.of_string looses characters pjfrey
@ 2011-07-21  2:37 ` Edgar Friendly
  0 siblings, 0 replies; 2+ messages in thread
From: Edgar Friendly @ 2011-07-21  2:37 UTC (permalink / raw)
  To: caml-list

On 07/20/2011 08:13 PM, pjfrey@sympatico.ca wrote:
> I am encountering a very baffling problem with Batteries:
> BatRope.of_string looses characters when converting strings longer then
> about 250 characters. This is so hard to imagine that I have written a
> small demo program:
> Please, somebody tell me I am imagining things and there is a silly
> error in above code.
>
Not your fault - mine.  Apparently the rope code is less tested than 
thought.  Bug fixed by git commit d88dca5.  Time to add a bunch more tests.

E.

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2011-07-21  2:37 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-21  0:13 [Caml-list] Rope.of_string looses characters pjfrey
2011-07-21  2:37 ` Edgar Friendly

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).