* Re: Unsigned integers?
@ 2000-03-23 19:42 Damien Doligez
0 siblings, 0 replies; 14+ messages in thread
From: Damien Doligez @ 2000-03-23 19:42 UTC (permalink / raw)
To: caml-list; +Cc: maxs
>From: Max Skaller <maxs@in.ot.com.au>
>I would be happy to replace, in this code,
>every use of 'lor', 'land', + - * < etc with
>'ulor' 'uland' 'uplus' 'uminus' 'uless' etc, if only
>I could define them. (I could do this in C .. but then,
>I could write the below routines in C too)
For ulor, uland, uplus, uminus, umult, as well as lsr and lsl, they
are identical to their signed counterparts, so you don't need to do
anything.
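A minimal sketch of what this means in practice. The names uplus, uminus, umult, ulor and uland are Max's hypothetical ones, not library functions. OCaml ints wrap around in two's complement, so the "unsigned" versions can simply be aliases for the built-in operators; only the reading of the bits changes (w below stands for the int width, Sys.int_size).

```ocaml
(* Hypothetical unsigned operators, named after Max's wish list.  With
   two's-complement wraparound they yield the same bits as the signed
   operators, so they can simply be aliases. *)
let uplus = ( + )
let uminus = ( - )
let umult = ( * )
let ulor = ( lor )
let uland = ( land )

let () =
  (* Unsigned reading: max_int + 1 = 2^(w-1), whose bit pattern is min_int. *)
  assert (uplus max_int 1 = min_int);
  (* 0 - 1 wraps to all ones, the largest unsigned value. *)
  assert (uminus 0 1 = -1);
  (* lsr shifts in zeros, so it already is the unsigned right shift. *)
  assert ((-1) lsr 1 = max_int)
```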
For uless, since you are only ever comparing to a positive constant
less than max_int, I suggest replacing "if i < constant" with
"if 0 <= i && i < constant".
>Note these operations MUST be extremely fast,
If the above works, I doubt you can go any faster. For more complex
code, you may have to use a full-blown unsigned comparison:
(not tested; could be wrong)
let uless x y = if (x < 0) = (y < 0) then x < y else x > y;;
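A sketch of both suggestions side by side. uless is Damien's definition; the helper names below_const and uless' are ours, for illustration only. The bounded test works because every negative int reads as an unsigned value of at least 2^(w-1), which exceeds any constant c <= max_int. The second comparison is a common alternative rather than something proposed in the thread: adding min_int flips the top bit of both operands, which maps unsigned order onto signed order.

```ocaml
(* Unsigned "i < c" for a constant c in 1 .. max_int: negative ints are
   unsigned values >= 2^(w-1) > c, so reject them first. *)
let below_const i c = 0 <= i && i < c

(* Damien's general unsigned comparison: operands of the same sign order
   the same way signed or unsigned; otherwise the negative one is larger. *)
let uless x y = if (x < 0) = (y < 0) then x < y else x > y

(* Alternative: flip the top bit of both operands (the addition wraps),
   turning unsigned order into signed order. *)
let uless' x y = x + min_int < y + min_int

let () =
  assert (below_const 0x41 0x80 && not (below_const (-1) 0x80));
  assert (uless 1 (-1) && not (uless (-1) 1));  (* -1 is the largest *)
  assert (uless max_int min_int && uless' max_int min_int)
```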
The only difficulty would be with division and modulo, as noted by
Xavier, but I gather you don't need them for this application.
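Should unsigned division ever be needed, one way to build it from signed operations is the usual shift-then-correct trick. This is our addition, not something proposed in the thread. The sketch assumes a nonzero divisor (a zero divisor raises Division_by_zero from the inner /); udiv and urem are hypothetical names.

```ocaml
(* Unsigned comparison, as in Damien's message. *)
let uless x y = if (x < 0) = (y < 0) then x < y else x > y

(* Unsigned division of n by d (both read as unsigned). *)
let udiv n d =
  if d < 0 then
    (* d >= 2^(w-1) as unsigned, so the quotient can only be 0 or 1. *)
    (if uless n d then 0 else 1)
  else begin
    (* Halve n with a logical shift so the signed division is safe, then
       double: q is either the true quotient or one less than it. *)
    let q = ((n lsr 1) / d) lsl 1 in
    let r = n - q * d in
    if uless r d then q else q + 1
  end

(* Unsigned remainder, derived from the quotient. *)
let urem n d = n - udiv n d * d
```

For ordinary non-negative operands this agrees with / and mod; the extra work only matters when an operand has its top bit set.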
-- Damien
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Syntax for label, NEW PROPOSAL
@ 2000-03-15 13:58 Pierre Weis
2000-03-16 2:55 ` Jacques Garrigue
0 siblings, 1 reply; 14+ messages in thread
From: Pierre Weis @ 2000-03-15 13:58 UTC (permalink / raw)
To: Jacques Garrigue; +Cc: caml-list
[Sorry, no French version for this long message]
Abstract:
A long answer to Jacques's proposal. I do not discuss syntax but
semantic issues of the label extension. My conclusion is that we should
be very careful when adding labels to the standard libraries, and I
state as an extremely desirable design guideline that the use of
higher-order functions be kept as simple as possible.
> *** Proposal
>
> Objective Caml 3.00 is not yet released, and I believe we can still
> have modifications on this point.
Yes, you're perfectly right, we can still modify several points.
However, I think there are many other points that are more important
than the choice of ``%'' instead of ``:'', which is only cosmetic
after all.
Thus, I would prefer to discuss deeper and more semantic problems:
-- Problem1: labels can be reserved keywords. This is questionable
and it has been strongly criticised by some Caml users, especially when
reading in the code the awful sequence fun:begin fun ...
-- Problem2: labels that spread all over the standard libraries, even
when they do no good. I would cite:
* the labels completely redundant with the types
(E.g. char:char in the type of String.contains or String.index)
* undesired labels: in many cases I don't want to have labels just
because I don't want to remember their names. (E.g., I very often
misspell the label acc since I've always used accu to name an
accumulator; furthermore, when I do not misspell this label, I find
acc:accu extremely verbose.) Labels are also verbose at
application.
* labels that prevent you from comfortably using your traditional functions.
This is particularly evident for the List.map or List.fold_right
higher-order functionals.
This last point is a real problem. Compare the usual way of using
functionals to define the sum of the elements of a list:
$ ocaml
Objective Caml version 2.99+10
# let sum l = List.fold_right ( + ) l 0;;
val sum : int list -> int = <fun>
Clearly application is denoted in ML with only one character: a space.
Now, consider using the so-called ``Modern'' versions of these
functionals, obtained with the -modern option of the compiler:
$ ocamlpedantic
Objective Caml version 2.99+10
# let sum l = List.fold_right ( + ) l 0;;
^^^^^
This expression has type int -> int -> int but is here used with type 'a list
Clearly, there is something wrong now! We may remark that the error
message is not that clear, but this is a minor point, since error
messages are never clear enough anyway!
The real problem is that fixing the code does nothing at all for its
readability (at least that's what I would say):
# let sum l = List.fold_right fun:begin fun x acc:y -> x + y end acc:0;;
val sum : 'a -> int list -> int = <fun>
It seems that, in the ``modern'' mode, application of higher order
functions is now denoted by a new kind of parens opening by
``fun:begin fun'' and ending by ``end''. This is extremely explicit
but also a bit heavy (in my mind).
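For comparison, a sketch in the label syntax of present-day OCaml, which postdates this thread (~label: instead of the 2.99 label:, with the labelled variants kept in a separate ListLabels module). There, the functional and the initial accumulator are labelled (~f, ~init), but the callback itself is an ordinary unlabelled function, so ( + ) can be passed directly.

```ocaml
(* Pierre's definition, unchanged. *)
let sum l = List.fold_right ( + ) l 0

(* Labelled variant from the standard library's ListLabels module:
   ~f and ~init are labelled, the callback's own arguments are not. *)
let sum_labelled l = ListLabels.fold_right ~f:( + ) l ~init:0

let () = assert (sum [1; 2; 3] = 6 && sum_labelled [1; 2; 3] = 6)
```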
For all these reasons, I would suggest using labels carefully in
the standard libraries:
-- remove labels from higher-order functionals
-- remove redundant labels: when no ambiguity can occur you need not
add a label.
-- use labels when typechecking ambiguity is evident (for instance
when there are two or more parameters with the same type).
Labels must improve the readability of code or help document the
libraries; they should not be an extra burden to the programmer or a
way of obfuscating code.
Evidently, as with any other extension, labels must not obfuscate the
overall picture, that is they must not clobber the semantics, nor add
extra exceptional cases to the few general rules we have for the
syntax and semantics of Caml.
In this respect, optional labelled arguments might also be discussed,
particularly for the following facts:
-- syntactically identical patterns and expressions now may have
incompatible types:
# let f ?style:x _ = x;;
val f : ?style:'a -> 'b -> 'a option = <fun>
As a pattern on the left-hand side x has type 'a, while as an
expression on the right hand side it has type 'a option
-- some expressions can be only written as arguments in an application
context:
# let f ?style:x g = ?style:x;;
^
Syntax error
# let f ?style:x g = g ?style:x;;
val f : ?style:'a -> (?style:'a -> 'b) -> 'b = <fun>
-- the simple addition of a default value to an optional argument may
trigger a typechecking error:
# let f ?(style:x) g = g ?style:x;;
val f : ?style:'a -> (?style:'a -> 'b) -> 'b = <fun>
# let f ?(style:x = 1) g = g ?style:x;;
This expression has type int but is here used with type 'a option
Do not forget the design decision that has always been used before in
the development of Caml: interesting but not universal extensions to
the language must carefully be kept orthogonal to the core language
and its libraries. This has been successfully achieved for the
important addition of modules (that do not prevent the users from
using the old interface-implementation view of modules) as well as for
the object system, which has also been kept orthogonal
to the rest of the language (in particular the standard library has
never been ``objectified''). I don't know of any reason why labels
cannot follow the same safe guidelines.
> Here is an alternative proposal, to use `%' in place of `:'. Labels
> are kept as a lexical entity. This still breaks some programs, since
> `%' was registered as infix, but this is not so bad.
> Con:
> * I still think that `:' looks better, particularly inside types.
> * On my keyboard I can type in `:' without pressing shift :-)
> * We will need some tool to convert existing code.
I think that % should be the infix integer modulo symbol.
> Do you think it would be better?
No.
> Are there people around who would rather keep `:' ?
Yes. However this is syntax and we have to consider semantics in the
first place.
There are also people around that would like to keep Caml a true
functional language, where the use of higher-order functions is easy and
natural. We have to be careful not to lose what is the actual
strength of the language.
--
Pierre Weis
INRIA, Projet Cristal, http://pauillac.inria.fr/~weis
* Re: Syntax for label, NEW PROPOSAL
2000-03-15 13:58 Syntax for label, NEW PROPOSAL Pierre Weis
@ 2000-03-16 2:55 ` Jacques Garrigue
2000-03-21 22:22 ` Unsigned integers? John Max Skaller
0 siblings, 1 reply; 14+ messages in thread
From: Jacques Garrigue @ 2000-03-16 2:55 UTC (permalink / raw)
To: Pierre.Weis; +Cc: caml-list
From: Pierre Weis <Pierre.Weis@inria.fr>
> > *** Proposal
> >
> > Objective Caml 3.00 is not yet released, and I believe we can still
> > have modifications on this point.
>
> Yes, you're perfectly right, we can still modify several points.
> However, I think there are many other points that are more important
> than the choice of ``%'' instead of ``:'', which is only cosmetic
> after all.
Well, I think I've got enough reactions to rule out %.
These are the kind of small syntactic details you want to be sure
about, and they are very hard to change afterwards.
> Thus, I would prefer to discuss deeper and more semantic problems:
>
> -- Problem1: labels can be reserved keywords. This is questionable
> and it has been strongly criticised by some Caml users, especially when
> reading in the code the awful sequence fun:begin fun ...
Well, maybe just because I've grown used to it, I do not find it awful
at all.
The rationale behind allowing the use of keywords as labels is that
there are lots of keywords in Caml, and many of them are good
candidates for labels. Not allowing them often makes the choice more
difficult, or forces the use of longer names.
On the other hand, their particular use in the standard library may be
another subject of discussion.
If I remember correctly, there are only 2 keywords used as label in
the standard library: `fun' (in functionals) and `to' (in output
functions).
If we want to remove them, I think a good policy would be to change
the "default" length of labels from 3 to 4. Since many Caml keywords
are of length 3, we avoid lots of conflicts!
This would mean:
fun -> func
to -> chan or dest
acc -> accu :-)
src -> orig (src meant a lot to a now disappeared company)
dst -> dest (maybe because I'm French, dst reminds me of something)
buf -> buff ???
...
Still it seems reasonable to keep pos: and len:.
Beware however that label changes are inherently dangerous: you often
realize afterwards that a choice was not so good, and I really hoped
that we could definitively fix the standard library labels in 3.00.
> -- Problem2: labels that spread all over the standard libraries, even
> when they do no good.
Well, the current policy (as defined in the manual) is to put labels
on at least all arguments but one (with some very special exceptions
like printf). This is necessary to allow commuting between arguments,
which is also an important theme of labels. As such I would not say
that they do not do any good.
> I would cite:
>
> * the labels completely redundant with the types
> (E.g. char:char in the type of String.contains or String.index)
Following the above policy, I would say that these are not ideal
choices, but then what other possibility is there?
Moreover, even if this label is redundant in types, it is not
necessarily so in code.
> * undesired labels: in many cases I don't want to have labels just
> because I don't want to remember their names. (E.g., I very often
> misspell the label acc since I've always used accu to name an
> accumulator; furthermore, when I do not misspell this label, I find
> acc:accu extremely verbose.) Labels are also verbose at
> application.
Isn't there a classic mode for that?
If you think that labels are verbose, you can use it, and it even
provides some partial support for label checking when you want it.
I suppose we should have a precise idea of what labels are there for.
For me they are a systematic mechanism, and the fact that we allow one
argument in a function type to be unlabelled is just a small comfort,
particularly useful when working with functionals.
> * labels that prevent you from comfortably using your traditional functions.
> This is particularly evident for the List.map or List.fold_right
> higher-order functionals.
>
> This last point is a real problem. Compare the usual way of using
> functionals to define the sum of the elements of a list:
>
> # let sum l = List.fold_right ( + ) l 0;;
> val sum : int list -> int = <fun>
Whether you consider this readable or not is a question of taste. I
personally believe that you need quite a background in HO formalisms
to understand the above. But of course most current Caml users have
this background.
> # let sum l = List.fold_right ( + ) l 0;;
> ^^^^^
> This expression has type int -> int -> int but is here used with type 'a list
Aargh, that's right, error messages are a pain. I believe I worked a
lot making them informative, but there are still difficult situations.
I'll try to improve this.
Remark however that, in modern mode, it is already strange to have an
application with no labels at all...
> The real problem is that fixing the code does nothing at all for its
> readability (at least that's what I would say):
>
> # let sum = List.fold_right fun:begin fun x acc:y -> x + y end acc:0;;
> val sum : 'a -> int list -> int = <fun>
You do not have to use begin .. end everywhere.
I do personally write:
# let sum = List.fold_right acc:0 fun:(fun x :acc -> x + acc);;
Remark that here you didn't care about whether acc is the first or
second argument of the passed function, just because (+) is
commutative. However, in general functions are not commutative, and I
always have to think a lot when using List.fold_left or
List.fold_right without labels, just because I may write wrong code
without the type checker telling me anything.
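A small sketch of the safety argument in present-day syntax, using a hypothetical helper fold_left_acc (not a standard-library function) whose callback must label its accumulator. The caller then names the accumulator instead of relying on its position.

```ocaml
(* Hypothetical: a left fold whose callback labels its accumulator. *)
let fold_left_acc ~(f : acc:'a -> 'b -> 'a) ~(init : 'a) (l : 'b list) : 'a =
  List.fold_left (fun acc x -> f ~acc x) init l

(* Consing is not commutative, so argument order matters; here the
   accumulator is identified by its label, not by its position. *)
let rev l = fold_left_acc ~f:(fun ~acc x -> x :: acc) ~init:[] l

let () = assert (rev [1; 2; 3] = [3; 2; 1])
```

Writing the callback as fun x ~acc -> x :: acc instead should be a type error, since a function passed as an argument has to present its labelled parameters in the expected order.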
> For all these reasons, I would suggest using labels carefully in
> the standard libraries:
>
> -- remove labels from higher-order functionals
> -- remove redundant labels: when no ambiguity can occur you need not
> add a label.
> -- use labels when typechecking ambiguity is evident (for instance
> when there are two or more parameters with the same type).
>
> Labels must improve the readability of code or help document the
> libraries; they should not be an extra burden to the programmer or a
> way of obfuscating code.
This seems to boil down to using labels for documentation purposes
only. If you reduce them to such an extent, I'm afraid they will not
do very much for code readability anymore. And, as I stated above, you
get typechecking holes in functionals.
By the way, I do not see how labels can help obfuscate code.
What you probably mean is that they make it more difficult to write
some combinator-based HO-style code. Does it mean we should have a
version of the List module without labels for such
uses?
> Evidently, as with any other extension, labels must not obfuscate the
> overall picture, that is they must not clobber the semantics, nor add
> extra exceptional cases to the few general rules we have for the
> syntax and semantics of Caml.
>
> In this respect, optional labelled arguments might also be discussed,
> particularly for the following facts:
>
> -- syntactically identical patterns and expressions now may have
> incompatible types:
> # let f ?style:x _ = x;;
> val f : ?style:'a -> 'b -> 'a option = <fun>
>
> As a pattern on the left-hand side x has type 'a, while as an
> expression on the right hand side it has type 'a option
Well, internally the left-hand side is also 'a option, but option is
abbreviated because it is redundant. I do not think people want to see
the left-hand side option, but maybe this technical part should be
made more explicit in the manual.
Conversion from 'a to 'a option is done at application. I do not
think there is any semantic problem here.
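A short illustration of that conversion in present-day syntax (where optional parameters use the ?name / ~name prefixes). Inside the body, an optional parameter without a default is an option; ~style:v at the call site wraps v in Some, while ?style:o passes an option through unchanged. The trailing unit parameter is there so the optional argument can be omitted.

```ocaml
(* Without a default, the body sees an option:
   f : ?style:'a -> unit -> 'a option *)
let f ?style () = style

let () =
  assert (f ~style:3 () = Some 3);        (* wrapped at application *)
  assert (f () = None);                   (* omitted *)
  assert (f ?style:(Some 3) () = Some 3)  (* option passed as-is *)
```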
> -- some expressions can be only written as arguments in an application
> context:
> # let f ?style:x g = ?style:x;;
?style:x is not an expression: labels are part of the application
node, not of the arguments.
> -- the simple addition of a default value to an optional argument may
> trigger a typechecking error:
>
> # let f ?(style:x) g = g ?style:x;;
> val f : ?style:'a -> (?style:'a -> 'b) -> 'b = <fun>
>
> # let f ?(style:x = 1) g = g ?style:x;;
> This expression has type int but is here used with type 'a option
That one is more serious. This is not a "simple addition". Default
values unroll some syntactic sugar, changing the type of the
argument. Whether this is a good idea to have such syntactic sugar in
the language is an interesting question. However I believe it provides
some real comfort.
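A sketch of the sugar in question, assuming present-day ?(name = default) syntax. A default turns the parameter back into a plain value in the body, roughly as if the option had been matched by hand. That is why forwarding it with ?style no longer typechecks, and ~style must be used instead.

```ocaml
(* With a default, the body sees a plain int: g : ?style:int -> unit -> int *)
let g ?(style = 1) () = style

(* Roughly what the default desugars to: match on the option by hand. *)
let g' ?style () =
  let style = match style with Some s -> s | None -> 1 in
  style

(* style is an int here, so forward it with ~style; ?style would expect
   an int option and be rejected. *)
let forward ?(style = 1) () = g ~style ()

let () =
  assert (g () = 1 && g ~style:5 () = 5);
  assert (g' () = 1 && g' ~style:5 () = 5);
  assert (forward () = 1 && forward ~style:7 () = 7)
```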
> Do not forget the design decision that has always been used before in
> the development of Caml: interesting but not universal extensions to
> the language must carefully be kept orthogonal to the core language
> and its libraries. This has been successfully achieved for the
> important addition of modules (that do not prevent the users from
> using the old interface-implementation view of modules) as well as for
> the object system, which has also been kept orthogonal
> to the rest of the language (in particular the standard library has
> never been ``objectified''). I don't know of any reason why labels
> cannot follow the same safe guidelines.
I just believed that it was agreed that labelizing the standard
library was a progress. Of course this also means that you do not stay
100% compatible in modern mode. Still, there are no optional arguments
in the standard library, meaning that you stay 100% compatible in classic
mode.
> There are also people around that would like to keep Caml a true
> functional language, where the use of higher-order functions is easy and
> natural. We have to be careful not to lose what is the actual
> strength of the language.
Of course, I agree with that.
A first approach would be to say that one can always use classic mode.
Modern is not pedantic, it is just another typing discipline.
But this would endanger the unity of the language, so this cannot be
a universal answer.
Now, the problem we are talking about seems to boil down to higher
order functions in modern mode. And more particularly List.fold_left
and List.fold_right, since these are about the only two functions
where the argument itself has labels.
It would cost nothing to add 4 more unlabelled functions to the List
module.
val foldl : fun:('a -> 'b -> 'a) -> 'a -> 'b list -> 'a
val foldr : fun:('a -> 'b -> 'b) -> 'a list -> 'b -> 'b
val foldl2 : fun:('a -> 'b -> 'c -> 'a) -> 'a -> 'b list -> 'c list -> 'a
val foldr2 : fun:('a -> 'b -> 'c -> 'c) -> 'a list -> 'b list -> 'c -> 'c
This way you can write
# let sum l = List.foldr fun:(+) l 0
The same additions would also apply to the Array module.
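Jacques' proposed functions can be approximated at user level. The sketch below is hypothetical (foldl and foldr are not standard-library functions) and uses ~f in place of fun:, since a keyword such as fun cannot serve as a label in present-day OCaml.

```ocaml
(* Hypothetical folds after Jacques' signatures: only the function
   argument is labelled, and the callback itself is unlabelled. *)
let foldl ~f init l = List.fold_left f init l
let foldr ~f l init = List.fold_right f l init

let sum l = foldr ~f:( + ) l 0

let () =
  assert (sum [1; 2; 3; 4] = 10);
  assert (foldl ~f:(fun acc x -> acc - x) 10 [1; 2] = 7)
```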
There is also the problem of iteri and mapi, where i: may not be
that important in practice (even when working on an array of integers,
I wouldn't expect anybody to make such a mistake).
val iteri : fun:(i:int -> 'a -> unit) -> 'a array -> unit
val mapi : fun:(i:int -> 'a -> 'b) -> 'a array -> 'b array
One may even insist that fun: is superfluous.
Then it would probably be better to have another List module without
labels at all. Again, this is pretty easy to do, and does not incur
any code bloat in general.
Or do you think that the problem is deeper, and that labels are
breaking the foundations of the language ?
Amicalement,
Jacques
---------------------------------------------------------------------------
Jacques Garrigue Kyoto University garrigue at kurims.kyoto-u.ac.jp
http://wwwfun.kurims.kyoto-u.ac.jp/~garrigue/
* Unsigned integers?
2000-03-16 2:55 ` Jacques Garrigue
@ 2000-03-21 22:22 ` John Max Skaller
2000-03-22 16:22 ` Sven LUTHER
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: John Max Skaller @ 2000-03-21 22:22 UTC (permalink / raw)
Cc: caml-list
I have some code for processing ISO-10646 characters and UTF-8,
which uses caml integers. ISO-10646 has 2^31 code points, which
can be covered by caml integers on a 32bit machine. Using an
unboxed type is mandatory for performance.
Unfortunately, caml integers are signed, which makes most of the
code I have written wrong (I haven't taken the care to handle
integers over 2^30 correctly).
What is the best way to handle this problem?
Would a (standard?) library module (written in C), that treats
integers as unsigned be a reasonable solution?
[This may require writing 'uint_add x y' instead of 'x+y',
but that doesn't matter in the above mentioned application,
since the integers are being used to represent characters]
--
John (Max) Skaller, mailto:skaller@maxtal.com.au
10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850
checkout Vyper http://Vyper.sourceforge.net
download Interscript http://Interscript.sourceforge.net
* Re: Unsigned integers?
2000-03-21 22:22 ` Unsigned integers? John Max Skaller
@ 2000-03-22 16:22 ` Sven LUTHER
2000-03-23 2:08 ` Max Skaller
2000-03-22 17:05 ` Jean-Christophe Filliatre
2000-03-22 19:47 ` Xavier Leroy
2 siblings, 1 reply; 14+ messages in thread
From: Sven LUTHER @ 2000-03-22 16:22 UTC (permalink / raw)
To: John Max Skaller; +Cc: caml-list
On Wed, Mar 22, 2000 at 09:22:15AM +1100, John Max Skaller wrote:
> I have some code for processing ISO-10646 characters and UTF-8,
> which uses caml integers. ISO-10646 has 2^31 code points, which
> can be covered by caml integers on a 32bit machine. Using an
> unboxed type is mandatory for performance.
>
> Unfortunately, caml integers are signed, which makes most of the
> code I have written wrong (I haven't taken the care to handle
> integers over 2^30 correctly).
>
> What is the best way to handle this problem?
> Would a (standard?) library module (written in C), that treats
> integers as unsigned be a reasonable solution?
>
> [This may require writing 'uint_add x y' instead of 'x+y',
> but that doesn't matter in the above mentioned application,
> since the integers are being used to represent characters]
Just use the caml integers and ignore the fact that they are signed?
Following the motto: that doesn't matter in the above mentioned application,
since the integers are being used to represent characters.
But then I don't know what you use them for ...
And also, you would have to check exactly how integer overflow works, but in my
experience max_int+1 = min_int.
Friendly,
Sven LUTHER
* Re: Unsigned integers?
2000-03-22 16:22 ` Sven LUTHER
@ 2000-03-23 2:08 ` Max Skaller
2000-03-23 7:50 ` Sven LUTHER
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Max Skaller @ 2000-03-23 2:08 UTC (permalink / raw)
To: luther; +Cc: John Max Skaller, caml-list
Sven LUTHER wrote:
>
> On Wed, Mar 22, 2000 at 09:22:15AM +1100, John Max Skaller wrote:
> > I have some code for processing ISO-10646 characters and UTF-8,
> > which uses caml integers. ISO-10646 has 2^31 code points, which
> > can be covered by caml integers on a 32bit machine. Using an
> > unboxed type is mandatory for performance.
> >
> > Unfortunately, caml integers are signed, which makes most of the
> > code I have written wrong (I haven't taken the care to handle
> > integers over 2^30 correctly).
> >
> > What is the best way to handle this problem?
> > Would a (standard?) library module (written in C), that treats
> > integers as unsigned be a reasonable solution?
> >
> > [This may require writing 'uint_add x y' instead of 'x+y',
> > but that doesn't matter in the above mentioned application,
> > since the integers are being used to represent characters]
>
> Just use the caml integers and ignore the fact that they are signed?
>
> Following the motto: that doesn't matter in the above mentioned application,
Perhaps my explanation was unclear. In my code, I must
calculate a UTF-8 encoding from an ISO-10646 code point,
and calculate an ISO-10646 code point from a UTF-8 encoding.
The code is below. The code works for values <2^30,
but fails when an int goes negative.
I would be happy to replace, in this code,
every use of 'lor', 'land', + - * < etc with
'ulor' 'uland' 'uplus' 'uminus' 'uless' etc, if only
I could define them. (I could do this in C .. but then,
I could write the below routines in C too)
Note these operations MUST be extremely fast,
and in particular, compact storage of ISO-10646
code points in arrays of integers is OK,
while arrays of boxed values is out of the question.
(So I can't use int32).
-------------------------------------------------------
let parse_utf8 (s : string) (i : int) : int * int =
let ord = int_of_char
and n = (String.length s) - i
in if n <= 0 then begin print_endline "FAILURE"; (-1),i end
else let lead = ord (s.[i]) in
if (lead land 0x80) = 0 then
lead land 0x7F,i+1 (* ASCII *)
else if lead land 0xE0 = 0xC0 && n > 1 then
((lead land 0x1F) lsl 6) lor
(ord(s.[i+1]) land 0x3F),i+2
else if lead land 0xF0 = 0xE0 && n > 2 then
((lead land 0x0F) lsl 12) lor (* 1110xxxx: 4 payload bits *)
((ord(s.[i+1]) land 0x3F) lsl 6) lor
(ord(s.[i+2]) land 0x3F),i+3
else if lead land 0xF8 = 0xF0 && n > 3 then
((lead land 0x07) lsl 18) lor (* 11110xxx: 3 payload bits *)
((ord(s.[i+1]) land 0x3F) lsl 12) lor
((ord(s.[i+2]) land 0x3F) lsl 6) lor
(ord(s.[i+3]) land 0x3F),i+4
else if lead land 0xFC = 0xF8 && n > 4 then
((lead land 0x03) lsl 24) lor (* 111110xx: 2 payload bits *)
((ord(s.[i+1]) land 0x3F) lsl 18) lor
((ord(s.[i+2]) land 0x3F) lsl 12) lor
((ord(s.[i+3]) land 0x3F) lsl 6) lor
(ord(s.[i+4]) land 0x3F),i+5
else if lead land 0xFE = 0xFC && n > 5 then
((lead land 0x01) lsl 30) lor (* 1111110x: 1 payload bit *)
((ord(s.[i+1]) land 0x3F) lsl 24) lor
((ord(s.[i+2]) land 0x3F) lsl 18) lor
((ord(s.[i+3]) land 0x3F) lsl 12) lor
((ord(s.[i+4]) land 0x3F) lsl 6) lor
(ord(s.[i+5]) land 0x3F),i+6
else lead, i+1 (* error, just use bad character *)
(* convert an integer into a utf-8 encoded string of bytes *)
let utf8_of_int i =
let chr x = String.make 1 (Char.chr x) in
if i < 0x80 then
chr(i)
else if i < 0x800 then
chr(0xC0 lor ((i lsr 6) land 0x1F)) ^
chr(0x80 lor (i land 0x3F))
else if i < 0x10000 then
chr(0xE0 lor ((i lsr 12) land 0xF)) ^
chr(0x80 lor ((i lsr 6) land 0x3F)) ^
chr(0x80 lor (i land 0x3F))
else if i < 0x200000 then
chr(0xF0 lor ((i lsr 18) land 0x7)) ^
chr(0x80 lor ((i lsr 12) land 0x3F)) ^
chr(0x80 lor ((i lsr 6) land 0x3F)) ^
chr(0x80 lor (i land 0x3F))
else if i < 0x4000000 then
chr(0xF8 lor ((i lsr 24) land 0x3)) ^
chr(0x80 lor ((i lsr 18) land 0x3F)) ^
chr(0x80 lor ((i lsr 12) land 0x3F)) ^
chr(0x80 lor ((i lsr 6) land 0x3F)) ^
chr(0x80 lor (i land 0x3F))
else chr(0xFC lor ((i lsr 30) land 0x1)) ^
chr(0x80 lor ((i lsr 24) land 0x3F)) ^
chr(0x80 lor ((i lsr 18) land 0x3F)) ^
chr(0x80 lor ((i lsr 12) land 0x3F)) ^
chr(0x80 lor ((i lsr 6) land 0x3F)) ^
chr(0x80 lor (i land 0x3F))
--
John (Max) Skaller at OTT [Open Telecommunications Ltd]
mailto:maxs@in.ot.com.au -- at work
mailto:skaller@maxtal.com.au -- at home
* Re: Unsigned integers?
2000-03-23 2:08 ` Max Skaller
@ 2000-03-23 7:50 ` Sven LUTHER
2000-03-24 2:50 ` Jacques Garrigue
2000-03-24 14:50 ` Xavier Leroy
2 siblings, 0 replies; 14+ messages in thread
From: Sven LUTHER @ 2000-03-23 7:50 UTC (permalink / raw)
To: Max Skaller; +Cc: John Max Skaller, caml-list
On Thu, Mar 23, 2000 at 01:08:54PM +1100, Max Skaller wrote:
> Sven LUTHER wrote:
> >
> > On Wed, Mar 22, 2000 at 09:22:15AM +1100, John Max Skaller wrote:
> > > I have some code for processing ISO-10646 characters and UTF-8,
> > > which uses caml integers. ISO-10646 has 2^31 code points, which
> > > can be covered by caml integers on a 32bit machine. Using an
> > > unboxed type is mandatory for performance.
> > >
> > > Unfortunately, caml integers are signed, which makes most of the
> > > code I have written wrong (I haven't taken the care to handle
> > > integers over 2^30 correctly).
> > >
> > > What is the best way to handle this problem?
> > > Would a (standard?) library module (written in C), that treats
> > > integers as unsigned be a reasonable solution?
> > >
> > > [This may require writing 'uint_add x y' instead of 'x+y',
> > > but that doesn't matter in the above mentioned application,
> > > since the integers are being used to represent characters]
> >
> > Just use the caml integers and ignore the fact that they are signed?
> >
> > Following the motto: that doesn't matter in the above mentioned application,
>
> Perhaps my explanation was unclear. In my code, I must
> calculate a UTF-8 encoding from an ISO-10646 code point,
> and calculate an ISO-10646 code point from a UTF-8 encoding.
>
> The code is below. The code works for values <2^30,
> but fails when an int goes negative.
>
> I would be happy to replace, in this code,
> every use of 'lor', 'land', + - * < etc with
> 'ulor' 'uland' 'uplus' 'uminus' 'uless' etc, if only
> I could define them. (I could do this in C .. but then,
> I could write the below routines in C too)
>
Just redefine the above-mentioned operations in caml, taking overflow into
account. It should not be too difficult, although it may be a bit less
efficient than the normal +, -, ... (I am not sure about that, though: maybe you
could just ignore the issue and use the normal functions. At least for + and - it
should work without problems.)
To test them, use a function that prints an int as an unsigned integer, and use
#install_printer to make it the default printer for ints.
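A sketch of such a printer, assuming OCaml 4.03 or later (for Sys.int_size). Int64 is wide enough to hold the unsigned value of any int, because native ints are at most 63 bits. The names unsigned_to_string and pp_unsigned are ours.

```ocaml
(* Value of x read as an unsigned Sys.int_size-bit integer, in decimal.
   Int64 has room for it because native ints are at most 63 bits wide. *)
let unsigned_to_string (x : int) : string =
  let mask = Int64.sub (Int64.shift_left 1L Sys.int_size) 1L in
  Int64.to_string (Int64.logand (Int64.of_int x) mask)

(* A printer with the shape #install_printer expects. *)
let pp_unsigned (fmt : Format.formatter) (x : int) : unit =
  Format.pp_print_string fmt (unsigned_to_string x)
```

After loading this into the toplevel, #install_printer pp_unsigned;; should make int results display in unsigned form; note that it then applies to every int the toplevel prints.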
Friendly
Sven LUTHER
* Re: Unsigned integers?
2000-03-23 2:08 ` Max Skaller
2000-03-23 7:50 ` Sven LUTHER
@ 2000-03-24 2:50 ` Jacques Garrigue
2000-03-24 15:59 ` Xavier Leroy
2000-03-25 4:03 ` John Max Skaller
2000-03-24 14:50 ` Xavier Leroy
2 siblings, 2 replies; 14+ messages in thread
From: Jacques Garrigue @ 2000-03-24 2:50 UTC (permalink / raw)
To: maxs; +Cc: caml-list
From: Max Skaller <maxs@in.ot.com.au>
> Note these operations MUST be extremely fast,
> and in particular, compact storage of ISO-10646
> code points in arrays of integers is OK,
> while arrays of boxed values is out of the question.
> (So I can't use int32).
If compact storage is the problem, ocaml 3.00 also provides bigarrays,
which allow you to store int32 values in flat arrays (even
multidimensional).
For the cost of boxing/unboxing in int32 computations, you will
probably have to test whether it meets your needs or not.
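For reference, a sketch of flat int32 storage with Bigarray, assuming OCaml 4.07 or later, where Bigarray is part of the standard library (older versions need the separate bigarray library). Every ISO-10646 code point (0 to 0x7FFFFFFF) fits in a signed int32, so storage itself needs no unsigned arithmetic. make_codepoints is a hypothetical helper name.

```ocaml
(* A flat, unboxed buffer of code points: 4 bytes per element. *)
let make_codepoints (n : int) =
  let a = Bigarray.Array1.create Bigarray.int32 Bigarray.c_layout n in
  Bigarray.Array1.fill a 0l;
  a

let () =
  let buf = make_codepoints 3 in
  buf.{0} <- 0x41l;          (* 'A' *)
  buf.{1} <- 0x20ACl;        (* euro sign *)
  buf.{2} <- 0x7FFFFFFFl;    (* largest ISO-10646 code point *)
  (* Elements are read back as int32 values. *)
  assert (Int32.to_int buf.{1} = 0x20AC);
  assert (buf.{2} = Int32.max_int)
```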
By the way, is there any plan to do for int32 the same kind of
optimizations as are done for floats (no boxing/unboxing in the middle
of a computation)? Already done?
---------------------------------------------------------------------------
Jacques Garrigue Kyoto University garrigue at kurims.kyoto-u.ac.jp
http://wwwfun.kurims.kyoto-u.ac.jp/~garrigue/
* Re: Unsigned integers?
2000-03-24 2:50 ` Jacques Garrigue
@ 2000-03-24 15:59 ` Xavier Leroy
2000-03-25 4:03 ` John Max Skaller
1 sibling, 0 replies; 14+ messages in thread
From: Xavier Leroy @ 2000-03-24 15:59 UTC (permalink / raw)
To: Jacques Garrigue, maxs; +Cc: caml-list
> By the way, is there any plan to do for int32 the same kind of
> optimizations as are done for floats (no boxing/unboxing in the middle
> of a computation)? Already done?
Already done! For int32, nativeint, and even for int64 on 64-bit
processors.
The only difference between boxed integers and floats, as far as
boxing elimination in ocamlopt goes, is that there is a hack to unbox
floats in arrays, but no corresponding hack for arrays of boxed
integers. As Jacques said, the new Bigarray module does provide
arrays of unboxed int32 / nativeint / int64, although of a different
type than the standard Caml arrays.
- Xavier Leroy
* Re: Unsigned integers?
2000-03-24 2:50 ` Jacques Garrigue
2000-03-24 15:59 ` Xavier Leroy
@ 2000-03-25 4:03 ` John Max Skaller
1 sibling, 0 replies; 14+ messages in thread
From: John Max Skaller @ 2000-03-25 4:03 UTC (permalink / raw)
To: Jacques Garrigue; +Cc: maxs, caml-list
Jacques Garrigue wrote:
> If compact storage is the problem, ocaml 3.00 also provides bigarrays,
> which allow you to store int32 values in flat arrays (even
> multidimensional).
BTW: all these new 'specialisations' for generic constructions
just shows that C++ isn't so bad after all. :-)
--
John (Max) Skaller, mailto:skaller@maxtal.com.au
10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850
checkout Vyper http://Vyper.sourceforge.net
download Interscript http://Interscript.sourceforge.net
* Re: Unsigned integers?
2000-03-23 2:08 ` Max Skaller
2000-03-23 7:50 ` Sven LUTHER
2000-03-24 2:50 ` Jacques Garrigue
@ 2000-03-24 14:50 ` Xavier Leroy
2 siblings, 0 replies; 14+ messages in thread
From: Xavier Leroy @ 2000-03-24 14:50 UTC (permalink / raw)
To: Max Skaller, caml-list
> The code is below. The code works for values <2^30,
> but fails when and int goes negative.
It is easy to fix this. The "parse_utf8" function need not be
modified. For "utf8_of_int", just replace all tests i < CST by
i >= 0 && i < CST, e.g.
> let utf8_of_int i =
> let chr x = String.make 1 (Char.chr x) in
> if i >= 0 && i < 0x80 then
> chr(i)
> else if i >= 0 && i < 0x800 then
> chr(0xC0 lor ((i lsr 6) land 0x1F)) ^
> chr(0x80 lor (i land 0x3F))
> else if i >= 0 && i < 0x10000 then
> chr(0xE0 lor ((i lsr 12) land 0xF)) ^
> chr(0x80 lor ((i lsr 6) land 0x3F)) ^
> chr(0x80 lor (i land 0x3F))
> else if i >= 0 && i < 0x200000 then
> chr(0xF0 lor ((i lsr 18) land 0x7)) ^
> chr(0x80 lor ((i lsr 12) land 0x3F)) ^
> chr(0x80 lor ((i lsr 6) land 0x3F)) ^
> chr(0x80 lor (i land 0x3F))
> else if i >= 0 && i < 0x4000000 then
> chr(0xF8 lor ((i lsr 24) land 0x3)) ^
> chr(0x80 lor ((i lsr 18) land 0x3F)) ^
> chr(0x80 lor ((i lsr 12) land 0x3F)) ^
> chr(0x80 lor ((i lsr 6) land 0x3F)) ^
> chr(0x80 lor (i land 0x3F))
> else chr(0xFC lor ((i lsr 30) land 0x1)) ^
> chr(0x80 lor ((i lsr 24) land 0x3F)) ^
> chr(0x80 lor ((i lsr 18) land 0x3F)) ^
> chr(0x80 lor ((i lsr 12) land 0x3F)) ^
> chr(0x80 lor ((i lsr 6) land 0x3F)) ^
> chr(0x80 lor (i land 0x3F))
or special-case i < 0 immediately and treat it as in the last "else"
clause.
> Note these operations MUST be extremely fast,
> and in particular, compact storage of ISO-10646
> code points in arrays of integers is OK,
> while arrays of boxed values is out of the question.
> (So I can't use int32).
If they MUST be extremely fast, you'd do better to avoid the repeated "^"
operations (each one allocates and copies a new string) and instead
allocate and fill the resulting string directly, e.g.
> else if i >= 0 && i < 0x800 then begin
  let res = Bytes.create 2 in
  Bytes.set res 0 (Char.chr (0xC0 lor ((i lsr 6) land 0x1F)));
  Bytes.set res 1 (Char.chr (0x80 lor (i land 0x3F)));
  Bytes.unsafe_to_string res   (* no copy: res is not modified afterwards *)
end else ...
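Taking the direct-allocation idea one step further, here is a sketch of a complete encoder (not code from this thread) that computes the byte count once, special-cases negative values as the final 6-byte form, and fills a `Bytes` buffer from the last byte backwards. It assumes a modern OCaml, where strings are immutable and `Bytes` replaces the old `String.create`.

```ocaml
(* Sketch: encode a code point in the original 31-bit UTF-8 scheme
   (up to 6 bytes) without intermediate string concatenations. *)
let utf8_of_int i =
  (* Byte count; a negative i (bit 30 set on 31-bit ints) takes the 6-byte form. *)
  let n =
    if i < 0 then 6
    else if i < 0x80 then 1
    else if i < 0x800 then 2
    else if i < 0x10000 then 3
    else if i < 0x200000 then 4
    else if i < 0x4000000 then 5
    else 6
  in
  if n = 1 then String.make 1 (Char.chr i)
  else begin
    let res = Bytes.create n in
    let v = ref i in
    (* Continuation bytes 10xxxxxx, 6 payload bits each, last byte first. *)
    for k = n - 1 downto 1 do
      Bytes.set res k (Char.chr (0x80 lor (!v land 0x3F)));
      v := !v lsr 6
    done;
    (* Lead byte: n one-bits, a zero bit, then the remaining 7 - n payload bits. *)
    let prefix = (0xFF lsl (8 - n)) land 0xFF in
    let mask = (1 lsl (7 - n)) - 1 in
    Bytes.set res 0 (Char.chr (prefix lor (!v land mask)));
    (* No copy needed: res is never modified after this point. *)
    Bytes.unsafe_to_string res
  end
```

Because the loop treats every length the same way, the six hand-written branches collapse into one, and the lead-byte prefixes (0xC0, 0xE0, 0xF0, 0xF8, 0xFC) and payload masks (0x1F down to 0x1) are derived from `n` instead of being spelled out.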
Hope this helps,
- Xavier Leroy
* Re: Unsigned integers?
2000-03-21 22:22 ` Unsigned integers? John Max Skaller
2000-03-22 16:22 ` Sven LUTHER
@ 2000-03-22 17:05 ` Jean-Christophe Filliatre
2000-03-22 19:10 ` Markus Mottl
2000-03-23 2:41 ` Max Skaller
2000-03-22 19:47 ` Xavier Leroy
2 siblings, 2 replies; 14+ messages in thread
From: Jean-Christophe Filliatre @ 2000-03-22 17:05 UTC (permalink / raw)
To: John Max Skaller; +Cc: caml-list
In his message of Wed March 22, 2000, John Max Skaller writes:
>
> Unfortunately, caml integers are signed, which makes most of the
> code I have written wrong (I haven't taken the care to handle
> integers over 2^30 correctly).
>
> What is the best way to handle this problem?
> Would a (standard?) library module (written in C), that treats
> integers as unsigned be a reasonable solution?
I wrote such a C library to handle (boxed) 32- or 64-bit integers (you
can find it on my web page). But it turned out not to be very
efficient, and when I rewrote my program using an encoding with two
Caml integers, it was much faster.
So I would suggest you write such a library in Caml. For a good
starting point, you may have a look at the module Nativeint in ocaml
sources (in utils/nativeint.ml).
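The message does not say which two-integer encoding was used. As one hypothetical illustration, an unsigned 32-bit value can be split into two 16-bit halves, each held in an ordinary unboxed `int`, so that addition needs only a carry and comparison needs no sign handling. The type `u32` and the functions `of_int`, `add` and `lt` are assumptions made up for this sketch.

```ocaml
(* Hypothetical encoding: unsigned 32-bit value = hi * 2^16 + lo,
   with 0 <= hi, lo < 2^16, both kept in plain (unboxed) ints. *)
type u32 = { hi : int; lo : int }

(* Split x into 16-bit halves, keeping its low 32 bits. *)
let of_int x = { hi = (x lsr 16) land 0xFFFF; lo = x land 0xFFFF }

(* Addition modulo 2^32: the carry out of the low half goes into the high half. *)
let add a b =
  let lo = a.lo + b.lo in
  let hi = a.hi + b.hi + (lo lsr 16) in
  { hi = hi land 0xFFFF; lo = lo land 0xFFFF }

(* Unsigned comparison: both halves are non-negative, so signed int
   comparison on each half already gives the unsigned order. *)
let lt a b = a.hi < b.hi || (a.hi = b.hi && a.lo < b.lo)
```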
Best regards,
--
Jean-Christophe Filliatre
Computer Science Laboratory Phone (650) 859-5173
SRI International FAX (650) 859-2844
333 Ravenswood Ave. email filliatr@csl.sri.com
Menlo Park, CA 94025, USA web http://www.csl.sri.com/~filliatr
* Re: Unsigned integers?
2000-03-22 17:05 ` Jean-Christophe Filliatre
@ 2000-03-22 19:10 ` Markus Mottl
2000-03-23 2:41 ` Max Skaller
1 sibling, 0 replies; 14+ messages in thread
From: Markus Mottl @ 2000-03-22 19:10 UTC (permalink / raw)
To: filliatr; +Cc: OCAML
> So I would suggest you to write such a library in Caml. For a good
> starting point, you may have a look at the module Nativeint in ocaml
> sources (in utils/nativeint.ml).
Or even more conveniently: check out the current CVS repository at INRIA!
It seems the problems with integers will soon be history...
-> ocaml/stdlib/int32.mli
ocaml/stdlib/int64.mli
Still, I fear that unboxed, full-width native integers will never be
supported. Anyway, if you only need full 32-bit ints, you may as well
purchase a real processor (Alpha)... ;-)
Best regards,
Markus Mottl
--
Markus Mottl, mottl@miss.wu-wien.ac.at, http://miss.wu-wien.ac.at/~mottl
* Re: Unsigned integers?
2000-03-22 17:05 ` Jean-Christophe Filliatre
2000-03-22 19:10 ` Markus Mottl
@ 2000-03-23 2:41 ` Max Skaller
1 sibling, 0 replies; 14+ messages in thread
From: Max Skaller @ 2000-03-23 2:41 UTC (permalink / raw)
To: Jean-Christophe Filliatre; +Cc: John Max Skaller, caml-list
Jean-Christophe Filliatre wrote:
> I wrote such a C library to handle (boxed) 32 or 64 bits integers (you
> can find it on my web page). But it appeared that it was not very
> efficient, and when I rewrote my program using an encoding with two
> Caml integers, it was really faster.
>
> So I would suggest you to write such a library in Caml. For a good
> starting point, you may have a look at the module Nativeint in ocaml
> sources (in utils/nativeint.ml).
OK. This is probably the way to do it.
--
John (Max) Skaller at OTT [Open Telecommunications Ltd]
mailto:maxs@in.ot.com.au -- at work
mailto:skaller@maxtal.com.au -- at home
* Re: Unsigned integers?
2000-03-21 22:22 ` Unsigned integers? John Max Skaller
2000-03-22 16:22 ` Sven LUTHER
2000-03-22 17:05 ` Jean-Christophe Filliatre
@ 2000-03-22 19:47 ` Xavier Leroy
2000-03-23 12:55 ` John Max Skaller
2 siblings, 1 reply; 14+ messages in thread
From: Xavier Leroy @ 2000-03-22 19:47 UTC (permalink / raw)
To: John Max Skaller; +Cc: caml-list
> I have some code for processing ISO-10646 characters and UTF-8,
> which uses caml integers. ISO-10646 has 2^31 code points, which
> can be covered by caml integers on a 32bit machine. Using an
> unboxed type is mandatory for performance.
OCaml 3.00 includes three new library modules, Int32, Int64 and
Nativeint, implementing (boxed) 32-bit, 64-bit and platform-native
integers, respectively. (Platform-native integers are 32 bits on 32
bit processors and 64 bits on 64 bit processors). The native-code
compiler was modified to inline the operations on those types,
including elimination of unnecessary boxing/unboxing, like for floats.
That may or may not be efficient enough for your application.
> Unfortunately, caml integers are signed, which makes most of the
> code I have written wrong (I haven't taken the care to handle
> integers over 2^30 correctly).
Actually, on 2's-complement machines at least, arithmetic operations
over unsigned integers are exactly identical to those over signed
integers of the same size, except division, modulus, and
comparisons <, >, <=, >=. So, for your application, Caml's "int"
type could be good enough, although you may need special comparison
functions (which you can write in C, using casts to unsigned long int,
or in Caml, by treating the sign bit specially).
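A minimal sketch of the "treat the sign bit specially" approach in Caml follows. Flipping the sign bit with `lxor min_int` maps unsigned order onto signed order, and the division uses the usual shift-then-correct method for the case where the dividend's top bit is set. The names `ult`, `ule` and `udiv` are hypothetical; the code works at whatever width `int` has on the host (31 or 63 bits).

```ocaml
(* Unsigned comparisons on plain ints: xor with min_int flips the sign bit,
   turning unsigned order into signed order. *)
let ult x y = (x lxor min_int) < (y lxor min_int)
let ule x y = (x lxor min_int) <= (y lxor min_int)

(* Unsigned division; raises Division_by_zero when d = 0. *)
let udiv n d =
  if d < 0 then
    (* d has its top bit set, so the unsigned quotient is 0 or 1 *)
    if ult n d then 0 else 1
  else if n >= 0 then n / d
  else begin
    (* Halve n with a logical shift so it becomes non-negative, divide,
       double, then correct: the remainder r is below 2d, so at most
       one more d fits into it. *)
    let q = ((n lsr 1) / d) lsl 1 in
    let r = n - q * d in
    if ult r d then q else q + 1
  end
```

With these, the tests `i < CST` in the UTF-8 code could also be written `ult i CST`, which rejects negative `i` just like `i >= 0 && i < CST` does for a positive constant.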
- Xavier Leroy
* Re: Unsigned integers?
2000-03-22 19:47 ` Xavier Leroy
@ 2000-03-23 12:55 ` John Max Skaller
0 siblings, 0 replies; 14+ messages in thread
From: John Max Skaller @ 2000-03-23 12:55 UTC (permalink / raw)
To: Xavier Leroy; +Cc: caml-list
Xavier Leroy wrote:
>
> > I have some code for processing ISO-10646 characters and UTF-8,
> > which uses caml integers. ISO-10646 has 2^31 code points, which
> > can be covered by caml integers on a 32bit machine. Using an
> > unboxed type is mandatory for performance.
>
> OCaml 3.00 includes three new library modules, Int32, Int64 and
> Nativeint, implementing (boxed) 32-bit, 64-bit and platform-native
> integers, respectively. (Platform-native integers are 32 bits on 32
> bit processors and 64 bits on 64 bit processors). The native-code
> compiler was modified to inline the operations on those types,
> including elimination of unnecessary boxing/unboxing, like for floats.
> That may or may not be efficient enough for your application.
This is probably enough, provided I can write
conversions to/from ints.
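A minimal sketch of such conversions, assuming a 64-bit platform so that every unsigned 32-bit value fits in an `int`: `Int32.of_int` keeps the low 32 bits, and masking the sign-extended result of `Int32.to_int` recovers the unsigned value. The helper names `int32_of_code` and `code_of_int32` are hypothetical.

```ocaml
(* Assumes 63-bit native ints (64-bit platform); the 0xFFFF_FFFF
   literal does not even compile with 31-bit ints. *)

(* int -> int32: keeps the low 32 bits, so 0xFFFF_FFFF becomes -1l. *)
let int32_of_code (x : int) : int32 = Int32.of_int x

(* int32 -> int, read as unsigned: Int32.to_int sign-extends,
   and the mask clears the extended sign bits again. *)
let code_of_int32 (x : int32) : int = Int32.to_int x land 0xFFFF_FFFF
```

Recent OCaml versions (4.08 and later) also provide `Int32.unsigned_to_int`, which performs the unsigned conversion and returns an option instead of silently truncating.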
--
John (Max) Skaller, mailto:skaller@maxtal.com.au
10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850
checkout Vyper http://Vyper.sourceforge.net
download Interscript http://Interscript.sourceforge.net
end of thread
Thread overview: 14+ messages
2000-03-23 19:42 Unsigned integers? Damien Doligez
-- strict thread matches above, loose matches on Subject: below --
2000-03-15 13:58 Syntax for label, NEW PROPOSAL Pierre Weis
2000-03-16 2:55 ` Jacques Garrigue
2000-03-21 22:22 ` Unsigned integers? John Max Skaller
2000-03-22 16:22 ` Sven LUTHER
2000-03-23 2:08 ` Max Skaller
2000-03-23 7:50 ` Sven LUTHER
2000-03-24 2:50 ` Jacques Garrigue
2000-03-24 15:59 ` Xavier Leroy
2000-03-25 4:03 ` John Max Skaller
2000-03-24 14:50 ` Xavier Leroy
2000-03-22 17:05 ` Jean-Christophe Filliatre
2000-03-22 19:10 ` Markus Mottl
2000-03-23 2:41 ` Max Skaller
2000-03-22 19:47 ` Xavier Leroy
2000-03-23 12:55 ` John Max Skaller