Record field label locality

caml-list - the Caml user's mailing list

* Record field label locality
@ 2008-08-10 10:04 Brighten Godfrey
  2008-08-10 19:38 ` [Caml-list] " Jon Harrop
  0 siblings, 1 reply; 11+ messages in thread
From: Brighten Godfrey @ 2008-08-10 10:04 UTC
  To: caml-list

Hi,

Here's something that I've wondered about for years; maybe someone  
here can enlighten me.  One of the few major annoyances in OCaml code  
style is that if I define a record in one module, say a Graph module:

     type t = {
         nodes : node_t array;
         }

then when I use it in another module, say with a graph variable `g',  
then I have to write `g.Graph.nodes' rather than `g.nodes'.

I can understand why a record field label has to be uniquely  
identified.  But can't the explicit naming of the Graph module  
usually be avoided, since the compiler will know that `g' is a  
`Graph.t'?  For example if I write something like

     let g : Graph.t = make_graph () in
     g.nodes

it seems to me that on the second line, the type of `g' and hence the  
meaning of `g.nodes' is unambiguous.
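
As a minimal sketch of the situation described above (the definition of
`node_t', the body of `make_graph' and the `node_count' helper are
assumptions made only for illustration), the qualified form looks like this:

```ocaml
(* Hypothetical stand-in for the Graph module from the message. *)
module Graph = struct
  type node_t = int
  type t = { nodes : node_t array }
  let make_graph () = { nodes = [| 1; 2; 3 |] }
end

(* Outside Graph, the label is qualified with its module path. *)
let node_count (g : Graph.t) = Array.length g.Graph.nodes
```

Here the annotation on `g' is optional: the qualified label alone is enough
for the compiler to infer that `g' is a `Graph.t'.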

Thanks!
~Brighten Godfrey


* Re: [Caml-list] Record field label locality
  2008-08-10 10:04 Record field label locality Brighten Godfrey
@ 2008-08-10 19:38 ` Jon Harrop
  2008-08-12 21:03   ` Brighten Godfrey
  0 siblings, 1 reply; 11+ messages in thread
From: Jon Harrop @ 2008-08-10 19:38 UTC
  To: caml-list

On Sunday 10 August 2008 11:04:37 Brighten Godfrey wrote:
> Hi,
>
> Here's something that I've wondered about for years; maybe someone
> here can enlighten me.  One of the few major annoyances in OCaml code
> style is that if I define a record in one module, say a Graph module:
>
>      type t = {
>          nodes : node_t array;
>          }
>
> then when I use it in another module, say with a graph variable `g',
> then I have to write `g.Graph.nodes' rather than `g.nodes'.
>
> I can understand why a record field label has to be uniquely
> identified.  But can't the explicit naming of the Graph module
> usually be avoided, since the compiler will know that `g' is a
> `Graph.t'?  For example if I write something like
>
>      let g : Graph.t = make_graph () in
>      g.nodes
>
> it seems to me that on the second line, the type of `g' and hence the
> meaning of `g.nodes' is unambiguous.

Although this is a topic of debate and is even something that F# has done 
differently in exactly the way you describe for the reason you describe, I 
would certainly not call it a "major annoyance" in OCaml and, in fact, I 
would not even describe the alternative as "better".

In correct OCaml code, function bodies basically don't care what type 
annotations appear above them, including in the function definition. This is 
very useful because it makes the code compositional. The biggest problem with 
the "solution" you refer to is that code is no longer compositional because 
the body of a function now relies upon the type annotations in the function 
definition that it came from in order to be correct: move the body expression 
to another location that does not happen to be preceded by the same 
annotations and it will no longer compile. Moreover, you have created a 
stumbling block for users who are new to type inference (currently almost all 
newbies) because the error messages they get are unexpected and totally 
unnecessary. Finally, if you had just defined a local record type with a 
field of the same name then your type annotation must shadow it with the 
field from the Graph module, leading to even more obscure errors.
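
The shadowing mentioned at the end can already be seen with two record types
in one scope. In this sketch the type and function names are hypothetical;
the point is only that an unannotated `r.label' resolves to the most recently
defined label:

```ocaml
type point = { x : int; label : string }

(* Defined later, so its label shadows the one from point. *)
type tag = { label : int }

(* With no annotation the latest definition of label is chosen,
   so this is inferred as tag -> int. *)
let describe r = r.label
```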

I think breaking the ability to compose expressions is introducing a more 
serious flaw because compositionality is one of the critical ingredients that 
makes OCaml so productive. So my personal opinion is that this approach to 
disambiguation should only be used to avoid an explosion in the number of 
arithmetic operators because that is the only situation where I perceive 
OCaml's current solution to be a major annoyance.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e


* Re: [Caml-list] Record field label locality
  2008-08-10 19:38 ` [Caml-list] " Jon Harrop
@ 2008-08-12 21:03   ` Brighten Godfrey
  2008-08-13  0:12     ` Edgar Friendly
                       ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Brighten Godfrey @ 2008-08-12 21:03 UTC
  To: Jon Harrop; +Cc: caml-list

On Aug 10, 2008, at 12:38 PM, Jon Harrop wrote:

> On Sunday 10 August 2008 11:04:37 Brighten Godfrey wrote:
>> Hi,
>>
>> Here's something that I've wondered about for years; maybe someone
>> here can enlighten me.  One of the few major annoyances in OCaml code
>> style is that if I define a record in one module, say a Graph module:
>>
>>      type t = {
>>          nodes : node_t array;
>>          }
>>
>> then when I use it in another module, say with a graph variable `g',
>> then I have to write `g.Graph.nodes' rather than `g.nodes'.
>>
>> I can understand why a record field label has to be uniquely
>> identified.  But can't the explicit naming of the Graph module
>> usually be avoided, since the compiler will know that `g' is a
>> `Graph.t'?  For example if I write something like
>>
>>      let g : Graph.t = make_graph () in
>>      g.nodes
>>
>> it seems to me that on the second line, the type of `g' and hence the
>> meaning of `g.nodes' is unambiguous.
>
> Although this is a topic of debate and is even something that F#  
> has done
> differently in exactly the way you describe for the reason you  
> describe, I
> would certainly not call it a "major annoyance" in OCaml and, in  
> fact, I
> would not even describe the alternative as "better".
>
> In correct OCaml code, function bodies basically don't care what type
> annotations appear above them, including in the function  
> definition. This is
> very useful because it makes the code compositional. The biggest  
> problem with
> the "solution" you refer to is that code is no longer compositional  
> because
> the body of a function now relies upon the type annotations in the  
> function
> definition that it came from in order to be correct: move the body  
> expression
> to another location that does not happen to be preceded by the same
> annotations and it will no longer compile.

I think I see what you're getting at.  Is it possible to define  
compositionality as follows?:  "Removing a type annotation from  
correct OCaml code results in correct OCaml code."

I would claim that the current syntax (`g.Graph.nodes' in the above  
example) is effectively a type annotation that permits the type  
inference that `g' is a `Graph.t'.  The annoying bit is that you are  
required to use it every single time you access a field of `g'.  I  
understand this is not the way the compiler views it.  But it's the  
way I think about it when I'm programming and I suspect many others  
are in the same situation.

Actually, what I want seems to be the way OCaml treats methods in  
objects: given an object, you can name the method directly without  
mentioning its module.  I can write a function

     let f x = x#some_method "argument"

where `x' might be an object defined in another module, or locally.   
Why can't records be handled like this?
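
For comparison, a small sketch of the object behaviour described here. The
two objects `a' and `b' are made up for illustration; `f' accepts any object
that has a `some_method' taking a string, whatever module it was defined in:

```ocaml
(* Inferred as < some_method : string -> 'a; .. > -> 'a *)
let f x = x#some_method "argument"

(* Two unrelated objects that happen to share a method name. *)
let a = object method some_method s = String.length s end
let b = object method some_method s = s ^ "!" end

let length_result = f a   (* uses a's method, returns an int *)
let string_result = f b   (* uses b's method, returns a string *)
```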

> Moreover, you have created a
> stumbling block for users who are new to type inference (currently  
> almost all
> newbies) because the error messages they get are unexpected and  
> totally
> unnecessary. Finally, if you had just defined a local record type  
> with a
> field of the same name then your type annotation must shadow it  
> with the
> field from the Graph module, leading to even more obscure errors.

I'm not sure why the error messages must necessarily be confusing. I  
can see shadowing being confusing, but that can happen already.

Thanks very much for your reply!
~Brighten Godfrey



* Re: [Caml-list] Record field label locality
  2008-08-12 21:03   ` Brighten Godfrey
@ 2008-08-13  0:12     ` Edgar Friendly
  2008-08-13  1:17       ` Brighten Godfrey
  2008-08-13  1:51     ` blue storm
  2008-08-13  8:14     ` Richard Jones
  2 siblings, 1 reply; 11+ messages in thread
From: Edgar Friendly @ 2008-08-13  0:12 UTC
  To: Brighten Godfrey; +Cc: Jon Harrop, caml-list

Brighten Godfrey wrote:
> Actually, what I want seems to be the way OCaml treats methods in
> objects: given an object, you can name the method directly without
> mentioning its module.  I can write a function
> 
>     let f x = x#some_method "argument"
> 
> where `x' might be an object defined in another module, or locally.  Why
> can't records be handled like this?
> 

1) Implementation
Record field access is almost identical to array lookup -- internally,
records are stored as arrays and during compilation the field name gets
translated to the correct index to get.  But since type information goes
away after compilation (including record field names), there's no way to
do the same kind of dispatch you get with objects.

Then you run into the problem that the record field labels aren't
global, so you could have the same label as different indexes in
different modules.  Thus the compiler needs to know which module that
record field came from to do the conversion to field index.

2) Typing of x.field

Given the following:

module M1 = struct type t = { f1 : int } end
module M2 = struct type t2 = { f2 : int; f1: string } end

let get_f1 x = x.f1

How should f1 be typed?   M1.t -> int  or M2.t2 -> string?  And how to
deal with separate compilation, such that M1 and M2 aren't even in the
same file as get_f1?
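
A minimal sketch of how the question is settled today by qualifying the
label; `get_f1_int' and `get_f1_string' are hypothetical names. Each
qualified access pins down one record type, so each function gets a single
type:

```ocaml
module M1 = struct type t = { f1 : int } end
module M2 = struct type t2 = { f2 : int; f1 : string } end

(* M1.t -> int *)
let get_f1_int x = x.M1.f1

(* M2.t2 -> string *)
let get_f1_string x = x.M2.f1
```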

E.


* Re: [Caml-list] Record field label locality
  2008-08-13  0:12     ` Edgar Friendly
@ 2008-08-13  1:17       ` Brighten Godfrey
  2008-08-13 12:48         ` Edgar Friendly
  0 siblings, 1 reply; 11+ messages in thread
From: Brighten Godfrey @ 2008-08-13  1:17 UTC
  To: Edgar Friendly; +Cc: Jon Harrop, caml-list

On Aug 12, 2008, at 5:12 PM, Edgar Friendly wrote:

> Brighten Godfrey wrote:
>> Actually, what I want seems to be the way OCaml treats methods in
>> objects: given an object, you can name the method directly without
>> mentioning its module.  I can write a function
>>
>>     let f x = x#some_method "argument"
>>
>> where `x' might be an object defined in another module, or  
>> locally.  Why
>> can't records be handled like this?
>>
>
> 1) Implementation
> Record field access is almost identical to array lookup -- internally,
> records are stored as arrays and during compilation the field name  
> gets
> translated to the correct index to get.  But since type information  
> goes
> away after compilation (including record field names), there's no  
> way to
> do the same kind of dispatch you get with objects.
>
> Then you run into the problem that the record field labels aren't
> global, so you could have the same label as different indexes in
> different modules.  Thus the compiler needs to know which module that
> record field came from to do the conversion to field index.

Thanks, that helps.  But what about handling typing as with objects,  
but without the dynamic dispatch that you get with objects?  (see below)

> 2) Typing of x.field
>
> Given the following:
>
> module M1 = struct type t = { f1 : int } end
> module M2 = struct type t2 = { f2 : int; f1: string } end
>
> let get_f1 x = x.f1
>
> How should f1 be typed?   M1.t -> int  or M2.t2 -> string?  And how to
> deal with separate compilation, such that M1 and M2 aren't even in the
> same file as get_f1?

Two things come to mind:

(1) The type of get_f1 is handled analogously to the way it is  
handled for objects, something like this:

     val get_f1 : < x : 'a; .. > -> 'a = <fun>

I'm guessing that if you did this, you would have to "instantiate"  
`get_f1' each time it is applied to a new record type, which I assume  
is inconvenient (or not?).

(2) Require that all record field accesses refer to a globally-unique  
record type, making conversion to a record field index easy.  So  
the example code Edgar gave would result in a compilation error  
because the compiler cannot determine which `.f1' field the access  
refers to.  But consider this code:

     let return_garlic () =
         let x = {M2.f2=5; M2.f1="garlic"} in
         x.f1

In line 2, globally unique record field names are given, which allows  
the compiler to tag variable `x' with type `M2.t2'.  Then in line 3,  
the record field access `x.f1' can only mean `x.M2.f1'.

Summary:  I can see why it is useful to require that each field  
access be mapped to a globally-unique record type.  OCaml today does  
this by having the programmer explicitly name a globally-unique  
record type with every field access.  But couldn't this instead be  
done by type inference?

Thanks very much for your reply.
~Brighten


* Re: [Caml-list] Record field label locality
  2008-08-12 21:03   ` Brighten Godfrey
  2008-08-13  0:12     ` Edgar Friendly
@ 2008-08-13  1:51     ` blue storm
  2008-08-13  8:14     ` Richard Jones
  2 siblings, 0 replies; 11+ messages in thread
From: blue storm @ 2008-08-13  1:51 UTC
  To: Brighten Godfrey; +Cc: Jon Harrop, caml-list


It might be a bit off-topic, but if you want to ease the syntactic pain only,
you can use the pa_openin ( http://alain.frisch.fr/soft.html#openin ) camlp4
extension :

  open Graph in { g with nodes = foo }
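
Later OCaml releases (3.12 onwards, to the best of my knowledge) provide a
built-in local open that covers the same use without camlp4. A sketch,
assuming a hypothetical `Graph.t' with a `name' and a `nodes' field and a
made-up helper `replace_nodes':

```ocaml
(* Hypothetical Graph module, for illustration only. *)
module Graph = struct
  type t = { name : string; nodes : int array }
end

(* The labels of Graph are in scope only inside the let open ... in body. *)
let replace_nodes g foo =
  let open Graph in
  { g with nodes = foo }
```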


* Re: [Caml-list] Record field label locality
  2008-08-12 21:03   ` Brighten Godfrey
  2008-08-13  0:12     ` Edgar Friendly
  2008-08-13  1:51     ` blue storm
@ 2008-08-13  8:14     ` Richard Jones
  2008-08-13  9:30       ` Brighten Godfrey
  2 siblings, 1 reply; 11+ messages in thread
From: Richard Jones @ 2008-08-13  8:14 UTC
  To: Brighten Godfrey; +Cc: Jon Harrop, caml-list

On Tue, Aug 12, 2008 at 02:03:46PM -0700, Brighten Godfrey wrote:
> I think I see what you're getting at.  Is it possible to define  
> compositionality as follows?:

I think Jon means that you can copy and paste code around and it still
works.

> "Removing a type annotation from  
> correct OCaml code results in correct OCaml code."

This is mostly correct.  However very occasionally it is necessary to
help the compiler out by annotating expressions with types.  I believe
this is because type inference used by OCaml is undecidable.  You'll
notice this effect more often if you use OCaml's object system.

> I would claim that the current syntax (`g.Graph.nodes' in the above  
> example) is effectively a type annotation that permits the type  
> inference that `g' is a `Graph.t'.  The annoying bit is that you are  
> required to use it every single time you access a field of `g'.

You might want to try renaming the Graph module, ie:

  module G = Graph

  ... g.G.nodes ...

Or if you have control over the module itself, you could also try
renaming the fields to make them unique (eg. g_nodes), at which point
you can just 'open Graph'.  There are different trade-offs to each
approach.
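
Both workarounds, sketched side by side. The module contents and the names
`Graph_prefixed', `first_node' and `first_node_prefixed' are assumptions for
illustration:

```ocaml
(* Workaround 1: a short alias for the module. *)
module Graph = struct
  type t = { nodes : int array }
end

module G = Graph

let first_node g = g.G.nodes.(0)

(* Workaround 2: prefixed field names, so opening the module is
   unlikely to clash with labels defined elsewhere. *)
module Graph_prefixed = struct
  type t = { g_nodes : int array }
end

open Graph_prefixed

let first_node_prefixed g = g.g_nodes.(0)
```

The alias keeps every access explicit at the cost of two extra characters;
the prefix moves that cost into the field names themselves.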

Rich.

-- 
Richard Jones
Red Hat


* Re: [Caml-list] Record field label locality
  2008-08-13  8:14     ` Richard Jones
@ 2008-08-13  9:30       ` Brighten Godfrey
  0 siblings, 0 replies; 11+ messages in thread
From: Brighten Godfrey @ 2008-08-13  9:30 UTC
  To: Richard Jones, blue storm, Jon Harrop, OCaml List

On Aug 13, 2008, at 1:14 AM, Richard Jones wrote:

> On Tue, Aug 12, 2008 at 02:03:46PM -0700, Brighten Godfrey wrote:
>> I think I see what you're getting at.  Is it possible to define
>> compositionality as follows?:
>
> I think Jon means that you can copy and paste code around and it still
> works.

(...which is not true for various other reasons (e.g. conflicting  
type annotations, variables with different types, etc.).)

>> "Removing a type annotation from
>> correct OCaml code results in correct OCaml code."
>
> This is mostly correct.  However very occasionally it is necessary to
> help the compiler out by annotating expressions with types.  I believe
> this is because type inference used by OCaml is undecidable.  You'll
> notice this effect more often if you use OCaml's object system.

Yes, I've noticed that.


> You might want to try renaming the Graph module, ie:
>
>   module G = Graph
>
>   ... g.G.nodes ...
>
> Or if you have control over the module itself, you could also try
> renaming the fields to make them unique (eg. g_nodes), at which point
> you can just 'open Graph'.  There are different trade-offs to each
> approach.

On Aug 12, 2008, at 6:51 PM, blue storm wrote:
> It might be a bit off-topic, but if you want to ease the syntactic
> pain only, you can use the pa_openin
> ( http://alain.frisch.fr/soft.html#openin ) camlp4 extension :
>
>   open Graph in { g with nodes = foo }


Thanks to both of you for suggesting these workarounds.  Probably  
renaming the module is the easiest and least likely to cause other  
problems.  I am still curious about the language design question  
though...

Thanks,
~Brighten



* Re: [Caml-list] Record field label locality
  2008-08-13  1:17       ` Brighten Godfrey
@ 2008-08-13 12:48         ` Edgar Friendly
  2008-08-14  6:38           ` Brighten Godfrey
  0 siblings, 1 reply; 11+ messages in thread
From: Edgar Friendly @ 2008-08-13 12:48 UTC
  To: Brighten Godfrey; +Cc: Jon Harrop, caml-list

Brighten Godfrey wrote:
> Two things come to mind:
> 
> (1) The type of get_f1 is handled analogously to the way it is handled
> for objects, something like this:
> 
>     val get_f1 : < x : 'a; .. > -> 'a = <fun>
> 
> I'm guessing that if you did this, you would have to "instantiate"
> `get_f1' each time it is applied to a new record type, which I assume is
> inconvenient (or not?).
> 
Yes - this breaks separate compilation.

> (2) Require that all record field accesses refer to a globally-unique
> record type, making conversion to a record field index easy.  So the
> example code Edgar gave would result in a compilation error because the
> compiler cannot determine which `.f1' field the access refers to.  But
> consider this code:
> 
>     let return_garlic () =
>         let x = {M2.f2=5; M2.f1="garlic"} in
>         x.f1
> 
> In line 2, globally unique record field names are given, which allows
> the compiler to tag variable `x' with type `M2.t2'.  Then in line 3, the
> record field access `x.f1' can only mean `x.M2.f1'.
> 
In this situation, the type information influences what code is
generated.  The OCaml developers have been as careful as possible to
avoid this.  The typing stage of compilation acts as a filter to
eliminate incorrect programs, and that's it.

I've had my share of ideas of how typing information could usefully
connect with code generation, but have been shut down because the ocaml
developers (probably rightly) don't want to bridge this separation.
(Probably because the quality of the compiler would go down the tubes
right quick.)  Although...  hmmm..  I guess type information is used for
 some optimization (specializing = for ints and such).

Also you lose the compositionality as before - you can't break this
function into two parts because the second line "needs" the first line
to work.

> Summary:  I can see why it is useful to require that each field access
> be mapped to a globally-unique record type.  OCaml today does this by
> having the programmer explicitly name a globally-unique record type with
> every field access.  But couldn't this instead be done by type inference?
> 
> Thanks very much for your reply.
> ~Brighten



* Re: [Caml-list] Record field label locality
  2008-08-13 12:48         ` Edgar Friendly
@ 2008-08-14  6:38           ` Brighten Godfrey
  2008-08-14 10:11             ` David Allsopp
  0 siblings, 1 reply; 11+ messages in thread
From: Brighten Godfrey @ 2008-08-14  6:38 UTC
  To: Edgar Friendly; +Cc: Jon Harrop, OCaml List

On Aug 13, 2008, at 5:48 AM, Edgar Friendly wrote:

> Brighten Godfrey wrote:
>> Two things come to mind:
>>
>> (1) The type of get_f1 is handled analogously to the way it is  
>> handled
>> for objects, something like this:
>>
>>     val get_f1 : < x : 'a; .. > -> 'a = <fun>
>>
>> I'm guessing that if you did this, you would have to "instantiate"
>> `get_f1' each time it is applied to a new record type, which I  
>> assume is
>> inconvenient (or not?).
>>
> Yes - this breaks separate compilation.
>

Makes sense.

>> (2) Require that all record field accesses refer to a globally-unique
>> record type, making conversion to a record field index easy.
>> So the
>> example code Edgar gave would result in a compilation error  
>> because the
>> compiler cannot determine which `.f1' field the access refers to.   
>> But
>> consider this code:
>>
>>     let return_garlic () =
>>         let x = {M2.f2=5; M2.f1="garlic"} in
>>         x.f1
>>
>> In line 2, globally unique record field names are given, which allows
>> the compiler to tag variable `x' with type `M2.t2'.  Then in line  
>> 3, the
>> record field access `x.f1' can only mean `x.M2.f1'.
>>
> In this situation, the type information influences what code is
> generated.  The OCaml developers have been as careful as possible to
> avoid this.  The typing stage of compilation acts as a filter to
> eliminate incorrect programs, and that's it.
>
> I've had my share of ideas of how typing information could usefully
> connect with code generation, but have been shut down because the  
> ocaml
> developers (probably rightly) don't want to bridge this separation.
> (Probably because the quality of the compiler would go down the tubes
> right quick.)  Although...  hmmm..  I guess type information is  
> used for
>  some optimization (specializing = for ints and such).

This is a good point.  Thanks for the explanation.  I'm having a hard  
time thinking of any case other than =,<,> etc where type information  
would be necessary to determine code generation.  On the other hand  
if you break the separation for those operators, maybe it's OK to  
break it for record names as well.

> Also you lose the compositionality as before - you can't break this
> function into two parts because the second line "needs" the first line
> to work.

It can still work, for example this would work:

     let garlic_part_1 () = {M2.f2=5; M2.f1="garlic"}
     let garlic_part_2 x = x.f1

     let return_garlic () =
         garlic_part_2 (garlic_part_1 ())

Using this notation the programmer could of course choose to always  
specify the full record name (x.M2.f1) if desired.

~Brighten



* RE: [Caml-list] Record field label locality
  2008-08-14  6:38           ` Brighten Godfrey
@ 2008-08-14 10:11             ` David Allsopp
  0 siblings, 0 replies; 11+ messages in thread
From: David Allsopp @ 2008-08-14 10:11 UTC
  To: 'Brighten Godfrey', 'Edgar Friendly'; +Cc: 'OCaml List'

> This is a good point.  Thanks for the explanation.  I'm having a hard  
> time thinking of any case other than =,<,> etc where type information  
> would be necessary to determine code generation.  On the other hand  
> if you break the separation for those operators, maybe it's OK to  
> break it for record names as well.

Until OCaml knows the type of the record, field access can't happen either -
because OCaml doesn't know how to map the field name to a field number
within the block representing the value (I'd highly recommend reading
Section 18.2 of the manual - even if you never plan on linking C code, it
gives a good insight into how the OCaml code you write deals with values).

Edgar's point is that after type-checking you throw away all of the type
inferences. So when generating the code, the code generator sees [x.f1]
again and has no idea what type the field [f1] comes from - with x.M2.f1 it
knows to look at the *declared* type [M2.t2] and so discovers that the field
named [f1] will be stored in field 1 of the block representing the value.
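
The block layout can be observed with the [Obj] module, which is unsafe and
used here for illustration only. A sketch with the [M2.t2] type from earlier
in the thread; the names [block], [size], [first] and [second] are made up:

```ocaml
module M2 = struct type t2 = { f2 : int; f1 : string } end

let x = { M2.f2 = 5; f1 = "garlic" }

(* The record is a block with one slot per field, in declaration order. *)
let block = Obj.repr x
let size = Obj.size block

(* f2 was declared first, so it sits in slot 0; f1 sits in slot 1. *)
let first : int = Obj.obj (Obj.field block 0)
let second : string = Obj.obj (Obj.field block 1)
```

Nothing in the block records the names [f2] or [f1]; the mapping from label
to slot number exists only in the type declaration.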

> > Also you lose the compositionality as before - you can't break this
> > function into two parts because the second line "needs" the first line
> > to work.
>
> It can still work, for example this would work:
>
>      let garlic_part_1 () = {M2.f2=5; M2.f1="garlic"}
>      let garlic_part_2 x = x.f1
>
>      let return_garlic () =
>	garlic_part_2 (garlic_part_1 ())

A brief aside (but I think relevant given your complaint!) - you only have
to name one of the fields fully when declaring a record - so your first line
can be:

let garlic_part_1 () = {M2.f2=5; f1="garlic"}

which does save a little typing!

David
