caml-list - the Caml user's mailing list
* Bytecode object files structure
@ 2006-11-12 14:42 Pierre-Etienne Meunier
  2006-11-12 14:56 ` [Caml-list] " Alain Frisch
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Pierre-Etienne Meunier @ 2006-11-12 14:42 UTC (permalink / raw)
  To: caml-list

Hi,

I'm trying to decipher .cmo files produced by simple programs, such as
1+1;;
or
print_string "string";;
or
List.length [1;2;3;4;5];;

According to the OCaml sources, there's something called the
"cmo_magic_number", systematically written at the beginning of all .cmo
files. Does it have a real function when executing programs, or is it just
a way to make sure the file contains OCaml bytecode?

Then, there's the address of what seems to be the last bytecode instruction.
Then come the bytecode instructions, as documented in opcodes.ml.

After that, I can't understand anything: there vaguely seems to be some
information related to linking or so. What is the precise structure of this
part? Is there some kind of bytecode assembler?

Thanks,
P.E. Meunier (pierreetienne.meunier@ens-lyon.fr)



* Re: [Caml-list] Bytecode object files structure
  2006-11-12 14:42 Bytecode object files structure Pierre-Etienne Meunier
@ 2006-11-12 14:56 ` Alain Frisch
  2006-11-13  9:16 ` Yann Régis-Gianas
       [not found] ` <968382EE-B8CB-452C-A86F-684879E33798@free.fr>
  2 siblings, 0 replies; 5+ messages in thread
From: Alain Frisch @ 2006-11-12 14:56 UTC (permalink / raw)
  To: Pierre-Etienne Meunier; +Cc: caml-list

Pierre-Etienne Meunier wrote:
> According to the OCaml sources, there's something called the
> "cmo_magic_number", systematically written at the beginning of all .cmo
> files. Does it have a real function when executing programs, or is it just
> a way to make sure the file contains OCaml bytecode?

It is just a way to make sure that the file contains ocaml bytecode with
the expected version.
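Alain's point can be made concrete with a small check. The sketch below assumes the magic number is a 12-byte string: a 9-byte kind prefix such as "Caml1999O" followed by three version digits. The exact value changes between compiler releases (the compiler defines it as cmo_magic_number in utils/config.ml), and the function names are hypothetical.

```ocaml
(* Assumption: magic numbers are 12 bytes: a 9-byte kind prefix such as
   "Caml1999O" (.cmo files) followed by 3 version digits. *)
let magic_length = 12

(* Return the first [magic_length] bytes of [path], or [None] if the file
   is too short. *)
let read_magic (path : string) : string option =
  let ic = open_in_bin path in
  Fun.protect
    ~finally:(fun () -> close_in ic)
    (fun () ->
      try Some (really_input_string ic magic_length)
      with End_of_file -> None)

(* Kind of file, ignoring the version digits. *)
let kind_prefix (magic : string) : string = String.sub magic 0 9

(* Accept the file only if the whole magic, version included, matches:
   a .cmo produced by another compiler release is rejected. *)
let has_expected_magic ~(expected : string) (path : string) : bool =
  match read_magic path with
  | Some m -> String.equal m expected
  | None -> false
```

A real tool would compare against the magic of the compiler it was built with rather than a hard-coded string; the version digits are what make the check reject stale files.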

> After that, I can't understand anything: there vaguely seems to be some
> information related to linking or so. What is the precise structure of this
> part? Is there some kind of bytecode assembler?

The structure is a compilation unit descriptor, described in
bytecomp/cmo_format.mli.
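The overall layout can be illustrated by a round trip through a toy file. As far as I can tell from bytecomp/emitcode.ml, ocamlc writes the magic number, a 4-byte big-endian placeholder, the code, then the compilation unit descriptor with output_value, and finally seeks back to store the descriptor's offset in the placeholder. When there is no debug information the descriptor directly follows the code, which would explain why the header value seems to point just past the last instruction. The toy_unit record and the magic value are hypothetical; reading a real .cmo would need the Cmo_format.compilation_unit type from compiler-libs of exactly the same compiler version, since input_value is not type-safe.

```ocaml
(* Hypothetical stand-in for the real descriptor type; NOT its layout. *)
type toy_unit = {
  name : string;
  code_pos : int;          (* file offset of the first code byte *)
  code_size : int;         (* length of the code, in bytes *)
  primitives : string list;
}

let magic = "Caml1999O011" (* assumed 12-byte magic; version is illustrative *)

(* Write magic, placeholder offset, code and descriptor, then patch the
   placeholder so that it points at the descriptor. *)
let write_toy_cmo path ~name ~code ~primitives =
  let oc = open_out_bin path in
  output_string oc magic;
  let pos_offset = pos_out oc in
  output_binary_int oc 0;                  (* placeholder, patched below *)
  let code_pos = pos_out oc in
  output_string oc code;
  let descr_pos = pos_out oc in
  output_value oc { name; code_pos; code_size = String.length code; primitives };
  seek_out oc pos_offset;
  output_binary_int oc descr_pos;          (* 4-byte big-endian offset *)
  close_out oc

(* Check the magic, follow the offset, unmarshal the descriptor, then use
   it to extract the code. *)
let read_toy_cmo path =
  let ic = open_in_bin path in
  Fun.protect
    ~finally:(fun () -> close_in ic)
    (fun () ->
      if really_input_string ic (String.length magic) <> magic then
        failwith "not a toy .cmo file";
      let descr_pos = input_binary_int ic in
      seek_in ic descr_pos;
      let (u : toy_unit) = input_value ic in
      seek_in ic u.code_pos;
      let code = really_input_string ic u.code_size in
      (u, code))
```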

-- Alain




* Re: [Caml-list] Bytecode object files structure
  2006-11-12 14:42 Bytecode object files structure Pierre-Etienne Meunier
  2006-11-12 14:56 ` [Caml-list] " Alain Frisch
@ 2006-11-13  9:16 ` Yann Régis-Gianas
       [not found] ` <968382EE-B8CB-452C-A86F-684879E33798@free.fr>
  2 siblings, 0 replies; 5+ messages in thread
From: Yann Régis-Gianas @ 2006-11-13  9:16 UTC (permalink / raw)
  To: Pierre-Etienne Meunier; +Cc: caml-list

Hi,

The file tools/dumpobj.ml in the O'Caml tree may be used to parse
object files. This should be a good first step toward understanding the
bytecode file format.
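As a first approximation of what a disassembler like dumpobj does, the code section can be split into 32-bit words. This sketch assumes, from my reading of bytecomp/emitcode.ml, that ocamlc stores each opcode and each operand as one 32-bit little-endian word; mapping opcode numbers to names would need the instruction table of that exact compiler version, which is deliberately left out. It requires OCaml 4.08 or later for String.get_int32_le, and the function names are hypothetical.

```ocaml
(* Split a code section into signed 32-bit little-endian words.
   Assumption: one opcode or operand per word, as ocamlc emits them. *)
let words_of_code (code : string) : int list =
  let n = String.length code in
  if n mod 4 <> 0 then invalid_arg "words_of_code: length not a multiple of 4";
  List.init (n / 4) (fun i -> Int32.to_int (String.get_int32_le code (4 * i)))

(* Print each word with its word offset; opcode names are not decoded. *)
let dump_words (code : string) : unit =
  List.iteri (fun i w -> Printf.printf "%6d  %d\n" i w) (words_of_code code)
```

Operands are decoded as signed values because branch offsets in the code can be negative.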

Hope this helps,

--
Yann Régis-Gianas




* Re: [Caml-list] Bytecode object files structure
       [not found] ` <968382EE-B8CB-452C-A86F-684879E33798@free.fr>
@ 2006-11-13 11:36   ` Pierre-Etienne Meunier
  0 siblings, 0 replies; 5+ messages in thread
From: Pierre-Etienne Meunier @ 2006-11-13 11:36 UTC (permalink / raw)
  To: Xavier Clerc; +Cc: caml-list

Hello,

I'd like to write an assembler, to be able to understand how the VM really
works. I have to work on this for a school project (a compiler; I want it to
output Caml bytecode object files).
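The output stage of such an assembler can be sketched as follows, assuming (as in ocamlc's emitter, to the best of my knowledge) that every opcode and operand becomes one 32-bit little-endian word. The instr type is hypothetical, and the actual opcode numbers must come from the runtime's instruction table, which is not reproduced here. Buffer.add_int32_le requires OCaml 4.08 or later.

```ocaml
(* Hypothetical symbolic instruction: an opcode number and its operands. *)
type instr = { opcode : int; operands : int list }

(* Emit one 32-bit little-endian word per opcode and per operand. *)
let assemble (prog : instr list) : string =
  let buf = Buffer.create 64 in
  let word n = Buffer.add_int32_le buf (Int32.of_int n) in
  List.iter (fun { opcode; operands } -> word opcode; List.iter word operands) prog;
  Buffer.contents buf
```

A complete assembler would also need a second pass to turn labels into branch offsets; as far as I can tell, the OCaml VM counts those offsets in words, relative to the position of the operand word itself.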

I've understood that the data part, after the code itself, was generated using
output_value (I didn't know this function before). What I don't understand yet
are the cu_reloc, cu_primitives and cu_imports fields of the compilation_unit
type.

If you can help with this,
Thanks
P.E. Meunier

On Monday 13 November 2006 11:53, Xavier Clerc wrote:
> Hello,
>
> As I have read a substantial part of the OCaml source code, I may be able
> to help you understand the file formats.
> Could you be more precise about what you are particularly interested
> in:
> 	- file type: bytecode file, cmo file, cmi file?
> 	- code or data section of these files?
>
> May I also ask what you are trying to do with these elements?
>
>
> Cordially,
>
> Xavier Clerc
>

* Re: [Caml-list] Bytecode object files structure
@ 2006-11-15 13:41 Xavier Clerc
  0 siblings, 0 replies; 5+ messages in thread
From: Xavier Clerc @ 2006-11-15 13:41 UTC (permalink / raw)
  To: Pierre-Etienne Meunier


On 13 Nov 06, at 16:50, Pierre-Etienne Meunier wrote:

> Hello,
>
> I'd like to write an assembler, to be able to understand how the VM
> really works. I have to work on this for a school project (a compiler;
> I want it to output Caml bytecode object files).

If you are working on a compiler that should output files to be
executed by the OCaml runtime, it does not seem necessary to handle
cmo/cmi files: the format of bytecode executables should be sufficient
for your compiler, unless you have to link with OCaml modules.
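For orientation in the bytecode executable format, here is a rough sketch of locating its sections. It assumes the trailer layout declared in the runtime's exec.h as far as I know it: a table of section descriptors (a 4-character name plus a 4-byte big-endian length each), then a 4-byte big-endian section count and the 12-byte executable magic at the very end, with the sections themselves stored back to back just before the table. Section names such as CODE, DATA or PRIM and all function names here are illustrative assumptions; OCaml 4.08 or later is required for String.get_int32_be.

```ocaml
(* Assumed trailer: 4-byte section count + 12-byte magic. *)
let trailer_size = 4 + 12

(* Magic string at the very end of the executable. *)
let exe_magic (file : string) : string =
  String.sub file (String.length file - 12) 12

(* Return (name, offset, length) for each section of an executable
   held entirely in [file]. *)
let sections_of_exe (file : string) : (string * int * int) list =
  let len = String.length file in
  if len < trailer_size then invalid_arg "sections_of_exe: file too short";
  let num = Int32.to_int (String.get_int32_be file (len - trailer_size)) in
  let table_pos = len - trailer_size - (8 * num) in
  if num < 0 || table_pos < 0 then invalid_arg "sections_of_exe: bad trailer";
  (* Each descriptor: 4-char name, then 4-byte big-endian length. *)
  let descrs =
    List.init num (fun i ->
        let p = table_pos + (8 * i) in
        (String.sub file p 4, Int32.to_int (String.get_int32_be file (p + 4))))
  in
  (* Sections end where the table starts, so walk back to the first one. *)
  let total = List.fold_left (fun acc (_, l) -> acc + l) 0 descrs in
  let start = table_pos - total in
  if start < 0 then invalid_arg "sections_of_exe: sections overflow file";
  let _, rev =
    List.fold_left
      (fun (ofs, acc) (name, l) -> (ofs + l, (name, ofs, l) :: acc))
      (start, []) descrs
  in
  List.rev rev
```

Walking backwards from the end lets the executable start with an arbitrary header (for instance a #! line) without the reader needing to know its length.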


> I've understood that the data part, after the code itself, was
> generated using output_value (I didn't know this function before).

This function is used by the Marshal module. It transforms any
non-abstract value into a sequence of bytes.
The marshalling format can be understood from the extern_rec
function in the byterun/extern.c file.
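The behaviour of output_value can be observed through the Marshal module, since output_value oc v is documented as equivalent to Marshal.to_channel oc v []. This sketch shows a round trip, the header magic, and the rejection of functional values when the Closures flag is not given. The small_magic constant is my assumption of the runtime's "small" header magic (0x8495A6BE), which applies to ordinary-sized values marshalled without the compression flag; the function names are hypothetical.

```ocaml
(* Assumed magic of the "small" marshalling header (0x8495A6BE). *)
let small_magic = "\x84\x95\xA6\xBE"

(* Serialize [v] exactly as output_value would, then read it back. *)
let marshal_roundtrip (v : 'a) : string * 'a =
  let s = Marshal.to_string v [] in
  (s, Marshal.from_string s 0)

(* Functional values are rejected unless the Closures flag is passed. *)
let can_marshal_without_flags v =
  match Marshal.to_string v [] with
  | _ -> true
  | exception Invalid_argument _ -> false
```

Marshal.total_size can be applied to the first bytes of such a string to learn how long the whole marshalled value is, which is how a reader can skip over a value without decoding it.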


> What I don't understand yet are the cu_reloc, cu_primitives and
> cu_imports fields of the compilation_unit type.

You should remember that cmo files are pieces that will be put
together (linked) in order to create a bytecode file.
Given this context:
	- cu_imports lists the names of the imported (used) modules the current
cmo should be linked with in order to produce a bytecode file (the
digest of each imported module is also kept to ensure that you link
with the same version you compiled against);
	- cu_primitives lists the primitives declared by the current module
(each 'external f : type1 -> type2 = "primitive"' results in a
"primitive" entry in this list), needed to ensure that all required C
primitives are provided;
	- cu_reloc: as each module is compiled independently, it can
declare some elements (e.g. global variables) and refer to them using a
0-based index; thus, when you link several modules together, you have
to relocate this information so that if the first module uses n
indexes (0 to n-1), the second module's m indexes become n to n+m-1,
and so on.
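The three fields can be tied together in a toy linker pass. Everything below is a hypothetical simplification, loosely modelled on Cmo_format but not identical to it: relocation entries name what must be patched and at which word, the linker hands out global and primitive slots in link order, and units must agree on the digest of every interface they import. Placeholders are -1 here only to make patching visible; if I read emitcode.ml correctly, the real compiler emits 0 at those positions.

```ocaml
(* Hypothetical, simplified relocation kinds. *)
type reloc =
  | Set_global of string  (* the unit stores its own module block *)
  | Get_global of string  (* the unit reads another module's block *)
  | Prim of string        (* the unit calls an external C primitive *)

(* Hypothetical, simplified compilation unit. *)
type cunit = {
  name : string;
  code : int array;                    (* code words; operands to patch hold a placeholder *)
  reloc : (reloc * int) list;          (* what to patch, at which word index *)
  imports : (string * Digest.t) list;  (* interfaces compiled against *)
  primitives : string list;            (* externals declared by the unit *)
}

(* Give each name the next free slot the first time it is seen. *)
let slot_of (tbl : (string, int) Hashtbl.t) (name : string) : int =
  match Hashtbl.find_opt tbl name with
  | Some i -> i
  | None ->
      let i = Hashtbl.length tbl in
      Hashtbl.add tbl name i;
      i

(* Link units in order and return the concatenated, patched code.
   Note: patches each unit's [code] array in place. *)
let link (units : cunit list) : int array =
  let globals = Hashtbl.create 16 in
  let prims = Hashtbl.create 16 in
  let crcs = Hashtbl.create 16 in
  List.iter
    (fun u ->
      (* imports: every unit must agree on the digest of each interface *)
      List.iter
        (fun (m, d) ->
          match Hashtbl.find_opt crcs m with
          | Some d' when not (Digest.equal d d') ->
              failwith (Printf.sprintf "%s: different version of %s" u.name m)
          | Some _ -> ()
          | None -> Hashtbl.add crcs m d)
        u.imports;
      (* primitives: collect the externals the runtime must provide *)
      List.iter (fun p -> ignore (slot_of prims p)) u.primitives;
      (* reloc: replace placeholders with link-time slot numbers *)
      List.iter
        (fun (r, pos) ->
          u.code.(pos) <-
            (match r with
             | Set_global g | Get_global g -> slot_of globals g
             | Prim p -> slot_of prims p))
        u.reloc)
    units;
  Array.concat (List.map (fun u -> u.code) units)
```

Because slots are shared across units, a Get_global "A" in a later unit receives the same number as the Set_global "A" of unit A, which is exactly the renumbering described above.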


Hope this helps,

Xavier Clerc

PS: I am working on some documents describing the marshalling format,
bytecode files, and instruction opcodes.
I will hopefully release them before Christmas, but don't hold your breath,
as I don't have much spare time these days.
In the meantime, you can contact me off-list with any related question.


Thread overview: 5+ messages
2006-11-12 14:42 Bytecode object files structure Pierre-Etienne Meunier
2006-11-12 14:56 ` [Caml-list] " Alain Frisch
2006-11-13  9:16 ` Yann Régis-Gianas
     [not found] ` <968382EE-B8CB-452C-A86F-684879E33798@free.fr>
2006-11-13 11:36   ` Pierre-Etienne Meunier
2006-11-15 13:41 Xavier Clerc
