type inference for python

caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed

* type inference for python
@ 2005-02-01  5:26 Philippe Fremy
  2005-02-01  6:56 ` [Caml-list] " Richard Cole
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Philippe Fremy @ 2005-02-01  5:26 UTC (permalink / raw)
  To: caml-list

	Hi,

I would like to implement something similar to the type inference of 
ocaml for the python language. I have always found it very impressive 
(although I have only used caml light).

I have no experience with the topic, it is just a project that seems 
cool to me :-)

Do you have any hints or where I could build up my knowledge (code, 
books, article, ...) to do that kind of thing.

I imagine that it works in a kind of three way pass:

1. analyse all the constraints of the code
Ex:
def f(a): a.append(1)
def g(a): a=a+1; f(a)
g('coucou')

=> a must support append
=> a must also be an int

2. cross-validate the constraints consistency
=> inconsistent assertions

3. validate the constraints against reality
=> g('coucou') will not work

The 2. and 3. looks like the most difficult aspect of it.

	regards,

	Philippe

-- 
Philippe Fremy
InSeal Technical Director
tel: +33 6 07 98 76 44

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] type inference for python
  2005-02-01  5:26 type inference for python Philippe Fremy
@ 2005-02-01  6:56 ` Richard Cole
  2005-02-01  6:58 ` skaller
  2005-02-01  7:25 ` Nicolas Cannasse
  2 siblings, 0 replies; 7+ messages in thread
From: Richard Cole @ 2005-02-01  6:56 UTC (permalink / raw)
  To: Philippe Fremy; +Cc: caml-list

Philippe Fremy wrote:

>
>     Hi,
>
> I would like to implement something similar to the type inference of 
> ocaml for the python language. I have always found it very impressive 
> (although I have only used caml light).
>
> I have no experience with the topic, it is just a project that seems 
> cool to me :-)

Yeah me too :)

Checkout "Demand-Driven Type Inference with Subgoal Pruning: Trading 
Precision for Scalability." by Alexander Spoon. Google can finds the 
paper if you type in the title. This paper gives a nice description of 
why type inference is hard thing to do in a dynamicly typed OO language 
like Python (aka smalltalk) and gives pointers to some work in the area 
including Cecile and Squeak. There was a later paper published by the 
same author in 04 but its not on the web yet :(

I spent a little time considering adding interfaces to Ruby 
[http://kvo.itee.uq.edu.au/twiki/bin/view/Main/RjBlog39] which is pretty 
similar to Python, I guess. The main motivation for me, in adding 
interfaces to a language like Ruby, is to make the software structure 
explicit, and to give coders a place to document contracts that are 
shared by a number of classes. So I don't mind if the type inference 
isn't complete in the sense that every variable gets assigned some sort 
of minimal type.

I'm sure one of the Ocaml guys can give you a list of references to the 
ML type system. Is there a canonical reference of the Ocaml type system?

regards,

Richard.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] type inference for python
  2005-02-01  5:26 type inference for python Philippe Fremy
  2005-02-01  6:56 ` [Caml-list] " Richard Cole
@ 2005-02-01  6:58 ` skaller
  2005-02-01 10:54   ` Philippe Fremy
  2005-02-01  7:25 ` Nicolas Cannasse
  2 siblings, 1 reply; 7+ messages in thread
From: skaller @ 2005-02-01  6:58 UTC (permalink / raw)
  To: Philippe Fremy; +Cc: caml-list

On Tue, 2005-02-01 at 16:26, Philippe Fremy wrote:
> 	Hi,
> 
> I would like to implement something similar to the type inference of 
> ocaml for the python language. I have always found it very impressive 
> (although I have only used caml light).
> 
> I have no experience with the topic, 

I do. I did considerable work on this. The project
was called Vyper. The key idea was to build an 
interpreter which was used to initialise all the
dictionaries, then analyse them for type information,
and generate code that used that type information.

I got as far as getting the interpreter to work very well,
running most standard pure python plus some emulations of
popular C extensions, and with reasonable performance.
In addition, I made some extensions, many of which
got into later versions of Python .. including proper
lexical scoping, garbage collection (well, my version
used Ocaml for that .. ) and a few other things.

However, the type analysis was never done, and I lost
interest in the project, primarily because it would
never support continuation passing in the style of 
Stackless Python. (And the source is lost now).

A few technical problems also got in the way:

(1) I needed finalisers for __del__ methods.
Ocaml has Ocaml finalises today as a consequence.
(C finalisers existed, but I needed Ocaml ones).

Unfortunately, Python uses ref counting and finalisation
is therefore synchronous and well ordered for non-cyclic
references, whereas my implementation just used GC.
This meant certain programs didn't quite work as
they did in CPython.

(2) Ocaml native code compiler can't generate shared libraries,
so compiling Python to Ocaml will never emulate dynamic
loading. Perhaps you could do this with Ocaml bytecode,
but the interpreter needed to be fast.

This was a very serious obstacle, since every Python
C extension I wanted to emulate had to be hardwired
into the program (even though using it emulated
the usual import semantics).

(3) Python specs weren't written by someone knowing about
languages and standardisation. By far the worst problem
was exceptions. In many cases errors are specified
to throw exceptions... which means the behaviour is well
defined.

This alone make static type analysis almost impossible.
To do static typing you need a specification that
says the result of a wrong typing is undefined.
Otherwise the program can *rely* on catching an exception,
and a type error isn't an error (since the behaviour is
well defined). Trying to explain that to Pythonistas and
van Rossum in particular proved difficult.. although they
seemed to understand the problem which dynamic mutations
to modules caused and considered 'freezing' modules after
they'd been imported. This may have been done in later
versions of Python: some changes to 'eval' and friends
were made for his reason (you have to specify the dictionary
to use for an eval now).

> 1. analyse all the constraints of the code
> Ex:
> def f(a): a.append(1)
> def g(a): a=a+1; f(a)
> g('coucou')
> 
> => a must support append

No, you cannot assume that due to the 'exception' problem
mentioned above.

> => a must also be an int

And you can't assume that either. A could be a class
with an __add__ method. Alternatively, it could be anything,
and the called of g() is again relying on a type error
to throw an exception.

> 2. cross-validate the constraints consistency
> => inconsistent assertions

Python variables are all polymorphic. It isn't
inconsistent to have

	a is an int
	a is a string

This is possible, at different times .. as an example:

	for i in [1,"string"]: ..

i is first an int then a string.

> The 2. and 3. looks like the most difficult aspect of it.

Actually the difficulty is deciding how to modify
the semantics of Python so analysis is possible.
This will break some programs, so you need to choose
a reasonable set of constraints.

You would typically assume that modules were immutable
after loading, whilst class objects remained dynamic.
This means you can apply some static analysis
by actually executing the program up to the main
function call, then examining what got imported.
At least that was my theory .. in particular,
you can type module level variables, and you can
actually lookup functions defined in modules and
examine the code. By using global analysis, you can
then optimise some functions. For example,
you might determine that some function

	def f(a): return a+1

really is always called with an int argument,
and thus optimise it.

-- 
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850, 
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] type inference for python
  2005-02-01  5:26 type inference for python Philippe Fremy
  2005-02-01  6:56 ` [Caml-list] " Richard Cole
  2005-02-01  6:58 ` skaller
@ 2005-02-01  7:25 ` Nicolas Cannasse
  2 siblings, 0 replies; 7+ messages in thread
From: Nicolas Cannasse @ 2005-02-01  7:25 UTC (permalink / raw)
  To: Philippe Fremy, caml-list

> Hi,
>
> I would like to implement something similar to the type inference of
> ocaml for the python language. I have always found it very impressive
> (although I have only used caml light).
>
> I have no experience with the topic, it is just a project that seems
> cool to me :-)
>
> Do you have any hints or where I could build up my knowledge (code,
> books, article, ...) to do that kind of thing.
>
> I imagine that it works in a kind of three way pass:
>
> 1. analyse all the constraints of the code
> Ex:
> def f(a): a.append(1)
> def g(a): a=a+1; f(a)
> g('coucou')
>
> => a must support append
> => a must also be an int
>
> 2. cross-validate the constraints consistency
> => inconsistent assertions
>
> 3. validate the constraints against reality
> => g('coucou') will not work

In fact, the ML style type inference can be done in only one pass, at first
your variables are monomorphic then they're unified witht the first type
they match, creating linkages between types if needed. However in an OO
system such as Python, you need more than core ML type inference. You will
need some kind of structural subtyping, and subtyping is known to be tricky
when used together with polymorphism. There is several kind of subtyping
algorithms possible, using constraints is one of them (see work of François
Pottier on this).

I used another approach based on ocaml polymorphic variants typing scheme
(somehow reversed) to implement this feature in MTypes, but it still have a
lot of implementation holes difficult to handle. Also, the Python standard
library must be designed with dynamic typing in mind (several types possible
for arguments, variable number of arguments, operator overriding.....). It's
possible to rely on a dynamicly typed library but you'll have to constraint
the types a little more and add primitives so users can access different
kinds of features which are not usable anymore.

Good luck,
Nicolas Cannasse

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] type inference for python
  2005-02-01  6:58 ` skaller
@ 2005-02-01 10:54   ` Philippe Fremy
  2005-02-01 12:37     ` skaller
  0 siblings, 1 reply; 7+ messages in thread
From: Philippe Fremy @ 2005-02-01 10:54 UTC (permalink / raw)
  To: caml-list

	Hi,

> I do. I did considerable work on this. The project
> was called Vyper. 

Woa, Vyper looks like a killer project.

But unlike vyper, my goal is not to provide an alternative python 
implementation with type inference and other goodies.

I aim at something like pychecker. That is, a tool that you can run to 
check the types consistency of a python. It is perfectly acceptable for 
it to be limited and to report false warning for twisted constructs.

The goal is just to make python programming more friendly. There are 
many small mistakes that could be caught by such a tool and creep in 
python programs. I keep thinking about the interview of a mailman 
developer, who said that a big problem of python is that eventhough you 
have a program running for a long time, there might be some hidden place 
where you did type mistake that will break your whole program with an 
exception.

Regarding this, python is no better than C program segfaulting.

> (3) Python specs weren't written by someone knowing about
> languages and standardisation. By far the worst problem
> was exceptions.

It is tricky for sure.

>>1. analyse all the constraints of the code
>>Ex:
>>def f(a): a.append(1)
>>def g(a): a=a+1; f(a)
>>g('coucou')
>>
>>=> a must support append
> 
> 
> No, you cannot assume that due to the 'exception' problem
> mentioned above.

Indeed, the append could be a problem if it is meant to be caught by an 
AttributeError exception. This is why I never liked exceptions. When 
deeply hidden, they break the execution flow too much and are just as 
dangerous as pointers.

But since I am taking the "helper tool" approach, it is ok to make 
assumptions that will trigger real warning in 90% of the cases and false 
warning when you trick python too much. You can then disable the warning 
in your code, like pychecker does.

I am ready to make a few assumptions as a starter:
- operations on int/float will return an int or float
- operation + involves two objects of the same type and return an object 
of the same type
- AttributesError are not supposed to be raised

and add whatever other simplification is necessary in order to get a 
limited but useful tool.

> Alternatively, it could be anything,
> and the called of g() is again relying on a type error
> to throw an exception.

Did you find a lot of code relying on this idiom ? That's clearly a 
stopper, but I am ok for ignoring it. It would be probably ok to detect 
this if the try/catch is in the same block. If it is not in the same 
scope, I consider this bad programming practice anyway so I do not mind 
throwing a warning.

> 
>>2. cross-validate the constraints consistency
>>=> inconsistent assertions
> 
> 
> Python variables are all polymorphic. It isn't
> inconsistent to have
> 
> 	a is an int
> 	a is a string
> 
> This is possible, at different times .. as an example:
> 
> 	for i in [1,"string"]: ..
> 
> i is first an int then a string.

Sure. I used to program a bit like that:
def f(a):
	if not type(a) is type([]): a = [a]

That means that the type of a variable must be tied to a portion of the 
code or would have to be one of a possible choice.

Iterating over lists with variable types is sure tricky. But again, 
that's not a common programming pattern (or is it my imagination).

You had the constraint of supporting every language feature. I can 
afford to support only 50% of them. That would already be a big win over 
the current situation.

I understand that python typing is so loose that it creates many problems.

I am puzzled because the industry uses poor languages to do important 
things. Our life depends so much on computers those days that it is a 
pity to uses such insecure languages. There have been a lot of good 
languages produced but few of them really caught on.

Whay I like in python is that it has some characteristics that make it 
an easy to pick language, and it can replace C++, VB and Java in many 
areas. But this type fully dynamic types is one of its strength but also 
the number one error made by beginners.

	regards,

	Philippe

-- 
Philippe Fremy
InSeal Technical Director
tel: +33 6 07 98 76 44

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] type inference for python
  2005-02-01 10:54   ` Philippe Fremy
@ 2005-02-01 12:37     ` skaller
  2005-02-01 23:06       ` Philippe Fremy
  0 siblings, 1 reply; 7+ messages in thread
From: skaller @ 2005-02-01 12:37 UTC (permalink / raw)
  To: Philippe Fremy; +Cc: caml-list

On Tue, 2005-02-01 at 21:54, Philippe Fremy wrote:

> But unlike vyper, my goal is not to provide an alternative python 
> implementation with type inference and other goodies.

That wasn't mine either.

> I aim at something like pychecker. That is, a tool that you can run to 
> check the types consistency of a python. 

Perhaps you missed the point of my description. In order to
check the types of something, where there are no type annotations,
you need to use the known types as a starting point: namely
the literals.

Often in Python you will see:

	def f(a):
		return M.x + a

where you don't know the type of a. But what about M.x?

Perhaps you can look up x in M. But attributes are
not declared in Python, let alone their types. What you need
to do is actually find the value of M.x to see what type
it is.

IN order to do this, you observe there is a statement:

	import M

in the program. There is a simple way to find the type of M.x,
and that is to actually *execute* the import statement.

In other words, you have no choice but to implement an
interpreter -- or use the existing one.

More precisely, you could look at the code in M, do flow
analysis, and try to guess at the type of x from that.

But that is much more complicated and difficult. Why bother
to do that when you can just import M and then examine
the module dictionary??

So you see my purpose was not to implement a python interpreter,
I did that because it is *necessary*. There is no better way
to find the types of attributes in modules.

The reason this works is based on the fact that *usually*
in Python, a program does three things:

(a) import some modules
(b) do some initialisations in the mainline
(c) call the main function

It is a bit tricky to distinguish (b) and (c).
The point however, is that (a) is usually expected
to just create a set of values and functions
which the mainline then uses. So if you want to type check
the mainline code, you need to establish the types of the
values and functions in the imported modules as a starting
point... and as mentioned you do that by actually importing
them. This is basically the only possibility: it is reliable
because you don't have to guess about how to do it, the
semantics are defined in the language reference.

In addition, for many modules the typing is known
because it is documented in the Library Reference.

After you have done this, then you must analyse the mainline.

You CANNOT execute it, in the same fashion, to find the types,
because you don't have the input data, it could be a server,
etc etc. But most of the modules, you can expect them
to execute and terminate quickly, since the execution
serves mainly to just to store things in the module dictionary.

The difficulty you will face with 'pycheck' or whatever is that
you really do have to establish with some certainty what
some of the types are, so that a violation can be noted
and warned about.

Otherwise, if whenever you're in doubt you just say
'polymorphic' then clearly every doubtful use will just
lead to saying 'polymorphic' and you won't be able to
warn about *any* misuses .. :)

> It is perfectly acceptable for 
> it to be limited and to report false warning for twisted constructs.

The question is whether you can provide a reasonable signal to noise
ratio. To do that you really have to be right about the intended
typing most of the type.

>  I keep thinking about the interview of a mailman 
> developer, who said that a big problem of python is that eventhough you 
> have a program running for a long time, there might be some hidden place 
> where you did type mistake that will break your whole program with an 
> exception.
> 
> Regarding this, python is no better than C program segfaulting.

C is probably more reliable because it is statically
typed, however there is a loss of reliability in C because the type
system isn't very expressive, and the workaround -- casting --
is used far more often than one would like. Python, on the
other hand, is easier to test (no long compile times),
and dynamic type checks catch errors earlier than in C.

Either way, what you would like to do is a laudible goal,
if you can get it to work .. but i suspect you will find
it much harder than you might think -- I certainly did :)

> But since I am taking the "helper tool" approach, it is ok to make 
> assumptions that will trigger real warning in 90% of the cases and false 
> warning when you trick python too much. 

Sure, but what do you mean 'trick Python'? Writing code
based on the specified semantics -- including exception handling --
is not a trick.

I myself write lots of Python even today (all my build scripts
and utilities are Python). I do things often that you might
think are tricks:

try:
	x = f()
except:
	x = default

I do that one heaps, my code is littered with it,
almost the entire Felix configuration script works
that way.

> I am ready to make a few assumptions as a starter:
> - operations on int/float will return an int or float
> - operation + involves two objects of the same type and return an object 
> of the same type
> - AttributesError are not supposed to be raised

My literate programming tool interscript relies on that heavily.
It is in fact the central technique I use to build polymorphic
weavers (a weaver is a driver for a typesetter). you are
allowed to call the moral equivalent of:

	x.start_emphasise()
	x.output("jhgjhgjhg")
	x.end_emphasise()

and an error finding the start/end emphasise method is just
ignored. If they're found and fail that's an error.

It isn't a bug to leave out the emphasise methods 
for the text weaver -- plain text doesn't support it.
Latex and HTML do. This allows you to encode pretty
features (like emphasis) whether or not the typesetter
supports it -- that is, without knowing which typesetter
the client will want to use for your document.

You also might have some trouble finding this kind of thing
out -- since the weavers are all plugins. They're loaded
dynamically on demand.

Is this trickery? Well, if it is, if this is bad programming,
then why bother using Python? One uses it *because* it
is a dynamic language.

> Did you find a lot of code relying on this idiom ? 

No, most code doesn't do anything fancy. Most Python code
is 'well typed' and reasonbly static. By well typed I mean
a variable is either a single type, or clearly polymorphic.

Unfortunately, 'most code' is easily type checked by a
single test case. The real advantage of a static checker
is weeding out corner cases -- programmers don't get
the ordinary cases wrong that often, and when they
do the first test run usually finds the bug.

> Iterating over lists with variable types is sure tricky. But again, 
> that's not a common programming pattern (or is it my imagination).

The common case is (im Ocaml notation):

	'a option

That is, some value of a fixed type,
or None. That one you have to handle.

Luckily, there's lots of theory for this. None is a bottom
or ground or Error or whatever, so for each plain type

	int, string, ... 

you need to handle

	Some int | None, Some string | None

which are called 'pointed types'.

> You had the constraint of supporting every language feature. I can 
> afford to support only 50% of them. That would already be a big win over 
> the current situation.

Not really. The problem is that doubt propagates exponentially.
So you have to handle quite a bit :)

You probably need to distinguish

(a) monomorphic unpointed type (int, string etc)
(b) monomorphic pointed (allows None too)
(c) dynamic (any type is allowed)
(d) don't know

Don't confuse (c) and (d). Dynamic means you DO know
that the type varies (no further deduction required).
Don't know means you're open to further constraints
as you see more code.

> I am puzzled because the industry uses poor languages to do important 
> things. Our life depends so much on computers those days that it is a 
> pity to uses such insecure languages. There have been a lot of good 
> languages produced but few of them really caught on.

You will get much agreement with that sentiment here :)

> Whay I like in python is that it has some characteristics that make it 
> an easy to pick language, and it can replace C++, VB and Java in many 
> areas. But this type fully dynamic types is one of its strength but also 
> the number one error made by beginners.

I suspect dynamic typing is mostly a workaround
for lack of polymorphism and the lack of sum types (variants).
It does have the advantages of avoiding the difficulties
of find a sound, simple, and expressive type system,
which continues to be a bleeding edge research topic :)

If you're serious, and you think programmers will take
your tool seriously, you might consider establishing
a protocol for type annotations -- in comments for
example, perhaps like:

	def f(a): #a:int
	  x = y + z #z:int

Using such annotations, your tool could be helped
quite a bit. In particular, when you can't determine a
type, the diagnostic is a hint to add an annotation.
Software houses might then require enough annotations
as part of their coding standards to shut your tool up :)

-- 
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850, 
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] type inference for python
  2005-02-01 12:37     ` skaller
@ 2005-02-01 23:06       ` Philippe Fremy
  0 siblings, 0 replies; 7+ messages in thread
From: Philippe Fremy @ 2005-02-01 23:06 UTC (permalink / raw)
  Cc: caml-list

	Hi Skaller,

Thank you for your very detailed explanation. You have obviously 
dedicated a lot of thought to the problem. I know understand that it is 
indeed necessary to almost rerun the interpreter to check the code. 
Actually, that's what pychecker does.

But even running the interprter would probably not be enough. I knew 
when asking that the problem was probably beyound my reach. Now, I know 
why :-).

> Sure, but what do you mean 'trick Python'? Writing code
> based on the specified semantics -- including exception handling --
> is not a trick.

Actually, I realise that there are many usage patterns for python. Some 
people like you use it for its highly dynamic nature. I use it mainly as 
an easy Java/C++ replacement and it would be ok for me to drop many of 
the dynamic aspect, in order to get more type and execution safety.

> Unfortunately, 'most code' is easily type checked by a
> single test case. 

Well, it is so easy to make a small mistake with big consequences. An 
intern just wrote some code for me where he was supposed to return a 
list. In one case, he would return a None instead of an empty list, and 
broke the program the whole concept (always review the code written by 
intenrs :-).

There should be a way to avoid that kind of silly mistakes.

> you might consider establishing a protocol for type annotations 

This is already being discussed in python-dev by more advanced 
programmer than I am. And raises a lot of controversy on how to do it 
and whether it will be useful.

Anyway, discussing the topic gave me a higher view of the problem. 
Python is currently a single solution (CPython) that lends itself to 
many programming paradigm, but in an unsatisfactory manners for some of 
them.

I would like to get rid of the dynamically nature of the language 
sometimes, which some people consider equivalent to killing it. That's 
how I would react if somebody suggested to "remove all the OO crap of 
python" and use it like a good pascal replacement. Still, it would be a 
valid use.

The good news is that a solution is coming for this problem: pypy. This 
generated implementation of python should allows to build python 
interpreter with different feature set. They also have a nice program 
flow chart where they can infer types and optimise code to a wild level.

So, I just have to find myself another fun project :-). Thank you for 
your answers.

     regards,

     Philippe

-- 
Philippe Fremy
InSeal Technical Director
tel: +33 6 07 98 76 44

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2005-02-01 23:06 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-02-01  5:26 type inference for python Philippe Fremy
2005-02-01  6:56 ` [Caml-list] " Richard Cole
2005-02-01  6:58 ` skaller
2005-02-01 10:54   ` Philippe Fremy
2005-02-01 12:37     ` skaller
2005-02-01 23:06       ` Philippe Fremy
2005-02-01  7:25 ` Nicolas Cannasse

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).