From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Delivered-To: caml-list@yquem.inria.fr Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by yquem.inria.fr (Postfix) with ESMTP id 51750BC88 for ; Tue, 1 Feb 2005 13:37:37 +0100 (CET) Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by nez-perce.inria.fr (8.13.0/8.13.0) with ESMTP id j11Cbald019725 for ; Tue, 1 Feb 2005 13:37:36 +0100 Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id NAA00671 for ; Tue, 1 Feb 2005 13:37:36 +0100 (MET) Received: from smtp3.adl2.internode.on.net (smtp3.adl2.internode.on.net [203.16.214.203]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id j11CbWeR018675 for ; Tue, 1 Feb 2005 13:37:35 +0100 Received: from [192.168.1.200] (ppp212-197.lns2.syd3.internode.on.net [203.122.212.197]) by smtp3.adl2.internode.on.net (8.12.9/8.12.9) with ESMTP id j11CbQMe069134; Tue, 1 Feb 2005 23:07:28 +1030 (CST) Subject: Re: [Caml-list] type inference for python From: skaller Reply-To: skaller@users.sourceforge.net To: Philippe Fremy Cc: caml-list In-Reply-To: <41FF5FD6.5050008@inseal.com> References: <41FF1305.30308@inseal.com> <1107241087.23973.242.camel@pelican.wigram> <41FF5FD6.5050008@inseal.com> Content-Type: text/plain Message-Id: <1107261444.23973.324.camel@pelican.wigram> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 (1.2.2-4) Date: 01 Feb 2005 23:37:26 +1100 Content-Transfer-Encoding: 7bit X-Miltered: at nez-perce with ID 41FF7810.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Miltered: at concorde with ID 41FF780C.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; caml-list:01 inference:01 sourceforge:01 wrote:01 inference:01 annotations:01 literals:01 def:01 semantics:01 terminate:01 statically:01 semantics:01 literate:01 bug:01 checker:01 X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on yquem.inria.fr X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.0.2 X-Spam-Level: On Tue, 2005-02-01 at 21:54, Philippe Fremy wrote: > But unlike vyper, my goal is not to provide an alternative python > implementation with type inference and other goodies. That wasn't mine either. > I aim at something like pychecker. That is, a tool that you can run to > check the types consistency of a python. Perhaps you missed the point of my description. In order to check the types of something, where there are no type annotations, you need to use the known types as a starting point: namely the literals. Often in Python you will see: def f(a): return M.x + a where you don't know the type of a. But what about M.x? Perhaps you can look up x in M. But attributes are not declared in Python, let alone their types. What you need to do is actually find the value of M.x to see what type it is. IN order to do this, you observe there is a statement: import M in the program. There is a simple way to find the type of M.x, and that is to actually *execute* the import statement. In other words, you have no choice but to implement an interpreter -- or use the existing one. More precisely, you could look at the code in M, do flow analysis, and try to guess at the type of x from that. But that is much more complicated and difficult. Why bother to do that when you can just import M and then examine the module dictionary?? So you see my purpose was not to implement a python interpreter, I did that because it is *necessary*. There is no better way to find the types of attributes in modules. The reason this works is based on the fact that *usually* in Python, a program does three things: (a) import some modules (b) do some initialisations in the mainline (c) call the main function It is a bit tricky to distinguish (b) and (c). The point however, is that (a) is usually expected to just create a set of values and functions which the mainline then uses. So if you want to type check the mainline code, you need to establish the types of the values and functions in the imported modules as a starting point... and as mentioned you do that by actually importing them. This is basically the only possibility: it is reliable because you don't have to guess about how to do it, the semantics are defined in the language reference. In addition, for many modules the typing is known because it is documented in the Library Reference. After you have done this, then you must analyse the mainline. You CANNOT execute it, in the same fashion, to find the types, because you don't have the input data, it could be a server, etc etc. But most of the modules, you can expect them to execute and terminate quickly, since the execution serves mainly to just to store things in the module dictionary. The difficulty you will face with 'pycheck' or whatever is that you really do have to establish with some certainty what some of the types are, so that a violation can be noted and warned about. Otherwise, if whenever you're in doubt you just say 'polymorphic' then clearly every doubtful use will just lead to saying 'polymorphic' and you won't be able to warn about *any* misuses .. :) > It is perfectly acceptable for > it to be limited and to report false warning for twisted constructs. The question is whether you can provide a reasonable signal to noise ratio. To do that you really have to be right about the intended typing most of the type. > I keep thinking about the interview of a mailman > developer, who said that a big problem of python is that eventhough you > have a program running for a long time, there might be some hidden place > where you did type mistake that will break your whole program with an > exception. > > Regarding this, python is no better than C program segfaulting. C is probably more reliable because it is statically typed, however there is a loss of reliability in C because the type system isn't very expressive, and the workaround -- casting -- is used far more often than one would like. Python, on the other hand, is easier to test (no long compile times), and dynamic type checks catch errors earlier than in C. Either way, what you would like to do is a laudible goal, if you can get it to work .. but i suspect you will find it much harder than you might think -- I certainly did :) > But since I am taking the "helper tool" approach, it is ok to make > assumptions that will trigger real warning in 90% of the cases and false > warning when you trick python too much. Sure, but what do you mean 'trick Python'? Writing code based on the specified semantics -- including exception handling -- is not a trick. I myself write lots of Python even today (all my build scripts and utilities are Python). I do things often that you might think are tricks: try: x = f() except: x = default I do that one heaps, my code is littered with it, almost the entire Felix configuration script works that way. > I am ready to make a few assumptions as a starter: > - operations on int/float will return an int or float > - operation + involves two objects of the same type and return an object > of the same type > - AttributesError are not supposed to be raised My literate programming tool interscript relies on that heavily. It is in fact the central technique I use to build polymorphic weavers (a weaver is a driver for a typesetter). you are allowed to call the moral equivalent of: x.start_emphasise() x.output("jhgjhgjhg") x.end_emphasise() and an error finding the start/end emphasise method is just ignored. If they're found and fail that's an error. It isn't a bug to leave out the emphasise methods for the text weaver -- plain text doesn't support it. Latex and HTML do. This allows you to encode pretty features (like emphasis) whether or not the typesetter supports it -- that is, without knowing which typesetter the client will want to use for your document. You also might have some trouble finding this kind of thing out -- since the weavers are all plugins. They're loaded dynamically on demand. Is this trickery? Well, if it is, if this is bad programming, then why bother using Python? One uses it *because* it is a dynamic language. > Did you find a lot of code relying on this idiom ? No, most code doesn't do anything fancy. Most Python code is 'well typed' and reasonbly static. By well typed I mean a variable is either a single type, or clearly polymorphic. Unfortunately, 'most code' is easily type checked by a single test case. The real advantage of a static checker is weeding out corner cases -- programmers don't get the ordinary cases wrong that often, and when they do the first test run usually finds the bug. > Iterating over lists with variable types is sure tricky. But again, > that's not a common programming pattern (or is it my imagination). The common case is (im Ocaml notation): 'a option That is, some value of a fixed type, or None. That one you have to handle. Luckily, there's lots of theory for this. None is a bottom or ground or Error or whatever, so for each plain type int, string, ... you need to handle Some int | None, Some string | None which are called 'pointed types'. > You had the constraint of supporting every language feature. I can > afford to support only 50% of them. That would already be a big win over > the current situation. Not really. The problem is that doubt propagates exponentially. So you have to handle quite a bit :) You probably need to distinguish (a) monomorphic unpointed type (int, string etc) (b) monomorphic pointed (allows None too) (c) dynamic (any type is allowed) (d) don't know Don't confuse (c) and (d). Dynamic means you DO know that the type varies (no further deduction required). Don't know means you're open to further constraints as you see more code. > I am puzzled because the industry uses poor languages to do important > things. Our life depends so much on computers those days that it is a > pity to uses such insecure languages. There have been a lot of good > languages produced but few of them really caught on. You will get much agreement with that sentiment here :) > Whay I like in python is that it has some characteristics that make it > an easy to pick language, and it can replace C++, VB and Java in many > areas. But this type fully dynamic types is one of its strength but also > the number one error made by beginners. I suspect dynamic typing is mostly a workaround for lack of polymorphism and the lack of sum types (variants). It does have the advantages of avoiding the difficulties of find a sound, simple, and expressive type system, which continues to be a bleeding edge research topic :) If you're serious, and you think programmers will take your tool seriously, you might consider establishing a protocol for type annotations -- in comments for example, perhaps like: def f(a): #a:int x = y + z #z:int Using such annotations, your tool could be helped quite a bit. In particular, when you can't determine a type, the diagnostic is a hint to add an annotation. Software houses might then require enough annotations as part of their coding standards to shut your tool up :) -- John Skaller, mailto:skaller@users.sf.net voice: 061-2-9660-0850, snail: PO BOX 401 Glebe NSW 2037 Australia Checkout the Felix programming language http://felix.sf.net