(This an adaptation of a talk I gave at PiterPy. Slides and video are available in russian)
Python gives us deep introspecting and metaprogramming capabilities including bytecode and AST support in the standard library. This facilitates unobvious ways to write code that writes code or otherwise alters its behavior. I will show several different examples of how it could be used for fun and profit.
Warming up
Look at this code:
1 2 |
|
It maps a list through _ + 1
and gets a new list with each element incremented.
This makes sense mnemonically, but can this code actually work in python?
And sure it can. In fact this is rather simple trick with operator overloading:
1 2 3 4 5 |
|
Here we create a class that produces lambda upon addition and then creates its instance named _
.
Just using common interface in an unusual way. We can even go all loose and make something like this to work:
1 2 3 4 5 |
|
Returning lambda, obviously, won’t work anymore. However, we can return a callable object, which also returns callables as operations results. We are also intercepting attribute access here, shouldn’t surprise you after everything you’ve seen.
Honestly, I was too shy to use this in production. And you usually need only one or two lambdas in a file anyway, too few to bring an import, a dependency and all the complications. But suppose you are writing tests for functional support library, then you’ll need to create lots of small functions to pass them to your ones. A library like this would be useful in that setting.
Anyway, if you still think this idea is too weird then you should know that there are at least 3 libraries implementing it:
But let’s move to something more interesting.
Pattern Matching
Now look at this and think if it is also feasible in python. This is supposed to recursively calculate a product of all the list elements:
1 2 3 |
|
If you familiar with pattern matching then the code makes sense:
- product is 1 for empty list,
- product is (first element) times (product of the rest of the list) otherwise.
On the other hand, python doesn’t work this way. x
and xs
are never defined and product()
has no arguments. And yes, this really doesn’t work. This, however, does:
1 2 3 4 5 6 |
|
@patterns
here rewrites function into something we meant. It can’t obviously be implemented as ordinary decorator, since calling original function will crash. So @patterns
reads function code instead, parses it into abstract syntax tree, transforms that to a meaningful one and then compiles back:
Code → AST → New AST → New Code
And after transformation we get something like:
1 2 3 4 5 6 |
|
Which looks like normal python code, it is also much more verbose. That’s why we got into this whole pattern matching thing in the first place.
And again it was too awkward to use all this magic in production code, but I used code inspired by that:
1 2 3 4 5 6 7 8 9 |
|
This actually could be useful for many things: recursive functions, chatbots, binary protocol implementations and other occasions when you resolve to long if-elses or switch-cases. The general idea, however, comes beyond pattern matching.
AST Applications
This technique — abstract syntax tree transformation — is much more broadly useful. And the main reason is that trees capture language structure. Another trees virtue is that they are data structures, which are much more hackable than code strings.
Anyway, these are some things they facilitate:
Language extensions. With pattern matching being only one of them. Other examples are optional (or not) static typing, macros, … It is only limited by your imagination and tree transformation skills. You obviously need to stay within syntax though, unless you go for separate parser and AST, but this is a separate topic.
Optimizations. This is less obvious but we can inline calls, precalculate constant expressions, even make tail call optimization.
Code analysis. It’s much easier to analyze tree than just code string ‘cause it captures some of semantics of a language. We can implement some linters or editor plugins with the help of AST. Generally you won’t need to transform a tree for analysis, but imagine some linter suggesting code changes specific to your particular fragment not general advice. That would be cool.
Translation. To JavaScript to run things in browser, to SQL to automatically generate queries or stored procedures, to Python 3 after all.
These are various ways it flows:
Code → AST → New AST → New Code
Code → AST → JavaScript
Bytecode → AST → SQL
Looks like we need to dig a bit more into AST.
AST
First we need a way to get a tree. The easiest way is to first get source and then parse it. Two standard modules will help us with that:
1 2 3 4 |
|
Sometimes you can’t get function source, most common example — inspect.getsource()
doesn’t work with lambdas. Then you can still get to AST by decompiling bytecode:
1 2 3 |
|
We are using third party module meta here, and despite saying that we decompile function it really does inspect and decompile its bytecode. Anyway, we’ve got our AST, let’s look at it:
This is quite a hairy tree for something as simple as lambda x: x + 1
. But if we look at it a bit
more closer we can recognize all the corresponding parts. Left branch is arguments, which have single argument named x
, but no kwarg, vararg or defaults. Right branch is a return node returning a result of binary operation with an op of addition and operands of a name x
and a number 1
.
That makes sense, this is not however what we see as a result in python REPL:
1 2 |
|
Yeah, it’s just a Lambda
object. We can look up its attributes to get other nodes:
1 2 |
|
And this is what python AST actually is: a collection of objects stored in the attributes of each other. In addition I need to say that this objects don’t really have behaviour, they are just data. You can find full list of available nodes and their attributes in ast module documentation.
Inspecting AST this way could be too tiresome, we need some way to get a bigger picture.
The simplest thing we have is ast.dump()
utility:
1 2 3 4 |
|
This is the same thing we saw in the picture, it is also useless for any non-trivial case. One more third-party library, astor, could help us out:
1 2 3 4 |
|
The same plus indents. It is already useful when you need to look at the AST closer, but for quick print debugging python code would be even better:
1 2 |
|
Parentheses are added everywhere to avoid dealing with operator precedence, but otherwise it’s the same code we parsed.
And the last thing we usually do to AST — compile it back and execute:
1 2 3 |
|
Functions used here, compile()
and exec()
, are built-ins. And code
is the same object type you can find introspecting func.__code__
. By executing in a context we set all the names defined into that dict:
1 2 |
|
So if all these were happening in a decorator we can get that function out of the context and substitute original one. Only the meat is left — walking the tree.
AST Traversal
Built-in ast module continues to aid us in this journey. It provides several utilities to ease tree walking and transformation. The lower level ones let us introspect nodes contents and iterate by them:
1 2 3 4 5 6 7 8 9 10 11 12 |
|
First of higher level ones is NodeVisitor
— a base class implementing visitor pattern, the idea is to subclass it and define methods like visit_<NodeClass>()
:
1 2 3 4 5 6 |
|
This simple example finds all the number literals in given code. Just an example, but it could be grown into some magic number finding tool and wrapped into linter or code editor plugin.
The second base class ast provides is NodeTransformer
. It works almost the same, but you can change a tree by returning new nodes from visit methods:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
|
This one precalculates any expressions on number literals, quite neat for just a bunch of lines.
And last but not least we can use old and good recursive tree walking. This snippet makes python function return last expression in a way Ruby, CoffeScript and Clojure do:
1 2 3 4 5 6 7 |
|
Note this code is simplified and hence broken, e.g. it will insert return
into the end of while loop if there is an expression there.
Going Forward
This was an overview of what you can do with AST, and there were lots of things. This is, however, only a preface to the actual story. Next time I will concentrate on translation and will describe it in detail capturing a real use case.
P.S. Second part is available here.