PPya: Python Assembler
Posted Fri, Apr 18, 2008
Over the last few days I’ve hacked together a Python assembler/disassembler. I’ve called it PPya (pronounced like “papaya,” the fruit): Paul’s Python assembler. The ‘a’ is left lowercase because it looks better that way.
Each of those days I started to write up this blog post, but then got distracted working on it some more.
It’s at the point now where it is fairly usable, both as a learning tool and as a tool for writing Python modules in assembly if you feel so inclined.
If you want to check it out, the project page is here: https://github.com/pib/papaya
or you can git clone it:
git clone https://github.com/pib/papaya
A .pya file consists of a series of bytecodes (well, strings representing them, anyway) followed by parameters for those instructions which take parameters. When assembled, these parameters are converted to indices into a tuple in a python Code object, one of co_names, co_consts, co_varnames, co_cellvars, or co_freevars.
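You can see the tuples those parameters index into on any code object. Here’s a quick sketch (run under a modern Python 3, though the post targets the Python 2 series, so the exact contents can differ by version):

```python
# Compile a tiny module and inspect the tuples that .pya parameters index into.
code = compile("x = 40 + 2\nprint(x)", "<demo>", "exec")

print(code.co_names)     # names used by STORE_NAME / LOAD_NAME, e.g. ('x', 'print')
print(code.co_consts)    # constants referenced by LOAD_CONST (40 + 2 is folded to 42)
print(code.co_varnames)  # local variable names (empty at module level)
```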
See the Python bytecode documentation for a list of all the bytecodes available. As I pointed out in my previous blog post, and in a Python issue report, the IMPORT_NAME opcode actually takes two parameters off the stack.
In addition to the actual bytecodes, there are a few meta-bytecodes which I’ve added. When calling LOAD_CONST, you can use the word “function” followed by a comma-separated parameter list to indicate that this opcode should reference a code object. The following lines up until a line consisting of the word “end” will be compiled and put into the co_consts tuple as a nested code object.
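This mirrors what the Python compiler itself does: a nested function body becomes a code object stored inside the enclosing code object’s co_consts. A small sketch in modern Python 3:

```python
import types

# A nested "function ... end" block assembles to a code object living in
# the enclosing code object's co_consts tuple -- just like a nested "def".
outer = compile("def f():\n    return 1\n", "<demo>", "exec")
nested = [c for c in outer.co_consts if isinstance(c, types.CodeType)]
print(nested[0].co_name)  # the nested code object for f
```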
A line which ends with a colon is considered to be a label for jumping. A JUMP_* bytecode with that same label (minus the colon) will be converted to the address or offset of the instruction immediately after the corresponding label.
The final meta-opcode is “stack”. The stack opcode takes an integer as an argument and indicates the required stack size for the generated code object. If you LOAD_CONST 4 items, then your stack size needs to be at least 4. In future versions I plan on generating this based on an analysis of the bytecodes, but for now it needs to be set manually in each module and function. If you don’t set this number high enough the Python interpreter will segfault!
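The analysis I have in mind is what CPython itself already does: it computes the required depth at compile time and stores it as co_stacksize, and the dis module can report the net stack effect of a single opcode. A sketch under modern Python 3:

```python
import dis

# CPython computes the needed depth itself and stores it as co_stacksize.
# Loading four names before BUILD_TUPLE needs at least four stack slots.
code = compile("t = (a, b, c, d)", "<demo>", "exec")
print(code.co_stacksize)

# stack_effect() gives the net push/pop of one opcode:
# BUILD_TUPLE with oparg 4 pops four items and pushes one, so -3.
print(dis.stack_effect(dis.opmap["BUILD_TUPLE"], 4))
```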
I haven’t added support for *args and *kwargs yet, but it’s at the top of my todo list for the project.
Also, the CALL_FUNCTION instructions use the two bytes of their parameter to indicate the number of positional and the number of keyword arguments, respectively, but the current version just treats the whole thing as a single integer. Separating these parameters for CALL_FUNCTION instructions is second on the todo list.
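The packing itself is simple: the low byte is the positional count and the high byte is the keyword count, so the combined oparg is npos + (nkw << 8). A sketch of both directions (note this encoding is the Python 2-era CALL_FUNCTION; modern CPython encodes calls differently, so the helpers below are just illustrative arithmetic):

```python
def pack_call_arg(npos, nkw):
    # CALL_FUNCTION oparg: low byte = positional count, high byte = keyword count
    return npos | (nkw << 8)

def unpack_call_arg(oparg):
    # Reverse the packing: mask off each byte.
    return oparg & 0xFF, (oparg >> 8) & 0xFF

print(pack_call_arg(2, 1))    # 258
print(unpack_call_arg(258))   # (2, 1)
```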
I don’t store the names of functions in their respective code objects yet, since the name is only given implicitly by the STORE_* instruction that comes after the function is defined. I’ll probably extend the function syntax to allow giving a function a name explicitly, and fall back to a sensible default when none is given.
Lastly, I don’t set the flag indicating whether a function is a generator function at the moment.
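That flag lives in co_flags; in CPython it is the CO_GENERATOR bit, which the inspect module exposes as a constant. A quick check under modern Python 3:

```python
import inspect

def gen():
    yield 1

def plain():
    return 1

# The compiler sets CO_GENERATOR in co_flags for any function containing yield.
print(bool(gen.__code__.co_flags & inspect.CO_GENERATOR))    # True
print(bool(plain.__code__.co_flags & inspect.CO_GENERATOR))  # False
```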
A simple example
Here’s an example of the sort of thing you can do with this tool:
Imagine you have a simple python module johnny5.py in the current directory:
print 'No Disassemble!'
Compile that code (run the interpreter and import it, or use python -m compileall .), then pass it in as the only parameter to assembler.py, which will create johnny5.pyd:
    stack 1
    LOAD_CONST 'No Disassemble!'
    PRINT_ITEM
    PRINT_NEWLINE
    LOAD_CONST None
    RETURN_VALUE
If you then change that second line:
    LOAD_CONST 'Yes Disassemble! Number 5 ready for orders.'
and rename the file to johnny5.pya (it’s saved as .pyd to avoid overwriting an existing .pya file), then reassemble it with assembler.py, you can now run python johnny5.pyc and get this:
    Yes Disassemble! Number 5 ready for orders.
Of course, you could just as easily have changed the .py file, but this isn’t the only thing you can do with this tool.
Writing a .pya file directly
You don’t have to generate your .pya from an existing .py file, of course. You can also just write it from scratch (or possibly generate it from another program).
For example, here is a function to calculate the Fibonacci number at a certain index:
    # Example: fibonacci generator.
    stack 1
    LOAD_CONST function n
    # if n is 0, return 0
    stack 4
    LOAD_FAST n
    LOAD_CONST 0
    COMPARE_OP ==
    JUMP_IF_FALSE not_zero
    POP_TOP
    LOAD_CONST 0
    RETURN_VALUE
    # else if n is 1, return 1
    not_zero:
    POP_TOP
    LOAD_FAST n
    LOAD_CONST 1
    COMPARE_OP ==
    JUMP_IF_FALSE not_one
    POP_TOP
    LOAD_CONST 1
    RETURN_VALUE
    # else n is fib(n-1) + fib(n-2)
    not_one:
    POP_TOP
    LOAD_GLOBAL fib
    LOAD_FAST n
    LOAD_CONST 1
    BINARY_SUBTRACT
    CALL_FUNCTION 1
    LOAD_GLOBAL fib
    LOAD_FAST n
    LOAD_CONST 2
    BINARY_SUBTRACT
    CALL_FUNCTION 1
    BINARY_ADD
    RETURN_VALUE
    end
    MAKE_FUNCTION 0
    STORE_NAME fib
    LOAD_CONST None
    RETURN_VALUE
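For reference, the Python source this assembly corresponds to:

```python
def fib(n):
    # Recursive Fibonacci, mirroring the assembly: two base cases,
    # then fib(n-1) + fib(n-2).
    if n == 0:
        return 0
    if n == 1:
        return 1
    return fib(n - 1) + fib(n - 2)

print(fib(10))  # 55
```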
This is almost identical to the code which would be generated by the equivilant Python code, except the Python compiler seems to throw in a couple of extra jumps after the first two RETURN_VALUEs.
An interesting thing you can do with Python bytecode is use the stack to avoid having to store values in variables. For example, calculating the factorial (in a kind of silly manner) of a number without storing to and reading from variables (except the parameter, of course):
    stack 1
    LOAD_CONST function n
    # calculate factorial, using only the stack
    stack 4
    LOAD_CONST 1
    LOAD_FAST n
    SETUP_LOOP ret
    while:
    DUP_TOP
    LOAD_CONST 1
    COMPARE_OP >
    JUMP_IF_FALSE done
    POP_TOP
    DUP_TOP
    ROT_THREE
    INPLACE_MULTIPLY
    ROT_TWO
    LOAD_CONST 1
    INPLACE_SUBTRACT
    JUMP_ABSOLUTE while
    done:
    POP_TOP
    POP_TOP
    POP_BLOCK
    ret:
    RETURN_VALUE
    end
    MAKE_FUNCTION 0
    STORE_NAME fact
    LOAD_CONST None
    RETURN_VALUE
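The juggling with DUP_TOP and the ROT_* opcodes is easier to follow if you model the value stack as a Python list. This sketch is a paraphrase of the assembly above, not generated code; the stack holds the running product at the bottom and the counter on top:

```python
def fact(n):
    stack = [1, n]          # LOAD_CONST 1; LOAD_FAST n
    while stack[-1] > 1:    # DUP_TOP; LOAD_CONST 1; COMPARE_OP >
        counter = stack.pop()
        product = stack.pop()
        stack.append(product * counter)  # ROT_THREE; INPLACE_MULTIPLY; ROT_TWO
        stack.append(counter - 1)        # LOAD_CONST 1; INPLACE_SUBTRACT
    stack.pop()             # POP_TOP: discard the spent counter
    return stack.pop()      # RETURN_VALUE: the product is all that's left

print(fact(5))  # 120
```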
If you wrote a similar function in Python:
    def fact(n):
        product = 1
        while n > 1:
            product *= n
            n -= 1
        return product
You’d get the following bytecode:
    stack 1
    LOAD_CONST function n
    stack 2
    LOAD_CONST 1
    STORE_FAST product
    SETUP_LOOP label2
    label0:
    LOAD_FAST n
    LOAD_CONST 1
    COMPARE_OP >
    JUMP_IF_FALSE label1
    POP_TOP
    LOAD_FAST product
    LOAD_FAST n
    INPLACE_MULTIPLY
    STORE_FAST product
    LOAD_FAST n
    LOAD_CONST 1
    INPLACE_SUBTRACT
    STORE_FAST n
    JUMP_ABSOLUTE label0
    label1:
    POP_TOP
    POP_BLOCK
    label2:
    LOAD_FAST product
    RETURN_VALUE
    end
    MAKE_FUNCTION 0
    STORE_NAME fact
    LOAD_CONST None
    RETURN_VALUE
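If you want to produce a listing like this yourself, the dis module will print the interpreter’s own version of it. The opcode names have shifted across versions (for example, recent Python 3 releases fold the INPLACE_* ops into BINARY_OP and drop SETUP_LOOP entirely), but the shape of the loop is the same:

```python
import dis

def fact(n):
    product = 1
    while n > 1:
        product *= n
        n -= 1
    return product

# Print the interpreter's own bytecode listing for the function.
dis.dis(fact)
print(fact(6))  # 720
```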
Running each of these with the following code:
    times = 100000
    from timeit import Timer
    t = Timer("f.fact(100)", "import f")
    print t.timeit(times)
    t = Timer("fact.fact(100)", "import fact")
    print t.timeit(times)
Successive runs gave similar results, showing that the stack-only version is somewhat faster. It’s possible that for less trivial functions this approach might offer even more of a benefit.