PPya: Python Assembler
Posted Fri, Apr 18, 2008
Over the last few days I’ve hacked together a Python assembler/disassembler. I’ve called it PPya (pronounced like “papaya,” the fruit): Paul’s Python assembler. The ‘a’ is left lowercase because it looks better that way.
Each of those days I started to write up this blog post, but then got distracted working on it some more.
It’s at the point now where it is fairly usable, both as a learning tool and as a tool for writing Python modules in assembly if you feel so inclined.
If you want to check it out, the project page is here: https://github.com/pib/papaya
or you can git clone it:
git clone https://github.com/pib/papaya
A .pya file consists of a series of bytecodes (well, strings representing them, anyway) followed by parameters for those instructions which take parameters. When assembled, these parameters are converted to indices into a tuple in a python Code object, one of co_names, co_consts, co_varnames, co_cellvars, or co_freevars.
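You can see the tuples those parameters index into on any code object. Here’s a quick sketch (run under a modern Python 3, though the post targets the Python 2 series, so the exact contents can differ by version):

```python
# Compile a tiny module and inspect the tuples that .pya parameters index into.
code = compile("x = 40 + 2\nprint(x)", "<demo>", "exec")

print(code.co_names)     # names used by STORE_NAME / LOAD_NAME, e.g. ('x', 'print')
print(code.co_consts)    # constants referenced by LOAD_CONST (40 + 2 is folded to 42)
print(code.co_varnames)  # local variable names (empty at module level)
```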
See the Python bytecode documentation for a list of all the bytecodes available. As I pointed out in my previous blog post, and in a Python issue report, the IMPORT_NAME opcode actually takes two parameters off the stack.
In addition to the actual bytecodes, there are a few meta-bytecodes which I’ve added. When calling LOAD_CONST, you can use the word “function” followed by a comma-separated parameter list to indicate that this opcode should reference a code object. The following lines up until a line consisting of the word “end” will be compiled and put into the co_consts tuple as a nested code object.
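This mirrors what the Python compiler itself does: a nested function body becomes a code object stored inside the enclosing code object’s co_consts. A small sketch in modern Python 3:

```python
import types

# A nested "function ... end" block assembles to a code object living in
# the enclosing code object's co_consts tuple -- just like a nested "def".
outer = compile("def f():\n    return 1\n", "<demo>", "exec")
nested = [c for c in outer.co_consts if isinstance(c, types.CodeType)]
print(nested[0].co_name)  # the nested code object for f
```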
A line which ends with a colon is considered to be a label for jumping. A JUMP_* bytecode with that same label (minus the colon) will be converted to the address or offset of the instruction immediately after the corresponding label.
The final meta-opcode is “stack”. The stack opcode takes an integer as an argument and indicates the required stack size for the generated code object. If you LOAD_CONST 4 items, then your stack size needs to be at least 4. In future versions I plan on generating this based on an analysis of the bytecodes, but for now it needs to be set manually in each module and function. If you don’t set this number high enough the Python interpreter will segfault!
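The analysis I have in mind is what CPython itself already does: it computes the required depth at compile time and stores it as co_stacksize, and the dis module can report the net stack effect of a single opcode. A sketch under modern Python 3:

```python
import dis

# CPython computes the needed depth itself and stores it as co_stacksize.
# Loading four names before BUILD_TUPLE needs at least four stack slots.
code = compile("t = (a, b, c, d)", "<demo>", "exec")
print(code.co_stacksize)

# stack_effect() gives the net push/pop of one opcode:
# BUILD_TUPLE with oparg 4 pops four items and pushes one, so -3.
print(dis.stack_effect(dis.opmap["BUILD_TUPLE"], 4))
```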
I haven’t added support for *args and *kwargs yet, but it’s at the top of my todo list for the project.
Also, the CALL_FUNCTION instructions use the two bytes of their parameter to indicate the number of positional and the number of keyword arguments, respectively, but the current version just treats the whole thing as a single integer. Separating these parameters for CALL_FUNCTION instructions is second on the todo list.
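The packing itself is simple: the low byte is the positional count and the high byte is the keyword count, so the combined oparg is npos + (nkw << 8). A sketch of both directions (note this encoding is the Python 2-era CALL_FUNCTION; modern CPython encodes calls differently, so the helpers below are just illustrative arithmetic):

```python
def pack_call_arg(npos, nkw):
    # CALL_FUNCTION oparg: low byte = positional count, high byte = keyword count
    return npos | (nkw << 8)

def unpack_call_arg(oparg):
    # Reverse the packing: mask off each byte.
    return oparg & 0xFF, (oparg >> 8) & 0xFF

print(pack_call_arg(2, 1))    # 258
print(unpack_call_arg(258))   # (2, 1)
```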
I don’t store the names of functions in their respective code objects yet, since the name is only given implicitly by the STORE_* instruction that comes after the function is defined. I’ll probably extend the function syntax to allow giving a function a name explicitly, and fall back to a sensible default when none is given.
Lastly, I don’t set the flag indicating whether a function is a generator function at the moment.
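That flag lives in co_flags; in CPython it is the CO_GENERATOR bit, which the inspect module exposes as a constant. A quick check under modern Python 3:

```python
import inspect

def gen():
    yield 1

def plain():
    return 1

# The compiler sets CO_GENERATOR in co_flags for any function containing yield.
print(bool(gen.__code__.co_flags & inspect.CO_GENERATOR))    # True
print(bool(plain.__code__.co_flags & inspect.CO_GENERATOR))  # False
```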
A simple example
Here’s an example of the sort of thing you can do with this tool:
Imagine you have a simple python module johnny5.py in the current directory:
print 'No Disassemble!'
Compile that code (run the interpreter and import it, or use python -m compileall .), then pass it in as the only parameter to assembler.py, which will create johnny5.pyd:
    stack 1
    LOAD_CONST 'No Disassemble!'
    PRINT_ITEM
    PRINT_NEWLINE
    LOAD_CONST None
    RETURN_VALUE
If you then change that second line:
    LOAD_CONST 'Yes Disassemble! Number 5 ready for orders.'
and rename the file to johnny5.pya (it’s saved as .pyd to avoid overwriting an existing .pya file), then reassemble it with assembler.py, you can now run python johnny5.pyc and get this:
    Yes Disassemble! Number 5 ready for orders.
Of course, you could just as easily have changed the .py file, but this isn’t the only thing you can do with this tool.
Writing a .pya file directly
You don’t have to generate your .pya from an existing .py file, of course. You can also just write it from scratch (or possibly generate it from another program).
For example, here is a function to calculate the Fibonacci number at a certain index:
    # Example: fibonacci generator.
    stack 1
    LOAD_CONST function n
    # if n is 0, return 0
    stack 4
    LOAD_FAST n
    LOAD_CONST 0
    COMPARE_OP ==
    JUMP_IF_FALSE not_zero
    POP_TOP
    LOAD_CONST 0
    RETURN_VALUE
    # else if n is 1, return 1
    not_zero:
    POP_TOP
    LOAD_FAST n
    LOAD_CONST 1
    COMPARE_OP ==
    JUMP_IF_FALSE not_one
    POP_TOP
    LOAD_CONST 1
    RETURN_VALUE
    # else n is fib(n-1) + fib(n-2)
    not_one:
    POP_TOP
    LOAD_GLOBAL fib
    LOAD_FAST n
    LOAD_CONST 1
    BINARY_SUBTRACT
    CALL_FUNCTION 1
    LOAD_GLOBAL fib
    LOAD_FAST n
    LOAD_CONST 2
    BINARY_SUBTRACT
    CALL_FUNCTION 1
    BINARY_ADD
    RETURN_VALUE
    end
    MAKE_FUNCTION 0
    STORE_NAME fib
    LOAD_CONST None
    RETURN_VALUE
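For reference, the Python source this assembly corresponds to:

```python
def fib(n):
    # Recursive Fibonacci, mirroring the assembly: two base cases,
    # then fib(n-1) + fib(n-2).
    if n == 0:
        return 0
    if n == 1:
        return 1
    return fib(n - 1) + fib(n - 2)

print(fib(10))  # 55
```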
This is almost identical to the code which would be generated by the equivilant Python code, except the Python compiler seems to throw in a couple of extra jumps after the first two RETURN_VALUEs.
An interesting thing you can do with Python bytecode is use the stack to avoid having to store values in variables. For example, calculating the factorial (in a kind of silly manner) of a number without storing to and reading from variables (except the parameter, of course):
    stack 1
    LOAD_CONST function n
    # calculate factorial, using only the stack
    stack 4
    LOAD_CONST 1
    LOAD_FAST n
    SETUP_LOOP ret
    while:
    DUP_TOP
    LOAD_CONST 1
    COMPARE_OP >
    JUMP_IF_FALSE done
    POP_TOP
    DUP_TOP
    ROT_THREE
    INPLACE_MULTIPLY
    ROT_TWO
    LOAD_CONST 1
    INPLACE_SUBTRACT
    JUMP_ABSOLUTE while
    done:
    POP_TOP
    POP_TOP
    POP_BLOCK
    ret:
    RETURN_VALUE
    end
    MAKE_FUNCTION 0
    STORE_NAME fact
    LOAD_CONST None
    RETURN_VALUE
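The juggling with DUP_TOP and the ROT_* opcodes is easier to follow if you model the value stack as a Python list. This sketch is a paraphrase of the assembly above, not generated code; the stack holds the running product at the bottom and the counter on top:

```python
def fact(n):
    stack = [1, n]          # LOAD_CONST 1; LOAD_FAST n
    while stack[-1] > 1:    # DUP_TOP; LOAD_CONST 1; COMPARE_OP >
        counter = stack.pop()
        product = stack.pop()
        stack.append(product * counter)  # ROT_THREE; INPLACE_MULTIPLY; ROT_TWO
        stack.append(counter - 1)        # LOAD_CONST 1; INPLACE_SUBTRACT
    stack.pop()             # POP_TOP: discard the spent counter
    return stack.pop()      # RETURN_VALUE: the product is all that's left

print(fact(5))  # 120
```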
If you wrote a similar function in Python:
    def fact(n):
        product = 1
        while n > 1:
            product *= n
            n -= 1
        return product
You’d get the following bytecode:
    stack 1
    LOAD_CONST function n
    stack 2
    LOAD_CONST 1
    STORE_FAST product
    SETUP_LOOP label2
    label0:
    LOAD_FAST n
    LOAD_CONST 1
    COMPARE_OP >
    JUMP_IF_FALSE label1
    POP_TOP
    LOAD_FAST product
    LOAD_FAST n
    INPLACE_MULTIPLY
    STORE_FAST product
    LOAD_FAST n
    LOAD_CONST 1
    INPLACE_SUBTRACT
    STORE_FAST n
    JUMP_ABSOLUTE label0
    label1:
    POP_TOP
    POP_BLOCK
    label2:
    LOAD_FAST product
    RETURN_VALUE
    end
    MAKE_FUNCTION 0
    STORE_NAME fact
    LOAD_CONST None
    RETURN_VALUE
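If you want to produce a listing like this yourself, the dis module will print the interpreter’s own version of it. The opcode names have shifted across versions (for example, recent Python 3 releases fold the INPLACE_* ops into BINARY_OP and drop SETUP_LOOP entirely), but the shape of the loop is the same:

```python
import dis

def fact(n):
    product = 1
    while n > 1:
        product *= n
        n -= 1
    return product

# Print the interpreter's own bytecode listing for the function.
dis.dis(fact)
print(fact(6))  # 720
```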
Running each of these with the following code:
    times = 100000
    from timeit import Timer
    t = Timer("f.fact(100)", "import f")
    print t.timeit(times)
    t = Timer("fact.fact(100)", "import fact")
    print t.timeit(times)
Successive runs gave similar results, showing that the stack-only version is somewhat faster. It’s possible that for less trivial functions this approach might offer even more of a benefit.