PPyA: Python Assembler

Over the last few of days I’ve hacked together a Python Assembler/Disassembler. I’ve called it PPya (pronounced like “papaya,” the fruit) Paul’s Python assembler. The ‘a’ is left lowercase because it looks better that way.

Each of those days I started to write up this blog post but then got distracted working on it some more

It’s at the point now where it is fairly usable, both as a learning tool and as a tool for writing Python modules in assembly if you feel so inclined.

If you want to check it out, the gitweb project page is here: https://github.com/pib/papaya

or you can git clone it:

git clone https://github.com/pib/papaya

PPya Overview

A .pya file consists of a series of bytecodes (well, strings representing them, anyway) followed by parameters for those instructions which take parameters. When assembled, these parameters are converted to indices into a tuple in a python Code object, one of co_names, co_consts, co_varnames, co_cellvars, or co_freevars.

See the Python bytecode documentation for a list of all the bytecodes available. As I pointed out in my previous blog post, and a Python issue report, the IMPORT_NAME opcode actually takes two parameters off of the stack.

In addition to the actual bytecodes, there are a few meta-bytecodes which I’ve added. When calling LOAD_CONST, you can use the word “function” followed by a comma-separated parameter list to indicate that this opcode should reference a code object. The following lines up until a line consisting of the word “end” will be compiled and put into the co_consts tuple as a nested code object.

A line which ends with a colon is considered to be a label for jumping. A JUMP_* bytecode with that same label (minus the colon) will be converted to the address or offset to the instruction immediately after the corresponding label.

The final meta-opcode is “stack”. The stack opcode takes an integer as an argument and indicates the required stack size for the generated code object. If you LOAD_CONST 4 items, then your stack size needs to be at least 4. In future versions I plan on generating this based on an analysis of the bytecodes, but for now it needs to be set manually in each module and function. If you don’t set this number high enough the Python interpreter will segfault!

I haven’t added support for *args and *kwargs yet, but it’s at the top of my todo list for the project.

Also, the CALL_FUNCTION instructions use the two bytes of their parameter to indicate the number of positional and the number of keyword arguments, respectively, but the current version just treats the whole thing like a single integer. Separating these parameters for FUNCTION_CALL instructions is second on the todo list.

I don’t store the name of functions in their respective code objects yet, since the name is only implicitly given in the STORE_* instruction that comes after the function is defined. I’ll probably extend the function syntax to allow the giving of a name to a function, and default it to “” if one isn’t supplied.

Lastly, I don’t set the flag indicating if a function is a generator function at the moment.

Usage Examples

A simple example

Here’s an example of the sort of thing you can do with this tool:

Imagine you have a simple python module johnny5.py in the current directory:

    print 'No Disassemble!'

Compile that code (run the interpreter and import it, or use python -m compileall .), then pass it in as the only parameter to assembler.py, which will create johnny5.pyd:

    stack 1
        LOAD_CONST 'No Disassemble!'
        PRINT_ITEM
        PRINT_NEWLINE
        LOAD_CONST None
        RETURN_VALUE

If you then change that second line:

    LOAD_CONST 'Yes Disassemble! Number 5 ready for orders.'

and rename the file to johnny5.pya (it’s saved as .pyd to avoid overwriting an existing .pya file), you can now run python johnny5.pyc and get this:

Yes Disassemble! Number 5 ready for orders.

Of course, you could have just as easily changed the .py file, but this isn’t the only thing you can do with this tool.

Writing a .pya file directly

You don’t have to generate your .pya from an existing .py file, of course. You can also just write it from scratch (or possibly generate it from another program).

For example, a function to calculate the fibonacci number at a certain index:

    # Example: fibonacci generator.
    	stack 1
    	LOAD_CONST function n
    		# if n is 0, return 0
    		stack 4
    		LOAD_FAST n
    		LOAD_CONST 0
    		COMPARE_OP ==
    		JUMP_IF_FALSE not_zero
    		POP_TOP
    		LOAD_CONST 0
    		RETURN_VALUE
    		
    		# else if n is 1, return 1
    	not_zero:
    		POP_TOP
    		LOAD_FAST n
    		LOAD_CONST 1
    		COMPARE_OP ==
    		JUMP_IF_FALSE not_one
    		POP_TOP
    		LOAD_CONST 1
    		RETURN_VALUE
    		
    		# else n is fib(n-1) + fib(n-2)
    	not_one:
    		POP_TOP
    		LOAD_GLOBAL fib
    		LOAD_FAST n
    		LOAD_CONST 1
    		BINARY_SUBTRACT
    		CALL_FUNCTION 1
    		
    		LOAD_GLOBAL fib
    		LOAD_FAST n
    		LOAD_CONST 2
    		BINARY_SUBTRACT
    		CALL_FUNCTION 1
    		BINARY_ADD
    		
    		RETURN_VALUE
    	end
    	MAKE_FUNCTION 0
    	STORE_NAME fib
    	LOAD_CONST None
    	RETURN_VALUE

This is almost identical to the code which would be generated by the equivilant Python code, except the Python compiler seems to throw in a couple of extra jumps after the first two RETURN_VALUEs.

An interesting thing you can do with Python bytecode is use the stack to avoid having to store values in variables. For example, calculating the factorial (in a kind of silly manner) of a number without storing to and reading from variables (except the parameter, of course):

    stack 1
    	LOAD_CONST function n
    		# calculate factorial, using only the stack
    		stack 4
    		LOAD_CONST 1
    		LOAD_FAST n
    		
    		SETUP_LOOP ret
    	while:	
    		DUP_TOP
    		LOAD_CONST 1
    		COMPARE_OP >
    		JUMP_IF_FALSE done
    		POP_TOP
    		DUP_TOP
    		ROT_THREE
    		INPLACE_MULTIPLY
    		ROT_TWO
    		LOAD_CONST 1
    		INPLACE_SUBTRACT
    		JUMP_ABSOLUTE while
    	done:
    		POP_TOP
    		POP_TOP
    		POP_BLOCK
    	ret:
    		RETURN_VALUE
    	end
    	MAKE_FUNCTION 0
    	STORE_NAME fact
    	LOAD_CONST None
    	RETURN_VALUE

If you wrote a similar function in Python:

    def fact(n):
    	product = 1
    	while n > 1:
    		product *= n
    		n -= 1
    	return product

You’d get the following bytecode:

    stack 1
        LOAD_CONST function n
            stack 2
            LOAD_CONST 1
            STORE_FAST product
            SETUP_LOOP label2
        label0:
            LOAD_FAST n
            LOAD_CONST 1
            COMPARE_OP >
            JUMP_IF_FALSE label1
            POP_TOP
            LOAD_FAST product
            LOAD_FAST n
            INPLACE_MULTIPLY
            STORE_FAST product
            LOAD_FAST n
            LOAD_CONST 1
            INPLACE_SUBTRACT
            STORE_FAST n
            JUMP_ABSOLUTE label0
        label1:
            POP_TOP
            POP_BLOCK
    	label2:
            LOAD_FAST product
            RETURN_VALUE
        end
        MAKE_FUNCTION 0
        STORE_NAME fact
        LOAD_CONST None
        RETURN_VALUE

Running each of these with the following code:

    times = 100000
        from timeit import Timer
        t = Timer("f.fact(100)", "import f")
        print t.timeit(times)
        
        t = Timer("fact.fact(100)", "import fact")
        print t.timeit(times)

Gives the following output:

8.19782996178

7.5446908474

Successive runs had similar results, showing that the stack-only version is somewhat faster. It’s possible that for less trivial functions this might offer even more of a benefit.

blog comments powered by Disqus