Dead code in Python-generated bytecode
Posted Tue, Apr 22, 2008
So I’ve made a couple of changes to Papaya (yeah, it’s called Papaya now):
As suggested by Phillip J. Eby, rather than generating the bytecode myself, I’m now using BytecodeAssembler, which has shortened and simplified my code a bit (though honestly not as much as I originally thought it would). I had already considered doing this before I wrote it all myself, but I wanted to get the educational benefit of doing it all from scratch.
I’ve changed the syntax for function definitions to match Python’s (minus the closing ‘:’), which also means that I’ve added support for *args and **kwargs parameters. Also, since I’m using BytecodeAssembler, you get the automatic parameter unpacking described here when using nested positional arguments. Of course, this unpacking code will currently get duplicated if you decompile and then recompile code. I haven’t decided what to do about this yet.
You no longer need to specify a label for any of the SETUP_* instructions, since BytecodeAssembler handles this for you as well.
I added a setup.py file, which uses ez_setup and can build a .egg file and other fancy things. I will add this project to PyPI as soon as I resolve the issue I’m about to talk about below.
You no longer need to specify the stack size of a given block of code; it will be calculated for you by BytecodeAssembler.
So, due to my use of BytecodeAssembler, I get free stack size calculations, but I get another feature which is somewhat annoying: dead-code prevention.
Why is this annoying? Because the Python compiler generates dead code all the time.
What this means is, if you decompile any non-trivial (and some quite-trivial) .pyc files created by Python, and then try to recompile them, it will fail with an “AssertionError: Unknown stack size at this location” message.
For example, take the following, very simple .py file:
while True:
    if True:
        continue
    break
This is disassembled into the following:
SETUP_LOOP
label0:
LOAD_NAME True
JUMP_IF_FALSE label3
POP_TOP
LOAD_NAME True
JUMP_IF_FALSE label1
POP_TOP
JUMP_ABSOLUTE label0
JUMP_FORWARD label2
label1:
POP_TOP
label2:
BREAK_LOOP
JUMP_ABSOLUTE label0
label3:
POP_TOP
POP_BLOCK
LOAD_CONST None
RETURN_VALUE
Note the double jump: JUMP_ABSOLUTE label0 immediately followed by JUMP_FORWARD label2. This pair is generated any time you have a continue statement, despite the fact that the second jump can never be run. Also unnecessary is the JUMP_ABSOLUTE after BREAK_LOOP.
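To make the dead instructions concrete, here’s a rough sketch that walks the listing above and reports which instructions can never be reached. It is not part of Papaya or BytecodeAssembler; the names LISTING, find_dead, JUMPS and NO_FALL_THROUGH are made up for illustration, and the control-flow rules are simply my reading of how these opcodes behave (in particular, BREAK_LOOP is treated as never falling through to the next instruction).

```python
# The listing from the post as (label, opcode, argument) tuples.
LISTING = [
    (None, "SETUP_LOOP", None),
    ("label0", "LOAD_NAME", "True"),
    (None, "JUMP_IF_FALSE", "label3"),
    (None, "POP_TOP", None),
    (None, "LOAD_NAME", "True"),
    (None, "JUMP_IF_FALSE", "label1"),
    (None, "POP_TOP", None),
    (None, "JUMP_ABSOLUTE", "label0"),
    (None, "JUMP_FORWARD", "label2"),
    ("label1", "POP_TOP", None),
    ("label2", "BREAK_LOOP", None),
    (None, "JUMP_ABSOLUTE", "label0"),
    ("label3", "POP_TOP", None),
    (None, "POP_BLOCK", None),
    (None, "LOAD_CONST", "None"),
    (None, "RETURN_VALUE", None),
]

# Assumption: opcodes whose argument is a label that control may jump to.
JUMPS = {"JUMP_ABSOLUTE", "JUMP_FORWARD", "JUMP_IF_FALSE"}
# Assumption: opcodes after which control never continues to the next line.
NO_FALL_THROUGH = {"JUMP_ABSOLUTE", "JUMP_FORWARD", "BREAK_LOOP", "RETURN_VALUE"}


def find_dead(listing):
    """Return the indices of instructions that no control path can reach."""
    labels = {label: i for i, (label, _, _) in enumerate(listing) if label}
    reachable = set()
    todo = [0]  # start at the first instruction
    while todo:
        i = todo.pop()
        if i in reachable or i >= len(listing):
            continue
        reachable.add(i)
        _, op, arg = listing[i]
        if op in JUMPS:
            todo.append(labels[arg])  # follow the jump target
        if op not in NO_FALL_THROUGH:
            todo.append(i + 1)        # follow the fall-through path
    return [i for i in range(len(listing)) if i not in reachable]


for i in find_dead(LISTING):
    print(i, LISTING[i][1], LISTING[i][2])
```

Run against this listing, it flags index 8 (JUMP_FORWARD label2) and index 11 (JUMP_ABSOLUTE label0), which are exactly the two jumps mentioned above.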
Both of these cause an error in BytecodeAssembler because it has no context from which to determine the stack size at that point. Of course, that doesn’t really matter since the code will never be run.
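Here’s a toy version of why the stack size becomes “unknown”. This is only my guess at the general idea, not BytecodeAssembler’s actual code: a single pass tracks the stack depth, forgets it after any instruction that can’t fall through, and can only recover it at a label that an earlier jump already targeted. The stack effects in STACK_EFFECT are my understanding of these opcodes in the Python of this era (JUMP_IF_FALSE leaves its operand on the stack, hence all the POP_TOPs), and every name here is hypothetical.

```python
# The listing from the post as (label, opcode, argument) tuples.
LISTING = [
    (None, "SETUP_LOOP", None),
    ("label0", "LOAD_NAME", "True"),
    (None, "JUMP_IF_FALSE", "label3"),
    (None, "POP_TOP", None),
    (None, "LOAD_NAME", "True"),
    (None, "JUMP_IF_FALSE", "label1"),
    (None, "POP_TOP", None),
    (None, "JUMP_ABSOLUTE", "label0"),
    (None, "JUMP_FORWARD", "label2"),
    ("label1", "POP_TOP", None),
    ("label2", "BREAK_LOOP", None),
    (None, "JUMP_ABSOLUTE", "label0"),
    ("label3", "POP_TOP", None),
    (None, "POP_BLOCK", None),
    (None, "LOAD_CONST", "None"),
    (None, "RETURN_VALUE", None),
]

# Assumed net stack effect of each opcode used in the listing.
STACK_EFFECT = {
    "SETUP_LOOP": 0, "LOAD_NAME": 1, "LOAD_CONST": 1, "POP_TOP": -1,
    "JUMP_IF_FALSE": 0, "JUMP_ABSOLUTE": 0, "JUMP_FORWARD": 0,
    "BREAK_LOOP": 0, "POP_BLOCK": 0, "RETURN_VALUE": -1,
}
JUMPS = {"JUMP_ABSOLUTE", "JUMP_FORWARD", "JUMP_IF_FALSE"}
NO_FALL_THROUGH = {"JUMP_ABSOLUTE", "JUMP_FORWARD", "BREAK_LOOP", "RETURN_VALUE"}


def first_unknown(listing):
    """Return the index of the first instruction with unknown stack depth."""
    depth = 0      # None means "unknown"
    at_label = {}  # stack depth expected at each label seen so far
    for i, (label, op, arg) in enumerate(listing):
        if label is not None:
            if depth is None:
                # Recover the depth from an earlier jump to this label.
                depth = at_label.get(label)
            else:
                at_label.setdefault(label, depth)
        if depth is None:
            return i  # nothing tells us what the stack looks like here
        depth += STACK_EFFECT[op]
        if op in JUMPS:
            at_label.setdefault(arg, depth)  # the target inherits this depth
        if op in NO_FALL_THROUGH:
            depth = None  # the next instruction is not reached from here
    return None


print(first_unknown(LISTING))  # → 8, the dead JUMP_FORWARD label2
```

If you drop the two dead jumps (indices 8 and 11) from LISTING, first_unknown returns None, since every remaining instruction is either reached by fall-through or sits at a label that an earlier jump points to.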
I’m currently stumped as to the best way to solve this issue, and I’m tired and don’t want to think about it any more. :(