Dead code in Python-generated bytecode
Posted Tue, Apr 22, 2008
So I’ve made a couple of changes to Papaya (yeah, it’s called Papaya now):
As suggested by Phillip J. Eby, rather than generating the bytecode myself, I’m now using BytecodeAssembler, which has shortened and simplified my code a bit (though honestly not as much as I originally thought it would). I had already considered doing this before I wrote it all myself, but I wanted to get the educational benefit of doing it all from scratch.
I’ve changed the syntax for function definitions to match Python’s (minus the trailing ‘:’), which also means that I’ve added support for *args and **kwargs parameters. Also, since I’m using BytecodeAssembler, you get its automatic parameter unpacking when using nested positional arguments. Of course, the unpacking code will currently get duplicated if you decompile and then recompile code. I haven’t decided what to do about this yet.
You no longer need to specify a label for any of the SETUP_* instructions, since BytecodeAssembler handles this for you as well.
I added a setup.py file, which uses ez_setup and can build a .egg file and other fancy things. I will add this project to PyPI as soon as I resolve the issue I’m about to talk about below.
You no longer need to specify the stack size of a given block of code; it will be calculated for you by BytecodeAssembler.
So, due to my use of BytecodeAssembler, I get free stack size calculations, but I get another feature which is somewhat annoying: dead-code prevention.
Why is this annoying? Because the Python compiler generates dead code all the time.
What this means is that if you decompile any non-trivial (and some quite trivial) .pyc file created by Python and then try to recompile it, the recompile will fail with an “AssertionError: Unknown stack size at this location” message.
For example, take the following, very simple .py file:
    while True:
        if True:
            continue
        break
This is disassembled into the following:
        SETUP_LOOP
    label0:
        LOAD_NAME True
        JUMP_IF_FALSE label3
        POP_TOP
        LOAD_NAME True
        JUMP_IF_FALSE label1
        POP_TOP
        JUMP_ABSOLUTE label0
        JUMP_FORWARD label2
    label1:
        POP_TOP
    label2:
        BREAK_LOOP
        JUMP_ABSOLUTE label0
    label3:
        POP_TOP
        POP_BLOCK
        LOAD_CONST None
        RETURN_VALUE
Note the double jump: a JUMP_ABSOLUTE immediately followed by a JUMP_FORWARD. This pair is generated any time you have a continue statement, despite the fact that the second jump can never be run. Also unnecessary is the JUMP_ABSOLUTE after BREAK_LOOP, which is just as unreachable.
Both of these cause an error in BytecodeAssembler because it has no context from which to determine the stack size at that point. Of course, that doesn’t really matter since the code will never be run.
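To make the "no context" problem concrete, here is a rough sketch of stack-depth tracking over the listing above. It is only a toy model and not how BytecodeAssembler is actually implemented: the OPCODES table, parse() and stack_depths() are names made up for this example, the stack effects are simplified guesses for just the opcodes in the listing, and BREAK_LOOP is treated as simply ending the flow. The idea is that a depth can only be propagated along paths that can actually execute, so an instruction that nothing falls into or jumps to never receives one.

```python
# Hypothetical, simplified model: opcode -> (stack effect, falls through?).
# Only covers the opcodes that appear in the listing above.
OPCODES = {
    "SETUP_LOOP":    (0, True),
    "LOAD_NAME":     (+1, True),
    "LOAD_CONST":    (+1, True),
    "POP_TOP":       (-1, True),
    "POP_BLOCK":     (0, True),
    "JUMP_IF_FALSE": (0, True),    # conditional: may branch or fall through
    "JUMP_ABSOLUTE": (0, False),
    "JUMP_FORWARD":  (0, False),
    "BREAK_LOOP":    (0, False),   # assumption: treated as ending this path
    "RETURN_VALUE":  (-1, False),
}
JUMPS = {"JUMP_IF_FALSE", "JUMP_ABSOLUTE", "JUMP_FORWARD"}

LISTING = """
SETUP_LOOP
label0:
LOAD_NAME True
JUMP_IF_FALSE label3
POP_TOP
LOAD_NAME True
JUMP_IF_FALSE label1
POP_TOP
JUMP_ABSOLUTE label0
JUMP_FORWARD label2
label1:
POP_TOP
label2:
BREAK_LOOP
JUMP_ABSOLUTE label0
label3:
POP_TOP
POP_BLOCK
LOAD_CONST None
RETURN_VALUE
"""


def parse(text):
    """Split the listing into (opcode, arg) pairs and a label -> index map."""
    instructions, labels = [], {}
    for line in text.split("\n"):
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            # A label points at the next instruction to be appended.
            labels[line[:-1]] = len(instructions)
        else:
            op, _, arg = line.partition(" ")
            instructions.append((op, arg))
    return instructions, labels


def stack_depths(instructions, labels):
    """Return the stack depth before each instruction, or None if unreachable."""
    depths = [None] * len(instructions)
    pending = [(0, 0)]  # (instruction index, depth on entry)
    while pending:
        index, depth = pending.pop()
        if index >= len(instructions) or depths[index] is not None:
            continue  # off the end, or already visited
        depths[index] = depth
        op, arg = instructions[index]
        effect, falls_through = OPCODES[op]
        if op in JUMPS:
            pending.append((labels[arg], depth + effect))
        if falls_through:
            pending.append((index + 1, depth + effect))
    return depths


instructions, labels = parse(LISTING)
depths = stack_depths(instructions, labels)
dead = [
    (i, op, arg)
    for i, ((op, arg), depth) in enumerate(zip(instructions, depths))
    if depth is None
]
for i, op, arg in dead:
    print(i, op, arg)
```

Under these assumptions the only instructions left without a depth are the JUMP_FORWARD label2 after the continue and the JUMP_ABSOLUTE label0 after BREAK_LOOP, which are exactly the two jumps noted above. A real implementation would also need to check that every path reaching an instruction agrees on the depth; this sketch skips that.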
I’m currently stumped as to the best way to solve this issue, and I’m tired and don’t want to think about it any more. :(