Python IMPORT_NAME bytecode mystery
Posted Mon, Apr 14, 2008
I’ve been messing around with Python bytecode this weekend, which is why there was no Sunday post (as I’m writing this it’s very early Monday morning…).
There’s some fun stuff to wrap your head around when it comes to Python bytecode: the structure of the virtual machine, the order of bytes in bytecode arguments, and the instructions which require magic numbers to be pushed onto the stack before they are executed.
I’ll go into more detail about what I’m doing mucking about with the Python internals tomorrow (later today, ugh! I need to go to bed!), but for now I’ll share one of the little mysteries that I’ve run into and managed to figure out.
IMPORT_NAME
IMPORT_NAME is the opcode you use to import another module. The description from the Python bytecode docs is this:
IMPORT_NAME namei
Imports the module co_names[namei]. The module object is pushed onto the stack. The current namespace is not affected: for a proper import statement, a subsequent STORE_FAST instruction modifies the namespace.
Basically, each code object has a tuple of names, namei is an index into that list of names, pointing to the name of the module you want to import. When this instruction is run you end up with the module on the stack, and you can then bind it to a name and access items within it.
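To make the co_names lookup concrete, here is a minimal sketch that compiles a one-line import and resolves the IMPORT_NAME argument by hand. It assumes a current Python 3 interpreter and uses dis.get_instructions, which did not exist when this post was written; the filename "<example>" is just a placeholder.

```python
import dis

# Compile a one-line module; "<example>" is a placeholder filename.
code = compile("import sys", "<example>", "exec")

# Find the IMPORT_NAME instruction in the compiled code.
import_instr = next(
    instr for instr in dis.get_instructions(code)
    if instr.opname == "IMPORT_NAME"
)

# namei is the raw argument: an index into the code object's co_names tuple.
namei = import_instr.arg
module_name = code.co_names[namei]
print(namei, module_name)
```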
Not mentioned here is the fact that you have to push two extra parameters onto the stack before calling IMPORT_NAME, otherwise the Python interpreter segfaults when it hits that instruction.
import sys
is compiled by the Python compiler into the following (as output by dis.disassemble):
0 LOAD_CONST 1 (-1)
3 LOAD_CONST 0 (None)
6 IMPORT_NAME 0 (sys)
9 STORE_FAST 0 (sys)
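A listing like this can also be produced programmatically. This sketch assumes a current Python 3, where the details differ from the output above: the level constant is 0 rather than -1, code compiled at module level stores with STORE_NAME rather than STORE_FAST, and newer releases emit extra bookkeeping instructions. The function name import_prelude is made up for illustration.

```python
import dis

def import_prelude(source):
    """Return (opname, argval) pairs up to and including IMPORT_NAME."""
    code = compile(source, "<example>", "exec")
    pairs = []
    for instr in dis.get_instructions(code):
        pairs.append((instr.opname, instr.argval))
        if instr.opname == "IMPORT_NAME":
            break
    return pairs

# Show what gets pushed onto the stack before IMPORT_NAME runs.
for opname, argval in import_prelude("import sys"):
    print(opname, argval)
```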
I tried changing the -1 to a different number and got “ValueError: Attempted relative import in non-package”.
Looking into Python/ceval.c in the Python interpreter source code, I see that these two extra parameters are passed in as the last two parameters to the built-in __import__ function, fromlist and level.
According to help(__import__) the -1 for level indicates it should try both absolute and relative imports, and the fromlist is empty because this isn’t a “from sys import …” statement.
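In other words, a plain import statement behaves roughly like a direct call to __import__ followed by an assignment. A minimal sketch, assuming Python 3, where level=-1 is no longer accepted and 0 (absolute import only) is the default; the variable name sys_module is just for illustration.

```python
import sys

# Roughly what "import sys" does: name, globals, locals, fromlist, level.
# An empty fromlist means this is not a "from ... import ..." statement.
sys_module = __import__("sys", globals(), locals(), [], 0)

print(sys_module is sys)
```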
from sys import argv, subversion, byteorder
compiles into
0 LOAD_CONST 1 (-1)
3 LOAD_CONST 2 (('argv', 'subversion', 'byteorder'))
6 IMPORT_NAME 0 (sys)
9 IMPORT_FROM 1 (argv)
12 STORE_FAST 0 (argv)
15 IMPORT_FROM 2 (subversion)
18 STORE_FAST 1 (subversion)
21 IMPORT_FROM 3 (byteorder)
24 STORE_FAST 2 (byteorder)
27 POP_TOP
It’s not immediately obvious why it needs to do the extra calls to IMPORT_FROM, but the reason seems to be that __import__ doesn’t actually do anything with the fromlist argument.
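One way to see what IMPORT_FROM contributes is to emulate the sequence by hand: __import__ only returns the module, and each name is then fetched from it and bound separately. A rough sketch, assuming Python 3 (so level is 0, and subversion is left out because sys no longer has it); getattr stands in for IMPORT_FROM here and is not the interpreter's actual implementation.

```python
import sys

# IMPORT_NAME: __import__ receives the fromlist but returns only the module.
names = ("argv", "byteorder")
module = __import__("sys", globals(), locals(), names, 0)

# IMPORT_FROM + store, once per name: fetch the attribute and bind it.
argv = getattr(module, "argv")
byteorder = getattr(module, "byteorder")

# POP_TOP: the module itself is discarded and never bound to a name.
del module

print(byteorder)
```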
Anyway, I need to sleep now.
Note to self: submit a Python bug tracker issue about the undocumented required parameters. (Done: issue 2631.)
Update:
I also posted this as a reply to Thomas Lee’s response, but I’ve duplicated it here so it’s easier to find.
After looking at the docs for __import__, it’s interesting to see that the values in fromlist aren’t actually used, but it is significant whether the fromlist is empty or non-empty.
When the name variable is of the form package.module, normally, the top-level package (the name up till the first dot) is returned, not the module named by name. However, when a non-empty fromlist argument is given, the module named by name is returned. This is done for compatibility with the bytecode generated for the different kinds of import statement; when using “import spam.ham.eggs”, the top-level package spam must be placed in the importing namespace, but when using “from spam.ham import eggs”, the spam.ham subpackage must be used to find the eggs variable.
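The effect of an empty versus non-empty fromlist is easy to check directly. This sketch assumes Python 3 and uses the standard library's json.decoder as a stand-in for a package.module name; the fromlist entry "no_such_name" is deliberately bogus to show that, for a plain module like this one, the values are not looked up.

```python
# Empty fromlist: the top-level package is returned.
top = __import__("json.decoder", globals(), locals(), [], 0)

# Non-empty fromlist: the module named by the dotted name is returned.
sub = __import__("json.decoder", globals(), locals(), ["JSONDecoder"], 0)

# The entries themselves are not checked here; only non-emptiness matters.
bogus = __import__("json.decoder", globals(), locals(), ["no_such_name"], 0)

print(top.__name__, sub.__name__, bogus.__name__)
```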
I suppose the reason that fromlist isn’t just changed to a boolean is that it might be significant in a setup where __import__ is redefined for custom importing, like Thomas said.