A Simple Lisp Parser in Python

A managed to sneak in a little time recently and write a simple Lisp s-expression parser in Python.

I started out with a version that just understands parentheses, atoms, and strings, but it was just a couple of extra lines to add in support for the quote “operator”, that I went ahead and threw that in there, too.

I’ll go ahead and go through it bit-by-bit here, though I think it’s pretty straightforward.

    from string import whitespace
    
    atom_end = set('()"\'') | set(whitespace)

When any of these things are encountered while reading an atom, the atom should be ended. This means no need for whitespace betwen an atom and opening or closing parentheses or quotes.

    def parse(sexp):
        stack, i, length = [[]], 0, len(sexp)
        while i < length:

Rather than write this recursively, I am using a list as a stack and a simple while-loop to go through the contents of the string. I start with an empty list on the stack, which will hold the entire parsed sexp when the parsing is done.

    c = sexp[i]
            reading = type(stack[-1])

Each character is put into c and the type of the current item being read is checked. Rather than using some variable to keep track of the current state of the parser, I just used a distinct Python type for each of the possible lisp types to be read: a list for a list, a string for a string, and a string inside a tuple for an atom. Using this system, I just have to check what type the top element in the stack is, and I’ll know what I’m currently reading.

    if reading == list:
                if   c == '(': stack.append([])
                elif c == ')': 
                    stack[-2].append(stack.pop())
                    if stack[-1][0] == ('quote',): stack[-2].append(stack.pop())
                elif c == '"': stack.append('')
                elif c == "'": stack.append([('quote',)])
                elif c in whitespace: pass
                else: stack.append((c,))

When reading a list:

    elif reading == str:
                if   c == '"': 
                    stack[-2].append(stack.pop())
                    if stack[-1][0] == ('quote',): stack[-2].append(stack.pop())
                elif c == '\\': 
                    i += 1
                    stack[-1] += sexp[i]
                else: stack[-1] += c

When reading a string:

    elif reading == tuple:
                if   c in atom_end: 
                    atom = stack.pop()
                    if atom[0][0].isdigit(): stack[-1].append(eval(atom[0]))
                    else: stack[-1].append(atom)
                    if stack[-1][0] == ('quote',): stack[-2].append(stack.pop())
                    continue
                else: stack[-1] = ((stack[-1][0] + c),)

When reading an atom:

    i += 1
        return stack.pop()

At the end here, I just take whatever’s on top of the stack (which should be the only thing on the stack if a proper s-expression was passed in), and return that.

I’m going to continue working on and improving this parser, for use in another project of mine, so I will be adding more functionality.

I was thinking of doing a s-expression to Python bytecode compiler, but someone already wrote one, and they even took the name I was going to use: sexpy.

I think the first thing I’ll be adding is the capability for the parser to yield sub-sexps so the entire thing doesn’t need to be parsed in one go before returning anything. Another thing I’ll most likely add is some checking for invalid sexps, something I left out for clarity in my initial version.

For those who are interested, here’s the complete version of sexp.py

blog comments powered by Disqus