Posts Tagged ‘lisp’

A Simple Lisp Parser in Python

Monday, November 23rd, 2009

A managed to sneak in a little time recently and write a simple Lisp s-expression parser in Python.

I started out with a version that just understands parentheses, atoms, and strings, but it was just a couple of extra lines to add in support for the quote “operator”, that I went ahead and threw that in there, too.

I’ll go ahead and go through it bit-by-bit here, though I think it’s pretty straightforward.

from string import whitespace
 
atom_end = set('()"\'') | set(whitespace)

When any of these things are encountered while reading an atom, the atom should be ended. This means no need for whitespace betwen an atom and opening or closing parentheses or quotes.

def parse(sexp):
    stack, i, length = [[]], 0, len(sexp)
    while i < length:

Rather than write this recursively, I am using a list as a stack and a simple while-loop to go through the contents of the string. I start with an empty list on the stack, which will hold the entire parsed sexp when the parsing is done.

        c = sexp[i]
        reading = type(stack[-1])

Each character is put into c and the type of the current item being read is checked. Rather than using some variable to keep track of the current state of the parser, I just used a distinct Python type for each of the possible lisp types to be read: a list for a list, a string for a string, and a string inside a tuple for an atom. Using this system, I just have to check what type the top element in the stack is, and I’ll know what I’m currently reading.

        if reading == list:
            if   c == '(': stack.append([])
            elif c == ')': 
                stack[-2].append(stack.pop())
                if stack[-1][0] == ('quote',): stack[-2].append(stack.pop())
            elif c == '"': stack.append('')
            elif c == "'": stack.append([('quote',)])
            elif c in whitespace: pass
            else: stack.append((c,))

When reading a list:

  • If an opening parenthesis is encountered, add a new list to the stack
  • If a closing parenthesis is encountered, take the list from the top of the stack and append it to the next list down in the stack. Check if the last thing read was a quote, and if so, add this list as the parameter to the quote call.
  • If an opening double-quote is encountered, add an empty string to the stack.
  • If a single-quote is found, add a list with a quote symbol at the beginning of it.
  • If c is whitespace, ignore it.
  • Otherwise, start an atom with whatever other character was found.
        elif reading == str:
            if   c == '"': 
                stack[-2].append(stack.pop())
                if stack[-1][0] == ('quote',): stack[-2].append(stack.pop())
            elif c == '\\': 
                i += 1
                stack[-1] += sexp[i]
            else: stack[-1] += c

When reading a string:

  • If a closing double-quote is found, end the string, and check for a previous quote call like above (it doesn’t really make sense to quote a string, but this isn’t really the place to check for that, is it?)
  • If a backslash is found, put whatever the following character is in the string. This is a simplistic way to at least allow double-quotes to be escaped, but I don’t have anything allowing for more complex escape-strings in here yet.
  • Anything else, add to the string.
        elif reading == tuple:
            if   c in atom_end: 
                atom = stack.pop()
                if atom[0][0].isdigit(): stack[-1].append(eval(atom[0]))
                else: stack[-1].append(atom)
                if stack[-1][0] == ('quote',): stack[-2].append(stack.pop())
                continue
            else: stack[-1] = ((stack[-1][0] + c),)

When reading an atom:

  • If the end of the atom has been reached, check if the atom is a number (this code just assumes that if the first character of the atom is numeric, that it’s a number). If it is, I cheat and use eval to get its value, if it’s not, I leave the atom as a tuple-wrapped string.
  • Anything else, add as a character to the atom.
        i += 1
    return stack.pop()

At the end here, I just take whatever’s on top of the stack (which should be the only thing on the stack if a proper s-expression was passed in), and return that.

I’m going to continue working on and improving this parser, for use in another project of mine, so I will be adding more functionality.

I was thinking of doing a s-expression to Python bytecode compiler, but someone already wrote one, and they even took the name I was going to use: sexpy.

I think the first thing I’ll be adding is the capability for the parser to yield sub-sexps so the entire thing doesn’t need to be parsed in one go before returning anything. Another thing I’ll most likely add is some checking for invalid sexps, something I left out for clarity in my initial version.

For those who are interested, here’s the complete version of sexp.py

The A++ Programming Language

Monday, July 13th, 2009

aplusplusimage

Introduction to A++

Update: Apologies to anyone who’s sensibilities were offended by the claim that A++ is a “purely functional” language. This was a mistake on my part. While the only thing (with the exceptions mentioned below) you can really manipulate with A++ is functions, you could perhaps say that A++ is a “purely function-oriented” language, but since it does offer destructive assignment it is not “purely functional,” but more of a baby-Scheme.

I’m going to go ahead and claim temporary sleep-deprivation-induced insanity, and offer a correction below.

Despite having the same name with just one more plus, A++ is not related in any way to A+. A++ is a truly tiny, purely functional programming language language where (almost) everything is a function, whereas A+ is anything but tiny and is far from being purely functional represents almost everything as arrays.

A++, which is short for Abstraction + reference + synthesis, is an abstraction of the lambda calculus, with a syntax that is even simpler (and thankfully contains only symbols which are on a standard keyboard). This simplicity is due to the fact that it borrows its syntax from Lisp and Scheme, requiring parentheses to surround each expression.

The addition that A++ makes to lambda calculus is the ability to explicitly assign names to objects (functions or values), something which lambda calculus only supports through binding via function calls.

A++ is intended to be a “learning instrument for programming” and thus doesn’t try to add any extensions to the language which would make it too useful for everyday programming.

Due to its simplicity, it is a great language for getting to understand concepts which would normally be overshadowed by all the other stuff in a language.

A++ is a Lisp-1, with a single name space for both functions and data. Indeed, in A++ there is no operator to define a function, one simply defines a symbol to map to a lambda expression.

Since A++ uses lazy-evaluation, control structures such as if statements and while loops can be created with just lambdas, without having to add macros to the language.

There are some primitive functions built into the interpreter to allow for converting church numerals into regular numbers, and for displaying booleans and numbers. There is support for primitive integers, strings, and symbolic constants. However, the support for primitive integers is limited to the minimum needed for converting from a church-encoded integer to a primitive integer, so the primitive integer can be printed, and the support for strings is only in place to allow for loading of external files into the interpreter. This means that in the language, you have to work exclusively with church numerals, and the only really useful primitive (aside from lambdas and define) are the symbolic constants.

All the standard lisp stuff, like car, cdr, map, filter, boolean operators (true and false, once again, defined in church encoding), various predicates (like nullp and zerop), while, and for-each are all built in the A++ language itself, and loaded automatically from a file when the interpreter is started, rather than being built-in functions.

So, when I said this was a purely functional language everything in this language is constructed using functions, I wasn’t kidding. The only difference from what most people would consider a purely functional language, is the fact that you can re-define a new value to the same name. Of course, without the ability to do this, closures would be of limited usefulness.

My First A++ Program

I had a hard time coming up with an idea for a program to implement in A++. Mainly, anything more complex I wanted to do, I couldn’t because of the very limited interaction with the outside world allowed by the A++ interpreter.

I looked at the list of solutions by programming task at Rosetta Code, and decided to implement the Roman Numerals task.

I based my solution off the first C solution for roman numerals. Due to the lazy-evaluation provided by A++, I was able to translate the macros used almost exactly.

While A++ makes possible functional and object-oriented programming, this example is imperative in nature, since it is copied more or less exactly from a C implementation of the same algorithm.

(define roman 
    (lambda(n)
 
      (define iftrue      ; standard if function requires an "else" bit
          (lambda(b t)
            (if b t nil)))
 
      (define digit 
          (lambda(loop num c)
            (loop (gep n num) 
               (lambda()
                 (print c)
                 (define n (sub n num))))))
      (define digits
          (lambda(loop num c1 c2)
            (loop (gep n num)
               (lambda()
                 (print c1)
                 (print c2)
                 (define n (sub n num))))))
 
      (define hundred (mult ten ten))
 
      (digit  while   (mult ten hundred) 'M   )
      (digits iftrue (mult nine hundred) 'C 'M)
      (digit  iftrue (mult five hundred) 'D   )
      (digits iftrue (mult four hundred) 'C 'D)
      (digit  while              hundred 'C   )
      (digits iftrue     (mult nine ten) 'X 'C)
      (digit  iftrue     (mult five ten) 'L   )
      (digits iftrue     (mult four ten) 'X 'L)
      (digit  while                  ten 'X   )
      (digits iftrue                nine 'I 'X)
      (digit  iftrue                five 'V   )
      (digits iftrue                four 'I 'V)
      (digit  while                  one 'I   )))

You can download the A++ roman numeral program if you’d like.

Sadly, A++ offers no way to build up a string for printing or to print multiple items on a single line, so each of the numerals will be printed on its own line.

Also, I chose this particular implementation because A++ doesn’t provide a function to perform division!

continue on to learn more about A++

(more…)