Archive for the ‘Python’ Category

A Simple Lisp Parser in Python

Monday, November 23rd, 2009

A managed to sneak in a little time recently and write a simple Lisp s-expression parser in Python.

I started out with a version that just understands parentheses, atoms, and strings, but it was just a couple of extra lines to add in support for the quote “operator”, that I went ahead and threw that in there, too.

I’ll go ahead and go through it bit-by-bit here, though I think it’s pretty straightforward.

from string import whitespace
 
atom_end = set('()"\'') | set(whitespace)

When any of these things are encountered while reading an atom, the atom should be ended. This means no need for whitespace betwen an atom and opening or closing parentheses or quotes.

def parse(sexp):
    stack, i, length = [[]], 0, len(sexp)
    while i < length:

Rather than write this recursively, I am using a list as a stack and a simple while-loop to go through the contents of the string. I start with an empty list on the stack, which will hold the entire parsed sexp when the parsing is done.

        c = sexp[i]
        reading = type(stack[-1])

Each character is put into c and the type of the current item being read is checked. Rather than using some variable to keep track of the current state of the parser, I just used a distinct Python type for each of the possible lisp types to be read: a list for a list, a string for a string, and a string inside a tuple for an atom. Using this system, I just have to check what type the top element in the stack is, and I’ll know what I’m currently reading.

        if reading == list:
            if   c == '(': stack.append([])
            elif c == ')': 
                stack[-2].append(stack.pop())
                if stack[-1][0] == ('quote',): stack[-2].append(stack.pop())
            elif c == '"': stack.append('')
            elif c == "'": stack.append([('quote',)])
            elif c in whitespace: pass
            else: stack.append((c,))

When reading a list:

  • If an opening parenthesis is encountered, add a new list to the stack
  • If a closing parenthesis is encountered, take the list from the top of the stack and append it to the next list down in the stack. Check if the last thing read was a quote, and if so, add this list as the parameter to the quote call.
  • If an opening double-quote is encountered, add an empty string to the stack.
  • If a single-quote is found, add a list with a quote symbol at the beginning of it.
  • If c is whitespace, ignore it.
  • Otherwise, start an atom with whatever other character was found.
        elif reading == str:
            if   c == '"': 
                stack[-2].append(stack.pop())
                if stack[-1][0] == ('quote',): stack[-2].append(stack.pop())
            elif c == '\\': 
                i += 1
                stack[-1] += sexp[i]
            else: stack[-1] += c

When reading a string:

  • If a closing double-quote is found, end the string, and check for a previous quote call like above (it doesn’t really make sense to quote a string, but this isn’t really the place to check for that, is it?)
  • If a backslash is found, put whatever the following character is in the string. This is a simplistic way to at least allow double-quotes to be escaped, but I don’t have anything allowing for more complex escape-strings in here yet.
  • Anything else, add to the string.
        elif reading == tuple:
            if   c in atom_end: 
                atom = stack.pop()
                if atom[0][0].isdigit(): stack[-1].append(eval(atom[0]))
                else: stack[-1].append(atom)
                if stack[-1][0] == ('quote',): stack[-2].append(stack.pop())
                continue
            else: stack[-1] = ((stack[-1][0] + c),)

When reading an atom:

  • If the end of the atom has been reached, check if the atom is a number (this code just assumes that if the first character of the atom is numeric, that it’s a number). If it is, I cheat and use eval to get its value, if it’s not, I leave the atom as a tuple-wrapped string.
  • Anything else, add as a character to the atom.
        i += 1
    return stack.pop()

At the end here, I just take whatever’s on top of the stack (which should be the only thing on the stack if a proper s-expression was passed in), and return that.

I’m going to continue working on and improving this parser, for use in another project of mine, so I will be adding more functionality.

I was thinking of doing a s-expression to Python bytecode compiler, but someone already wrote one, and they even took the name I was going to use: sexpy.

I think the first thing I’ll be adding is the capability for the parser to yield sub-sexps so the entire thing doesn’t need to be parsed in one go before returning anything. Another thing I’ll most likely add is some checking for invalid sexps, something I left out for clarity in my initial version.

For those who are interested, here’s the complete version of sexp.py

Live List of the Most Popular Twitter Clients

Friday, August 21st, 2009

I just put up a live list of the most popular Twitter clients. The contents of that page are updated every 60 seconds (the length of time twitter caches their public timeline).

For some reason I woke up at 7 this morning, unable to get back to sleep, and I was randomly wondering how many Twitter clients were out there, and which were the most popular. So I decided to find out.

After a bit of hacking around, I came up with the following Python script:

from urllib import urlopen
from contextlib import closing
import pickle
import json
 
try:
    clientlist = pickle.load(open('clientlist.pickle'))
except:
    clientlist = {}
 
with closing(urlopen('http://twitter.com/statuses/public_timeline.json')) as f:
    json_str = f.read()
 
tweets = json.loads(json_str)
 
for tweet in tweets:
    source = tweet['source']
    clientlist[source] = clientlist.get(source, 0) + 1
 
with open('clientlist.pickle', 'w') as f:
    pickle.dump(clientlist, f)
 
total = 0
for count in clientlist.itervalues():
    total += count
 
with open('clientlist.html', 'w') as f:
    f.write('<html><head><title>Twitter Client list</title></head><body>')
    f.write(''.join(['<h1>Twitter Client Popularity</h1>',
                     '<div style="float: left; margin-right: 20px;">Out of <em>', str(total), 
                     '</em> Twitter messages, the following were the clients used.']))
    f.write('<table><thead><tr><th>Client</th><th>Tweet Count</th><th>% of total</th></thead><tbody>')
 
    for (link, count) in sorted(clientlist.iteritems(), key=lambda item: item[1], reverse=True):
        f.write(''.join(['<tr><td>', link, '</td><td>', str(count), '</td><td>', 
                         str(round(100.0 * count / total, 2)), '</td>']))
 
    f.write('</tbody></table></div>')
 
    f.write('</body></html>')

The script writes out a pickle file with the list of counts for each client, so multiple runs will generate a cumulative result (don’t run it more than once every 60 seconds, or you will add the same results in more than once).

It then writes out an .html file with a nice sorted list of the most common Twitter clients (well, all the twitter clients it’s seen, sorted by most common first).

I’ve set up a cron job on this server to update the file linked to above every minute, so as time goes on the list will become a more and more accurate representation of what people are using to tweet.

500 Programming Languages: Python

Saturday, August 1st, 2009

sillywalk

Python is one of my favorite programming languages. It’s almost always the first one that I reach for when I have a programming task I’d like to try out, and often enough, it’s the final language of that task, too.

Introduction to Python

On the official Python website, Python is described as “a dynamic object-oriented programming languages that can be used for many kinds of software development.”

They also claim that “Many Python programmers report substantial productivity gains and feel the language encourages the development of higher quality, more maintainable code.”

I am inclined to agree.

Though Python is a true “everything is an object” OO language, you can do more than just Object-Oriented programming with it. You can do plain-ol’ procedural programming, functional programming, Object-Oriented programming, and probably some other types of programming if you wanted.

One of the nicest things about Python is the “batteries included” approach, which means that Python comes with a ton of modules that do a large amount of what you want to do, right out of the box (or tarball, or installer…I don’t know if you can actually get Python in a box). Does your app need to write a cgi app? Or even better, a wsgi app? Web client? Web server??

Yep, it has all those, and many, many more.

There are many, more thorough introductions to Python out there, so I won’t attempt to cover the language in too much more depth, but I will show you a couple of things that really capture the spirit of Python:

Python 2.6.2 (release26-maint, Apr 19 2009, 01:58:18) 
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from __future__ import braces
  File "<stdin>", line 1
SyntaxError: not a chance
>>> import this
The Zen of Python, by Tim Peters
 
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
>>> import __hello__
Hello world...
>>> 
</stdin>

My First Python Program

This is far from my first program in Python. I first discovered Python when I was in High School, so possibly as early as 1999-2000. I had discovered Blender, a free 3D editing application, and it had Python embedded within it, for use in creating extensions and scripting games for the built-in game engine. Back then, I still thought I was going to be a video game developer, so I made some effort to learn Python so I could write games in Blender.

Anyway, enough history. :)

For this program, I implemented a Brainfuck-to-Python compiler. I won’t go into Brainfuck right now because I will be doing a post on it next.

This program is nowhere near demonstrating all of, or even a large number of, Python’s features, but it does demonstrate how easy it is to write a quick-and-easy program.

I threw this compiler together in less than 5 minutes, and it functioned properly on my first try (the Brainfuck hello world from Wikipedia).

The only addition I had to make later was handling the case of end-of-file on the input.

def compile(program):
    indent = 4
    code = [
        'def compiled(input=None, output=None):',
        '    import sys',
        '    if not input: input = sys.stdin',
        '    if not output: output = sys.stdout',
        '    i = 0',
        '    a = [0]*30000'
        ]
    commands = {
        '>': 'i += 1',
        '< ': 'i -= 1',
        '+': 'a[i] += 1',
        '-': 'a[i] -= 1',
        '.': 'output.write(chr(a[i]))',
        ',': 'a[i] = ord(input.read(1) or "\0")',
        '[': 'while a[i]:',
        ']': ''
        }
    for command in program:
        line = commands.get(command, None)
        if line: code.append((' ' * indent) + line)
 
        if command == '[':   indent += 4
        elif command == ']': indent -= 4
    exec '\n'.join(code)
    return compiled

This function takes a string containing BF code as its one argument and returns a function which will run that code when called.

The function returned will have two optional arguments, input and output, which are expected to be file-like objects (at least implementing read and write, respectively), which allows for passing in specific input and/or handling the output from within a python program.

It’s kind of a dirty way to do it, building Python source, an then running it with exec, but it gets the job done. I could also save the function to a compiled .pyc file, so you could later import it directly, but that would require a bit more code and ruin the simplicity of it all. I could also have generated input for my Python assembler, and used that to generate a compiled .pyc file, but like I said, I was going for simplicity here, and encouraging people to learn Python, not necessarily Python bytecodes.

You can download the Brainfuck-to-Python Compiler if you’d like. You can download the Brainfuck-to-Python Compiler if you’d like.

continue on to learn more about Python

(more…)

q, textual queue manager

Tuesday, April 21st, 2009

I wanted a quick, command-line way to handle a list of to-do items and to show me the next item that I want to work on, so I hacked together a quick little program that manages queues for you.

It’s handy for those times where I think of something to do while working on a project, but I am already working on something. With this I can just append it to the end of my queue for that project and forget about it until later.

The code can be gotten from bitbucket, if you are interested.

You can use it like so:

Push items to the top of a queue:

$ q todo push 'do that one thing'
do that one thing
$ q todo push 'do that other thing'
do that other thing

See what’s at the top of a queue:

$ q todo
do that other thing

Pop items from the top of a queue

$ q todo pop
do that one thing

Append items to the end of a queue:

$ q todo append 'do some more things'
do that one thing

The help q prints out if you call it with no parameters:

$ ./q.py 
usage: ./q.py queuename [command [params]]
  commands:
    push       Add one or more lines to the queue
    all        Show all the items in the queue
    append     Add one or more lines to the end of the queue
    pop        Remove the top item from the specified queue and push it onto the .done queue for that file
    next       (default) Return the top item in the queue

Other commands I plan on adding include “rot” to swap the current top item with the next item (or with a numerical parameter to move it even further down).

I’ve been experimenting around with Mercurial, and since that was also written in Python, I was wondering what it would take to rewrite this as a hg plugin. It could be handy to use something like this to manage a todo file. It could add some output to show what todo items were marked as done for each commit and perhaps also do a pre-commit hook where it pulls any new lines starting with “TODO:” and adds them to the todo file.

Untiny that url!

Saturday, April 11th, 2009

There has been some talk about and arguments against and responses to issues about using rev=”cononical” for referencing shorter URLs instead of the automated use of TinyURL when posting to sites like Twitter.

I must say that I agree with Ben Ramsey (see “arguments agains” above) in suggesting we use rel=”alternate shorter” instead.

I also like the idea that Chris Shiflett had of using a HTTP header and a HEAD request to make it so you neither have to retrieve the entire requested page nor parse any HTML. I’d stick with Ben’s suggestion, however, and make the header something like “X-Alternate-Shorter:”, rather than “X-Rev-Canonical”. What’s the harm in calling it something that actually makes sense?

The idea of using HTTP HEAD requests to solve the problem inspired me to come up with a more immediate solution to one of the problems introduced by using url shortening services: uncertainty about where a URL leads.

This problem can be solved on the client side, which requires no work on the part of Twitter (meaning this is more likely to be put into use sooner).

Since most URL shortening services use an HTTP redirect to do their job, all it takes is a HEAD request to the tiny URL in question, and then a look at whatever “Location:” header is returned to see what the real URL is. In fact, you don’t even really need to do a HEAD request in most cases, since most URL shortening services don’t return any body, since they are just redirecting you anyway.

Read on for more information and implementations of an untinyurl function in various languages.

(more…)

MPD shuffle-rest

Saturday, November 8th, 2008

When I listen to music, I generally like to shuffle my playlist. Sometimes I’ll add new music to my playlist, and I want to shuffle that music into my playlist as well.

The problem is that I don’t want to listen to music that I’ve already listened to recently, but if I shuffle the playlist again, that will probably happen. I use MPD as my music player, and the client I’m using doesn’t have the functionality I want (shuffling just the songs I haven’t listened to yet).

Today I hacked together a quick solution to the issue using Python and python-mpd. You can check it out here: shuffle_range.py

(more…)

Python HTML Layout Engine Progress

Saturday, September 13th, 2008

I’ve made some progress on my Python web browser. It’s nothing earth-shattering at the moment, but it does take all the text from a web page and render it.

It currently treats each element (including the ones in the head, actually, I need to fix that) as an inline text element. It doesn’t quite do proper whitespace compression between elements, either, leading to some multiple spaces in certain places. What it does do nicely is the splitting on lines in reasonable places.

The part I’m working on next is the application of CSS rules to the document. I’m considering a couple of different possibilities for methods of walking the DOM tree and cascading the rules into each element. Either way it ends up boiling down to walking the tree and matching CSS selectors against each element, and applying rules for those elements which match, of course taking into account the specificity of the matching selector to make sure the proper rule ends up taking precidence.

I haven’t sat down and figured the bit O of them yet, but I think it’s going to be a memory vs. speed decision.

Once I have styles applying to elements, I’ll probably work on getting all the standard HTML4 CSS rules rendering properly. After that, it will be on the more thorough block element handling, replaced elements (images, form elements), and probably psuedo-classes and psuedo-selectors.

After that, I dunno… Acid2?

Anyway, that’s me getting ahead of myself. I mean, it doesn’t even render block elements yet (since it has no way of setting an elment to be a block element, due to the whole no CSS being applied yet thing).

If you are curious, you can check it out from my public git repo.

I warn you, the code is not really commented too much (except for in the layout section, there’s a whole outline of how that’s all supposed to work in there). If you want to see the magic of how it renders now, go ahead and run getgoogle.py (which, ironically, doesn’t even get Google at this point, since Google has some JavaScript which it wants to render as text…). You’ll need pygame installed to run it.

I’ll save you some time, though. It looks like this:

But hopefully not for long :)

Storing Hierarchical Data in CouchDB

Friday, July 4th, 2008

Much to my surprise, my last post generated more traffic in a single day than my blog has ever gotten in a single month. Apparently people are quite interested in making web applications with Python. I’ve started on part two, but since so many people showed interest I want to spend more time on it than I spent on the last one. So instead, you get this post.

So I’ve been fiddling around with CouchDB lately. Since it’s common to store tree-based data, and it’s kind of a pain to do so in your standard relational DB, I thought it would be a good exercise to see how hard it is to store hierarchical data in CouchDB.

Turns out it’s pretty easy.

(more…)

Building a Python Web Application, Part 1

Thursday, June 26th, 2008
Edit: I’ve cleaned up the longer example, using Python’s string.Template module for the templates. I’ve also set up a git repo for the source that will go along with posts to this series: Python Webapp Gitweb

Recently, I’ve been interested in writing web applications in Python, and one of the fun things that I discovered was the Python Web Server Gateway Interface, which is a standard interface for Python web servers, web applications, and something called middleware which can sit between the two.

One of the coolest things about WSGI is the fact that you now don’t have to decide on a specific web server before you start coding. In fact, the Python wsgiref module comes with a built-in simple web server which allows you to start coding up your web application with nothing but a bare install of Python 2.5 (or higher, of course)!

There are plenty of overviews of WSGI out there, so I won’t bother creating yet another in-depth explanation. What I will do, though, is show you how easy it is to get started.

Your basic “Hello, World!” application can be accomplished, server and all, with as little as the following:

(more…)

Dead code in Python-generated bytecode

Tuesday, April 22nd, 2008

So I’ve made a couple of changes to Papaya (yeah, it’s called Papaya now):

  • As suggested by Phillip J. Eby, rather than generating the bytecode myself, I’m now using BytecodeAssembler, which has shortened and simplified my code a bit (though honestly not as much as I originally thought it would). I had already considered doing this before I wrote it all myself, but I wanted to get the educational benefit of doing it all from scratch.

  • I’ve changed the syntax for function definitions to match that of Python’s (minus the closing ‘:’), which also means that I’ve added support for *args and **kwargs parameters. Also, since I’m using BytecodeAssembler, you get the automatic parameter unpacking described here when using nested positional arguments. Of course, this will currently get duplicated if you decompile and then recompile code. I haven’t decided what to do about this yet.

  • You no longer need to specify a label for any of the SETUP_* instructions, since BytecodeAssembler handles this for you as well.

  • I added a setup.py file, which uses ez_setup and can build a .egg file and other fancy things. I will add this project into pypi as soon as I resolve the issue I’m about to talk about below.

  • You no longer need to specify the stack size of a given block of code, it will be calculated for you by BytecodeAssembler.

So, due to my use of BytecodeAssembler, I get free stack size calculations, but I get another feature which is somewhat annoying: dead-code prevention.

Why is this annoying? Because the Python compiler generates dead code all the time.

What this means is, if you decompile any non-trivial (and some quite-trivial) .pyc files created by Python, and then try to recompile then, then it will fail with an “AssertionError: Unknown stack size at this location” message.

For example, take the following, very simple .py file:

while True:
	if True:
		continue
	break

This is disassembled into the following:

    SETUP_LOOP 
  label0:
    LOAD_NAME True
    JUMP_IF_FALSE label3
    POP_TOP 
    LOAD_NAME True
    JUMP_IF_FALSE label1
    POP_TOP 
    JUMP_ABSOLUTE label0
    JUMP_FORWARD label2
  label1:
    POP_TOP 
  label2:
    BREAK_LOOP 
    JUMP_ABSOLUTE label0
  label3:
    POP_TOP 
    POP_BLOCK 
    LOAD_CONST None
    RETURN_VALUE

Note the double JUMP. This is generated any time you have a continue statement, despite the fact that the second jump cannot ever be run. Also unecessary is the JUMPABSOLUTE after BREAKLOOP.

Both of these cause an error in BytecodeAssembler because it has no context from which to determine the stack size at that point. Of course, that doesn’t really matter since the code will never be run.

I’m currently stumped as to the best way to solve this issue, and I’m tired and don’t want to think about it any more. :(