Posts Tagged ‘Python’

Live List of the Most Popular Twitter Clients

Friday, August 21st, 2009

I just put up a live list of the most popular Twitter clients. The contents of that page are updated every 60 seconds (the length of time Twitter caches its public timeline).

For some reason I woke up at 7 this morning, unable to get back to sleep, and I was randomly wondering how many Twitter clients were out there, and which were the most popular. So I decided to find out.

After a bit of hacking around, I came up with the following Python script:

from urllib import urlopen
from contextlib import closing
import pickle
import json
 
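try:
    # pickle files should be opened in binary mode
    with open('clientlist.pickle', 'rb') as f:
        clientlist = pickle.load(f)
except (IOError, EOFError):
    # first run (or an empty file): start counting from scratch
    clientlist = {}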
 
with closing(urlopen('http://twitter.com/statuses/public_timeline.json')) as f:
    json_str = f.read()
 
tweets = json.loads(json_str)
 
for tweet in tweets:
    source = tweet['source']
    clientlist[source] = clientlist.get(source, 0) + 1
 
with open('clientlist.pickle', 'wb') as f:
    pickle.dump(clientlist, f)
 
total = sum(clientlist.itervalues())
 
with open('clientlist.html', 'w') as f:
    f.write('<html><head><title>Twitter Client list</title></head><body>')
    f.write(''.join(['<h1>Twitter Client Popularity</h1>',
                     '<div style="float: left; margin-right: 20px;">Out of <em>', str(total), 
                     '</em> Twitter messages, the following were the clients used.']))
    f.write('<table><thead><tr><th>Client</th><th>Tweet Count</th><th>% of total</th></tr></thead><tbody>')
 
    for (link, count) in sorted(clientlist.iteritems(), key=lambda item: item[1], reverse=True):
        f.write(''.join(['<tr><td>', link, '</td><td>', str(count), '</td><td>', 
                         str(round(100.0 * count / total, 2)), '</td></tr>']))
 
    f.write('</tbody></table></div>')
 
    f.write('</body></html>')

The script writes out a pickle file with the list of counts for each client, so multiple runs will generate a cumulative result (don’t run it more than once every 60 seconds, or you will add the same results in more than once).

It then writes out an .html file with a nice sorted list of the most common Twitter clients (well, all the Twitter clients it’s seen, sorted by most common first).

I’ve set up a cron job on this server to update the file linked to above every minute, so as time goes on the list will become a more and more accurate representation of what people are using to tweet.

Untiny that url!

Saturday, April 11th, 2009

There has been some talk about and arguments against and responses to issues about using rev=”canonical” for referencing shorter URLs instead of the automated use of TinyURL when posting to sites like Twitter.

I must say that I agree with Ben Ramsey (see “arguments against” above) in suggesting we use rel=”alternate shorter” instead.

I also like the idea that Chris Shiflett had of using an HTTP header and a HEAD request to make it so you neither have to retrieve the entire requested page nor parse any HTML. I’d stick with Ben’s suggestion, however, and make the header something like “X-Alternate-Shorter:”, rather than “X-Rev-Canonical”. What’s the harm in calling it something that actually makes sense?

The idea of using HTTP HEAD requests to solve the problem inspired me to come up with a more immediate solution to one of the problems introduced by using url shortening services: uncertainty about where a URL leads.

This problem can be solved on the client side, which requires no work on the part of Twitter (meaning this is more likely to be put into use sooner).

Since most URL shortening services use an HTTP redirect to do their job, all it takes is a HEAD request to the tiny URL in question, and then a look at whatever “Location:” header is returned to see what the real URL is. In fact, you don’t even really need to do a HEAD request in most cases: most URL shortening services don’t return any body, because they are just redirecting you anyway.
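To make the HEAD-request idea concrete, here is a minimal sketch in Python 3 (the listings elsewhere on this page are Python 2). It is not the untinyurl implementation from the full post; the function name untiny and its timeout parameter are made up for this illustration, and it only handles plain http and https URLs.

```python
import http.client
from urllib.parse import urlsplit


def untiny(url, timeout=10):
    """Return the redirect target of `url`, or `url` itself if it does not redirect.

    Hypothetical helper for illustration; not the code from the original post.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        # HEAD asks for the headers only, so no page body is transferred.
        conn.request("HEAD", path)
        response = conn.getresponse()
        if response.status in (301, 302, 303, 307, 308):
            # http.client does not follow redirects, so Location is still visible.
            return response.getheader("Location", url)
        return url
    finally:
        conn.close()
```

A shortener that chains redirects would need this called in a loop, and a relative Location value would have to be joined against the original URL; both are left out to keep the sketch short.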

Read on for more information and implementations of an untinyurl function in various languages.

(more…)

MPD shuffle-rest

Saturday, November 8th, 2008

When I listen to music, I generally like to shuffle my playlist. Sometimes I’ll add new music to my playlist, and I want to shuffle that music into my playlist as well.

The problem is that I don’t want to listen to music that I’ve already listened to recently, but if I shuffle the playlist again, that will probably happen. I use MPD as my music player, and the client I’m using doesn’t have the functionality I want (shuffling just the songs I haven’t listened to yet).

Today I hacked together a quick solution to the issue using Python and python-mpd. You can check it out here: shuffle_range.py
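The idea itself (keep what has already played, shuffle only what comes after the current song) can be sketched on a plain Python list. This is not shuffle_range.py and makes no MPD calls; shuffle_rest and its parameters are names invented for this illustration, written for Python 3.

```python
import random


def shuffle_rest(playlist, current_index, rng=random):
    """Shuffle only the entries after `current_index`; earlier ones keep their order.

    Hypothetical helper: `playlist` is any list of songs, `current_index` is the
    position of the song playing now, `rng` is the random module or a Random instance.
    """
    played = playlist[:current_index + 1]
    upcoming = playlist[current_index + 1:]
    rng.shuffle(upcoming)  # in-place shuffle of the copy, original list untouched
    return played + upcoming
```

Newly added songs land at the end of the playlist, so they fall in the shuffled tail along with everything else that has not been heard yet.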

(more…)

Python HTML Layout Engine Progress

Saturday, September 13th, 2008

I’ve made some progress on my Python web browser. It’s nothing earth-shattering at the moment, but it does take all the text from a web page and render it.

It currently treats each element (including the ones in the head, actually, I need to fix that) as an inline text element. It doesn’t quite do proper whitespace compression between elements, either, leading to some multiple spaces in certain places. What it does do nicely is the splitting on lines in reasonable places.

The part I’m working on next is the application of CSS rules to the document. I’m considering a couple of different possibilities for methods of walking the DOM tree and cascading the rules into each element. Either way, it boils down to walking the tree, matching CSS selectors against each element, and applying the rules for those elements which match, taking into account the specificity of the matching selector so that the proper rule ends up taking precedence.
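As a rough illustration of the specificity part only, here is a Python 3 sketch that is not taken from the browser's code. It assumes the selector matching has already happened, handles only tag names, .classes and #ids, and ignores !important, inline styles and stylesheet origin. The names specificity and winning_value are invented for this example.

```python
import re


def specificity(selector):
    """Return (ids, classes, types) for a selector made of tags, .classes and #ids."""
    ids = len(re.findall(r"#[\w-]+", selector))
    classes = len(re.findall(r"\.[\w-]+", selector))
    # A type selector starts the selector or follows a combinator.
    types = len(re.findall(r"(?:^|[\s>+~])([a-zA-Z][\w-]*)", selector))
    return (ids, classes, types)


def winning_value(matching_rules, prop):
    """Pick the value of `prop` among rules that already matched one element.

    `matching_rules` is a list of (selector, declarations) pairs in source order.
    Higher specificity wins; on a tie, the later rule wins.
    """
    best_key, best_value = None, None
    for order, (selector, declarations) in enumerate(matching_rules):
        if prop in declarations:
            key = (specificity(selector), order)
            if best_key is None or key > best_key:
                best_key, best_value = key, declarations[prop]
    return best_value
```

Comparing the (ids, classes, types) tuples directly works because Python orders tuples element by element, which is the same order the cascade uses.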

I haven’t sat down and figured out the big O of them yet, but I think it’s going to be a memory vs. speed decision.

Once I have styles applying to elements, I’ll probably work on getting all the standard HTML4 CSS rules rendering properly. After that, it will be on to more thorough block element handling, replaced elements (images, form elements), and probably pseudo-classes and pseudo-selectors.

After that, I dunno… Acid2?

Anyway, that’s me getting ahead of myself. I mean, it doesn’t even render block elements yet (since it has no way of setting an element to be a block element, due to the whole no CSS being applied yet thing).

If you are curious, you can check it out from my public git repo.

I warn you, the code is not really commented too much (except for in the layout section, there’s a whole outline of how that’s all supposed to work in there). If you want to see the magic of how it renders now, go ahead and run getgoogle.py (which, ironically, doesn’t even get Google at this point, since Google has some JavaScript which it wants to render as text…). You’ll need pygame installed to run it.

I’ll save you some time, though. It looks like this:

But hopefully not for long :)

Storing Hierarchical Data in CouchDB

Friday, July 4th, 2008

Much to my surprise, my last post generated more traffic in a single day than my blog has ever gotten in a single month. Apparently people are quite interested in making web applications with Python. I’ve started on part two, but since so many people showed interest I want to spend more time on it than I spent on the last one. So instead, you get this post.

So I’ve been fiddling around with CouchDB lately. Since it’s common to store tree-based data, and it’s kind of a pain to do so in your standard relational DB, I thought it would be a good exercise to see how hard it is to store hierarchical data in CouchDB.

Turns out it’s pretty easy.

(more…)

Building a Python Web Application, Part 1

Thursday, June 26th, 2008
Edit: I’ve cleaned up the longer example, using Python’s string.Template module for the templates. I’ve also set up a git repo for the source that will go along with posts to this series: Python Webapp Gitweb

Recently, I’ve been interested in writing web applications in Python, and one of the fun things that I discovered was the Python Web Server Gateway Interface, which is a standard interface for Python web servers, web applications, and something called middleware which can sit between the two.

One of the coolest things about WSGI is the fact that you now don’t have to decide on a specific web server before you start coding. In fact, the Python wsgiref module comes with a built-in simple web server which allows you to start coding up your web application with nothing but a bare install of Python 2.5 (or higher, of course)!

There are plenty of overviews of WSGI out there, so I won’t bother creating yet another in-depth explanation. What I will do, though, is show you how easy it is to get started.

Your basic “Hello, World!” application can be accomplished, server and all, with as little as the following:
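The original listing sits behind the cut. As a stand-in, here is a minimal sketch of such an application in Python 3, using only wsgiref from the standard library. The names hello_app and serve, and the host and port defaults, are choices made for this example rather than anything from the original post.

```python
from wsgiref.simple_server import make_server


def hello_app(environ, start_response):
    """A complete WSGI application: receives the request environ, returns the body."""
    body = b"Hello, World!\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    # WSGI expects an iterable of byte strings.
    return [body]


def serve(host="localhost", port=8000):
    """Run hello_app on wsgiref's built-in development server until interrupted."""
    server = make_server(host, port, hello_app)
    print("Serving on http://%s:%d/" % (host, port))
    server.serve_forever()
```

Calling serve() starts the server. Because hello_app is just a callable, it can also be handed unchanged to any other WSGI server.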

(more…)

Dead code in Python-generated bytecode

Tuesday, April 22nd, 2008

So I’ve made a couple of changes to Papaya (yeah, it’s called Papaya now):

  • As suggested by Phillip J. Eby, rather than generating the bytecode myself, I’m now using BytecodeAssembler, which has shortened and simplified my code a bit (though honestly not as much as I originally thought it would). I had already considered doing this before I wrote it all myself, but I wanted to get the educational benefit of doing it all from scratch.

  • I’ve changed the syntax for function definitions to match that of Python’s (minus the closing ‘:’), which also means that I’ve added support for *args and **kwargs parameters. Also, since I’m using BytecodeAssembler, you get the automatic parameter unpacking described here when using nested positional arguments. Of course, this will currently get duplicated if you decompile and then recompile code. I haven’t decided what to do about this yet.

  • You no longer need to specify a label for any of the SETUP_* instructions, since BytecodeAssembler handles this for you as well.

  • I added a setup.py file, which uses ez_setup and can build a .egg file and other fancy things. I will add this project to PyPI as soon as I resolve the issue I’m about to talk about below.

  • You no longer need to specify the stack size of a given block of code; it will be calculated for you by BytecodeAssembler.

So, due to my use of BytecodeAssembler, I get free stack size calculations, but I get another feature which is somewhat annoying: dead-code prevention.

Why is this annoying? Because the Python compiler generates dead code all the time.

What this means is, if you decompile any non-trivial (and some quite-trivial) .pyc files created by Python, and then try to recompile them, it will fail with an “AssertionError: Unknown stack size at this location” message.

For example, take the following, very simple .py file:

while True:
	if True:
		continue
	break

This is disassembled into the following:

    SETUP_LOOP 
  label0:
    LOAD_NAME True
    JUMP_IF_FALSE label3
    POP_TOP 
    LOAD_NAME True
    JUMP_IF_FALSE label1
    POP_TOP 
    JUMP_ABSOLUTE label0
    JUMP_FORWARD label2
  label1:
    POP_TOP 
  label2:
    BREAK_LOOP 
    JUMP_ABSOLUTE label0
  label3:
    POP_TOP 
    POP_BLOCK 
    LOAD_CONST None
    RETURN_VALUE

Note the double JUMP. This is generated any time you have a continue statement, despite the fact that the second jump cannot ever be run. Also unnecessary is the JUMP_ABSOLUTE after BREAK_LOOP.

Both of these cause an error in BytecodeAssembler because it has no context from which to determine the stack size at that point. Of course, that doesn’t really matter since the code will never be run.
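One way to at least find the dead instructions is a reachability walk over the listing. The Python 3 sketch below is only an illustration, not a fix and not part of Papaya or BytecodeAssembler: the (label, opcode, argument) tuple format, the opcode sets and the function name unreachable are assumptions made for the example, and BREAK_LOOP is simply treated as never falling through to the next instruction.

```python
UNCONDITIONAL_JUMPS = {"JUMP_ABSOLUTE", "JUMP_FORWARD"}
CONDITIONAL_JUMPS = {"JUMP_IF_FALSE", "JUMP_IF_TRUE"}
NO_FALL_THROUGH = {"RETURN_VALUE", "BREAK_LOOP"}

# The disassembly from above as (label, opcode, argument) tuples.
LISTING = [
    (None, "SETUP_LOOP", None),
    ("label0", "LOAD_NAME", "True"),
    (None, "JUMP_IF_FALSE", "label3"),
    (None, "POP_TOP", None),
    (None, "LOAD_NAME", "True"),
    (None, "JUMP_IF_FALSE", "label1"),
    (None, "POP_TOP", None),
    (None, "JUMP_ABSOLUTE", "label0"),
    (None, "JUMP_FORWARD", "label2"),
    ("label1", "POP_TOP", None),
    ("label2", "BREAK_LOOP", None),
    (None, "JUMP_ABSOLUTE", "label0"),
    ("label3", "POP_TOP", None),
    (None, "POP_BLOCK", None),
    (None, "LOAD_CONST", "None"),
    (None, "RETURN_VALUE", None),
]


def unreachable(instructions):
    """Return the indices of instructions that no control-flow path can reach."""
    labels = {label: i for i, (label, _, _) in enumerate(instructions) if label}
    seen = set()
    pending = [0]  # start at the first instruction
    while pending:
        i = pending.pop()
        if i in seen or i >= len(instructions):
            continue
        seen.add(i)
        _, opcode, arg = instructions[i]
        if opcode in UNCONDITIONAL_JUMPS:
            pending.append(labels[arg])
        elif opcode in CONDITIONAL_JUMPS:
            pending.extend([labels[arg], i + 1])
        elif opcode not in NO_FALL_THROUGH:
            pending.append(i + 1)
    return [i for i in range(len(instructions)) if i not in seen]
```

Running unreachable(LISTING) reports indices 8 and 11, which are the JUMP_FORWARD after the continue and the JUMP_ABSOLUTE after BREAK_LOOP. Whether to strip such instructions or emit them without stack tracking is a separate question that this sketch does not answer.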

I’m currently stumped as to the best way to solve this issue, and I’m tired and don’t want to think about it any more. :(

PPyA: Python Assembler

Friday, April 18th, 2008

Over the last few days I’ve hacked together a Python Assembler/Disassembler. I’ve called it PPya (pronounced like “papaya,” the fruit), short for Paul’s Python assembler. The ‘a’ is left lowercase because it looks better that way.

Each of those days I started to write up this blog post but then got distracted working on it some more.

It’s at the point now where it is fairly usable, both as a learning tool and as a tool for writing Python modules in assembly if you feel so inclined.

If you want to check it out, the gitweb project page is here: http://git.paulbonser.com/?p=ppya.git;a=summary

or you can git clone it:

git clone git://git.paulbonser.com/git/ppya.git/

or, if you’re behind a firewall or something

git clone http://git.paulbonser.com/git/ppya.git/

PPya Overview

A .pya file consists of a series of bytecodes (well, strings representing them, anyway) followed by parameters for those instructions which take parameters. When assembled, these parameters are converted to indices into a tuple in a Python code object, one of co_names, co_consts, co_varnames, co_cellvars, or co_freevars.
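Those tuples are easy to look at from Python itself. The following Python 3 sketch uses only compile and the dis module, not PPya; the sample source and the add function are made up for illustration, and the exact bytecode differs between Python versions, so only the name lookups are shown.

```python
import dis

source = "import sys\ngreeting = 'hi'\n"
code = compile(source, "<example>", "exec")

# Names and constants referenced by the module-level bytecode.
print(code.co_names)
print(code.co_consts)

# Instruction arguments are indices into those tuples: each STORE_NAME
# argument points at the name it binds.
store_names = [
    code.co_names[instr.arg]
    for instr in dis.get_instructions(code)
    if instr.opname == "STORE_NAME"
]
print(store_names)


def add(a, b):
    total = a + b
    return total


# Function code objects keep arguments and locals in co_varnames.
print(add.__code__.co_varnames)
```

An assembler has to do the reverse of the list comprehension: collect the names and constants used in the source text, build the tuples, and replace each parameter with its index.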

(more…)

Python IMPORT_NAME bytecode mystery

Monday, April 14th, 2008

I’ve been messing around with Python bytecode this weekend, which is why there was no Sunday post (as I’m writing this it’s very early Monday morning…).

There’s some fun stuff to wrap your head around when it comes to Python bytecode. The structure of the virtual machine, the order of bytes in bytecode arguments, the instructions which require magical numbers to be pushed onto the stack before being called.

I’ll go into more detail about what I’m doing mucking about with the Python internals tomorrow (later today, ugh! I need to go to bed!), but for now I’ll share one of the little mysteries that I’ve run into and managed to figure out.

IMPORT_NAME

IMPORT_NAME is the opcode you use to import another module. The description from the Python bytecode docs is this:

Imports the module co_names[namei]. The module object is pushed onto the stack. The current namespace is not affected: for a proper import statement, a subsequent STORE_FAST instruction modifies the namespace.

Basically, each code object has a tuple of names, namei is an index into that list of names, pointing to the name of the module you want to import. When this instruction is run you end up with the module on the stack, and you can then bind it to a name and access items within it.

Not mentioned here is the fact that you have to push two extra parameters onto the stack before calling IMPORT_NAME, otherwise the Python interpreter segfaults when it hits that instruction.

import sys

is compiled by the python compiler into the following (as output by dis.disassemble):

   0 LOAD_CONST               1 (-1)
   3 LOAD_CONST               0 (None)
   6 IMPORT_NAME              0 (sys)
   9 STORE_FAST               0 (sys)

I tried changing the -1 to a different number and I got “ValueError: Attempted relative import in non-package”.

Looking into Python/ceval.c in the Python interpreter source code, I see that these two extra parameters are passed in as the last two parameters to the builtin __import__ function, fromlist and level.

According to help(__import__), the -1 for level indicates it should try both absolute and relative imports, and the fromlist is empty because this isn’t a “from sys import …” statement.

from sys import argv, subversion, byteorder

compiles into

    0 LOAD_CONST               1 (-1)
    3 LOAD_CONST               2 (('argv', 'subversion', 'byteorder'))
    6 IMPORT_NAME              0 (sys)
    9 IMPORT_FROM              1 (argv)
   12 STORE_FAST               0 (argv)
   15 IMPORT_FROM              2 (subversion)
   18 STORE_FAST               1 (subversion)
   21 IMPORT_FROM              3 (byteorder)
   24 STORE_FAST               2 (byteorder)
   27 POP_TOP

It’s not immediately obvious why it needs to do the extra calls to IMPORT_FROM, but the reason seems to be that __import__ doesn’t actually do anything with the fromlist argument.

Anyway, I need to sleep now.

Note to self: submit a Python bugtracker issue about the undocumented required parameters. Done: issue 2631.

Update:

I also posted this as a response to Thomas Lee’s response, but duplicated it here for easier finding.

After looking at the docs for __import__, it’s interesting to see that the values in fromlist aren’t actually used, but it is significant whether the fromlist is empty or non-empty.

When the name variable is of the form package.module, normally, the top-level package (the name up till the first dot) is returned, not the module named by name. However, when a non-empty fromlist argument is given, the module named by name is returned. This is done for compatibility with the bytecode generated for the different kinds of import statement; when using “import spam.ham.eggs”, the top-level package spam must be placed in the importing namespace, but when using “from spam.ham import eggs”, the spam.ham subpackage must be used to find the eggs variable.

I suppose the reason that fromlist isn’t just changed to a boolean is that it might be significant in a setup where __import__ is redefined for custom importing, like Thomas said.
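The empty versus non-empty behaviour is easy to check from the interpreter. This short sketch is written for Python 3, where level defaults to 0 and the -1 value discussed above is no longer accepted; os.path is used only because it is a dotted module name that is always available.

```python
import os.path

# Empty fromlist: the top-level package is returned, as for "import os.path".
top = __import__("os.path")
print(top.__name__)

# Non-empty fromlist: the module named by the dotted path is returned,
# as for "from os.path import join".
sub = __import__("os.path", fromlist=["join"])
print(sub is os.path)
```

In both calls the binding of names is still left to the caller, which is what the STORE_FAST and IMPORT_FROM instructions in the listings above take care of.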

Mini Stack Language in Python

Thursday, April 10th, 2008

Are you tired of the lack of challenge in programming? Do you pine for a syntax reminiscent of your old HP RPN calculator? Well then do I have the programming language for you!

Over the last few days I’ve been working on a little stack-based language in Python.

I’ve got lots of ideas for programming languages, but a lot of them are pretty complex. I don’t really have any experience implementing programming languages, so I figured I’d start small and work my way up.

In this case, small means 135 lines of code (not counting the GPL notice) without many docstrings or comments. It’s small, so you should be able to figure it all out by the code (I hope you can, because I didn’t comment it that much).

Here’s an outline of some of the features of the language:

Programs are written in Reverse Polish Notation. Data (strings, numbers, or code blocks) are pushed onto the stack as they are encountered. When a ‘word’ is encountered, it is executed, and may do stuff to the stack.

There is a consistent syntax, or rather, mostly a lack thereof. The only reserved characters are single and double quotes and curly brackets (‘{‘ and ‘}’), after whitespace. Quote characters can be used as parts of words, even, just not as the first character.

All other language functionality is handled through the calling of words. There is no distinction made between built-in words and user-defined ones.

Examples

The ever-popular Hello world program:

'Hello, World!' echonl

Pretty straightforward. Push the string onto the stack and then print it out (with a newline). The echo and echonl commands both pop the top item from the stack and print it out.

Defining new words is as simple as pushing a name and block onto the stack and then calling the add_word word.

'square' {dup *} add_word

This example calls dup, which pushes a copy of the top item on the stack. It then calls *, which pops the top two items from the stack and pushes the result of their multiplication.

Some user input and if and else:

'story' {
    'do you want to hear a story?' echonl
    readstring 'yes' =
    {'Once upon a time...TODO: write story' echonl} if
    {"Aww, you don't want to hear my story." echonl} else
} add_word
story

Note that since it’s postfix notation, the words ‘if’ and ‘else’ appear after the block which will be run. Also, these are not keywords; they are just like any other built-in word. ‘if’ pushes a 1 to the stack after running its block, or a 0 if it doesn’t run its block. Then ‘else’ (once I actually write it) will run its block if the top value on the stack is a 0, otherwise it won’t run its block. I might also add an ‘elif’ function which will do the same as else while also pushing a result back onto the stack.

(I just realized that I didn’t actually implement the else function. I’ll fix that later.)

Check out the code for more details. Really, it’s not so hard to read... I think.
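For readers who want to see the moving parts without opening stacklang.py, here is a rough Python 3 sketch of the evaluation model described above. It is not the author's implementation: parse, run, evaluate and the BUILTINS table are names invented for this illustration, only a handful of words are provided, there is no else (matching the current state of the language), and braces inside strings are not handled.

```python
def parse(source):
    """Turn source text into ('push', value) and ('word', name) items."""
    items, i = [], 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1
        elif ch in "'\"":
            # A string runs to the next matching quote character.
            end = source.index(ch, i + 1)
            items.append(("push", source[i + 1:end]))
            i = end + 1
        elif ch == "{":
            # A block runs to the matching brace and is parsed recursively.
            depth, j = 1, i + 1
            while depth:
                depth += {"{": 1, "}": -1}.get(source[j], 0)
                j += 1
            items.append(("push", parse(source[i + 1:j - 1])))
            i = j
        else:
            j = i
            while j < len(source) and not source[j].isspace():
                j += 1
            token = source[i:j]
            try:
                items.append(("push", float(token) if "." in token else int(token)))
            except ValueError:
                items.append(("word", token))
            i = j
    return items


def run(items, stack, words):
    """Push data onto the stack and execute words as they are encountered."""
    for kind, value in items:
        if kind == "push":
            stack.append(value)
        elif callable(words[value]):
            words[value](stack, words)      # built-in word
        else:
            run(words[value], stack, words)  # user-defined word: a stored block


def _add_word(stack, words):
    block = stack.pop()
    name = stack.pop()
    words[name] = block


def _if(stack, words):
    block = stack.pop()
    condition = stack.pop()
    if condition:
        run(block, stack, words)
    # Leave 1 if the block ran and 0 if it did not, as described above.
    stack.append(1 if condition else 0)


BUILTINS = {
    "dup": lambda stack, words: stack.append(stack[-1]),
    "*": lambda stack, words: stack.append(stack.pop() * stack.pop()),
    "=": lambda stack, words: stack.append(1 if stack.pop() == stack.pop() else 0),
    "echonl": lambda stack, words: print(stack.pop()),
    "add_word": _add_word,
    "if": _if,
}


def evaluate(source):
    """Run a program with a fresh stack and word table; return the final stack."""
    stack, words = [], dict(BUILTINS)
    run(parse(source), stack, words)
    return stack
```

With this sketch, evaluate("'square' {dup *} add_word 7 square") leaves 49 on the stack. Built-in and user-defined words share one dictionary, which is one simple way to get the "no distinction" property from the feature outline.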

Work in progress

This language is somewhat usable at the moment, but it’s still in progress. I’ll probably change the names of all the words, and probably just start calling them functions rather than words, since that name is confusing (shame on you, FORTH, for inspiring me to call them that).

I’ll probably add in more words... err... functions to enable functional programming à la Joy.

I don’t yet have a specific goal for this language, since I’m writing it for my own education. It might dawn on me that there is a perfect application for this language, which will then shape the future direction. If nothing else, it will be a fun toy for those who would like to create their own simple programming language, since it’s such a small piece of code at the moment.

I might also try to write a compiler to compile it down into Python bytecode, again as an exercise in personal education.

If you’re interested, the code can be gotten with git:

git clone http://paulbonser.com/code/stacklang.git

or the current version (0.0.1.1) is available here: stacklang.py

Got any great ideas of what I could do with this language? Using the code for something cool yourself? Got a patch for me? Let me know!