A prefix notation programming language

Posted Sun, Nov 16, 2008 in:

Prefix notation?

Have you ever dreamed of a language which uses strictly prefix (a.k.a. polish, Łukasiewicz) notation?

No? Well, I can’t say I’m surprised. Lisp is often called a prefix notation language, but I’ll let you in on a secret, it’s not purely prefix notation. It uses another notation you’ve probably never heard of: outfix notation.

Outfix notation?

I’d say I made outfix notation up, but I found a reference to it on abstractmath.org, so I at least have something to back this claim up with. Basically, the parentheses are a function which says, “put these items into a list.”

Of course, Lisp uses lists for everything, so you can hardly call it a prefix notation language any more.

Real prefix notation

Now, how about making a real prefix notation language? A real prefix notation language needs no parentheses because it knows how many arguments each function takes, so it can simply pull in the next two expressions following the function name.

A real prefix notation language is a piece of cake to implement, as long as every function has a fixed arity and that arity is known at compile time. Of course, then how do we represent things such as lists with varying amounts of items. How do we pass a variable number of arguments to a function?

The same way Lisp does, we use a list.

Using lists without list syntax

Of course, we have to encode our list a bit differently. For example, take the following simple Lisp snippet:

    (setq numbers '(1 2 3))

The set function takes a fixed number of arguments, so that’s not a problem. The list at the end, how do we do that? Well, if we assume we have standard lisp functions, we do this:

    setq numbers cons 1 cons 2 cons 3 nil

Now that wouldn’t look half bad if it weren’t for the conses of doom at the end there. So what can we do? Well, Lisp has these handy things called macros. Compile-time functions, basically.

Compile time functions

Of course, since we don’t know ahead of time how many items are going to be in the list. Lisp hands macros a list of the arguments to the macro based on what was in the parentheses with the macro name, but once again, we don’t have (or really want, honestly. I’m trying to do something new here!) the parentheses, so this is where we must start deviating from Lisp a bit.

    setq numbers list 1 2 3 .

What’s happening here, is that list is a compile-time function which simply requests items from the reader (the thing that takes characters and interprets them into symbols, numbers, and strings) until it gets back the ‘.’ symbol. If it’s anything else, it just has to tell the compiler to compile it as it would a normal expression. This means that we can call functions as we please. The difference between this an a “normal” outfix operator is that it is controlled by the code itself, rather than built into the compiler. The idea is that you could define your own compile-time functions to extend or change the “syntax” as you see fit.

For example, if you think the above list function is a bit too wordy:

    def-compile ( : 
      list-while /= setq token reader-next
                       ')
        compile token

Assuming “list-while” is a function which will build a list with the value of each iteration, this code does the following:

Define a compile-time function bound to the symbol “(” which does the following:
- Read the next token
- If the token is not “)”, have the compiler compile the token as usual, returning the result
- If the token is “)”, then we are done.

With this new compile time function, we can write nested lists as easily as we can in lisp:

    setq nested-numbers ( 1 2 ( 3 4 5 ( 6 7 ) ( 8 ) ) )

Well, that’s almost as easy as lisp, there’s one problem: those spaces are mandatory. Since “(” and “)” are just plain old symbols, the whole thing would be read as one symbol if we took the spaces out. This can be solved by modifying the reader to recognize “(” and “)” as complete symbols.

Reprogramming the reader

Without going into too many specifics of how the reader is implemented, making it possible to leave out the spaces around parentheses would be something like the following:

    push-reader (
      expr "(" ( add-char output-symbol 
        push-reader (
          expr ")" ( add-char output-symbol pop-reader )
          symbol ")" ( output-symbol ) ) )

This makes the reader treat a single “(” as a symbol when it’s expecting an expression, and also pushes a new reader state including “)” onto the reader stack. This is popped off when a “)” is found. This means that “)” will stop being special once the one matching the first “(” is reached. This shows a bit of the power of the reader implementation, allowing the syntax to be extended for just a portion of the compilation.

For example, something like the following wouldn’t be too hard to implement using this system:

    setq json-snippet json {
      "a": [1, 2, 3], 
      "b": ["x", "y", "z"], 
      "foo": "bar"
     }

It would simply have to modify the reader to recognize “{”, “}”, “[“, “]”, “,”, and “:” as separate symbols with or without whitespace, and add functions for “{” and “[” which also take the commas and colons into account.

No longer lisp, but lisp-like

At this point what we’ve got probably can’t be called “a Lisp” anymore.

There are some things that this language will need that Lisp doesn’t. One thing is that it will need to keep track of functions as they are defined in the code, noting how many arguments they take so it can handle them being called later on. When a function is passed as an argument to another function, it will need to be called using something along the lines of funcall or be given a manual arity hint if you want to call it like other functions.

The “everything is a list” feature of Lisp is kind of lost, though the internal representation could end up being basically the same.

This language doesn’t exist … yet

This is something which is more concept in my brain and on paper at the moment. I’ve done some prototype implementations of the reader, some prototype implementations of the prefix notation function calling, and some prototype compile-time functions (implemented in the host language, however).

What’s left now is to put the whole thing together into a system in which every part is well defined and accessible from within the language itself.

I’m also thinking that I’d like to build this thing from the ground up with an underlying object-oriented system. Not class-oriented, mind you, just objects. Something prototype-based I’m thinking, like JavaScript, Io, Pepsi, and so on. In the end a user-mutable object-functional language is what I think I’ll end up with.

Any comments, suggestions, ridicules, etc… would be well appreciated. Also, if anyone has seen anything like this already in existence from which I could draw inspiration and ideas, please let me know. I’ve already looked at Cola/Coke/Pepsi and gotten some good ideas from there.

Probably Programming