Python HTML Layout Engine Progress

I’ve made some progress on my Python web browser. It’s nothing earth-shattering at the moment, but it does take all the text from a web page and render it.

It currently treats each element (including the ones in the head, which I need to fix) as an inline text element. It doesn’t quite do proper whitespace compression between elements, either, which leaves runs of multiple spaces in certain places. What it does do nicely is break lines in reasonable places.
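
For reference, here is a minimal sketch of what whitespace compression across element boundaries could look like. It is not the code in the repo; the function name `collapse_whitespace` and the idea of passing in a flat list of text fragments (one per inline element) are assumptions made for illustration.

```python
import re


def collapse_whitespace(fragments):
    """Collapse whitespace runs, including runs that span two fragments.

    `fragments` is a hypothetical flat list of strings, one per inline
    element, in document order.
    """
    result = []
    # Start as if the previous text ended with a space, so that leading
    # whitespace at the very start is dropped.
    previous_ended_with_space = True
    for fragment in fragments:
        # Any run of spaces, tabs, or newlines becomes a single space.
        text = re.sub(r"\s+", " ", fragment)
        if previous_ended_with_space:
            # The space was already emitted by an earlier fragment.
            text = text.lstrip(" ")
        if text:
            previous_ended_with_space = text.endswith(" ")
        result.append(text)
    return result


fragments = ["Hello  ", "  world", " \n ", "again "]
print("".join(collapse_whitespace(fragments)))
```

The key point is that the "did the last thing end with a space?" flag has to survive from one element to the next; collapsing each element on its own is what produces the doubled spaces. This sketch leaves a trailing space at the end of the run, which the line breaker would still need to trim.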

The part I’m working on next is applying CSS rules to the document. I’m considering a couple of different ways of walking the DOM tree and cascading the rules into each element. Either way, it boils down to walking the tree, matching CSS selectors against each element, and applying the rules whose selectors match, taking the specificity of each matching selector into account so that the proper rule ends up taking precedence.

I haven’t sat down and figured out the big O of them yet, but I think it’s going to be a memory vs. speed decision.
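
To make the walk-and-match idea concrete, here is a rough sketch of one possible approach. It is an illustration only, not what the engine does: the `Node` class, `parse_selector`, `specificity`, `matches`, and `cascade` are all made-up names, and it only understands simple compound selectors like `p`, `.note`, or `p.note#intro` (no combinators, no inheritance, no `!important`).

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """A stand-in for a DOM element (hypothetical, for illustration)."""
    tag: str
    id: str = ""
    classes: tuple = ()
    children: list = field(default_factory=list)
    style: dict = field(default_factory=dict)


# Picks out the pieces of a compound selector such as "p.note#intro".
_PART = re.compile(r"([#.]?)([\w-]+)")


def parse_selector(selector):
    """Split a compound selector into (tag, id, classes)."""
    tag, id_, classes = None, None, []
    for prefix, name in _PART.findall(selector):
        if prefix == "#":
            id_ = name
        elif prefix == ".":
            classes.append(name)
        else:
            tag = name
    return tag, id_, classes


def specificity(selector):
    """Return (id count, class count, tag count); tuples compare in that order."""
    tag, id_, classes = parse_selector(selector)
    return (1 if id_ else 0, len(classes), 1 if tag else 0)


def matches(selector, node):
    tag, id_, classes = parse_selector(selector)
    if tag and tag != node.tag:
        return False
    if id_ and id_ != node.id:
        return False
    return all(name in node.classes for name in classes)


def cascade(root, rules):
    """Apply `rules`, a list of (selector, declarations) in source order."""
    # Sort once: low specificity first, ties broken by source order, so a
    # later update() always comes from a rule that should win.
    ordered = sorted(
        enumerate(rules),
        key=lambda item: (specificity(item[1][0]), item[0]),
    )
    stack = [root]
    while stack:
        node = stack.pop()
        for _, (selector, declarations) in ordered:
            if matches(selector, node):
                node.style.update(declarations)
        stack.extend(node.children)


body = Node("body", children=[Node("p", id="intro", classes=("note",)), Node("p")])
rules = [
    ("#intro", {"color": "red"}),
    ("p", {"color": "black", "display": "block"}),
    (".note", {"color": "blue"}),
]
cascade(body, rules)
print(body.children[0].style)
```

This version tests every rule against every element, so it uses very little extra memory but does a lot of matching. The other direction would be to index rules by tag, class, and id up front, which costs memory but cuts down the number of selectors checked per element.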

Once I have styles applying to elements, I’ll probably work on getting all the standard HTML4 CSS rules rendering properly. After that, it will be on to more thorough block element handling, replaced elements (images, form elements), and probably pseudo-classes and pseudo-selectors.

After that, I dunno… Acid2?

Anyway, that’s me getting ahead of myself. I mean, it doesn’t even render block elements yet (since it has no way of setting an element to be a block element, due to the whole no-CSS-being-applied-yet thing).

If you are curious, you can check it out from my public git repo.

I warn you, the code is not really commented too much (except for in the layout section, there’s a whole outline of how that’s all supposed to work in there). If you want to see the magic of how it renders now, go ahead and run getgoogle.py (which, ironically, doesn’t even get Google at this point, since Google has some JavaScript which it wants to render as text…). You’ll need pygame installed to run it.

I’ll save you some time, though. It looks like this:

But hopefully not for long :)


7 Responses to “Python HTML Layout Engine Progress”

  1. Julian Says:

    Why don’t you use some open source rendering engine? Or you could make it swappable, your engine, other engines, full choice!

  2. pib Says:

    Indeed, why not? Why not just use an existing browser :P

    Now where’s the fun in that? Anybody can throw a user interface around an existing rendering engine and say they made a browser.

    I am, however, making it as modular as possible, so it should be easy for anyone to swap in and out components as they see fit.

  3. Lucractius Says:

    With the long-ago demise of Grail and a recent inclination towards building Python tools (MC, Xtree, ping, trace, etc.), I stumbled across your efforts, and I hope to be able to contribute towards them. On the topic of modularity… I would ideally like to replicate a text-mode rendering engine (like links/lynx etc.) and work up in complexity… But you’re clearly doing well with the graphical approach :)

  4. pib Says:

    @Lucractius: I’ve considered a text-mode rendering engine myself. It shouldn’t be too hard, since in text mode you could ignore much of the information from CSS. The way it’s set up now, you pass the renderer along to the call to the layout engine, and it queries that for information on font size.

    In text mode, you’d just make the renderer claim that the text was 1px wide for each character, and say that the size of the overall page was whatever the size of your console was. Then the layout code would wrap the lines and such for you.
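
    Roughly, that could look like the sketch below. The names here (`TextModeRenderer`, `text_width`, `page_width`, `wrap_lines`) are invented for the example and are not the actual interface in the repo; the wrapping is a simple greedy one.

```python
class TextModeRenderer:
    """Hypothetical renderer that reports console metrics to the layout code."""

    def __init__(self, columns=80):
        self.columns = columns

    def text_width(self, text):
        # Every character is claimed to be exactly one unit wide.
        return len(text)

    def page_width(self):
        # The page is as wide as the console.
        return self.columns


def wrap_lines(text, renderer):
    """Greedy line wrapping that only asks the renderer for widths."""
    space = renderer.text_width(" ")
    limit = renderer.page_width()
    lines, current, width = [], [], 0
    for word in text.split():
        word_width = renderer.text_width(word)
        needed = word_width if not current else width + space + word_width
        if current and needed > limit:
            # The word does not fit, so close this line and start a new one.
            lines.append(" ".join(current))
            current, width = [word], word_width
        else:
            current.append(word)
            width = needed
    if current:
        lines.append(" ".join(current))
    return lines


for line in wrap_lines("the quick brown fox jumps", TextModeRenderer(columns=10)):
    print(line)
```

    Since the layout code only ever asks the renderer for widths, a graphical renderer could return pixel widths from its font and the same wrapping loop would still work.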

    I’d be interested to see how things like floating boxes (not yet implemented) would work with this. Also, I think consoles are limited in the number of colors they support, so it’d also be interesting to see if CSS colors could be mapped to console colors or if that would work too horribly.
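
    As a quick experiment, mapping a CSS hex color to the nearest of the eight basic ANSI foreground colors might look something like this. The RGB values in the palette are an assumption (real terminals each use their own shades), and the function names are made up for the example.

```python
# ANSI SGR foreground codes 30-37, paired with assumed "pure" RGB values.
ANSI_COLORS = {
    30: (0, 0, 0),        # black
    31: (255, 0, 0),      # red
    32: (0, 255, 0),      # green
    33: (255, 255, 0),    # yellow
    34: (0, 0, 255),      # blue
    35: (255, 0, 255),    # magenta
    36: (0, 255, 255),    # cyan
    37: (255, 255, 255),  # white
}


def parse_hex_color(value):
    """Turn "#rgb" or "#rrggbb" into an (r, g, b) tuple."""
    digits = value.lstrip("#")
    if len(digits) == 3:
        digits = "".join(ch * 2 for ch in digits)
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def nearest_ansi(value):
    """Return the ANSI code whose assumed RGB value is closest."""
    rgb = parse_hex_color(value)

    def distance(code):
        # Squared Euclidean distance in RGB space.
        return sum((a - b) ** 2 for a, b in zip(rgb, ANSI_COLORS[code]))

    return min(ANSI_COLORS, key=distance)


def colorize(text, css_color):
    """Wrap text in the escape sequence for the nearest color, then reset."""
    return f"\x1b[{nearest_ansi(css_color)}m{text}\x1b[0m"


print(colorize("hello", "#ffee00"))
```

    Plain RGB distance is crude, so muted colors will mostly collapse to black or white. Whether that counts as "too horribly" probably depends on the page.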

  5. Lucractius Says:

    One of the reasons I’d like to tackle text mode is that there’s a myriad of things that the text-mode / console web browsers just wholesale ignore. Ignoring the CSS would be too easy. I’d like to see console color support that at least attempts to approximate the CSS’s intended color. I’d also like to try a novel approach for the renderer as an experiment: treat the console as an array of text cells, and have the renderer attempt to approximate the best positions based on some rudimentary layering and basic rules. Something between NCurses & aaLib, I suppose.

  6. lee Says:

    Hi pib: I want to find a way that, when I input an element such as a div id, the module can return its position and size. Maybe this program can help me! Thank you! But have you stopped the development?

  7. pib Says:

    I haven’t officially stopped development, but I haven’t worked on it for a while. I have plans to continue on it in the future, but I only have so much free time, and right now it’s being devoted to other things.

    It’s not far enough along that it would do what you want, I think.
