Untiny that url!

Posted Sat, Apr 11, 2009

There has been some talk about, arguments against, and responses to issues with using rev=“canonical” to reference shorter URLs instead of automatically using TinyURL when posting to sites like Twitter.

I must say that I agree with Ben Ramsey (see “arguments against” above) in suggesting we use rel=“alternate shorter” instead.

I also like Chris Shiflett’s idea of using an HTTP header and a HEAD request, so that you neither have to retrieve the entire requested page nor parse any HTML. I’d stick with Ben’s suggestion, however, and make the header something like “X-Alternate-Shorter:”, rather than “X-Rev-Canonical”. What’s the harm in calling it something that actually makes sense?
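To make the header idea concrete, here is a minimal Python 3 sketch of what a client might do. It is only an illustration: the header name X-Alternate-Shorter is just the suggestion above (no standard defines it), and the function name shorter_url and its timeout parameter are made up for this example.

```python
import http.client
from urllib.parse import urlsplit

# Hypothetical header name, taken from the suggestion in the post.
SHORTER_HEADER = "X-Alternate-Shorter"


def shorter_url(page_url, header=SHORTER_HEADER, timeout=10):
    """Return the value of the shorter-URL header for page_url, or None."""
    url = urlsplit(page_url)
    if url.scheme == "https":
        conn_class = http.client.HTTPSConnection
    else:
        conn_class = http.client.HTTPConnection
    # Only the path and query are sent; the fragment stays on the client.
    target = url.path or "/"
    if url.query:
        target += "?" + url.query
    try:
        con = conn_class(url.netloc, timeout=timeout)
        try:
            # HEAD asks for the headers only, so no page body is transferred
            # and no HTML has to be parsed.
            con.request("HEAD", target)
            return con.getresponse().getheader(header)
        finally:
            con.close()
    except (OSError, http.client.HTTPException):
        return None
```

A page that doesn’t send the header simply yields None, so a caller could fall back to a shortening service in that case.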

The idea of using HTTP HEAD requests to solve the problem inspired me to come up with a more immediate solution to one of the problems introduced by URL shortening services: uncertainty about where a URL leads.

This problem can be solved on the client side, which requires no work on the part of Twitter (meaning this is more likely to be put into use sooner).

Since most URL shortening services use an HTTP redirect to do their job, all it takes is a HEAD request to the tiny URL in question, and then a look at whatever “Location:” header is returned to see what the real URL is. In fact, in most cases you don’t even really need a HEAD request: most URL shortening services don’t return any body, since they are just redirecting you anyway, so a plain GET that doesn’t follow the redirect works just as well.

Read on for more information and implementations of an untinyurl function in various languages.

There’s actually already a site online, UnTinyURL.com, that offers to un-shorten URLs for you, but I wouldn’t suggest using it in any sort of automated system. It’s also of limited usefulness, since you don’t really want to have to visit one site just to see what site you’re about to go to. Most people will just click a link, even if it means they might get RickRolled.

For those comfortable with the command line, a simple curl call can give you the same basic info:

    $ curl -I http://tinyurl.com/c8f5bz
    HTTP/1.1 301 Moved Permanently
    X-Powered-By: PHP/5.2.9
    Location: http://probablyprogramming.com/2009/04/11/untiny-that-url/
    Content-type: text/html
    Date: Sun, 12 Apr 2009 01:26:08 GMT
    Server: TinyURL/1.6

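Toss in a grep and an awk, and you get your URL in a single line, perfect if you’re handling shortened URLs in a shell script for some reason. The -i makes the match case-insensitive (header names can arrive in any case), and the tr strips the carriage return that ends each header line:

    $ curl -s -I http://tinyurl.com/c8f5bz | grep -i '^location:' | awk '{print $2}' | tr -d '\r'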
    http://probablyprogramming.com/2009/04/11/untiny-that-url/
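If a script already has the raw header block as a string (say, captured from curl), the same extraction can be sketched in Python. Header names are case-insensitive and header lines end in CRLF, so this version allows for both. The function name location_from_headers is made up for this example.

```python
def location_from_headers(raw_headers):
    """Return the Location value from a raw HTTP header block, or None."""
    # splitlines() handles both "\r\n" and "\n" line endings.
    for line in raw_headers.splitlines():
        # Split on the first colon only, since URLs contain colons too.
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "location":
            return value.strip()
    return None
```

Lines without a colon, such as the status line, are skipped, and a response with no Location header yields None.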

Here’s untinyurl in Python:

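    import httplib
    import socket
    import urlparse

    def untinyurl(tinyurl):
        url = urlparse.urlsplit(tinyurl)
        # Send only the path and query; the fragment never goes to the server.
        req = urlparse.urlunsplit(('', '', url.path or '/', url.query, ''))
        con = httplib.HTTPConnection(url.netloc)
        try:
            con.request('HEAD', req)
            response = con.getresponse()
        except (socket.error, httplib.HTTPException):
            # The connection failed or the reply was not valid HTTP.
            return None
        finally:
            con.close()
        # None when the response carries no Location header.
        return response.getheader('Location', None)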
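The version above targets Python 2, where the httplib and urlparse modules live. In Python 3 they became http.client and urllib.parse, so a port might look roughly like the sketch below. A few things here are my assumptions rather than the original behavior: it returns the input URL (instead of None) when there is no redirect or the request fails, to match the other implementations in this post; it uses HTTPS when the scheme asks for it; and the timeout parameter is an addition.

```python
import http.client
from urllib.parse import urlsplit


def untinyurl(tinyurl, timeout=10):
    """Return the redirect target of tinyurl, or tinyurl itself if there is none."""
    url = urlsplit(tinyurl)
    # Assumption: pick the connection class from the scheme.
    if url.scheme == "https":
        conn_class = http.client.HTTPSConnection
    else:
        conn_class = http.client.HTTPConnection
    # Only the path and query are sent; the fragment stays on the client.
    target = url.path or "/"
    if url.query:
        target += "?" + url.query
    try:
        con = conn_class(url.netloc, timeout=timeout)
        try:
            # http.client never follows redirects on its own, so the
            # Location header of the first response is what we get back.
            con.request("HEAD", target)
            location = con.getresponse().getheader("Location")
        finally:
            con.close()
    except (OSError, http.client.HTTPException):
        # Assumption: fall back to the input URL on any failure.
        return tinyurl
    return location or tinyurl
```

Calling untinyurl("http://tinyurl.com/c8f5bz") would then return whatever Location header the service sends back, or the short URL unchanged if it sends none.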

And here’s a version in PHP. It’s a bit longer and uglier than the Python version because I’m using the low-level fsockopen function to do my HTTP request rather than using cURL or the HTTP extension. I did this because every PHP install will have fsockopen, whereas not every install will have cURL or the HTTP extension.

    <?php

    function untinyurl($tinyurl)
    {
        $url = parse_url($tinyurl);
        if (!$url || !isset($url['host'])) return $tinyurl;
        $host = $url['host'];
        $port = isset($url['port']) ? $url['port'] : 80;
        $path = isset($url['path']) ? $url['path'] : '/';
        $query = isset($url['query']) ? '?' . $url['query'] : '';

        $sock = @fsockopen($host, $port);
        if (!$sock) return $tinyurl;

        // The fragment is left out: it is never sent to the server.
        $request = "HEAD {$path}{$query} HTTP/1.0\r\nHost: {$host}\r\nConnection: Close\r\n\r\n";

        fwrite($sock, $request);
        $response = '';
        while (!feof($sock)) {
            $response .= fgets($sock, 128);
        }
        fclose($sock);

        // Header names are case-insensitive, so compare in lower case.
        $lines = explode("\r\n", $response);
        foreach ($lines as $line) {
            if (strpos(strtolower($line), 'location:') === 0) {
                list(, $location) = explode(':', $line, 2);
                return trim($location);
            }
        }
        return $tinyurl;
    }

I’m not too familiar with Ruby, but after poking around for a little bit, I came up with this Ruby version. Holy crap, Ruby, that was easy and short!

    require 'net/http'
    require 'uri'
    def untinyurl(tinyurl)
      Net::HTTP.get_response(URI.parse(tinyurl))['location'] or tinyurl
    rescue
      tinyurl
    end

And one more, an Erlang implementation. Be sure you call “inets:start()” before calling this function.

    -module(untinyurl).
    -export([untinyurl/1]).
    untinyurl(TinyUrl) ->
        case http:request(head, {TinyUrl, []}, [{autoredirect, false}], []) of
            {ok, {_Status, Headers, _Body}} -> proplists:get_value("location", Headers, TinyUrl);
            _ -> TinyUrl
        end.

Interesting how the Erlang and Ruby implementations look pretty similar.

I’ve made the source code available at GitHub. If you would like to contribute an untinyurl implementation in another language or have a bug-fix or suggestion for an improvement of one of the implementations I have so far, either email me, send me a pull request on GitHub, or post a comment here.
