Practical geolocation example

Posted on August 1st, 2006 by phil.

I wrote a piece previously regarding some geolocation theory.

Now the theory is all well and good, but it can get a little bit messy. Different RIRs display whois information differently, and you have to parse human-written addresses out of it.

The good people at webhosting.info have provided everyone (yeh, even you!) with a list of every single netblock and where they think (read: educated guess) an IP is located.

Simply enough, it's called the IP to Country database, and you can always download the latest version from ip-to-country.webhosting.info.

The database itself is simply a zipped CSV file which (at the time of writing) is 3.3 Megabytes in size. It contains a list of netblocks (i.e. IP ranges) and country codes.

The netblocks are written with a start IP and an end IP. e.g.

"67.147.160.0","67.147.161.255"

That covers 67.147.160.0 - 67.147.160.255 and 67.147.161.0 - 67.147.161.255, which is 512 addresses in all. If you've already looked at the database (cheating!) then you'll already have started shouting "no it's not!", which is because the database doesn't write IP addresses in the standard form shown above.

The main reason for that is searching, especially within an SQL database. All you really need to know is that it's more efficient to store an IP address as an integer than in dotted quad notation (w.x.y.z). Checking whether an address falls inside a netblock then becomes two plain numeric comparisons, which makes searching quicker too.
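To make the integer form concrete, here is a minimal sketch of the conversion in both directions, using the example netblock from above. The function names ip_to_int and int_to_ip are my own for illustration and don't come from the database's handbook; it also assumes the input is a well-formed IPv4 address and does no validation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert a dotted quad (w.x.y.z) to its 32-bit integer form.
# pack "C4" writes the four octets as bytes; unpack "N" reads them
# back as one big-endian unsigned 32-bit integer.
# (Hypothetical helper name; assumes a valid IPv4 address.)
sub ip_to_int {
    my ($ip) = @_;
    return unpack("N", pack("C4", split(/\./, $ip)));
}

# The reverse: integer back to dotted quad.
sub int_to_ip {
    my ($n) = @_;
    return join(".", unpack("C4", pack("N", $n)));
}

my $start = ip_to_int("67.147.160.0");      # 1133748224
my $end   = ip_to_int("67.147.161.255");    # 1133748735

printf "%s -> %u\n", "67.147.160.0",   $start;
printf "%s -> %u\n", "67.147.161.255", $end;
printf "addresses in block: %u\n", $end - $start + 1;    # 512
printf "%u -> %s\n", $start, int_to_ip($start);
```

The arithmetic is the same as multiplying the octets by 16777216, 65536, 256 and 1 and adding them up; pack and unpack just do it in one step.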

If you read the handbook on ip-to-country.webhosting.info you'll get some PHP code to do the conversion.

I wrote a small Perl script that searches the CSV file (make sure you unzip it first):


#!/usr/bin/perl -w

use strict;

#### CHANGE THESE
my ($ip_addr) = "72.14.221.99";
my ($ipcsv) = "ip-to-country.csv";

### shouldn't need to change anything below these
my ($iplong) = &ip2long($ip_addr);

my (@location) = &ipsearch($iplong);

if (!$location[0]) {
    print "Nothing found\n";
} else {
    print "Location: " . $location[0] . "\n";
}

# Convert a dotted quad (w.x.y.z) into a single integer:
# w * 256^3 + x * 256^2 + y * 256 + z
sub ip2long {
    my ($ip) = shift;
    my (@octets) = split(/\./, $ip);

    my ($iplong) = ($octets[0] * 16777216)
            + ($octets[1] * 65536)
            + ($octets[2] * 256)
            + $octets[3];

    return $iplong;
}

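# Scan the CSV line by line and return the country fields of the
# first netblock that contains $iplong.
sub ipsearch {
    my ($iplong) = shift;

    open(my $fh, '<', $ipcsv) ||
            die "Could not open '$ipcsv': $!";
    while (my $line = <$fh>) {
        chomp($line);
        # Limit the split to 5 fields so a comma inside the
        # country name stays in the last field.
        my (@row) = split(/,/, $line, 5);
        s/"//g for @row;

        # Skip blank or malformed lines
        next unless @row >= 3 && $row[0] =~ /^\d+$/;

        if ($iplong >= $row[0] && $iplong <= $row[1]) {
            close($fh);
            return ($row[2], $row[3], $row[4]);
        }
    }
    close($fh);

    return (undef, undef, undef);
}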

You should be able to copy and paste it into a Perl script (e.g. findip.pl) and run it:

$ perl findip.pl
Location: US

The IP address hard-coded in the script (72.14.221.99) is one of the main IPs for google.com.
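The script above reads the whole file for every lookup, which is fine for one address. If you wanted to look up lots of addresses in one go, a rough sketch of an alternative is to load the netblocks into memory once and binary search them. This is only a sketch under a couple of assumptions: that each row starts with the start and end integers and the two-letter country code as quoted fields, and that the netblocks don't overlap. The names parse_row, load_blocks and find_block are mine, not anything from the handbook.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse one CSV line of the assumed form
#   "start","end","CC",...
# into [start, end, two-letter code]. Returns nothing for other lines.
sub parse_row {
    my ($line) = @_;
    return unless $line =~ /^"(\d+)","(\d+)","([^"]*)"/;
    return [$1, $2, $3];
}

# Load every netblock into memory, sorted by start address.
sub load_blocks {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Could not open '$path': $!";
    my @blocks = sort { $a->[0] <=> $b->[0] } map { parse_row($_) } <$fh>;
    close($fh);
    return \@blocks;
}

# Binary search for the block containing $iplong.
# Assumes the blocks are sorted by start and do not overlap.
sub find_block {
    my ($blocks, $iplong) = @_;
    my ($lo, $hi) = (0, $#{$blocks});
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        my ($start, $end, $cc) = @{ $blocks->[$mid] };
        if    ($iplong < $start) { $hi = $mid - 1 }
        elsif ($iplong > $end)   { $lo = $mid + 1 }
        else                     { return $cc }
    }
    return;    # not found
}

# Usage: perl findmany.pl ip-to-country.csv 72.14.221.99 [more IPs...]
if (@ARGV >= 2) {
    my ($path, @ips) = @ARGV;
    my $blocks = load_blocks($path);
    for my $ip (@ips) {
        my @o = split(/\./, $ip);
        my $iplong = $o[0] * 16777216 + $o[1] * 65536 + $o[2] * 256 + $o[3];
        my $cc = find_block($blocks, $iplong);
        print "$ip: ", (defined $cc ? $cc : "Nothing found"), "\n";
    }
}
```

Loading and sorting costs more up front than a single linear scan, so this only pays off when you have many addresses to check against the same file.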