
Create an IP-Country Database Using Perl and MySQL


Using Perl and MySQL, Stephen shows how to gain a significant Internet marketing advantage: collecting visitor data and customizing user interaction based on the country from which a visitor's IP address originates. Read the details ...

Author Info:
By: Stephen Pierzchala
August 01, 2003


Targeting Web site content to the specific visitors who view the site is an important marketing advantage. Being able to track incoming visitors by their country of origin further helps companies ensure that visitors are presented with relevant content. This may seem like a daunting task, but it can be achieved with a high degree of accuracy using publicly accessible data and Open Source software.

IP to Country Mapping

IP-to-Country mapping is an idea that has started to appear more frequently on the Internet in recent months. MaxMind offers a commercial solution that provides country-level mappings for free, with national, regional and metropolitan databases available for a fee.(1) IP-to-Country offers an Open Source database for country-level IP mapping, and it was this system that originally inspired the system I have developed.(2)

All of these systems warn users that they are not 100% accurate. The accuracy of a mapping suffers when visitors are on large corporate or ISP networks, where traffic is routed out through a small number of public access points regardless of where it originates.

Making IP Addresses Searchable

The first issue to address is how to determine whether an IP Address falls within one of the ranges assigned to a particular country. The simplest way to range-match IP Addresses is to abandon the familiar dotted-quad notation and convert each IP Address to an IP Number.

Every IP Address can be converted into a decimal number between 0 and (2^32)-1 (4294967295). In practice the range of interest is smaller, as the unicast addresses handed out by the registries fall between 0.0.0.0 (IP Number: 0) and 223.255.255.255 (IP Number: 3758096383); addresses above that are reserved for multicast and experimental use.
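Put another way, the four octets a.b.c.d of an address are the digits of a base-256 number. Applying this to the upper bound quoted above:

```latex
\[
N(a.b.c.d) = a \cdot 256^{3} + b \cdot 256^{2} + c \cdot 256 + d, \qquad 0 \le a, b, c, d \le 255
\]
\[
N(223.255.255.255) = 223 \cdot 16777216 + 255 \cdot 65536 + 255 \cdot 256 + 255 = 3758096383
\]
```

With every octet at 255 the same formula gives 4294967295, i.e. (2^32)-1.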

A quick search of the Web turned up a simple Perl method for converting IP Addresses to IP Numbers, and back again.

IP Address -> IP Number

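#!/usr/bin/perl
use strict;
use warnings;

# Read a dotted-quad IP Address from the command line, e.g. 12.236.236.0
my (@octets, $octet, $ip_number, $ip_address);
$ip_address = $ARGV[0];
chomp($ip_address);
@octets = split(/\./, $ip_address);
$ip_number = 0;
foreach $octet (@octets) {
    $ip_number <<= 8;        # shift the octets seen so far left by one byte
    $ip_number |= $octet;    # place this octet in the lowest byte
}
print "\nThe IP Address $ip_address converts to the IP Number $ip_number.\n";

IP Number -> IP Address

#!/usr/bin/perl
use strict;
use warnings;

# Read an IP Number from the command line, e.g. 216853504
my (@octets, $i, $ip_number, $ip_number_display, $ip_address);
$ip_number = $ARGV[0];
chomp($ip_number);
$ip_number_display = $ip_number;
for ($i = 3; $i >= 0; $i--) {
    $octets[$i] = ($ip_number & 0xFF);   # the lowest byte is the current octet
    $ip_number >>= 8;                    # drop it and move to the next byte
}
$ip_address = join('.', @octets);
print "\nThe IP Number $ip_number_display converts to the IP Address $ip_address.\n";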
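The two scripts above can also be packaged as reusable subroutines. The sketch below is one alternative, assuming Perl's core Socket module: inet_aton() packs a dotted-quad address into four bytes in network order, and unpack("N", ...) reads them back as an unsigned 32-bit number. The names ip_to_number and number_to_ip are illustrative, not part of any library.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Convert a dotted-quad IP Address to an IP Number.
# The octets are validated first so inet_aton() never falls back to a
# hostname (DNS) lookup on malformed input.
sub ip_to_number {
    my ($ip_address) = @_;
    my @octets = $ip_address =~ /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/
        or die "Not a dotted-quad IPv4 address: $ip_address\n";
    die "Octet out of range in $ip_address\n" if grep { $_ > 255 } @octets;
    # inet_aton() returns 4 bytes in network (big-endian) order;
    # "N" unpacks them as an unsigned 32-bit integer.
    return unpack("N", inet_aton($ip_address));
}

# Convert an IP Number back to a dotted-quad IP Address.
sub number_to_ip {
    my ($ip_number) = @_;
    die "IP Number out of range: $ip_number\n"
        if $ip_number < 0 || $ip_number > 4294967295;
    return inet_ntoa(pack("N", $ip_number));
}

print ip_to_number("12.236.236.0"), "\n";   # 216853504
print number_to_ip(216853759), "\n";        # 12.236.236.255
```

Using pack/unpack keeps the byte-order handling in one well-tested place rather than in a hand-written shift loop.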

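If you are using PHP in your applications, the native functions ip2long() and long2ip() make this conversion even easier. Note that on 32-bit builds of PHP, ip2long() returns a negative integer for addresses above 127.255.255.255; sprintf("%u", ip2long($ip_address)) gives the unsigned IP Number.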

Convert IP Address to IP Number
$ip_number = ip2long($ip_address);
Convert IP Number to IP Address
$ip_address = long2ip($ip_number);
IP Address Location Data

Now that we have settled on a format for the IP data in the database, we need to find data that maps IP Addresses to countries. This is easier than it sounds, as this data is centrally held by the four regional IP registries: ARIN, RIPE, APNIC and LACNIC. After some digging through their Web sites, I found that they provide text-formatted listings of the IP ranges they have allocated and assigned. All of the registries use the same format, which makes parsing these files a simple process.

I chose to use the Perl module WWW::Curl to retrieve the files. On systems where cURL is not supported, you could rewrite the application to use LWP or another method, as each retrieval is a simple file download over FTP.

cURL Retrieval Section
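#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);
use WWW::Curl::Easy;   # older WWW::Curl releases named this module WWW::Curl::easy

my ($ip_date, $file_time, $file, $apnic, $ripe, $arin,
    $lacnic, @registries, $url, $curl);

# The ARIN and LACNIC file names use the first day of the current month
$ip_date   = strftime("%Y%m", localtime(time)) . "01";
# Timestamp each download so older data files are kept for rollback
$file_time = strftime("%Y%m%d-%H%M", localtime(time));
$file      = $file_time . "_ip_data.txt";

$apnic  = "ftp://ftp.apnic.net/pub/stats/apnic/apnic-latest";
$ripe   = "ftp://ftp.apnic.net/pub/stats/ripe-ncc/_ripencc.latest";
$arin   = "ftp://ftp.apnic.net/pub/stats/arin/arin." . $ip_date;
$lacnic = "ftp://lacnic.net/pub/stats/lacnic/lacnic." . $ip_date;

# Append all four registry files to a single data file
open(BODY, '>>', $file) or die "Cannot open $file: $!\n";
@registries = ($apnic, $ripe, $arin, $lacnic);
foreach $url (@registries) {
    # Init the curl session
    $curl = WWW::Curl::Easy->new() or die "curl init failed!\n";
    $curl->setopt(CURLOPT_URL, $url);
    $curl->setopt(CURLOPT_FTP_USE_EPSV, 0);   # use PASV instead of EPSV
    $curl->setopt(CURLOPT_WRITEDATA, *BODY);  # named CURLOPT_FILE in older libcurl
    if ($curl->perform() != 0) {
        print "Failed ::" . $curl->errbuf . "\n";
    }
}
close BODY;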
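As mentioned above, LWP can stand in for cURL. The following is a minimal sketch, assuming libwww-perl is installed with its FTP support (it is loaded at run time, only when a download is attempted). The helpers registry_urls and fetch_registries are hypothetical names, and the URLs are copied unchanged from the script above; registry FTP paths change over time, so check them before use.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Build the list of registry URLs used in the cURL section.
# $ip_date is a YYYYMM01 string, as in the original script.
sub registry_urls {
    my ($ip_date) = @_;
    return (
        "ftp://ftp.apnic.net/pub/stats/apnic/apnic-latest",
        "ftp://ftp.apnic.net/pub/stats/ripe-ncc/_ripencc.latest",
        "ftp://ftp.apnic.net/pub/stats/arin/arin." . $ip_date,
        "ftp://lacnic.net/pub/stats/lacnic/lacnic." . $ip_date,
    );
}

# Download each URL with LWP and append its body to $file.
sub fetch_registries {
    my ($file, @urls) = @_;
    require LWP::UserAgent;    # assumption: libwww-perl is installed
    my $ua = LWP::UserAgent->new(timeout => 60);
    open(my $body, '>>', $file) or die "Cannot open $file: $!\n";
    foreach my $url (@urls) {
        my $response = $ua->get($url);
        if ($response->is_success) {
            print {$body} $response->content;
        } else {
            print "Failed :: $url :: ", $response->status_line, "\n";
        }
    }
    close($body);
}

# Usage (not run here):
# my $ip_date = strftime("%Y%m", localtime) . "01";
# my $file    = strftime("%Y%m%d-%H%M", localtime) . "_ip_data.txt";
# fetch_registries($file, registry_urls($ip_date));
```

Separating URL construction from downloading keeps the date logic testable without a network connection.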

I retrieve three of the four files from the APNIC FTP server, as I have had trouble accessing some of the other registries' FTP servers in the past. I update this data twice a day, which may at first appear excessive, but on busy days I have seen upwards of 40-50 new rows added to the database between the first and second retrieval.

Some may ask why I write the downloaded data to a file rather than inserting it directly into the database. This two-step process gives me the ability to manually roll back to an older version of the database if there is a problem retrieving one of the registry files.
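Because each data file name starts with a YYYYMMDD-HHMM timestamp, sorting the names as plain strings also sorts them by retrieval time, which makes it easy to find the file to roll back to. A sketch, with hypothetical helper names data_files_by_age and previous_data_file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# List the data files in $dir, oldest first. Names look like
# 20030724-1800_ip_data.txt, so a string sort is also a time sort.
sub data_files_by_age {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!\n";
    my @files = sort grep { /^\d{8}-\d{4}_ip_data\.txt$/ } readdir($dh);
    closedir($dh);
    return @files;
}

# Return the file retrieved just before the newest one, i.e. the file to
# reload if the latest download turned out to be incomplete.
sub previous_data_file {
    my @files = sort @_;
    return @files >= 2 ? $files[-2] : undef;
}

# Usage (not run here):
# my $fallback = previous_data_file(data_files_by_age("."));
```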

The data retrieved from the registries is in the following format.

Registry Raw Data Format
<snip>
apnic|CN|ipv4|202.127.4.0|256|19950610|assigned
apnic|BN|ipv4|202.160.0.0|2048|19950610|allocated
apnic|NP|asn|4613|1|19950611|allocated
apnic|LK|ipv4|203.143.0.0|1024|19950612|allocated
apnic|MO|asn|4609|1|19950615|allocated
apnic|KR|asn|4670|1|19950616|allocated
apnic|SB|ipv4|202.63.254.0|512|19950618|assigned
apnic|JP|ipv4|202.232.0.0|262144|19950618|allocated
apnic|SG|ipv4|203.127.192.0|8192|19950618|allocated
apnic|PK|asn|4615|1|19950629|allocated
apnic|HK|asn|4614|1|19950704|allocated
</snip>

The fields are all "|" (pipe-character) separated, and are described below.

COLUMN         VALUES
---------------------------------------------------------------------
REGISTRY       apnic, arin, ripencc, lacnic, iana
COUNTRY_CODE   One of 240 unique 2-character country codes, or "*"
ADDRESS_TYPE   asn, ipv4
ADDRESS        Either the starting IP Address or the AS Number, or "*"
NUMBER         Number of IPs in the range, or "1" if ADDRESS_TYPE is "asn"
DATE           Date the IP range or AS Number was added to the database, or "*"
RANGE_TYPE     "allocated" -> delegated to an ISP or local registry for further assignment;
               "assigned" -> delegated directly to an end user
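Since every record is a single pipe-separated line, splitting it into named fields takes one split() call. This sketch uses the column names from the table above as hash keys (an illustrative choice; parse_record is a hypothetical helper) and applies the same filter as the insertion script later in the article: IPv4 records only, skipping any line that contains "*".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Field names for one pipe-separated registry record, in file order.
my @FIELDS = qw(registry country_code address_type address number date range_type);

# Split a record into a hash keyed by the column names above.
# Returns undef for lines that are not IPv4 range records
# (AS numbers, header lines, or summary lines containing "*").
sub parse_record {
    my ($line) = @_;
    chomp($line);
    return undef if $line =~ /\*/;
    my @values = split(/\|/, $line);
    return undef unless @values >= @FIELDS && $values[2] eq 'ipv4';
    my %record;
    @record{@FIELDS} = @values[0 .. $#FIELDS];
    return \%record;
}

my $r = parse_record("apnic|CN|ipv4|202.127.4.0|256|19950610|assigned\n");
print "$r->{country_code} $r->{address} $r->{number}\n";   # CN 202.127.4.0 256
```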

Storing the Data in MySQL

To store the data, I created a two-table MySQL database named "ip_registry", using the script below.

Database creation statement for 'ip_registry'
CREATE DATABASE ip_registry;

Table structure for table 'country_code'
CREATE TABLE ip_registry.country_code (
  code    char(2)     default NULL,
  country varchar(50) default NULL,
  UNIQUE KEY code (code)
);

Table structure for table 'ip_map'
CREATE TABLE ip_registry.ip_map (
  code     char(2)  default NULL,
  registry char(10) default NULL,
  ip_from  double   default NULL,
  ip_to    double   default NULL,
  UNIQUE KEY registry (registry, ip_from, ip_to)
);

My database design differs from the one used by the team at IP-to-Country. They chose a single-table format, with columns for the starting IP Number, the ending IP Number, a country code and a country name. The country codes that IP-to-Country uses do not follow a known standard, which concerned me. I decided to standardize on a recognized country code format and move the country names to a separate table, which reduces the size of the large "ip_map" data table. As of 24 July 2003, the "ip_map" data table for my system runs to 53,221 rows.

The recognized standard for country codes is ISO 3166 (specifically ISO 3166-1 alpha-2), which assigns each country a unique two-character code. The only exception I found is that, for historical reasons, the IP registries list entries for the United Kingdom under two country codes (GB and UK). I could have corrected this in the Perl script by standardizing on a single country code, but I preferred to add another row to the "country_code" table.
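For comparison, the alternative mentioned above, standardizing on a single code in the Perl script, could look like the sketch below. The alias table and normalize_country_code are hypothetical; only the UK to GB mapping comes from the article.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical alternative to adding a second country_code row: map
# registry codes that differ from ISO 3166-1 alpha-2 onto the ISO code
# before inserting them into ip_map.
my %CODE_ALIASES = (
    UK => 'GB',    # the United Kingdom's ISO 3166-1 code is GB
);

sub normalize_country_code {
    my ($code) = @_;
    $code = uc $code;
    return exists $CODE_ALIASES{$code} ? $CODE_ALIASES{$code} : $code;
}

print normalize_country_code('UK'), "\n";   # GB
```

The trade-off is that the database no longer records which code the registry actually used, which is one reason to prefer the extra table row.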

From the raw registry data, I determined that only four of the fields were useful for my project: REGISTRY, COUNTRY_CODE, ADDRESS, and NUMBER. I then wrote Perl code to read the raw IP registry data from the data file created earlier, convert the starting IP Address to a number, use this starting IP Number to generate the ending IP Number, and insert the rows into the database.

Perl: IP Number conversion and database insert

# Continues the retrieval script: $file names the data file just written
use DBI;

my $dbh = DBI->connect("DBI:mysql:host=[hostname];database=ip_registry",
                       "[user]", "[password]",
                       {PrintError => 0, RaiseError => 1});

# Empty the table so it can be rebuilt from the new data
$dbh->do("DELETE FROM ip_map");
print "\n\nData from ip_map table deleted\n\n";

my $sth   = $dbh->prepare("INSERT INTO ip_map VALUES (?,?,?,?)");
my $count = 0;
open(my $process, '<', $file) or die "Cannot open $file: $!\n";
while (my $line = <$process>) {
    chomp($line);
    # Keep IPv4 ranges only; summary lines contain "*"
    next unless $line =~ m/\|ipv4\|/ and $line !~ m/\*/;
    my ($registrar, $country_code, $item_type, $start_ip, $num_ip,
        $entry_date, $registry_type) = split(/\|/, $line);

    # Convert the starting IP Address to an IP Number
    my $long_start = 0;
    foreach my $octet_start (split(/\./, $start_ip)) {
        $long_start <<= 8;
        $long_start |= $octet_start;
    }
    # The starting address counts as one of the $num_ip addresses
    my $long_end = $long_start + ($num_ip - 1);
    $count += $sth->execute($country_code, $registrar, $long_start, $long_end);
}
close($process);

Why is "$long_end" defined as "$long_start + ($num_ip-1)"? Because the starting address is itself one of the addresses in the range. Counting from 1 to 256 covers 256 numbers, yet 256 is only 255 more than 1; in the same way, a block of 256 addresses ends 255 numbers after its first address.

START_IP:  12.236.236.0   (IP Number 216853504)
END_IP:    12.236.236.255
NUMBER_IP: 256

IP Number Calculations
WRONG! 216853760 = 216853504 + 256       -> END_IP = 12.236.237.0
RIGHT! 216853759 = 216853504 + (256 - 1) -> END_IP = 12.236.236.255
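The same arithmetic can be checked in code. This sketch (range_bounds and number_to_dotted are hypothetical helper names) computes both ends of a registry block from its starting address and size, then converts the end back to dotted-quad notation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Given the first address of a block and the number of addresses in it,
# return the first and last IP Numbers. The start address is itself one
# of the $count addresses, so the last one is start + (count - 1).
sub range_bounds {
    my ($start_ip, $count) = @_;
    my $start = 0;
    $start = ($start << 8) | $_ for split(/\./, $start_ip);
    return ($start, $start + $count - 1);
}

# Convert an IP Number back to dotted-quad notation for display.
sub number_to_dotted {
    my ($n) = @_;
    return join('.', map { ($n >> (8 * $_)) & 0xFF } reverse 0 .. 3);
}

my ($from, $to) = range_bounds("12.236.236.0", 256);
print "$from - $to\n";               # 216853504 - 216853759
print number_to_dotted($to), "\n";   # 12.236.236.255
```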

The "DELETE" statement in the insertion script removes every row from "ip_map" while leaving the table structure (the column names and types defined in the initial CREATE TABLE statement) intact. Rebuilding the data table each time new data is inserted is the easiest way to ensure that duplicates and overlapping ranges do not enter the database.

I am considering adding a sanity-check that will stop the entire database insertion process if the number of lines in the data file is less than some predetermined value. This would prevent the creation of a truncated database if one of the registries does not update their data files or the script is unable to retrieve the data files.
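A minimal sketch of such a sanity check, assuming the count of usable IPv4 records is a reasonable proxy for a complete download. The helper names are hypothetical, and the 50,000-record threshold in the usage comment is only a placeholder informed by the 53,221 rows reported above for July 2003.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count the IPv4 records the insertion script would actually use
# (same filter: "|ipv4|" present, no "*" summary markers).
sub count_ipv4_records {
    my ($fh) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ /\|ipv4\|/ && $line !~ /\*/;
    }
    return $count;
}

# True if the data file holds at least $minimum usable records.
sub data_looks_complete {
    my ($fh, $minimum) = @_;
    return count_ipv4_records($fh) >= $minimum;
}

# Usage (not run here; the threshold is an assumption):
# open(my $fh, '<', $file) or die "Cannot open $file: $!\n";
# die "Data file looks truncated; keeping the existing ip_map\n"
#     unless data_looks_complete($fh, 50_000);
```

Running the check before the DELETE statement means a bad download leaves the existing table untouched.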

Querying the Database

Now that the database is constructed, we can start to run queries against it.

mysql> select ip.code, ip.registry, ip.ip_from, ip.ip_to, co.country
    -> from ip_map ip, country_code co
    -> where (ip.code = 'IS') and (ip.code = co.code);
+------+----------+------------+------------+---------+
| code | registry | ip_from    | ip_to      | country |
+------+----------+------------+------------+---------+
| IS   | ripencc  | 1049722880 | 1049731071 | ICELAND |
| IS   | ripencc  | 1359937536 | 1359970303 | ICELAND |
| IS   | ripencc  | 1385447424 | 1385455615 | ICELAND |
| IS   | ripencc  | 3238264832 | 3238330367 | ICELAND |
| IS   | ripencc  | 3261718528 | 3261726719 | ICELAND |
| IS   | ripencc  | 3264217088 | 3264282623 | ICELAND |
| IS   | ripencc  | 3556884480 | 3556892671 | ICELAND |
| IS   | ripencc  | 3558785024 | 3558793215 | ICELAND |
| IS   | ripencc  | 3565084672 | 3565092863 | ICELAND |
| IS   | ripencc  | 3584524288 | 3584532479 | ICELAND |
| IS   | ripencc  | 3585114112 | 3585122303 | ICELAND |
| IS   | ripencc  | 3585433600 | 3585441791 | ICELAND |
| IS   | ripencc  | 3586023424 | 3586031615 | ICELAND |
| IS   | ripencc  | 3587538944 | 3587547135 | ICELAND |
| IS   | ripencc  | 3587981312 | 3587997695 | ICELAND |
| IS   | ripencc  | 3641278464 | 3641282559 | ICELAND |
| IS   | ripencc  | 3642535936 | 3642540031 | ICELAND |
| IS   | ripencc  | 3650592768 | 3650596863 | ICELAND |
| IS   | ripencc  | 3650596864 | 3650600959 | ICELAND |
| IS   | arin     | 2194669568 | 2194735103 | ICELAND |
| IS   | arin     | 2644312064 | 2644377599 | ICELAND |
| IS   | arin     | 2698117120 | 2698182655 | ICELAND |
| IS   | arin     | 3230867968 | 3230868223 | ICELAND |
+------+----------+------------+------------+---------+
23 rows in set (0.09 sec)

This confirms that the database structure works. It also shows why it is important to build the data file from all four registries: even though Iceland should be covered by RIPE, some older IP allocations and assignments were handled by ARIN.
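The query above filters by country; a visitor lookup instead needs the one range that contains a given IP Number. MySQL performs that search for you, but to illustrate what it involves, here is a sketch of the same lookup on an in-memory list, using binary search over ranges sorted by ip_from. It assumes the ranges do not overlap; find_range and the [ip_from, ip_to, code] layout are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Binary search for the range containing $ip_number.
# $ranges is a reference to an array of [ip_from, ip_to, code],
# sorted by ip_from and assumed not to overlap.
sub find_range {
    my ($ranges, $ip_number) = @_;
    my ($lo, $hi) = (0, $#$ranges);
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        my ($from, $to) = @{ $ranges->[$mid] };
        if    ($ip_number < $from) { $hi = $mid - 1 }
        elsif ($ip_number > $to)   { $lo = $mid + 1 }
        else                       { return $ranges->[$mid] }
    }
    return undef;   # no range contains this IP Number
}

# A few rows from the Iceland query above, sorted by ip_from.
my @ranges = sort { $a->[0] <=> $b->[0] } (
    [1049722880, 1049731071, 'IS'],
    [2194669568, 2194735103, 'IS'],
    [3230867968, 3230868223, 'IS'],
);
my $hit = find_range(\@ranges, 3230868000);
print $hit ? "Found: $hit->[2]\n" : "Not found\n";   # Found: IS
```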

Storing the registry also makes it possible to add WHOIS functionality on top of this database, which I have done on my site (GrabIP). This allows further drill-downs into the data, which are beyond the scope of this article.

The main point of interest for most Web site administrators is that they can now build dynamic pages using a data source that maps each visitor's announced IP address to its country of origin with a high degree of accuracy. This is particularly useful if you want to direct users to geographically distributed mirror sites. You can also do fun things, such as displaying the flag of the visitor's country, which I have implemented on my site.
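A minimal sketch of such a lookup through DBI, assuming the two tables created earlier and a handle connected as in the insertion script. lookup_country is a hypothetical helper: it converts the visitor's address to an IP Number and asks for the row whose range contains it, joined to country_code for the country name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Look up the country for an IP Address in the ip_registry tables.
# $dbh is a DBI handle connected to ip_registry, e.g.
#   DBI->connect("DBI:mysql:host=[hostname];database=ip_registry",
#                "[user]", "[password]", {RaiseError => 1});
# Returns (code, country), or an empty list if no range matches.
sub lookup_country {
    my ($dbh, $ip_address) = @_;
    my $ip_number = 0;
    $ip_number = ($ip_number << 8) | $_ for split(/\./, $ip_address);
    return $dbh->selectrow_array(
        "SELECT ip.code, co.country
           FROM ip_map ip, country_code co
          WHERE ip.ip_from <= ? AND ip.ip_to >= ?
            AND ip.code = co.code
          LIMIT 1",
        undef, $ip_number, $ip_number);
}

# Usage in a CGI script (REMOTE_ADDR holds the visitor's address):
# my ($code, $country) = lookup_country($dbh, $ENV{REMOTE_ADDR});
```

The returned code can then drive the mirror selection or flag image described above.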

Notes

(1) The free database from MaxMind has not been updated recently and is still using data from March 2003.

(2) The team at IP-to-Country have promised to update their database on a monthly basis.


DISCLAIMER: The content provided in this article is not warranted or guaranteed by Developer Shed, Inc. The content provided is intended for entertainment and/or educational purposes in order to introduce to the reader key ideas, concepts, and/or product reviews. As such it is incumbent upon the reader to employ real-world tactics for security and implementation of best practices. We are not liable for any negative consequences that may result from implementing any information covered in our articles or tutorials. If this is a hardware review, it is not recommended to open and/or modify your hardware.

© 2003-2017 by Developer Shed. All rights reserved.