Using Perl, Stephen illustrates the Internet marketing advantage of customizing user interaction based on the country in which a visitor's IP address is registered. Read the details ...
Targeting Web site content to the specific visitors who view the site is an important marketing advantage. Tracking incoming visitors by their country of origin helps companies ensure that visitors are presented with relevant content. This may seem like a daunting task, but it can be achieved with a high degree of accuracy using publicly accessible data and Open Source software.
IP to Country Mapping
The idea of IP to Country mapping has started to appear more frequently on the Internet in recent months. MaxMind offers a commercial, subscription-based solution which provides country-level mappings for free, and has for-pay national, regional and metropolitan databases.(1) IP-to-Country offers an Open Source database for country-level IP mapping, and it was this system that originally inspired the system I have developed.(2)
All of these systems do warn users that they are not 100% accurate. The accuracy of a mapping can be affected if visitors are on large corporate or ISP networks where traffic is routed out through a small number of public access points, regardless of its point of origin.
Making IP Addresses Searchable
The first issue that needs to be addressed is how to determine if an IP Address is in one of the ranges that are defined as originating from a distinct country. The simplest way to range-match IP Addresses is to abandon the dotted-quad notation we are all familiar with, and convert the IP Address to an IP Number.
All IP Addresses can be converted into decimal numbers that fall into a known range between 0 and (2^32)-1 (4294967295). In reality, the range is even smaller than that, as public IP Addresses fall between 0.0.0.0 (IP Number: 0) and 223.255.255.255 (IP Number: 3758096383).
A quick search of the Web gave me a simple Perl method for converting IP Addresses to IP Numbers.
IP Address -> IP Number
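use strict;
use warnings;
# Convert a dotted-quad IP Address (first argument) to an IP Number.
my $ip_address = $ARGV[0];
die "Usage: $0 <ip-address>\n" unless defined $ip_address;
chomp($ip_address);
my @octets = split(/\./, $ip_address);
my $ip_number = 0;
foreach my $octet (@octets) {
    $ip_number <<= 8;       # make room for the next octet
    $ip_number |= $octet;   # place the octet in the low 8 bits
}
print "\nThe IP Address $ip_address converts to the IP Number $ip_number.\n";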
IP Number -> IP Address
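use strict;
use warnings;
# Convert an IP Number (first argument) back to a dotted-quad IP Address.
my $ip_number = $ARGV[0];
die "Usage: $0 <ip-number>\n" unless defined $ip_number;
chomp($ip_number);
my $ip_number_display = $ip_number;
my @octets;
for (my $i = 3; $i >= 0; $i--) {
    $octets[$i] = ($ip_number & 0xFF);   # lowest 8 bits are the current octet
    $ip_number >>= 8;                     # shift the next octet into place
}
my $ip_address = join('.', @octets);
print "\nThe IP Number $ip_number_display converts to the IP Address $ip_address.\n";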
If you are using PHP in your applications, this conversion process is made even easier by native function calls.
Convert IP Address to IP Number
$ip_number = ip2long($ip_address);
Convert IP Number to IP Address
$ip_address = long2ip($ip_number);
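For Perl users who want something closer to the PHP one-liners, the same round trip can be sketched with the core pack and unpack functions. This is a minimal sketch, not the method used in the scripts above: the function names ip_to_number and number_to_ip are illustrative, and the validation rule (exactly four decimal octets, each 0-255) is an assumption about what counts as a usable address.

```perl
use strict;
use warnings;

# Dotted quad -> IP Number. pack('C4') turns the four octets into
# four bytes; unpack('N') reads them as one unsigned 32-bit
# big-endian integer. Returns nothing for malformed input.
sub ip_to_number {
    my ($ip_address) = @_;
    my @octets = split /\./, $ip_address;
    return unless @octets == 4;
    return if grep { !/^[0-9]+$/ || $_ > 255 } @octets;
    return unpack('N', pack('C4', @octets));
}

# IP Number -> dotted quad: the same two steps in reverse.
sub number_to_ip {
    my ($ip_number) = @_;
    return join('.', unpack('C4', pack('N', $ip_number)));
}

print ip_to_number('223.255.255.255'), "\n";   # top of the public range
print number_to_ip(216853504), "\n";
```

Wrapping the conversion in functions makes it reusable from both the import script and any page that needs to look up a visitor.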
IP Address Location Data
Now that we have settled on a format for the IP data to be used in the database, we have to find IP data that allows us to map IP Addresses to countries. This is easier than it sounds, as this data is centrally held by the 4 regional IP Registries: ARIN, RIPE, APNIC and LACNIC. After poking around in the depths of their Web sites, I found that they actually provide text formatted versions of the allocated and assigned IP ranges that they are responsible for. All of the registries use the same format, which makes parsing these files a simple process.
I chose to use the Perl module WWW::Curl to retrieve the files. You could re-write the application to use LWP or some other method on systems where cURL is not supported, as it is a simple file download over FTP.
cURL Retrieval Section
use strict;
use warnings;
use POSIX qw(strftime);
use WWW::Curl::Easy;   # older releases named this module WWW::Curl::easy
my ($ip_date, $file_time, $file, $apnic, $ripe, $arin,
    $lacnic, @registries, $url, $curl);
# The ARIN and LACNIC file names carry the first day of the current month
$ip_date = strftime("%Y%m", localtime(time))."01";
$file_time = strftime("%Y%m%d-%H%M", localtime(time));
$file = $file_time."_ip_data.txt";
$apnic = "ftp://ftp.apnic.net/pub/stats/apnic/apnic-latest";
$ripe = "ftp://ftp.apnic.net/pub/stats/ripe-ncc/_ripencc.latest";
$arin = "ftp://ftp.apnic.net/pub/stats/arin/arin.".$ip_date;
$lacnic = "ftp://lacnic.net/pub/stats/lacnic/lacnic.".$ip_date;
# Append all four registry files to a single time-stamped data file
open (BODY,">>$file") or die "Cannot open $file: $!\n";
@registries=($apnic,$ripe,$arin,$lacnic);
foreach $url (@registries) {
    # Init the curl session
    $curl = WWW::Curl::Easy->new() or die "curl init failed!\n";
    $curl->setopt(CURLOPT_URL, $url);
    $curl->setopt(CURLOPT_FTP_USE_EPSV, 0);
    $curl->setopt(CURLOPT_FILE, *BODY);
    # perform() returns 0 on success
    if ($curl->perform() != 0) {
        print "Failed :: ".$curl->errbuf."\n";
    }
}
close BODY;
I retrieve 3 of the 4 files from APNIC, as I have had trouble reaching some of the other registries' FTP servers in the past. I have chosen to update this data twice a day, which may at first appear excessive, but on busy days I have seen upwards of 40-50 new rows added to the database between the first and second retrieval.
Some may ask why I chose to write the downloaded data to a file rather than immediately inserting it into the database. This two-step process gives me the ability to manually roll back to an older database if there is a problem retrieving one of the registry files.
The data retrieved from the registries is in the following format.
Registry Raw Data Format
<snip>
apnic|CN|ipv4|202.127.4.0|256|19950610|assigned
apnic|BN|ipv4|202.160.0.0|2048|19950610|allocated
apnic|NP|asn|4613|1|19950611|allocated
apnic|LK|ipv4|203.143.0.0|1024|19950612|allocated
apnic|MO|asn|4609|1|19950615|allocated
apnic|KR|asn|4670|1|19950616|allocated
apnic|SB|ipv4|202.63.254.0|512|19950618|assigned
apnic|JP|ipv4|202.232.0.0|262144|19950618|allocated
apnic|SG|ipv4|203.127.192.0|8192|19950618|allocated
apnic|PK|asn|4615|1|19950629|allocated
apnic|HK|asn|4614|1|19950704|allocated
</snip>
The fields are all "|" (pipe-character) separated, and are described below.
COLUMN         VALUES
---------------------------------------------------------------------
REGISTRY:      apnic,arin,ripencc,lacnic,iana
COUNTRY_CODE:  One of 240 unique 2-character country codes or "*"
ADDRESS_TYPE:  asn,ipv4
ADDRESS:       Either the starting IP Address or AS Number or "*"
NUMBER:        Number of IPs in range or "1" if ADDRESS_TYPE is "asn"
DATE:          Date IP range or AS Number was added to database or "*"
RANGE_TYPE:    "allocated" -> borrowed; "assigned" -> owned
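To make the column layout concrete, here is a minimal Perl sketch that splits one registry line into named fields. The hash keys are lower-cased versions of the column names above, and the function name parse_record is hypothetical. It checks only the field count, so filtering for "ipv4" rows or "*" placeholders is left to the caller.

```perl
use strict;
use warnings;

# Field names, in the order they appear in each pipe-separated record.
my @FIELDS = qw(registry country_code address_type address
                number date range_type);

# Split one raw line into a hash reference keyed by field name.
# Returns nothing if the line does not have exactly seven fields.
sub parse_record {
    my ($line) = @_;
    chomp $line;
    my @values = split /\|/, $line;
    return unless @values == @FIELDS;
    my %record;
    @record{@FIELDS} = @values;
    return \%record;
}

my $rec = parse_record('apnic|JP|ipv4|202.232.0.0|262144|19950618|allocated');
printf "%s %s starts at %s and covers %d addresses\n",
    @{$rec}{qw(registry country_code address number)};
```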
Storing the Data in MySQL
To store the data, I created a two-table MySQL database named "ip_registry", using the script below.
Database creation statement for 'ip_registry'
CREATE DATABASE ip_registry;
Table structure for table 'country_code'
CREATE TABLE ip_registry.country_code (
  code char(2) default NULL,
  country varchar(50) default NULL,
  UNIQUE KEY code (code)
);
Table structure for table 'ip_map'
CREATE TABLE ip_registry.ip_map (
  code char(2) default NULL,
  registry char(10) default NULL,
  ip_from double default NULL,
  ip_to double default NULL,
  UNIQUE KEY registry (registry,ip_from,ip_to)
);
My database design differs from that used by the team at IP-to-Country. They have chosen to go with a single table format, using columns for the IP Number Start, IP Number End, a country code and a country name. The country codes that IP-to-Country are using are not standardized on a known format, which concerned me. I decided that I needed to settle on a standardized country code configuration, and move that to a separate table, which reduces the size of the large "ip_map" data table. As of 24 July 2003, the "ip_map" data table for my system runs to 53,221 rows.
The recognized standard for country codes is ISO 3166. In this standard, each nation is assigned a unique, two-character code. The ONLY exception I found to this rule is that, for historical reasons, the IP registries have entries for the United Kingdom listed with two country codes (GB and UK). I could have corrected this in the Perl script by standardizing on a single country code, but I preferred the solution of adding another row to the "country_code" table.
From the raw Registry data, I determined that only four of the fields were useful for the project that I was working on: REGISTRY, COUNTRY_CODE, ADDRESS, and NUMBER. I then wrote Perl code to read the raw IP Registry data from the data file I created previously, convert the starting IP address to a number, use this starting IP Number to generate the end IP Number, and then insert the rows into a database.
Perl: IP Number conversion and database insert
use strict;
use warnings;
use DBI;
# Data file written by the retrieval step (assumed here to be passed as the first argument)
my $file = $ARGV[0];
my $dbh = DBI->connect("DBI:mysql:host=[hostname];database=ip_registry",
    "[user]", "[password]", {PrintError=>0, RaiseError=>1});
# Empty the table so that every run rebuilds it from the fresh data file
$dbh->do("DELETE from ip_map");
print "\n\nData from ip_map table deleted\n\n";
my $sth = $dbh->prepare("INSERT into ip_map values (?,?,?,?)");
my $count = 0;
open (PROCESS, "<$file") or die "Cannot open $file: $!\n";
while (my $line = <PROCESS>) {
    chomp ($line);
    # Keep only IPv4 records, skipping lines with "*" placeholder values
    if (($line =~ m/\|ipv4\|/) and ($line !~ m/\*/)) {
        my ($registrar,$country_code,$item_type,$start_ip,$num_ip,
            $entry_date,$registry_type) = split(/\|/, $line);
        # Convert the starting IP Address to an IP Number
        my $long_start = 0;
        foreach my $octet_start (split(/\./, $start_ip)) {
            $long_start <<= 8;
            $long_start |= $octet_start;
        }
        # The starting address counts as the first of the $num_ip addresses
        my $long_end = $long_start + ($num_ip-1);
        $count += $sth->execute($country_code,$registrar,$long_start,
            $long_end);
    }
}
close PROCESS;
$dbh->disconnect;
Why is the value of "$long_end" defined by "$long_start + ($num_ip-1)"? Because the starting value must be counted as one of the items in the set, a range of 256 addresses ends 255 addresses after its first one.
START_IP:  12.236.236.0
END_IP:    12.236.236.255
NUMBER_IP: 256
IP Number Calculations
WRONG! 216853760 = 216853504 + 256 -> END_IP = 12.236.237.0
RIGHT! 216853759 = 216853504 + (256-1) -> END_IP = 12.236.236.255
The "DELETE" statement in the script has the effect of emptying the table while keeping the column names and types defined in the initial create statement. It is easier to rebuild the data table each time new data is inserted, which ensures that duplicates and overlaps do not enter the database.
I am considering adding a sanity-check that will stop the entire database insertion process if the number of lines in the data file is less than some predetermined value. This would prevent the creation of a truncated database if one of the registries does not update their data files or the script is unable to retrieve the data files.
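One way such a sanity-check might look is sketched below. It is not part of the running system: the function names are hypothetical, and the threshold of 50,000 is an assumed value chosen only because it sits a little under the row count quoted earlier. The idea is to count the usable IPv4 lines before the "DELETE" runs and abort if there are too few.

```perl
use strict;
use warnings;

# Hypothetical threshold; tune it to the size of a known-good data file.
my $MIN_IPV4_RECORDS = 50_000;

# Count the lines the import script would actually insert:
# IPv4 records without "*" placeholder values.
sub count_ipv4_records {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ /\|ipv4\|/ && $line !~ /\*/;
    }
    close $fh;
    return $count;
}

# True when the file holds at least $minimum usable records.
sub data_file_looks_complete {
    my ($path, $minimum) = @_;
    return count_ipv4_records($path) >= $minimum;
}

# Usage: pass the data file name as the first argument.
my $file = shift @ARGV;
if (defined $file) {
    die "Data file $file looks truncated; leaving ip_map untouched\n"
        unless data_file_looks_complete($file, $MIN_IPV4_RECORDS);
    print "Data file $file passed the sanity check\n";
}
```

Running the check before the table is emptied means a failed download leaves the previous data in place.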
Querying the Database
Now that the database is constructed, we can start to run queries against it.
mysql> select ip.code,ip.registry, ip.ip_from,
ip.ip_to, co.country
-> from ip_map ip, country_code co
-> where (ip.code = 'IS') and (ip.code = co.code);
+------+----------+------------+------------+---------+
| code | registry | ip_from    | ip_to      | country |
+------+----------+------------+------------+---------+
| IS   | ripencc  | 1049722880 | 1049731071 | ICELAND |
| IS   | ripencc  | 1359937536 | 1359970303 | ICELAND |
| IS   | ripencc  | 1385447424 | 1385455615 | ICELAND |
| IS   | ripencc  | 3238264832 | 3238330367 | ICELAND |
| IS   | ripencc  | 3261718528 | 3261726719 | ICELAND |
| IS   | ripencc  | 3264217088 | 3264282623 | ICELAND |
| IS   | ripencc  | 3556884480 | 3556892671 | ICELAND |
| IS   | ripencc  | 3558785024 | 3558793215 | ICELAND |
| IS   | ripencc  | 3565084672 | 3565092863 | ICELAND |
| IS   | ripencc  | 3584524288 | 3584532479 | ICELAND |
| IS   | ripencc  | 3585114112 | 3585122303 | ICELAND |
| IS   | ripencc  | 3585433600 | 3585441791 | ICELAND |
| IS   | ripencc  | 3586023424 | 3586031615 | ICELAND |
| IS   | ripencc  | 3587538944 | 3587547135 | ICELAND |
| IS   | ripencc  | 3587981312 | 3587997695 | ICELAND |
| IS   | ripencc  | 3641278464 | 3641282559 | ICELAND |
| IS   | ripencc  | 3642535936 | 3642540031 | ICELAND |
| IS   | ripencc  | 3650592768 | 3650596863 | ICELAND |
| IS   | ripencc  | 3650596864 | 3650600959 | ICELAND |
| IS   | arin     | 2194669568 | 2194735103 | ICELAND |
| IS   | arin     | 2644312064 | 2644377599 | ICELAND |
| IS   | arin     | 2698117120 | 2698182655 | ICELAND |
| IS   | arin     | 3230867968 | 3230868223 | ICELAND |
+------+----------+------------+------------+---------+
23 rows in set (0.09 sec)
So the database structure is sound. Also, you can now see why it is important to build the file using all four registries. Even though Iceland should be covered by RIPE, older IP allocations and assignments were handled by ARIN.
Having the registry information helps build in the flexibility to add a WHOIS functionality using this database, something that I have done on my site (GrabIP). This allows for further drilldowns on the data, beyond the scope of this article.
The main item that will be of interest to most Web site administrators is that they can now build dynamic pages using a data source which maps their visitors' announced IP address to the country of origin with a high degree of accuracy. This is particularly useful if you are attempting to distribute users to geographically diverse mirror sites. You can also do fun things, such as displaying the flag of the country that the visitor is coming from, something I have implemented on my site.
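The lookup a dynamic page needs is "which range contains this visitor's IP Number?". Against the tables above, that would presumably be a query testing whether the number falls between ip_from and ip_to. The sketch below shows the same idea in plain Perl, without a database, by building ranges from three lines of the registry sample and searching them. The function names are illustrative, and the binary search assumes the ranges do not overlap.

```perl
use strict;
use warnings;

# Three ipv4 lines taken from the raw registry sample shown earlier.
my @raw = (
    'apnic|CN|ipv4|202.127.4.0|256|19950610|assigned',
    'apnic|BN|ipv4|202.160.0.0|2048|19950610|allocated',
    'apnic|JP|ipv4|202.232.0.0|262144|19950618|allocated',
);

# Dotted quad -> IP Number (no validation in this sketch).
sub ip_to_number {
    my ($ip_address) = @_;
    return unpack('N', pack('C4', split(/\./, $ip_address)));
}

# Build [ip_from, ip_to] ranges the same way the import script does.
my @ranges;
for my $line (@raw) {
    my (undef, $code, undef, $start_ip, $num_ip) = split /\|/, $line;
    my $from = ip_to_number($start_ip);
    push @ranges, { code => $code, ip_from => $from,
                    ip_to => $from + $num_ip - 1 };
}
@ranges = sort { $a->{ip_from} <=> $b->{ip_from} } @ranges;

# Binary search over the sorted ranges; returns the country code,
# or nothing when no range contains the address.
sub country_for_ip {
    my ($ip_address) = @_;
    my $n = ip_to_number($ip_address);
    my ($lo, $hi) = (0, $#ranges);
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        my $r = $ranges[$mid];
        if    ($n < $r->{ip_from}) { $hi = $mid - 1 }
        elsif ($n > $r->{ip_to})   { $lo = $mid + 1 }
        else                       { return $r->{code} }
    }
    return;
}

for my $ip ('202.127.4.17', '202.235.255.255', '10.0.0.1') {
    my $code = country_for_ip($ip);
    printf "%-16s %s\n", $ip, defined $code ? $code : 'unknown';
}
```

A page could then use the returned code to choose a mirror site or a flag image, falling back to a default when the address is not found.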
Notes
(1) The free database from MaxMind has not been updated recently and is still using data from March 2003.
(2) The team at IP-to-Country have promised to update their database on a monthly basis.
DISCLAIMER: The content provided in this article is not warranted or guaranteed by Developer Shed, Inc. The content provided is intended for entertainment and/or educational purposes in order to introduce to the reader key ideas, concepts, and/or product reviews. As such it is incumbent upon the reader to employ real-world tactics for security and implementation of best practices. We are not liable for any negative consequences that may result from implementing any information covered in our articles or tutorials. If this is a hardware review, it is not recommended to open and/or modify your hardware.
More By Stephen Pierzchala