Create a IP-Country Database Using PERL and MySQL
By: Stephen Pierzchala
    2003-08-01

    Using Perl, Stephen illustrates the Internet marketing advantage of customizing user interaction based on the country an IP address originates from.

    Targeting Web site content to the specific visitors who view the site is an important marketing advantage. Being able to track incoming visitors by the country they originate from helps companies ensure that visitors are presented with relevant content. This may seem like a daunting task, but it can be achieved with a high degree of accuracy using publicly accessible data and Open Source software.

    IP to Country Mapping

    The idea of IP to Country mapping is one that has started to appear more frequently on the Internet in recent months. MaxMind offers a free country-level database alongside commercial, subscription-based national, regional and metropolitan databases.(1) IP-to-Country offers an Open Source database for country-level IP mapping, and it was this system that was the original inspiration for the system I have developed.(2)

    All of these systems do warn users that they are not 100% accurate. The accuracy of a mapping can be affected if visitors are on large corporate or ISP networks where traffic is routed out a small number of public access points, regardless of its point of origin.

    Making IP Addresses Searchable

    The first issue that needs to be addressed is how to determine whether an IP Address is in one of the ranges that are defined as originating from a distinct country. The simplest way to range-match IP Addresses is to abandon the dotted-quad notation we are all familiar with, and convert the IP Address to an IP Number.

    All IP Addresses can be converted into decimal numbers that fall into a known range between 0 and (2^32)-1 (4294967295). In reality, the range is even smaller than that, as public IP Addresses fall between 0.0.0.0 (IP Number: 0) and 223.255.255.255 (IP Number: 3758096383).

    A quick search of the Web gave me a simple Perl method for converting IP Addresses to IP Numbers, and back again.

    IP Address -> IP Number

    my (@octets, $octet, $ip_number, $ip_address);
    $ip_address = $ARGV[0];
    chomp ($ip_address);
    @octets = split(/\./, $ip_address);
    $ip_number = 0;
    foreach $octet (@octets) {
        $ip_number <<= 8;        # make room for the next octet
        $ip_number |= $octet;    # place the octet in the low byte
    }
    print "\nThe IP Address $ip_address converts to the IP Number $ip_number.\n";

    IP Number -> IP Address

    my (@octets, $i, $ip_number, $ip_number_display, $ip_address);
    $ip_number_display = $ip_number = $ARGV[0];
    chomp ($ip_number);
    for ($i = 3; $i >= 0; $i--) {
        $octets[$i] = ($ip_number & 0xFF);    # take the low byte
        $ip_number >>= 8;                     # discard it and move to the next
    }
    $ip_address = join('.', @octets);
    print "\nThe IP Number $ip_number_display converts to the IP Address $ip_address.\n";
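
    The two scripts above can also be wrapped as reusable functions. The following is a minimal sketch rather than the author's code: the names ip_to_number and number_to_ip are hypothetical, the input validation is an addition, and multiplication and division by 256 stand in for the shift operators (the arithmetic result is the same).

```perl
use strict;
use warnings;

# Convert a dotted-quad IPv4 address to its 32-bit IP Number.
# Returns undef if the input is not four octets in the range 0-255.
sub ip_to_number {
    my ($ip_address) = @_;
    my @octets = split /\./, $ip_address, -1;
    return undef unless @octets == 4;
    my $ip_number = 0;
    foreach my $octet (@octets) {
        return undef unless $octet =~ /^\d{1,3}$/ && $octet <= 255;
        $ip_number = $ip_number * 256 + $octet;   # same effect as <<= 8 then |=
    }
    return $ip_number;
}

# Convert an IP Number back to dotted-quad notation.
sub number_to_ip {
    my ($ip_number) = @_;
    my @octets;
    for my $i (reverse 0 .. 3) {
        $octets[$i] = $ip_number % 256;           # low byte
        $ip_number  = int($ip_number / 256);      # drop the low byte
    }
    return join '.', @octets;
}

print ip_to_number('12.236.236.0'), "\n";   # prints 216853504
print number_to_ip(216853759), "\n";        # prints 12.236.236.255
```

    Returning undef for malformed input lets a calling script skip a bad record instead of storing a meaningless number.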

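    If you are using PHP in your applications, this conversion process is made even easier by native function calls. Note that on 32-bit PHP builds ip2long() returns a signed integer, so addresses above 127.255.255.255 come back negative; sprintf("%u", ip2long($ip_address)) gives the unsigned IP Number.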

    Convert IP Address to IP Number
    $ip_number = ip2long($ip_address);
    Convert IP Number to IP Address
    $ip_address = long2ip($ip_number);
    
    IP Address Location Data

    Now that we have settled on a format for the IP data to be used in the database, we have to find IP data that allows us to map IP Addresses to countries. This is easier than it sounds, as this data is centrally held by the 4 regional IP Registries: ARIN, RIPE, APNIC and LACNIC. After poking around in the depths of their Web sites, I found that they actually provide text-formatted versions of the allocated and assigned IP ranges that they are responsible for. All of the registries use the same format, which makes parsing these files a simple process.

    I chose to use the Perl module WWW::Curl to retrieve the files. You could re-write the application to use LWP or some other method on systems where cURL is not supported, as it is a simple file download over FTP.

    cURL Retrieval Section

    use POSIX qw(strftime);    # strftime() builds the date stamps below

    my ($ip_date, $file_time, $file, $apnic, $ripe, $arin,
        $lacnic, @registries, $url, $curl);

    $ip_date   = (strftime("%Y%m", localtime(time)))."01";
    $file_time = (strftime("%Y%m%d-%H%M", localtime(time)));
    $file      = $file_time."_ip_data.txt";

    $apnic  = "ftp://ftp.apnic.net/pub/stats/apnic/apnic-latest";
    $ripe   = "ftp://ftp.apnic.net/pub/stats/ripe-ncc/_ripencc.latest";
    $arin   = "ftp://ftp.apnic.net/pub/stats/arin/arin.".$ip_date;
    $lacnic = "ftp://lacnic.net/pub/stats/lacnic/lacnic.".$ip_date;

    # All four downloads are appended to one data file
    open (BODY, ">>$file") or die "Cannot open $file: $!\n";
    @registries = ($apnic, $ripe, $arin, $lacnic);

    foreach $url (@registries) {
        # Init the curl session
        $curl = WWW::Curl::easy->new() or die "curl init failed!\n";
        $curl->setopt(CURLOPT_URL, $url);
        $curl->setopt(CURLOPT_FTP_USE_EPSV, '0');
        $curl->setopt(CURLOPT_FILE, *BODY);
        if ($curl->perform() != 0) {
            print "Failed ::".$curl->errbuf."\n";
        }
    }
    close BODY;

    I retrieve 3 of the 4 files from APNIC, as I have encountered issues with access to some of the other registries' FTP servers in the past. I have chosen to update this data twice a day, which may at first appear excessive. But on busy days, I have seen upwards of 40-50 new rows added to the database between the first and second retrieval.

    Some may ask why I chose to write the downloaded data to a file rather than immediately inserting it into the database. Using this two-step process gives me the ability to manually roll back to an older database if there is a problem retrieving one of the registry files.

    The data retrieved from the registries is in the following format.

    Registry Raw Data Format
    <snip>
    apnic|CN|ipv4|202.127.4.0|256|19950610|assigned
    apnic|BN|ipv4|202.160.0.0|2048|19950610|allocated
    apnic|NP|asn|4613|1|19950611|allocated
    apnic|LK|ipv4|203.143.0.0|1024|19950612|allocated
    apnic|MO|asn|4609|1|19950615|allocated
    apnic|KR|asn|4670|1|19950616|allocated
    apnic|SB|ipv4|202.63.254.0|512|19950618|assigned
    apnic|JP|ipv4|202.232.0.0|262144|19950618|allocated
    apnic|SG|ipv4|203.127.192.0|8192|19950618|allocated
    apnic|PK|asn|4615|1|19950629|allocated
    apnic|HK|asn|4614|1|19950704|allocated
    </snip>
    

    The fields are all "|" (pipe-character) separated, and are described below.

    COLUMN          VALUES
    ---------------------------------------------------------------------
    REGISTRY:       apnic, arin, ripencc, lacnic, iana
    COUNTRY_CODE:   One of 240 unique 2-character country codes or "*"
    ADDRESS_TYPE:   asn, ipv4
    ADDRESS:        Either the starting IP Address or AS Number or "*"
    NUMBER:         Number of IPs in range or "1" if ADDRESS_TYPE is "asn"
    DATE:           Date IP range or AS Number was added to database or "*"
    RANGE_TYPE:     "allocated" -> delegated to an ISP or local registry for further distribution; "assigned" -> issued to an end user
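
    As an illustration of the field layout, a single record can be split into named fields as sketched below. This assumes the seven-field format described above; the function name parse_registry_line and the lower-case hash keys are hypothetical and do not appear in the original scripts.

```perl
use strict;
use warnings;

# Field names in the order they appear in a registry record.
my @FIELDS = qw(registry country_code address_type address number date range_type);

# Split one pipe-separated record into a hash reference.
# Returns undef for lines that do not have exactly seven fields.
sub parse_registry_line {
    my ($line) = @_;
    chomp $line;
    my @values = split /\|/, $line, -1;
    return undef unless @values == @FIELDS;
    my %record;
    @record{@FIELDS} = @values;    # hash slice: pair each name with its value
    return \%record;
}

my $record = parse_registry_line('apnic|JP|ipv4|202.232.0.0|262144|19950618|allocated');
print "$record->{country_code} $record->{address} $record->{number}\n";
# prints: JP 202.232.0.0 262144
```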
    

    Storing the Data in MySQL

    To store the data, I created a two-table MySQL database named "ip_registry", using the script below.

    Database creation statement for 'ip_registry'

    CREATE DATABASE ip_registry;

    Table structure for table 'country_code'

    CREATE TABLE ip_registry.country_code (
      code char(2) default NULL,
      country varchar(50) default NULL,
      UNIQUE KEY code (code)
    );

    Table structure for table 'ip_map'

    CREATE TABLE ip_registry.ip_map (
      code char(2) default NULL,
      registry char(10) default NULL,
      ip_from double default NULL,
      ip_to double default NULL,
      UNIQUE KEY registry (registry,ip_from,ip_to)
    );

    My database design differs from that used by the team at IP-to-Country. They have chosen to go with a single table format, using columns for the IP Number Start, IP Number End, a country code and a country name. The country codes that IP-to-Country are using are not standardized on a known format, which concerned me. I decided that I needed to settle on a standardized country code configuration, and move that to a separate table, which reduces the size of the large "ip_map" data table. As of 24 July 2003, the "ip_map" data table for my system runs to 53,221 rows.

    The recognized standard for country codes is ISO 3166. In this standard, each nation is assigned a unique, two-character code. The ONLY exception I found to this rule is that, for historical reasons, the IP registries have entries for the United Kingdom listed with two country codes (GB and UK). I could have corrected this in the Perl script by standardizing on a single country code, but I preferred the solution of adding another row to the "country_code" table.

    From the raw Registry data, I determined that only four of the fields were useful for the project that I was working on: REGISTRY, COUNTRY_CODE, ADDRESS, and NUMBER. I then wrote Perl code to read the raw IP Registry data from the data file I created previously, convert the starting IP Address to a number, use this starting IP Number to generate the end IP Number, and then insert the rows into the database.

    Perl: IP Number conversion and database insert

    use DBI;

    $dbh = DBI->connect("DBI:mysql:host=[hostname];database=ip_registry",
        "[user]", "[password]", {PrintError=>0, RaiseError=>1});

    # Empty the table before reloading it from the data file
    $sth = $dbh->do("DELETE from ip_map");
    print "\n\nData from ip_map table dropped\n\n";

    $sth = $dbh->prepare("INSERT into ip_map values (?,?,?,?)");
    $count = 0;

    open (PROCESS, "<$file") or die "Cannot open $file: $!\n";
    while ($line = <PROCESS>) {
        chomp ($line);
        # Keep only IPv4 records and skip any line containing "*"
        if (($line =~ m/\|ipv4\|/) and ($line !~ m/\*/)) {
            ($registrar, $country_code, $item_type, $start_ip, $num_ip,
             $entry_date, $registry_type) = split(/\|/, $line);

            # Convert the starting IP Address to an IP Number
            @octets_start = split(/\./, $start_ip);
            $long_start = 0;
            foreach $octet_start (@octets_start) {
                $long_start <<= 8;
                $long_start |= $octet_start;
            }

            $long_end = $long_start + ($num_ip-1);
            $count += $sth->execute($country_code, $registrar, $long_start,
                                    $long_end);
        }
    }
    close PROCESS;

    Why is the value of "$long_end" defined by "$long_start + ($num_ip-1)"? Well, it's because the starting value must be counted as one of the items in the set, i.e. the range includes both its first and last address.

    START_IP:   12.236.236.0
    END_IP:     12.236.236.255
    NUMBER_IP:  256

    IP Number Calculations
    WRONG! 216853760 = 216853504 + 256     -> END_IP = 12.236.237.0
    RIGHT! 216853759 = 216853504 + (256-1) -> END_IP = 12.236.236.255
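
    The calculation above can be checked with a few lines of Perl. This is a sketch only; the function name ip_range is hypothetical, while the variable names follow the insert script.

```perl
use strict;
use warnings;

# Return the first and last IP Numbers of a block that starts at
# $start_ip (dotted quad) and contains $num_ip addresses.
sub ip_range {
    my ($start_ip, $num_ip) = @_;
    my $long_start = 0;
    $long_start = $long_start * 256 + $_ for split /\./, $start_ip;
    my $long_end = $long_start + ($num_ip - 1);   # inclusive upper bound
    return ($long_start, $long_end);
}

my ($from, $to) = ip_range('12.236.236.0', 256);
print "$from - $to\n";   # prints: 216853504 - 216853759
```

    A block of one address starts and ends on the same IP Number, which is a quick way to confirm the upper bound is not off by one.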

    The "DELETE" statement in the script, issued without a WHERE clause, empties the table while keeping the column names and types defined in the initial create statement. It is easier to rebuild the data table each time new data is inserted, to ensure that duplicates and overlaps do not enter the database.

    I am considering adding a sanity-check that will stop the entire database insertion process if the number of lines in the data file is less than some predetermined value. This would prevent the creation of a truncated database if one of the registries does not update their data files or the script is unable to retrieve the data files.
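
    One possible shape for such a check is sketched below. It is an assumption about how the check could work, not something taken from the author's script: the function names are hypothetical, and the minimum record count is left to the caller. The filter is the same one the insert script applies.

```perl
use strict;
use warnings;

# Count the usable IPv4 records in a downloaded data file, using the
# same filter as the insert script (ipv4 lines that contain no "*").
sub count_ipv4_records {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Cannot open $file: $!\n";
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ m/\|ipv4\|/ and $line !~ m/\*/;
    }
    close $fh;
    return $count;
}

# True when the file holds at least $minimum usable records.
sub data_file_looks_complete {
    my ($file, $minimum) = @_;
    return count_ipv4_records($file) >= $minimum ? 1 : 0;
}
```

    Calling something like die "Data file too short\n" unless data_file_looks_complete($file, $minimum); before the "DELETE" statement would leave the existing table untouched when a download comes up short. A sensible value for $minimum would sit somewhat below the row count of the last good load.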

    Querying the Database

    Now that the database is constructed, we can start to run queries against it.

    mysql> select ip.code, ip.registry, ip.ip_from, ip.ip_to, co.country
        -> from ip_map ip, country_code co
        -> where (ip.code = 'IS') and (ip.code = co.code);
    +------+----------+------------+------------+---------+
    | code | registry | ip_from    | ip_to      | country |
    +------+----------+------------+------------+---------+
    | IS   | ripencc  | 1049722880 | 1049731071 | ICELAND |
    | IS   | ripencc  | 1359937536 | 1359970303 | ICELAND |
    | IS   | ripencc  | 1385447424 | 1385455615 | ICELAND |
    | IS   | ripencc  | 3238264832 | 3238330367 | ICELAND |
    | IS   | ripencc  | 3261718528 | 3261726719 | ICELAND |
    | IS   | ripencc  | 3264217088 | 3264282623 | ICELAND |
    | IS   | ripencc  | 3556884480 | 3556892671 | ICELAND |
    | IS   | ripencc  | 3558785024 | 3558793215 | ICELAND |
    | IS   | ripencc  | 3565084672 | 3565092863 | ICELAND |
    | IS   | ripencc  | 3584524288 | 3584532479 | ICELAND |
    | IS   | ripencc  | 3585114112 | 3585122303 | ICELAND |
    | IS   | ripencc  | 3585433600 | 3585441791 | ICELAND |
    | IS   | ripencc  | 3586023424 | 3586031615 | ICELAND |
    | IS   | ripencc  | 3587538944 | 3587547135 | ICELAND |
    | IS   | ripencc  | 3587981312 | 3587997695 | ICELAND |
    | IS   | ripencc  | 3641278464 | 3641282559 | ICELAND |
    | IS   | ripencc  | 3642535936 | 3642540031 | ICELAND |
    | IS   | ripencc  | 3650592768 | 3650596863 | ICELAND |
    | IS   | ripencc  | 3650596864 | 3650600959 | ICELAND |
    | IS   | arin     | 2194669568 | 2194735103 | ICELAND |
    | IS   | arin     | 2644312064 | 2644377599 | ICELAND |
    | IS   | arin     | 2698117120 | 2698182655 | ICELAND |
    | IS   | arin     | 3230867968 | 3230868223 | ICELAND |
    +------+----------+------------+------------+---------+
    23 rows in set (0.09 sec)

    So the database structure is sound. Also, you can now see why it is important to build the file using all four registries. Even though Iceland should be covered by RIPE, older IP allocations and assignments were handled by ARIN.
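
    The query a Web application is most likely to need runs the other way: given a visitor's IP Address, find the country. A minimal sketch follows, assuming the two tables defined earlier and an already-open DBI handle. The function name lookup_country is hypothetical and is not part of the author's scripts.

```perl
use strict;
use warnings;

# Look up the country for a dotted-quad address. $dbh is an open DBI
# handle to the ip_registry database. In list context this returns
# (code, country), or an empty list when no range contains the address.
sub lookup_country {
    my ($dbh, $ip_address) = @_;

    # Convert the address to an IP Number, as in the insert script
    my $ip_number = 0;
    $ip_number = $ip_number * 256 + $_ for split /\./, $ip_address;

    return $dbh->selectrow_array(
        'SELECT ip.code, co.country
           FROM ip_map ip, country_code co
          WHERE ? BETWEEN ip.ip_from AND ip.ip_to
            AND ip.code = co.code
          LIMIT 1',
        undef, $ip_number
    );
}
```

    It would be called as my ($code, $country) = lookup_country($dbh, $visitor_ip);. With the data shown above, an address such as 192.147.34.1 (IP Number 3230867969) falls inside the last ARIN range listed for Iceland.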

    Having the registry information builds in the flexibility to add WHOIS functionality using this database, something that I have done on my site (GrabIP). This allows for further drilldowns on the data, beyond the scope of this article.

    The main item that will be of interest to most Web site administrators is that they can now build dynamic pages using a data source which maps their visitors' announced IP address to the country of origin with a high degree of accuracy. This is particularly useful if you are attempting to distribute users to geographically diverse mirror sites. You can also do fun things, such as displaying the flag of the country that the visitor is coming from, something I have implemented on my site.

    Notes

    (1) The free database from MaxMind has not been updated recently and is still using data from March 2003.

    (2) The team at IP-to-Country have promised to update their database on a monthly basis.



    © 2003-2009 by Developer Shed. All rights reserved. DS Cluster 6 Hosted by Hostway