Home arrow MySQL arrow Page 2 - Developing a Dynamic Document Search in PHP Part 1/2

Developing a Dynamic Document Search in PHP Part 1/2

In today's article Murali discusses how to implement a complete document content search using only PHP and MySQL...

Author Info:
By: Murali Dharan
Rating: 5 stars5 stars5 stars5 stars5 stars / 26
June 26, 2003
  1. · Developing a Dynamic Document Search in PHP Part 1/2
  2. · The Article
  3. · The Article (contd.)
  4. · The Article (contd.)
  5. · Conclusion

print this article

Developing a Dynamic Document Search in PHP Part 1/2 - The Article
(Page 2 of 5 )

The database consists of three tables: a content Table, keyword Table, and a link table. The content table holds an article’s title and abstract. The keyword table holds keyword. The keyword field is indexed. Lastly, the link table holds the keyword id and content id.

The SQL statements for creating these three tables are shown below:

Content Table:

CREATE TABLE content (
contid mediumint(9) NOT NULL auto_increment,
title text,
abstract longtext,

Keyword Table:

CREATE TABLE keytable (
 keyid mediumint NOT NULL auto_increment,
keyword varchar(100) default NULL,
PRIMARY KEY (keyid),
KEY keyword (keyword) ) TYPE=MyISAM;

Link Table:

keyid mediumint NOT NULL,
contid mediumint NOT NULL)

Preparing the Database
An input interface with a HTML form is created to enter the title and content. After filling in these forms and hitting enter, the title and the abstract are stored in the content table. The generated new content ID is stored in a variable temporarily. The next step is the ‘Upload Engine’ that parses each word in the abstract and processes the whole text. It removes common words like is, was, and, if, so, else, then etc. Each word is then stored in a wordmap array. Each word will only have one entry in the wordmap array.

For every word in the wordmap array, a keyword table is parsed and a match tries to be found. If there is a match, the generated key ID and content ID generated earlier are stored in the link table. If not, the new keyword is inserted in the keyword table. And thus we finished preparing our database.

The code snippet given below explains every step of the program.

Searching the keyword table for every word is a long process. This also reduces the efficiency of the program. To implement this, all of the keywords in the keyword table are stored in an associative array called $allWords. An associative array is one which works on a B-Tree algorithm and is very useful for performing searches. Here is the function:

Function LoadCurrenrWords(){
global $allWords;

    $result = mysql_query( "select keyid, keyword from keytable" ) or die( "Error in executing mysql query" );

    while ( $row = mysql_fetch_array($result) ) {
        $allWords[$row[‘keyword’] = $row[‘keyid’];

Common Words
$COMMON_WORDS is an associative array that stores an array of words, which are commonly used in the English Language. These words have to be removed while parsing the file.

$COMMON_WORDS=array(“a”=>1, “as”=>1);

You can add as many common words as you like. See source code for full list of common  words.

ExtractWords() Function
This function filters words by allowing only alphabetic characters. To implement this, I used a technique called STATE MACHINE that filters the characters.

Alphabetic characters are taken as STATE1 and other characters (Numeric and Special Characters) as STATE0. Initially the machine will be in the STATE0. While parsing letters, if it encounters alphabetic characters, the machine switches to STATE1. Otherwise, it will remain in the same state. As a result we get a word with only alphabetic characters.

function ExtractWords($text){
    $STATE0 = 0;  //Numeric / Other Characters
    $STATE1= 1;   //Alpha Characters
    $state = $ STATE0;

    $wordList = array();
    $curWord = "";

    for ( $i = 0; $i < strlen($text); ++$i ) {
        $ch = $text{$i};
        $isAlpha = ctype_alpha( $ch );

        if ( $state == $STATE0) {
            if ( $isAlpha ) {
                $curWord = $ch;
                $state = $STATE1;
        else if ( $state == $STATE1) {
            if ( $isAlpha ) {
                $curWord .= $ch;
            else {
                $wordList[] = strtolower( $curWord );
                $state = $ STATE0;

    if ( $state == $ STATE1) {
        $wordList[] = strtolower( $curWord );

    return $wordList;

As a result we get a list of words stored in an array returned to the calling function.

FilterCommonAndDuplicateWords() Function
This function is called after the ExtractWords() function. It parses filtered words and removes common words like ‘a’,’is’,’was’,’and’, etc. Other words are taken as valid words, their duplicates are removed, and are then stored in an associative array $wordMap. This array is returned to the called function.

function FilterCommonAndDuplicateWords( $wordList ) {
    global $COMMON_WORDS;
    global $MAX_WORD_LENGTH;

    $wordMap = array();

    foreach ( $wordList as $word ) {
        $len = strlen( $word );
        if ( ($len > 1) && ($len < $MAX_WORD_LENGTH) ) {
            if ( !$wordMap[$word] ) {
                if ( !$COMMON_WORDS[$word] ) {
                    $wordMap[$word] = 1;

Process Form function()
This is the core part of the upload program. After finishing filtering, removing common words and duplicate words, this function is called. Firstly, this function inserts the title and abstract in the content table. The newly generated content Id is stored in the $contented variable. It then updates the keyword and link tables.

For every word in the $wordMap array, if the word is already exists in the keywords table, it inserts the key ID and the content ID into the link table. Conversely, if the word is not found, it inserts the new word into then keyword table. The newly generated key ID is stored in the $keyed variable. It then updates the link table by inserting the key ID and content ID variables into the link table.

function ProcessForm($title ,$body){

global   $allWords;

$tempWordList = ExtractWords( $body );
$wordList = FilterCommonAndDuplicateWords($tempWordList);

// insert into content
mysql_query( sprintf( "INSERT INTO content (title, abstract) VALUES ('%s', '%s')",
mysql_escape_string($title), mysql_escape_string($body) ) );

//store the newly generated content id in $contentId
$contentId = mysql_insert_id();

    // insert all the new words and links
    while(list($word,$val)=each($wordList)) {
        $keyId = "";
        if ( !$allWords[$word] ) {
            mysql_query( sprintf( "INSERT INTO keytable ( keyword ) VALUES ( '%s' )",
                mysql_escape_string($word) ) );

            $keyId = mysql_insert_id();
            $allWords[$word] = $keyId;
        else {
            $keyId = $allWords[$word];

        // insert the link
        mysql_query( sprintf( "INSERT INTO link (keyid, contid) VALUES ( %d, %d )", $keyId, $contentId ) );
//End of Processing Form.


blog comments powered by Disqus

- MySQL and BLOBs
- Two Lessons in ASP and MySQL
- Lord Of The Strings Part 2
- Lord Of The Strings Part 1
- Importing Data into MySQL with Navicat
- Building a Sustainable Web Site
- Creating An Online Photo Album with PHP and ...
- Creating An Online Photo Album with PHP and ...
- PhpED 3.2 – More Features Than You Can Poke ...
- Creating An Online Photo Album with PHP and ...
- Creating An Online Photo Album with PHP and ...
- Security and Sessions in PHP
- Setup Your Personal Reminder System Using PHP
- Create a IP-Country Database Using PERL and ...
- Developing a Dynamic Document Search in PHP ...

Watch our Tech Videos 
Dev Articles Forums 
 RSS  Articles
 RSS  Forums
 RSS  All Feeds
Write For Us 
Weekly Newsletter
Developer Updates  
Free Website Content 
Contact Us 
Site Map 
Privacy Policy 

Developer Shed Affiliates


© 2003-2019 by Developer Shed. All rights reserved. DS Cluster - Follow our Sitemap
Popular Web Development Topics
All Web Development Tutorials