Yeah, I got a site search running. I know you think it is a small thing, and you might say I should have just used Google site search, right? Well, trust me, it is not a small matter. I wanted to use Google too, but decided not to. Let me explain why. I looked at the services Google provides. I wanted a solution that would return the search results in my own template, meaning the results page should be hosted on my website and not on someone else's. Google did seem to have a solution, but my guess is that the service is down. According to the description, I should be able to customize Google for my site. However, when I tried to do so, I got an error saying that my website had not been indexed by Google yet. I found this kind of odd, because I get a lot of hits from people who are directed to me via Google. So then I checked whether ASU was indexed, and guess what: Google said it wasn't. Same result with Stanford, Google and Yahoo. Here is a sample error message.
So that ruled Google out of the picture. I certainly didn't want to use the normal search, because the results are displayed on Google's webpage and I would have no control over the format. So I searched the internet and hit upon a project called phpDig, a nice PHP and MySQL based application. It runs a spider on your website and is HIGHLY customizable. So I jumped at the opportunity and incorporated it into my website. The search you see running is powered by phpDig. Thanks, guys!! I will make a donation to the project tomorrow, I promise.
Now phpDig works fine, but there is another issue: running the spider at regular intervals. If I had root access on the server, this would have been easy. Unfortunately the server is not owned by me, so I cannot schedule a cron job. This is where fake-cron stepped in. Quoting from their documentation:
Fake Cron is a script which has been designed to simulate the cron process on webservers. It is useful where cron is not permitted or where you would like an easy admin interface to control your various cron activated scripts.
So I got that installed too, wrote a small shell script that runs my spider, and voila, I am good to go. Customizing the search page took a while, but now I am quite happy with the way it looks. So that's my saga of the search. If anyone needs help with a similar setup, lemme know.
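For the curious, here is a rough sketch of the general idea behind a cron substitute on a host without cron: every time something triggers the check (a page hit, say), look at when the job last ran and run it again only if enough time has passed. This is not Fake Cron's actual code and not the script used on this site. It is a minimal Python illustration, and the names `run_if_due`, `last_spider_run.txt` and `run_spider.sh` are all made up for the example.

```python
import subprocess
import time
from pathlib import Path


def run_if_due(stamp_file, interval_seconds, command, now=None):
    """Run `command` if at least `interval_seconds` have passed since the last run.

    Returns True if the command was run, False if it was not due yet.
    """
    now = time.time() if now is None else now
    stamp = Path(stamp_file)
    try:
        last_run = float(stamp.read_text().strip())
    except (FileNotFoundError, ValueError):
        last_run = 0.0  # no usable timestamp yet: treat the job as due

    if now - last_run < interval_seconds:
        return False

    # Write the timestamp before running the job. This only reduces the
    # chance of two overlapping triggers starting it twice; it is not a lock.
    stamp.write_text(str(now))
    subprocess.run(command, check=False)
    return True


if __name__ == "__main__":
    # Hypothetical usage: run a spider wrapper script at most once a day.
    run_if_due("last_spider_run.txt", 24 * 60 * 60, ["sh", "run_spider.sh"])
```

The catch with this approach is that the job only runs when something triggers the check, so a site with no visitors never re-indexes itself. For a search index that is usually fine.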
Technorati tags: Site search, Spidering hacks, Phpdig, Cron