rediff.com

How does web crawling work?

Tags: careers, education, technology

Asked by : Shrinath Nagarajan |  22 Oct 03:42 pm

1. 

The first thing you need to understand is what a web crawler or spider is and how it works. A search engine spider (also known as a crawler, robot, search bot or simply a bot) is a program that most search engines use to find what's new on the Internet. Google's web crawler is known as Googlebot. There are many types of web spiders in use, but for now, we're only interested in the bot that actually crawls the web and collects documents to build a searchable index for the different search engines. The program starts at a website and follows every hyperlink on each page. So, in principle, any page that can be reached through links will eventually be found and spidered, as the spider crawls from one website to another. Search engines may run thousands of instances of their web crawling programs simultaneously, on multiple servers. When a web crawler visits one of your pages, it loads the site's content into a database. Once a page has been fetched, the text of your page is loaded into the search engi ...

Says vivek singh 22 Oct 03:52 pm

This answer is chosen as the best answer for this question.
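
The loop described in the answer (start at one page, store its content, follow every hyperlink, repeat) can be sketched in a few lines of Python. This is only a minimal illustration, not how Googlebot or any real search engine is implemented. The names crawl, LinkExtractor, fetch, max_pages and FAKE_WEB are made up for this example, a plain dict stands in for the "database", and a tiny in-memory set of pages replaces the real web so the code runs without a network connection.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl starting at start_url.

    fetch is a caller-supplied function (an assumption of this sketch):
    it takes a URL and returns the page's HTML, or None if the page
    cannot be loaded. Returns a dict mapping each fetched URL to its HTML.
    """
    store = {}                  # stand-in for the search engine's database
    queue = deque([start_url])  # pages waiting to be visited
    seen = {start_url}          # every URL queued so far, to avoid repeats

    while queue and len(store) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue            # dead link or unsupported URL: skip it
        store[url] = html

        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any "#fragment" part.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return store


# A tiny in-memory "web" so the sketch runs without network access.
FAKE_WEB = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.com/b": '<a href="/missing">gone</a>',
}

pages = crawl("http://example.com/", FAKE_WEB.get)
print(sorted(pages))
```

To try it against real pages you would pass a fetch function that downloads the URL (for example one built on urllib.request.urlopen). A real crawler would also respect robots.txt and limit how fast it requests pages, which this sketch leaves out.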