The first thing you need to understand is what a web crawler, or spider, is and how it works. A search engine spider (also known as a crawler, robot, SearchBot or simply a bot) is a program that most search engines use to find what's new on the Internet. Google's web crawler is known as Googlebot. There are many types of web spiders in use, but for now, we're only interested in the bot that actually crawls the web and collects documents to build a searchable index for the different search engines. The program starts at a website and follows every hyperlink on each page.
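To make the "start at a website and follow every hyperlink" idea concrete, here is a minimal sketch of a breadth-first crawler. It is an illustration under stated assumptions, not how Googlebot or any other search engine actually works. The `fetch` callable is hypothetical (it stands in for an HTTP client and returns a page's HTML, or `None` if the page can't be retrieved), and the returned dictionary stands in for the search engine's document store. Real crawlers also honor robots.txt, limit how often they hit each site, and run across many machines. All of that is omitted here.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the absolute href target of every <a> tag on a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL and drop any #fragment.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def crawl(seed: str,
          fetch: Callable[[str], Optional[str]],
          max_pages: int = 100) -> Dict[str, str]:
    """Breadth-first crawl starting at `seed`.

    `fetch` is a hypothetical page loader: URL -> HTML, or None on failure.
    Returns a URL -> HTML mapping standing in for the search engine's database.
    """
    store: Dict[str, str] = {}
    frontier = deque([seed])  # URLs discovered but not yet fetched
    seen = {seed}             # URLs already queued, so no page is fetched twice
    while frontier and len(store) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable page: skip it and keep crawling
        store[url] = html
        parser = LinkExtractor(url)
        parser.feed(html)
        parser.close()
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return store
```

For example, with a tiny in-memory "web" in place of real HTTP requests, the crawler starts at one site, follows a relative link on the same site and an absolute link to a second site, and ends up with all three pages:

```python
web = {
    "https://a.example/": '<a href="/b">B</a> <a href="https://c.example/">C</a>',
    "https://a.example/b": '<a href="/">home</a>',
    "https://c.example/": "<p>no links here</p>",
}
pages = crawl("https://a.example/", web.get)
print(sorted(pages))
# ['https://a.example/', 'https://a.example/b', 'https://c.example/']
```

The `seen` set is what keeps the crawler from looping forever when pages link back to each other, as `/b` does here by linking to the home page.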
Because the spider crawls from one website to another by following links, any page that is linked from somewhere on the web will, in principle, eventually be found and spidered. Search engines may run thousands of instances of their web crawling programs simultaneously, on multiple servers. When a web crawler visits one of your pages, it loads the site's content into a database. Once a page has been fetched, the text of your page is loaded into the search engi