Q: Why does my site need a search engine? Won’t Google do?
A: Left to itself, Google.com will eventually spider and index everything, provided that:
- the content is linked to by another web page
- the content is publicly accessible from the internet
- the content isn’t blocked from search engines by a robots.txt file
This includes many non-HTML file types, including PDF, PostScript, and Microsoft Office formats.
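The third condition above can be checked locally. The following is a minimal sketch using Python's standard `urllib.robotparser` module; the robots.txt rules, the example.com URLs, and the `is_crawlable` helper are hypothetical and only illustrate the idea. A real crawler also applies its own policies on top of robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_crawlable(ROBOTS_TXT, "http://www.example.com/docs/manual.pdf"))     # True
print(is_crawlable(ROBOTS_TXT, "http://www.example.com/private/report.pdf"))  # False
```

To test a live site instead, `RobotFileParser` also offers `set_url()` and `read()` to download the real robots.txt before calling `can_fetch()`.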
The problem is that Google crawls your site only every so often. The more dynamic your content, the more pages link to you, and the higher your PageRank, the more frequently Google crawls you, but you can’t make it come on demand.
Also, you can’t make Google crawl your internal pages, or your C:, S:, or H: drives. For your C: drive and your My Documents folder, you can use Google Desktop Search (GDS). GDS doesn’t crawl network drives, but if you copy the contents of a network drive to a local drive, GDS will index the copy.
Finally, Google makes an appliance, a lovely yellow box you can buy and stick on your LAN, which can index anything you tell it to. More info here: http://www.google.com/appliance/faq.html