FanRailBot: FanRail's Web Crawler
FanRailBot is FanRail's spider, or web-crawling robot. It collects documents from around the
web to build a searchable index for the FanRail search engine. On this page, you'll find answers
to the most commonly asked questions about how our web crawler works.
FAQ - Frequently Asked Questions
1. How frequently will FanRailBot crawl my website?
2. Can I stop FanRailBot from crawling some or all of my web pages?
3. I do not have a robots.txt file on my server, why is FanRailBot requesting it?
4. What types of links does FanRailBot follow?
5. Can I prevent FanRailBot from following links on my pages?
6. How can I stop FanRailBot from following only certain links?
Answers
1. How frequently will FanRailBot crawl my website?
Because we're a new search engine and are still building our database of railroad-related
websites, FanRailBot may crawl your site frequently. How often it visits depends largely on
how often it finds links to your website on other websites. After your site has been
indexed, you should see FanRailBot about once per month, when it checks whether there have
been any changes to your website that need to be updated in our database.
2. Can I stop FanRailBot from crawling some or all of my web pages?
The short answer is yes. FanRailBot follows the rules of the robots.txt standard when crawling the web
and indexing content. For detailed information on how to set up and use a robots.txt file on your
site, please read the Robots Exclusion Standard.
To completely disallow FanRailBot from accessing any web pages on your site, use the following
in your robots.txt file:
User-agent: FanRailBot
Disallow: /
To disallow only certain pages or folders from FanRailBot's crawl, you may add something like the following
to your robots.txt file:
User-agent: FanRailBot
Disallow: /cgi-bin/
Disallow: /page2.html
Disallow: /page3.html
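If you would like to check what a set of rules blocks before uploading it, the following is a minimal sketch using Python's standard urllib.robotparser module. It illustrates how a standards-following parser reads the rules above; it is not FanRailBot's own code, and the is_allowed helper and the example.com host are placeholders introduced only for this example.

```python
from urllib.robotparser import RobotFileParser

# The partial-disallow rules from the answer above.
ROBOTS_TXT = """\
User-agent: FanRailBot
Disallow: /cgi-bin/
Disallow: /page2.html
Disallow: /page3.html
"""


def is_allowed(path, robots_txt=ROBOTS_TXT, agent="FanRailBot"):
    """Return True if `agent` may fetch `path` under the given rules.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # example.com is a placeholder host; only the path is matched.
    return parser.can_fetch(agent, "http://example.com" + path)


if __name__ == "__main__":
    for path in ("/index.html", "/page2.html", "/cgi-bin/form.pl"):
        print(path, is_allowed(path))
```

Passing the "Disallow: /" rules as robots_txt instead should report every path as blocked for FanRailBot.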
3. I do not have a robots.txt file on my server, why is FanRailBot requesting it?
Because FanRailBot follows the robots.txt protocol, it will always request that file from every
server it visits. If it does not find a robots.txt file, it will assume that you allow spidering
of all of your website's content that is reachable through web links. If you want to prevent the
"file not found" error messages from showing up in your web server logs, you can create an empty
file named robots.txt.
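An empty robots.txt file contains no rules, so it blocks nothing. As a rough illustration, Python's standard urllib.robotparser treats an empty file the same way; this shows the behavior of that library, assumed here to match what a standards-following crawler does, and the example.com URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([])  # an empty robots.txt: no rules at all

# With no rules present, any URL is allowed for any user agent.
allowed = parser.can_fetch("FanRailBot", "http://example.com/any/page.html")
print(allowed)
```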
4. What types of links does FanRailBot follow?
FanRailBot follows HREF links:
<a href="somepage.html">FanRailBot Will Follow This Link</a>
5. Can I prevent FanRailBot from following links on my pages?
Yes. FanRailBot 2.0 respects the robots meta tag in your web pages. To prevent FanRailBot from
following any links in your web pages, use the following meta tag in the head of your web pages:
<META NAME="FanRailBot" CONTENT="nofollow">
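To confirm that a page actually carries the tag, you can scan its HTML. The sketch below uses Python's standard html.parser module; the RobotsMetaParser class and page_is_nofollow function are hypothetical names for this example, and also accepting the generic name "robots" is an assumption based on the common robots meta tag convention, not a description of FanRailBot's internals.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Records whether a page carries a nofollow meta tag for the bot."""

    def __init__(self, bot_name="FanRailBot"):
        super().__init__()
        # Assumption: the generic "robots" name is honored as well.
        self.names = {bot_name.lower(), "robots"}
        self.nofollow = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = (attr.get("content") or "").lower()
        directives = {d.strip() for d in content.split(",")}
        if name in self.names and "nofollow" in directives:
            self.nofollow = True


def page_is_nofollow(html):
    """Return True if the page asks the bot not to follow its links."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.nofollow


if __name__ == "__main__":
    page = '<html><head><META NAME="FanRailBot" CONTENT="nofollow"></head></html>'
    print(page_is_nofollow(page))
```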
6. How can I stop FanRailBot from following only certain links?
FanRailBot also respects the rel="nofollow" link attribute, which was originally created for the
blogging community. So, if you have a few links that you do not want FanRailBot to follow, but
others that you do want it to follow, use the following code in your links:
<a href="somepage.html">FanRailBot Will Follow This Link</a>
<a href="somepage.html" rel="nofollow">FanRailBot Will NOT Follow This Link</a>
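The effect of rel="nofollow" on link collection can be sketched as follows. This is only an illustration using Python's standard html.parser module, under the assumption that a crawler gathers href values from anchor tags and skips those marked nofollow; FollowableLinkParser and followable_links are hypothetical names, not part of FanRailBot.

```python
from html.parser import HTMLParser


class FollowableLinkParser(HTMLParser):
    """Collects href values from <a> tags that are not marked nofollow."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        # rel may hold several space-separated tokens, e.g. "external nofollow".
        rel_tokens = (attr.get("rel") or "").lower().split()
        if href and "nofollow" not in rel_tokens:
            self.links.append(href)


def followable_links(html):
    """Return the hrefs a nofollow-respecting crawler would follow."""
    parser = FollowableLinkParser()
    parser.feed(html)
    return parser.links


if __name__ == "__main__":
    page = (
        '<a href="first.html">Followed</a>'
        '<a href="second.html" rel="nofollow">Not followed</a>'
    )
    print(followable_links(page))
```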
If you have further questions that are not answered on this page, please feel free to contact us
by clicking here.