What is crawling? Avoid it by using robots.txt.
When users search the web, search engines make content available to them through two main processes: crawling and indexing.
Crawling takes place when search engine crawlers access publicly available webpages: they look at each page and follow the links on it.
Indexing, on the other hand, means gathering information about a page, so that it is made available through search engine results.
The problem with crawling is that you might not want crawlers to access certain areas of your website, for example pages that put a heavy load on limited server resources. That’s where the robots.txt file comes in.
What is the robots.txt file and why is it so important?
It is a plain text file that lets you specify how you’d like your site to be crawled. Well-behaved crawlers generally read your website’s robots.txt file before they crawl anything else, so you can use it to specify which parts of the site can and cannot be crawled.
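It’s so important because it lets you tell crawlers which files and directories on your server they should stay out of. It’s like an electronic NO TRESPASSING sign: Googlebot and other well-behaved crawlers follow it, but it is a request, not an access control, so it does not stop anyone from opening those URLs. Note also that blocking crawling is not the same as blocking indexing: a URL that is disallowed in robots.txt can still show up in search engine results if other pages link to it.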
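To make this concrete, here is a minimal sketch of how a crawler might consult the rules before fetching a page. It uses Python's standard urllib.robotparser module; the rules themselves (the /search/ and /tmp/ paths) and the "ExampleBot" name are made up for illustration and are not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Disallow: /tmp/
"""


def load_rules(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text into a parser object that can answer queries."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# A polite crawler asks before it fetches each URL.
for url in (
    "http://www.yoursite.com/index.html",
    "http://www.yoursite.com/search/results.html",
    "http://www.yoursite.com/tmp/cache.html",
):
    verdict = "may crawl" if rules.can_fetch("ExampleBot", url) else "must skip"
    print(f"{verdict}: {url}")
```

Nothing here is enforced by your server: the check only has an effect because the crawler chooses to run it.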
What is the file’s location?
To be valid, it must be located at the root of the website host.
For example, to control crawling on all URLs below http://www.yoursite.com/, the robots.txt file must be located at http://www.yoursite.com/robots.txt.
A robots.txt file can be placed on subdomains (http://website.yoursite.com/robots.txt) or on non-standard ports (http://yoursite.com:8181/robots.txt), but it cannot be placed in a subdirectory (http://yoursite.com/pages/robots.txt).
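The location rule can be sketched in a few lines of Python: keep the scheme, host, and port of any page URL and replace everything else with /robots.txt. The function name robots_txt_url is a hypothetical helper, not part of any library.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that governs the given page URL.

    The scheme and the host (including any port) are kept; the path,
    query string, and fragment are discarded.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.yoursite.com/pages/about.html"))
print(robots_txt_url("http://website.yoursite.com/blog/"))
print(robots_txt_url("http://yoursite.com:8181/pages/robots.txt"))
```

The last call shows why a robots.txt file in a subdirectory has no effect: crawlers only ever look for it at the root of the host.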
Add a robots.txt file to your website
It’s really easy to do, and I have a great source for you: http://www.robotstxt.org/robotstxt.html explains how to create the file and has all the information you need.
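If you prefer to generate the file rather than type it by hand, a small script is enough. This is only a sketch: build_robots_txt is a hypothetical helper, the paths are invented examples, and it covers just the basic User-agent and Disallow lines.

```python
def build_robots_txt(disallowed_paths, user_agent="*"):
    """Build the text of a simple robots.txt file with one group of rules."""
    lines = [f"User-agent: {user_agent}"]
    if not disallowed_paths:
        # An empty Disallow value means nothing is blocked.
        lines.append("Disallow:")
    for path in disallowed_paths:
        lines.append(f"Disallow: {path}")
    return "\n".join(lines) + "\n"


# Example paths are placeholders; replace them with your own.
content = build_robots_txt(["/search/", "/tmp/"])
print(content)
```

Save the printed text as robots.txt and upload it to the root of your host.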
If you want to double-check your file, use this checker: http://www.frobee.com/robots-txt-check
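You can also run a quick check yourself before uploading the file. The sketch below assumes you write down, for a few URLs, whether you expect them to be crawlable, and then compares that with what Python's standard urllib.robotparser module reports. The helper name find_mismatches and the sample rules are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def find_mismatches(robots_text, expectations, user_agent="*"):
    """Return the URLs whose crawl permission differs from what you expected.

    expectations maps each URL to True (should be crawlable) or False.
    """
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return [
        url
        for url, should_be_allowed in expectations.items()
        if parser.can_fetch(user_agent, url) != should_be_allowed
    ]


# Made-up rules and expectations, for illustration only.
sample_rules = "User-agent: *\nDisallow: /private/\n"
problems = find_mismatches(
    sample_rules,
    {
        "http://www.yoursite.com/": True,
        "http://www.yoursite.com/private/report.html": False,
    },
)
print("All good!" if not problems else f"Check these URLs: {problems}")
```

An empty list means the file behaves the way you intended, at least for the URLs you listed.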
And we’re done! Hopefully your robots.txt file will keep unneeded, annoying crawlers away from your site!
- How to control crawling with robots.txt - April 16, 2020