Robots.txt contains instructions for search engine bots.
It is a simple file, but it is also easy to get wrong.
If you do, it can dramatically hurt your SEO.
In this post, we will show you everything you need to know about it.
Contents:
- What Is Robots.txt?
- Is Robots.txt Important for SEO?
- Robots.txt User-Agents and Directives
- How to Find Your Robots.txt File
- How to Create a Robots.txt File
- Robots.txt in WordPress
- Robots Meta Tags vs Robots.txt Files
- Robots.txt Tester Tool
What Is Robots.txt?
A robots.txt file is a plain text file that tells search engine crawlers which URLs on your site they may or may not visit.
The biggest search engines, like Google, Bing, and Yahoo, all support its use.
You can find the file in the root directory of your website, and you should name it “robots.txt”.
However, as stated by Google:
It is not a mechanism for keeping a web page out of search engine results.
So, if you want to keep your pages out of Google, you should use the “noindex” tag.
Now, do you want to know how robots.txt works?
Here is an example of what it looks like:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
This tells all search engine robots that they are not allowed to crawl any pages in the /wp-admin/ directory.
On the other hand, they are still allowed to crawl the admin-ajax.php file inside it.
Again, it’s important to underline that there’s no way to ensure that content will be excluded from search results using the robots.txt file, according to Google’s Webmaster Central Help Center.
If other pages point to your page with descriptive text, Google could still index the URL without visiting the page.
To block pages from appearing in search engine results, you should use robots meta tags.
Is Robots.txt Important for SEO?
Having one is not crucial for most websites, because Google can usually find and index all of your important pages without it.
Google will also automatically leave out duplicate pages and pages it considers unimportant.
With this in mind, these are the three main reasons why you should have a robots.txt file:
- Block Private Pages
- Prevent Resources from Being Indexed
- Maximize Crawl Budget
Block Private Pages
If you have pages on your website that you don’t want to be indexed, you can block them with robots.txt.
This is useful if you have pages that are for logged-in users only. You don’t want search engines to index these pages because they won’t be relevant to anyone who isn’t logged in.
The same goes for staging pages: if you’re working on a new page or redesigning an old one, you can block it with this file until it’s ready to be published.
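For instance, a minimal sketch of this setup might look like the following, where /members/ and /staging/ are just placeholder directory names:

User-agent: *
Disallow: /members/
Disallow: /staging/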
Prevent Resources from Being Indexed
Another common use for robots.txt is to prevent search engines from crawling certain resources on your website.
This is usually done to save server resources or to keep duplicate content from being crawled and indexed.
For example, you might have an image gallery that attracts a lot of crawler traffic without generating any leads or sales. In this case, you might want to block the gallery from being crawled.
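As a hypothetical example, the rules below would keep all bots out of a /gallery/ directory and away from every PDF on the site (Google and Bing support the * and $ wildcards in paths):

User-agent: *
Disallow: /gallery/
Disallow: /*.pdf$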
Maximize Crawl Budget
Last but not least, robots.txt can be used to help maximize your crawl budget.
Your crawl budget is the number of pages on your website that search engine robots can and will crawl in a given period of time.
If you have a large website with thousands of pages, you might want to use robots.txt to block some of the less important pages. This will help the search engine robots focus on the most important pages, which can help your SEO.
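For example, a large site might stop bots from spending crawl budget on internal search results and filtered URLs. The paths below are placeholders:

User-agent: *
Disallow: /search/
Disallow: /*?filter=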
Robots.txt User-Agents and Directives
Robots.txt files consist of two parts: user-agents and directives.
User-Agents
User-agents are the names of the search engine robots that a group of rules applies to.
There are many user-agents, but the most popular ones are:
- Google: Googlebot
- Bing: Bingbot
- Yahoo!: Slurp
- Baidu: Baiduspider
- DuckDuckGo: DuckDuckBot
Use the star (*) wildcard to match all user-agents.
Here is an example:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
This file will apply to all search engine robots.
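You can also write separate groups of rules for different bots. In this hypothetical example, Googlebot may crawl everything, while all other bots are kept out of a /private/ directory (an empty Disallow value means nothing is blocked):

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /private/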
Directives
There are three directives that Google currently supports:
- Disallow
- Allow
- Sitemap
Disallow
The Disallow directive tells search engine robots not to crawl specific pages or resources.
For example, if you want to block the /wp-admin/ directory from being crawled by all search engines, you would use this directive:
User-agent: *
Disallow: /wp-admin/
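The value after Disallow is a path prefix. A single slash blocks the entire site:

User-agent: *
Disallow: /

While an empty value blocks nothing at all, so everything may be crawled:

User-agent: *
Disallow: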
Allow
The Allow directive is the opposite of Disallow. It tells search engine robots that they may crawl certain pages or resources, even inside a directory that is otherwise disallowed.
For example, if you want to allow the /wp-admin/admin-ajax.php page to be crawled, you would use this directive:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap
The Sitemap directive tells the search engine robots where they can find your XML sitemap, a file that lists all the pages on your website.
For example, if your sitemap is located at https://erikemanuelli.com/sitemap_index.xml, you would use this directive:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://erikemanuelli.com/sitemap_index.xml
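The Sitemap directive is independent of the user-agent groups, and you can list more than one sitemap if you have several. The URLs below are placeholders:

Sitemap: https://example.com/sitemap_index.xml
Sitemap: https://example.com/post-sitemap.xml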
How to Find Your Robots.txt File
Robots.txt files are located in the root directory of your website.
For example, if your website is example.com, the file would be placed at example.com/robots.txt.
If you can’t find it on your website, you can use a Chrome extension to help you out.
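If you prefer to check programmatically, here is a minimal sketch using Python's standard library; example.com is a placeholder for your own domain:

import urllib.request
import urllib.error

url = "https://example.com/robots.txt"  # replace with your own domain

try:
    with urllib.request.urlopen(url) as response:
        # Print the contents of the live robots.txt file
        print(response.read().decode("utf-8"))
except urllib.error.HTTPError as error:
    # A 404 here usually means the site has no robots.txt file yet
    print(f"No robots.txt found (HTTP {error.code})")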
How to Create a Robots.txt File
It is quite easy!
Just create a new text file and name it “robots.txt”.
You can use a simple text editor like Windows Notepad.
Then, add your directives to the file.
For example, a simple starting point, reusing the directives from earlier in this post together with your own sitemap URL, could look like this:
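User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://example.com/sitemap_index.xml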
Save the file and upload it to the root directory of your website. That’s it!
If you need further help, Google has an extensive guide here.
Robots.txt in WordPress
If you’re using WordPress, there are plugins that can help you manage your robots.txt file, like Yoast SEO and All in One SEO Pack.
To create or edit one using the Yoast SEO plugin, go to your WordPress dashboard and click on ‘Yoast SEO’ in the admin menu.
Click on “Tools”:
Make sure “File Editing” is enabled.
Click on “File Editor”:
Here, click the “Create robots.txt file” button:
Finally, you can view or edit your file:
Robots Meta Tags vs Robots.txt Files
What’s the difference between robots meta tags and robots.txt files?
Robots meta tags are snippets of HTML code that you can add to your website’s pages. They tell search engine robots whether to index a page and whether to follow the links on that page.
For example, if you don’t want to index a page, you would use this tag:
<meta name="robots" content="noindex">
And if you want a page indexed but don’t want search engines to follow a particular link on it, you would add the nofollow attribute to that link (the anchor text is the clickable words used for the link):
<a href="URL" rel="nofollow">ANCHOR TEXT</a>
Note that neither of these tags can be applied to non-HTML resources like images or CSS files.
In short, robots meta tags work at the page level: they tell search engine robots whether a page may be indexed and whether its links should be followed.
Robots.txt files, on the other hand, are text files that you upload to your website’s root directory. They tell search engine robots which pages or resources they may crawl.
Robots.txt Tester Tool
Once you’ve created or edited your robots.txt file, you can use Google’s robots.txt Tester tool to test it.
Just choose a verified property:
The tool will show you whether your robots.txt file is working as you intended:
If you see any errors, make sure to fix them:
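If you want a quick local check as well, here is a small sketch using Python’s built-in urllib.robotparser module. Note that this parser doesn’t replicate every detail of how Google interprets rules, so treat Google’s own tester as the authority; example.com is a placeholder for your own domain:

from urllib.robotparser import RobotFileParser

parser = RobotFileParser("https://example.com/robots.txt")  # replace with your domain
parser.read()  # downloads and parses the live file

# Ask whether a specific bot is allowed to crawl a specific URL
print(parser.can_fetch("Googlebot", "https://example.com/wp-admin/"))
print(parser.can_fetch("*", "https://example.com/blog/"))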
Final Words
I hope this post helped you understand what robots.txt is and how to use it for SEO.
If you have any questions, feel free to leave a comment below.