This blog post explores common errors found in robots.txt files and how to address them to avoid negatively impacting your website’s search engine visibility.
What is robots.txt?
Robots.txt is a plain text file located in your website’s root directory. It instructs search engine crawlers (like Googlebot) on which pages and files they can access and crawl.
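For reference, here is a minimal sketch of what a robots.txt file can look like; the blocked path is hypothetical:

```
# These rules apply to all crawlers
User-agent: *
# Keep crawlers out of the admin area
Disallow: /admin/
# Point crawlers at the XML sitemap (must be an absolute URL)
Sitemap: https://www.yourwebsite.com/sitemap.xml
```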
Why are robots.txt mistakes dangerous?
While not always detrimental, mistakes in robots.txt can lead to unintended consequences, such as:
- Pages being unintentionally blocked from search engines: This can significantly decrease your website’s organic traffic.
- Search engines not rendering your website correctly: This can lead to a poor user experience and affect your search ranking.
Here are eight common mistakes to avoid in your robots.txt file:
- Robots.txt not in the root directory: Ensure your robots.txt file is placed in the root directory of your website.
  - Mistake: Your robots.txt file is located in a subdirectory like `mywebsite.com/folder/robots.txt`.
  - Fix: Move the robots.txt file to the root directory of your website, which is simply `mywebsite.com/robots.txt`. Crawlers only ever request the file from the root of the host, so a copy in a subdirectory is never found.
- Poor use of wildcards: Use wildcards (`*` and `$`) cautiously to avoid accidentally blocking or allowing too much content.
  - Mistake: You use a wildcard at the beginning of a rule, such as `Disallow: *`, accidentally blocking all pages on your website.
  - Fix: Be specific with your wildcards. Instead of `Disallow: *`, use `Disallow: /admin/` to block only the admin directory.
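  A minimal sketch of targeted wildcard rules, using hypothetical paths; note that `*` and `$` are extensions honored by major crawlers such as Googlebot, not part of the original protocol:

  ```
  User-agent: *
  # Block only URLs under /search/ that include a query string
  Disallow: /search/*?
  # Block only PDFs; "$" anchors the pattern to the end of the URL
  Disallow: /*.pdf$
  ```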
- Noindex in robots.txt: Google no longer follows noindex directives in robots.txt. Use alternative methods like robots meta tags or X-Robots-Tag headers.
  - Mistake: You rely on robots.txt to keep a specific blog post out of the index, for example with a line like `Disallow: /blog/`. A `Disallow` rule only blocks crawling; the URL can still be indexed if other pages link to it.
  - Fix: Use a robots meta tag on the specific blog post page itself to prevent indexing, and leave that page crawlable in robots.txt, since crawlers must fetch the page to see the tag.
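  The on-page alternative looks like this; it belongs in the `<head>` of the page to be excluded, and for non-HTML files the equivalent is an `X-Robots-Tag: noindex` HTTP response header:

  ```
  <meta name="robots" content="noindex">
  ```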
- Blocked scripts and stylesheets: Don’t block access to CSS and JavaScript files, as they are essential for rendering your website correctly.
  - Mistake: You have lines like `Disallow: /css/` or `Disallow: /js/` in your robots.txt file.
  - Fix: Search engines need access to these files to render your website properly. Remove these lines from your robots.txt file.
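  If a broader rule happens to cover your asset folders, you can re-allow just those folders instead of unblocking everything. A sketch with hypothetical paths; `Allow` is supported by major crawlers, and the more specific (longer) rule wins:

  ```
  User-agent: *
  # Block the private area, but keep its assets crawlable for rendering
  Disallow: /private/
  Allow: /private/css/
  Allow: /private/js/
  ```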
- No XML sitemap URL: While omitting it is not an error, including your sitemap URL in robots.txt can help search engines discover your website’s structure and content more efficiently.
  - Mistake: Your robots.txt file omits the sitemap URL, which can slow down search engines’ discovery of your website’s content.
  - Recommendation: Include the URL of your sitemap in your robots.txt file using the `Sitemap:` directive. For example: `Sitemap: https://www.yourwebsite.com/sitemap.xml`
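  In a complete file the directive stands on its own line; it can appear anywhere, the URL must be absolute, and multiple `Sitemap:` lines are allowed:

  ```
  User-agent: *
  Disallow:

  Sitemap: https://www.yourwebsite.com/sitemap.xml
  ```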
- Access to development sites: Block crawlers from accessing and indexing unfinished development sites. Remember to remove this block when launching your website.
  - Mistake: Your development website (e.g., `dev.yourwebsite.com`) is not blocked from search engines.
  - Fix: Add a `Disallow: /` rule to the robots.txt file of your development website to prevent search engines from crawling and indexing it. Remember to remove this block when launching your website publicly.
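  The development-site file is as short as robots.txt gets; bear in mind it is a crawling directive, not access control, so password protection is the stronger safeguard for anything sensitive:

  ```
  # robots.txt for dev.yourwebsite.com only -- do not deploy to production
  User-agent: *
  Disallow: /
  ```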
- Using absolute URLs: Use relative paths in your robots.txt file; crawlers match rules against the URL path, so full URLs can be interpreted incorrectly.
  - Mistake: You use absolute URLs in your robots.txt file, like `Disallow: https://www.yourwebsite.com/private/`.
  - Fix: Use relative paths instead. In this case, the correct rule would be `Disallow: /private/`.
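  Side by side, with hypothetical paths; the `Sitemap:` directive is the one exception that takes a full URL:

  ```
  User-agent: *
  # Wrong: matched against the path, so this rule never applies
  # Disallow: https://www.yourwebsite.com/private/

  # Right: a path relative to the host serving this file
  Disallow: /private/
  ```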
- Deprecated & unsupported elements:Â Avoid using crawl-delay and noindex directives in robots.txt as they are no longer supported by Google.
- Mistake: You have lines like
Crawl-delay: 10
orDisallow: /page noindex
in your robots.txt file. - Fix: Google no longer supports
crawl-delay
andnoindex
directives in robots.txt. These elements have been replaced by other methods, such as Search Console settings for crawl rate and robots meta tags for noindex.
- Mistake: You have lines like
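  What to remove, shown in place; the replacements live outside robots.txt entirely:

  ```
  User-agent: *
  # Remove: Google ignores Crawl-delay (some other crawlers may honor it)
  # Crawl-delay: 10
  # Remove: noindex was never an official robots.txt directive
  # Noindex: /page
  Disallow: /admin/
  ```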
How to recover from a robots.txt error:
- Fix the robots.txt file and verify the changes.
- Use SEO crawling tools to test your website.
- Submit an updated sitemap to search engine consoles like Google Search Console and Bing Webmaster Tools.
Final thoughts:
- It’s crucial to handle robots.txt with caution, especially on large websites where errors can significantly impact traffic and revenue.
- Make changes carefully, double-check them, and consider testing in a sandbox environment before implementing them on your live website.
- If you encounter an issue, diagnose the problem, fix the robots.txt file, and resubmit your sitemap for crawling. With these steps, you can hopefully restore your website’s search ranking within a reasonable timeframe.