OpenContent Silos
3 mins
31 January 2023
Robots.txt and meta robots tags are used by SEOs to tell Google and other bots what to crawl and what not to crawl on your site. This is especially useful for managing the flow and distribution of link juice, ensuring that it is passed primarily to your high-priority pages (i.e. the pages you actually want ranking well) rather than wasted on your low-priority pages.
It’s also very useful for telling Google not to crawl content that could get you into trouble with manual actions or other quality-related devaluations, or that could unnecessarily use up your crawl budget, such as thin content, duplicate content, internal search pages or pages that don’t live up to Google’s E-E-A-T guidelines. We would also recommend using it, in addition to other security measures, to prevent the crawling and/or indexing of sensitive information on your site.
There are many considerations to take into account, and we’d always suggest checking with your SEO consultant before coming to any decision.
The robots.txt file lives in the root directory of a website. It tells crawlers which parts of the site they should or should not access. The robots.txt file uses a simple syntax and can contain multiple directives to control crawler behaviour.
Example of a Robots.txt File:
User-agent: *
Disallow: /private/
Allow: /public/
In the example above, the directive
User-agent: *
applies to all web crawlers: the “*” is a wildcard indicating that the rules apply to every crawler. You can be more specific and create rules that apply only to an individual crawler by replacing the “*” with that crawler’s name, such as “Googlebot” (see the sketch below).
The “Disallow: /private/” directive instructs crawlers not to access any pages within the “/private/” directory, while “Allow: /public/” permits access to pages within the “/public/” directory.
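Putting this together, here is a minimal sketch of a robots.txt that combines a crawler-specific group with a catch-all group (the “/search/” path is a hypothetical example, not a recommendation for your site):
# Rules for Googlebot only
User-agent: Googlebot
Disallow: /search/

# Rules for all other crawlers
User-agent: *
Disallow: /private/
Allow: /public/
A crawler obeys the most specific group that matches its user agent, so Googlebot follows only the first group here, while every other crawler falls back to the “*” group.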
Meta robots tags are HTML tags placed within the <head> section of a webpage. They provide page-level instructions to search engine crawlers on how to handle the page content. Different directives can be used in the meta robots tag to control indexing, link following and caching.
Example of meta robots tags:
<meta name="robots" content="index, follow">
The above code instructs search engine crawlers to index the page and follow its links. This is the most common directive and indicates that the page should be included in search engine results.
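For context, here is a minimal sketch of where the tag sits on a page; the title and body content are placeholders:
<!DOCTYPE html>
<html>
<head>
  <!-- The meta robots tag must sit inside the <head> element -->
  <meta name="robots" content="index, follow">
  <title>Example Page</title>
</head>
<body>
  <p>Page content goes here.</p>
</body>
</html>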
Preventing Indexing While Following Links
<meta name="robots" content="noindex, follow">
With this directive, search engine crawlers will not index the page but will follow the links present on it.
Preventing Both Indexing and Link Following
<meta name="robots" content="noindex, nofollow">
Using this directive, search engine crawlers will neither index the page nor follow any links present on it.
Preventing Indexing and Caching
<meta name="robots" content="noindex, noarchive">
This directive ensures that search engine crawlers do not index the page and do not store a cached version of it.
Allowing Indexing But No Snippet Display
<meta name="robots" content="index, nosnippet">
With this directive, search engine crawlers can index the page, but they should not display any snippet from the page in search engine results.
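As with robots.txt, these directives can also be targeted at an individual crawler rather than all robots. As a sketch, replacing “robots” with a specific crawler’s name, such as “googlebot”, applies the directive to that crawler only:
<!-- Applies noindex to Google's crawler only; other crawlers ignore this tag -->
<meta name="googlebot" content="noindex">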
Robots.txt and meta robots tags are crucial tools in the SEO toolbox.
We definitely recommend discussing robots.txt and meta robots tags with your SEO agency before implementing them. Correct implementation is a vital aspect of any SEO campaign, but incorrect use can cause quite serious problems.