OpenContent Silos
3 mins
31 January 2023
Robots.txt and meta robots tags are used by SEOs to tell Google and other bots what to crawl and what not to crawl on your site. This is especially useful for managing the flow and distribution of link juice, ensuring that it is passed primarily to your high-priority pages (i.e. the pages you actually want ranking well) rather than wasted on your low-priority pages.
It’s also very useful for telling Google not to crawl content that could get you into trouble with manual actions or other quality-related devaluations, or that could unnecessarily use up your crawl budget, such as thin content, duplicate content, internal search pages or pages that don’t live up to Google’s E-E-A-T guidelines. We would also recommend using it, in addition to other security measures, to prevent the crawling and/or indexing of sensitive information on your site.
There are many considerations to take into account, and we’d always suggest checking with your SEO consultant before coming to any decision.
The robots.txt file lives in the root directory of a website. It tells crawlers which parts of the site they should or should not access. The robots.txt file uses a simple syntax and can contain multiple directives to control crawler behaviour.
Example of a Robots.txt File:
User-agent: *
Disallow: /private/
Allow: /public/
In the example above, the directive
User-agent: *
applies to all web crawlers: the “*” is a wildcard indicating that the rules apply to every crawler. You can be more specific and create rules that apply only to an individual crawler by replacing the “*” with that crawler’s name, such as “Googlebot” (see the sketch below).
The “Disallow: /private/” directive instructs crawlers not to access any pages within the “/private/” directory, while “Allow: /public/” permits access to pages within the “/public/” directory.
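Putting this together, here is a minimal sketch of a robots.txt that combines a crawler-specific group with a catch-all group (the “/search/” path is a hypothetical example, not a recommendation for your site):
# Rules for Googlebot only
User-agent: Googlebot
Disallow: /search/

# Rules for all other crawlers
User-agent: *
Disallow: /private/
Allow: /public/
A crawler obeys the most specific group that matches its user agent, so Googlebot follows only the first group here, while every other crawler falls back to the “*” group.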
Meta robots tags are HTML tags placed within the <head> section of a webpage. They provide page-level instructions to search engine crawlers on how to handle the page content. Different directives can be used in the meta robots tag to control indexing, link following and caching.
Example of meta robots tags:
<meta name="robots" content="index, follow">
The above code instructs search engine crawlers to index the page and follow its links. This is the most common directive and indicates that the page should be included in search engine results.
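For context, here is a minimal sketch of where the tag sits on a page; the title and body content are placeholders:
<!DOCTYPE html>
<html>
<head>
  <!-- The meta robots tag must sit inside the <head> element -->
  <meta name="robots" content="index, follow">
  <title>Example Page</title>
</head>
<body>
  <p>Page content goes here.</p>
</body>
</html>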
Preventing Indexing While Following Links
<meta name="robots" content="noindex, follow">
With this directive, search engine crawlers will not index the page but will follow the links present on it.
Preventing Both Indexing and Link Following
<meta name="robots" content="noindex, nofollow">
Using this directive, search engine crawlers will neither index the page nor follow any links present on it.
Preventing Indexing and Caching
<meta name="robots" content="noindex, noarchive">
This directive ensures that search engine crawlers do not index the page and do not store a cached version of it.
Allowing Indexing But No Snippet Display
<meta name="robots" content="index, nosnippet">
With this directive, search engine crawlers can index the page, but they should not display any snippet from the page in search engine results.
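As with robots.txt, these directives can also be targeted at an individual crawler rather than all robots. As a sketch, replacing “robots” with a specific crawler’s name, such as “googlebot”, applies the directive to that crawler only:
<!-- Applies noindex to Google's crawler only; other crawlers ignore this tag -->
<meta name="googlebot" content="noindex">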
Robots.txt and meta robots tags are crucial tools in the SEO toolbox.
We definitely recommend discussing robots.txt and meta robots tags with your SEO agency before implementing them. Correct implementation is a vital aspect of any SEO campaign, but incorrect use can cause quite serious problems.