The robots meta tag and the X-Robots-Tag are used to instruct crawlers how to index the pages of a website. The former is placed in the HTML code of a web page, while the latter is sent in the HTTP response header for a given URL.
The indexing process goes through several steps: the content is loaded, analyzed by search engine robots, and added to the database. Information that has made it into the index is what gets shown in the SERPs.
In our post about the robots.txt file, we discussed how to allow bots to crawl a website and how to prevent them from crawling certain content. In this article, we'll explore how to gain control over the way web pages are indexed, which content to block from indexing, and how to do it correctly.
SEO benefits of using robots and X-Robots-Tag
Let’s examine how the robots meta tag and the X-Robots-Tag help in search engine optimization and when you should use them.
1. Choosing what pages to index
Not all website pages can attract organic visitors; if indexed, some of them might actually harm the site's search visibility. These are the types of pages usually blocked from indexing with the help of noindex (see the example after the list):
- duplicated pages
- sorting options and filters
- search and pagination pages
- technical pages
- service notifications (about a sign-up process, a completed order, etc.)
- landing pages designed for testing ideas
- pages still under development
- information that isn’t up-to-date yet (future deals, announcements, etc.)
- outdated pages that don’t bring any traffic
- pages you need to block from certain search crawlers
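For instance, a sorting or filter page from the list above could be blocked from all crawlers by placing the following tag in its <head> section:
<meta name="robots" content="noindex" />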
2. Managing how certain file types are indexed
You can control indexing not only for HTML pages but also for other types of content, such as image URLs or .pdf files.
3. Keeping the link juice
By blocking links from crawlers with the help of nofollow, you can preserve the page's link juice, because it won't be passed to other resources through external or internal links.
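For example, a page-level nofollow that keeps link juice from being passed through any of the page's links looks like this:
<meta name="robots" content="nofollow" />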
4. Optimizing the crawl budget
The bigger a site is, the more important it is to direct crawlers to its most valuable pages. If search engines crawl a website indiscriminately, the crawl budget may run out before bots reach the content that matters for users and SEO. As a result, important pages either won't get indexed or will enter the index later than desired.
The directives of robots and X-Robots-Tag
The robots meta tag and the X-Robots-Tag differ in their syntax and usage. The robots meta tag is inserted into the HTML code of a web page and has two key attributes: name (indicating which search robot is addressed) and content (the commands for that robot). The X-Robots-Tag is added to the server configuration file and doesn't have any attributes.
Telling Google not to index your content with the robots meta tag looks like this:
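<meta name="googlebot" content="noindex, nofollow" />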
If you choose to prevent Google from indexing your content using the X-Robots-Tag, it will look like this:
X-Robots-Tag: googlebot: noindex, nofollow
The robots meta tag and the X-Robots-Tag share the same directives for giving search bots instructions. Let's review them in detail.
Robots and X-Robots-Tag directives: functions and search engine support
| Directive | Function | GOOGLE | BING | YAHOO! |
|---|---|---|---|---|
| index / noindex | Tells bots to index / not index a page. noindex is used for pages that shouldn't be shown in the SERPs. | + | + | + |
| follow / nofollow | Tells bots to follow / not follow the links on a page. | + | + | + |
| archive / noarchive | Tells bots to show / not show a cached version of a web page in search. | + | + | + |
| all / none | all is the equivalent of index, follow, used for indexing text and links. none is the equivalent of noindex, nofollow, used for blocking indexing of text and links. | + | – | + |
| nosnippet | Tells bots not to show a snippet or video in the SERPs. | + | + | – |
| max-snippet | Limits the maximum snippet size. Written as max-snippet:[number], where number is the maximum number of characters in a snippet. | + | – | + |
| max-image-preview | Limits the maximum size of image previews shown in search. Written as max-image-preview:[setting], where setting can be none, standard, or large. | + | – | + |
| max-video-preview | Limits the maximum length of video previews shown in search (in seconds). Also allows setting a static image (0) or lifting all restrictions (-1). Written as max-video-preview:[value]. | + | – | + |
| notranslate | Prevents search engines from offering a translation of a page in the search results. | + | – | – |
| noimageindex | Prevents the images on a page from being indexed. | + | – | – |
| unavailable_after | Tells bots not to show a page in search after a specified date. Written as unavailable_after: [date/time]. | + | – | – |
All of the above directives can be used with both the robots meta tag and the X-Robots-Tag, and Google's bots will understand these instructions.
Note that content that isn't hidden from search engines is indexed by default, so you don't have to specify the index and follow directives.
Conflicting directives
If combined, some directives may conflict with each other, for example, one allowing indexing and another preventing it for the same content. In such cases, Google chooses the restrictive instruction over the permissive one.
| Directive combination | Google's actions |
|---|---|
| index, noindex | The robot will choose noindex, and the page text won't be indexed. |
| noindex, follow | The robot will choose noindex, and the page text won't be indexed, but it will follow and crawl the links. |
| index, follow, noarchive | All instructions will be considered: the text and links will be indexed, but the link to the page's cached copy won't be shown in search. |
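For example, a tag carrying the first conflicting combination from the table would simply be treated as noindex:
<meta name="robots" content="index, noindex" />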
The robots meta tag: syntax and utilization
As we've said, the robots meta tag is inserted into the page's HTML code and contains information for search bots. It's placed in the <head> section of the HTML document and has two obligatory attributes: name and content. Simplified, it looks like this:
<meta name="robots" content="noindex" />
The name attribute
This attribute defines the meta tag type according to the information it gives to search engines. For instance, meta name="description" sets a short page description to be displayed in the SERPs, meta name="viewport" is used to optimize a site for mobile devices, and meta http-equiv="Content-Type" defines the document type and its encoding.
In meta name="robots", the name attribute specifies which bot the instructions are meant for. It works similarly to the User-agent directive in robots.txt, which identifies the search engine crawler.
The "robots" value addresses all search engines at once, while if you need to set instructions specifically for Google, you'll write meta name="googlebot". For several particular crawlers, you'll need to create separate tags (see the example below).
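For instance, to give Google and Bing different instructions, you would use two separate tags (the directive combination here is just an illustration):
<meta name="googlebot" content="noindex" />
<meta name="bingbot" content="nofollow" />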
The content attribute
This attribute contains instructions for indexing the page content and its display in the search results. The directives explained in the table above are used in the content attribute.
Note that:
- Neither attribute is case-sensitive.
- If attribute values are missing or misspelled, the search bot will ignore the blocking instruction.
- When addressing several crawlers, use a separate robots meta tag for each. In the content attribute, however, you can list several directives in a single meta tag, comma separated (see the example after this list).
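For example, a single tag addressed to all crawlers can combine several directives at once:
<meta name="robots" content="noindex, nofollow, noarchive" />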
The robots.txt file and the robots meta tag
Since search robots first check the robots.txt file for crawling instructions, they won't crawl a page that is blocked there, and so they'll never see the directives included in its code.
If a page carries the noindex directive but is blocked in the robots.txt file, it can still be indexed and shown in the search results, for example, if the crawler finds it by following a backlink from another source. And since robots.txt is generally accessible, you can't be sure crawlers won't find your "hidden" pages.
With that said, if you block a page with the robots meta tag, make sure nothing in the robots.txt file prevents it from being crawled. When it comes to keeping images out of the index, though, it sometimes does make sense to use robots.txt.
Using the robots meta tag
- Method 1: in an HTML editor
Managing pages this way is similar to editing a text file: open the HTML document in an editor, add the robots meta tag to the <head> section, and save (see the sketch after this list). Pages are stored in the site's root directory, which you can access through your hosting provider's personal account or over FTP. Save a copy of the source document before making changes to it.
- Method 2: using a CMS
It's easier to block a page from indexing using a CMS. There are a number of plugins, such as Yoast SEO for WordPress, that let you block indexing or link crawling when editing a page.
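For reference, here's a minimal sketch of where the tag sits in an HTML document, whether you add it by hand or a plugin generates it (the page content is hypothetical):
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <meta name="robots" content="noindex, nofollow" />
  <title>Page title</title>
</head>
<body>
  ...
</body>
</html>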
Verifying the robots meta tag
It takes time for search engines to index or deindex a page. To make sure your page isn’t indexed, use services for webmasters or browser plugins that check meta tags (for example, SEO META in 1 CLICK for Chrome).
You can also check whether the page is indexed using Google Search Console.
If a page check shows that the robots meta tag doesn't work, verify that the URL isn't blocked in the robots.txt file by checking it in the address bar or with Google's robots.txt tester.
You can also check which of your website's pages are in the index with an index status checker tool.
X-Robots-Tag: syntax and utilization
The X-Robots-Tag is part of the HTTP response for a given URL, added via the server configuration file. It acts similarly to the robots meta tag and affects how pages are indexed, but in some situations you should use the X-Robots-Tag specifically for indexing instructions.
A simple example of the X-Robots-Tag is as follows:
X-Robots-Tag: noindex, nofollow
When you need to set rules for a specific page or file type, in Apache the X-Robots-Tag looks like this:
<Files "filename">
Header set X-Robots-Tag "noindex, nofollow"
</Files>
In Nginx, the same rule looks like this:
location = /filename {
  add_header X-Robots-Tag "noindex, nofollow";
}
If the bot name is not specified, directives are automatically used for all crawlers. If a particular robot is identified, the tag looks like this:
Header set X-Robots-Tag "googlebot: noindex, nofollow"
When you should use X-Robots-Tag
- Deindexing non-HTML files
Since not all files are HTML pages with a <head> section, some website content can't be blocked from indexing with the robots meta tag. This is when the X-Robots-Tag comes in handy. For example, to block .pdf documents in Apache:
<Files ~ "\.pdf$">
Header set X-Robots-Tag "noindex"
</Files>
- Saving the crawl budget
With the robots meta tag, the crawler has to load and parse a page before it reads the directives, while the X-Robots-Tag delivers its instructions in the HTTP response header, before the page body is processed. Search engines therefore spend less time on blocked pages and can put the crawl budget toward more important content. Using the X-Robots-Tag is especially helpful for large-scale websites.
- Setting crawling directives for the whole website
Using the X-Robots-Tag in HTTP responses lets you set directives and manage how your content is indexed at the level of the whole website rather than individual pages (see the sketch after this list).
- Addressing local search engines
The biggest search engines understand most restrictive directives, but small local search engines may not know how to read indexing instructions in the HTTP header. If your website targets a specific region, learn about the local search engines and their capabilities.
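To illustrate the site-wide option mentioned above: in Apache, a single line in the virtual host configuration (or the root .htaccess) attaches the header to every response the server sends. A minimal sketch:
# applies to all URLs served by this site
Header set X-Robots-Tag "noindex, nofollow"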
The major function of the robots meta tag is to hide pages or some content elements from the SERPs. The X-Robots-Tag allows you to set more general instructions for the whole website and inform search bots before they crawl web pages, saving the crawl budget.
How to apply X-Robots-Tag
To add the X-Robots-Tag header, edit the configuration files in the website's root directory. The exact settings depend on the web server.
Apache
You should edit the .htaccess or httpd.conf server files. To prevent all .png and .gif files from being indexed on an Apache web server, add the following:
<FilesMatch "\.(png|gif)$">
Header set X-Robots-Tag "noindex"
</FilesMatch>
Nginx
You should edit the site's .conf configuration file. To prevent all .png and .gif files from being indexed on an Nginx web server, add the following:
location ~* \.(png|gif)$ {
  add_header X-Robots-Tag "noindex";
}
Important: before editing the configuration file, back up the original so you can restore it if errors cause website performance issues.
How to check X-Robots-Tag
There are several ways to see what response a page's HTTP header returns and whether it contains the X-Robots-Tag: online URL checking services, browser extensions, and webmaster tools.
For instance, the HTTP header that blocks indexing looks like this:
HTTP/1.1 200 OK
Date: Tue, 10 Nov 2020 09:30:22 GMT
X-Robots-Tag: noindex
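You can also fetch the headers yourself from the command line with curl; the -I flag requests the response headers only (replace the URL with the page you're checking):
curl -I https://example.com/page/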
Checking x-robots in Google
To check the tag using Google Search Console, go to URL Inspection, click Test live URL, and then View crawled page. You'll see information about the HTTP response in the More info section.
Examples of the robots meta tag and the X-Robots-Tag
noindex
Telling all crawlers not to index the text on a page and not to follow the links:
<meta name="robots" content="noindex, nofollow" />
X-Robots-Tag: noindex, nofollow
nofollow
Telling Google not to follow the links on a page:
<meta name="googlebot" content="nofollow" />
X-Robots-Tag: googlebot: nofollow
noarchive
Telling search engines not to cache a page:
<meta name="robots" content="noarchive" />
X-Robots-Tag: noarchive
none
Telling Google not to index the text or follow the links in an HTML document:
<meta name="googlebot" content="none" />
X-Robots-Tag: googlebot: none
nosnippet
Telling search engines not to display snippets for a page:
<meta name="robots" content="nosnippet" />
X-Robots-Tag: nosnippet
max-snippet
Limiting the snippet to 35 characters maximum:
<meta name="robots" content="max-snippet:35" />
X-Robots-Tag: max-snippet:35
max-image-preview
Allowing large image previews in the search results:
<meta name="robots" content="max-image-preview:large" />
X-Robots-Tag: max-image-preview:large
max-video-preview
Allowing video previews with no length limit:
<meta name="robots" content="max-video-preview:-1" />
X-Robots-Tag: max-video-preview:-1
notranslate
Telling search engines not to offer a translation of a page:
<meta name="robots" content="notranslate" />
X-Robots-Tag: notranslate
noimageindex
Telling crawlers not to index the images on a page:
<meta name="robots" content="noimageindex" />
X-Robots-Tag: noimageindex
unavailable_after
Telling crawlers not to show a page in search after January 1, 2021:
<meta name="robots" content="unavailable_after: 2021-01-01" />
X-Robots-Tag: unavailable_after: 2021-01-01
Common mistakes with robots and X-Robots-Tag usage
Conflict with robots.txt
Official X-Robots-Tag and robots meta tag guidelines state that a search bot has to be able to crawl the content you intend to hide from the index. If you disallow a page in the robots.txt file, crawlers will never see its directives.
Blocking indexing with robots.txt is another common mistake: the file serves to limit crawling, not to keep pages out of the index. To manage how your pages appear in search, use the robots meta tag and the X-Robots-Tag.
Removing noindex
If you use the noindex directive to hide content from the index for a certain period, it's important to restore crawler access on time. For instance, say you have a page announcing an upcoming promo deal: if you don't remove noindex once the deal goes live, the page won't appear in the search results and won't generate traffic.
Backlinks to a nofollow page
The nofollow instruction can fail to achieve its purpose if external sources link to the page: crawlers can still reach it by following those backlinks.
Removing a URL from the sitemap before it gets deindexed
If a page has the noindex directive, don't rush to remove it from the sitemap file. Your sitemap helps crawlers quickly find all pages, including those waiting to be removed from the index.
What you can do is create a separate sitemap.xml listing the noindex pages and remove URLs from that file as they get deindexed. If you upload this file to Google Search Console, robots are likely to crawl it sooner (see the sketch below).
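As a sketch, such a supplementary sitemap is just a standard sitemap.xml listing the noindexed URLs (the URL below is hypothetical):
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/outdated-page/</loc>
  </url>
</urlset>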
Not checking index statuses after making changes
It may happen that valuable content gets blocked from indexing by mistake. To avoid this, check your pages' indexing statuses after making any changes to them.
How not to get important pages deindexed?
You can monitor changes in your site's code with a page changes monitoring tool.
What should you do when a page disappears from the search?
When a page you need shown in the SERPs isn't there, check whether any directives are blocking its indexing and whether robots.txt contains a disallow rule for it. Also verify that the URL is included in the sitemap file. Through Google Search Console, you can ask search engines to index the page and submit an updated sitemap.
Summary
The robots meta tag and the x-robots tag serve for managing how pages are indexed and shown in the search results. They differ in utilization: the robots meta tag is included in the page code, while the X-Robots-Tag is specified in the configuration file. Remember some of their other important characteristics:
- The robots.txt file helps search bots crawl pages correctly, while the robots meta tag and X-Robots-Tag influence how content gets to the index. All three are vital for technical optimization.
- Both the robots meta tag and x-robots tag are used for blocking page indexing but the latter gives robots instructions before they crawl pages, saving the crawl budget.
- If robots.txt prevents bots from crawling a page, the robots meta tag or x-robots directives won’t work.
- Mistakes made while setting the robots meta tag and the X-Robots-Tag can lead to indexing issues and website performance problems. Set the directives carefully, or entrust the task to an experienced webmaster.