How to easily edit your robots.txt file in production?

The robots.txt file allows you to control the pages and resources you give access to for crawling bots, whether they are search engine bots or AI crawlers.
Although it is just a simple text file, modifying it can sometimes:
- Be slow: for some large accounts, it requires opening a ticket, which may only be processed after a significant delay
- Be complex: on some CMSs, it is impossible to edit your robots.txt file directly from the back office, or it requires installing additional plugins - which is not always possible quickly if it calls for an external service provider, and is not always desirable from a security point of view
However, as an SEO manager or consultant, you generally need to act quickly when modifying your robots.txt file. There is a simple solution for this, compatible with all CMSs, that lets you edit your robots.txt file directly from an interface and publish the changes to production in a few seconds.
What is the robots.txt file?
The robots.txt file is a text file located at the root of a site that allows you to indicate to crawling bots which URLs they can or cannot access.
In SEO, the robots.txt file thus helps prevent search engine crawlers from exploring certain URLs or directories that are useless for indexing, as these blocked URLs then have a low probability of appearing in search results.
Furthermore, crawling these URLs (pages with no SEO interest, duplicate content pages, etc.) could impact your crawl budget. Indeed, Googlebot, Google's crawler, could then devote a large part of its resources to crawling useless URLs, at the expense of your site's useful pages - those you are trying to position in search results.
Be careful, however, not to confuse crawling and indexing. The robots.txt file allows you to prevent access to certain URLs for crawling bots (they will not be able to crawl the relevant web pages or resources), but does not prohibit their indexing (meaning these URLs can still appear in search engine results).
In the era of generative artificial intelligence, the robots.txt file is also seeing a resurgence of interest, as it also allows for prohibiting AI crawlers from crawling all or part of a website, thus preventing its content from being used for training generative AI models or in AI assistant responses.
Note, however, that some crawling bots may not respect the directives in your robots.txt file - they are only indications - and may still crawl your site.
What specifications should be followed for robots.txt?
Your robots.txt file must be located at the root of your site. For example, if your site is accessible on the subdomain www.example.com, then your robots.txt file will be accessible at the URL https://www.example.com/robots.txt.
If you use another subdomain, then you will need to configure another robots.txt file located at the root of it (for example: https://subdomain.example.com/robots.txt) and specific to that subdomain.
Be careful to name this file robots.txt correctly (with an "s" at the end of "robots"). This is an error I have seen in the past on a client's site.
All rules regarding its interpretation are detailed in the Robots Exclusion Protocol. Google applies this protocol and details the main rules:
- File format: it must be a plain text file encoded in UTF-8
- Syntax (see below)
- Size: 500 kibibytes (KiB) maximum; content beyond this limit is ignored
- Etc.
Syntax
A valid robots.txt file line consists of:
- A field
- A colon (":")
- A value
It is possible to add a comment by starting the line with the "#" character. What follows is then ignored.
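For instance, the following lines combine a comment with two field/value pairs (the path is purely illustrative):

# Block the administration area for all crawling bots
User-agent: *
Disallow: /admin/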
Google accepts the following fields:
- user-agent, to identify the crawling bot to which the following rules will apply (examples: Googlebot, Bingbot, GPTBot, etc.)
- disallow, to indicate URL paths not to be crawled
- allow, to indicate URL paths to be crawled
- sitemap, to provide the URL of a sitemap (in absolute format)
URL paths must always start with "/" to designate the root. By default, if no rule matches a URL, there are no crawling restrictions.
Using the "*" character after the "user-agent" field allows indicating that the following directives apply to all crawling bots.
Some examples
Blocking a directory from crawling
User-agent: *
Disallow: /admin/
We prohibit all crawling bots from accessing URLs starting with /admin/.
Allowing access to certain resources in a directory blocked from crawling
User-agent: *
Disallow: /prive/
Allow: /prive/*.js$
Allow: /prive/*.css$
We prohibit all crawling bots from accessing URLs starting with /prive/, except for files whose URLs end exactly in .js or .css inside this directory (the "$" character marks the end of the URL).
Blocking URL parameters
User-agent: *
Disallow: /*?tri=
Disallow: /*?sessionid=
We prohibit all crawling bots from accessing URLs containing the parameters ?tri= and ?sessionid=.
Declaring a sitemap
Sitemap: https://redirection.io/sitemap.xml
We provide the URL of the XML sitemap file to facilitate the discovery of listed URLs.
Prohibiting AI crawlers from crawling the entire site
User-agent: *
Disallow: /admin/
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: PerplexityBot
Disallow: /
We prohibit all crawling bots from accessing URLs starting with /admin/. And we prohibit certain AI crawlers (GPTBot, ClaudeBot, PerplexityBot) from accessing the entire site.
redirection.io offers a recipe to block AI bots, based notably on the robots.txt file.
Best practices and mistakes to avoid
A poorly configured robots.txt file can have a major impact on your SEO, especially if you block the crawling of some of your strategic pages. You should therefore ensure you follow these best practices when configuring your robots.txt file:
- Do not block the crawling of essential resources (CSS, JavaScript, images) to allow Google to correctly understand your pages
- Do not block the crawling of important pages: for this, check if necessary the impact of your rules on your top URLs
- Do not confuse crawling and indexing: the robots.txt file prevents crawlers from accessing your page content, but it does not prevent those URLs from being indexed
- Add comments when the robots.txt file becomes large: migrations, mergers, old and new rules... A robots.txt file can quickly become unreadable as a site evolves, hence the importance of comments (see the example after this list)
- Test your modifications, using the tools offered below
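To illustrate the use of comments, here is what a commented file might look like (the sections and paths are purely illustrative, reusing directives from the examples above):

# --- Rules kept after the catalog migration ---
User-agent: *
Disallow: /old-shop/

# --- Faceted navigation and session parameters ---
Disallow: /*?tri=
Disallow: /*?sessionid=

# --- Sitemap ---
Sitemap: https://www.example.com/sitemap.xml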
How to create and modify your robots.txt file?
As an SEO manager in charge of a website, or an SEO consultant working to improve the ranking of your clients' sites, you may need to modify your robots.txt file, whether it is to:
- Optimize it by adding new lines to block the crawling of certain URLs of no interest to search engines
- Block new crawling bots, related notably to generative AI models
Although it seems trivial - after all, it is just a simple text file to edit - this change can prove slow and/or complex to implement in some cases:
- It requires opening a ticket, the processing of which can sometimes take several days or even weeks before being effective in production
- Some sites developed on "custom" CMS do not allow direct editing of the robots.txt file, making the modification more difficult and potentially dependent on the intervention of an external provider (web agency, developer)
- Even some of the most widely used CMSs on the market do not make it easy to edit the robots.txt file
Before looking at how to proceed with the main CMSs on the market, note that a simple solution lets you modify your robots.txt file easily, without technical knowledge and in complete autonomy - whether you use WordPress, PrestaShop, Magento, Shopify, or a "custom" solution: redirection.io.
redirection.io
redirection.io allows you to serve a fully personalized robots.txt file on your site, editable from a web interface, without any modification necessary on your hosting, and without touching your website's code.
Once redirection.io is installed on your site, you just need to create a new rule from the manager by entering the URL of your robots.txt file as the source URL to be able to modify it.
If you have entered an absolute URL, redirection.io automatically retrieves the content of your current robots.txt file. You then only have to edit it to add or modify directives.

Two other possibilities are offered by default:
- Allow all, to allow all crawling bots to access all pages of your site (Allow: /)
- Block all, to conversely prohibit crawlers from exploring your entire site (Disallow: /)
Once the rule is saved and published, your robots.txt file is updated in just a few moments on your site. You can thus modify it whenever you want in a totally autonomous way, without calling a developer.
This operating mode is valid regardless of the CMS you use, whether it is a "custom" CMS or one of the main CMS on the market, for which we detail below the other possible options for modifying your robots.txt file.
Note that a ready-made recipe also lets you configure your robots.txt file.
WordPress
By default, WordPress generates a virtual robots.txt file. To modify it, you have two possibilities:
Use an SEO plugin
Most popular SEO plugins include a robots.txt file editor. This is notably the case for Yoast SEO, Rank Math, and SEOPress. You just need to edit the content of your robots.txt file directly in the editor integrated into the extension.
This method saves you from uploading your updated robots.txt file with an FTP client each time, but it requires installing an additional extension - unless you already use one of these plugins for your SEO configuration.
The "Yoast SEO" plugin, for example, offers a robots.txt file editor accessible from the "Tools" menu of the plugin:

The "Rank Math SEO" plugin also offers a robots.txt file editor, accessible from the "Dashboard" menu of the plugin (Dashboard > Advanced Mode > General Settings > Edit robots.txt):

Use an FTP client
If you prefer not to install an extension, you can use a text editor to create your robots.txt file, then upload it to the root of your site via an FTP client.
This method requires FTP access to your server, with the associated risks in case of mishandling, and a potential lack of autonomy for this kind of action.
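If you are comfortable scripting the upload rather than using a graphical FTP client, here is a minimal Python sketch based on the standard ftplib module (the host, credentials, and remote directory are placeholders to adapt to your hosting):

from ftplib import FTP

# Placeholder connection details - replace with your own hosting information
FTP_HOST = "ftp.example.com"
FTP_USER = "user"
FTP_PASSWORD = "password"
REMOTE_DIR = "/public_html"  # document root of the site (varies by host)

with FTP(FTP_HOST) as ftp:
    ftp.login(FTP_USER, FTP_PASSWORD)
    ftp.cwd(REMOTE_DIR)
    # Upload the local robots.txt file to the root of the site
    with open("robots.txt", "rb") as local_file:
        ftp.storbinary("STOR robots.txt", local_file)
    print("robots.txt uploaded")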
PrestaShop
PrestaShop allows you to generate a default robots.txt file. To do this, go to Configure > Shop Parameters > Traffic & SEO, then scroll all the way down to the "Robots.txt file generation" section.
Attention:
- This is a default file that you cannot customize
- Clicking "Generate robots.txt file" will overwrite the content of your current robots.txt file

To customize your robots.txt file, you will need to:
- Install an SEO module or a dedicated module (free or paid) to edit it online
- Upload a robots.txt file to the root of your site via an FTP client
Again, as with WordPress, this requires you to either install an additional module or use an FTP client - provided, of course, that you have the ability to do so.
Shopify
Shopify automatically generates a robots.txt file with standard exclusion rules. To modify it, you must create a robots.txt.liquid template. To do this, you must:
- Go to Online Store > Themes
- Click on the three dots next to your active theme and then "Edit code"
- In the Templates folder, add a new template named robots.txt.liquid
Shopify then provides a base template (with its standard rules), which you can customize by adding your own rules. You are not directly editing a file here; instead, you dynamically override the rules generated by Shopify.
To delete an existing default rule, you must iterate through the rules generated by Shopify and exclude those you no longer wish to apply. This is done via Liquid conditions. Similarly, to modify an existing rule, it is not enough to delete it: you must dynamically replace it with a new directive.
This operation can be restrictive if you want to easily and quickly modify your robots.txt file without having technical knowledge of syntax or the Liquid language.
Thanks to the redirection.io Cloudflare Workers integration, you can easily install redirection.io on your Shopify store and edit your robots.txt file in a much simpler way!
Magento
Magento offers native management of the robots.txt file, which can be edited directly from the backoffice, without going through a plugin or FTP access. To do this:
- Go to: Content > Design > Configuration
- Click edit for the relevant store view
- Go to the "Search Engine Robots" section
- Enter your rules in the "Edit custom instruction of robots.txt" field
You can also manually create a robots.txt file to upload to the root of your site, in the "pub" folder, provided of course you have server access to be able to do so.
Cloudflare
Note that Cloudflare offers a "managed robots.txt" feature, but it only allows you to supplement your existing robots.txt file (if you have one) with additional rules blocking AI crawlers.
Cloudflare retrieves the content of your existing robots.txt file (or creates an empty one if there is none) and prepends several directives prohibiting the main AI crawlers from exploring your site's content.
This solution does not allow you to customize your robots.txt file by adding your own rules or editing your existing rules. However, installing redirection.io via Cloudflare allows you to do so.
How to test and validate your robots.txt file
Is your robots.txt file in place? Remember to test it thoroughly to make sure you have not made any syntax errors and that you are not blocking any URLs that matter for your SEO.
Pay particular attention to:
- Checking the syntax: no inconsistent rules or directives
- Testing your top URLs, to ensure they are not blocked following the modifications made
- Launching a crawl simulating a user-agent (notably Googlebot) to analyze the impact
The following tools allow you to check these different points.
With the redirection.io tool
redirection.io offers a free tool to test your robots.txt file. It allows you to have a summary of the rules (Allow, Disallow) by user-agent, and to check the syntax and validity of these directives.

With Screaming Frog
Screaming Frog allows you to crawl a site:
- Either by ignoring the robots.txt file
- Or by respecting the current robots.txt file
- Or by using a custom robots.txt file
This last feature is extremely practical if you are not sure how the changes made to your robots.txt file will affect the crawl of your site, as you can test them beforehand.
You can also directly enter certain URLs and check if they are blocked by your robots.txt file (current or custom). Screaming Frog even indicates precisely the line causing the blockage - if applicable.
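If you prefer to script this kind of spot check, Python's standard library includes urllib.robotparser, which applies the rules of a published robots.txt file to a list of URLs. Note that this parser uses simple prefix matching and may not interpret "*" and "$" wildcards the way Google does, so treat it as a rough check only. A minimal sketch (the domain, user-agent, and URLs are placeholders):

from urllib.robotparser import RobotFileParser

# Placeholder values - adapt the robots.txt URL, user-agent and URLs to test
ROBOTS_URL = "https://www.example.com/robots.txt"
USER_AGENT = "Googlebot"
TOP_URLS = [
    "https://www.example.com/",
    "https://www.example.com/admin/settings",
    "https://www.example.com/category/shoes",
]

parser = RobotFileParser()
parser.set_url(ROBOTS_URL)
parser.read()  # fetches and parses the live robots.txt file

for url in TOP_URLS:
    allowed = parser.can_fetch(USER_AGENT, url)
    print(f"{'ALLOWED' if allowed else 'BLOCKED'}  {url}")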

With Google Search Console
Google Search Console also has a report on the robots.txt file. This report is accessible by going to Settings > robots.txt. It notably specifies:
- The date of the last crawl of the robots.txt
- Warnings or errors encountered
It also allows you to display the latest version retrieved by Googlebot, and to request a new crawl if needed.

This report is very practical to ensure Google encounters no problem retrieving your robots.txt file, and to check problematic lines if necessary.
Example of an error encountered on a robots.txt file:

If you have configured a domain property, you will see a preview of the robots.txt files of your domain's 20 main hosts.

Monitoring your robots.txt file
Is your robots.txt file online, and have you validated it? Now make sure it does not suffer any regression, whether a change in response code or an unwanted modification of its content.
With redirection.io
Thanks to real-time log analysis, redirection.io allows you to identify if your robots.txt file returns a response code other than 200. You can then configure an alert to be warned in real-time of an anomaly.

- From the manager, go to Logs > Logs list
- Filter for URLs containing robots.txt (or for the exact URL of your robots.txt file), and for response codes other than 200
- Save the "log view" (you can, for example, call it "Robots.txt file - Response code not 200")
- Configure a notification (Settings > Notifications) to be alerted as soon as a new entry is detected in this log view

You can of course refine the filtering by focusing for example only on a user-agent.
With Oseox Monitoring
When you add a domain to monitor, Oseox Monitoring automatically adds the URL of your robots.txt file to the monitoring. You then receive an alert within the hour or within the day (depending on the chosen monitoring frequency) if the response code of your robots.txt file changes or if its content is modified in the slightest way.

This is extremely useful for identifying any regression.
As an anecdote, a few years ago this allowed me to warn an e-commerce client within the hour after they had just blocked crawling of their entire website (a Disallow: / directive in the robots.txt file) for all bots - including Googlebot. As the site generated several million euros in revenue, the consequences could have been major.
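If you want to complement these tools with your own scheduled check, here is a minimal Python sketch (the URL is a placeholder, and the "alerts" are simple print statements to replace with your own notification logic) that fetches the file, verifies the response code, and detects content changes by comparing a hash:

import hashlib
import urllib.error
import urllib.request

# Placeholder URL - replace with the robots.txt file you want to monitor
ROBOTS_URL = "https://www.example.com/robots.txt"
PREVIOUS_HASH_FILE = "robots_hash.txt"

request = urllib.request.Request(ROBOTS_URL, headers={"User-Agent": "robots-monitor"})
try:
    with urllib.request.urlopen(request, timeout=10) as response:
        content = response.read()
except urllib.error.HTTPError as error:
    # 4xx / 5xx response codes end up here
    print(f"ALERT: robots.txt returned HTTP {error.code}")
    raise SystemExit(1)

current_hash = hashlib.sha256(content).hexdigest()
try:
    with open(PREVIOUS_HASH_FILE) as hash_file:
        previous_hash = hash_file.read().strip()
except FileNotFoundError:
    previous_hash = None  # first run: nothing to compare against

if previous_hash and previous_hash != current_hash:
    print("ALERT: robots.txt content has changed")

with open(PREVIOUS_HASH_FILE, "w") as hash_file:
    hash_file.write(current_hash)

Run periodically (for example via a cron job), this gives you a basic safety net alongside the dedicated tools above.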

Conclusion
The robots.txt file is part of the SEO basics. Yet modifying it is far from easy on the main CMSs on the market, and can sometimes require calling on other resources, whether internal (a developer) or external (a web agency). For some large accounts, this change can also take a long time to reach production - delays that are difficult to reconcile with the quick optimizations an SEO manager or consultant needs to implement.
redirection.io offers an easy and fast solution to modify your robots.txt file, by letting you edit it directly from a web interface and publish your changes to production in a few seconds. SEO and marketing teams thus gain time and autonomy, without having to call on other teams or providers.