In this post, we will cover how we use Cloudflare Workers and Transform Rules to prevent search engines from crawling and indexing our umbhost.dev domain and its subdomains.
We use the umbhost.dev domain to hold our temporary and pre-live URLs, so it's massively important that these do not get indexed by search engines.
You could also use this as a guide to block search engines from any subdomain (staging.domain.com for example).
Transform Rule
We have added a response header to every request that goes through the umbhost.dev domain: the header is called X-Robots-Tag and has a value of noindex.
This header instructs crawlers not to index a page; you can read more about the X-Robots-Tag header in the Google Search Central documentation.
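For reference, the header that ends up on every response is simply:

```
X-Robots-Tag: noindex
```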
To add this header you will need to log in to the Cloudflare Dashboard and then browse to Domain Zone -> Rules -> Transform Rules -> Modify Response Header.
On this screen click on the + Create rule button.
To target every URL accessed through the domain, configure the rule as follows:
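As a rough sketch (the exact field labels in the dashboard may differ slightly), the rule looks something like this:

```
Rule name:  Add X-Robots-Tag to all responses
If:         All incoming requests
Then:       Set static response header
            Header name:  X-Robots-Tag
            Value:        noindex
```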
If you wish to target only a single subdomain (or several), configure the rule as follows:
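Again as a sketch, using a hypothetical staging.domain.com subdomain, a custom filter expression restricts the rule to specific hostnames:

```
If (custom filter expression):
    (http.host eq "staging.domain.com")
    or, for multiple subdomains:
    (http.host in {"staging.domain.com" "preview.domain.com"})
Then:
    Set static response header
    Header name:  X-Robots-Tag
    Value:        noindex
```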
Cloudflare Worker
Next, we wanted a robots.txt which is handled at the edge and automatically applied to every URL served.
To do this we make use of Cloudflare Workers, which run serverless code deployed directly to the edge.
To create a Cloudflare Worker, browse to Workers & Pages -> Overview.
On the Overview screen, click on the Create Application button and then on the Create Worker button.
On the next screen, give your worker a name such as project-robots-blocker and then click on the Deploy button (the code can't be edited until it has been deployed).
On the next screen click the Edit Code button, and in the editor which opens, replace the default code with the snippet below.
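The exact code we use may differ, but a minimal Worker that serves a blocking robots.txt looks something like this (a sketch using the ES module syntax):

```js
export default {
  async fetch() {
    // Serve a robots.txt that tells compliant crawlers to stay away from every path.
    const body = "User-agent: *\nDisallow: /";
    return new Response(body, {
      headers: { "content-type": "text/plain" },
    });
  },
};
```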
This code will return the robots.txt in the following format:
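With the sketch above, that is the standard pair of directives which blocks every compliant crawler from all paths:

```
User-agent: *
Disallow: /
```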
Next, click Save and Deploy.
Now we need to hook this up to our domain. To do this, browse to Domain Zone -> Worker Routes and then click on the Add Route button.
In the window which pops up, enter the route as follows:
*.domain.com/robots.txt
(Make sure to replace domain.com with your domain)
The asterisk targets all subdomains on the domain; if you wish to target a single subdomain, replace the asterisk with the required subdomain.
Finally, click on Save.
And that's all there is to it: all requests to the domain will now automatically have the X-Robots-Tag header applied and a robots.txt file served at the edge.
No more chance of a pre-production site or temporary URL accidentally ending up indexed by search engines.
(Only the good search engines which obey the rules will be affected)
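If you want to verify the setup, a quick check with curl against a hypothetical subdomain should show both the header and the file:

```bash
# The response headers should now include X-Robots-Tag: noindex
curl -I https://staging.domain.com/

# And robots.txt should be served by the Worker at the edge
curl https://staging.domain.com/robots.txt
```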