Posted by: Aaron Sadler

In this post, we will cover how we use Cloudflare Workers and Transform Rules to stop search engines from crawling and indexing our umbhost.dev domain and its subdomains.

We use the umbhost.dev domain to hold our temporary and pre-live URLs, so it's massively important that these do not get indexed by search engines.

You could also use this as a guide to block search engines from any subdomain (staging.domain.com for example).

Transform Rule

We have added a response header to every response served through the umbhost.dev domain. The header is called X-Robots-Tag and has a value of noindex.

This header instructs crawlers not to index a page. You can read more about the X-Robots-Tag header in Google's developer documentation.
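The X-Robots-Tag value is a comma-separated list of directives, so a quick way to confirm the rule is working is to inspect the header on any response. Below is a minimal TypeScript sketch of that check. The hasNoindex helper is a hypothetical name of ours, and it does not handle the user-agent-scoped form of the header (for example googlebot: noindex).

```typescript
// Hypothetical helper: returns true when the X-Robots-Tag header contains
// the noindex directive (or "none", which Google documents as equivalent
// to "noindex, nofollow").
function hasNoindex(headers: Headers): boolean {
  const value = headers.get("x-robots-tag");
  if (value === null) {
    return false;
  }
  // Split the comma-separated directive list and normalise each entry.
  const directives = value
    .split(",")
    .map((directive) => directive.trim().toLowerCase());
  return directives.some(
    (directive) => directive === "noindex" || directive === "none"
  );
}

// Example usage with hand-built headers instead of a live request.
const blockedExample = hasNoindex(new Headers({ "X-Robots-Tag": "noindex" }));
const openExample = hasNoindex(new Headers({ "Content-Type": "text/html" }));
console.log(blockedExample, openExample);
```

Against a live site, the same function could be passed the headers property of the response returned by fetch.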

To add this header, log in to the Cloudflare Dashboard and then browse to Domain Zone -> Rules -> Transform Rules -> Modify Response Header.

[Screenshot: Cloudflare Transform Rules Modify Response Header location]

On this screen, click on the + Create rule button.

To target every URL accessed through the domain, configure the rule as follows:

[Screenshot: Create X-Robots-Tag response header]

If you wish to target only one or more specific subdomains, configure the rule as follows:

[Screenshot: How to target specific subdomains with response header]
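For reference, here is a rough sketch of the same rule written as data instead of clicked together in the dashboard. It assumes the rule shape used by Cloudflare's Rulesets API for the http_response_headers_transform phase and the http.host field from the Rules language. The function names and example hostnames are ours, so treat it as an illustration and not the exact configuration used on umbhost.dev.

```typescript
// Builds a filter expression for the given hostnames.
// An empty list means "match every request on the zone".
function buildExpression(hostnames: string[]): string {
  if (hostnames.length === 0) {
    return "true";
  }
  if (hostnames.length === 1) {
    return '(http.host eq "' + hostnames[0] + '")';
  }
  // Sets in the Rules language are space-separated quoted values.
  const quoted = hostnames.map((host) => '"' + host + '"').join(" ");
  return "(http.host in {" + quoted + "})";
}

// Assumed rule shape: a "rewrite" action which sets one static
// response header on every response matching the expression.
function buildHeaderRule(hostnames: string[]) {
  return {
    description: "Add X-Robots-Tag: noindex",
    expression: buildExpression(hostnames),
    action: "rewrite",
    action_parameters: {
      headers: {
        "X-Robots-Tag": { operation: "set", value: "noindex" },
      },
    },
  };
}

// Example usage: one rule for the whole zone, one for two subdomains.
const wholeZoneRule = buildHeaderRule([]);
const subdomainRule = buildHeaderRule([
  "staging.domain.com",
  "preview.domain.com",
]);
console.log(JSON.stringify(wholeZoneRule, null, 2));
console.log(subdomainRule.expression);
```

In the dashboard form these map to the filter expression, the Set static operation, the header name and the header value.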

Cloudflare Worker

Next, we wanted to add a robots.txt which is handled at the edge and automatically applied to every site served from the domain.

To do this we make use of Cloudflare Workers, which run serverless code deployed directly to the edge.

To create a Cloudflare Worker, browse to Workers & Pages -> Overview.

[Screenshot: Workers & Pages overview]

On the Overview screen, click on the Create Application button and then on the Create Worker button.

[Screenshot: Create worker button]

On the next screen, give your worker a name such as project-robots-blocker and then click on the Deploy button (the code can't be edited until it has been deployed).

On the next screen, click the Edit Code button. In the editor which opens, replace the code shown with the snippet below.

This code will return the robots.txt in the following format:
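As an illustration of both the snippet and the robots.txt it returns, here is a minimal sketch of such a Worker, written in TypeScript using module syntax. It assumes the aim is to disallow every crawler from every path. The constant and function names are ours, and this is not necessarily the exact code running on umbhost.dev.

```typescript
// Body served for every robots.txt request: all crawlers, all paths blocked.
const ROBOTS_TXT = "User-agent: *\nDisallow: /\n";

// Builds the plain-text response. Kept as its own function so it can be
// checked without deploying the Worker.
function robotsResponse(): Response {
  return new Response(ROBOTS_TXT, {
    status: 200,
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}

// Worker entry point. The route added later decides which URLs reach this
// handler, so every request it sees gets the same robots.txt back.
export default {
  fetch(_request: Request): Response {
    return robotsResponse();
  },
};
```

With this sketch, a request routed to the Worker returns a two-line body: User-agent: * followed by Disallow: /. If the dashboard editor only accepts JavaScript, remove the type annotations (: Response and : Request) before pasting.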

Next, click Save and Deploy.

Now we need to hook this up to our domain. To do this, browse to Domain Zone -> Worker Routes and then click on the Add Route button.

In the window which pops up, enter the route as follows:

*.domain.com/robots.txt

(Make sure to replace domain.com with your domain)

The asterisk targets all subdomains on the domain. If you wish to target a single subdomain, replace the asterisk with the required subdomain.

[Screenshot: Configure worker route]
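To make the wildcard behaviour concrete, here is a small TypeScript sketch which approximates how a route pattern is compared against a hostname and path. It is only an illustration: matchesRoute is a hypothetical helper of ours and not Cloudflare's implementation, and it simply treats each asterisk as "any run of characters".

```typescript
// Approximates route matching: "*" matches any run of characters and
// everything else must match literally.
function matchesRoute(pattern: string, hostAndPath: string): boolean {
  const escaped = pattern
    .split("*")
    // Escape regular expression metacharacters in the literal parts.
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp("^" + escaped + "$").test(hostAndPath);
}

const route = "*.domain.com/robots.txt";

// Any subdomain's robots.txt is handled by the Worker.
console.log(matchesRoute(route, "staging.domain.com/robots.txt"));
// Other paths on the same subdomain are left alone.
console.log(matchesRoute(route, "staging.domain.com/index.html"));
// A single-subdomain route only matches that subdomain.
console.log(
  matchesRoute("staging.domain.com/robots.txt", "www.domain.com/robots.txt")
);
```

In this sketch the wildcard pattern does not match the bare domain.com, because the pattern requires a dot before the domain name. Cloudflare's route documentation is the place to confirm the exact matching rules if the bare domain also needs covering.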

Finally, click on Save.

And that's all there is to it. Every response served through the domain now carries the X-Robots-Tag header, and every robots.txt request matching the route is answered by the Worker.
No more chance of a pre-production site or temporary URL accidentally being indexed by search engines.

(Only well-behaved search engines which obey these rules will be affected.)
