
How to block Googlebot impersonators

In my last post I wrote about my surprise to see a well-known SEM/SEO bot impersonating the Googlebot, and to discover that it has been doing so, and has been known to do so, for years.

Impersonating web bots is of course bad. It is something you do when you want:

  1. To hide your identity; or
  2. To bypass access controls

It is also easy to do: you just change the user-agent string of your bot.
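To see just how trivial it is, here is a minimal Python sketch (the URL is a placeholder) that prepares a request claiming to be Googlebot:

```python
import urllib.request

# Any HTTP client lets you claim to be Googlebot simply by setting
# the User-Agent header; nothing on the client side verifies the claim.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

req = urllib.request.Request(
    "https://example.com/",  # placeholder URL
    headers={"User-Agent": GOOGLEBOT_UA},
)
# urllib.request.urlopen(req) would now send the spoofed user agent.
```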

It is equally easy to block the impersonators if you know the addresses of the genuine bot.

For Googlebot we know them, because Google publishes the IP ranges of its crawlers as JSON files: one for Googlebot proper, plus separate files for its special-case crawlers and user-triggered fetchers.

For Googlebot we are interested in the first file, googlebot.json.

The logic

The logic is simple. And it can use either a denylist or an allowlist.

The denylist approach says:

If it claims it is Googlebot and comes from one of the following ranges, which we know aren’t Googlebot’s, then it lies. Block it.

For example:

##
##  Impersonates Googlebot
##
##  Access denied based on denylist.
##
RewriteCond %{HTTP_USER_AGENT}  "=Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
RewriteCond %{REMOTE_ADDR}      "^51\.79\."         [OR]
RewriteCond %{REMOTE_ADDR}      "^85\.208\."        [OR]
RewriteCond %{REMOTE_ADDR}      "^95\.89\."         [OR]
RewriteCond %{REMOTE_ADDR}      "^107\.175\."       [OR]
RewriteCond %{REMOTE_ADDR}      "^109\."            [OR]
RewriteCond %{REMOTE_ADDR}      "^115\."            [OR]
RewriteCond %{REMOTE_ADDR}      "^143\.92\."        [OR]
RewriteCond %{REMOTE_ADDR}      "^185\."            [OR]
RewriteCond %{REMOTE_ADDR}      "^196\.251\."       [OR]
RewriteCond %{REMOTE_ADDR}      "^2a02:85f:"
RewriteRule .* - [R=403,L]

Then you add non-Google ranges to the denylist as you discover them.

NOTE

The snippets in this post are for Apache and use mod_rewrite. There are other ways to write the rules for Apache, and, of course, the same logic applies to any web server or proxy server.

The second approach is an allowlist:

If it claims it is Googlebot and does not come from the Googlebot ranges, then it lies. Block it.

For example:

##
##  Impersonates Googlebot
##
##  Access denied based on allowlist.
##  Octets and hextets current as of 2025-11-12.
##
RewriteCond %{HTTP_USER_AGENT}  "Googlebot/"
RewriteCond %{REMOTE_ADDR}      "!^34\."
RewriteCond %{REMOTE_ADDR}      "!^35\."
RewriteCond %{REMOTE_ADDR}      "!^66\."
RewriteCond %{REMOTE_ADDR}      "!^192\."
RewriteCond %{REMOTE_ADDR}      "!^2001:"
RewriteRule .* - [R=403,L]

The main difference between the two approaches is that one is reactive and the other proactive. The proactive approach needs less maintenance. It also catches many more impersonators.

It does not catch all of them, though. Those wide ranges contain many more addresses than Googlebot's. To catch every impersonator we have to narrow them down to the exact ranges Google publishes.

But then the allowlist gets too long and needs a lot more maintenance.

A middle ground is to go just a bit narrower, down to the second octet or hextet. Now we catch even more liars but the maintenance burden does not increase much:

##
##  Impersonates Googlebot
##
##  Access denied based on allowlist.
##  Octets and hextets current as of 2025-11-12.
##
RewriteCond %{HTTP_USER_AGENT}  "Googlebot/"
RewriteCond %{REMOTE_ADDR}      "!^34\.22\."
RewriteCond %{REMOTE_ADDR}      "!^34\.64\."
RewriteCond %{REMOTE_ADDR}      "!^34\.65\."
RewriteCond %{REMOTE_ADDR}      "!^34\.80\."
RewriteCond %{REMOTE_ADDR}      "!^34\.88\."
RewriteCond %{REMOTE_ADDR}      "!^34\.89\."
RewriteCond %{REMOTE_ADDR}      "!^34\.96\."
RewriteCond %{REMOTE_ADDR}      "!^34\.100\."
RewriteCond %{REMOTE_ADDR}      "!^34\.101\."
RewriteCond %{REMOTE_ADDR}      "!^34\.118\."
RewriteCond %{REMOTE_ADDR}      "!^34\.126\."
RewriteCond %{REMOTE_ADDR}      "!^34\.146\."
RewriteCond %{REMOTE_ADDR}      "!^34\.147\."
RewriteCond %{REMOTE_ADDR}      "!^34\.151\."
RewriteCond %{REMOTE_ADDR}      "!^34\.152\."
RewriteCond %{REMOTE_ADDR}      "!^34\.154\."
RewriteCond %{REMOTE_ADDR}      "!^34\.155\."
RewriteCond %{REMOTE_ADDR}      "!^34\.165\."
RewriteCond %{REMOTE_ADDR}      "!^34\.175\."
RewriteCond %{REMOTE_ADDR}      "!^34\.176\."
RewriteCond %{REMOTE_ADDR}      "!^35\.247\."
RewriteCond %{REMOTE_ADDR}      "!^66\.249\."
RewriteCond %{REMOTE_ADDR}      "!^192\.178\."
RewriteCond %{REMOTE_ADDR}      "!^2001:4860:"
RewriteRule .* - [R=403,L]
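Those two-octet (or two-hextet) prefixes can be derived mechanically from the published CIDR ranges. Here is a sketch, assuming you already have the ranges as a list of CIDR strings; the function name is mine:

```python
import ipaddress

def rewrite_conds(cidrs):
    """Collapse CIDR ranges to their first two octets (or hextets)
    and emit one negated RewriteCond per prefix, allowlist-style."""
    prefixes = set()
    for cidr in cidrs:
        net = ipaddress.ip_network(cidr)
        first, second = str(net.network_address).split("." if net.version == 4 else ":")[:2]
        sep = "." if net.version == 4 else ":"
        prefixes.add(f"{first}{sep}{second}{sep}")
    lines = ['RewriteCond %{HTTP_USER_AGENT}  "Googlebot/"']
    for prefix in sorted(prefixes):
        escaped = prefix.replace(".", r"\.")  # literal dots in the regex
        lines.append(f'RewriteCond %{{REMOTE_ADDR}}      "!^{escaped}"')
    lines.append("RewriteRule .* - [R=403,L]")
    return lines
```

Note that the conditions stay in the negated-AND shape of the snippet above: the request is blocked only if it claims to be Googlebot and matches none of the prefixes.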

The maintenance

It is important to understand that these rules are not set-and-forget. Google’s ranges do not change every day, and not every change affects every site, but they do change.

So you have to keep an eye on them. There are two ways to do that:

The automation method

This is the professional method: check Google’s published ranges automatically.

There are several ways to do it. For example, a script that runs periodically, compares the published ranges with the ones in your config, and regenerates the config when they differ.
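Such a script could look like the sketch below. It assumes the documented googlebot.json location and its format (a "prefixes" list of objects with an ipv4Prefix or ipv6Prefix key); the helper names are mine. It uses Apache 2.4’s expr syntax, whose -R operator does a proper subnet match, so the exact published ranges can be used without regex approximation:

```python
import json
import urllib.request

GOOGLEBOT_JSON = "https://developers.google.com/static/search/apis/ipranges/googlebot.json"

def load_ranges(url=GOOGLEBOT_JSON):
    """Fetch Google's published Googlebot ranges as CIDR strings."""
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    return [p.get("ipv4Prefix") or p.get("ipv6Prefix") for p in data["prefixes"]]

def render_conf(cidrs):
    """Render the allowlist rule, one exact published range per condition."""
    lines = ['RewriteCond %{HTTP_USER_AGENT}  "Googlebot/"']
    for cidr in cidrs:
        lines.append(f"RewriteCond expr \"! -R '{cidr}'\"")
    lines.append("RewriteRule .* - [R=403,L]")
    return "\n".join(lines) + "\n"

# A cron job could write render_conf(load_ranges()) to a file included
# by the Apache config, and reload Apache only when the rendered text
# differs from the file's current contents.
```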

The lazy method

The lazy method is not as safe as the professional method but it can be safe enough, depending on the use case:

Let Google tell you.

Failed: Blocked due to access forbidden (403)

If you see that in the Google Search Console, it is time to update the allowlist in the rule.

Comments, questions and corrections

If you have comments, questions, or corrections, please mention me on Bluesky, @op111.net, or send a message using the contact form.

Thank you for reading!

— Demetris, 2025-11-12