We’ve noticed a significant increase in bad bots crawling our servers, causing high load and ignoring robots.txt directives. Many of these are AI bots, along with some overly aggressive search engine crawlers.
This might not be an issue for larger servers, but for smaller setups, it can easily cripple a website.
A few things we’ve been implementing to help mitigate this include caching wherever possible, using tools like Redis and plugins such as WP Redis for WordPress.
Another useful tool is Fail2Ban, which includes a built-in bad bot filter. However, the default configuration may need some adjustment to be effective.
To enable Fail2Ban and configure it properly, edit /etc/fail2ban/jail.conf (or better, /etc/fail2ban/jail.local) and make sure you include a block similar to this:
[apache-badbots]
# Ban hosts whose user-agent string identifies a known bad bot.
port = http,https
filter = apache-badbots
logpath = %(apache_access_log)s
bantime = 48h
maxretry = 1
enabled = true
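Since jail.local is plain INI, you can sanity-check the block before restarting Fail2Ban. The Python sketch below is purely an illustration (not something Fail2Ban requires): it parses the same settings with the standard library and confirms they read back as expected.

```python
# Parse the jail block the way an INI reader would. The inline string
# stands in for /etc/fail2ban/jail.local.
import configparser

JAIL_LOCAL = """
[apache-badbots]
port = http,https
filter = apache-badbots
logpath = %(apache_access_log)s
bantime = 48h
maxretry = 1
enabled = true
"""

# Disable interpolation so %(apache_access_log)s passes through verbatim;
# Fail2Ban resolves it from paths-common.conf at runtime.
cfg = configparser.ConfigParser(interpolation=None)
cfg.read_string(JAIL_LOCAL)

jail = cfg["apache-badbots"]
assert jail.getboolean("enabled") is True
assert jail.getint("maxretry") == 1
print(jail["bantime"])  # 48h
```

With maxretry = 1, a single matching request is enough to trigger a ban, which is what you want for bots that ignore robots.txt.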
Make sure all your Apache log files are listed in /etc/fail2ban/paths-common.conf. If you’re using hosting panels like Virtualmin or Plesk, you may need to manually add additional log paths to ensure full coverage.
apache_error_log = /var/log/virtualmin/*error_log
                   /var/log/apache2/*error.log
apache_access_log = /var/log/virtualmin/*access_log
                    /var/log/apache2/*access.log
Next, edit the filter file at /etc/fail2ban/filter.d/apache-badbots.conf. The default configuration already includes some entries, which you can leave as-is. You can then add your custom patterns at the top under the label badbotscustom. The section might look something like this:
[Definition]
# List the user agents you want to block, separated by "|"
badbotscustom = ClaudeBot|ClaudeBot/1\.0|DataForSeoBot/1\.0|claudebot|OAI-SearchBot|ImagesiftBot|PetalBot|YandexBot|serpstatbot|GeedoProductSearch|Barkrowler|SeekportBot|GPTBot|AmazonBot|Amazonbot|Bytespider|Bytedance|fidget-spinner-bot|EmailCollector|WebEMailExtrac|TrackBack/1\.02|sogou music spider|seocompany|LieBaoFast|SEOkicks|Cliqzbot|ssearch_bot|domaincrawler|AhrefsBot|spot|DigExt|Sogou|MegaIndex\.ru|majestic12|80legs|SISTRIX|HTTrack|Semrush|MJ12|Ezooms|CCBot|TalkTalk|Ahrefs|BLEXBot
badbots = Atomic_Email_Hunter/4\.0|atSpider/1\.0|autoemailspider|bwh3_user_agent|China Local Browse 2\.6|ContactBot/0\.2|ContentSmartz|DataCha0s/2\.0|DBrowse 1\.4b|DBrowse 1\.4d|Demo Bot DOT 16b|Demo Bot Z 16b|DSurf15a 01|DSurf15a 71|DSurf15a 81|DSurf15a VA|EBrowse 1\.4b|Educate Search VxB|EmailSiphon|EmailSpider|EmailWolf 1\.00|ESurf15a 15|ExtractorPro|Franklin Locator 1\.8|FSurf15a 01|Full Web Bot 0416B|Full Web Bot 0516B|Full Web Bot 2816B|Guestbook Auto Submitter|Industry Program 1\.0\.x|ISC Systems iRc Search 2\.1|IUPUI Research Bot v 1\.9a|LARBIN-EXPERIMENTAL \(efp@gmx\.net\)|LetsCrawl\.com/1\.0 \+http\://letscrawl\.com/|Lincoln State Web Browser|LMQueueBot/0\.2|LWP\:\:Simple/5\.803|Mac Finder 1\.0\.xx|MFC Foundation Class Library 4\.0|Microsoft URL Control - 6\.00\.8xxx|Missauga Locate 1\.0\.0|Missigua Locator 1\.9|Missouri College Browse|Mizzu Labs 2\.2|Mo College 1\.9|MVAClient|Mozilla/2\.0 \(compatible; NEWT ActiveX; Win32\)|Mozilla/3\.0 \(compatible; Indy Library\)|Mozilla/3\.0 \(compatible; scan4mail \(advanced version\) http\://www\.peterspages\.net/?scan4mail\)|Mozilla/4\.0 \(compatible; Advanced Email Extractor v2\.xx\)|Mozilla/4\.0 \(compatible; Iplexx Spider/1\.0 http\://www\.iplexx\.at\)|Mozilla/4\.0 \(compatible; MSIE 5\.0; Windows NT; DigExt; DTS Agent|Mozilla/4\.0 efp@gmx\.net|Mozilla/5\.0 \(Version\: xxxx Type\:xx\)|NameOfAgent \(CMS Spider\)|NASA Search 1\.0|Nsauditor/1\.x|PBrowse 1\.4b|PEval 1\.4b|Poirot|Port Huron Labs|Production Bot 0116B|Production Bot 2016B|Production Bot DOT 3016B|Program Shareware 1\.0\.2|PSurf15a 11|PSurf15a 51|PSurf15a VA|psycheclone|RSurf15a 41|RSurf15a 51|RSurf15a 81|searchbot admin@google\.com|ShablastBot 1\.0|snap\.com beta crawler v0|Snapbot/1\.0|Snapbot/1\.0 \(Snap Shots, \+http\://www\.snap\.com\)|sogou develop spider|Sogou Orion spider/3\.0\(\+http\://www\.sogou\.com/docs/help/webmasters\.htm#07\)|sogou spider|Sogou web spider/3\.0\(\+http\://www\.sogou\.com/docs/help/webmasters\.htm#07\)|sohu agent|SSurf15a 11 |TSurf15a 11|Under the Rainbow 2\.2|User-Agent\: Mozilla/4\.0 \(compatible; MSIE 6\.0; Windows NT 5\.1\)|VadixBot|WebVulnCrawl\.unknown/1\.0 libwww-perl/5\.803|Wells Search II|WEP Search 00
# This doesn't work well:
# failregex = ^<HOST> -.*"(GET|POST|HEAD).*HTTP.*"(?:%(badbots)s|%(badbotscustom)s)"$
# Use this one instead (it also tolerates a missing protocol minor version and a "-" byte count):
failregex = ^<HOST> -[^"]*"(?:GET|POST|HEAD) \/.* HTTP\/\d(?:\.\d+)?" \d+ (?:\d+|-) "[^"]*" "[^"]*(?:%(badbots)s|%(badbotscustom)s)[^"]*"$
ignoreregex =
datepattern = ^[^\[]*\[({DATE})
              {^LN-BEG}
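To see how the pieces fit together, here is a rough Python approximation of what Fail2Ban does with this filter at match time: %(...)s is ordinary %-interpolation and <HOST> becomes a capture group for the client IP. The bot lists below are shortened stand-ins for the full ones above, and the log line is a made-up example.

```python
# Expand the filter roughly the way Fail2Ban does, then match a sample
# combined-log-format line containing a bad-bot user agent.
import re

badbotscustom = r"ClaudeBot|GPTBot|Bytespider|AhrefsBot|SemrushBot"
badbots = r"EmailSiphon|EmailSpider|psycheclone"

failregex = (
    r'^<HOST> -[^"]*"(?:GET|POST|HEAD) \/.* HTTP\/\d(?:\.\d+)?" \d+ (?:\d+|-) '
    r'"[^"]*" "[^"]*(?:%(badbots)s|%(badbotscustom)s)[^"]*"$'
) % {"badbots": badbots, "badbotscustom": badbotscustom}

# Fail2Ban substitutes <HOST> with a host-matching group; a simple
# non-whitespace group is close enough for this demonstration.
pattern = re.compile(failregex.replace("<HOST>", r"(?P<host>\S+)"))

line = ('203.0.113.7 - - [10/May/2025:12:00:00 +0000] "GET /page HTTP/1.1" '
        '200 1234 "-" "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"')
m = pattern.search(line)
print(m.group("host") if m else "no match")  # 203.0.113.7
```

Note that the alternation only needs a substring of the user agent (it sits between two [^"]* runs), so a single entry like GPTBot covers any version suffix.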
You can test the regex against your existing logs with fail2ban-regex:
fail2ban-regex /var/log/apache2/access.log /etc/fail2ban/filter.d/apache-badbots.conf
This reports how many log lines matched and how many did not. Restart Fail2Ban and check that the apache-badbots jail is enabled:
# /etc/init.d/fail2ban restart
# fail2ban-client status
Status
|- Number of jail: 2
`- Jail list: apache-badbots, sshd
You will also want to verify that bans are actually being applied by checking the firewall backend you are using. With iptables, for example, the chain (shown by iptables -L f2b-apache-badbots -n) looks like this:
Chain f2b-apache-badbots (1 references)
target prot opt source destination
REJECT 0 -- 51.8.102.147 0.0.0.0/0 reject-with icmp-port-unreachable
REJECT 0 -- 51.8.102.102 0.0.0.0/0 reject-with icmp-port-unreachable
REJECT 0 -- 3.89.170.186 0.0.0.0/0 reject-with icmp-port-unreachable
REJECT 0 -- 54.167.32.123 0.0.0.0/0 reject-with icmp-port-unreachable
REJECT 0 -- 34.206.212.24 0.0.0.0/0 reject-with icmp-port-unreachable
REJECT 0 -- 3.229.2.217 0.0.0.0/0 reject-with icmp-port-unreachable
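If you want to pull the banned addresses out of that output programmatically, say for monitoring or reporting, a small sketch like the one below works; the inline listing is abbreviated from the iptables output above.

```python
# Extract the banned source IPs from `iptables -L f2b-apache-badbots -n`
# output: every REJECT rule's fourth column is the banned address.
IPTABLES_OUTPUT = """\
Chain f2b-apache-badbots (1 references)
target     prot opt source           destination
REJECT     0    --  51.8.102.147     0.0.0.0/0    reject-with icmp-port-unreachable
REJECT     0    --  3.89.170.186     0.0.0.0/0    reject-with icmp-port-unreachable
"""

banned = [
    fields[3]
    for line in IPTABLES_OUTPUT.splitlines()
    if (fields := line.split()) and fields[0] == "REJECT"
]
print(banned)  # ['51.8.102.147', '3.89.170.186']
```

Alternatively, fail2ban-client status apache-badbots lists the currently banned IPs directly, without going through the firewall.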
If you need help setting this up, feel free to submit a support ticket—we’re happy to assist.