Anonymous 15:40

AI bots crawling servers in recent months: what is the best tool or approach to counter them?

Since February 2026, I have been seeing huge traffic spikes on Internet-accessible servers located in Spain (nginx, Apache, Tomcat), caused by AI bots crawling for content.

Is there a recommended way to address this?

I am currently using the typical reactive approach of automatically throttling IPs with custom scripts, but I wanted to know if there is a better way.

The fewer proxies, extra software, or containers I have to add, the better: the servers are already struggling because of internal AI deployments.



Top Answer/Comment:

If you only want to relieve the load caused by massive bot crawling, rate limiting in your web server is a better choice than introducing more proxies or Docker containers.

For Nginx, the limit_req and limit_conn directives will most likely be enough to slow crawlers down without harming legitimate requests. Combined with appropriate timeouts and connection limits, they can save a considerable amount of resources.
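To make the limit_req behaviour more concrete, here is a minimal Python model of the per-client decision it makes. nginx documents limit_req as a "leaky bucket" limiter; the sketch below is a simplified illustration of that idea, not nginx's own code, and the class name `LeakyBucketLimiter` is hypothetical. It models `burst=N nodelay` (requests within the burst are served immediately, anything beyond is rejected); without `nodelay`, nginx delays the burst requests instead of serving them at once.

```python
from dataclasses import dataclass, field


@dataclass
class LeakyBucketLimiter:
    """Hypothetical, simplified per-key model of an nginx-style limit_req decision.

    rate:  requests per second that drain out of the bucket
    burst: how many requests above the steady rate are tolerated
    """

    rate: float
    burst: int
    # key (e.g. client IP) -> (current excess, time of last accepted request)
    _state: dict[str, tuple[float, float]] = field(default_factory=dict, repr=False)

    def allow(self, key: str, now: float) -> bool:
        state = self._state.get(key)
        if state is None:
            # First request from this key: accept it with an empty bucket.
            self._state[key] = (0.0, now)
            return True
        excess, last = state
        # Drain the bucket for the elapsed time, then add this request.
        excess = max(0.0, excess - self.rate * (now - last) + 1.0)
        if excess > self.burst:
            # Over the limit: reject without updating state
            # (nginx answers 503 by default, see limit_req_status).
            return False
        self._state[key] = (excess, now)
        return True


if __name__ == "__main__":
    limiter = LeakyBucketLimiter(rate=1.0, burst=2)
    # Five requests from the same IP at t=0, then one more a second later.
    results = [limiter.allow("203.0.113.7", 0.0) for _ in range(5)]
    results.append(limiter.allow("203.0.113.7", 1.0))
    print(results)  # [True, True, True, False, False, True]
```

Note that this sketch never forgets idle clients; nginx keeps this state in a fixed-size shared memory zone declared with limit_req_zone and removes the least recently used entries when the zone fills up.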

It also helps to investigate the problematic traffic first. Many of the so-called “AI bots” identify themselves in the User-Agent header and respect robots.txt; others behave more like ordinary scrapers. Some practical measures include:
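Before tuning any limits, a quick tally of the access log shows whether the spikes come from crawlers that announce themselves or from anonymous scrapers. A small Python sketch, assuming logs in the standard nginx/Apache "combined" format; `summarize` is a hypothetical helper, and the `AI_TOKENS` tuple holds a few example crawler names, not an exhaustive or authoritative list.

```python
import re
from collections import Counter

# nginx/Apache "combined" format (assumed):
# IP - user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# Example self-declared crawler tokens; not exhaustive, check each vendor's docs.
AI_TOKENS = ("GPTBot", "ClaudeBot", "CCBot")


def summarize(lines, n=5):
    """Return the top-n User-Agents and how many requests carry a known AI token."""
    per_ua = Counter()
    declared_ai = 0
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # skip lines that are not in the combined format
        ua = m.group("ua")
        per_ua[ua] += 1
        if any(tok in ua for tok in AI_TOKENS):
            declared_ai += 1
    return per_ua.most_common(n), declared_ai


if __name__ == "__main__":
    # Made-up log lines for illustration only.
    sample = [
        '198.51.100.4 - - [10/Mar/2026:10:00:00 +0100] "GET /docs HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
        '198.51.100.4 - - [10/Mar/2026:10:00:01 +0100] "GET /docs/2 HTTP/1.1" 200 498 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
        '203.0.113.9 - - [10/Mar/2026:10:00:02 +0100] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    ]
    top, ai = summarize(sample)
    print(top)
    print("requests with a declared AI token:", ai)
```

If most of the load carries a declared token, robots.txt rules and per-agent limits are worth trying first; if it does not, the rate and connection limits below matter more.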

  • Rate limiting requests per IP.
  • Limiting concurrent connections.
  • Serving cached content whenever possible.
  • Blocking or throttling known abusive networks.
  • Using firewall-level controls (nftables, iptables, or provider firewall rules) before traffic reaches the application.
  • Restricting access to endpoints that do not need to be publicly crawled.
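The "limiting concurrent connections" item boils down to a per-client counter: a new connection is admitted only while that client has fewer than N in flight. Below is a hypothetical Python sketch of that idea, similar in spirit to nginx's limit_conn but not its implementation; in nginx you would use the directive itself, so the class (`ConnectionLimiter`, a made-up name) is only for reasoning about the behaviour or for a service with no such limit in front of it.

```python
import threading
from contextlib import contextmanager


class ConnectionLimiter:
    """Hypothetical helper: caps simultaneous in-flight connections per key
    (e.g. client IP). Thread-safe."""

    def __init__(self, max_per_key: int) -> None:
        self.max_per_key = max_per_key
        self._active: dict[str, int] = {}
        self._lock = threading.Lock()

    def try_acquire(self, key: str) -> bool:
        with self._lock:
            current = self._active.get(key, 0)
            if current >= self.max_per_key:
                return False  # client already has too many connections open
            self._active[key] = current + 1
            return True

    def release(self, key: str) -> None:
        with self._lock:
            remaining = self._active.get(key, 0) - 1
            if remaining > 0:
                self._active[key] = remaining
            else:
                self._active.pop(key, None)  # drop idle clients to keep the table small

    @contextmanager
    def slot(self, key: str):
        """Yield True if a slot was obtained; release it on exit."""
        acquired = self.try_acquire(key)
        try:
            yield acquired
        finally:
            if acquired:
                self.release(key)


if __name__ == "__main__":
    limiter = ConnectionLimiter(max_per_key=2)
    ip = "198.51.100.4"
    with limiter.slot(ip) as a, limiter.slot(ip) as b, limiter.slot(ip) as c:
        print(a, b, c)  # True True False
    print(limiter.try_acquire(ip))  # True: the earlier slots were released
```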

Bear in mind that IP blocking alone is becoming less effective, since many crawlers operate from addresses spread across cloud providers. For that reason, rate limiting is usually preferable to maintaining huge, constantly updated blocklists.
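One way to see the distributed-crawler problem is to count traffic per network instead of per address: fifty addresses that each send a single request stay under any per-IP limit, yet add up to fifty requests from one /24. The sketch below groups addresses with Python's standard `ipaddress` module; `hits_per_prefix` is a hypothetical helper, the /24 and /48 prefix lengths are assumptions to tune, and grouping this broadly can also catch legitimate users who share a network.

```python
import ipaddress
from collections import Counter


def hits_per_prefix(ips, v4_prefix=24, v6_prefix=48):
    """Group client IPs into networks (default /24 for IPv4, /48 for IPv6)
    and count requests per network."""
    counts = Counter()
    for ip in ips:
        addr = ipaddress.ip_address(ip)
        prefix = v4_prefix if addr.version == 4 else v6_prefix
        # strict=False masks off the host bits, e.g. 198.51.100.7/24 -> 198.51.100.0/24
        net = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        counts[net] += 1
    return counts


if __name__ == "__main__":
    # Made-up addresses from the documentation ranges.
    ips = [f"198.51.100.{i}" for i in range(1, 51)]  # 50 IPs, 1 request each
    ips += ["203.0.113.5"] * 3 + ["2001:db8::1", "2001:db8::2"]
    for net, n in hits_per_prefix(ips).most_common():
        print(net, n)
    # 198.51.100.0/24 50
    # 203.0.113.0/24 3
    # 2001:db8::/48 2
```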

Since your resources are already scarce, rate limiting in Nginx/Apache is also the cheapest option in terms of overhead: it adds no extra components and still protects your servers from overly aggressive crawling.
