I’ve been in a similar situation, and I’m also blocking large ranges of IP addresses in addition to running Anubis in front of my most scraped services (Git/Forgejo and Lemmy).
I came up with a hacky Python script that watches my fail2ban logs and counts bans for IP ranges going from /28 to /8. It applies some heuristics I came up with (based on range size n and how offending IPs are split between the two /(n+1) subranges) to detect ranges that should be blocked, then issues a log line that is picked up by fail2ban to manage bans of increasing length for repeat offenders.
It’s quite contrived and I often fear it will be too aggressive and block something I rely on, but it has been working really well in my experience.
It will initially block a lot of small ranges, but over time the ranges will grow larger. Smaller ranges having a lower threshold helps it block only the narrowest ranges needed, which gives some time for larger ranges that contain them to drop out of fail2ban’s watchlist.
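To make the idea concrete, here is a minimal sketch of that kind of range detection. It is not the actual script: the function names, the `threshold` formula, and the `min_share` balance rule are all assumptions chosen for illustration, and it only handles IPv4.

```python
import ipaddress
from collections import Counter

def threshold(prefix_len: int) -> int:
    """Hypothetical rule: the larger the range, the more banned IPs it needs."""
    return 2 + 2 * (28 - prefix_len)  # /28 needs 2 bans, /8 needs 42

def ranges_to_block(banned_ips, min_share=0.25):
    """Return candidate ranges (/28 up to /8), narrowest first.

    A range /n qualifies when it holds enough banned IPs AND those IPs are
    spread over both of its /(n+1) halves. If they all sit in one half,
    a narrower range is enough. Both rules are illustrative assumptions.
    """
    ips = {ipaddress.ip_address(ip) for ip in banned_ips}  # de-duplicate

    # Count banned IPs per enclosing network; /29 is counted too so that
    # the halves of a /28 can be looked up.
    counts = Counter()
    for ip in ips:
        for n in range(8, 30):
            counts[ipaddress.ip_network(f"{ip}/{n}", strict=False)] += 1

    selected = []
    for net, total in counts.items():
        if net.prefixlen > 28 or total < threshold(net.prefixlen):
            continue
        halves = [counts.get(h, 0) for h in net.subnets(prefixlen_diff=1)]
        if min(halves) >= min_share * total:
            selected.append(net)

    # Narrowest ranges first (highest prefix length)
    return sorted(selected, key=lambda net: -net.prefixlen)

if __name__ == "__main__":
    bans = ["203.0.113.1", "203.0.113.2", "203.0.113.9", "203.0.113.10"]
    for net in ranges_to_block(bans):
        print(net)
```

In a real setup the list of banned IPs would be parsed from the fail2ban log, and each selected range would be written out as a log line for a dedicated jail to match. Those parts depend on the local log format and jail configuration, so they are left out here.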
I should clean up this mess and make it a git repo, maybe even try to have it merged into fail2ban.
I am curious to know more!
So we’re at the point where AI is not only stealing intellectual property, but also driving up costs for people while doing it.
At this point we need to treat AI web scrapers as DDoS attacks and prosecute the companies and people involved the same way we would DDoS attackers.



