Hi all. I know I've asked this before, but we are facing it again. Over the weekend our Dataverse installation was heavily crawled by someone in the Huawei Cloud in Singapore (it still is), and I could not keep it up for more than 20 minutes. Our site goes through Cloudflare with some basic WAF rules and a rate-limiting rule, but that wasn't enough, so I ended up adding an AWS WAF rule to the load balancer to block bots. However, I think this may prove too aggressive. Does anyone have suggestions on the best way to control crawling? We are also currently blocking some known bad bots, but only a few specific ones.
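For anyone following along, here is a generic sketch of what a per-client rate limit does, using the common token-bucket technique: each client can burst up to a fixed capacity, after which requests are only allowed as tokens refill over time. This is not how Cloudflare, AWS WAF, or Dataverse implement their limits; the class name, parameters, and example IPs are hypothetical.

```python
import time


class TokenBucketLimiter:
    """Hypothetical per-client token bucket: each client may burst up to
    `capacity` requests, then is limited to `refill_rate` requests/second."""

    def __init__(self, capacity: float, refill_rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock
        # client key -> (tokens remaining, time of last update).
        # Unbounded here; a real limiter would evict idle clients.
        self._state: dict[str, tuple[float, float]] = {}

    def allow(self, client_ip: str) -> bool:
        now = self.clock()
        tokens, last = self._state.get(client_ip, (self.capacity, now))
        # Refill in proportion to elapsed time, never above capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_rate)
        if tokens >= 1:
            self._state[client_ip] = (tokens - 1, now)
            return True
        self._state[client_ip] = (tokens, now)
        return False


# Demo with a fake clock so the result is deterministic.
fake_now = [0.0]
limiter = TokenBucketLimiter(capacity=5, refill_rate=1.0, clock=lambda: fake_now[0])
burst = [limiter.allow("203.0.113.7") for _ in range(7)]
print(burst)  # [True, True, True, True, True, False, False]
fake_now[0] = 2.0  # two seconds later: two tokens have refilled
print(limiter.allow("203.0.113.7"))   # True
print(limiter.allow("198.51.100.9"))  # True (separate bucket per client)
```

Keying on client IP is the simplest choice, but a crawler spread across many cloud IPs can stay under every per-IP limit, which is why blocking by network or ASN is sometimes used alongside it.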
I believe I have implemented all of these things.
I'm not sure if this helps, but Dataverse 6.2 allows rate limiting of commands.
Please see https://github.com/IQSS/dataverse/releases/tag/v6.2
Your WAF solution sounds pretty good to me. But you think it's too aggressive?
Ohhh, the rate-limiting DB options! We need to upgrade to v6.2.
Yeah, I put the AWS Bot Control rule in and it is blocking our OAI feed. I'm doing lots of troubleshooting to find which particular rule is causing it.
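Since Bot Control is blocking the OAI feed, one common pattern is to let OAI-PMH requests through before any bot rule sees them. In AWS WAF that would roughly mean an Allow rule with a higher priority (lower number) than the Bot Control rule group, or a scope-down statement on the rule group. The sketch below is only a plain-Python model of that first-match ordering, not AWS's API; the `/oai` path, the rule names, and the user-agent heuristic are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Request:
    path: str
    user_agent: str


@dataclass
class Rule:
    name: str
    matches: Callable[[Request], bool]
    action: str  # "ALLOW" or "BLOCK"; both stop evaluation in this model


def evaluate(rules: list[Rule], request: Request,
             default: str = "ALLOW") -> tuple[str, Optional[str]]:
    """Return (action, rule name) for the first matching rule, or the
    default action if nothing matches."""
    for rule in rules:
        if rule.matches(request):
            return rule.action, rule.name
    return default, None


def is_oai(req: Request) -> bool:
    # Assumed OAI endpoint path; adjust to wherever your feed is served.
    return req.path == "/oai" or req.path.startswith("/oai/")


def looks_like_bot(req: Request) -> bool:
    # Hypothetical stand-in for a bot-detection rule: flags non-browser user agents.
    return not req.user_agent.lower().startswith("mozilla/")


# Order matters: the OAI allow rule is checked before the bot block rule.
rules = [
    Rule("allow-oai", is_oai, "ALLOW"),
    Rule("block-bots", looks_like_bot, "BLOCK"),
]

harvester = Request(path="/oai", user_agent="python-requests/2.31")
crawler = Request(path="/dataset.xhtml", user_agent="python-requests/2.31")
browser = Request(path="/dataset.xhtml", user_agent="Mozilla/5.0")
print(evaluate(rules, harvester))  # ('ALLOW', 'allow-oai')
print(evaluate(rules, crawler))    # ('BLOCK', 'block-bots')
print(evaluate(rules, browser))    # ('ALLOW', None)
```

The trade-off is that anything hitting the exempted path also skips the bot check, so that path may still want its own rate limit.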
Last updated: Oct 30 2025 at 06:21 UTC