Stream: troubleshooting

Topic: Heavy crawling on site causing crashes


Deirdre Kirmis (Apr 08 2024 at 15:53):

Hi all .. i know i've asked this before but we are facing it again .. over the weekend our dataverse installation was being heavily crawled by someone in the Huawei cloud in Singapore (it still is) .. I could not get it to stay up for more than 20 minutes .. we do have our site going through CloudFlare with some basic WAF rules and a rate limiting rule, but it wasn't enough .. so I ended up adding an AWS WAF rule to the load balancer to block bots .. however I think this may prove to be too aggressive .. anyone have any suggestions on the best way to control crawling? we are also currently blocking some known bad bots but only a few specific ones

Deirdre Kirmis (Apr 08 2024 at 15:55):

i believe i have implemented all of these things

Philip Durbin 🚀 (Apr 08 2024 at 20:20):

I'm not sure if this helps but Dataverse 6.2 allows rate limiting of commands.

Philip Durbin 🚀 (Apr 08 2024 at 20:20):

Please see https://github.com/IQSS/dataverse/releases/tag/v6.2
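
For readers unfamiliar with the general idea of rate limiting, here is a minimal token-bucket sketch in Python. It is only an illustration of the technique, not how Dataverse 6.2 implements it, and `TokenBucket`, `capacity`, and `period_s` are hypothetical names; the real settings are described in the release notes linked above.

```python
class TokenBucket:
    """Hypothetical per-client limiter: about `capacity` requests per `period_s` seconds."""

    def __init__(self, capacity: int, period_s: float) -> None:
        self.capacity = capacity
        self.period_s = period_s
        self.tokens = float(capacity)  # start with a full bucket
        self.last = 0.0                # timestamp of the previous call

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.last)
        refill = elapsed * self.capacity / self.period_s
        self.tokens = min(float(self.capacity), self.tokens + refill)
        self.last = now
        # Spend one token per request; reject when the bucket is empty.
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def demo() -> list:
    bucket = TokenBucket(capacity=2, period_s=60.0)
    # Three requests at t=0 (the third is rejected), then one at t=30 after a partial refill.
    return [bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.0), bucket.allow(30.0)]
```

Passing the timestamp in explicitly keeps the sketch deterministic; a real limiter would read a monotonic clock and keep one bucket per user or IP address.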

Philip Durbin 🚀 (Apr 08 2024 at 20:21):

Your WAF solution sounds pretty good to me. But it's too aggressive?

Deirdre Kirmis (Apr 08 2024 at 20:32):

ohhhh the rate limiting db options! need to upgrade to v6.2

Deirdre Kirmis (Apr 08 2024 at 20:35):

yea i put the aws bot control rule in and it is blocking our OAI feed .. doing lots of troubleshooting to test which particular rule is doing it
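
The troubleshooting described here (finding which rule catches the OAI feed, then exempting it) can be modelled offline with a small Python sketch. Everything in it is an assumption for illustration: the rule names are hypothetical stand-ins rather than actual AWS Bot Control rule names, the predicates are guesses at what a bot rule might match, and `/oai` is assumed to be the harvesting path.

```python
from typing import Callable, Dict, List, NamedTuple


class Request(NamedTuple):
    path: str
    user_agent: str


# Hypothetical stand-ins for WAF rules: name -> predicate that is True when the rule would block.
Rule = Callable[[Request], bool]

RULES: Dict[str, Rule] = {
    "empty_user_agent": lambda r: r.user_agent.strip() == "",
    "scripted_client": lambda r: any(
        s in r.user_agent.lower() for s in ("python", "curl", "java")
    ),
    "known_bad_bot": lambda r: "badbot" in r.user_agent.lower(),
}

# Assumption: harvesters reach the OAI feed under this path prefix.
EXEMPT_PREFIXES = ("/oai",)


def blocking_rules(req: Request) -> List[str]:
    """Evaluate every rule on its own and report which ones would block the request."""
    return [name for name, rule in RULES.items() if rule(req)]


def is_blocked(req: Request) -> bool:
    """Check the exemption first, so harvesting traffic never reaches the bot rules."""
    if req.path.startswith(EXEMPT_PREFIXES):
        return False
    return bool(blocking_rules(req))
```

For example, `blocking_rules(Request("/oai", "Java/11 harvester"))` reports `["scripted_client"]`, while `is_blocked` returns `False` for the same request because of the exemption. An exempted path is still open to crawlers, so it would probably need its own rate limit.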


Last updated: Oct 30 2025 at 06:21 UTC