Hacker News | dr_um's comments

A small, non-static e-commerce site focused on a single EU country, with proper robots.txt instructions that worked perfectly well back in the "era" when only search engines and similar bots came crawling, and with rate limiting in an nginx/php-fpm setup, is kind of struggling without CF (Cloudflare) to handle 15,000 requests per 15 minutes coming from Chrome "users" on IPv6. The best result so far was an average server load of 40 in htop on an 8-core server x_x

That's about 16.7 rps. A single guy holding the F5 key in Chrome can generate that much traffic and take down your website. That kind of performance was never acceptable.

People will always reframe their request numbers to avoid stating their pitiful requests-per-second figures; it's hilarious. "This thing is handling hundreds of thousands of requests per day!" Like, cool, you're barely reaching double-digit requests per second.
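To make the conversion concrete, here is a minimal Python sketch that turns a request count over a period into an average rate. The helper name `requests_per_second` is hypothetical, and the 300,000-per-day figure is only an illustrative stand-in for "hundreds of thousands per day", not a number from the thread.

```python
def requests_per_second(count, period_seconds):
    """Convert a request count over a period into an average requests-per-second rate."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    return count / period_seconds

FIFTEEN_MINUTES = 15 * 60   # 900 s
ONE_DAY = 24 * 60 * 60      # 86,400 s

# The figure from the thread: 15,000 requests per 15 minutes.
print(f"{requests_per_second(15_000, FIFTEEN_MINUTES):.1f} rps")  # 16.7 rps

# Hypothetical "hundreds of thousands per day" claim: 300,000 requests/day.
print(f"{requests_per_second(300_000, ONE_DAY):.1f} rps")         # 3.5 rps
```

This gives a mean over the whole period; real traffic is bursty, so the peak rate a server has to survive can be well above the average.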

> handle 15000 requests per 15 minutes,

that's just ~17 req/sec

That's "cheap VPS running wordpress" level of traffic


Maybe a plain WordPress install. Run something like WooCommerce and install a bunch of plugins to get the functionality that WordPress and WooCommerce should have built-in, and suddenly a cheap VPS can only handle 2 or 3 requests per second.

It's phenomenal how inefficient the WordPress/WooCommerce stack is.

Though the main issue I'm seeing is credit card testing, not scraping.

And I'm ideologically opposed to using a CDN (because it shouldn't be needed for such a small site!), so it's a somewhat self-inflicted problem...


"Security" plugins are also a HUGE problem here. Most of them turn "a few cached DB SELECTs" (or a static file read, if you use a caching plugin) into a bunch of INSERTs, just to log/analyze the "offender" IP and maybe block it. In many cases that makes "blocking the offender" more costly than serving the page without the security plugin would have been.

You can calculate a day's traffic stats per IP/subnet, and the bots will probably stand out. If they are using IPv6, you can figure out the ASN and block it completely.
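A minimal sketch of the per-subnet tally, assuming the access log uses nginx's default "combined" format (client IP as the first field). Grouping IPv4 by /24 and IPv6 by /64 is an assumption to tune for your own traffic, and the helper names `subnet_of` and `top_subnets` are hypothetical. Mapping a noisy subnet to its ASN needs an external IP-to-ASN dataset, which is not shown here.

```python
import ipaddress
from collections import Counter

# Assumed grouping: /24 for IPv4, /64 for IPv6 (a common single-site IPv6 allocation).
V4_PREFIX = 24
V6_PREFIX = 64

def subnet_of(ip_text):
    """Map a single client IP to the subnet it belongs to."""
    addr = ipaddress.ip_address(ip_text)
    prefix = V4_PREFIX if addr.version == 4 else V6_PREFIX
    # strict=False zeroes the host bits instead of raising an error.
    return ipaddress.ip_network(f"{addr}/{prefix}", strict=False)

def top_subnets(log_lines, n=10):
    """Count requests per subnet, assuming the client IP is the first
    whitespace-separated field of each log line."""
    counts = Counter()
    for line in log_lines:
        fields = line.split(maxsplit=1)
        if not fields:
            continue  # skip blank lines
        try:
            counts[subnet_of(fields[0])] += 1
        except ValueError:
            continue  # skip lines whose first field is not an IP address
    return counts.most_common(n)

# Usage with documentation-range addresses:
sample = [
    '2001:db8:abcd:12::1 - - [01/Jan/2025:12:00:00 +0000] "GET / HTTP/1.1" 200 512',
    '2001:db8:abcd:12::2 - - [01/Jan/2025:12:00:01 +0000] "GET /cart HTTP/1.1" 200 512',
    '203.0.113.7 - - [01/Jan/2025:12:00:02 +0000] "GET / HTTP/1.1" 200 512',
]
for subnet, hits in top_subnets(sample):
    print(subnet, hits)
# 2001:db8:abcd:12::/64 2
# 203.0.113.0/24 1
```

Counting per /64 rather than per address matters for IPv6: a single client can rotate through many addresses inside its prefix, so per-address counts can look harmless while the subnet total does not.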

Block out IPv6 and see if that helps.

Why not block all odd v4 addresses while you're at it? I heard that that can reduce scraping volume by 50%!

That's harder to set up, and also unfair to people who have an odd IP address.

It's easier and better to just block 0.0.0.0/1 half of the time, and 128.0.0.0/1 for the other half of the time. Switch every day at noon.

Bot traffic will be cut by 50%, and humans are all treated equally! It's a total win!


And blocking ipv6 addresses isn't unfair to people who have an ipv6 address?

Yeah, I suppose you're right.

Just block it all.


Blocking Singapore reduces the AI load 90%.
