Someone attacked our company (usefathom.com)
234 points by joshmanders 11 months ago | 108 comments



Hey as a (way smaller) competitor of yours, I honestly feel you.

I understand you may not want my advice, but anyway, here it goes :)

I have dealt with similar issues too at my full-time job. We ingest data in the billions range too, and our costs are a lot lower than what you describe.

* On the public endpoint, can you move the first point of contact closer to the edge? You want to block requests before they even reach your ingestion pipeline. If not, add a few geographical Load Balancers with the sole job of accepting requests before forwarding them to your Lambda. Divert traffic at the DNS level using a geo routing policy on Route53.

* Are you also hitting DynamoDB for every request as part of your spam system? I found it quite expensive on-demand, unless you provision capacity. We used to pay tens of thousands per month for DynamoDB when a single Redis instance would have done the job. Especially if it's temporary, non-critical data such as spam detection.

* Do you batch the incoming events before sending them to SQS? (See the sketch after this list.)

* How are your networking costs going? They tend to creep up in the AWS bills too.
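To illustrate the batching point: SQS's SendMessageBatch accepts up to 10 messages per call, so you pay for one request instead of ten. A rough sketch of what I mean (Python/boto3, queue URL is hypothetical):

    import json
    import uuid
    import boto3

    sqs = boto3.client("sqs")
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/pageviews"  # hypothetical

    def flush(events):
        # Send up to 10 buffered events per SQS call instead of one call per pageview.
        for i in range(0, len(events), 10):
            chunk = events[i:i + 10]
            sqs.send_message_batch(
                QueueUrl=QUEUE_URL,
                Entries=[
                    {"Id": str(uuid.uuid4()), "MessageBody": json.dumps(e)}
                    for e in chunk
                ],
            )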

It just sounds to me like you clearly need something in between the public endpoint and those Lambdas, but maybe I'm missing some info. Otherwise you're going to keep paying per request for capacity that a "traditional" server could handle without costing you more for each request.

If you can't fight back, and don't want real requests to be lost, put a server at the battle-front, and buffer all raw HTTP requests into a queue made for this (like Kinesis). You can then process events and recover the data at your own pace. I did this in the past, and could handle 60k+ req/s with 2 instances on AWS. I used a simple Go tool to capture the traffic https://github.com/buger/goreplay

Also, at my full-time job we use Kinesis to buffer all analytics events, it's cheaper than SQS and handles billions of data points per month. This also keeps the ingest rate constant, but you seem to already do that with your workers.
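As a rough sketch of the buffering idea (Python/boto3, stream name is hypothetical): PutRecords takes up to 500 records per call, so a front server can coalesce raw events and let consumers drain them at their own pace.

    import json
    import boto3

    kinesis = boto3.client("kinesis")
    STREAM = "raw-pageviews"  # hypothetical stream name

    def buffer_events(events):
        # Up to 500 records per PutRecords call; partition by client IP to spread the shards.
        for i in range(0, len(events), 500):
            chunk = events[i:i + 500]
            kinesis.put_records(
                StreamName=STREAM,
                Records=[
                    {"Data": json.dumps(e).encode(), "PartitionKey": e.get("ip", "unknown")}
                    for e in chunk
                ],
            )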

Full disclosure: I'm Anthony from http://panelbear.com/ , and I just want to offer honest help.


Additionally, be sure to check how your script behaves with SPAs and the browser history API. You seem to be binding to the browser history as-is, and might be firing lots of non-page view events.

It's not just a theoretical case; I've seen this with many customers at Panelbear. Many JS frameworks use the history API in ways you may not be considering.

That's why I would recommend de-duplicating events on the client-side before sending them to your ingestion pipeline.

But that's an aside from the DDoS matter.


> We will not let a lonely nerd attack our business

How do they even know it was "a lonely nerd"? In fact, it's much more likely that it was carried out by a well-socialized team of shady professionals, on a commission from competitors.

It's 2020, people, can we drop it with the "Hack3rs" stereotypes...?


> It's 2020, people, can we drop it with the "Hack3rs" stereotypes...?

Well, stereotype accuracy is one of the most robust and replicable findings from psychology research...


Citing the same field of study that came up with and prescribed lobotomies doesn't help your point.


While more likely (why would a random nerd have such a beef with them?), stating that without actual evidence is borderline conspiracy theory. I wouldn't have mentioned the hypothetical lone nerd at all, but I figure he was so angry he had to picture someone to be angry at.


I’d have been happy with some generic “shady characters”. It just feels lazy (and bigoted) in 2020 to use this sort of negative stereotype, particularly when coming from someone in the business. One cannot deal with a threat effectively if one mis-identifies the actual profile and capabilities of such threat.


If it's more likely then why can't it be used as the starting point? What's the conspiracy theory?


Evil competitor hired shady cybercriminals to put him out of business. It's the most likely explanation, but it looks weird if stated without actual evidence.


> I don’t know anybody who has signed up for this $3,000/month service from AWS… it’s called AWS Shield Advanced. The big value of this service to us is that we have access to some of the world’s best DDoS mitigation experts. In the event of an attack, we can page them, and they’ll help us mitigate the attack, creating firewall rules, identifying bad actors, and offering advice. So instead of just two of us responding to DDoS attacks, we have genius engineers we can speak with, and that feels good.

I'm just amazed at the amount of money and time that people are willing to give AWS so that they don't have to manage dedicated servers themselves. I mean, paying $36k/year just to have someone manage firewall rules is plain laughable... not to mention this is happening on "Hacker News".

> They're competitors of ours and it's just not a good fit.

I sincerely hope that AWS doesn't start an analytics product and become a "competitor", because it sounds like you'd have to rewrite all your software outside of AWS, and it sure seems extremely locked in...


You're trivializing the value they provide. If mitigating large scale DDoS attacks was as simple as adding a few iptables rules they would have solved this weeks before paying AWS $3,000 a month for what is effectively an insurance policy.

The reality is that AWS has terabits of bandwidth and can mitigate these attacks upstream from your servers, which have bandwidth measured in gbps, not tbps.

So, unless you have a global network the size and scale of Amazon or CloudFront, no, you can't just mitigate these attacks with a few firewall rules.


If someone throws a 10 Gbit DDoS at your home IP, how are a few firewall rules going to help?

I am not very well versed in DDoS, but that's just traffic generated from thousands of bots. It still takes bandwidth and will literally overload your link.

I assume AWS has multiple upstream links, so the traffic arrives spread across them and is easier for them to handle than for someone with one or a few upstream links.


I get why they don't want to use Cloudflare but reading that gave me faith in why we offer the free DDoS service we do. The expense and hassle of dealing with a DDoS and then the expense and hassle of dealing with AWS looks awful.


As a one-man SaaS, Cloudflare has been invaluable for me.

I'm currently still on your free tier, but would be happy to pay when I need to.

Just wanted to say, thank you!


Dealing with AWS hasn’t been hassle but is expensive. However, the value proposition is well worth it for us.


I'd be checking for bugs in the JS -- particularly ones that might be triggered from a bot environment that one or more of your customers might use to test their site or observe their site's latency response...

puppeteer with setRequestInterception configured to block responses in the right way maybe https://github.com/puppeteer/puppeteer/blob/v5.5.0/docs/api.....

I can imagine nightmares along those lines that could persuade normally functioning JS client logic, which works fine in a normal network environment, to continually reissue failed requests when browser requests/responses are blocked in just the wrong way -- that could cause the same client-side IP to appear in the logs over and over again.

And then the IP variability could be the customer spinning up browser bots in various regions around the world in order to measure their site's region-specific latency response.

Just offering up an alternative theory -- as I'm with some of the others here -- I'm not sure the graphs shown immediately scream DoS to me.


I fully believe this was a DDoS attack. But this is good advice. If you think you're getting DDoSed, check the number of unique IPs against your own user pool. Probably a lot more people introduce bugs into their codebase that DDoS their own app than get repeatedly targeted by a well-funded attacker as above.


The data was full of spam and was targeted at high-profile customers. It also ran every hour for one period, then ramped up to 500,000 concurrents. It was a DDoS :(


Loved this post, thanks for the detailed write-up. As a solo-SaaS-founder I have a persistent, but very mild anxiety of something like this happening. The attacks we've received have been comparably mild and easy to mitigate.

Boggles my mind who the heck thinks Fathom is valuable enough to target with such a persistent series of attacks.


Interesting that they won’t use Cloudflare due to competition in the analytics space. It makes me realize that there actually isn’t a legitimate competitor to Cloudflare and they are dominating right now.


There isn't any competition in the free-ish space, no. But there are some big-boy options: Incapsula/Imperva, DOSarrest, Akamai, SiteLock, Radware, and a few other smaller players. These are complex setups where you at minimum need to be running an AS and own your IP space and infrastructure -- you get all sorts of cool solutions like sending (router) flows to them to monitor your traffic, and when an attack is detected they'll automatically advertise your IP space for you, scrub traffic via witchcraft and GRE-tunnel your clean traffic back to you. But don't bother contacting them if you have less than four figures to spend monthly -- if you don't have at least a few cabinets of gear (if not your own DC) you're not the target market.

Cloudflare is dominating the "consumer-grade" market, but not really past that.


I work at Cloudflare. We offer protection for the "complex setups" you describe with BYO ASN, IP packet filtering + forwarding, and GRE (or PNI) for traffic return. And we're very good at it.

https://www.cloudflare.com/case-studies/wikimedia-foundation is an example deployment covering Wikimedia's datacenters.


You had me until the last sentence. I would urge you to do an nslookup on the top 10,000 Alexa domains. You would be surprised at how many point to Cloudflare DNS.


Isn't Netlify a major competitor to Cloudflare?


As a customer of both, everything about the way Netlify operates seems half-assed.


Vercel is good too. And I've heard great things about BunnyCDN. Cloudflare does a lot though.


Bitmitigate is also a competitor.


In the same way that Kitfox is an Airbus competitor?


BunnyCDN is amazing. But their product isn't a 1:1 replacement for Cloudflare.


What would the motivation be for such an attack? Any thoughts on who could be behind it?


I've seen a lot of SEO-motivated DDoS attacks in very high-value money-keyword niches. An attack basically downs a site for an extended period of time. Googlebot notices the slow speed and logs dozens of 500 errors, which can result in dropped rankings.

I did a quick check for usefathom.com though, the only somewhat interesting keyword would be "google analytics alternative", but fathom ranks on page 2, which imo would not really justify an attack. Maybe there is some other long-tail keyword with a high enough value though where usefathom.com pushed small niche players out of the top positions.

Also, a DDoS is obviously bad for the user experience of paying customers; nobody wants their analytics service to be down or to have missing data because of downtime.


Perhaps AWS needs to sell AWS Shield Advanced? Ha ha, I'm probably too paranoid.


You and me both, fargle. When the attacks started, I thought someone was attacking us just so I would write a technical blog post.


And why? What is "Fathom", what do they do that could have triggered such an attack? Or was that an attack to deliberately force them to turn to Cloudflare for "help"?


Some proxy engaged by the competition?


Since they seem to be some sort of analytics service bundled into client webpages, it's possible they are getting incidentally slammed by someone trying to attack one of their clients.


I've actually seen something similar for our client-embedded web plugin: a web crawler hits a client's site and I periodically see 6x traffic hit AWS WAF - fortunately the brunt of it is rate-limited.

I'd be almost certain that AWS staff looked at the logs to determine this; they would've sorted it out in 20 minutes by seeing a load of requests coming from a specific host.


PHP Laravel on Amazon Lambda to count pageviews? Are you sure it's a DDoS and not just a regular customer?

> privacy-first analytics solution [...] The only downside of this is that we need to keep access logs (IP & User-Agent, no browsing history) for 24 hours

Keeping IPs doesn't make you super privacy-friendly.


Awesome, I'd love some advice if you're willing. So here's our situation.

* We're getting hit with a huge DDoS attack, repeatedly over 3 weeks, with no sign of stopping

* With zero access logs, there was no way to find patterns in the attack, and we had no way to block it

* Our service was going offline during these attacks

* We introduced access logs that are auto-deleted after 24 hours. We redacted all information about the site/page/activity etc. but keep IP & User Agent for pattern matching

* We were then able to identify a pattern and block the attack on Saturday

* Without access logs (even redacted ones), this wasn't possible

I was hoping a more senior engineer on Hacker News would comment and I can't wait to hear how you'd do it. I have no experience in DDoS protection at all, and this seems like the only possible way. Even rate limiting requires storing IP addresses. But if you know a more privacy-focused way to block these attacks, I'm sure I'll buy you a few beers when we hang out.


> * We're getting hit with a huge DDoS attack, repeatedly over 3 weeks, with no sign of stopping

I've read the full blog post, and I am not convinced it's a DDoS attack. Traffic patterns for web analytics will come from all over the place and will look like a DDoS when it's not. For example, a customer misplacing their analytics in a JS loop and having a moderately trafficked blog will generate billions of requests from all over the globe. Event tracking alone can be billions of requests as well. Never attribute to malice that which is adequately explained by simpler means.

> * With zero access logs, there was no way to find patterns in the attack, and we had no way to block it

It's a loss of time and energy. Your system should be able to handle these billions of requests.

> * We were then able to identify a pattern and block the attack on Saturday

Was it specific accounts?

> * Without access logs (even redacted ones), this wasn't possible

You can one-way hash the IP. That way you can still look for patterns, but you've lost the actual IP. And the same one-way hashed IP can be used to block whichever IP seems devious in your firewall (something like MD5 can be enough).

> But if you know a more privacy-focused way to block these attacks, I'm sure I'll buy you a few beers when we hang out.

Haha, no need to. But come say hi if you're ever in Austin. julien _at_ serpapi.com.


> You can one-way hash the IP. That way you can still look for patterns, but you've lost the actual IP. And the same one-way hashed IP can be used to block whichever IP seems devious in your firewall.

The plaintext space (the number of possible IPs, and even fewer possible IP blocks) is so small that you can try all the possible plaintexts within seconds, essentially reversing the hash.
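To make that concrete, here's a minimal sketch (Python, MD5 as suggested above): you just enumerate candidate addresses and compare digests. A /24 takes milliseconds; with GPU tooling like hashcat, even the full IPv4 space doesn't hold up for long.

    import hashlib
    import ipaddress

    def reverse_ip_hash(target_hex, network="203.0.113.0/24"):  # candidate range is illustrative
        # Try every address in the candidate range until one matches the stored digest.
        for ip in ipaddress.ip_network(network):
            if hashlib.md5(str(ip).encode()).hexdigest() == target_hex:
                return str(ip)
        return None

    # reverse_ip_hash(hashlib.md5(b"203.0.113.42").hexdigest()) -> "203.0.113.42"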


You can store "possibly attacking" IPs in a counting Bloom filter and start blocking an IP when its counters saturate. Random false positives are probably not an issue for an analytics service that's inherently lossy anyway.

By using a Bloom filter you avoid storing IPs and save tons of space. They're basically required for rate limiting in IPv6, where the address space is so large that an attacker can run you out of RAM just by using that many different IPs.

It's also pretty common to have several thresholds of blocking based on individual IPs vs. subnets. So if more than two IPs get blocked in a small subnet, you block the whole subnet.
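A minimal sketch of the idea (Python; strictly speaking this is closer to a count-min sketch, and the sizes/threshold are made up): hash each IP into a few counter slots, bump them on every request, and block once the smallest counter crosses the threshold. No IP is ever stored, and memory stays fixed no matter how many addresses show up.

    import hashlib

    SLOTS = 1 << 20      # fixed-size counter array (made-up size)
    HASHES = 4           # counter slots touched per IP
    THRESHOLD = 1000     # requests per window before blocking (made up)
    counters = [0] * SLOTS

    def _slots(ip):
        # Derive HASHES slot indices from a single digest of the IP.
        digest = hashlib.sha256(ip.encode()).digest()
        return [int.from_bytes(digest[i * 4:i * 4 + 4], "big") % SLOTS for i in range(HASHES)]

    def over_limit(ip):
        # Count this request; the minimum counter is the best estimate for this IP.
        idx = _slots(ip)
        for i in idx:
            counters[i] += 1
        return min(counters[i] for i in idx) > THRESHOLD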


Just off the top of my head, haven't given this much thought.

Could you salt the hash with something way less predictable that gets discarded when the next one is generated? Assuming you only keep logs for a day, for example, you could regenerate it daily and store $salt somewhere only the most senior of senior techs can access it (if anyone at all).
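Something like this, as a rough sketch (Python; retention and key sizes are made up): generate a fresh random salt whenever the log window rolls over, HMAC the IP with it, and throw the old salt away so yesterday's pseudonyms can't be brute-forced back into addresses while same-day patterns are still visible.

    import hashlib
    import hmac
    import secrets

    class RotatingIpHasher:
        def __init__(self):
            self._salt = secrets.token_bytes(32)

        def rotate(self):
            # Run once per log-retention window (e.g. daily). The old salt is discarded,
            # so pseudonyms in already-expired logs can no longer be matched to IPs.
            self._salt = secrets.token_bytes(32)

        def pseudonym(self, ip):
            # Same IP -> same token within a window, so pattern matching still works.
            return hmac.new(self._salt, ip.encode(), hashlib.sha256).hexdigest()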


Then use a longer-to-run hash function, like 256 rounds of SHA-256 with a secret salt. It's reversible in theory, but it would take a lot of resources.


You can’t afford to do that to every 1500-byte packet at 10-100 Gbps.


It was 100% a layer 7 DDoS, the attack was targeted and malicious. I can't say too much but the AWS team confirmed it. But I appreciate how something like you describe could happen.

I like the idea of an MD5 hash. Although I'm not too certain why an IP would be a bad thing to log for 24 hours. From a privacy law perspective, the MD5 hash is considered PII. And if we see an IP address in an access log, we know that an IP visited one of the tens of thousands of websites Fathom runs on, but we don't know which one.

Edit: Something else: with MD5 there's no way of finding patterns within the IPs, so you'd have to play whack-a-mole. Whereas raw IPs allow pattern detection without any real privacy invasion.


> It was 100% a layer 7 DDoS, the attack was targeted and malicious. I can't say too much but the AWS team confirmed it. But I appreciate how something like you describe could happen.

Layer 7 just means they are making a ton of HTTP requests from a lot of IPs; however, that is the nature of web analytics.

> we know that an IP visited one of the tens of thousands of websites Fathom runs on, but we don't know which one.

How can the attackers know which websites have your analytics on? Crawling the web is super hard.

You should be able to get back which accounts are affected from your analytics call:

    https://starman.fathomdns.com/?p=%2F&h=https%3A%2F%2Fusefathom.com&r=&sid=BIABKBRK&res=1440x900
Isn't sid=BIABKBRK the account? The access log should store this `sid`, allowing you to just block the account (or ask them for a ton of money if it's legitimate use :))


> Layer 7 just means they are making a ton of HTTP requests from a lot of IPs; however, that is the nature of web analytics.

Exactly! Which is why it was so hard. There's no path pattern. Everything hits "/". So the only way to fight back is to match IP / header patterns (but even then, we have to redact sensitive headers).

> How can the attackers know which websites have your analytics on? Crawling the web is super hard.

The attacker went after some of our more high profile customers. They're known via testimonials or from Twitter.

> The access log should store this `sid`, allowing you to just block the account (or ask them for a ton of money if it's legitimate use :))

We could certainly temporarily block traffic to a site. The problem is that without some kind of firewall (e.g. a WAF), our application has to absorb all that traffic, and that's the issue. We need to block it at the edge.


> The problem is that without some kind of firewall (e.g. a WAF), our application has to absorb all that traffic, and that's the issue. We need to block it at the edge.

I think you need some high performance ingestion code.

One day somebody is going to load test or misconfigure their site. Or you'll get a really big client. Or somebody will get Reddit frontpaged or slashdotted. Blocking huge volumes of traffic isn't always going to be an option. You should be able to handle it.

I would get off all this "sexy" stuff like Lambda and SQS and build some old-school battle boxes at a colo. Run an extreme-performance framework like Actix or Vert.x built to handle giant piles of traffic. A single box with those frameworks and a 40 gig line can handle millions of requests a second.

Use counting Bloom filters synced between boxes for IP blocking. This avoids saving IPs and is needed to prevent RAM exhaustion attacks anyway. Use two layers: one for individual IPs and a second for blocking subnets with lots of bad actors in them. Block for ~24h by using dual sets of Bloom filters in a sliding window with 12h overlap, where IPs get added to both. When a filter is 24h old, discard it. This needs to be done because you can't remove items from Bloom filters; they saturate over time.
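Roughly, the dual-filter rotation looks like this (Python sketch; a plain set stands in for a real Bloom filter to keep it short): rotate every 12 hours, add blocked IPs to both filters, and an entry ages out when the younger filter it sits in is discarded, i.e. after roughly 12-24 hours.

    import time

    WINDOW = 12 * 3600  # rotate every 12h; each filter lives for two windows (~24h)

    class SlidingBlocklist:
        def __init__(self):
            self.filters = [set(), set()]   # [older, newer] stand-ins for Bloom filters
            self.last_rotation = time.time()

        def _maybe_rotate(self):
            if time.time() - self.last_rotation >= WINDOW:
                # The older filter is now ~24h old: discard it and start a fresh one.
                self.filters = [self.filters[1], set()]
                self.last_rotation = time.time()

        def block(self, ip):
            self._maybe_rotate()
            for f in self.filters:          # add to both so the entry survives one rotation
                f.add(ip)

        def is_blocked(self, ip):
            self._maybe_rotate()
            return any(ip in f for f in self.filters)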

On these super boxes you can do filtering, blocking, batching, and dedup. Then send the data off to wherever for further processing.

You mentioned not wanting to use Cloudflare because they're a competitor. That's fine but I would take a look at their blog. They go over the tech they use for mass data ingestion and filtering, and it's basically what I just described.


> For example, a customer misplacing their analytics in a JS loop and having a moderate traffic blog will generate billions of requests from all over the globe.

Potentially, but then customers wouldn't see a ton of spam on their dashboard, just a lot of repeated requests.

They don’t come in random waves either. Valid traffic is pretty well spaced out.

> Your system should be able to handle these billions of requests

Potentially, but you really don’t want to pay for them. The only real way to go about stopping it is blocking the offending IPs.


I would probably start by looking for third parties to help manage the problem, at least unless it all ends up at the application layer. Throw it behind Cloudflare and talk to them, and if they're not the right fit, try someone else.


There is nothing wrong with this approach, but exactly this mentality led to the current centralized internet, where only a few peers handle most of the traffic.


They're still going to need to process IP addresses or some kind of PII though, that's the thing :(


I see, I had misunderstood the extent to which they wanted to remain privacy-focused. It definitely can be a lot more work when you can't outsource that stuff.


API spam is a big problem in Google Analytics too; you might just be big enough now to have become a target. I think the answer is to build a spam filter. You could possibly take the ones made for email and adapt them for analytics requests. This seems like a real place where ML pattern recognition models would shine. Any realistic spam filter will involve sampling a fraction of traffic for deep analysis and IP-banning badly behaving subnets for ~24h. And it's definitely better to shadow ban, but smart assholes will have an account to check whether their spam is getting through as well.

I wouldn't worry about short-term IP caching. AWS's upstream load balancers and your own servers are probably doing it anyway to maintain TCP state tables (the Linux kernel's "conntrack"). If you don't want to cache IPs, you can use a probabilistic data structure like a Bloom filter synched between instances. It has a small false-positive rate but is very fast and doesn't store the whole IP. Bloom-filter-based IP filtering is used in every big DDoS prevention system I know of.

As much as you like Lambda, I would ditch it. And the queues. My general advice is that any time you need to add a work queue to something, it's not fast enough.

Your analytics endpoint's data ingestion should be something lightning fast like Go, Rust, or an async Java back end. Analytics is a lossy process; you lose traces because of browser behavior and plugins all the time anyway, so I wouldn't prioritize 100% accuracy.

I would focus on power per dollar over reliability. If I were you, my ingest boxes would be load-balanced with DNS round-robin and sitting at various colocation providers. Get a fat 40 gig unlimited data pipe. Build some stupid-fast Rust/Go/Java backend that can saturate that pipe. And do all your filtering/spam analysis there.

I don't think Lambda, SQS queues, and PHP are the best technologies for this kind of mass data ingestion. I don't even think your ingestion layer should be on AWS. I would follow the lead of other companies doing mass data ingestion and build your own machines. That's how Cloudflare, Netflix, etc. are able to handle so much traffic without going bankrupt.

I would consider yourself lucky that your first DDoS was so small. 10k requests/sec is tiny; ~400k/sec can be generated on a regular desktop with a fiber internet connection. Right now, a single user could knock you offline by messing around with JMeter. I think it's a wake-up call that you're in the infrastructure business whether you like it or not, and you need to massively beef up your data ingestion layer. Realistically, you should be able to handle a ~50 gigabit attack with 10 million requests/sec. I think that's achievable with a couple of boxes colocated on 40 gig lines running fast software.


Yes, they really need to rearchitect their service. They don't seem to have a reliable scaling-up plan.

I've had a similar DDoS attack which managed to bypass Cloudflare protection, and our app handled it with a $0 increase in our bill.


Yeah. A single big client or even a viral video on a single page will knock out their systems. 10k requests/sec isn't enough when there are 3 billion people surfing the web. I would hate to be in their position right now. It's gonna be tough to engineer this with a two-man team.


> Are you sure it's a DDoS and not just a regular customer?

Please be respectful of the OP; he spent a few weeks working on this issue. It's not fair to tell him you think everything he says is shit just because you don't believe him.


You have to consider that the OP might have been misled by the Amazon sales team into buying their $3k-per-month DDoS protection.


Never expected to see someone defending me on Hacker News. I appreciate it. To clear things up:

* I found AWS Shield Advanced organically

* It was a DDoS attack

* I am incredibly happy with the personalized service. It’s like hiring someone to handle it, except you have people on call 24x7


> It’s like hiring someone to handle it, except you have people on call 24x7

This is what confuses me about a lot of these replies saying you're being swindled for paying for this "absurd service" when you should just "own your infra yourself and hire people to manage this".

Do they think owning all this yourself and hiring specialized DDoS people is gonna cost less than $36k/yr? Because I don't think it will.


It's incredible that C10K was achieved in 1999. http://www.kegel.com/c10k.html


Since the article doesn't define C10K (10K concurrent connections) https://en.m.wikipedia.org/wiki/C10k_problem


The advantage lies with those who attack, not those who defend. Attackers will always find new ways to attack and eventually break in. Be ready for this. As history shows, crackers (DDoSers) always win if the price on your head is high enough (money or fame).


Could you elaborate on those historical experiences with DDoS?


Thanks for the write up.

A few thoughts:

1. Build a “live status” dashboard for your processing system so you can see the profile of the IPs or customers being affected, giving you more visibility into the issue when it happens

2. Can you just put the system behind CloudFront and process the CF log files instead of having to invoke Lambdas?

3. Identify the affected customer and put their Lambda results in a separate queue, to help limit the blast radius if it happens again from other customers

4. Double-check there wasn’t throttling because downstream services were throttling Lambda (you mentioned SQS?)

5. Make your Lambda even faster (if it’s 100ms, make it 10ms or less). Can you move your logic out of Lambda? Instead of inserting into SQS, can you log to CloudWatch and then read its data? Also ensure there are no cross-region Lambda <> SQS invocations, etc.

6. Consider replacing Lambda (which cannot batch-process invocations) with app servers that can absorb and then batch-send many requests to SQS (let them absorb and coalesce a large number of requests into aggregate SQS messages at the edge); going a step further, e.g. even uploading aggregated data files to S3 for later processing and then putting a single message on the SQS queue pointing to that file (see the sketch below)
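A rough sketch of that last idea (Python/boto3; bucket and queue names are hypothetical): the edge box buffers events, periodically writes the whole buffer as one S3 object, and enqueues a single SQS message pointing at it.

    import json
    import time
    import uuid
    import boto3

    s3 = boto3.client("s3")
    sqs = boto3.client("sqs")
    BUCKET = "pageview-batches"  # hypothetical
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/batches"  # hypothetical

    def flush_batch(events):
        # One S3 object plus one SQS message per batch, instead of one message per pageview.
        key = f"batches/{int(time.time())}-{uuid.uuid4()}.json"
        s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(events).encode())
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps({"bucket": BUCKET, "key": key}),
        )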


Being DDoS’d is expensive. That’s one thing that most cloud providers make difficult to protect against: how to prevent a random attacker from depleting my bank account.


At present, we put all incoming page views into SQS

Hmm, not the most economical of choices, but...

We had decided that we were going to simply increase our lambda concurrency limit to 8,000,000 requests per second (800,000 concurrents) and handle the spam attacks

... wait, what? If you're using Lambda functions for something as trivial as logging page views, you evidently have more money than sense.

The author is concerned about this story reading like an advert for AWS Shield, but to me it reads like a case study in when "Serverless" is a bad idea.


My main thought here is how important backpressure is for managing queues. We all hate to admit defeat, but sometimes shit is broken and you just have to say "sorry, service not available" and drop the requests on the floor. No buffer is infinitely long.
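In code it's as simple as a bounded buffer that sheds load when full, rather than an "infinite" queue that absorbs the bill. A minimal Python sketch of the idea:

    import queue

    buffer = queue.Queue(maxsize=10_000)  # bounded: this is where backpressure comes from

    def ingest(event):
        try:
            buffer.put_nowait(event)
            return 202                     # accepted for asynchronous processing
        except queue.Full:
            # Admit defeat: drop the request instead of buffering forever.
            return 503                     # "sorry, service not available"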


And to add to this, the plan moving forward is to default to writing data to the appropriate places (the aim is < 100-200ms) but fall back to SQS when services are unavailable.


Lambda concurrency wasn't actually increased to 800,000 concurrents. It's the kind of stupid idea someone would think up after fighting attacks all day.


Regardless of what you set Lambda concurrency to: Are you seriously invoking Lambdas for every page view?


We are indeed. Neither of us enjoys DevOps, so we pay a premium to not have to manage servers. It brings a huge mental health benefit (we are a two-person company), we're profitable, and our monthly Lambda cost isn't that significant. Honestly, the biggest inefficiency (in terms of cost) is our use of SQS/RDS, which we are ditching soon.


Well, it's your money...

I'd be happy to help make this more efficient though (at no charge of course). Offhand I'd say "web server which accumulates data and uploads it to S3 every N requests or M seconds" would probably get you what you need at a tiny fraction of the cost of "lambda which posts to SQS". Create an AMI and toss it at an autoscaling group and you really won't need to worry about scaling issues either.


I bet you could set up something way cheaper. And I'm certain you know 10x more than me regarding servers, hardening, configuration, etc. And I'm certain you enjoy servers!

For us, the cost works and we have appropriate margin for it. The cost savings aren't worth the extra "we have to monitor these servers" thoughts. Our approach is 100% emotional.


Whatever works for you. :-)

The offer is open if you ever change your mind (e.g. if the cost becomes annoyingly large when you scale up further).


API Gateway -> Kinesis stream -> Lambda consumes batches of N page views. You keep the serverless approach you like, and it will cost you a fraction of what you currently spend.
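A minimal sketch of the consumer end (Python Lambda handler; the Kinesis event source delivers records in batches, with each payload base64-encoded):

    import base64
    import json

    def handler(event, context):
        # One invocation processes a whole batch of page views, not one per view.
        pageviews = [
            json.loads(base64.b64decode(record["kinesis"]["data"]))
            for record in event["Records"]
        ]
        store(pageviews)

    def store(pageviews):
        # Hypothetical: write the whole batch to your datastore in one go.
        ...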


I saw Kinesis stream today with DynamoDB and I will have to review it :)


Hey, in the most non-antagonistic way possible -- do you know who "cperciva" (Colin "Did you win a Putnam? Yes, I did." Percival) is?

You might want to take him up on the offer; you don't get an opportunity like that every day.

Absolute legend.


"AWS Hero" is probably more relevant than "Putnam Fellow" here. ;-)

But seriously, it's clear making this more efficient isn't a priority for them, and I completely understand and respect that. I ignore plenty of good advice for the same reason: "Working on more important stuff in Tarsnap right now".


Haha I didn't know who he was. But the fact Alex Debrie follows him is certification enough for me. And I was wrong. Looking at his Twitter, it looks like he knows 2000x more than me about servers.


If you want more blog posts for Hacker News, take him up on the offer and then blog about it: "Cperciva replaced our millions of Lambda invocations with one m6g.medium and now I'm stuck as a FreeBSD admin!"


I would say this goes beyond emotional. You only have finite resources. You have chosen to spend some money to avoid spending time and effort. That's a very reasonable call. Businesses do this all the time when they hire more staff.


I respect your work and the kind offer you made in this thread, and I broadly agree with your suggestion, but I was intrigued as to what the Lambda costs would be in relation to their account charges, and whether under normal circumstances Lambda would be a bad fit here.

Fathom has a $34/month plan which handles up to 400k pageviews per month. 400k Lambda invocations is about 16 cents, if they're doing it efficiently. Transfer pricing also seems negligible at only ~2KB a request.
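Rough math, assuming 128 MB of memory, ~100 ms per invocation, and current on-demand pricing: requests are 400,000 x $0.20 per million ≈ $0.08, and compute is 400,000 x 0.1 s x 0.125 GB = 5,000 GB-s x ~$0.0000167 ≈ $0.08, so roughly $0.16 a month before the free tier.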

I'm pleasantly surprised, actually, but of course when someone throws millions of improper requests at you every second.. I guess that's when the game changes ;-)


If you were charging, how much would you charge to set up a server that you literally never have to bounce to get it back up?


That's the wrong approach -- you're going to want to be able to reboot (or replace) servers for security patches if nothing else.

But if you're regularly rebooting servers because they stop responding, something has gone very wrong.


It's great that you guys suffered through, learned from, and survived this attack.

You have done what only a few small shops would or could do, and this knowledge will be a valuable asset. You can expect a lot of requests for help/information based on your write-up. Prepare some canned answers and a cool presentation you can whip out when needed.


It has been proposed that having a robust monitoring and alerting system would significantly reduce the impact of a DDoS.

What would be an advisable approach for a startup with limited resources in a situation like this?


I feel for you; it is horrendous seeing your work being torn up like that. Having anything public-facing on the internet feels like running a liquor store in a bad neighbourhood.

I am the co-founder of a SaaS in the higher-ed-tech sector (universities, 300+ of them). I'm going to make sure we implement support for Fathom (in addition to GA, GTM, Matomo) and reach out to our customers. Hope it helps a bit. Our users would also benefit from stuffing less data into Google.


That sounds incredible. We’re used in a lot of universities right now and it would be great to see more


One simple way to stop a lot of shenanigans such as this is to block Tor exit nodes from connecting to your services.

https://blog.torproject.org/changes-tor-exit-list-service

While some attackers have access to compromised home routers and IoT devices, most script kiddies do not, so they use Tor (because they don't want to get caught).


Is it really possible to do a DDoS attack over Tor? I don't think the network is fast enough to do such an attack. You lock out a bunch of privacy-conscious users by blocking Tor.


This is rubbish advice when talking about attacks as large as this.


There are many other analytics companies around: Plausible, Simple Analytics, GoatCounter, etc.

I think they need to take notes.


Two questions. 1) Who stands to gain from doing DDoS attacks? Either business competitors who want to put you out of business, or the "protectors" themselves (so they can sell their DDoS protection service)? 2) Can you use AWS services to launch DDoS attacks?


I was thinking that at some moment the culprit would be identified and receive a visit from fat Clemenza: "Hello, Carlo." Is there any way to identify who's doing this?


I am just curious to know if any effort has been made to find out who was behind the attacks, or at least to get more information about them?


Can’t talk about this publicly, sorry :(


I don't fully get how you can DDoS this service, as it seems you need to give your credit card to sign up. Can't they block the problematic API keys and be done with it?


Public access point. The server accepts pageviews, so they DDoS'd that.


The article mentions the use of Lambda/serverless. If he had a server, many of these issues would not exist in the first place :)


And it does not look like a DDoS either; this number of connections is quite normal for mid-sized apps, and connections can be kept open if your app is slow, making your stats higher than they actually should be.

Btw, two servers plus a bit of architecture simplification solve this and cut the costs down to ~$400 or less.


We're going to see a lot more of such attacks (Denial of Capital?) as engineers blindly throw more and more SaaS components together without any sort of rate limiting in place, especially so when those endpoints are publicly accessible and tied to your account (e.g. Firebase or Algolia requests that are billable to your conveniently included client-side API key).

And in all honesty, it's a lot easier to take a site offline by depleting its budget than by trying to exhaust an infinitely scaling service.


Cloudflare at least will refund any extra costs incurred from their serverless product if you get DDoSed.


Same with AWS Shield Advanced :)


What is that, $3k/mo?


Yup with a minimum 1 year commitment.


Need to have some pretty solid expectation you are going to be hit again at that price point.

That said, it’d be a no brainer at enterprise level.




