A cryptocurrency company had a $65M bill, per Datadog’s Q1 earnings call (twitter.com/turnernovak)
97 points by slyall on May 6, 2023 | 213 comments



Datadog prices are out of this world. I assumed their front-facing prices were for the lone dev, so when I reached out as a company looking to integrate multiple of their products I expected a deal, and got very little compromise. I told them up front they were going to have to cut prices by 90% for us to consider them -- no budge. And they are pretty belligerent salesmen, not wanting to leave me alone. After a while I just blocked their domain in my email.


Datadog startup journey from my perspective:

1. use Datadog, because it gets you a bunch of stuff without having to really set it up, like anomaly detection, which is a poor man's monitoring & alerting

2. once you start getting product-market fit and the number of instances you run grows you notice your monthly bill going crazier and crazier and you now have something that starts to resemble an ops team -> migrate to a different product, set up proper monitoring

I've seen that at 3 companies I worked at.


Right, the cost of monitoring my little business with Datadog would be 7 figures (and revenue is only 7 figures!!). I mean, I guess they want to exclude people like me... but then why market to me?


We're going through this now. What other products are out there, preferably easy for a moron to set up?


You should check out SigNoz (https://github.com/SigNoz/signoz) - It's an open source alternative to DataDog with metrics, traces and logs in a single application. You can self-host it or try the hosted version.

PS: I am one of the maintainers at SigNoz


we're working on one! based on open telemetry (no vendor lock-in) and clickhouse (which makes it cheaper/faster)

it currently supports logs, traces, session replay/RUM, alerts and dashboards - would love to hear what you think :) https://www.hyperdx.io


Check out Grafana.


I'm not familiar with pricing but depending on alternatives maybe this isn't so bad?

> and you now have something that starts to resemble an ops team

Like what if Datadog just replaces your ops team completely? What if we start to see AI tools that do cost a lot of money but they can replace a team? Just curious.


Datadog does one part of an ops team's job. Even if it could do all the functions of an ops team, I would still make convincing arguments for keeping sovereignty over fixing your own issues and being resistant to price gouging.


Datadog costs more to monitor your AWS t3.medium instance than the actual instance.

I asked them how they can justify that.

They recommended I use "modern infrastructure" which means Docker.


(Disclaimer: I work at Chronosphere, a Datadog competitor) This is a big issue in the observability space. We have written a few blog posts on this, but basically it’s easy to fall into a trap where cardinality and high dimensional monitoring causes your metrics to pop, causing costs to skyrocket. You have a few experiments, are running a bunch of smaller k8s pods per cluster and whoosh! you might be looking at millions, rather than thousands of time series that you're sending to your provider. Most vendors won’t provide tooling or suggest ways to reduce these costs, b/c they have no economic incentive to do so…. Anyway, bottom line is that no one should have to pay more to observe a service than to operate it.
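
To put rough numbers on that trap (all counts below are hypothetical, but the jump from thousands to millions of billed series is the mechanism described above):

  base_metrics = 200          # metrics emitted per service
  services = 30
  pods_per_service = 50       # lots of small k8s pods
  experiment_variants = 4     # e.g. an experiment/version label

  series_before = base_metrics * services                               # 6,000 series
  series_after = series_before * pods_per_service * experiment_variants # 1,200,000 series
  print(series_before, series_after)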

Also: it’s 2023. Every company needs to be getting compatible with open standards like OpenTelemetry, Prometheus, etc.


Some of the vendors in the space are absolutely going in on OTel. DD seems to be working actively against it.


> They recommended I use "modern infrastructure" which means Docker.

It looks to me like they recommended never considering Datadog for anything whatsoever, ever again.


Sounds like New Relic 2.0


not even close. NR doesn't bill overages, doesn't do per-host. we switched and saved tens of thousands monthly


Was Datadog pricing per-host? If so, then I guess running a Kubernetes cluster using the biggest available instances is the most "modern" solution to Datadog-using infrastructure.


It is mainly per host: $15 or $23 PCM, with the first 5-10 containers free, then $0.002 per hour (~$1.50 PCM) per container. The insight and stats you get are quite granular and valuable, however. For large-scale deployments you can ignore certain containers etc.
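
As a rough sketch of how that adds up, using the per-host and per-container rates above and a made-up fleet size:

  hosts = 100
  containers_per_host = 40
  free_containers_per_host = 10
  host_rate = 23.0        # $/host/month
  container_rate = 1.50   # $/container/month beyond the free allotment

  monthly = hosts * host_rate \
      + hosts * max(0, containers_per_host - free_containers_per_host) * container_rate
  print(monthly)  # 6800.0 per month, before APM, logs, custom metrics, ...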


Per host and/or per container


Basically, yes.


Their sales team is a nightmare, I had to block them on personal and company email.


Just wait until they find your personal cell phone number and try to force you into scheduling a "time that works" on the calendar.


Just wait until they find your wife's phone number. (OK, this wasn't datadog, but some similarly aggressive headhunting agency went from calling me to calling my wife - who is in no way affiliated with my company - to ask her if she could tell me about how they could help me in hiring, yadda yadda.)


I had something similar happen with a Facebook recruiter. I didn't reply to her first two emails so she started emailing my mom to try to get in touch with me. My mom called me because she thought it was a phishing email. I had never had that happen before.


I would contact the police about that.


[flagged]


Go home ChatGPT, you are drunk.


I have started invoicing companies for wasting my time and also threatening them with CAN-SPAM complaints for failing to include an opt-out, an actual mailing address, etc.

Then, when my invoice isn’t paid, I threaten collections on them personally and the company. Usually that solves it. Then I’m “such a dick” but highly effective in recovering my time.


Please make this into an automated business.


That's a great approach, and I'd also be curious to know if somebody has paid you. My immediate guess would be "no" - but you never know.


Wait, have any companies actually paid you?!


Please answer OP


Just keep scheduling a time that works, and then don't show up.


Back in the days when this kind of thing happened on the telephone, the SERIOUSLY passive aggressive trick was to talk to them, and then hang up on yourself. Repeatedly.

"Hi this is Arnie from CHewemup'n'Spitemout Staffing, is this Bob?"

"Hey Arnie, what perfect timing! I just started looking for a new opportunity, and I'm really excited to— CLICK."

Ring ring.

"This is Arnie, we seem to have been cut off."

"Oh Arnie, right, thought you hung up on me."

"No, not me, must be a bad connection. You were saying?"

"Yes, I was saying this is a great time to talk about opportunities. I just finished a major Java Enterprise JavaBeans project, and I'm— CLICK."

Lather, rinse, repeat, as the meme used to go before we called them memes.


Jolly Roger telephone company is another fun choice. At ~$20 /yr seems not terrible for a bit of fun.

https://jollyrogertelephone.com/

I also will give out the number to local strip clubs for sales people which seems to be confusing to them.


oh man, that's great-- you could probably get through a LOT more CLICKs before they get the point. what a great use of human psychology-- they assume you have good intentions because you called them... it's a bit devious, but i'm gonna have to add that one to the toolbox hahaha.


Man I bet you could write a really dumb script that automates it too.


Make sure you include a logger in that script to catch any anomalies


I reported them to the FTC. No idea if it did anything but they never called back.


I briefly worked there. They literally have no idea what they're doing, like not even a little bit.


I think the vast majority of us don't either


[flagged]


I've worked in sales for over 10 years. Incel is a weird insult for salespeople, as I've never met a group of people that sleeps around more than salespeople. I spent years selling cars even, and if you bring a gf or wife you're having trouble with to go car shopping... Odds of her later sleeping with the car salesman are not zero, to put it mildly.


Nowadays, incel is synonymous with the Andrew Tate “top g”/alpha sex obsession. Backwards? Definitely, but more symbolic of the arrested development/lack of maturity than the lack of sex.

Sleeping with others’ significant others, or actively trying to/bragging about it is a perfect example. Low-class behavior that will get you props from low-class folks.


No one, or at least very few, were sleeping with someone's significant other. Who wants to do that and then be a sitting duck at work where the angry bf/husband can walk in and see you any moment? When the women are ready to leave the bfs, that's when they contact the salesperson. I've seen it happen at least 10 times, 2 to me personally. Even dated that nightmare for 2 years once.

But go ahead and label everything you misconstrued as low class.


Never mind what was misconstrued or not. The behavior I mentioned is low class.

If you feel it applies to you/your experience, whatever, but it’s not a personal jab.


This is a weird way to explain away a mistake. "Ya I totally strawmanned what you said, but look, my point is valid!"


I've certainly heard that fans of Andrew Tate are incels, or that their world view and advice is built for incels, but I've not heard of many people calling Andrew Tate himself an incel.

An incel maker though for sure.


I'm far from an incel (thank you height and sales skills), but I've been around the 'manosphere' online even since early "Ladder Theory" and before pickup artists learned how to internet market.

Reality is some guys are not genetically blessed PLUS they have been lied to when it comes to what women really want. If you take an average real incel, and give him advice from his mother+sister+female friends, and advice from Andrew Tate, I promise you he will get further with Andrew Tate's advice.

His mom's advice MIGHT get him a girlfriend that uses him for money, and a divorce later in life... Andrew Tate will at least tell him to get actually attractive through working out and performance at work/business.


> Datadog prices are out of this world.

I was never sure of what exactly Datadog did, so I looked at their pricing. At first, I thought "$23/month/host isn't THAT bad...", then noticed that was only for one product.

If you used their full suite, those costs could REALLY add up.


Especially when you realize that the billing features have been turned up to 11 on turn-key rollout. Sampling set at 100%, gosh that's expensive right out of the box, but no accident.


This is the company that released synthetic API checks at like 100X the price of Pingdom checks. Ended up using their free Pingdom integration to pull in the check data..


The best way to deal with them is to never engage with their sales team. Use what you need, don’t let them talk you into more.

Every year, I get a barrage of phone calls and emails from them, and I actively choose to not engage.

Pricing has remained pretty much flat aside from the expected growth. They work hard to get you to overcommit and overpay.

Also, you don’t need all your logs indexed. Saved a company I’m under contract with a massive amount of money (10s of thousands/month) by pointing out that you should just index (sample) a percentage of them to identify if there is a trend, and you can rehydrate later.


My understanding is that their stack is Python. So they probably have high costs as well :)


Sentry is python and has extremely reasonable costs (and you can even self-host it).

Honestly, if they wanted to complain about the cost of hosting (which we don't know they would), then licensing the software and allowing people to self-host would be the solution.

$65M is enough that I could fund a team running google's monarch system for 7-8 years.


Oh neat, it's a big Django project!


I thought Sentry was Ruby?


You can take a look at the Python codebase for Sentry over here: https://github.com/getsentry/sentry


There's rust stuff too: https://github.com/getsentry/relay


Originally, yes, but it's been rewritten in Go.


oh, nice to know! thanks


We tried Datadog, and due to how they bill, our DD bill was more than the services being monitored. (At the time, we used a lot of short-term spot instances, which billed by the second, but DD billed by the hour per instance)

We reached out to DD to try to work something out, and they agreed to cut the bill in half, but only IF we signed up for additional services.

I will never use their services again, and will always share this story when their name is brought up.


After a sales call that I was unimpressed with, they began calling my engineers in order to put pressure on me to re-engage.

Harassing engineers is a hard no.


Yup, they did that to our team too. Calling my coworker during the evening, emailing me and messaging me on LinkedIn at the same time. When I said "no thanks" they replied with a message to set up a meeting. Went from a maybe someday to a never real quick. They really need to rethink their marketing.


I wonder if that kind of marketing is actually quite successful. You'd think they wouldn't risk their reputation otherwise right?


Could be salespeople that are paid by contracts closed, with zero regards for the consequences to the company's reputation.


How can you secure an opportunistic and resourceful sales team that's encouraged by commission, but not corrupted by it?

What are some of the best ways to ensure the folks developing business leads don't sell out your future for their present?

What are the best incentive structures you can provide so they don't want or need to?


My company came up with a structure where we have an overall quarterly revshare in place of bonuses for the entire company. It works out so that engineers get X% and sales reps get ~4 x X% of the revshare pool for each team. Of each team's bonus pool, 2/3 is given out guaranteed based off a few tangible factors and then the remaining 1/3 is given out to individuals who have performed above & beyond.

The ideas behind this are a few-fold, but essentially:

As an engineer (now CEO/CTO), I've hated having to wait the full year for my bonus. It's just a way to lock me in for the year when my incentive to stay should be to love the work & team. I don't want to create a place to work where you're forced to stay because of some guaranteed bonus - if you want to leave, leave & then let's hire someone who finds the work engaging + we all know performance slips as you wait for the bonus.

For the sales team, it means they're incentivized to work with the engineering & product teams to make sure they get the engineers the proper feedback in order to build a better product that they can sell more easily.

We've found this has generally built a better more team-oriented & results-oriented culture. Happy to expand but overall I think a quarterly revshare for everyone is a much better end-result (other than the fact I'm now forced to care more about making sure engineers are happy but that should be a huge focus regardless...).

Edit - also worth noting that we give everyone equity so there's still a long-term focus of building a company, not just cashing out quickly.


Google and Facebook ads support folks do the same thing, I'm guessing they have some incentive and some metrics to hit, such as number of people who they actually talk to etc.


With bills that large and when they’re on commission, it’s very logical


You can probably pervert those metrics to something that looks good in some slide decks. While actually destroying the image with most potential customers.


> I wonder if that kind of marketing is actually quite successful.

I think their revenue says it is, unfortunately.


It is, believe it or not. Hard no for me though.


It feels to me like they have a culture that enables out-of-control sales staff. Probably juicy commissions with a tendency to look the other way.


They have been harrassing our whole company, tracking people on social networks, cold calling, tons of emails.

We decided to never work with them after that. That and the horror stories of people that experienced the product itself in their former companies.


OpenTelemetry is going to be an existential threat to DataDog and other companies that effectively rely on vendor lock-in to exploit customers. Not sure how companies rationalize these types of services at scale when there are so many open source options to run for a fraction of the cost. You could hire 100+ engineers and still save money compared to a 65M bill

https://news.ycombinator.com/item?id=34540419


OpenTelemetry (or OTel) is in no way an existential threat to DataDog. Primarily because OTel is simply the substrate/protocol by which data is collected from your apps/systems. DD does _a lot_ more than what OTel provides (RUM, SIEM, synthetics, on-call, dashboarding, anomaly detection, and much much more). If anything it's an existential threat to the more legacy vendors that aren't equipped to provide an OTel ingest layer.

OTel _does_ prevent lock-in on the agent side (making it easier to switch vendors) with open source components and consistent schemas, but OTel doesn't enable you to do anything that you couldn't do before with a specific vendor. It just empowers you to take ownership of your observability data, should you want to. Many don't, though. They want to throw money at someone else who can do it relatively well, hence the ridiculous DD pricing.
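
As a minimal sketch of what that agent-side neutrality looks like in practice (assuming the standard opentelemetry-sdk and OTLP exporter Python packages; the service/span names and endpoint are placeholders), the app only touches the OTel API and the backend is just configuration:

  from opentelemetry import trace
  from opentelemetry.sdk.trace import TracerProvider
  from opentelemetry.sdk.trace.export import BatchSpanProcessor
  from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

  provider = TracerProvider()
  # In practice the endpoint usually comes from OTEL_EXPORTER_OTLP_ENDPOINT;
  # hard-coded here for clarity. Point it at whichever vendor/collector you use.
  provider.add_span_processor(
      BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True)))
  trace.set_tracer_provider(provider)

  tracer = trace.get_tracer("checkout-service")
  with tracer.start_as_current_span("charge-card"):
      pass  # business logic; the span goes to whatever backend is configured

Switching vendors then becomes a config change rather than re-instrumenting the code.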

Regarding:

> Not sure how companies rationalize these types of services at scale when there are so many open source options to run for a fraction of the cost

At Coinbase's scale (which I assume is a lot due to the DD bill, but I haven't looked closely at it) the open source options simply won't cut it. Plus there are no scalable open source options for many of the things that DD does (synthetics, SIEM come to mind - not to mention onerous regulatory requirements). 65M seems like a lot, but it also means their cloud costs are insane - so maybe it lets them put focus elsewhere?


I don't know the exact history of OTel, but I agree with you. A few years ago, open source applications basically included one type of telemetry: Prometheus metrics. (Maybe Jaeger for tracing if you're really, really lucky.) What OTel provides is a way to write an open-source library that emits telemetry, but without dictating that the end user use a particular storage system. Basically, you can use Datadog instead of the open-source stuff, even if the library authors have never heard of Datadog.

Datadog specifically, I don't know if they care. They had an army of junior engineers that existed to hack Datadog into every open source project imaginable. If something had monitoring, they would just add their vendor-specific stuff and upstream it. That was probably expensive but probably accounts for a vast amount of their early marketshare. The other vendors wanted in on that racket without having to do too much work; OTel was born.


> OTel _does_ prevent lock-in on the agent side (making it easier to switch vendors) with open source components and consistent schemas, but OTel doesn't enable you to do anything that you couldn't do before with a specific vendor

This is exactly the reason why we are moving away from NewRelic's SDK to an OpenTelemetry SDK (even though we are still using NewRelic to ingest everything). If (more like when) we decide to switch vendors, it will be much easier to do so.


> OpenTelemetry (or OTel) is in no way an existential threat to DataDog. Primarily because OTel is simply the substrate/protocol by which data is collected from your apps/systems.

As a vendor building in this space [1] - it definitely is. We're able to onboard teams faster to do side-by-side comparisons because they can simply point their existing Otel telemetry to both us and their existing provider with just a few lines of config. That wasn't possible before otel, and levels the playing field more than before. As otel matures, it'll continue to erode against DD's position.

It also allows us as a company to focus on what users care about (as you mention that's things like dashboarding, search performance, RUM, etc.) as opposed to spending all our time building basic integrations into every platform (though we still do plenty of work to polish places where Otel hasn't). Again, levels the playing field.

[1] https://www.hyperdx.io/


So your claim is that Datadog's pricing is ridiculous and that OTel allows you to not be locked in, but somehow OTel isn't a threat to Datadog? I don't follow.


The only part of overlapping functionality between DataDog and Otel is the agent.

In theory you could use the Otel Collector (or any other Otel-compatible agent) instead of the DD Agent to collect metrics/logs/traces. This would then make it easier for you to switch from DD to another Otel-compatible provider (Grafana, for example)... but 99.9% of what DD provides is _not_ the agent, it's dashboarding, alerting, RUM, synthetics, etc.

Basically Otel has made _agent_ switching costs effectively drop to zero, but that is a very small part of the whole picture. Like I said above, this primarily hurts vendors with proprietary agents that can't/won't adopt Otel for ingesting data.


We are building SigNoz (https://github.com/SigNoz/signoz) - an open source alternative to DataDog. We are natively based on opentelemetry and see lots of our users very interested in that.

As mentioned in some other places in the thread, DataDog pricing is very unpredictable and high - and I think more open-standards-based solutions, which give users more predictability and flexibility, are the way forward.


I’ve seen an attempted move from DD to OT and it was a nightmare of undocumented features and little compounded issues. Tracing was non functional. It doesn’t seem mature enough yet.


Were you watching me?

Because that's what happened.

I am a user of New Relic. Not because I'm happy. But because OpenTelemetry doesn't come close to the same features. Fortunately, at least OT is about 10x harder to set up with worse documentation.

Wait a minute...


yup, at my last company the CTO wanted to switch away from New Relic to OpenTelemetry to save money, and I can tell you we wasted months of work because OpenTelemetry is dogshit.

New Relic is really really good too, so it was even more painful.


Yeah, New Relic is just an excellent product. No way around it.


Apparently OT is a threat to DD to the point of them asking contributors to not add support for their agents… https://github.com/open-telemetry/opentelemetry-collector-co...


I am not sure when you tried OpenTelemetry, but it is decently mature now, esp. for tracing. I am a maintainer at SigNoz (https://github.com/signoz/signoz) and we have good support for tracing using Otel for most of the common frameworks.

I agree it was rapidly evolving in the early days, but now it's much more mature.

You can check out our docs for distributed tracing here - https://signoz.io/docs/instrumentation/


We had to use otel with Datadog because Datadog did not support Elixir with an official SDK.

There are a lot of subtle issues, but we've been able to work through them to get usable traces. (Metrics and log ingestion already have pretty good existing open-source tooling, like statsd).


There's some purposeful friction with DD when it comes to OTel.

Incumbents in this space are in for a rough time as more applications provide meaningful telemetry, beyond just logs. Fortunately for them that timeline is 'fuzzy' at best.


> You could hire 100+ engineers and still save money compared to a 65M bill

I see cloud costs like this a lot and it really puzzles me. It seems like people would rather pay 10X+ more to just not have to think about it than even to hire other people to think about it, because then you have to think about hiring and HR.

"Here's a blank check. Just make it go away."

Of course corporate consultants run on that, so I guess it's not without abundant precedent elsewhere. I guess if you work for a big company with budget and it's not your money you really have little incentive not to take the easy path.


The real issue is finding the right 100+ engineers and then managing them.

That takes calendar time.

There's a multiple to what people are willing to pay for SaaS solutions precisely because HR for knowledge work is such a pain.


This is indeed the case and one can argue that it makes sense for a small company to focus on the MVP and initial growth. But every such decision needs to be reexamined from time to time as the company scales up. That is unpleasant, requires an expert, so many businesses procrastinate on this and make expensive mistakes. My 2c.


Operational costs can be billed to a project. It's not really that the business doesn't want to save money on this stuff, but it's much easier to wind up in this situation when the hoops to hire some engineers are 4x as complicated as just adding another sub-org to the Datadog billing.

Does seem like a pretty wild bill, though.


FWIW, trying to hire 100+ engineers now is probably a lot easier than it would have been in early/mid 2021....


I would argue it's a much easier blank check to write than for consultants. Software complexity over time is a major headache and new initiatives happen all the time.

Take for instance GDPR - in AWS it was a company wide effort to get all the services GDPR compliant and that was basically a non-existent pricing change to consumers.

Also the fact that I can call up AWS support and have them look into a bug immediately with real devs on the other end is invaluable when my business needs rely on a certain feature working.


Presumably these companies have higher ROI projects to do with 100 engineers than reimplementing and maintaining an in-house datadog alternative.


I worked at Coinbase until very recently and can confirm this is Coinbase

They paid upfront for 3 years of usage, and yes they were burning > $20m/year on datadog


Dear Lord...the commission on that hog. The sales team eating good.


I was previously in sales and SaaS is considered the pinnacle of industries to be in for sales people. Medical device sales is the only real competitor when it comes to earning potential and my understanding is that's US specific and also comes with atrocious work life balance.


Back in the day I was auditing our support contracts for a place that I worked at. Basically figuring out if we were getting what we were paying for.

My favorite "overpriced support contract" was for an Oracle product. The cost of support was seven figures, and in the entire year a single phone call had been placed to support.


The best bit when I worked at one of those companies that had an expensive Oracle contract was this dynamic:

1. Can we use MySQL for <new product>?

No, use Oracle, we have a support contract

2. <oracle related issue occurs> Can we call that support contract in now?

No, let's try the inhouse expertise first.

3. <inhouse expertise comes up with barely passable hack> Can we check if they have any better solutions?

No, it's "solved" now

Like, what were we paying for? I have to assume there were per-engagement costs as well as the ongoing costs, given how hesitant our contract-owning team were to let us anywhere near Oracle.


Did that end up being a “support” call where they sold you more services?


> 3 years

Okay, it seemed like it was 1 year.

So, only insane, not insane^2.


So you’re saying it’s $65 million over three years?

The headline seems misleading, like it’s for a single month.


yes but 20 million dollars a year.

20 fucking million

getting Splunk with 1 PB a month ingest isn't that expensive.


Please tell me I am allowed to say "you could just use grep and rsync". I must be allowed!


but only if you have "as-a-service" attached to the end of it.

or something with chatgpt


They'll probably throw in SignalFX for free if you ask them nicely.


Headline is slightly misleading

Earnings call said it was an upfront cost


Having recently dealt with a surprise runaway DataDog bill, and having personally done a lot of work to reduce our spend internally, all I have to say is that their pricing is outrageously high and there is almost no way to put controls in place to prevent overspend.

If you're considering using them keep that in mind and tbh I would strongly recommend considering if CloudWatch or some other cheaper alternative is suitable for your needs.


I feel nostalgic for the days of grepping logs in some ways.

At a previous company, we had DD, and I got asked to find some problem. I was able to sift through the data in it and zero in on the instance and find the bad data that came in.

Then it got expensive and so they turned on 'sampling' and the next time I was asked to look for a particular problem, we had no idea if it was even logged.


That reminds me of the quip in gnu parallel about log search: https://www.gnu.org/software/parallel/parallel_examples.html...


(Disclaimer: I work at Chronosphere)

Our company helps avoid these kinds of observability bills and issues like scaling for fast-growing cloud deployments. Generally speaking, many vendors let you fall into the cardinality trap b/c they have an economic incentive to let you do so. One of our biggest selling points is that we provide an observability control plane that helps drill down into wasted queries, shows how metrics can be aggregated, and other ways of avoiding wasted cost/effort. tbh no one should have to pay more to observe a service than to operate it. Where’s the ROI in that? Another plus is that we're all in on open source instrumentation with OpenTelemetry & Prometheus so none of that annoying vendor lock-in.


None of this should be even remotely necessary. It’s like being frugal with table salt.

“We’ll show you how to make sure you don’t have even one crystal fall off the plate.”

My personal pet peeve is Azure Application Insights which uses Log Analytics under the hood… at a rate of $2.75 per ingested GB of logs stored for one month. That’s highway robbery.

Let that sink in: They charge $2,800 to store a TB of text that takes a few hundred dollars of overpriced cloud disk and maybe $10 of CPU time for the actual processing. That’s the cost of a serviceable used car or a brand new gaming PC!

But wait! There’s more.

In reality that 1 TB is column compressed down to maybe 100 GB, making it about $30K charged per terabyte stored on disk.

It doesn’t stop there! Thanks to misaligned incentives, the ingested data format is fantastically inefficient JSON that re-sends static values for every metric sample collected. Why would anyone ever bother to optimise their only revenue?!

They won’t.

The reality is that a numeric metric collected once a second (not minute!) is just 21 MB if stored as a simple array. Most metrics are highly compressible, and that would easily pack to 100 KB per metric per month.

A typical Windows server has about 15,000 performance metrics. We could be collecting these once a second and use a grand total of… 1.5 GB per month. That’s every metric for every process, every device, every error counter, everything.
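
The arithmetic behind those figures, as a quick sanity check (assuming 8-byte samples and a 30-day month):

  samples_per_month = 60 * 60 * 24 * 30     # one-second samples over 30 days = 2,592,000
  raw_mb = samples_per_month * 8 / 1e6      # ~20.7 MB per metric as plain 8-byte values
  compressed_kb = 100                       # the ~100 KB/metric/month figure claimed above
  metrics_per_server = 15_000
  total_gb = metrics_per_server * compressed_kb / 1e6   # ~1.5 GB per server per month
  print(raw_mb, total_gb)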

Modern server monitoring is inefficient and overpriced by 5 orders of magnitude. It’s that simple.

The fact that your company can exist at all is a testament to that.


"They charge $2,800 to store a TB" !!! good god

Totally agree about the compressibility of metrics and toying with the scraping interval. I started out working for an enterprise monitoring vendor that had a proprietary agent that already decided sane intervals to emit metrics; when I learned that Prometheus lets users configure that themselves, it just sounded like an expensive foot gun to me.

My real beef with metrics, at least for app layer insights, is the waste. I'd so much rather have a span/event configured with tail sampling so you can derive metrics from traces and tie them to logs in a native contextualized way, vs having to do that correlation on the backend and within different systems and query langs. Seems much more efficient and cost-effective that way. I'm scarred from seeing a zillion "service_name.http_response.p95.average" metrics that are imo useless.


> for app layer insights is the waste.

I’m starting to come to the same conclusion, but the point I’m making is a general one: efficient formats would allow finer grained telemetry to be collected without having to be tuned and carefully monitored.

What’s the point of a monitoring system that itself needs baby sitting?!


Folks on this thread might want to check out SigNoz (https://github.com/SigNoz/signoz). It's an open source alternative to Datadog.

I am one of the maintainers at SigNoz. We have come across many more horror stories around Datadog billing while interacting with our users.

We recently did a deep dive on pricing, and found some interesting insights on how it is priced compared to other products.

Datadog's billing has two key issues:

- Very complex SKU based pricing which makes it impossible to predict how much it would cost

- Custom metrics billing ($0.05 per custom metric) - we found that custom metrics can account for up to 52% of the total billing which just does not make sense

More details in the blog here with a complete spreadsheet for detailed calculation https://signoz.io/blog/pricing-comparison-signoz-vs-datadog-...
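
As a hypothetical illustration of how a single custom metric name fans out into billed series at that ~$0.05 rate (the tag counts here are made up):

  endpoints, status_codes, customer_tiers = 50, 5, 8
  billed_series = endpoints * status_codes * customer_tiers   # 2,000 series from one metric name
  print(billed_series * 0.05)                                  # ~$100/month for that single metric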


Datadog prices are insane. We're leaving them now at my work because they've gotten to be more expensive than our entire AWS bill.


Datadog is a pretty amazing product, and if you are careful and use it in the right way, it is very powerful, and cheaper than rolling your own LGTM Grafana stack (or similar). If you are not careful, or at a decent scale, you can easily spend obscene amounts of money. The metrics pricing is completely insane for example, and it's easy for people to emit high cardinality metrics from apps and explode your bill. I think by that point you need to run an internal solution, and that is when it makes sense to double down on a combo of Elastic and Grafana's stack for logging, tracing and metrics.


The internet is mostly machines talking to other machines, or monitoring what other machines are monitoring. But then you can also pay for machines to watch the machines that watch the machines.


We need one more layer, for machines that watch how much you're spending on the machines to watch the machines that watch the machines.


Seems like there's opportunity for competition in SaaS monitoring then. I'd imagine a few small efficient teams could beat the price on the core part of what Datadog offers. At 5,000 employees you're probably paying for corporate bloat.


There are plenty of competitors in the observability space, what's another one? The real issue is once your company is on a platform, it's very costly to move off. The biggest consideration is that it won't be a drop in change for all employees, so the retraining needed is substantial across all teams. Far easier to train employees in cardinality, and the cost implied by it, and to expose the cost for their particular monitoring to their teams.

This may come as a surprise but when giving money to a for-profit company, not only are you paying for corporate bloat, but you're also paying for the CEO's lavish compensation package, free lunches, and very costly health insurance for their employees. You're even paying for employee salaries while they're not doing work while on vacation!

If that's a big problem for you, Grafana may be the better product for you.


I don't consider that corporate bloat. That's just life in the software business if you want good talent.

Corporate bloat is like the mini empires people build, headcount for the sake of headcount, the passion project of that guy who's been here forever that doesn't make money. Process because it helped someone's resume. Those kinds of inefficiencies. This stuff is different from treating employees nicely.


That's completely wild. Did Coinbase go into trace mode and log every blockchain append to Datadog? I can't even figure out how anything could cost that much.


Likely logging something for every order placed/removed/modified on every order book, which can be thousands per second per order book.


I’m not gonna claim that Datadog is cheap, but that screams “we didn’t bother to optimize our usage.” Lots of logs, long retention maybe? Really heavy RUM with the replay feature turned on?


What is the best cost-effective and decent alternative to DD in 2023? I also feel like they are robbing me blind. Great product though.


Depends on what you want to monitor. Grafana is pretty decent, but the real draw to Datadog is their APM stack. The UI for tracing and looking at stuff is pretty awesome.

Though you could get most things into Grafana with something like Prometheus. The problem with Grafana is understanding what the limitations are. If you're not careful with the number of panels and such it can get quite slow.

I've used Grafonnet before for doing Grafana at scale. Simply put, I hate it. Apparently an alternative is being worked on at Grafana so I'm waiting for that. But if you need to make hundreds of panels....it works well enough.

If you need to monitor some infrastructure you can just use Telegraf and output it to Grafana if needed. It kinda falls apart though because another great benefit of something like Datadog is not managing a time series db. That can get ugly real quick.

I guess it all just depends. If my bill was super high I wouldn't mind spending some resources on Prom/Grafana if you're in the Kube space or some Telegraf/InfluxDB if you're not.

I've also heard good things about Timescale but haven't used it.


> I've used Grafonnet before for doing Grafana at scale. Simply put, I hate it. Apparently an alternative is being worked on at Grafana so I'm waiting for that. But if you need to make hundreds of panels....it works well enough.

Hi, I run the Grafana team at Grafana Labs. I'd love to learn more about your Grafonnet use to help us build something better. I'm david at grafana com


Depends a lot what sort of scale you are on too. Grafana Cloud will be cheaper than DD but is not quite as end-user friendly.

Running it yourself is not too hard if you are not having to do clustering (say 1M metric series, 100 GB/day of logs). But different people have different comfort levels for that.

With any monitoring system most of the work is actually making use of the data. Tagging, Alerts, Dashboards and especially onboarding all the teams. You can spend a lot of time and money rolling something out and then barely anybody uses it.


An OpenSource alternative to DataDog built on ClickHouse database which is also native to OpenTelemetry https://github.com/SigNoz/signoz


Take a look at Coroot: it's open-source, ebpf-powered, and can be integrated in minutes: https://github.com/coroot/coroot


(disclaimer, I'm with Grafana). We added a lot to our free Grafana Cloud so you can kick it pretty hard (and harder during the first 14 days when everything is beefed up). Free tier comes with 3 Grafana front end users fully managed, backend (with storage) 50gb Loki logs, 50gb Tempo traces, 10k monthly active series prom metrics, IRM/on call, k6 user testing hours and other stuff too. And for the quick solution integrations we made a K8s monitoring solution with out of the box dashboards, KPIs and alerts. Same thing we did with many others. We absolutely have more work to do in simplifying the user experience too.


Plug: If you're looking for something a bit more "few-clicks-and-you-are-up-and-running", check out OpsVerse ObserveNow: https://opsverse.io/observenow-observability/ .. Entirely powered by OSS tools, ingestion-driven pricing, and without the hassle of managing the stack and scaling up.

Best of all, can also be run entirely within your own AWS/GCP/Azure so you only pay OpsVerse for maintaining the stack based on your ingestion (and we also monitor the monitoring system for you ;))


Use Prometheus or Influx for storing metrics, ELK or Opensearch or loki for logs, Grafana for visualization, and Jaeger for tracing.

You would need a team to configure this setup and make it right over time. It's worth the investment instead of paying a cloud market leader.


My small team had to choose between New Relic and DD and I found New Relic's billing model to be more appetizing. It was per seat and you could switch who was in the seat. Unlimited instances and most of the features were covered under that seat besides some extra things like HIPAA / Finance related stuff. They also have "regular" users that are free that can make dashboards and such. DD drove me nuts with their crazy amount of sales calls that just seemed to balloon.


> you could switch who was in the seat

For others reading this - you can’t just switch back and forth a few times a week. A full platform user can be moved to a basic user only twice in a 12-month period.


Self hosted signoz for opentelemetry? with grafana, prometheus, loki for metrics and logs?


Thoughts on AppDynamics? Aren't they DD's competitor?


qryn is solid and very affordable because you self-host. it supports DD vectors out of the box. /dev/null is always a strong contender too.


Thanks for mentioning qryn! We are a non-corporate alternative and feature full ingestion compatibility with DataDog (including Cloudflare emitters, etc), Loki, Prometheus, Tempo, Elastic & others for both on-prem (https://qryn.dev) and Cloud (https://qryn.cloud) deployments, without the killer price tag.

Note: in qryn s3/r2 are as close to /dev/null as it gets!


Are you joking or is there actually a datadog-like startup named dev null?


I'm saying most logs are pointless to keep and would be better directed to /dev/null. Keep important transaction related logs and sample the rest.

The notion that every single log or metric across your entire technical architecture is worth keeping is one implanted by SaaS providers with a financial interest in naive engineers doing just that.
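
A minimal sketch of that idea at the application level (Python's stdlib logging; the class name and 1% rate are arbitrary): keep warnings and errors, sample everything else before it ever reaches a paid pipeline.

  import logging
  import random

  class SamplingFilter(logging.Filter):
      """Keep WARNING and above; sample INFO/DEBUG at a small rate."""
      def __init__(self, sample_rate=0.01):
          super().__init__()
          self.sample_rate = sample_rate

      def filter(self, record):
          if record.levelno >= logging.WARNING:
              return True
          return random.random() < self.sample_rate

  handler = logging.StreamHandler()
  handler.addFilter(SamplingFilter(0.01))
  logging.getLogger().addHandler(handler)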


I agree. Many applications log for the hell of it and log unnecessary information.

We have concepts of debug, info, warn and error… but I think we need apps to be developed with the concept of log concern.

For example, take sshd. Infosec is interested in the IPs of failed attempts, operations might want to know about connection failures, etc.


Think of it like DataDog 2.0: https://devnull-as-a-service.com/


That sounds ridiculous, but on the other hand people trust their code to a service called bitbucket.


Kubernetes with Grafana is free. The combo provides logging, performance stats and graphs and lets you auto-scale based on usage.

Unfortunately, avoiding insanely costly SaaS solutions requires engineers to plan ahead and design the entire stack on top of certain open source solutions. I suspect that many engineers today receive kickbacks from SaaS providers to lock-in their employers. Employers are none the wiser and rarely push back when an engineer suggests a big-name SaaS solution with insane lock-in factor. Nobody seems to care about lock-in these days, it's only when your costs reach almost 100 million and interest rates are going up that you start thinking "Damn, I could have had all that for free if I had planned ahead and resisted all these platform lock-ins and unnecessary proprietary tools..."


> Unfortunately, I suspect that many engineers today receive kickbacks from SaaS providers to lock-in their employers.

C'mon man, really? Drop the conspiracy theories. I've personally been the guy advocating for Datadog at 4 startups. Mainly because of opportunity cost - we have 10-100 engineers, and I want them building product, not figuring out how to deploy a whole ecosystem of observability tools. IF we get big, let's reevaluate… but in the meantime, am I doing it wrong? If others are getting kickbacks I want in.


Same, where do I sign up for these kickbacks?

The difference between datadog and doing it yourself is that datadog is a well thought through product rather than a cobbled together set of various tools

Having a single interface for everything makes life so much easier across a number of different teams

Search is fast and easy to use for logs and traces

Being able to see what a user actually clicked on in their session is absolutely game changing for support teams

I’m not a huge fan of the bill but it’s so much better than anything we could do ourselves without a team of engineers dedicated to observability (which would cost far more than datadog)


> Same, where do I sign up for these kickbacks?

You should have sorted it out before the integration, not after. Now you have no leverage.


I do a lot of negotiating on products like this. The most I've ever gotten was a shirt and some stickers for my kid. Definitely not enough to move the needle on $250k/year deals. I feel like I'm missing out!

I love how good DataDog is. It's a great product. Too expensive though. I love most of the people I've worked with at Grafana Cloud but it's a painful product. The price makes up for it though, so we use Grafana Cloud.

We may end up with something like SigNoz when we have the cycles, but the ROI is bad when I already have twice as much work as people, and that's barely more than KTLO.


> I love most of the people I've worked with at Grafana Cloud but it's a painful product. The price makes up for it though, so we use Grafana Cloud.

Hi, I run the Grafana team at Grafana Labs. If you could fix one thing, what would it be?


General usability. DataDog is intuitive and easy. Grafana is rough and requires a solid understanding of statistics and data analytics. The bar for using it is pretty high, so most engineers I know push it to someone else, which means there's one team doing the toil and working on creating simpler abstractions to hide the complexity.


HN has a tendency to explain every little thing with conspiracy theories. It can’t be a clear explanation based on incentives and people taking the path of least resistance, it must be malice. I’m not a psychologist, I don’t want to psychoanalyse why they think this way. But it is a bit tiring to interact with such people.


People who aren't harmed by these things don't notice them. Their incentive is to ignore as much as possible. Turning a blind eye is literally the safest option. But when you've had it rough, you literally can't stop seeing this stuff everywhere.

If you change your mindset from ignoring problems to looking for problems, you will find that there are problems everywhere. I'd rather be biased in that way than in the former. In my position, I can't afford to ignore even the tiniest problems.


> we have 10-100 engineers, I want them building product not figuring out how to deploy a whole ecosystem of observability tools. IF we get big let’s reevaluate…

Moving away from those SaaS tools can be extremely painful and a lot more costly due to vendor lock in. In practice, typically, this "let's reevaluate" time never happens.

On the other hand, I don't really care. I normally suggest open source tools, but if people want to throw money at some vendor, fine by me.


Obviously SaaS providers will not offer kickbacks to startups, the deals aren't usually big enough. I've witnessed it in a big corporation once. One of the engineers was VERY insistent on using a specific solution even though it didn't make sense technically and everyone else was against it but because they were more senior, they made the final call. If they don't get outright bribes, they will get lucrative job offers from these big companies in the future.

Imagine being the guy who convinced Coinbase to use DataDog... That person will probably end up working at DataDog sooner or later if not already there... You can bet they will be getting a very cushy salary.

I could probably make a living out of extorting corrupt engineers. It's so predictable.


I know many of the senior folks at Datadog and they aren't stupid.

They wouldn't hire someone dumb enough to spend $5M a month on their product.


And hiring someone corrupt enough to sell out their previous employer for their current one is rarely a smart move, as they are liable to do the same when angling for their NEXT job.


> I could probably make a living out of extorting corrupt engineers.

Why don’t you walk the walk instead of merely talking the talk.


I have always thought that the big kickback for engineers is getting skills on your resume for the next job. "Resume-driven development" basically.


Gergely Orosz dug into this a bit more and surfaced quite a discussion of build-your-own vs renew vs renew-with-highly-negotiated-terms. (https://blog.pragmaticengineer.com/datadog-65m-year-customer...)

A lot of this discussion reminds me of this talk: "Netflix built its own monitoring system - and why you probably shouldn't" (https://www.infoq.com/presentations/netflix-monitoring-syste...) where Roy Rapoport describes Netflix as a "monitoring system that happens to stream movies".

As someone who spent a few years at New Relic and Lacework, I can also say that pricing observability fairly is crazy hard when you account for different architectures, usage pricing, and the humans experiencing the value.


Is the speculation that this was Coinbase just based on Coinbase being a big crypto company? I see nothing in those messages that implies who the customer is (rightly so) and I am wondering if there's some other information I'm missing.


Stupid question: What open source solution can give me an easy Datadog(ish) experience and is simple to implement?


You should check out SigNoz (https://github.com/SigNoz/signoz) - It's an open source alternative to DataDog with metrics, traces and logs in a single application. You can self-host it or try the hosted version. PS: I am one of the maintainers at SigNoz


thanks! That looks really interesting.


Grafana, Prometheus, Loki, tempo


Sooo nothing yet :D

We attempted to migrate from Datadog to Prometheus at GitHub and that stack did not cover our use case at all. So much tooling had to be recreated. I took a lot of flak when I pointed out the numbers made sense to stay on DataDog and migrate to a Microsoft product instead, but the cost savings spoke for themselves.


Grafana isn't incentivized to simplify self-hosting IMO, so this is the best you can do without paying them.


That's a bit like asking for an "open source alternative to AWS."

It depends what you're looking for: metrics, logs, APM, tracing, synthetics, web analytics, etc.


metrics, APM mostly


The list of targets Micrometer posts to is a good starting point for options:

https://micrometer.io/docs


We use Sentry and it seems to work well, and the way the pricing is structured I don't expect any runaway costs to happen.


Sentry is strictly code-first APM, which is only a part of what DD provides. What "APM" _is_ can get kind of blurry, but they are not direct competitors in a meaningful way.


This doesn't surprise me much. From what I've seen consulting/contracting, SaaS-based observability tends to cost 30-50% of cloud spend--EC2, storage, S3, RDS, maybe k8s, and other cloud services, or whatever the equivalent is on GCP/Azure. I wouldn't be surprised to see Coinbase with a >$150M quarterly cloud spend, so $65M on observability would make sense.

That said, managing observability yourself should result in <5% of cloud spend. So I'm figuring someone at Coinbase said "WTF" to this bill and migrated to Grafana/Loki or Kibana/OpenSearch or Kibana/Elastic. Well, that, and Coinbase's business also dropped off a cliff. Combined, I could easily see a one-time influx of $65M from one customer, gone the next quarter.


"it" = Datadog


Damn. Why not self-host something on that scale?!


Yeah seriously, normally the argument is: 'it will require N engineers to run our own and they cost N * 250k/yr...' but for 65million you could fund 5 Datadog competitors and still come out ahead.


They did build a self hosted alternative based on grafana, Loki, Prometheus

Had a whole team of 10+ engineers working on it for 2 quarters, then scrapped it because it performed terribly

The only thing that came of it was negotiation leverage with datadog ("give us X% off or we go self hosted")


What kind of scale of logs are we talking here? The company I work for run a self-hosted Grafana LGTM stack ingesting about 1TB of logs per day, it’s pretty snappy and works well enough, and only costs a few thousand dollars per month in GKE costs for the entire observability stack.

How much logging are we talking here?


GitHub has over 21TB of source code. Applications consistently pore through this data and emit logs and events. 1TB of data by breakfast, maybe? In reality, we're not pushing logs to Datadog, just metrics and event tags. Our level of cardinality, however, requires a lot of horsepower on the backend. Our attempted Prometheus transition was just not cost effective when attempting to view large sets of data over a large-ish period of time. Combined with the heavy lift of integration (we depended heavily on dogstatsd), it just didn't seem efficient to move to Prometheus and support the infrastructure required, all while migrating to Microsoft's in-house product.


This seems like such a failing in both sides.

DD for letting an obviously huge and important client run up a bill so crazy they have to quit DD out of shame and governance.

CB for demonstrating a complete failure of financial management, vendor management and any sort of ability to track expenses.

Doesn’t help either side to get to a point where you have to quit to demonstrate you’re not totally incompetent and crazy.


Having never heard of datadog, Wikipedia’s summary is:

> Datadog is an observability service for cloud-scale applications, providing monitoring of servers, databases, tools, and services, through a SaaS-based data analytics platform.

So it checks if your servers have crashed or slowed down with a nice dashboard?

Any better summaries or descriptions of what it does and how coinbase would have used it?


They also do client side monitoring, log aggregation, and provide metrics for your overall operations (not just when there are performance issues).

They also have quite an unethical sales operation: https://news.ycombinator.com/item?id=35837965


That's funny, DD is the only company to email my work email, add me as a connection on LinkedIn, text me over WhatsApp, and call my personal phone number multiple times. Amber flag after the LinkedIn connection/message, but red/purple after the WhatsApp/call on my personal phone.


Datadog from when I was looking at them (2017ish) appeared to be an automatic version of nagios with a nice user interface and super simple client side installer.

But it was super tied to VMs at that point, and we were running a bunch of lambdas, herokus and docker instances, along with a shit tonne of AWS services, and java lumps from the 90s


I would wager you could build something pretty close to what data dog is selling for $65M


Suppose you have a bunch of k8s clusters, an AWS Organization etc. You just follow a simple setup and see a nice Dashboard with practically every aspect of your infrastructure, from accounts to nodes to pods.


it ingests and can help you sort through logs, and also does performance monitoring. I guess I would describe it as prometheus + logstash + powerBI in a single unit.

And with a healthy dose of blue cross bolted on for the surprise bills and difficult bureaucracy.


it's a New Relic competitor


Observability is about more than crashes or slowdowns; serious investment in observability is a must-have for any SaaS/cloud product to achieve reliability, auto scaling and velocity.

My team use Grafana’s open source LGTM stack. We use Prometheus metrics to track anything from JVM/Go runtime stats, K8S metrics, saturation of CPU/memory, scalability issues, crashes/OOMs, custom metrics for business insights, debugging. We use USE/RED metrics (see: Google’s SRE handbook) to track our production services performance in an objective way. We track SLAs and SLOs so we know when it’s time to focus on features and business impact, and when it’s time to put that aside to focus on stability and maintenance before our customers notice reduced reliability.

As a developer it’s really helpful for testing changes. For example, I added a new database index in dev, then run some load tests and check our dashboards before and after. I look at Q95 latency of APIs and database load to see if it has the desired effect, then when I roll out to production I can monitor those same dashboards and make sure the same desired improvement can be seen for real-word usage.

I used traces recently to discover that something that should have been happening in parallel was instead happening sequentially leading to very long/timing out requests. Adding visualisations via traces helps get your head around how something is working.

I added annotations to our dashboards that shows when our K8S pods restart alongside the metrics. This made me realise that some requests were failing exactly around deployments because we were not cleanly handling SIGTERM in some services.

We have started adding horizontal auto scaling based on metrics for the number of queued messages on a specific Kafka queue. If a large number of messages are waiting we spin up more K8S replicas, and then once this reduces, we reduce the replicas to keep costs down.

I optimise the resource allocations on our services by looking at historical CPU/memory usage so we make the best use of our K8S cluster and avoid OOMs as we scale.

We use Loki for log querying and parsing, you can create really advanced/domain-specialised log querying dashboards and provide that to your support team, and integrate those logs with traces to debug different stages of a request as it traverses your microservices or different processing stages.

You can even build dashboards from logs, which is helpful when debugging a particular type of error over time that you were not specifically monitoring with metrics, or determine which customer(s) are affected by this error. Alternatively if you have a legacy system that does not have effective metrics, you can build metrics from its logs.

We use our metrics for alerting and paging in a way that provides a better signal-to-noise ratio than old-school alerts like “high memory usage” so people don’t get woken up as much (we’ve had zero pages since my product launched 6 months ago!). It’s better to alert only when we have a measurable impact on customer experience, like when a smoke test has failed more than 80% of the time, or HTTP requests 5xx rate is elevated to abnormal levels.

It’s also really reassuring when you do a prod rollout to easily see that stuff is still working without digging into logs, so you can spend more time coding and less time babying prod.

Overall I think having good observability is definitely a worthwhile investment. There are cheaper ways to do it than datadog. I expect much of the trouble is that switching providers is a huge job, we have invested so much time building our observability stack, the challenge of moving seems massive. Thankfully we picked Grafana’s open source LGTM stack and self-hosted it. Even if you picked their SaaS offering, switching to open-source self-hosted is an option so you are less tied in.


Why does the tweet author think this is wild? Sure, 65 million is a lot, but plenty of companies pay large bills for their major cloud services (especially AWS/Azure and the like).

I'm not an expert on monitoring/observability/telemetry etc, nor an expert on Datadog pricing/billing, but paying a lot of money for major infrastructure components doesn't surprise me.


Imagine you had 65 great engineers to build out your company's observability infra. They can use and contribute to OSS tools like Loki and Prometheus. They can split off small teams to build brand new infra and tools. This is THIRTEEN teams of 5 engineers!

You give them 1m each.

You never see them again.


Is anyone using the newer Grafana Cloud Mimir/Tempo/Loki stack that can compare pricing for real usage?

I know Datadog probably has a hell of a lot more add-on features, I'm more interested in a head-to-head of comparable products


It depends a lot on your usage pattern, but we are switching to Grafana Cloud from Datadog and are looking at about 1/3 the cost per year. This is using logs, metrics, and traces with OpenTelemetry instrumentation.

The Grafana pricing is more cleanly volume-based and disconnected from the number of "hosts", which is where Datadog really squeezes you in a Kubernetes setup with many pods.


Sounds like some kind of upfront payment discount.


Looks like it, but what cartoon world do we live in where a bill for logging is 65 million?


This is crypto. Literally, the cartoon world.


> what cartoon world do we live in where a bill for logging is 65 million

Perhaps it unlocked insurance?


It could be a 65-year upfront payment. /s


Cloud


How does this compare with Grafana Cloud prices? Does anyone know any estimates or comparisons?


Grafana Cloud is definitely cheaper especially if you have some volume.


How does Datadog compare against Azure Application Insights?


I have limited experience in both.

We use Datadog for VM and database monitoring.

When I worked at a place that was all in on Azure, Application Insights was all we needed because we had no dedicated VMs, just all built-in Azure services (Cosmos, queues, blob/table storage, functions etc.)


Use the crypto to fund the AI that can actually process those logs, then the log analysis can determine how to buy more crypto.

It’s the simple mathematics of perpetual motion, as observed in all stable systems.


Oh that's why they have insane salaries


Why do we think this was Coinbase?


very few other crypto companies are capable of paying that much. god knows FTX wasn't investing in observability


Not on purpose, but I could imagine someone at FTX signing up for datadog, configuring it to ingest their logs without doing any estimation or setting up any guardrails and then not checking on it because things were probably crazy over there.


Datadog's SEC filings indicated otherwise.

Since 2022 Q1, the allowance for doubtful accounts has remained under $6 million. Bankruptcy generally triggers a writedown of any associated receivables, so Datadog appears to not have had any material exposure to FTX or any other bankrupt customer.


A previous workplace switched from this to a competitor, which had much worse graphing as far as I could tell. Seems like an important function for these tools! I wondered why they did that, I guess this is the answer.


that's not possible



