
I'm the author of the article. Didn't expect it to blow up. Let me clarify a few points:

1. I like Snowflake and I think they brought several innovations to the field: instant scale-out/up, time travel, and unstructured data query support.

2. Snowflake obviously makes innovations and performance improvements; otherwise they would not be the market leader they are. But I also suspect that they make just enough performance improvements to stay at par, and then use vendor lock-in features to make switching hard.

My argument is that their rate of performance innovation has slowed considerably, and Databricks, Firebolt, and open source alternatives just look more attractive on a cost/performance basis. I agree that Snowflake is still the best data warehouse to start with if you have 100k, but not if you truly plan for a multi-year horizon and your usage expands.

- Redshift also brought a lot of innovation that allowed people to execute analytical queries 100x-1000x faster than any OLTP database out there. I used Redshift for four years, and they kept ignoring performance and features until Snowflake came out. All of a sudden, because of competitive pressure, they put more effort into the product to maintain and gain market share. My hope is that Snowflake finds a solution to its innovator's dilemma, since competitors are hot on its tail.

- Some people point out that 70% usage growth just shows that Snowflake is useful. Nobody disagrees with that. The issue is that the majority of companies don't see 70% revenue growth to keep up with the growth in costs. At some point, you have to clamp down on costs, which means looking for alternatives that run things more efficiently.




Totally agree with the Redshift sentiments. It's been lovely seeing BigQuery and Redshift step up their game over the past 1.5 years, because they really should have been doing certain things for many years prior.

Re: Firebolt, I don't consider it to be in the same class as Snowflake whatsoever (even though their advertising seems to indicate otherwise). Snowflake is like a very powerful Swiss Army knife. Firebolt is good for a very specific (dare I say niche?) workload but falls all over itself for the vast majority of data org needs.


> Firebolt is good for a very specific (dare I say niche?) workload but falls all over itself for the vast majority of data org needs.

It runs SQL queries on structured data. Is that niche?


Stas: "The issue is that majority of the companies don't experience a 70% revenue growth to catch up with the growth in costs"

I think you are misunderstanding something very fundamental here. Snowflake has usage pricing, and no one is forcing companies to use Snowflake 70% more every year. In my experience, companies are typically evaluating spend on other platforms and, after some testing, moving additional workloads there to displace cost elsewhere. Let's say your Snowflake bill was $100k, you were unhappy with your security data lake provider, and you replaced a $1M bill there with $200k of Snowflake. Your Snowflake bill has now increased 200% to $300k, but you are still $800k ahead overall. In other words, your existing workload (the original $100k) didn't get more expensive.
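The consolidation arithmetic above can be spelled out in a few lines of Python (the dollar figures are the hypothetical ones from this comment, not real bills):

```python
# Hypothetical consolidation scenario: the Snowflake bill triples while
# total spend across vendors drops, because a pricier workload moved in.
before = {"snowflake": 100_000, "security_data_lake": 1_000_000}
after = {"snowflake": 100_000 + 200_000, "security_data_lake": 0}

# Percentage increase of the Snowflake line item alone.
snowflake_increase = (after["snowflake"] - before["snowflake"]) / before["snowflake"]

# Change in total spend across both vendors.
total_savings = sum(before.values()) - sum(after.values())

print(f"Snowflake bill up {snowflake_increase:.0%}")  # up 200%
print(f"Overall savings: ${total_savings:,}")         # $800,000
```

The point the numbers make: a fast-growing Snowflake line item is not, by itself, evidence that any individual workload got more expensive.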

I've worked in data warehousing for many years now, and stepping back, I guess I don't understand what you are trying to accomplish here. I certainly think everyone should take a "trust but verify" approach with their vendors, but honestly, I don't think you've proven your case, especially since you appear to completely ignore the competitive reality these vendors live in. Beyond that, I don't think "speeds and feeds" are the most important improvements happening on these platforms at the moment. Check the monthly release notes:

BigQuery: https://cloud.google.com/bigquery/docs/release-notes

Databricks: https://docs.databricks.com/release-notes/product/index.html

Snowflake: https://docs.snowflake.com/en/release-notes.html

Performance is important, but it doesn't exist in a vacuum. What percentage of features in the past two months for each of these platforms relate to performance? On the flip side, how much does your company spend on things like data governance? How much would a data breach cost? How many people maintain the platform? What do pipeline failures cost? How good is connectivity to other solutions your company uses?

If you look at where innovation is happening (and this is a VERY interesting space these days), the bulk of improvements are in areas arguably more important to companies. BigQuery has added migration improvements, Databricks has added Photon and Unity Catalog improvements, Snowflake has added Java and Python stored procedures. The list is miles long for all of these vendors and I challenge anyone in the space to keep up with everything.

Another comment here said all of these vendors are within 10-20% performance of each other. If that is true, in my opinion you're focused on a problem that is an edge case at best. Something to watch, but not nearly as interesting or as impactful as the rapid pace of innovation across this space in all areas. IMHO.


"In my experience, companies are typically evaluating spend on other platforms and after some testing, moving additional workloads there to displace cost elsewhere"

Fair point; some of that net revenue increase comes from consolidation of workloads, although the majority of the cost is likely still driven by consumers expanding usage beyond what they expected. As I mention in my article, the second part of the increase in costs has to do with data governance, and my argument is that Snowflake doesn't make governance easy. Why can't they stand up an IAM-like service with a nice UI and dashboards? Why can't they make integrations with PagerDuty, Slack, and email work out of the box? Why can't I specify team-based budgets, instead of having to set them per warehouse? Why do I have to build custom bespoke tooling on top to make governance work?
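As a concrete illustration of the "bespoke tooling" complaint, here is a minimal Python sketch of what teams end up building themselves: rolling per-warehouse credit usage up into per-team budgets. The warehouse names, team mapping, and credit numbers are all hypothetical; real tooling would pull usage rows from the vendor's metering views instead of a hardcoded list.

```python
from collections import defaultdict

# Hypothetical mapping from warehouses to owning teams; this is exactly
# the indirection the platform doesn't provide natively.
WAREHOUSE_TO_TEAM = {
    "ANALYTICS_WH": "analytics",
    "ETL_WH": "platform",
    "ADHOC_WH": "analytics",
}

def team_spend(usage_rows):
    """Aggregate (warehouse_name, credits_used) rows into per-team totals."""
    totals = defaultdict(float)
    for warehouse, credits in usage_rows:
        totals[WAREHOUSE_TO_TEAM.get(warehouse, "unmapped")] += credits
    return dict(totals)

def over_budget(totals, budgets):
    """Return only the teams whose usage exceeds their budget."""
    return {team: used for team, used in totals.items()
            if used > budgets.get(team, float("inf"))}

usage = [("ANALYTICS_WH", 120.0), ("ADHOC_WH", 40.0), ("ETL_WH", 90.0)]
budgets = {"analytics": 150.0, "platform": 100.0}
print(over_budget(team_spend(usage), budgets))  # {'analytics': 160.0}
```

None of this is hard to write, which is arguably the point: it's glue code that every customer rewrites because budgets attach to warehouses rather than teams.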

I can unequivocally say that at a certain scale you need to move on, and that Snowflake and many SaaS providers are too expensive even for medium-scale companies. This article describes the paradox better than I could: https://a16z.com/2021/05/27/cost-of-cloud-paradox-market-cap...

Moreover, Snowflake's enterprise pricing model scales even worse. Why do companies often have to pay twice the price per credit relative to the standard model? Shouldn't guarantees on security or support come at a fixed cost? Shouldn't the enterprise tier offer economies of scale in pricing?
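The scaling complaint can be made concrete with a toy calculation (the per-credit rates here are hypothetical round numbers, not actual list prices): if the enterprise tier doubles the per-credit rate, the premium you pay grows linearly with usage instead of flattening out, which is the opposite of economies of scale.

```python
# Hypothetical per-credit rates; the point is the 2x ratio, not the values.
STANDARD_RATE = 2.00    # $/credit
ENTERPRISE_RATE = 4.00  # $/credit, 2x standard

def monthly_cost(credits, rate):
    return credits * rate

# The enterprise premium at two usage levels: 10x the usage, 10x the premium.
for credits in (10_000, 100_000):
    premium = monthly_cost(credits, ENTERPRISE_RATE) - monthly_cost(credits, STANDARD_RATE)
    print(f"{credits:>7} credits -> enterprise premium ${premium:,.0f}")
```

A fixed-fee model for security and support guarantees would instead make the premium a constant, independent of usage.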

I also wish folks would read my article end to end, because my conclusion is that you don't really have a choice but to use an enterprise solution when your scale is small. If I were starting my own company with only 2 data engineers, you betcha I would use Snowflake and Databricks.

--- btw, it really surprises me that nobody has commented on the workload manager. Am I the only one who sees that as an issue? I have enough exposure to compare it with Redshift, and I can say that Snowflake's workload manager is just very bad at optimizing throughput.


I read your link. My immediate reaction:

1) I think Andreessen Horowitz has probably oversimplified the issue based on the Dropbox outlier. It's easy to say you can build your own data center to manage stuff, but the costs in people are really hard to offset, especially with the security posture and the number of nines that most companies need. Not only that, but throw in disaster recovery, and now you've doubled the costs (two data centers). Etc. Plus, hardware ages rapidly; do you want to pay for the "floor sweeps" (as Teradata used to call them) every few years?

Beyond the complexity, some of these companies simply could not exist without the Cloud. Take Snowflake. How big of a data center would they need? How many servers? How much disk? How do they know if Dropbox wants to load 1 GB, 1 TB, or 1 PB of data? Answer: they don't. This type of model only works if you can leverage the essentially unlimited scale of the cloud providers. I don't miss the days of loads failing due to being out of disk space and having to scramble around trying to find things to delete.

2) Regarding pricing policy, Snowflake makes it very clear which features are included with which edition: https://docs.snowflake.com/en/user-guide/intro-editions.html

Your link also says Snowflake spent 44% of its revenue in 2021 on cloud costs. If that is true, perhaps Snowflake loses money on standard edition, and presumably there is a larger internal cost to supporting some of the higher-end features like Private Link that Snowflake needs to recapture. Regardless, as a Snowflake customer, I can determine what features I need and decide if the price they are charging is worth it, or if I should look elsewhere. I can say from experience that some of these features, and even paper certifications, aren't easy and can be very expensive to maintain.

I will tell a story that will age me. Decades ago I used to work for a company that needed a "business continuity" plan. We had to show that we could continue to function if our data center was destroyed by natural disaster. We paid a company in another region that had essentially a copy of all of our hardware, and once a year we'd send our backups there and bring up all of the systems to prove we could. As you might imagine, this service was insanely expensive.

Flash forward to now. Snowflake has a feature called failover/failback with connection redirect. With a few commands, you can replicate your entire database elsewhere, you can incrementally keep the remote target up-to-date, and you can test it as often as you like with connections failing over generally in under 1 minute. If your company needs something like this, how much would that cost to build yourself? Maintain? Test? Clearly there must be customers who did that evaluation and decided that Snowflake's approach is way cheaper. If you disagree, don't use that level of service, or build it yourself. You say SaaS providers are "too expensive" and that even "medium scale" companies can do better themselves, but that isn't my experience.

3) As discussed in the previous comment, no doubt Snowflake can make improvements. However, from what I see in my limited view as a (probably much smaller) customer, Snowflake is doing that. In fact, two of those improvements you call out are already in private preview and were discussed at their recent conference. If my company was briefed on these features post-Summit, I'd be highly surprised if yours wasn't.


Thanks for sharing your perspective. It's always useful to get a more experienced viewpoint. I agree that managing hardware is not something to be taken lightly, and that only companies at scale, like Uber, Facebook, and Dropbox, can and should do it. I'm not pushing for managing your own data centers, but I'm hopeful that as open source gets better and more data engineers learn the craft, it will become cheaper to run things yourself once your Snowflake bill is in the millions per month.



