Great article. On the surface, it's about Snowflake. At a deeper level, the article is about the perverse incentives motivating SaaS businesses to do seemingly dumb, inefficient things and avoid seemingly obvious optimizations by default.
Many SaaS businesses are perfectly happy to let customers shoot themselves in the foot if it generates more revenue. The BigQuery example (presently, by default, `select * from table limit 10` obediently scans the entire table at your expense!) is spot-on.
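For anyone who hasn't run into this: here's a minimal sketch of checking it yourself with the google-cloud-bigquery Python client and a dry run (the table and column names are made up). The LIMIT does nothing to the bytes scanned; selecting fewer columns does, because the storage is columnar.

```python
from google.cloud import bigquery

client = bigquery.Client()
# Dry run: BigQuery estimates the bytes it would scan without running (or billing) the query.
cfg = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)

queries = [
    "SELECT * FROM `my_project.my_dataset.events` LIMIT 10",        # hypothetical table; scans every column of every row
    "SELECT user_id FROM `my_project.my_dataset.events` LIMIT 10",  # columnar pruning: only one column is scanned
]

for sql in queries:
    job = client.query(sql, job_config=cfg)
    print(f"{sql}\n  -> ~{job.total_bytes_processed:,} bytes scanned")
```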
As the article so well puts it, every SaaS company has a vested financial interest "to leave optimization gremlins in."
It's a terrible article. The author misunderstands competition and how much it drives products in this area. Snowflake is incentivized to make their product better on every dimension. If Snowflake doesn't improve, customers will leave in droves - just as they left other vendors for Snowflake in the first place.
In practice, as has been pointed out in other comments, they do improve their performance (for competitive reasons) and it does cost them money when they do it. They did it a couple of quarters ago and left $97 million on the table.
There are many degrees of optimization and clearly there's some cost to bad performance, but Snowflake still has a massive perverse incentive not to spend too much effort on improving performance. If Snowflake is like every software company I've ever been involved with, there are many competing projects at any given time, and direct revenue impact is a big factor in what gets prioritized.
My own experience with Snowflake absolutely backs up the article's point. At my work we routinely encounter abysmal performance for certain types of queries, due to a flaw on Snowflake's side. We have had numerous talks with them and there is no question that they have an issue, but they have shown absolutely no urgency to fix it. Their recommendation is that we spend more money to work around the problem on their end.
It is a terrible article. I’ve been on the engineering side of these big data platforms, including Snowflake in its early days, ParAccel (Redshift’s code ancestor), Redshift, and others you probably use but don’t realize are actually hyperscale database engines. The author missed the mark consistently. I chortled when he discussed the Redshift WLM, which I helped design a very long time ago and which is absolute garbage. Snowflake’s entire point is that you can decouple the storage and the database from the warehouse query engine to provide total isolation from noisy neighbors. If you’re encountering noisy neighbors, you’re using the product entirely wrong.
And you’re right. The motivation Snowflake has to improve is survival. It’s not like their architecture is impossible to replicate. Redshift is doing a total reorganization and rewrite of the product to compete more directly with Snowflake (Redshift AQUA, etc.).
They also seem to completely discount the value of the SaaS model: outsourcing database and storage operations to Snowflake, whose only focus is operating the database product. Running your own clusters is an exercise that seems smart in the first few months; then, like a puppy when it grows up, you’re stuck with a dog. If you love dogs and train them well, then great. But the fact is most people are terrible dog owners, and the same is true for MPP clusters. Being able to focus exclusively on query management operations is really ideal. Highly stateful distributed products are a PITA.
He also rants about Snowflake not telling him the hardware. Snowflake runs in EC2, GCP, and Azure. You can literally guess the hardware types - there just aren’t that many suitable instance types for that sort of workload. Discussing SSD vs. HDD is also an obvious sign of ignorance - the basic premise is that it does very wide, highly concurrent S3 GETs and scans of the data, using a FoundationDB metadata catalog to help prune. Being in AWS, it’s implausible they use HDDs, and realistically they could elide SSDs (I do not remember if they use local disks for caching, but it’s stateless regardless).
The unit costing being hardware-agnostic is totally normal too - they don’t have to expose the details of their costing to you because they normalize it to a standard fictional unit.
I'm a snowflake customer and I've felt/am feeling all of the pain that this article talks about. There might be some handwaving over technical complexity that you don't like given your detailed understanding of how the thing is built, but the article is fundamentally right in its message.
The thing it's most right about is the power imbalance and the innovator's dilemma. I've had more than one instance where we've found that query performance/cost is too high, complained about it, and Snowflake has "made a configuration change" (undisclosed) that brought the cost down.
Don’t you have the same issue with any product built around a query optimizer? If I’m using Redshift and hit a bad execution plan that I can’t get around by tweaking the query, I’m SOL, and Redshift engineers aren’t going to tweak a configuration change to help me.
This is why products like DynamoDB were created - cost-based optimizers are imperfect and unpredictable, and once you’ve stepped over some limit or threshold, performance changes wildly. The reason can be your query, or that the data has changed, or that there’s a noisy neighbor consuming a resource your query depends on. If you need highly predictable times you can reason about, you won’t get them from any RDB solution.
Given that, what about Snowflake feels different? That the details are obscured from you, so you don’t understand why things are happening? Is the lack of ability to deeply introspect making you uncomfortable? My experience has been that the ability to introspect rarely leads to any change in outcome; instead it leads to me identifying that the query optimizer has done something stupid that I cannot do anything about, but at least I can point to the specific resource it is exhausting.
At SingleStore we regularly benchmark the "big 3" cloud data warehouses - Redshift, Snowflake, and BigQuery. Their performance is very close to the same (within 10-20%) on most benchmarks on reasonably sized data sets (tens of TB).
I agree that if the performance of one of them fell behind the others for any prolonged period of time, the cost to the laggard in market share would be much, much worse than the short-term revenue gain of "being slow on purpose".
I don't think it misunderstands business competition. In fact, it understands the concept of competition very well and develops an insightful critique of the perverse incentives born of competition.
It benefits no one except a couple thousand people to play their customers so blatantly in this way. In fact, it's worse, as it incentivizes the same behavior from other market actors in the space.
What exactly in the article suggests the author understands the pressure of competition on incentives?
The author states that Snowflake are not incentivized to increase performance due to short term revenue concerns but doesn't mention they are also incentivized to do the opposite from a competitive perspective. The result is incomplete enough that it ends up being flat wrong with respect to the behavior that the company actually engages in.
The author missed the fact that Snowflake recently did the very thing he/she suggested they were incentivized not to do, at a cost of $97 million. The CEO explained why they are doing it and how they are actually incentivized. I don't know how the article could miss the mark by more than it has. The company literally does the opposite of what he/she suggested. It's not like they are the only one, either; AWS has a history of reducing prices. Why? Once again, competition.
> The CEO explained why they are doing it and how they are actually incentivized.
The CEO explained why he thinks it's a good long-term plan... but for now, they get money, i.e. they are actually incentivized by slow code. The CEO's incentives are theoretical ones.
And the market, which ultimately controls whether the CEO gets to continue that plan or not, did not seem to agree it was a good plan.
By this reasoning, everyone would shirk at work. If you think incentives only act over short time horizons, I don't know how you explain an enormous amount of human behavior.
The market didn't even understand it. Most of the people trading equities, especially around earnings announcements, don't know what a data warehouse is or what matters in that market. All they saw was "miss".
I didn't say the CEO was wrong or that long-term thinking is bad! I said the actual incentives are still misaligned. (I mean, a lot of people do shirk at work, and it even works out well for them.)
I think you have a weird and probably not useful definition of "actual" if "monthly revenue" is not actual but "projected monthly revenue two years from now" is actual. (Or maybe I've just lived in Germany too long.)
You are right, I've used the word "actual" incorrectly. What I should have said was "net". I.e., both short-term and long-term revenue incentivize behavior, and in this case the net result was increasing performance, i.e. the long-term incentive outweighed the short-term incentive.
I think you’re presenting a false dichotomy here. The structure may provide an opportunity to maximize short-term profits, but there is no reason to believe they, or anyone, have to take that opportunity, especially if they rationally believe investing energy and money now has a much higher NPV.
When I read these comments about incentives to screw customers, and the naked belief that everyone must be doing so, I really wonder who traumatized the authors. There are tons of excellent engineering cultures that prioritize excellence for long-term gain. Find a better job.
While I think it is definitely in a company's best long-term interest to implement features that benefit its customers, it might not be in the best interest of those who are currently running the company.
We have seen many, many examples of executives who are willing to sacrifice the future of the company to get a personal short-term gain. Jacking up revenues (or slashing costs) in ways that alienate customers is a great strategy when you plan to jump off with your golden parachute in a couple of years when all your stock options vest.
Sure, but not even mentioning churn as something Snowflake is worried about is pretty silly. With the funding environment taking a dramatic turn, they (and every other SaaS company) are going to be deeply concerned about price competition and churn.
Agreed. But a good article should have shown an example rather than a counterexample. Intel might have been a good example. A good article would have shown the competing incentives at play rather than a single incentive.
> It's a terrible article. The author misunderstands competition and how much it drives products in this area.
Agree, but the author has one thing right. Snowflake is not transparent about product behavior, which makes it hard to reason about costs and performance.
Open source data warehouses like ClickHouse and Druid don't have this problem. If you want to know how something works, you can look at the code. Or listen to talks from the committers. This transparency is an enduring strength of open source projects.
I'm not complaining, of course. It's just an observation. Snowflake is very similar to Oracle in that respect, which is not surprising given where the founders came from.
Personally I think Snowflake is very impressive on the things they optimize for, which include complex queries on enterprise data sources. The same could be said for BigQuery.
Standard disclaimer: I work at AWS in consulting and could easily be accused of drinking the Kool Aid.
Everyone from consultants, SAs, sales, support, etc. is constantly working toward getting customers to “optimize” their spend. Of course any business wants you to give them more money. But none of us are pushed to get them to spend money on services or methods to do things inefficiently.
I specifically work in consulting, specializing in “application modernization”. That means most of my implementations are cheap, and I’m constantly spending time making sure my implementation is as cheap as possible while still meeting the requirements. I first noticed this attitude from AWS when I was working for a startup.
This isn’t just with AWS. I spent years working in enterprise shops and saw the same attitude working with Microsoft.
I can’t speak for any other large organizations - AWS and Microsoft are the only two I’ve worked with as either a customer or employee where there was huge spending on infrastructure or software.
Now I could easily get started about my opinion of Oracle from the customer standpoint. But I won’t.
Well said. I'd also add a cynical note that the recurring-revenue model is incentivized to keep the gremlins around not just because of the impact on metered costs, but also because off-ramping is that much more difficult once engineers implement workarounds/solutions to mitigate the impact of those smells.
Just another way that vendor lock-in occurs (intentionally or otherwise).
> As the article so well puts it, every SaaS company has a vested financial interest "to leave optimization gremlins in."
It depends on the time scale. A SaaS optimizing for, say, a 1-3 year financial return will see their interests through a different lens than one optimizing for a multi-decade return. Leaving optimization gremlins in isn't aligned with customers' interests in the long run, so customers will eventually find alternatives if the SaaS doesn't align itself with them.
"As an investor, I expect Snowflake to show amazing profitability and record-breaking revenue numbers. As an Engineer, if Snowflake continues on the current path of ignoring performance, I expect them to lose share to the open-source community or some other competitor, eventually walking down the path of Oracle and Teradata. Here are a few things I think they can do to stay relevant in five years."
FWIW, BigQuery tables can be configured to require a partition filter clause [0] in the SQL query, so that you cannot shoot yourself in the foot like that. Now if they'd just make an Organization Policy to let you turn it on by default for all new tables.
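For reference, here's a rough sketch of flipping that switch with the Python client; the table name is hypothetical and the table must already be partitioned:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical table, assumed to already be partitioned (e.g. by a DATE column).
table = client.get_table("my_project.my_dataset.events")
table.require_partition_filter = True
client.update_table(table, ["require_partition_filter"])

# From now on, queries that don't filter on the partitioning column are rejected
# instead of silently scanning (and billing for) every partition.
```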
Depends who the "you" is; someone just getting started with Cloud, or a savvy enterprise operator?
GCP has sided with an "easy out of the box experience". For example, a new project has a "default" network with some permissive firewall rules. A savvy operator wouldn't build things this way, but for a first time user, the cloud is daunting and a JustWorks™ experience gets them moving quickly (e.g. so they can SSH into their VMs easily).
Now, once you've gotten your feet under you, and want to build a solid cloud setup, you'll add Organization Policies [1] like "Skip default network creation", and all new Projects will be completely closed off from the web by default, at the cost of all networking being more complex. Once you're ready for this, turn it on.
So, how should a SaaS database work? Should you have to learn all the intricacies of sharding, partitioning, indexing, SELECT, FROM, HAVING, FULL/INNER/OUTER JOIN, WHERE, GROUP BY, and LIMIT before you write your first query? This is a long-standing yin/yang question of product UX: what user persona and UX do you design for on the experience and complexity spectrum?
So they have a system for enforcing rules but still haven't built the rule that would reduce their revenue - seems like an example in favor of the article.
Wow, their statement about not participating in benchmarking wars is alarming. In this day and age, when benchmarking tools are so inexpensive and almost everything is very transparent, why not participate?
Or, even better, engage with a neutral third party such as Jepsen to get on a level playing field and duke it out.
Because their business is providing a solution that IT failed to. The large cost, which the business was already accustomed to from previous IT attempts, pales in comparison to the additional costs of doing it themselves.
It's like the cloud in general, the cost is high but so is the hype. When all that dust settles over the coming years the business will start shopping on price. They will then realize they have been locked in to some extent and will need to start wriggling loose of the lock-in.
> Wow their statement about not participating in benchmarking wars is alarming.
I found the Snowflake statement pretty reasonable. [0]
Vendor benchmarks are largely propaganda. What actually counts is performance on real-world workloads, starting with your own. Plus, good benchmarks are costly to do well. If vendors are going to invest in load testing, it's way better to do it as part of the QA process, which directly benefits users. The other thing for vendors to do is to drop DeWitt clauses so others can run benchmarks and share the results. Snowflake announced this in the statement and also changed their acceptable use policy accordingly. [1]
Funny that most people here advocate AWS while it has tons and tons of foot-gun tools that cost people thousands of dollars all the time. And we just accept it. Like when you want to kill a complex cluster with one API call or button click and it won’t let you, for whatever reason; that’s not because they can’t, it’s because you’ll just leave the cluster running, and that makes money.
I worked at a BigQuery shop and they have a terrific feature where, right next to the “Run query” button, there is an estimate of the cost of the query, in bytes. It becomes extremely obvious when a query is a full table scan.
Ha! I wonder if we worked at the same place ... it's in the travel space, because when I worked there someone wrote a plugin that did this, and it was a real eye opener at times!
Snowflake is nowhere near a monopoly, and plenty of customers have moved from other vendors (Teradata, Netezza, etc) to Snowflake - showing that vendor lock-in is not as strong as it might seem.
If that's the case, then why aren't Snowflake and Google targeting query optimization with higher priority to lower end-user costs? There's no incentive in the market for them to do so - once you switch to them, you'll eat the cost of quirks and learn how to avoid them the hard way.
I don't think their customers could quantify it if they tried (and I'm not implying Snowflake doesn't provide value; it probably does, but how does a company attribute it?).
I was thinking about this too. Why don’t SaaS companies just force price increases to offset their broken pricing model? Nobody would care, you’re paying the same you were paying yesterday. If you’re still the best in class product with sticky features people will stay. If not and you’re competing, then you have the opportunity to reduce the price in the future or simply not increase it and let users see lower bills which might also retain them.
It's pretty easy to limit the number of results returned by each partition to 10, then have that further reduced to 10 total during the reduce step.
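Something like this toy sketch of the map/reduce shape - purely illustrative in-memory "partitions", not any engine's actual implementation:

```python
from itertools import chain, islice

def limited_scan(partitions, n=10):
    # Map side: stop reading each partition after n rows.
    per_partition = (islice(iter(p), n) for p in partitions)
    # Reduce side: concatenate the partial results and keep only n rows overall.
    return list(islice(chain.from_iterable(per_partition), n))

# Three fake partitions of a table; only a handful of rows are ever touched.
print(limited_scan([range(0, 1000), range(1000, 2000), range(2000, 3000)]))
```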
One more, deeper level: almost all consulting exists in a world of consultants driven to limit efficiency lest their billables decline. I know a few people who seemed to reduce their entire personality to "hard worker" while refusing to progress.
It depends. Many large companies have internal “Professional Services” departments with “consultants” who are full time employees.
Standard disclaimer: I work in ProServe at AWS.
When you “consult” and are employed by the company selling the software, billable hours and utilization are not the be-all and end-all. Consulting is just the “nose of the camel in the tent”. They want you to be as efficient as possible so they can make ongoing revenue.
Trust me, AWS is not going to complain if it only took me 20 hours to do work that was estimated at 40 and brought in half as much consulting revenue, if it means ongoing revenue from the customer.
There isn’t just a singular focus on utilization rates.
If that lowers the barrier to entry for people without the expert-level knowledge to know what a full table scan even means, why not? Instead of hiring a DBA, maybe you could hire an intern and happily eat the cost of Snowflake.
I think the point of the article is that an optimizer doesn't affect the barrier to entry at all, but adding it would save end users quite a bit of money. So they don't do it, because end users' money is revenue for Snowflake/Alphabet.
It doesn't lower barriers to entry; it's contrary to the logical expectations of someone unfamiliar with how BQ works. If the query is limited to 10 results, you wouldn't expect it to scan all 2 trillion of your records. Granted, there are numerous warnings in the GUI for these types of things, but make this mistake in Python and you're none the wiser.
Wait, are you saying the BQ engine is not following logical expectations? You do realize a "limit" clause doesn't prevent a full table scan in all cases, right?