Great article. On the surface, it's about Snowflake. At a deeper level, the article is about the perverse incentives motivating SaaS businesses to do seemingly dumb, inefficient things and avoid seemingly obvious optimizations by default.
Many SaaS businesses are perfectly happy to let customers shoot themselves in the foot if it generates more revenue. The BigQuery example (presently, by default, `select * from table limit 10` obediently scans the entire table at your expense!) is spot-on.
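For anyone who hasn't run into this: here's a minimal sketch of checking it yourself with the google-cloud-bigquery Python client and a dry run (the table and column names are made up). The LIMIT does nothing to the bytes scanned; selecting fewer columns does, because the storage is columnar.

```python
from google.cloud import bigquery

client = bigquery.Client()
# Dry run: BigQuery estimates the bytes it would scan without running (or billing) the query.
cfg = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)

queries = [
    "SELECT * FROM `my_project.my_dataset.events` LIMIT 10",        # hypothetical table; scans every column of every row
    "SELECT user_id FROM `my_project.my_dataset.events` LIMIT 10",  # columnar pruning: only one column is scanned
]

for sql in queries:
    job = client.query(sql, job_config=cfg)
    print(f"{sql}\n  -> ~{job.total_bytes_processed:,} bytes scanned")
```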
As the article so well puts it, every SaaS company has a vested financial interest "to leave optimization gremlins in."
It's a terrible article. The author misunderstands competition and how much it drives products in this area. Snowflake is incentivized to make their product better on every dimension. If Snowflake doesn't improve, customers will leave in droves - just as they left other vendors for Snowflake in the first place.
In practice, as has been pointed out in other comments, they do improve their performance (for competitive reasons) and it does cost them money when they do it. They did it a couple of quarters ago and left $97 million on the table.
There are many degrees of optimization and clearly there's some cost to bad performance, but Snowflake still has a massive perverse incentive not to spend too much effort on improving performance. If Snowflake is like every software company I've ever been involved with, there are many competing projects at any given time, and direct revenue impact is a big factor in what gets prioritized.
My own experience with Snowflake absolutely backs up the article's point. At my work we routinely encounter abysmal performance for certain types of queries, due to a flaw on Snowflake's side. We have had numerous talks with them and there is no question that they have an issue, but they have shown absolutely no urgency to fix it. Their recommendation is that we spend more money to work around the problem on their end.
It is a terrible article. I’ve been on the engineering side of these big data platforms, including Snowflake in its early days, ParAccel (Redshift’s code ancestor), Redshift, and others you probably use but don’t realize are actually hyperscale database engines. The author missed the mark consistently. I chortled when he discussed the Redshift WLM, which I helped design a very long time ago and which is absolute garbage. Snowflake’s entire point is that you can decouple the storage and the database from the warehouse query engine to provide total isolation from noisy neighbors. If you’re encountering noisy neighbors, you’re using the product entirely wrong.
And you’re right. The motivation Snowflake has to improve is survival. It’s not like their architecture is impossible to replicate. Redshift is doing a total reorganization and rewrite of the product to compete more directly with Snowflake (Redshift AQUA, etc.).
They also seem to completely discount the value of the SaaS model: outsourcing database and storage operations to Snowflake, whose only focus is operating the database product. Running your own clusters is an exercise that seems smart in the first few months; then, like a puppy when it grows up, you’re stuck with a dog. If you love dogs and train them well, then great. But the fact is most people are terrible dog owners, and the same is true for MPP clusters. Being able to focus exclusively on query management operations is really ideal. Highly stateful distributed products are a PITA.
He also rants about Snowflake not telling him the hardware. Snowflake runs in EC2, GCP, and Azure. You can literally guess the hardware types - there just aren’t that many suitable instance types for that sort of workload. Discussing SSD vs. HDD is also an obvious sign of ignorance - the basic premise is that it does very wide, highly concurrent S3 GETs and scans of the data, using a FoundationDB metadata catalog to help prune. Being in AWS, it’s implausible they use HDDs, and realistically they could elide SSDs (I do not remember if they use local disks for caching, but it’s stateless regardless).
The unit costing being hardware-agnostic is totally normal too - they don’t have to expose the details of their costing to you because they normalize it to a standard fictional unit.
I'm a snowflake customer and I've felt/am feeling all of the pain that this article talks about. There might be some handwaving over technical complexity that you don't like given your detailed understanding of how the thing is built, but the article is fundamentally right in its message.
The thing it's most right about is the power imbalance and the innovator's dilemma. I've had more than one instance where we've found that query performance/cost is too high, complained about it, and Snowflake has "made a configuration change" (undisclosed) that brought the cost down.
Don’t you have the same issue with any product built around a query optimizer? If I’m using Redshift and hit a bad execution plan that I can’t get around by tweaking the query, I’m SOL, and Redshift engineers aren’t going to tweak a configuration change to help me.
This is why products like DynamoDB were created - cost-based optimizers are imperfect and unpredictable, and once you’ve stepped over some limit or threshold, performance changes wildly. The reason can be your query, or that the data has changed, or that there’s a noisy neighbor consuming a resource your query depends on. If you need highly predictable times you can reason about, you won’t get them from any RDB solution.
Given that, what about Snowflake feels different? That the details are obscured from you, so you don’t understand why things are happening? Is the lack of ability to deeply introspect making you uncomfortable? My experience has been that the ability to introspect rarely leads to any change in outcome; instead it leads to me identifying that the query optimizer has done something stupid that I cannot do anything about, but at least I can point to the specific resource it is exhausting.
At SingleStore we regularly benchmark the "big 3" cloud data warehouses - Redshift, Snowflake, and BigQuery. Their performance is very close to the same (within 10-20%) on most benchmarks on reasonably sized data sets (tens of TB).
I agree that if the performance of one of them fell behind the others for any prolonged period of time, the cost to the laggard in market share would be much, much worse than the short-term revenue gain of "being slow on purpose".
I don't think it misunderstands business competition. In fact, it understands the concept of competition very well and develops an insightful critique of the perverse incentives born of competition.
It benefits no one except a couple thousand people to play their customers so blatantly in this way. In fact, it's worse, as it incentivizes the same behavior from other market actors in the space.
What exactly in the article suggests the author understands the pressure of competition on incentives?
The author states that Snowflake are not incentivized to increase performance due to short term revenue concerns but doesn't mention they are also incentivized to do the opposite from a competitive perspective. The result is incomplete enough that it ends up being flat wrong with respect to the behavior that the company actually engages in.
The author missed the fact that Snowflake recently did the very thing he/she suggested they were incentivized not to do, at a cost of $97 million. The CEO explained why they are doing it and how they are actually incentivized. I don't know how the article could miss the mark by more than it has. The company literally does the opposite of what he/she suggested. It's not like they are the only one, either; AWS has a history of reducing prices. Why? Once again, competition.
> The CEO explained why they are doing it and how they are actually incentivized.
The CEO explained why he thinks it's a good long-term plan... but for now, they get money, i.e. they are actually incentivized by slow code. The CEO's incentives are theoretical ones.
And the market, which ultimately controls whether the CEO gets to continue that plan or not, did not seem to agree it was a good plan.
By this reasoning, everyone would shirk at work. If you think incentives only act over short time horizons, I don't know how you explain an enormous amount of human behavior.
The market didn't even understand it. Most of the people trading equities, especially around earnings announcements, don't know what a data warehouse is or what matters in that market. All they saw was "miss".
I didn't say the CEO was wrong or that long-term thinking is bad! I said the actual incentives are still misaligned. (I mean, a lot of people do shirk at work, and it even works out well for them.)
I think you have a weird and probably not useful definition of "actual" if "monthly revenue" is not actual but "projected monthly revenue two years from now" is actual. (Or maybe I've just lived in Germany too long.)
You are right, I've used the word "actual" incorrectly. What I should have said was "net". I.e., both short-term and long-term revenue incentivize behavior, and in this case the net result was increasing performance, i.e. the long-term incentive outweighed the short-term incentive.
I think you’re presenting a false dichotomy here. The structure may provide an opportunity to maximize short-term profits, but there is no reason to believe they, or anyone, have to take that opportunity, especially if they rationally believe investing energy and money now has a much higher NPV.
When I read these comments about incentives to screw customers, and the naked belief that everyone must be doing so, I really wonder who traumatized the authors. There are tons of excellent engineering cultures that prioritize excellence for long-term gain. Find a better job.
While I think it is definitely in a company's best long-term interest to implement features that benefit its customers, it might not be in the best interest of those who are currently running the company.
We have seen many, many examples of executives who are willing to sacrifice the future of the company to get a personal short-term gain. Jacking up revenues (or slashing costs) in ways that alienate customers is a great strategy when you plan to jump off with your golden parachute in a couple of years when all your stock options vest.
Sure, but not even mentioning churn as something Snowflake is worried about is pretty silly. With the funding environment taking a dramatic turn, they (and every other SaaS company) are going to be deeply concerned about price competition and churn.
Agreed. But a good article should have shown an example rather than a counterexample. Intel might have been a good example. A good article would have shown the competing incentives at play rather than a single incentive.
> It's a terrible article. The author misunderstands competition and how much it drives products in this area.
Agree, but the author has one thing right. Snowflake is not transparent about product behavior, which makes it hard to reason about costs and performance.
Open source data warehouses like ClickHouse and Druid don't have this problem. If you want to know how something works, you can look at the code. Or listen to talks from the committers. This transparency is an enduring strength of open source projects.
I'm not complaining, of course. It's just an observation. Snowflake is very similar to Oracle in that respect, which is not surprising given where the founders came from.
Personally I think Snowflake is very impressive on the things they optimize for, which include complex queries on enterprise data sources. The same could be said for BigQuery.
Standard disclaimer: I work at AWS in consulting and could easily be accused of drinking the Kool Aid.
Everyone from consultants, SAs, sales, support, etc. is constantly working toward getting customers to “optimize” their spend. Of course any business wants you to give them more money. But none of us are pushed to get them to spend money on services or methods to do things inefficiently.
I specifically work in consulting, specializing in “application modernization”. That means most of my implementations are cheap, and I’m constantly spending time making sure my implementation is as cheap as possible while still meeting the requirements. I first noticed this attitude from AWS when I was working for a startup.
This isn’t just with AWS. I spent years working in enterprise shops and saw the same attitude working with Microsoft.
I can’t speak for any other large organizations - AWS and Microsoft are the only two I’ve worked with as either a customer or employee where there was huge spending on infrastructure or software.
Now I could easily get started about my opinion of Oracle from the customer standpoint. But I won’t.
Well said. I'd also add a cynical note that the recurring-revenue model is incentivized to keep the gremlins around not just because of the impact on metered costs, but also because off-ramping is that much more difficult once engineers implement workarounds/solutions to mitigate the impact of those smells.
Just another way that vendor lock-in occurs (intentionally or otherwise).
> As the article so well puts it, every SaaS company has a vested financial interest "to leave optimization gremlins in."
It depends on the time scale. A SaaS optimizing for, say, a 1-3 year financial return will see their interests through a different lens than one optimizing for a multi-decade return. Leaving optimization gremlins in isn't aligned with customers' interests in the long run, so customers will eventually find alternatives if the SaaS doesn't align itself with them.
"As an investor, I expect Snowflake to show amazing profitability and record-breaking revenue numbers. As an Engineer, if Snowflake continues on the current path of ignoring performance, I expect them to lose share to the open-source community or some other competitor, eventually walking down the path of Oracle and Teradata. Here are a few things I think they can do to stay relevant in five years."
FWIW, BigQuery tables can be configured to require a partition filter clause [0] in the SQL query, so that you cannot shoot yourself in the foot like that. Now if they'd just make an Organization Policy to let you turn it on by default for all new tables.
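For reference, here's a rough sketch of flipping that switch with the Python client; the table name is hypothetical and the table must already be partitioned:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical table, assumed to already be partitioned (e.g. by a DATE column).
table = client.get_table("my_project.my_dataset.events")
table.require_partition_filter = True
client.update_table(table, ["require_partition_filter"])

# From now on, queries that don't filter on the partitioning column are rejected
# instead of silently scanning (and billing for) every partition.
```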
Depends who the "you" is; someone just getting started with Cloud, or a savvy enterprise operator?
GCP has sided with an "easy out of the box experience". For example, a new project has a "default" network with some permissive firewall rules. A savvy operator wouldn't build things this way, but for a first time user, the cloud is daunting and a JustWorks™ experience gets them moving quickly (e.g. so they can SSH into their VMs easily).
Now, once you've gotten your feet under you, and want to build a solid cloud setup, you'll add Organization Policies [1] like "Skip default network creation", and all new Projects will be completely closed off from the web by default, at the cost of all networking being more complex. Once you're ready for this, turn it on.
So, how should a SaaS database work? Should you have to learn all the intricacies of sharding, partitioning, indexing, SELECT, FROM, HAVING, FULL/INNER/OUTER JOIN, WHERE, GROUP BY, and LIMIT before you write your first query? This is a long-standing yin/yang question of product UX: what user persona and UX do you design for on the experience and complexity spectrum?
So they have a system for enforcing rules but still haven't built the rule that would reduce their revenue - seems like an example in favor of the article.
Wow, their statement about not participating in benchmarking wars is alarming. In this day and age, when benchmarking tools are so inexpensive and almost everything is very transparent, why not participate?
Or, even better, engage with a neutral third party such as Jepsen to get on a level playing field and duke it out.
Because their business is providing a solution that IT failed to. The large cost, which the business was already accustomed to from previous IT attempts, pales in comparison to the additional costs of doing it themselves.
It's like the cloud in general, the cost is high but so is the hype. When all that dust settles over the coming years the business will start shopping on price. They will then realize they have been locked in to some extent and will need to start wriggling loose of the lock-in.
> Wow their statement about not participating in benchmarking wars is alarming.
I found the Snowflake statement pretty reasonable. [0]
Vendor benchmarks are largely propaganda. What actually counts is performance on real-world workloads, starting with your own. Plus, good benchmarks are costly to do well. If vendors are going to invest in load testing, it's way better to do it as part of the QA process, which directly benefits users. The other thing for vendors to do is to drop DeWitt clauses so others can run benchmarks and share the results. Snowflake announced this in the statement and also changed their acceptable use policy accordingly. [1]
Funny that most people here advocate AWS while it has tons and tons of foot-gun tools that cost people thousands of dollars all the time. And we just accept it. Like when you want to kill a complex cluster with one API call or button click and it won’t let you, for whatever reason; that’s not because they can’t, it’s because you’ll just leave the cluster running, and that makes money.
I worked at a BigQuery shop and they have a terrific feature where, right next to the “Run query” button, there is an estimate of the cost of the query, in bytes. It becomes extremely obvious when a query is a full table scan.
Ha! I wonder if we worked at the same place ... it's in the travel space, because when I worked there someone wrote a plugin that did this, and it was a real eye opener at times!
Snowflake is nowhere near a monopoly, and plenty of customers have moved from other vendors (Teradata, Netezza, etc) to Snowflake - showing that vendor lock-in is not as strong as it might seem.
If that's the case, then why aren't Snowflake and Google targeting query optimization with higher priority to lower end-user costs? There's no incentive in the market for them to do so - once you switch to them, you'll eat the cost of quirks and learn how to avoid them the hard way.
I don't think their customers could quantify it if they tried (and I'm not implying Snowflake doesn't provide value; it probably does, but how does a company attribute it?).
I was thinking about this too. Why don’t SaaS companies just force price increases to offset their broken pricing model? Nobody would care, you’re paying the same you were paying yesterday. If you’re still the best in class product with sticky features people will stay. If not and you’re competing, then you have the opportunity to reduce the price in the future or simply not increase it and let users see lower bills which might also retain them.
It's pretty easy to limit the number of results returned by each partition to 10, then have that further reduced to 10 total during the reduce step.
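Something like this toy sketch of the map/reduce shape - purely illustrative in-memory "partitions", not any engine's actual implementation:

```python
from itertools import chain, islice

def limited_scan(partitions, n=10):
    # Map side: stop reading each partition after n rows.
    per_partition = (islice(iter(p), n) for p in partitions)
    # Reduce side: concatenate the partial results and keep only n rows overall.
    return list(islice(chain.from_iterable(per_partition), n))

# Three fake partitions of a table; only a handful of rows are ever touched.
print(limited_scan([range(0, 1000), range(1000, 2000), range(2000, 3000)]))
```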
One more, deeper level: almost all consulting exists in a world of consultants driven to limit efficiency lest their billables decline. I know a few people who seemed to reduce their entire personality to "hard worker" while refusing to progress.
It depends. Many large companies have internal “Professional Services” departments with “consultants” who are full time employees.
Standard disclaimer: I work in ProServe at AWS.
When you “consult” and are employed by the company selling the software, billable hours and utilization are not the be-all and end-all. Consulting is just the “nose of the camel in the tent”. They want you to be as efficient as possible so they can make ongoing revenue.
Trust me, AWS is not going to complain if it only took me 20 hours to do work that was estimated at 40 and brought in half as much consulting revenue, if it means ongoing revenue from the customer.
There isn’t just a singular focus on utilization rates.
If that lowers the barrier to entry for people without the expert-level knowledge to know what a full table scan even means, why not? Instead of hiring a DBA, maybe you could hire an intern and happily eat the cost of Snowflake.
I think the point of the article is that an optimizer doesn't affect the barrier to entry at all, but adding it would save end users quite a bit of money. So they don't do it, because end users' money is revenue for Snowflake/Alphabet.
It doesn't lower barriers to entry; it's contrary to the logical expectations of someone unfamiliar with how BQ works. If the query is limited to 10 results, you wouldn't expect it to scan all 2 trillion of your records. Granted, there are numerous warnings in the GUI for these types of things, but make this mistake in Python and you're none the wiser.
Wait, are you saying the BQ engine is not following logical expectations? You do realize a "limit" clause doesn't prevent a full table scan in all cases, right?