Hacker News
Databricks Strikes $1.3B Deal for Generative AI Startup MosaicML (wsj.com)
188 points by jmsflknr on June 26, 2023 | hide | past | favorite | 110 comments



I can’t help but feel this is mostly hype driven by a company that’s looking to reinvent itself while facing the prospect of irrelevance. The piece reads like a press release, with abstract things such as “corporate leaders under pressure to get their data ready for AI”. That has nothing to do with LLMs, as the current AI “hype” cycle has been going on for almost a decade already. Corporate leaders have always been under pressure to get their data cleaned and into the correct shape, and Databricks’ business model is to sell these corporate leaders expensive packages, combined with consultancy, that generally carry very high operational costs. Compared to 5 years ago, nowadays I rarely encounter people who actually care about Databricks’ product offering.


I don't know if I'd say this is reinventing themselves, or at least not from the ground up; they've been somewhat active in the open source LLM scene for some time now. Not that releasing Dolly alone caused some major foundational shift, but it was a solid contribution. Additionally, acquiring Mosaic fills a pretty gaping hole in their existing product offering. The whole process of pretraining, fine-tuning, and evaluating open source LLMs in production is a total nightmare currently. Mosaic's stack at least attempts to make that better (on top of releasing solid foundational models themselves).

> “corporate leaders under pressure to get their data ready for AI”. That has nothing to do with LLMs

I agree that it's buzzwordy and a little abstract. But also in my experience, getting "data ready for AI" is actually the primary constraint many orgs have with respect to using LLMs in an enterprise context. Their data is not stored in a way that's easy to tokenize/label/embed for effective training or fine-tuning. And you could argue the preprocessing actually should be the easy part for non-ML devs to tackle, as it's primarily a software engineering problem. Yet, it's still the thing that keeps many folks from getting started (before you even get to tackle the inference problem).
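To make "getting data ready" concrete, here's a minimal sketch of one typical preprocessing step: splitting documents into overlapping chunks before embedding or fine-tuning. The chunk size, overlap, and character-based counting are all illustrative; real pipelines usually count tokens instead.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split a document into overlapping chunks for embedding/fine-tuning.

    Sizes are in characters here for simplicity; production pipelines
    typically measure in tokens using the target model's tokenizer.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

doc = "some long internal document " * 100
chunks = chunk_text(doc)
```

The overlap keeps sentences that straddle a chunk boundary retrievable from at least one chunk, which matters more than it sounds once you start embedding.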


From my understanding, Databricks' core value proposition is -- 'We let you do things you want to, with the people you have.'

It may surprise HN to learn, but most companies don't have top-tier technical talent.

Consequently, how do you do {cutting edge thing board and investors are demanding company do} with the people you have/can afford? Use an offering that decreases the necessary skill level by providing powerful prebuilt components.

From this perspective, LLM integration sounds like a perfect fit. It's something every company is being asked to have a plan on, but one few are technically staffed to execute on their own.


Databricks' core value proposition is Apache Spark in the cloud, optimized with their special sauce (e.g. the Photon engine).

Running your own Spark, especially on prem, is a lot of work. Most companies would prefer to just provide their data and let someone else handle the query engine.

The parent is right however that Databricks has a feature store (tokenization) but it's not simple to set up and just getting content in and out of it is a major pain point right now.


> [...] LLM integration [...] is something every company is being asked to have a plan on, but one few are technically staffed to execute on their own.

But this is still the "language of hype". LLM integration to do what? I'm not saying they're not doing anything useful, just that this is not a way to describe either a productivity-enhancing technology (feature) or an actual value proposition (benefit) in a meaningful way.


Investors don't understand AI past "It will change things." Which means there needs to be a plan to deal with change.

What the end deliverable of the plan is doesn't matter: effectively it's the company demonstrating it can "do AI," so that if it becomes existentially necessary, the company's ability to execute is already known.

Absent demonstration, hell, the company might not even be retaining useful historical data. (E.g. theft data at a major US retailer)


Well, there was a time when making XML-related software would have been a driver of valuations of companies in a similar way, and engineers were asked by boards to "start doing XML".

I'm not even saying it's unmitigatedly a negative thing. When the question "What are we going to actually do with XML?" came up, many had good ideas, and useful tech investments started getting past boards that wouldn't otherwise have. -- So there was a lot of "collateral goodness".

But people also started ripping out databases and purpose-built DSLs from existing applications just so they could replace them with something XML-based.

As a way of making tech investments, it's weak. If it's 1999 and I'm asked to invest in a tech company that makes a damned good XML parser and nothing else, I'm well advised to not invest. And if I'm a CTO in a non-tech business asked to start replacing database servers with XML files, I'm well advised to not do it.


> this is not how you describe a meaningful value proposition.

It depends who you're selling to. If customers are asking for a way to deal with LLMs, then providing that is a perfectly fine value proposition. The "do what" is going to depend on the customer.


A rising tide lifts all boats.

Databricks may have its thunder stolen by Snowflake. But the AI boom happening right now benefits many data product vendors.

The basic requirement for enterprises to use LLMs is having their data in order, which basically requires a cloud data warehouse. It is simply sensible for an established data company to profit off the hype.


> The basic requirement for enterprises to use LLMs is having their data in order, which basically requires a cloud data warehouse.

I'm curious why you would say this. The main use cases for LLMs involve things like customer service chatbots, knowledgebase search, document summarization, co-pilot/code generation, and content generation for product descriptions and marketing emails. The main data sources would be things like document repositories on Sharepoint and transcripts of customer calls. Not a bunch of historical sales data or financial data sitting in the typical data warehouse. I think there's a big misconception about this. Data sources for Generative AI are very different than data sources for 2013 era data science projects (which perhaps not coincidentally is what led to the development of things like Databricks).


As someone new to the data ops platform world.. what does Databricks even do? They get so much hype as a great employer, but why are they attracting people if their secret sauce is irrelevant?


Databricks was founded by the creators of Apache Spark, and their platform is a fully hosted Spark solution. Spark is a flexible distributed computing framework that came as a successor to Hadoop and MapReduce. Spark processes big data at scale on clusters; this includes ML training and inference pipelines. Now, though, their value prop is less clear, since you can implement these pipelines (for example using AWS SageMaker) without having to learn Spark.


I’m a solution architect at Databricks, so feel free to skip if you think I’m too biased - just know that I worked with Databricks before working for them, and I liked the platform then as well.

While Databricks definitely started with Spark, and it's still a significant foundation, they've done much more on top of and around it. For instance:

- MLFlow for ML lifecycle management from experiments to real-time ML serving;

- Delta Storage format, extending parquet to leverage cloud storage and enable efficient updates and very fast access;

- SQL Warehouses, which expose Databricks as a SQL engine for Analytics;

- Jobs/Workflows, which is one of the most used orchestration engines in the world;

- Unity Catalog, which will replace (at least in Databricks) Hive Metastore for metadata, access control, lineage and data governance tooling in general.

And now LLMs, on top of the data and ML capabilities mentioned above, much extended by the Mosaic deal (still to be approved).

The interesting thing is that yes, Databricks can do the whole “Data Warehousing” thing, but it can also do very large scale streaming, machine learning, process unstructured data like text, audio or images, support BI applications, etc - all accessing the same data with compatible tooling. So, it’s a full blown multi-workload data platform for any kind of use case and company size.

One can argue that most components are open-source and can be deployed independently - and Databricks has open-sourced Spark, MLFlow and Delta. It's just that most companies simply don't have enough (if any) staff with the skills to deploy and operate all these things, let alone as one integrated platform. With Databricks, I'm used to delivering a demo where it takes me about 20 minutes to go from a new cloud account to running data workloads against a cluster or SQL, with all the functionality above.
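To illustrate the "efficient updates" point about Delta: it layers transactional DML and versioning on top of Parquet files in object storage, so you can do things plain Parquet can't. A rough sketch in SQL (the table and column names here are made up for illustration):

```sql
-- Create a Delta table backed by cloud object storage
CREATE TABLE events (id BIGINT, payload STRING, ts TIMESTAMP)
USING DELTA;

-- Upsert new records transactionally (not possible on raw Parquet)
MERGE INTO events t
USING staged_events s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;

-- Time travel: query the table as of an earlier version
SELECT COUNT(*) FROM events VERSION AS OF 3;
```

The MERGE and time-travel pieces are what "extending parquet to enable efficient updates" buys you in practice.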


Further, what do they do that's different from Snowflake?


The biggest difference to consider is storage, I guess. Databricks clusters can store in S3 or wherever you want really. Snowflake has their own storage layer and it always lives in their cloud. This precludes any on-prem deployments.


You can’t be more wrong? Snowflake also stores in s3 or elsewhere, in fact it’s like the only common feature. Snowflake uses proprietary formats mostly though and only gives a sql interface. Think of it as Apple vs android.


> You can’t be more wrong? Snowflake also stores in s3 or elsewhere

Snowflake stores data in S3 in Snowflake's various AWS accounts. Egress fees are necessary when performing anything outside of Snowflake.

Databricks operates on data stored in your S3 on your AWS account(s). Databricks also runs on your compute contained within your AWS account(s).

Both approaches have valid use cases.


> Snowflake stores data in S3 in Snowflake's various AWS accounts. Egress fees are necessary when performing anything outside of Snowflake.

Egress fees are only if you are unloading into a different region/cloud. If your data is in SF AWS us-east-1, and you unload to your S3 in us-east-1, there is no egress fee.

> Snowflake charges a per-byte fee for data egress when users transfer data from a Snowflake account into a different region on the same cloud platform or into a completely different cloud platform. Data transfers within the same region are free.

https://docs.snowflake.com/en/user-guide/cost-understanding-...


Not in your account.


The compute itself can be in your account. The control plane is usually managed by Databricks.


Databricks are not on-premises either.


> Databricks are not on-premises either.

Usually, no. But in some cases, yes.


I must have missed the product announcement. Do you have a link, because everywhere I look I can't find it.


You can get most products on prem at a certain price & size. A lot of companies will apply resistance though unless the contract size is right, b/c on-prem contracts tend to be less unit-profitable, riskier, and come with unique annoying terms or constraints.

If you *need* it, you find a human to talk to (sales or a connection from your network).


Snowflake only really offers SQL (with UDFs) to do data transforms. Spark offers SQL, but also code-based solutions (Python, Scala, Java) that can be used to scale things like ML pipelines and non-SQL transforms.


As far as being an attractive employer they pay really well. Not sure about other dimensions. This is anecdotal but a couple of people in my network had bad things to say about the work life balance and culture there.


What has replaced Databricks, in your opinion? Doesn’t seem «dead» to me, yet. Serious question.


Their moat was being able to spin up Spark clusters for you. This was very useful when companies were just starting in that space and didn't have the skills. But with K8s and clusters becoming more common for normal workloads, it's not hard to manage your own Spark clusters anymore. There goes their moat. Their other offerings just aren't that amazing. People now don't just want to buy hand tools anymore, they want power tools. AI/ML and its cousins are in the power tool space.

Databricks' cost is stupid high, like Snowflake's. Any company looking to chop budget would put those contracts up first.


Their moat is their phenomenal sales team.

Like, I'm at company 3 that pays for databricks.

Amount of those companies that use Spark = 0.

I've given up complaining now.


So what are they using, the notebooks?


But in the notebooks you run pyspark / scala spark / spark sql code?


Yes, but for most of these companies they don't actually run (much or any) spark stuff, they just use the notebooks.

It's really depressing, tbh but I've made my peace with it now.


I can’t blame them really. Regular pandas-in-notebook can deal with quite a bit of data and is reasonably convenient. Even if the org and datasets are large, it’s rare to have huge single tables. Then you can go full spark when you have to.


Sure, but why bother paying databricks for that? By default you'll be running 3+ machines while doing all your work in memory on the master node. It's just so very, very silly.


True. I’ve been wondering why they don’t facilitate non-spark use better. But then you can’t reasonably invoice 3 VMs extra, obviously.


Yes, you can, if you want.


So Databricks is being replaced by open source, self-managed spark-on-k8s? Broadly speaking? Any concrete recommendations on how to do that in a project?


We tried this at a previous company I was at; it was incredibly tough to maintain and tune. We ended up using Databricks, which made us significantly more productive.


No, it's not about the technology. It's about your stack and skill set. If you are still of the one-box-for-all school, then just pay for Databricks or EMR. If you are already running K8s because you invested in that world, then adding Spark isn't a huge jump.


Looking at the Mosaic website, and looking at many AI startups, I can't help feeling like they are all small competitors to AWS SageMaker, Bedrock, Trainium, etc., and I'm not sure how they will compete with the sheer capital that Amazon has, and its potential to offer that compute power cheaper (at least long enough to kill off most of the competition). Maybe it is the off chance that one of these companies might become a big player worth billions that makes the odds worth it?


A lot of AWS services (especially SageMaker) aren't very good in terms of customer experience. People buy them for nominal capabilities and AWS's core bread and butter: short-term and long-term reliability.

Most of these startups (AI and others) have to offer a compelling product before even being notable.

Besides, AWS top level doesn't care if you use SageMaker or not. There's a premium, but if you're still using EC2 via another startup, they're still capturing the lion's share of the value.


Sagemaker is probably a bad example since most folks who run ML workloads on AWS don't use it (from what I've seen talking to many ML teams). It's partially a scattered focus on their customer type (are they for experts/non-experts/something in between?, what exact use cases are they covering?) and partially I think just bad UI/UX (related to customer focus). AWS will converge on what is working for ML platforms and just build their own as time goes on. First mover advantage won't matter and these earlier ML platforms will be consumed imho


The devil is in the details. Training large LLMs requires a lot of custom infra (handling GPUs going down, efficiently pushing data to keep the accelerators busy, deciding on which mechanism of parallelizing model training is better - data vs. model parallelism or both, tuning hyperparams of optimizers, which can be different for larger batch sizes, etc.)

Mosaic is one of the better providers for this. AWS is nowhere near ready at this current point in time, it is pretty much a "dumb" infra provider in large LLM training at this point. (Of course they won't be standing still and will prob acquire that capability one way or another)


Just curious, how is Databricks going to be irrelevant? I thought they were a reasonably powerful database provider for data analytics platforms in many companies?


Yeah that's an absolutely garbage idea. Databricks is basically the opposite of irrelevant at this point.


This was my guess as well. I am wondering what you see as the "prospect of irrelevance" they are facing. I have my guess but I would like to hear your take.


People are doing more and more self-hosting and desire PaaS-like offerings that run in their own cloud, rather than SaaS "we manage your Spark cluster for you", which is what Databricks was founded upon. PaaS has significantly thinner margins, but it's desirable from the customers' perspective as it's cheaper and much better from a (data) security point of view.

As such, they’ll be forced to do a lot more services oriented work rather than product / platform oriented work, because it pays well. Their sales team is also excellent. I see a similar fate for them as Cloudera.


You can find an article on Bloomberg stating that Databricks’ annual revenue has grown by 60%, which sort of backs the SaaS hypothesis it bets on. Do you have data on your PaaS hypothesis that is bigger than that?

Source: https://www.bloomberg.com/news/articles/2023-06-13/databrick...


> People are doing more and more self-hosting and desire PaaS-like offerings that run in their own cloud, rather than SaaS

Do you have some data to support this? This is a pretty bold claim.


+1. I would be delighted to see that data as well. I mean we all like the idea of local control but local PaaS is hard to do in a general way.


Coincidentally this comes after MosaicML released the best open source commercially usable LLMs on Hugging Face: MPT-30B, the first open source LLM with an 8k context length that can be extended even further with ALiBi. It has been trained on a whopping 1 trillion tokens, vs. 300 billion for Pythia and OpenLLaMA and 800 billion for StableLM.
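For anyone curious why ALiBi lets context extend past the training length: instead of learned positional embeddings, it subtracts a per-head, distance-proportional penalty from the attention logits, so nothing breaks at positions the model never saw. A toy numpy sketch of the idea (this simple form assumes the head count is a power of two; the general case interpolates extra slopes):

```python
import numpy as np

def alibi_slopes(n_heads):
    # Per-head slopes form a geometric sequence: 2^(-8/n), 2^(-16/n), ...
    # (assumes n_heads is a power of two)
    start = 2.0 ** (-8.0 / n_heads)
    return [start ** (i + 1) for i in range(n_heads)]

def alibi_bias(seq_len, slope):
    # Causal bias matrix: position (i, j) gets -slope * (i - j) for j <= i,
    # so more distant keys are penalized linearly with distance
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return -slope * np.maximum(i - j, 0)

slopes = alibi_slopes(8)           # [0.5, 0.25, ..., 2**-8]
bias = alibi_bias(4, slopes[0])    # added to attention logits before softmax
```

Because the bias is just a linear function of distance, it extrapolates smoothly to sequence lengths longer than anything seen in training.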


OpenLLaMA models up to 13B parameters have now been trained on 1T tokens:

https://github.com/openlm-research/open_llama


unfortunately not openllama-33b yet


20b done


Do we know how it ranks vs. the other models yet? It's not yet up on https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderb...


Is it better than Falcon 40B?


This class of startup, "build domain specific LLMs using your own data", is extremely crowded right now, but I am not optimistic about their future. For large companies, the actual modeling work for this is already easy for any ML team, thanks to existing FOSS work on stuff like PEFT and LoRA. The hard part is figuring out what data goes into the fine-tuning process and how to get this data into a usable form, but this is very business specific and can't be automated in a SaaS process.

For SMBs, the value would be in using the LLM to generate responses to customer Q&A/search queries. But these companies aren't going to integrate some external third party service, they'll only use it if it's already baked into their CMS - Wordpress/Shopify/Wix/etc. I just don't see who the final consumer for this product would be.


> "build domain specific LLMs using your own data",

It seems to me that the vast majority of these people would be better off just doing semantic search, with their documents chunked, run through an embedding process, and stored in a vector database, with the search queries and results then run through an LLM at the final step to create an actual "answer". For applications where this is not practical, I agree that LoRA should be the next approach. I have a hard time believing that the future is everyone training their own domain specific LLMs from the ground up.
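That retrieval-then-generate pipeline is simple enough to sketch end to end. In the toy version below, the embedding model is faked with a bag-of-words vector and the vector database with a plain list; a real system would swap in an actual embedding model and vector store, and the last step would send the retrieved context plus the question to an LLM rather than stopping at retrieval.

```python
import math
from collections import Counter

def embed(text):
    # Stand-in for a real embedding model: a bag-of-words count vector
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "Vector database": chunked documents with precomputed embeddings
chunks = [
    "refunds are processed within 14 days of the return",
    "shipping is free for orders over 50 dollars",
]
index = [(c, embed(c)) for c in chunks]

def retrieve(query, k=1):
    # Rank chunks by similarity to the query and return the top k
    qv = embed(query)
    ranked = sorted(index, key=lambda item: cosine(qv, item[1]), reverse=True)
    return [c for c, _ in ranked[:k]]

context = retrieve("how long do refunds take")
# A real pipeline would now prompt an LLM with `context` + the question
# to generate the final answer.
```

The point made above holds here too: the index is trivially updatable and every answer can cite which chunk it came from, neither of which is true of a fine-tuned frozen model.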


I wholeheartedly agree with this. Vector databases are easily updatable, searchable by recency, and you can verify where the information came from. Training a custom frozen LLM for every company seems insane. Each company’s data is not that unique - it’s just the numbers that matter, for which you need a vector or traditional database.


> thanks to existing FOSS work on stuff like PEFT and LoRA

YMMV. Sometimes a LoRA is fine, but sometimes a full finetune is necessary for higher quality output.

That being said, backward-pass-free training keeps making more and more progress. It seems like a short matter of time before it becomes practical.
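For context on why a LoRA is so much cheaper than a full finetune: the base weight matrix W stays frozen and only a low-rank correction B·A is trained. A numpy sketch of the arithmetic (the dimensions here are illustrative, not tied to any particular model):

```python
import numpy as np

d, r = 1024, 8                     # hidden size, adapter rank (r << d)
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))    # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))               # trainable up-projection, zero-init
                                   # so the adapter starts as a no-op

def adapted_forward(x):
    # Effective weight is W + B @ A, but it is never materialized during
    # training; the adapter path is computed separately and added on
    return x @ W.T + (x @ A.T) @ B.T

full_params = W.size               # parameters updated by a full finetune
lora_params = A.size + B.size      # parameters updated by the LoRA
```

With these numbers the LoRA trains 16,384 parameters against roughly a million for the full matrix, which is the whole trick: gradients and optimizer state shrink by the same factor.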


Look at QLoRA. QLoRA adapters can be attached to all layers, allowing you to alter behavior with much less data than the original LoRA implementation. It seems to "stick" better.

I just fine tuned a ~30b parameter model on my 2x 3090s to check it out. It worked fantastically. I should be able to fine tune up-to 65b parameter models locally but wanted to get my dataset right on a smaller model before trying.


Are there any repos and steps you can point to to do this? I'd love to try to do exactly what you describe. I have been trying to do the same and have run into a lot of repos with broken dependencies.


I used: https://github.com/artidoro/qlora but there are quite a few others that likely work better. It was literally my first attempt at doing anything like this, and took the better part of an evening to work through CUDA/Python issues to get it training, and ~20 hours of training.


> is extremely crowded right now but I am not optimistic about their future

Why not? Every larger "cloud" company seems to be randomly buying one at the moment in order to offer "AI", so there might be some good deals to be had. This is clearly one of them: a panic buy.


Unless this is $1.3B in cash, I wouldn't celebrate so soon.

This is likely mostly a stock deal, backed by Databricks equity that, like most other private companies', is illiquid and has taken a nosedive in value in the last couple of years.

I don't know the exact split, but I'd imagine it is $300M cash and $1B in Databricks stock at some last-round private valuation that's likely untested.

Databricks last raised at a $38B valuation, at $1B ARR, near peak valuation hype in August 2021.

Assuming they can get 10x and their ARR has doubled since then, they’re down maybe 50 percent in valuation.

Internally they say they are marking themselves down to about 34B.

Is the deal using databricks shares valued at 38B? 34B? 20B? Who knows. I hope it’s the latter.

And I hope that whenever the Mosaic folks get to sell their Databricks stock, it is worth more than that.


Congrats to the MosaicML team, and wishing them all the luck! If anyone is interested in MosaicML alternatives, check out dstack; we are building an OSS and cloud-agnostic alternative: https://dstack.ai. Disclaimer: I'm the founder and CEO at dstack.


It would be really cool to support more vendors (eg Lambda Labs, Runpod, etc) - getting GPU availability feels like 'any port in a storm' sometimes :)


Yup, we're already working on experimental support for Lambda Labs! Ping us if you'd like to test it out. Perhaps we could have something to show as early as next week.


Thrilled to hear about MosaicML's successful exit! Nonetheless, this could serve as an indication to explore other options. Given Databricks' past acquisition of Redash, which led to its downfall, there's no absolute assurance that Mosaic's fate won't be similar.


Why do you say it led to redash's downfall? What happened to redash since the acquisition?


They've essentially killed Redash, which was one of the best open-source dashboard/data visualization tools. They had promised to keep it open-source and improve upon it (https://web.archive.org/web/20211019150919/https://blog.reda...).

However, a year later, their SaaS has been discontinued. The open-source repository is now stagnant, with hundreds of unresolved merge requests. On a positive note, they recently shared the repository with some other open-source maintainers, so there's hope that Redash will be reborn.


> On a positive note, they recently shared the repository with some other open-source maintainers, so there's hope that Redash will be reborn.

Yeah, the root of the problem (in my opinion) was that Redash has sooo many Python dependencies (due to supporting so many databases and similar) that it's become a real hairball source code wise to keep them all playing together nicely.

Especially as over time (say) library package FOO has some security vulnerability reported that gets fixed in a new release... but the dependencies of the newer release are too new to work with (say) package BAR. Times that by 50 and it's a real pita.

At the same time, the Redash team got busy with their work at Databricks (mostly not Redash related). Then the automatic CircleCI checks on PRs started failing (ugh), etc.

---

But, as @bratao mentioned above that's all getting worked through now.

Admin and maintainer permissions have been given to a group of dev volunteers / known Redash enthusiasts. CI is working again now (as of last night), and we're currently untangling the dependency hairball.

It's likely to be a few weeks (minimum I guess) before any new official releases are ready, but it will happen. :)


How old was MosaicML? How many customers did they have? Revenue? Profits? This feels like pure hype for LLMs to be honest. Good for the founders of MosaicML to exit at the top, but I'd bet good money that DataBricks winds up writing this off in a year or two max.


huge congrats to the team - we interviewed Jonathan and Abhi from the team last month ( https://www.latent.space/p/mosaic-mpt-7b#details ) and it was blindingly obvious that they were an incredible team of engineers building one of the most valuable training platforms in the industry. congrats!


Good to hear they are responsive to PR opportunities. Too bad their interview process doesn't match this level of enthusiasm.

I had a recruiter reach out from DB late last year. I wasn't looking for new work, but it seemed like it'd be worth a chat. I had to move my schedule around to fit it in and then get up extra early to be prepared for the chat... only the recruiter never showed. They didn't follow up about missing a meeting they had scheduled. It's been crickets. That lack of professionalism really stood out as a red flag. Hard pass.


~50 employees? I'd love to hear what regular employees will see from this (if anything). Even if it's completely controlled by NDA, I think even confirming the existence of that NDA would be helpful.

So often we see these "success" stories and we don't hear exactly how the "regular joes" made out. And it can color our perception of the startup gamble, without real evidence.


Falcon 7B/40B is good, but hasn't really caught on like LLaMA has. One big reason is that it's not well supported by 4-bit inference code (namely llama.cpp and GPTQ).

Mosaic's 7B (and 30B?) models have the same issue, and 7B kinda paled in comparison to LLaMA 7B... But maybe it would be better if finetuned?

To me, it's kinda baffling that Mosaic didn't work on adding highly quantized inference to the popular frameworks.


Mosaic's MPT models are already supported in GGML: https://github.com/ggerganov/ggml

Here's MPT-30B running in 4-bit precision on CPU :) https://twitter.com/abacaj/status/1673133443339763712?s=20


Oh I didn't realize this. Everything is moving so fast that I can't even keep up with the features.


Falcon is also slow for inference compared to LLaMA models of similar size. Speed can turn into a quality all its own when you break up problems into small pieces and use the LLM to iterate.


That's because it's unoptimized, right? My impression is that the Falcon GPTQ (and 8-bit bnb inference?) code is just immature.



What do these companies do? I consider myself mildly well versed in the ML space at least, but I’ve never heard of MosaicML before (have they trained anything popular?). I’ve heard of Databricks but know very little about what service they provided before/after the recent deep learning craze.


Databricks provides Jupyter-Lab-like notebooks for analysis and ETL pipelines using Spark through PySpark, Spark SQL, or Scala. I think R is supported as well, but it doesn't interop with their newer features as well as Python and SQL do. It interfaces with cloud storage backends like S3 and offers some improvements to the Parquet format via https://delta.io that allow for updating, ordering, and merging data. They integrate pretty seamlessly with other data visualisation tooling if you want to use it for that, but their built-in graphs are fine for most cases. They also have ML-on-rails-type features through menus and models, if I recall, but I typically don't use it for that. I've typically used it for ETL- or ELT-type workflows for data that's too big or isn't stored in a database.


Now say it even dumber, as if I were a dog or a CEO.


They create services for data scientists (notebooks) and machine learning engineers (Spark).


Thanks


Perhaps, it is a fancy .csv importer?


Thanks!


Replit recently used the platform to train their open source code completion model [1]. There's a decent video where Reza Shabani talks about the process they went through [2] (Mosaic gets mentioned around 20 minutes in).

[1] https://huggingface.co/replit/replit-code-v1-3b [2] https://www.youtube.com/watch?v=roEKOzxilq4


This $1.3B figure is surely relative to Databricks' 2021 valuation from their Series H (most recent round), and doesn't represent real money, since multiples have plunged since then.

Estimates put Databricks' revenue at $400M and its "valuation" at $36B, which would never hold today.


Where do you get the $400m figure? Just a couple weeks ago Bloomberg ran an article stating their revenue is >$1B


Yea, with what they charge, I'd be surprised if it was as low as 400 mil. I've seen mid-sized companies with 8-figure Databricks contracts.


I heard Ali, the CEO of Databricks, speaking on a podcast recently, and he stated their revenue last year was north of $1B.


Databricks wasn't the one purchased, MosaicML was. Mosaic's last round had a valuation of $136m, so this is a pretty big jump for them.


The Databricks valuation is relevant if any part of the acquisition was paid for in Databricks equity.


Any part? My guess is 100% of it.


Actually, it seems Databricks got a great deal for Mosaic. The real question is why Mosaic took it vs. holding out or doing another round.

Rough math plugging in public #s and comments here:

- All stock deal at Aug 2021 val of 38B (1B ARR)

- Assume rev doubled to 2B (which may even be aggressive)

- SaaS multiples are down 6x since Aug 2021

- 38B x 2 / 6 = $12.7B

- 12.7B / 38B * 1.3B = 434M = effective price

- Assume 100M to pref stock

--> Comes out to 334M, with a chunk of that (1/3? 1/4?) potentially subject to earn out


Realized there's probably pref on Databricks too, which would further lower the value of its common. On the other hand, there could have been a markdown from the 38B since August '21


I didn't realize Databricks was doing so well they could spare $1B in equity/cash. Congrats not only to MosaicML but Databricks as well!


Yeah, they're one of the strongest private software companies right now. I'd say easily top 10.


It's so surprising to me that there are enough use cases dropped on the floor or ignored by AWS/Azure that they could build such a large business.


Turns out Databricks couldn't do it all internally with just training the Dolly model.

I think it's a good accretive deal, because Databricks already has a good number of enterprise relationships, and Mosaic's key offering has been training software with their own optimizations that increase GPU efficiency.


I'm less surprised by the acquisition, but more confused by seeing MosaicML labeled as a 'Generative AI startup'. To my knowledge, MosaicML is an open source lib for optimising training/inference, taking "speedup" tricks from research papers and making them easily available. At least a few months back that was the case. When did they become a Gen AI startup? Or did they rebrand lately?


Wonder what the outcome for early employees looks like here?

This is not a very big company and hasn't been around for very long. $1.3B is a lot of money.


> bringing down the cost of using generative AI—from tens of millions of dollars to hundreds of thousands of dollars per model,

That's a lot of people you have to replace with one model--even if you get the cost basis down. Do those costs include the periodic refitting?


It really depends on what you are using it for. There are plenty of high-value use cases where you can free up resources that are spending time inefficiently due to handling tasks that a domain specific LLM could help with. Maybe that allows you to not expand your team by allowing your existing resources to work on things that bring more value.


In light of the story I see just above this one ( https://news.ycombinator.com/item?id=36475376 ) Imma call inflated bs....


Congratulations to folks at MosaicML and Databricks!


Congratulations to all involved!!



