plaidfuji's comments | Hacker News

This is where things are headed. All that ridiculous busywork that goes into ETL and modeling pipelines… it’s going to turn into “here’s a pile of data that’s useful for answering a question, here’s a prompt that describes how to structure it and what question I want answered, and here’s my oauth token to get it done.” So much data cleaning and prep code will be scrapped over the next few years…

I'm definitely biased because my day job is writing ETL pipelines and supporting software, and my current side project is a data contracts library for helping with the above[0]. Still, I'm not sure I see this happening.

80% of the focus of an ETL pipeline is in ensuring edge cases are handled appropriately (e.g. not producing models from potentially erroneous data, dead-letter queuing unknown fields, etc.).
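As a rough illustration of what that edge-case handling tends to look like (a minimal sketch, not taken from any particular library; the expected fields and queue shape are made up for the example):

    from dataclasses import dataclass, field

    EXPECTED_FIELDS = {"invoice_id", "amount", "currency"}

    @dataclass
    class DeadLetterQueue:
        """Holds records we refuse to load, plus the reason, for later inspection."""
        items: list = field(default_factory=list)

        def put(self, record: dict, reason: str) -> None:
            self.items.append({"record": record, "reason": reason})

    def transform(records: list[dict], dlq: DeadLetterQueue) -> list[dict]:
        clean = []
        for rec in records:
            unknown = set(rec) - EXPECTED_FIELDS
            if unknown:
                # Unknown fields: park the record rather than silently dropping data.
                dlq.put(rec, f"unknown fields: {sorted(unknown)}")
                continue
            if rec.get("amount") is None or rec["amount"] < 0:
                # Potentially erroneous data: don't let it reach downstream models.
                dlq.put(rec, "missing or negative amount")
                continue
            clean.append(rec)
        return clean

Most of a pipeline's surface area is this kind of refusal-and-quarantine logic rather than the happy path, and that's exactly the part a one-line prompt tends to gloss over.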

I think an LLM would be great for "take this json and make it a pandas dataframe", but a lot less great for "interact with this billing API to produce auditable payment tables".

For areas that are reliability-focused, LLMs still need a lot more improvements to be useful.

[0] https://github.com/benrutter/wimsey


> I think an LLM would be great for "take this json and make it a pandas dataframe", but a lot less great for "interact with this billing API to produce auditable payment tables".

Yeah, it's great... so long as you don't care that it randomly screws up the conversion 10% of the time.

My first thought, when I saw the post title, was that this is the 2025 equivalent to people using MapReduce for a 1MB dataset. LLMs certainly have good applications in data pipelines, but cleaning structured data isn't it.


Yes, LLMs are not always the best option; they are an option. Sometimes the requirements of the project are such that they are also the best option.

There is one example, a browser that does price matching, that is impossible to do without a full-blown data science team right now: https://github.com/Pravko-Solutions/FlashLearn/tree/main/exa...


Inappropriate tools are always an option? I can cut a cake with a jackhammer, but....

Anyway, like I said, there are certainly good applications of LLMs, and this is probably one? I wouldn't describe "do market research on prices" as a traditional "data pipeline", but that's just me, I guess.


I think you'd tell the LLM to design the pipeline, not be the pipeline. That way you can see exactly what it's done and tweak as needed. Plus should be way more cost effective.

Hah. I remember being forced to use MapReduce for a tiny dataset, back in the early 2010's. Hadoop was all the rage.

"lemme just fire up a dbt workflow to analyse this CSV file"

You may have meant that sarcastically, but I just did that for 2 CSV files that I needed to do a bunch of cleanups and joins on to analyze. With LLM help the whole adventure was easy.

What I really like to do for this is load it into SQLite; there are built-in commands for reading/writing CSV files. And it's all queryable with SQL, which makes for a great jumping-off point to do some basic cleaning, joining and analysis.

This, I'd argue, also makes the job easier with LLMs, since you can ask one to write a SQL query that you can validate and reason about, rather than relying on it to transform the data itself (which I've seen suggested a lot under this post).
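A minimal sketch of that workflow (the SQLite CLI's .mode csv / .import commands handle the load step natively; here pandas stands in for it, and the file, table, and column names are hypothetical):

    import sqlite3

    import pandas as pd

    con = sqlite3.connect("scratch.db")

    # Load the two CSVs into SQLite tables (hypothetical file/column names).
    pd.read_csv("orders.csv").to_sql("orders", con, if_exists="replace", index=False)
    pd.read_csv("customers.csv").to_sql("customers", con, if_exists="replace", index=False)

    # The cleaning/joining step is a plain SQL query you can read and validate,
    # instead of an opaque transformation performed by the model itself.
    query = """
    SELECT c.region,
           COUNT(*)      AS n_orders,
           SUM(o.amount) AS total_amount
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    WHERE o.amount IS NOT NULL
    GROUP BY c.region
    ORDER BY total_amount DESC
    """
    print(pd.read_sql_query(query, con))

The SQL string is the part an LLM can draft and you can read, run against a sample, and keep in version control.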


Honestly, I spend ten times as much effort figuring out people's sloppy notebooks or pandas stuff as I do when they just use dbt and SQL. And 90% of the time SQL is all they needed.

For your Wimsey library, using "pipe" to validate the contracts would seem to me to drastically slow down the Polars query, because the UDF pushes the query out of Rust into Python. I think a cool direction would be to have a "compiler" which takes in a contract and spits out native queries for a variety of dataframe libraries (pandas/polars/pyspark). It becomes harder to define how to error with a test contract, but that can be the secret sauce.

Actually you're almost 100% describing how Wimsey works! It's using native dataframe code rather than a UDF of some kind. Under the hood it uses Narwhals, which converts Polars-style expressions into native pandas/polars/spark/dask code with super minimal overhead.

If you're using a lazy dataframe (via Polars, Spark etc.) Wimsey will force collection, so that can have speed implications. The reason being that I can't yet find a cross-language way of embedding assertions that fail later down the line.
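To make the native-expression point concrete, a contract check can compile down to ordinary Polars expressions instead of a row-by-row Python UDF. A minimal sketch of the general idea (this is not Wimsey's actual API; Wimsey goes through Narwhals so the same checks also run on pandas/Dask/Spark):

    import polars as pl

    df = pl.DataFrame({"amount": [10.0, 25.5, None], "currency": ["USD", "USD", "EUR"]})

    # Each check is a native expression, so it executes inside Polars' Rust engine
    # rather than round-tripping every row through Python.
    checks = {
        "amount_not_null": pl.col("amount").is_not_null().all(),
        "amount_non_negative": (pl.col("amount") >= 0).all(),
        "currency_known": pl.col("currency").is_in(["USD", "EUR", "GBP"]).all(),
    }

    results = df.select(**checks).to_dicts()[0]
    failures = [name for name, passed in results.items() if not passed]
    if failures:
        raise ValueError(f"contract failed: {failures}")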


I believe that LLMs will become better and better in the near future, and that replacing classic approaches with LLM-enriched pipelines will drastically simplify ETL flows.

Not that I don't love LLMs and playing with them and their potential, but if we don't get proper mechanisms that ensure quality and consistency, then they're not really a substitute for what we have.

It's very easy to produce something that seemingly works but you can't attest to its quality. The problem is producing something resilient, that is easy to adapt and describes the domain of what you want to do.

If all these things are so great, then why do I still need to do so many things to integrate a big-tech cloud agent with a popular tool? Why is it so costly or limited?

UX matters, validation matters, reliability matters, cost matters.

You can't simply wish for a problem not to happen. Someone owns the troubleshooting and the modification and they need to understand the system they're trying to modify.

Replacing scrapers with LLMs is an easy and obvious thing, especially when you don't care about quality to a high degree. Other systems, such as financial ones, don't have that luxury.


You may be right! I guess we'll find out soon.

One thing I'd be wary of is what "LLM-enriched pipelines" look like. If it's "write a sentence and get a pipeline" then I think that does massively simplify the amount of work, but there's another reality where people use LLMs to get more features out of existing data, rather than doing the same transformations we do now. Under that one, ETL pipelines would end up taking more time, and being more complex.


But at what cost?

We're in an energy/environmental crisis, and we're replacing simple pipelines with (unreliable) gas factories?


Cost per token has cratered roughly tenfold over the last two years, and that's not just lighting VC money on fire; efficiency gains are being made left and right.

How much do we need to progress before it becomes comparable in terms of energy to the (often already rather energy-inefficient) data pipelines we've been using so far?

Recall that while the cost per token may decrease, chain-of-thought (CoT) reasoning multiplies the number of tokens by several orders of magnitude.


LLMs are not the most efficient way to solve the problem, but they can solve it.

They can do it, they're just slower, less reliable and orders of magnitude more energy-expensive.

But yes, they're potentially easier to set up.


This is a head-scratcher of a take. Have you actually done any in-depth work on data pipelines and analytics tooling? If so, what precisely do you see LLMs making easier?

I tried using enterprise ChatGPT to write a query to load some json data into a data warehouse. I was impressed with how good a job it did, but it still required several rounds of refinement and hand-holding, and the end result was almost, but not quite, correct. So I'm not coming at this from the perspective of hating LLMs a priori, but I am unimpressed with the hype and over-selling of its capabilities. In the end, it was no faster than writing the query myself, but it wasn't slower either, so I can see it being somewhat helpful in limited conditions.

Unless the technology makes another quantum leap improvement at the same time the price drops like a stone, I don't see LLMs coming anywhere close to your claim.

That said, I expect to see a huge amount of snake oil and enterprise dollars wastefully burned on executive pipe dreams of "here's a pile of data now magic me a better business!" in the next few years of LLM over-hyped nonsense. There's always a quick buck to make in duping clueless execs drooling over replacing pesky, annoying, "over-paid" tech people.


Let me give you a complementary perspective. Same problems all of you have, but I work in a small lab team of PhD biologists who generate huge omics data sets and even larger lightsheet microscopy and MRI datasets but don't know how to do a VLOOKUP in Excel. And who do not know the exotic acronyms LIMS, QA, QC, or SQL. Yes, really.

What do we typically do in academic biomedical research in this situation?

The lead PI looks around the lab and finds a grad student or postdoc who knows how to turn on a computer and if very lucky also has had 6 months of experience noodling around with R or Python. This grad or postdoc is then charged with running some statistical analyses without any training whatsoever in data science. What is an outlier anyway, what do you mean by “normalize”, what is metadata exactly?

You get my drift: It is newbies in data science and programming (often 40-and 50-year-olds) leading novices (20- and 30-year-olds) to the slaughter. Might contribute to some lack of replicability ;-)

And it has been this way in the majority of academic labs since I started using CP/M on an Apple II in 1980 at UC Davis in an electrophysiology lab in Psychology, to the first Macs I set up at Yale in a developmental neurobiology lab in 1984, and up to the point at which I set up my own lab in neurogenetics at the University of Tennessee with a pair of Mac IIs in 1989 and $150,000 in set-up funds, just enough for me to hire one very inexperienced technician to help me do everything.

So in this context I hope all of you can appreciate that ANY help in bringing some real data science into mom-and-pop laboratories would be a huge huge boon.

And please god, let it be FOSS.


I feel you, and LLMs are no doubt a boon in tooling to help in this kind of scenario. I'm not poo-pooing LLMs in general; they are very cool! I wish they were allowed to just be very cool while we incorporate them into our tooling and workflows, rather than over-hyped.

You have more faith in LLMs than I do. The reality is it will probably get you 70 to 80% there, then you'll spend a ton of time debugging / fixing your pipelines, only to realize it would've been simpler, faster, and more reliable to not involve an LLM in the first place.

I believe that we'll learn how to incorporate LLMs to improve parts of data pipelines, particularly those that involve extracting unstructured or semistructured data into structured data, especially if it can provide a reliability score or confidence level with the extract. I'm much more skeptical of claims beyond that.

I also think there are unanswered questions about reliability, cost (dollar and energy), and AI business models; I don't think OpenAI can burn $2+ to make a dollar forever.
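For the "extract unstructured data into structured data, with a confidence level" part, the shape of it might look something like this (a sketch; call_llm is a hypothetical stand-in for whatever model client you use, and the schema and threshold are invented for the example):

    import json

    CONFIDENCE_THRESHOLD = 0.9
    REQUIRED_KEYS = {"vendor", "invoice_date", "total", "confidence"}

    PROMPT = (
        "Extract vendor, invoice_date (ISO 8601) and total from the text below. "
        "Reply with JSON only, including a 'confidence' field between 0 and 1.\n\n{text}"
    )

    def extract_invoice(text: str, call_llm) -> dict | None:
        """Return a structured record, or None if the extraction isn't trustworthy."""
        raw = call_llm(PROMPT.format(text=text))  # hypothetical LLM client call
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            return None  # malformed output goes to manual review, not the warehouse
        if not REQUIRED_KEYS.issubset(record):
            return None  # missing fields: refuse rather than guess
        if record["confidence"] < CONFIDENCE_THRESHOLD:
            return None  # low-confidence extractions get routed to a human
        return record

The deterministic validation around the model call is the part that stays conventional pipeline code.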


Unless you can provide some "citation", I don't think you are right. I do this every day now and it gets me 99 % there with very little debugging.

As always, "it depends." How simple are your pipelines? Single CSV? Sensible column names that are totally unambiguous? Consistent, clean data? Then LLMs are probably fine...

This is completely wrong; if anything, an increase in the usage of LLMs to generate small pipelines will lead to increased demand for professional pipelines to be built, because if any small thing breaks, the dashboards/features break, which is immediately noticeable. I think you'll see a big increase in the number of models a data scientist can create, but making those Python notebooks production-ready can't be done by an LLM. That's to say, as analysts create more potential use cases, there will be more demand to get those implemented.

There's so much that goes into ensuring the reliability, scalability and monitoring of production-ready data pipelines. Not to mention the integration work for each use case. An LLM will give you short-term wins at the cost of long-term reliability - which is exactly why we already have DE teams to support DA and DS roles.


>This is completely wrong; if anything, an increase in the usage of LLMs to generate small pipelines will lead to increased demand for professional pipelines to be built, because if any small thing breaks, the dashboards/features break, which is immediately noticeable. I think you'll see a big increase in the number of models a data scientist can create, but making those Python notebooks production-ready can't be done by an LLM. That's to say, as analysts create more potential use cases, there will be more demand to get those implemented.

I agree. There is a lot of data people want that isn't made because of labor costs. Not just in quantity, but difficulty. If you can only afford to hire one analyst, and the analyst's time is only spent on cleaning data and generating basic sums, then that's all you'll get. But if the analyst can save a lot of time with LLMs, they'll have time to handle more complicated statistics using those counts like forecasts or other models.


> If you can only afford to hire one analyst, and the analyst's time is only spent on cleaning data and generating basic sums, then that's all you'll get. But if the analyst can save a lot of time with LLMs, they'll have time to handle more complicated statistics using those counts like forecasts or other models.

That applies to so many other jobs.

My productivity as a single IT developer, making a rather large and complex system, mostly skyrocketed when LLMs became actually useful (around the GPT-4 era).

Work where I may have spent hours dealing with a bug becomes maybe 10 minutes, because my brain was looking past some obvious issue that an LLM instantly spotted (or it gave suggestions that focused me on the issue).

Implementing features that may have taken days reduces to a few hours.

The time taken to learn things massively reduces because you can ask for specific examples. A lot of open source projects are poorly documented, missing examples, or just badly structured; just ask the LLM and it points you in the right direction.

Now... this is all from the perspective of a dev with 25+ years of experience. The issue I fear more is people who are starting out, writing code but not understanding why or how things work. Even before LLMs, I remember people coming in for senior jobs who did not have a basic understanding of SQL, because they had used ORMs non-stop. But they forgot that a lot of that knowledge was not transferable to companies that used raw SQL or other ORMs that work differently.

I suspect that we are going to see a generation of employees so used to LLMs doing the work that they don't understand how or why specific functions or data structures are needed, and who then get stuck in hours of looping LLM questioning because they can't point the LLM to the actual issue!

At times I think, I wish this had been available 20 years ago. But then I question that statement very fast. Would I be the same dev today if I had relied non-stop on LLMs and not gritted my teeth through issues to develop this specific skillset?

I see more productivity from senior devs, more code turnout from juniors (or code monkeys), but a gap where the skills are an issue. And let's not forget the potential issue of LLM poisoning, with years of data that feeds back on itself.


I see it as a gray area: long term there will be a need for both, and you will just have one more tool to choose from when presented with time/budget/quality constraints.

Yeah I can also see it very much depending on the demands - I'm definitely not saying every pipeline has to be the most reliable, scalable piece of software ever written.

If a small script works for you and your use case / constraints there's nothing I can say against it, but when you do grow past a certain point you'll need pipelines built in a proper way. This is where I see the increased demand since the scrappy pipelines are already proving their value.


Exactly, scale after you need to.

This would require massively more compute than regular pipelines...

(1) that delta will decrease quickly, and (2) corporations will gladly pay for compute over headcount to maintain fragile data pipelines

> (1) that delta will decrease quickly

Is your data pipeline O(n^3) in the number of tokens? If not, then no, it won't.


The price will go down, but LLMs reaching 100% accuracy and reliability is another story. We are nowhere close right now.

If your problem is compute, you are already optimizing. This is here for all the steps before you start thinking about latency and compute. Not all use cases are created equal.

No, not so simple... the simplicity of this idea exerts a gravitational pull on the human mental model. Meanwhile, LLMs are like a non-reproducible cotton-candy machine. Quality will be an elusive light at the end of the tunnel, not a result, for non-trivial systems IMHO. Simple systems? Sure, but economics will assign low-skill humans to the task, and other problems emerge.

What is the intoxication that assumes the engineering disciplines are now suddenly auto-automatable?


Not data pipelines, not yet at least, since those usually require a high degree of accuracy (depending on the company, of course). Where I see it (already) moving in is data exploration, which is effectively the data pipeline before the data pipeline gets developed.

Good point! LLMs are best when you are starting from point 0.

Exactly!

Here’s a take I haven’t seen yet:

If training and inference just got 40x more efficient, but OpenAI and co. still have the same compute resources, once they’ve baked in all the DeepSeek improvements, we’re about to find out very quickly whether 40x the compute delivers 40x the performance / output quality, or if output quality has ceased to be compute-bound.


> If training and inference just got 40x more efficient

Did training and inference just get 40x more efficient, or just training? They trained a model with impressive outputs on a limited number of GPUs, but DeepSeek is still a big model that requires a lot of resources to run. Moreover, which costs more, training a model once or using it for inference across a hundred million people multiple times a day for a year? It was always the second one, and doing the training cheaper makes it even more so.

But this implies that we could use those same resources to train even bigger models, right? Except that you then have the same problem. You have a bigger model, maybe it's better, but if you've made inference cost linearly more because of the size and the size is now 40x bigger, you now need that much more compute for inference.


Actually inference got more efficient as well, thanks to the multi-head latent attention algorithm that compresses the key-value cache to drastically reduce memory usage.

https://mlnotes.substack.com/p/the-valleys-going-crazy-how-d...
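Rough back-of-the-envelope arithmetic for why the KV cache matters so much at inference time (illustrative shapes in the spirit of the MLA papers, not exact DeepSeek figures):

    # Per token and per layer, standard multi-head attention caches a full key and
    # value vector for every head; MLA caches one small compressed latent instead.
    n_heads, head_dim = 128, 128   # illustrative model shape
    latent_dim = 512               # illustrative compressed KV dimension
    bytes_per_elem = 2             # fp16/bf16

    standard = 2 * n_heads * head_dim * bytes_per_elem  # K and V for all heads
    mla = latent_dim * bytes_per_elem                   # one shared latent vector

    print(f"standard KV cache: {standard} bytes per token per layer")
    print(f"MLA latent cache:  {mla} bytes per token per layer")
    print(f"reduction: ~{standard / mla:.0f}x")

With numbers in that ballpark the cache shrinks by well over an order of magnitude, which is what lets longer contexts and larger batches fit on the same hardware.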


That's a useful performance improvement but it's incremental progress in line with what new models often improve over their predecessors, not in line with the much more dramatic reduction they've achieved in training cost.


If the H800 is a memory-constrained model that NVIDIA built to avoid the Chinese export ban on the H100, with equivalent fp8 performance, it makes zero sense to believe Elon Musk, Dario Amodei and Alexandr Wang's claims that DeepSeek smuggled H100s.

The only reason a team would allocate time to memory optimizations and writing NVPTX code, rather than focusing on post-training, is if they severely struggled with memory during training.

I mean, take a look at the numbers:

https://www.fibermall.com/blog/nvidia-ai-chip.htm#A100_vs_A8...

This is a massive trick pulled by Jensen: take the H100 design, whose sales are regulated by the government, make it look 40x weaker and call it the H800, while conveniently leaving 8-bit computation as fast as the H100's. Then bring it to China and let companies stockpile it without disclosing production or sales numbers, with no export controls.

Eventually, after 7 months, the US government starts noticing the H800 sales and introduces new export controls, but it's too late. By this point, DeepSeek has started research using fp8. They slowly build bigger and bigger models, working on bandwidth and memory consumption, until they make R1, their reasoning model.


What's surprising is that anyone would repeat Elon Musk-related claims.

Tech or politics related, he's off the deep end.


Especially since he seems intent on everyone talking about him all the time. I find it questionable when a person wants to be the centre of attention no matter what. Perhaps attention is not all we need.


Yet another casualty of laypersons browsing arXiv. That paper was like flypaper to his narcissism.


The problem is he's only wrong some of the time, and then people arguing about which it is this time generates attention, a valuable commodity.


Maybe “some” applied in the past but his recent history might best be described as “almost always”.


Drugs. Don't do that many drugs for that long.


He's like a broken smart network switch, smart as in managed. Packets with the switch's MAC on them are all broken, but erroneously forwarded ones often have valuable data. We, looking in from L3, don't know which is which.


I'm wrong some of the times.

He's a lucky mensch, no more, no less.


Interesting how people keep calling it "the Chinese export ban". Isn't it an American export ban?


I don't think that it got more efficient. It's that smaller models can be trained via larger ones cheaply. Think of a teacher/student relationship.

https://en.m.wikipedia.org/wiki/Knowledge_distillation
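For reference, the classic teacher/student recipe from that literature is a soft-label loss like the sketch below (a generic illustration, not DeepSeek's setup; per the R1 paper quoted further down the thread, their distilled models were produced by straightforward SFT on R1-generated outputs):

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        """Soften both distributions with temperature T, match them with KL divergence,
        and mix in the ordinary cross-entropy against the hard labels."""
        soft_targets = F.softmax(teacher_logits / T, dim=-1)
        soft_student = F.log_softmax(student_logits / T, dim=-1)
        kd = F.kl_div(soft_student, soft_targets, reduction="batchmean") * (T * T)
        ce = F.cross_entropy(student_logits, labels)
        return alpha * kd + (1 - alpha) * ce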


I think what got cheaper are models with up-to-date information.


You almost never reintegrate new information via training; it's by far the most expensive way to do that.


...and that got cheaper? Not sure what your point is.

At some point, the models _have_ to do "continuous integration" to provide the "AGI" that's wanted out of this tech.


> DeepSeek is still a big model that requires a lot of resources to run

I can run the largest model at 4 tokens per second on a 64GB card. Smaller models are _faster_ than Phi-4.

I've just switched to it for my local inference.


Isn't the largest model still like 130GB after heavy quantization[1] and 4 tok/s borderline unusable for interactive sessions with those long outputs?

[1] https://unsloth.ai/blog/deepseekr1-dynamic


I told it to skip all reasoning and explanations and output just the code. It complied, saving a lot of time)


Wouldn't that also result in it skipping the "thinking" and thus in worse results?


Yes, but even that can still be run (slowly) on CPU-only systems down to about 32GB. Memory virtualization is a thing. If you get used to using it like email rather than chat, it's still super useful even if you are waiting half an hour for your reply. Presumably you have a fast distill on tap for interactive stuff.

I run my models in an agentic framework with fast models that can ask slower models or APIs when needed. It works perfectly, 60 percent of the time lol.


OP probably means "the largest distilled model"


So not an actual DeepSeek-R1 model but a distilled Qwen or Llama model.

From DeepSeek-R1 paper:

> As shown in Table 5, simply distilling DeepSeek-R1's outputs enables the efficient DeepSeek-R1-7B (i.e., DeepSeek-R1-Distill-Qwen-7B, abbreviated similarly below) to outperform non-reasoning models like GPT-4o-0513 across the board.

and

> DeepSeek-R1-14B surpasses QwQ-32B-Preview on all evaluation metrics, while DeepSeek-R1-32B and DeepSeek-R1-70B significantly exceed o1-mini on most benchmarks.

and

> These [Distilled Model Evaluation] results demonstrate the strong potential of distillation. Additionally, we found that applying RL to these distilled models yields significant further gains. We believe this warrants further exploration and therefore present only the results of the simple SFT-distilled models here.


How are you running it, can you be more specific?


DeepSeek-R1-Distill-Llama-70B on triple 4090 cards.


In the long run (which in the AI world is probably ~1 year) this is very good for Nvidia, very good for the hyperscalers, and very good for anyone building AI applications.

The only thing it's not good for is the idea that OpenAI and/or Anthropic will eventually become profitable companies with market caps that exceed Apple's by orders of magnitude. Oh no, anyway.


Yes! I have had the exact same mental model. The biggest losers in this news are the groups building frontier models. They are the ones with huge valuations, but if the optimizations are even close to true, it's a massive threat to their business model. My feet are on the ground, but I do still believe that the world does not comprehend how much compute it can use... as compute gets cheaper we will use more of it. Ignoring equity pricing, this benefits all other parties.


My big current conspiracy theory is that this negative sentiment toward Nvidia from DeepSeek's release is spread by people who actually want to buy more stock at a cheaper price. Like, if you know anything about the topic, it's wild to assume that this will drive demand for GPUs anywhere but up. If Nvidia came out with a Jetson-like product that can run the full 670B R1, they could make infinite money. And in the datacenter segment, companies will stumble over each other to get the necessary hardware (which corresponds to a dozen H100s or so right now). Especially once HF comes out with their uncensored reproduction. There's so much opportunity to turn more compute into more money because of this; almost every company could theoretically benefit.


Can you guys explain why this would be bad for the OpenAIs and Anthropics of the world?

Wasn't the story always outlined as: we build better and better models, then we eventually get to AGI; AGI works on building better and better models even faster; and we eventually get to super-AGI, which can work on building better and better models even faster... Isn't "super-optimization" (in the widest sense) what we expect to happen in the long run?


First of all, we need to just stop talking about AGI and Superintelligence. It's a total distraction from the actual value that has already been created by AI/ML over the years and will continue to be created.

That said, you have to distinguish between "good for the field of AI, the AI industry overall, and users of AI" from "good for a couple of companies that want to be the sole provider of SOTA models and extract maximum value from everyone else to drive their own equity valuations to the moon". Deepseek is positive for the former and negative for the latter.


Because building a frontier model is expensive. But building a model as good as an existing frontier model is cheap (re: distillation).

https://en.m.wikipedia.org/wiki/Knowledge_distillation

So the takeaway is they have no moat



Beautiful and concise, much better than my word salad.


I believe in general the business model of building frontier models has not been fully baked yet. Let's ignore the thought of AGI and just say models do continue to improve. In OpenAI's case, they have raised lots of capital in the hopes of dominating the market. That capital pegged them at a valuation. Now you have a company with ~100 employees and supposedly a lot less capital come in and get close to OpenAI's current leading model. It has the potential to pop their balloon massively.

By releasing a lot of it open source, everyone has their hands on it. That opens the door to new companies.

Or, a simpler mental model: third parties have shown the ability to get quite close to leading frontier models. A leading frontier model takes hundreds of millions of dollars, and if someone is able to copy it within a year's time for significantly less capital, it's going to be a hard game of cat and mouse.


If I can use LLMs for free, why would I give money to OpenAI or Anthropic?


Yes, but I think most of the rout is caused by the fact that there really isn't anything protecting AI from being disrupted by a new player. These models are fairly simple technology compared to some of the other things tech companies build. That means OpenAI really doesn't have much ability to protect its market-leader status.

I don't really understand why the stock market has decided this affects nvidia's stock price though.


Does line go up forever?


Isn't the question closer to "/when/ does line stop going up?"


That is a matter of hope.

If line keeps going up, line does catastrophic or potentially apocalyptic harm, given our current circumstances.


It goes up at least until LLMs match humans - ie until an LLM can write Windows


I want the LLM to decide not to do anything, or write a new OS.

Whenever I prompt: "Do not do anything"

It always does <something>.


Do not think of a pink elephant. Were you able to do so?


> Whenever I prompt: "Do not do anything" It always does <something>.

Yep. A lot of times, the responses I get remind me of Simone in Ferris Bueller's Day Off: https://www.youtube.com/watch?v=swBtLPWeKbU

If you end up making a new model, please teach it that less is more and call it "LAIconic".


Slightly tangential, but I want an LLM which can debug windows.


Debug And locally fix security holes.


Or sell the found exploit to the friendly nation state actor to pay for its compute behind your back


I've missed the stories on this until now. Is it known (and is there an ELI5) how they were able to do it so much more efficiently?


This article has good background, context, and explanations [1]. They skipped CUDA and instead used PTX, a lower-level instruction set, where they were able to implement more performant cross-chip comms to make up for the less performant H800 chips.

[1]: https://stratechery.com/2025/deepseek-faq/


> Moreover, if you actually did the math on the previous question, you would realize that DeepSeek actually had an excess of computing; that’s because DeepSeek actually programmed 20 of the 132 processing units on each H800 specifically to manage cross-chip communications. This is actually impossible to do in CUDA.

You can do this just fine in CUDA, no PTX required. Of course all the major shops are using inline PTX at the very least to access the Tensor cores effectively.


So can people do the same in SPIR for OpenCL or amdgcn?

https://en.wikipedia.org/wiki/Standard_Portable_Intermediate...

https://www.khronos.org/spir/

Or, even better, in a unified language like SYCL?

https://cdrdv2-public.intel.com/786536/Heidelberg_IWOCL__SYC...


IIUC they released a paper; it's partially algorithmic improvements, partially good old low-level optimization.


There's something to be said about the idea that instead of just dumping oceans of money into buying Nvidia cards, they just... optimized what they had.

I'd say the wider industry could learn a thing or two, but as other commenters have joked, the line must go up.


The double edged sword of export controls.


Creativity always delivers when minds are constrained.


Is that what we can call Project2025?

Creativity from constrained minds?


Surely more relevant comments could exist in the aether of the internet.


Your reply doesn't read like a refutation to me, mnky9800n.


Ah but why care about efficiency when you have basically unlimited investor money?

Reminds me of Japanese cars and the OPEC oil embargo in the 1970s...


There's this, and that little desktop computer they announced earlier this month: Digits.

They claim it's able to run models with 200B parameters on a single node and 400B when paired with another node.


Like everything, I expect improvement to be logarithmic


That's a take I've seen in many HN comments


That seems to be the key question.


Yeah, this was my first thought as well. If it got so efficient, how good will all the models be 2-3 months from now?


>If training and inference just got 40x more efficient

The jury is still out on how much improvement DeepSeek made in terms of training and inference compute efficiency, but personally I think 10x is probably the actual improvement that's been made.

But in business/engineering/manufacturing/etc., if you have 10x more efficiency, you're basically going to obliterate the competition.

>output quality has ceased to be compute-bound

You raised an interesting conjecture and it seems that it's very likely the case.

I know that it's not even a full two years since ChatGPT-4 was released, but it seems to be taking OpenAI a very long time to release ChatGPT-5. Is it because they're taking their own sweet time to release the software, not unlike GIMP, or can they genuinely not justify the improvement needed to jump from 4 to 5? This stagnation, however, has allowed others to catch up. Now, based on DeepSeek's claims, anyone can have their own ChatGPT-4 under their desk with Nvidia Project Digits mini PCs [1]. For running DeepSeek, 4 mini PC units will be more than enough at 4 PFLOPS, and cost only USD 12K. Let's say on average one subscriber pays OpenAI USD 10 per month; for a 1000-person organization that's USD 10K per month, so the investment pays for itself within a month, and no data ever leaves the organization since it's a private cloud!

For training a system similar to ChatGPT-4, based on DeepSeek's claims, a few million USD is more than enough. Apparently, OpenAI, SoftBank and Oracle just announced a USD 500 billion joint venture to bring AI forward with the newly announced Stargate AI project, but that's 10,000x the money [2],[3]. But the elephant-in-the-room question is: can they even get a 10x quality improvement over the existing ChatGPT-4? I really, seriously doubt it.

[1] NVIDIA Puts Grace Blackwell on Every Desk and at Every AI Developer’s Fingertips:

https://nvidianews.nvidia.com/news/nvidia-puts-grace-blackwe...

[2] Trump unveils $500bn Stargate AI project between OpenAI, Oracle and SoftBank:

https://www.theguardian.com/us-news/2025/jan/21/trump-ai-joi...

[3] Announcing The Stargate Project:

https://openai.com/index/announcing-the-stargate-project/
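The payback arithmetic in the comment above, spelled out (figures are the comment's own illustrative numbers; actual Digits pricing and per-seat subscription costs will vary):

    units, unit_price = 4, 3_000          # USD per Project Digits box (illustrative)
    hardware_cost = units * unit_price    # 12,000 USD

    seats, per_seat_monthly = 1_000, 10   # USD/month per subscriber (illustrative)
    monthly_subscription = seats * per_seat_monthly   # 10,000 USD/month

    payback_months = hardware_cost / monthly_subscription
    print(f"hardware ${hardware_cost:,}, subscriptions ${monthly_subscription:,}/mo, "
          f"payback in ~{payback_months:.1f} months")   # ~1.2 months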


This is such a great read. The only missing facet of discussion here is that there is a valuation level of NVDA such that it would tip the balance of military action by China against Taiwan. TSMC can only drive so much global value before the incentive to invade becomes irresistible. Unclear where that threshold is; if we’re being honest, could be any day.


The part of this that doesn’t jibe with me is the fact that they also released this incredibly detailed technical report on their architecture and training strategy. The paper is well-written and has a lot of specifics. Exactly the opposite of what you would do if you had truly made an advancement of world-altering magnitude. All this says to me is that the models themselves have very little intrinsic value / are highly fungible. The true value lies in the software interfaces to the models, and the ability to make it easy to plug your data into the models.

My guess is the consumer market will ultimately be won by 2-3 players that make the best app / interface and leverage some kind of network effect, and the enterprise market will just be captured by the people who have the enterprise data, i.e. MSFT, AMZN, GOOG. Depending on just how impactful AI can be for consumers, this could upend Apple if a full mobile hardware+OS redesign is able to create a step change in seamlessness of UI. That seems to me to be the biggest unknown now - how will hardware and devices adapt?

NVDA will still do quite well because as others have noted, if it’s cheaper to train, the balance will just shift toward deploying more edge devices for inference, which is necessary to realize the value built up in the bubble anyway. Some day the compute will become more fungible but the momentum behind the nvidia ecosystem is way too strong right now.


What has changed is the perception that people like OpenAI/MSFT would have an edge on the competition because of their huge datacenters full of NVDA hardware. That is no longer true. People now believe that you can build very capable AI applications for far less money. So the perception is that the big guys no longer have an edge.

Tesla had already proven that to be wrong. Tesla's Hardware 3 is a 6-year-old design, and it does amazingly well on less than 300 watts. And that was mostly trained on an 8k cluster.


The perception only makes sense if it is "that's it, pack up your stall" for AI.

I think what really happened is day to day trading noise. Nothing fundamentally changed, but traders believed other people believed it would.


I mean, I think they still do have an edge - ChatGPT is a great app and has strong consumer recognition already, very hard to displace... and MSFT has a major installed base of enterprise customers who cannot readily switch cloud / productivity suite providers. So I guess they still have an edge; it's just more of a traditional edge.


Microsoft doesn't have to use OpenAI though; they could swap that out underneath the business applications.


And it is even questionable whether "bundling" AI into every product is legal with respect to anti-competition laws (cf. the IE case).


Yes, it is still a valid business model and I would expect MSFT to continue to make profits.


> The part of this that doesn’t jibe with me is the fact that they also released this incredibly detailed technical report on their architecture and training strategy. The paper is well-written and has a lot of specifics. Exactly the opposite of what you would do if you had truly made an advancement of world-altering magnitude.

I disagree completely with this sentiment. This was in fact the trend for a century or more (see inventions ranging from the polio vaccine to "Attention is all you need" by Vaswani et al.) before "Open"AI became the biggest player on the market and Sam Altman tried to bag all the gains for himself. Hopefully, we can reverse course on this trend and go back to a world where world-changing innovations are shared openly so they can actually change the world.


Exactly. There's a strong case for being open about the advancements in AI. Secretive companies like Microsoft, OpenAI, and others are undercut by DeepSeek and any other company on the globe who wants to build on what they've published. Politically there are more reasons why China should not become the global center of AI and fewer reasons why the US should remain the center of it. Therefore, an approach that enables AI institutions worldwide makes more sense for China at this stage. The EU, for example, has even less reason now to form a dependency on OpenAI and Nvidia, which works to the advantage of China and Chinese AI companies.


Even the "Language Models are Unsupervised Multitask Learners" paper was pretty open; I'd say even more open than the R1 paper.


I'm not arguing for/against the altruistic ideal of sharing technological advancements with society; I'm just saying that having a great model architecture is really not a defensible value proposition for a business. Maybe it's more accurate to say that publishing everything in detail indicates that it's likely not a defensible advancement, not that it isn't significant.


Here is a great interview. They don’t seem to care that much about money. They are already profitable.

https://www.chinatalk.media/p/deepseek-ceo-interview-with-ch...

> Money has never been the problem for us; bans on shipments of advanced chips are the problem.


I always thought AMZN is the winner since I looked into Bedrock. When I saw Claude on there it added a fuck yeah, and now the best models being open just takes it to another level.

AMZN: no horse picked, we host anything

MSFT: Open AI

GOOGLE: Google AI

AMZN is in the strongest position.


AWS's usual moat doesn't really apply here. AWS is Hotel California — if your business and data are in AWS, the cost of moving any data-intensive portion out of AWS is absurd due to egress fees. But LLM inference is not data-transfer intensive at all — a relatively small number of bytes/tokens go to the model, it does a lot of compute, and a relatively small number of tokens come back. So a business that's stuck in AWS can cost-effectively outsource their LLM inference to a competitor without any substantial egress fees.

RAG is kind of an exception, but RAG still splits the database part from the inference part, and the inference part is what needs lots of inference-time compute. AWS may still have a strong moat for the compute needed to build an embedding database in the first place.

Simple, cheap, low-compute inference on large amounts of data is another exception, but this use will strongly favor the “cheap” part, which means there may not be as much money in it for AWS. No one is about to do o3-style inference on each of 1M old business records.
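To put a number on "not data-transfer intensive" (rough figures: assume a few KB of prompt and response tokens per request, and AWS internet egress of roughly $0.09/GB at the first pricing tier):

    bytes_per_request = 4 * 1024   # ~4 KB of tokens out per request (assumption)
    requests = 1_000_000
    egress_per_gb = 0.09           # USD, approximate first-tier internet egress rate

    total_gb = bytes_per_request * requests / 1024**3
    print(f"{total_gb:.2f} GB out for 1M requests -> ~${total_gb * egress_per_gb:.2f} in egress fees")

Well under a dollar of egress per million requests is noise next to the inference bill, which is the point: the data gravity that normally locks workloads into AWS barely applies here.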


You can also use Claude, Mistral, Llama and others on Google Vertex, similar to Bedrock.


You are not taking into account why people are willing to pay exceedingly high prices for GPUs now and that the underlying reason may have been taken away.


Build trust by releasing your inferior product for free and as open as possible. Get attention, then release your superior product behind a paywall. Name recognition is incredibly important within and outside of China.

Keep in mind, they’re still competing with Baidu, Tencent and other AI labs.


My partner is trying to start a career in intensive care research and submitted an application for a major NIH grant a few months ago that was given a high score (= would probably be funded). Because of the "pause" we're now unsure if it will get funded at all. These things take months of prep (years if you consider all of the prior publication work), and if you miss your window it could just sink your career before it gets started.

Anyone doing medical research can easily make more money as a clinician in private practice. If we force these people out, it's the taxpayers' loss. Unless you believe medical research as a whole is a waste of money, which… I would disagree on. If it's simply a matter of changing research priorities, there are already mechanisms to influence that without shutting down the whole system. This is just ham-handed incompetence as a show of force.


> As I wrote in the book, “If creators are speaking their authentic truths, how can they also be accountable to audience feedback? I am personally bemused to see “authenticity” invoked as a criterion for what is ultimately and obviously a performance;

Who are you to deny their authenticity, though? If authenticity is being true to one’s own character, and one’s entire character is driven by YouTube video metric optimization (and perhaps ultimately by the profit thereby obtained), then isn’t their behavior on screen authentic?

Put another way, if MrBeast says “your goal is to make me excited to be on screen”, he’s explicitly saying he doesn’t want to have to act or otherwise be inauthentic on screen. Whether his excitement about a certain topic is tied to the views he expects it to garner is immaterial, if that’s his authentic motivation.

Or put yet another way, what drives anybody's "authentic" behavior? What audience are they playing to? It may not be the entire internet, but it's certainly influenced by "performance" in front of friends and family. We're all creations of our environment. MrBeast has just kind of found himself in an environment where feedback from YouTube videos motivates him and creates a ton of positive feedback loops.


At this point why wouldn’t a slightly crazy treasure hunter investor offer to front tens of millions to like, buy the landfill from the city?


> The council said that excavating the landfill site would let harmful substances escape into the environment, endangering residents with "potentially serious risks which raises public health issues and environmental concerns," the ruling said.

The extreme unlikelihood of finding the drive in a salvageable condition combined with the near certainty of polluting the local community makes it a pretty clear decision to me.


I don't get why there would be any risk of polluting the local community. Just create a second landfill, properly lined, in exactly the same way thousands of landfills are created each year, near the first one, then just run an industrial conveyor from Landfill A -> Landfill B. Create a spill container underneath if you want to be 100% super certain of 0.000% pollution.

Run it at a reasonable rate; it should be relatively straightforward for a team to pick through the garbage, particularly because it's metal.

This project gets rebooted when/if bitcoin hits $1mm.


It would be terrible for air pollution - this isn't just a big pile of trash, it's years of compacted municipal refuse that has to be excavated and pulled apart. If you dig through 100k tons of landfill, you will fill the surrounding air with toxic/pathogenic dust.


Liquids.


Monkey butter.


They'd need to provide sufficient confidence that they were going to continue to operate the landfill in accordance with all regulations, not just declare bankruptcy when the hunt is over and leave an unsafe mess and a financial black hole. Possibly the set intersection of treasure hunters with that character AND optimism that the drive remains useful is rather small.


They'd still need permission from the licensing authority, technically; waste is a pretty highly regulated thing here, and rightly so. Plus, this guy could be misremembering.


Yeah, a lot of us that knew of bitcoin but didn't really get into it seem to remember generating a coin or two but can't remember where we left them.


Agreed. I became interested in Bitcoin in 2013. In 2017 I realized it was never going mainstream for payments when I saw CNBC talking about it without one reference to actually using it as money. I've become pretty cynical about the movement since then, though I wouldn't mind stumbling upon an old private wallet.


I don't think people fully appreciate yet how much of LLMs' value comes from their underlying dataset (i.e. the entire internet - probably quadrillions(?) of tokens of text) rather than the model + compute itself.

If you’re trying to predict something within the manifold of data on the internet (which is incredibly vast, but not infinite), you will do very well with today’s LLMs. Building an internet-scale dataset for another problem domain is a monumental task, still with significant uncertainty about “how much is enough”.

People have been searching for the right analogy for "what type of company is OpenAI most like?" I'll suggest they're like an oil company, but without the right to own oil fields. The internet is the field, the model is the refining process (which mostly yields the same output but with some variations - not dissimilar from petroleum products)... and the process / model is a significant asset. And today, Nvidia is the only manufacturer of refining equipment.


This is an interesting analogy. Of course oil extraction and refining are very complex, but most of the value in that industry is simply the oil.

If you take the analogy further, while oil was necessary to jumpstart the petrochemical industry, biofuels and synthetic oil could potentially replace the natural stuff while keeping the rest of the value chain intact (maybe not economical, but you get the idea). Is there a post-web source of data for LLMs once the well has been poisoned by bots? Maybe interactive chats?


Over the past year+ I’ve developed a litmus test for whether a new AI feature is valuable or not:

If it were presented to me without using the words “AI” or “Intelligence”, would I give a s%!t about it? Can it even be described without using those words? If not, it is not valuable.


Do you just pay the bill for the resources indefinitely?


I'm not the person you're asking, but I maintain a number of scraping projects. The bills are negligible for almost everything. A single $3/mo VPS can easily handle 1M QPS (enough for all the small projects put together), and most of these projects only accumulate O(10GB)/yr.

Doing something like grabbing hourly updates of the inventory of every item in every Target store is a bit more involved, and you'll rapidly accumulate proxy/IP/storage/... costs, but 99% of these projects have more valuable data at a lesser scale, and it's absolutely worth continuing them on average.


Inbound data is typically free on cloud VMs. CPU/RAM usage is also small unless you use chromedriver and scrape using an entire browser with graphics rendered on the CPU. We're talking $5/mo for most scraping projects.


I'm paying < $0.50 a month, and that's primarily driven by S3. For the scraping itself I'm using Lambda, with maybe minutes of runtime per day.
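A minimal sketch of that kind of setup (hypothetical URL and bucket name; a scheduled Lambda handler that fetches a page and drops the raw payload into S3, with requests bundled into the deployment package):

    import datetime

    import boto3
    import requests

    s3 = boto3.client("s3")
    BUCKET = "my-scrape-archive"          # hypothetical bucket
    URL = "https://example.com/listings"  # hypothetical target

    def handler(event, context):
        """Invoked on a schedule (e.g. an EventBridge cron); runs for a few seconds."""
        resp = requests.get(URL, timeout=30)
        resp.raise_for_status()
        key = f"raw/{datetime.date.today().isoformat()}.html"
        s3.put_object(Bucket=BUCKET, Key=key, Body=resp.content)
        return {"stored": key, "bytes": len(resp.content)}

At minutes of runtime per day and a few KB per stored object, both the Lambda and S3 line items round to cents.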

