I lead an applied AI research team where I work - a mid-sized public enterprise products company. I've been saying this in my professional circles quite often.
We talk about scaling laws, superintelligence, AGI etc. But there is another threshold - the ability for humans to leverage super-intelligence. It's just incredibly hard to innovate on products that fully leverage superintelligence.
At some point, AI needs to connect with the real world to deliver economically valuable output. The rate-limiting step is there. Not smarter models.
In my mind, already with GPT-4, we're not generating ideas fast enough on how best to leverage it.
Getting AI to do work involves getting AI to understand what needs to be done from highly bandwidth constrained humans using mouse / keyboard / voice to communicate.
Anyone using a chatbot already has felt the frustration of "it doesn't get what I want". And also "I have to explain so much that I might as well just do it myself"
We're seeing much less of "it's making mistakes" these days.
If we have open-source models that match up to GPT-4 on AWS / Azure etc., there's not much point in going with players like OpenAI / Anthropic, who may have even smarter models. We can't even use the dumber models fully.
Your paycheck depends on people believing the hype. Therefore, anything you say about "superintelligence" (LOL) is pretty suspect.
> Getting AI to do work involves getting AI to understand what needs to be done from highly bandwidth constrained humans using mouse / keyboard / voice to communicate.
So, what, you're going to build a model to instruct the model? And how do we instruct that model?
This is such a transparent scam, I'm embarrassed on behalf of our species.
Snarky tone aside, there are different audiences. For example, I primarily work with web dev and some DevOps and I can tell you that the state of both can be pretty dire. Maybe not as much in my particular case, as in general.
Some examples to illustrate the point: supply chain risks and an ever-increasing number of dependencies (look at your average React project, though this applies to most stacks), overly abstracted frameworks (how many CPU cycles Spring Boot and others burn and how many hoops you have to jump through to get things done), patterns that sometimes mess up the DB's ability to optimize queries (EAV, OTLT, trying to create polymorphic foreign keys), inefficient data fetching (sometimes ORMs, sometimes N+1), bad security practices (committed secrets, anyone? bad usage of OAuth2 or OIDC?), overly complex tooling, especially the likes of Kubernetes when you have a DevOps team of one part-time dev, and overly complex application architectures where you have more services than developers (not even teams). That's before you even get into the utter mess of long-term projects that have been touched by dozens of developers over the years, and the whole sector sometimes feeling like the Wild West as opposed to "real engineering".
However, the difference here is that I wouldn't overwhelm anyone who might give me money with rants about this stuff and would navigate around those issues and risks as best I can, to ship something useful at the end of the day. Same with having constructive discussions about any of those aspects in a circle of technical individuals, on how to make things better.
Calling the whole concept a "scam" doesn't do anyone any good, when I already derive value from the LLMs, as do many others. Look at https://www.cursor.com/ for example and consider where we might be in 10-20 years. Not AGI, but maybe good auto-complete, codegen and reasoning about entire codebases, even if they're hundreds of thousands of lines long. Tooling that would make anyone using it more productive than those who don't. Unless the funding dries up and the status quo is restored.
I think at its core it's not that there isn't value or future value, but currently there is an assertion, maybe some blind faith, that it's inevitable that a future version will deliver a free lunch for society.
I think the testimony often repeated by coders who use these code completion tools is that "it saved me X amount of time on this one problem I had, therefore it's great value". The issue is that these are all effectively studies with n=1 test subjects. It's only useful information for the subject. It appears we don't realize in these moments that when we use those examples, even to ourselves, we are users reviewing a product, as opposed to validating whether our workflow is not just different but objectively better.
The truth lies in the aggregate data of the quality and crucially the speed by which fixes and requirements are being implemented at scale across code bases.
Admittedly, a lot of code is being generated, so I don't think I can say everyone hates it, but until someone can do some real research on this, all we have are product reviews.
> I think at its core it's not that there isn't value or future value, but currently there is an assertion, maybe some blind faith, that it's inevitable that a future version will deliver a free lunch for society.
Except in the case of "AI" we get new releases that seem somewhat impressive and therefore extend the duration for which the inflated expectations can survive. For what it's worth, stuff like this is impressive https://news.ycombinator.com/item?id=41693087 (I fed my homepage/blog into it and the results were good, both when it came to the generated content and the quality of speech)
> The truth lies in the aggregate data of the quality and crucially the speed by which fixes and requirements are being implemented at scale across code bases.
Honestly? I think we'll never get that, the same way I cannot convincingly answer "How long will implementing functionality X in application Y with the tech stack Z for developer W take?"
We can't even estimate tasks properly and don't have metrics for specific parts of the work (how much creating a front end takes, how much for a back end API, how much for the schema and DB migrations, how much for connecting everything, adding validations, adding audit, fixing bugs, etc.) because in practice nobody splits them up in change management systems like Jira so far, nor are any time tracking solutions sophisticated enough to figure those out and also track how much of the total time is just procrastination or attending to other matters (uncomfortable questions would get asked, and way too many metrics would be optimized for).
So the best we can hope for is some vague "It helps me with boilerplate and repeatable code, which is most of my enterprise CRUD system, by X%, and as a result something that would take me Y weeks now takes me Z weeks, based on these specific cases." Get enough of those empirical data points and it starts to look like something useful.
I think lots of borderline scams and/or bad products based on overblown promises will get funded, but in a decade we'll probably have mostly those sticking around that have actual utility.
The top comment of your HN link is exactly the issue at hand
> don't know what I would use a podcast like this for, but the fact that something like this can be created without human intervention in just a few minutes is jaw dropping
AI has recently gotten good at doing stuff that seems like it should be useful, but the limitations aren’t obvious. Self driving cars, LLM’s, Stable Diffusion etc are awesome tech demos as long as you pick the best output.
The issue is the real world cares a lot more about the worst outcomes. Driving better than 99% of people 24/7 for 6 months and then really fucking up is indistinguishable from being a bad driver. Code generation happens to fit really well because of how people test and debug code not because it’s useful unsupervised.
Currently, balancing supervision effort vs time saved depends a great deal on the specific domain and very little on how well the AI has been trained; that's what is going to kill this hype cycle. Investing an extra $100 billion in training the next generation of LLMs isn't going to move the needles that matter.
1. It's not actually refuting the point being made, which is "it seems hard to take advantage of LLMs' capabilities", which to me seems like a good point that stands on its own, regardless of who is saying it.
2. The original post is, to my eyes, not implying anything about training a model to help use other models, so the second part seems irrelevant.
I disagree. The comment is striking directly at the claim that people aren't taking "advantage of LLMs' capabilities". This is their capability and no amount of "clear communication" is going to change that
One of the rules of HN is to assume good intent. People often do and say the right thing even when it's antithetical to their source of income. If there is a substantive reason to think otherwise, then say it; don't immediately write off people's opinions, especially people in their own fields, because of a possibility of bias.
OK, but the post this was in response to was bloviating about all kinds of sci-fi stuff like "super-intelligence" and the like. It was the opposite of "antithetical to their source of income", instead it was playing into some techno-futurist faith cult.
There is a strong argument that super-intelligence is already in the rear-view mirror. My computer is better than me at almost everything at this point; creativity, communication, scientific knowledge, numerical processing, etc. There is a tiny sliver of things that I've spent a life working on where I can consistently outperform a CPU, but it is not at all clear how that could be defensible given the strides AI has made over the last few decades. The typical AI seems more capable than the typical human to me. If that isn't super-intelligence then whatever super-intelligence is can't be far away.
> given the strides AI has made over the last few decades
This is where I lost the plot. The techno-futurists always seem to try to co-opt Moore's law or similar scaling laws and claim it'll somehow take care of whatever scifi bugaboo du jour they're peddling, without acknowledging that Moore's law is specifically about transistor density and has nothing to do with "strides" in "AI".
> whatever super-intelligence is can't be far away.
How do you figure? Or is it just an article of your faith?
It's been the same old, tired argument for seven decades--"These darned computers can count real real fast so therefore any day now they'll be able to think for themselves!" But so far nobody's shown it to be true, and all the people claiming it's right around the corner have been wrong. What makes this time different?
We're still seeing exponential upswing in compute and we appear to already be probing around human capacity in the models. Past experience suggests that once AIs are within spitting distance of human ability they will exceed what a human mind can do in short order.
I'm not sure what the amount of computer time spent on training the models has to do with anything, the article states it "is the best predictor of broad AI capabilities we have" without attempting to defend the claim. The "humies" (to use a dated term) benchmarks are interesting but clearly not super indicative of real world performance--one merely has to interact with one of these LLMs to find their (often severe) limitations, and it's not clear at all that more computer time spent on training will actually make them better.
EDIT: re: the computer time metric, by the same token shouldn't block chains have changed the world by now if computer time is the predictor of success? It makes sense for the industry proponents of LLMs to focus on this metric, because ultimately that's what they sell. Microsoft, NVidia, Google, Amazon, etc all benefit astronomically from computationally intensive fads, be it chatbot parlor tricks or NFTs. And the industry at large does as well--a rising tide lifts all boats. It's not at all obvious any of this is worth something directly, though.
> I'm not sure what the amount of computer time spent on training the models has to do with anything
Fair enough. What do you think is driving the uptick of AI performance and why don't you think it will be correlated with the amount of compute invested?
The limitations business looks like a red herring. Being flawed and limited doesn't even disqualify an AI from being super-intelligent (whatever that might mean). Humans are remarkably flawed and limited, it takes a lot of setup to get them to a point where they can behave intelligently.
> EDIT: re: the computer time metric, by the same token shouldn't block chains have changed the world by now if computer time is the predictor of success?
That seems like it would be a defensible claim if you wanted to make it. One of the trends I keep an eye on is that log(price) is similar to the trend in log(hash rate) for Bitcoin. I don't think it is relevant though because Bitcoin isn't an AI system.
> The typical AI seems more capable than the typical human to me
Your microwave is more capable than a typical human.
If, of course, your definition of capabilities is narrowly defined as computation and ignores the huge array of things humans can do that computers are nowhere close to.
Anything involving an interface with the physical world.
For example, running a lemonade stand.
You'd need thousands, if not millions, of dollars to build a robot machine with a computerized brain capable of doing what a 6-year-old child can do - produce lemonade from simple ingredients (lemon, sugar, water) and sell it to consumers.
Same with basically all cooking/food-service and hospitality tasks, and physical-therapy-type tasks (massage, chiropractor, etc.)...
Heck, even driving on public roads still doesn't seem to be perfect, despite 10+ years of investment and research by leading tech companies, although there is also a regulatory hurdle here.
You seem to have shifted the conversation's goalposts there - those are things that computers can do, it just costs a lot.
And, more to the point, they aren't indicative of intelligence. Computers have cleared the intelligence requirements to run a lemonade stand by a large margin - and the other tasks too for that matter.
> those are things that computers can do, it just costs a lot
One could travel between continents in minutes on an ICBM with a reentry vehicle bolted to the front but we don't because it's too expensive. It's a perfectly reasonable constraint to demand that a technology be cost effective. Otherwise it has no practical value.
> Anyone using a chatbot already has felt the frustration of "it doesn't get what I want". And also "I have to explain so much that I might as well just do it myself"
Ehm yes. That's because it actually doesn't work as well as the hype suggests, not because it's too "high bandwidth".
Moreover, this is exactly the frustration I've experienced when working with outsourced developers.
Which tells me the problem may be fundamental, not a technical one. It's not just a matter of needing "more intelligence". I don't question the intelligence or skill of the people on the outsourced team I was working with. The problem was simple communication. They didn't really know or understand our business and its goals well enough to anticipate all sorts of little things, and the lack of constant social interaction of the type you typically get when everybody's a direct coworker meant we couldn't build that mind-meld over time, either. So we had to pick up the slack with massive over-specification.
Not just with outsourced developers. Computer science graduates hired in house to assist in writing specialised engineering software can often be a dead weight in the team for quite some time. This is not just because they and the engineers don't speak the same language but also because the CS graduates know nothing about physics and engineering so they cannot properly evaluate requirements documents.
Left to their own devices they often implemented requirements that were full of errors or even completely unnecessary because they did not understand the domain well enough to ask pointed questions.
I'd add that we, the engineering class, are as a whole terrible communicators who confuse and cannot explain our own work. LLMs require clear communication, while the majority of LLM users attempt to use them with a large host of implied context that anyone would have a hard time following, let alone a non-human software construct. The key is clear communication, a phrase that many in STEM don't have the education to really understand, technically or realistically.
No, this is absolutely not the reason. The reason is that many people benefit financially as long as the hype train keeps going choo choo. So they lie to our faces and sleep like babies at night.
There is that too, I agree. The hype train is many people's entire careers. But AI does work and is indeed capable of many of the claims; I'm saying the naysayers are using it wrong. But you know who will use it very well? The hype train.
It works for some annoyances in life, though. For example, you can get it to write a complaint to an administration. It's good enough unless you'd rather be witty and write it yourself.
The "it's making mistakes" phase might be based on the testing strategy.
Remember the old bit about the media -- the stories are always 100% infallible except, strangely, in YOUR personal field of expertise.
I suspect it's something similar with AI products.
People test them with toy problems -- "Hey ChatGPT, what's the square root of 36", and then with something close to their core knowledge.
It might learn to solve a lot of the toy problems, but plenty of us are still seeing a lot of hallucinations in the "core knowledge" questions. But we see people then taking that product-- that they know isn't good in at least one vertical-- and trying to apply it to other contexts, where they may be less qualified to validate if the answer is right.
I think a crucial aspect is to only apply chatbots' answers to a domain where you can rapidly validate their correctness (or alternatively, to take their answers with a huge pinch of salt, or simply as creative search space exploration).
For me, the number of times where it's led me down a hallucinated, impossible, or thoroughly invalid rabbit hole have been relatively minimal when compared against the number of times when it has significantly helped. I really do think the key is in how you use them, for what types of problems/domains, and having an approach that maximizes your ability to catch issues early.
*We're seeing much less of "it's making mistakes" these days.*
Perhaps less than before, but still making very fundamental errors. Anything involving numbers, I'm automatically suspicious. Pretty frequently I'd get different answers for what, to a human, is the same question.
e.g. ChatGPT will give an effective tax rate of n for some income amount. Then, when asked to break down the calculation, it will come up with an effective tax rate of m instead. When asked how much tax is owed on that income, it will come up with a different number such that the effective rate is neither n nor m.
Until this is addressed to a sufficient degree, it seems difficult to apply to anything that involves numbers and can't be quickly verified by a human.
Yes. Numbers / math is pretty much instant hallucination.
But. Try this approach instead: have it generate python code, with print statements before every bit of math it performs. It will write pretty good code, which you then execute to generate the actual answer.
Simpler example: paste in a paragraph of text, ask it to count the number of words. The answer will be incorrect most of the time.
Instead, ask it to output each word in the text in a numbered list and then output the word count. It will be correct almost always.
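To make that concrete, here is a rough sketch of the "generate code, then run it yourself" pattern from above. call_llm is just a hypothetical placeholder for whatever chat client you actually use, and the prompt wording is only illustrative:

    import re
    import subprocess
    import sys
    import tempfile

    def call_llm(prompt: str) -> str:
        # Hypothetical stand-in for whatever chat-completion client you actually use.
        raise NotImplementedError

    def solve_math_question(question: str) -> str:
        prompt = (
            "Write a short Python script that answers the question below. "
            "Do not compute anything yourself; print() every intermediate value "
            "and the final answer.\n\nQuestion: " + question
        )
        reply = call_llm(prompt)

        # Pull the code out if the model wrapped it in a fenced block.
        match = re.search(r"```(?:python)?\n(.*?)```", reply, re.DOTALL)
        code = match.group(1) if match else reply

        # Execute the generated code in a separate interpreter; the printed output,
        # not the model's own arithmetic, is what you trust.
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        result = subprocess.run([sys.executable, path], capture_output=True, text=True, timeout=30)
        return result.stdout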
My anecdotal learning from this:
LLMs are pretty human-like in their mental abilities. I wouldn't be able to simply look at some text and give you an accurate word count. I would point my finger / cursor to every word and count up.
The solutions above are basically giving LLMs some additional techniques or tools, very similar to how a human may use a calculator, or count words.
In the products we've built, there is an AI feature that generates aggregations of spreadsheet data. We have a dual unittest & aggregator loop to generate correct values.
The first step is to generate some unittests. And in order to generate correct numerical data for the unittests, we ask it to write some code with math expressions first. We interpret the expressions and paste the results back into the unittest generator, which then writes the unittests with the correct inputs / outputs.
The aggregation generator then generates code until the generated unittests pass completely. Then we have the code for the aggregator function that we can run against the spreadsheet.
Takes a couple of minutes, but pretty bulletproof and also generalizable to other complex math calculations.
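Roughly, the loop looks something like this. This is a heavily simplified sketch, not our actual implementation: call_llm and the prompt wording are placeholders, and it assumes pytest is available to run the generated tests.

    import subprocess
    import sys
    import tempfile

    def call_llm(prompt: str) -> str:
        # Hypothetical stand-in for the actual chat-completion client.
        raise NotImplementedError

    def write_temp(code: str) -> str:
        # Write generated code to a temp file so it can run in a fresh interpreter.
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            return f.name

    def run(cmd: list[str]) -> subprocess.CompletedProcess:
        return subprocess.run(cmd, capture_output=True, text=True)

    def generate_aggregator(task: str, sample_rows: str, max_attempts: int = 5) -> str:
        # Step 1: ask for a script that prints the expected values and evaluate it
        # ourselves, so the numbers in the tests come from the interpreter, not the model.
        expr_code = call_llm(
            f"Write a short Python script that prints the expected aggregate values for: {task}\n"
            f"Sample rows:\n{sample_rows}"
        )
        expected = run([sys.executable, write_temp(expr_code)]).stdout

        # Step 2: generate pytest unit tests for aggregate(rows), seeded with those values.
        tests = call_llm(
            f"Write pytest tests for a function aggregate(rows) that does: {task}\n"
            f"Use these precomputed expected values:\n{expected}"
        )

        # Step 3: regenerate aggregator code until the generated tests pass.
        feedback = ""
        for _ in range(max_attempts):
            code = call_llm(f"Implement aggregate(rows) for: {task}\nIt must pass:\n{tests}\n{feedback}")
            result = run([sys.executable, "-m", "pytest", write_temp(code + "\n\n" + tests)])
            if result.returncode == 0:
                return code  # this is the aggregator we run against the real spreadsheet
            feedback = "The previous attempt failed with:\n" + result.stdout[-2000:]
        raise RuntimeError("no passing aggregator after several attempts")

The generated tests are what keep the whole thing honest; the loop only terminates once they pass.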
> Yes. Numbers / math is pretty much instant hallucination.
Programming is codified applied mathematics and involves numbers in all but the most trivial programs.
> LLMs are pretty human-like in their mental abilities.
LLM's are algorithms. Algorithms do not have "mental abilities", people do. Anthropomorphizing algorithms only serves to impair objective analysis of when they are, or are not, applicable.
> In the products we've built, there is an AI feature that generates aggregations of spreadsheet data. We have a dual unittest & aggregator loop to generate correct values.
> The first step is to generate some unittests. And in order to generate correct numerical data for unittests, we ask it to write some code with math expressions first. We interpret the expressions, and paste it back into the unittest generator - which then writes the unittests with the correct inputs / outputs.
> Then the aggregation generator then generates code until the generated unittests pass completely. Then we have the code for the aggregator function that we can run against the spreadsheet.
How is this not a classic definition of overfitting[0]?
Or is the generated code intentionally specific to, and only applicable for, a single spreadsheet?
> Anthropomorphizing algorithms only serves to impair objective analysis of when they are, or are not, applicable.
Actually, in this case, comparing how we as humans think to how LLM's work is in fact useful. It's hard for us to eyeball a word and say how many consonants are in it, we need to count. I wouldn't ask a human to eyeball a tax return and tell me what the totals are reliably without giving them the tools to add things up. LLM's are the same way.
It's true that anthropomorphizing in general can be a trap, but when working with LLM's it can be a useful guide in pointing the way towards workable solutions.
> Actually, in this case, comparing how we as humans think to how LLM's work is in fact useful.
Agreed. Contemplating the difference between what people and LLM's are is very useful IMHO. Understanding this is key to making informed decisions as to when LLM's can provide real value.
The assertion originally proffered, however, is quite different than your nuanced perspective:
> > LLMs are pretty human-like in their mental abilities.
> I wouldn't be able to simply look at some text and give you an accurate word count
Actually, this ability is at the very core of the human brain. Yet we humans traded most of it for the ability to speak better. Nevertheless, the brain can still read very fast as well as count objects/words really fast, but since you've never really trained it, this part of the brain is mostly optimised to do other stuff. Try scanning texts diagonally and coming up with a rough word count and what the text means. On the first tries you will make a lot of errors, but eventually (just 100 hours of training, much less when you are a kid) you will be able to scan texts in seconds and extract the meaning and the word count very accurately. This is a real technique people use for reading fast.
The thing is, the best and most useful ability of the human brain is to adapt - from the very moment it adapted to the harsh reality of being expelled from the trees and living on the ground. That's how language was created in the first place, as a result of this adaptation. LLMs can't adapt today, nor do they need to adapt, so their mental abilities will never come close to those of a brain that has been adapting for millions of years. Something which is not adapting will become obsolete and come to its end; this is a fundamental law.
> Perhaps less than before, but still making very fundamental errors.
Yes.
Suppose someone developed a way to get a reliable confidence metric out of an LLM. Given that, much more useful systems can be built.
Only high-confidence outputs can be used to initiate action. For low-confidence outputs, chain-of-reasoning tactics can be tried. Ask for a simpler question. Ask the LLM to divide the question into sub-questions. Ask the LLM what information it needs to answer the question, and try to get that info from a search engine. Most of the strategies humans and organizations use when they don't know something will work for LLMs. The goal is to get an all-high-confidence chain of reasoning.
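A rough sketch of what such a confidence-gated loop might look like, assuming a trustworthy answer_with_confidence call existed (which is exactly the part nobody has today; all the helpers below are hypothetical):

    CONFIDENCE_THRESHOLD = 0.8

    def answer_with_confidence(question: str, context: str = "") -> tuple[str, float]:
        # The missing piece: an LLM call that returns a trustworthy confidence score.
        raise NotImplementedError

    def decompose(question: str) -> list[str]:
        # Ask the LLM to split the question into simpler sub-questions.
        raise NotImplementedError

    def search(query: str) -> str:
        # Fetch supporting facts from a search engine.
        raise NotImplementedError

    def answer(question: str, depth: int = 0, max_depth: int = 3) -> str | None:
        text, confidence = answer_with_confidence(question)
        if confidence >= CONFIDENCE_THRESHOLD:
            return text  # high confidence: safe to act on

        if depth >= max_depth:
            return None  # give up rather than act on a low-confidence chain

        # Low confidence: fall back on the strategies humans use when they don't know.
        # 1. Break the question into sub-questions and answer those first.
        sub_answers = [answer(q, depth + 1, max_depth) for q in decompose(question)]
        if sub_answers and all(a is not None for a in sub_answers):
            combined, confidence = answer_with_confidence(question, context="\n".join(sub_answers))
            if confidence >= CONFIDENCE_THRESHOLD:
                return combined

        # 2. Ask what information is missing and try to get it from a search engine.
        needed = answer_with_confidence("What information is needed to answer: " + question)[0]
        retried, confidence = answer_with_confidence(question, context=search(needed))
        return retried if confidence >= CONFIDENCE_THRESHOLD else None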
If only they knew when they didn't know something.
There's research on this.[4] No really good results yet, but some progress. Biggest unsolved problem in computing today.
I remember watching IBM's Watson soundly beat Ken Jennings on Jeopardy. One of the things that sticks out most to me about the memory is that Watson had a confidence rating score for each answer it gave (and there were a few questions for which it had very low confidence). I didn't realize it at the time, but that was actually pretty impressive given the overconfidence issues LLMs have nowadays.
Of course, citing sources would always trump any kind of confidence rating in my mind; sources provide provenance, while confidence rating can be fudged just like a bogus answer can.
Providing sources gets interesting too, because (a) LLMs have been known to hallucinate nonexistent sources and (b) there isn't a perfect way of knowing whether the source itself was generated by an LLM and contains hallucinations.
LLMs don't do math in that sense. They build a string of tokens out of a billion pre-weighted ones that gets a favorable probability distribution when taking your prompt into account. Change your prompt, get a different printout. There is no semantic understanding (in the sense of whether what is printed makes sense), and it therefore cannot plausibility-check its response. An LLM will just print gibberish if that gets the best probability distribution of tokens. I'm sure that's something that will be addressed over time, but we are not there yet.
I'm not keen on marketing words like "superintelligence" but boiling it down that's what in my mind the OP said. These systems are limited in ways that we do not yet fully appreciate. They are not silver bullets for all or maybe even many problems. We need to figure out where they can be deployed for greater benefit.
> Perhaps less than before, but still making very fundamental errors. Anything involving numbers I'm automatically suspicious
Totally. They are large language models, not math models.
I think the problem is that 'some people' overhype them as universal tools to solve any problem, and to answer any question. But really, LLMs excel in generating pretty regular text.
Problem is also they DO work in SOME cases. E.g. ask ChatGPT to break down the calculation to determine tax owed on $xxx of income, and the result is correct in most cases. So there is this perception of intelligence, but it also fails spectacularly in very simple cases.
I tell ChatGPT I pay $5 for every excellent response, and I make it keep track of how much I spend per chat session (of course in addition to my normal course of work).
It does two things: 1) it tells me how deep I am in the conversation, and 2) when the computation falls apart, I can assume other things in its response will be trash as well. And sometimes a third: how well the software is working. It ranges from $20 to $35 on average, but a couple of days they go deep to $45 (4, 7, 9 responses in a chat session).
Today I learned I can knock the computation loose in a session somewhere around the 4th reply or $20 by injecting a random number into my course work - it was latching onto the number instead of computing the $.
> We're seeing much less of "it's making mistakes" these days.
Is this because it's actually making fewer mistakes, or is it just because most people have used it enough now to know not to bother with anything complex?
Today I carved a front panel for my cyberdeck project out of a composite wood board. I hand-drafted everything, and planned out the wiring (though I won't be onto the soldering phase for a while now). It felt good. I don't think having a 3d printer + AI designing my cyberdeck would feel the same.
Yeah, I think what the whole “humans won’t need to do <insert creative-adjacent-and-skilled-labour-here>” argument misses is the fundamental human aspect.
Computers might do some things “better” than humans, but we’re still going to do things because we _want_ to. Often the fun is in the doing.
LLM’s can vomit out code faster than me, but I still enjoy writing software. They’ll vomit out a novel or a song, but I still like reading/listening to stuff a human has taken the time and effort to create, because they can.
Fifteen was a massive success partially because Taylor was only 19 when she recorded it, and so it sounded authentic. It's a human singing about her experiences. Also, the lyrics, on both macro and micro levels, are exceptionally well done and genuinely new.
In other words, in another area: you can take all the art made in the world before Guernica and throw it in any probabilistic network but for sure you won't get Guernica out of that.
> In my mind, already with GPT-4, we're not generating ideas fast enough on how best to leverage it.
It's a token prediction machine. We've already generated most of the ideas for it, and hardly any of them work because see below
> Getting AI to do work involves getting AI to understand what needs to be done from highly bandwidth constrained humans using mouse / keyboard / voice to communicate.
No. To get AI to work you need to make an AI, and not a token prediction machine which, however wonderful:
- does not understand what it is it's generating, and approaches generating code the same way it approaches generating haikus
- hallucinates and generates invalid data
> Anyone using a chatbot already has felt the frustration of "it doesn't get what I want". And also "I have to explain so much that I might as well just do it myself"
Indeed. Instead of asking why, you're wildly fantasizing about running out of ideas and pretending you can make this work through other means of communication.
The model is doing the exact same thing when it generates "correct" output as it does when it generates "incorrect" output.
"Hallucination" is a misleading term, cooked up by people who either don't understand what's going on or who want to make it sound like the fundamental problems (models aren't intelligent, can't reason, and attach zero meaning to their input or output) can be solved with enough duct tape.
I like the analogy of an Ouija board: "This magical conduit to supernatural forces sometimes gives out nonsense, but soon we'll fix it so that it always channels the correct ghosts and spirits."
It was "cooked up" by people like Joshua Maynez, Shashi Narayan, et al, who I'm going to guess understand what is going on, and adopted by the rest of the field.
> Anyone using a chatbot already has felt the frustration of "it doesn't get what I want". And also "I have to explain so much that I might as well just do it myself"
I find this funny because for what I use ChatGPT for - asking programming questions that would otherwise go to Google/StackOverflow - I have a much better time writing queries for ChatGPT than Google, and getting useful results back.
Google will so often return StackOverflow results that are for subtly very different questions, or I'll have to squint hard to figure out how to apply that answer to my problem. When using ChatGPT, I rarely have to think about how other people asked the question.
We have ideas on how to leverage it. But we keep them to ourselves for our products and our companies. AI by itself isn’t a breakthrough product the same way that the iPhone or the web was. It’s a utility for others to enhance their products or their operations. Which is the main reason why so many people believe we’re in an AI bubble. We just don’t see the killer feature that justifies all that spending.
Your use of the word superintelligence is jarring to me. That's not yet a thing and not yet visible on the horizon. That aside, the point I like that you seem to be making is along the lines of: we overestimate the short term impact of new tech, but underestimate the long term impact. There is a lot to be done and a lot of refinement to come.
Yes, this is in my mind where people will find the fabled moat they search for too.
SOTA models are impressive, as is the idea of building AGIs that do everything for us, but in the meantime there are a lot of practical applications of the open source and smaller models that are being missed out on in my opinion.
I also think business is going to struggle to adapt, and existing business is at a disadvantage for deploying AI tools; after all, who wants to replace themselves and lose their salary? It's a personal incentive not to leverage AI at the corporate level.
> Anyone using a chatbot already has felt the frustration of "it doesn't get what I want". And also "I have to explain so much that I might as well just do it myself"
the problem is really, can it learn "I need to turn this over to a human because it is such an edge case that there will not be an automated solution."
"In my mind, already with GPT-4, we're not generating ideas fast enough on how best to leverage it."
This is the main bottleneck, in my mind. A lot of people are missing from the conversation because they don't understand AI fully. I keep getting glimpses of ideas and possibilities, and chatting through a browser ain't one of them. Once we have more young people trained on this, comfortable with the tech and understanding it, and existing professionals have light bulbs go off in their heads as they try to integrate local LLMs, then real changes are going to hit hard and fast. This is just a lot to digest right now, and the tech is truly exponential, which makes it difficult to ideate right now. We are still absorbing the productivity boost from chatting.
I tried explaining how this stuff works to product owners and architects and that we can integrate local LLMs into existing products. Everyone shook their head and agreed. When I posted a demo in chat a few weeks later you would have thought the CEO called them on their personal phone and told them to get on this shit. My boss spent the next two weeks day and night working up a demo and presentation for his bosses. It went from zero to 100kph instantly.
Just the fact that I can have something proficient in language trivially accessible to me is really useful. I'm working on something that uses LLMs (language translation), but besides that I think it's brilliant that I can just ask an LLM to summarise my prompt in a way that gets the point across in far fewer tokens. When I forget a word, I can give it a vague description and it'll find it. I'm terrible at writing emails, and I can just ask it to point out all the little formalisms I need to add to make it "proper".
I can benchmark the quality of one LLM's translation by asking another to critique it. It's not infallible, but the ability to chat with a multilingual agent is brilliant.
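Concretely, something along these lines works. call_model_a / call_model_b are just placeholders for two different chat clients, and the prompt wording is only illustrative:

    def call_model_a(prompt: str) -> str:
        # Hypothetical translation model client.
        raise NotImplementedError

    def call_model_b(prompt: str) -> str:
        # Hypothetical judge/critic model client (a different model or provider).
        raise NotImplementedError

    def translate_and_critique(text: str, source_lang: str, target_lang: str) -> tuple[str, str]:
        translation = call_model_a(
            f"Translate the following {source_lang} text into {target_lang}:\n\n{text}"
        )
        critique = call_model_b(
            f"You are a bilingual reviewer. Compare this {target_lang} translation against the "
            f"{source_lang} original. List mistranslations, awkward phrasing, and missing nuance, "
            f"then give an overall score from 1 to 10.\n\nOriginal:\n{text}\n\nTranslation:\n{translation}"
        )
        return translation, critique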
It's a new tool in the toolbox, one that we haven't had in our seventy years of working on computers, and we have seventy years of catchup to do working out where we can apply them.
It's also just such a radical departure from what computers are "meant" to be good at. They're bad at mathematics, forgetful, imprecise, and yet they're incredible at poetry and soft tasks.
Oh - and they are genuinely useful for studying, too. My A Level Physics contained a lot of multiple choice questions, which were specifically designed to catch people out on incorrect intuitions and had no mark scheme beyond which answer was correct. I could just give gpt-4o a photo of the practice paper and it'd tell me not just the correct answer (which I already knew), but why it was correct, and precisely where my mental model was incorrect.
Sure, I could've asked my teacher, and sometimes I did. But she's busy with twenty other students. If everyone asked for help with every little problem she'd be unable to do anything else. But LLMs have infinite patience, and no guilt for asking stupid questions!
> Do you speak two or more languages? Anyone that does is wary of automated translations, especially across estranged cultures.
I'm aware that it's imperfect. It's still pretty cool that they're multilingual as an emergent property - and rather than merely translating, you can discuss aspects of another language. Of course it hallucinates - that's the big problem with LLMs - but that doesn't make it useless. Besides, while automated translators aren't perfect, they're the only option in a lot of situations.
> It's data analysis at scale, and reliant on scraping what humans produced. A word processor does not need TB of eBooks to do its job.
And I'm not replacing word processors with LLMs, nor did I claim that they were trained in a vacuum.
> Because there's no wrong or right about poetry. Would you be comfortable having LLMs managing your bank account?
No, which is why I don't intend to... they're probabilistic and immensely fallible. I wasn't claiming they were gods. My point was that they're far outside of what we usually use computers for (e.g. managing your bank account), and that opens a lot of possibilities.
> That would be hand-holding, not learning.
Correct. That's why we don't do it... and in normal circumstances, it'd probably be better to deeply consider the problem in order to work out why you were wrong. But a week before the exam it's excellent. I got a healthy A*, so it doesn't seem to have hurt.
What exactly are you arguing against? I'm not convinced they're a route to AGI either, and I'm not about to replace my graphics driver with an LLM, nor code written by one - but you seem to have a vendetta against them that's led to you sidestepping every single point I made with a snide remark against claims you seem to have imagined me making.
I will be glad to see the day when LLMs will be able to play Minecraft, so I won't have to. Then I can just relax and watch someone else do everything for me without lifting a single finger.
Well, I guess you've caught me then. Naturally, adults would be more preoccupied with how they can use AI to improve their lives and get ahead, instead of playing Minecraft.
Along the same lines, there is also a phrase about technology from Danny Hillis: technology "is everything that doesn't work yet." "Electric motors were once technology – they were new and did not work well. As they evolved, they seem to disappear, even though they proliferated and were embedded by the scores into our homes and offices. They work perfectly, silently, unminded, so they no longer register as “technology.”" https://kk.org/thetechnium/everything-that/
On an amusing note, I've read something similar: Everything that works stops being called philosophy. Science and math being the two familiar examples.
Just in case anyone's curious, this is from Bertrand Russell's "the history of philosophy".
> As soon as definite knowledge concerning any subject becomes possible, this subject ceases to be called philosophy, and becomes a separate science.
I'm not actually sure I agree with it, especially in light of less provable schools of science like string theory or some branches of economics, but it's a great idea.
In college my Professor told me he was disappointed that the string theory hype was dying down because it made for a great subfield of algebraic geometry.
PhD itself is an abbreviation for "Doctor of Philosophy." The title is more about the original Greek "lover of wisdom" than about the modern academic discipline of philosophy. https://en.wikipedia.org/wiki/Doctor_of_Philosophy
Doctor is similar - in the US, when someone says "Doctor" they usually mean "Medical Doctor" but "Doctor" just comes from the Greek "teacher" / "scholar" which is more broad and the title can still be used officially and correctly for PhDs. https://en.wikipedia.org/wiki/Doctor_(title)
Just a little correction. Doctor is Latin and roughly means "someone who has learned a lot."
Science also originally referred to knowledge. What we think of as "science" used to be called the natural sciences. Sometimes people get confused when I say I have a B.S. in Classics, because science has lost that broader meaning.
Indeed, in the Summa Theologica, Thomas Aquinas asks if theology is a science, and concludes that it is. He also gave lip service to logical rigor and falsifiability, in the latter case by encouraging the discipline of asking contrary questions and answering them. What he didn't do was appeal to empirical data, to any great extent.
I think the reasoning behind "doctor of philosophy" may be lost to history. All knowing Wikipedia suggests that it didn't happen at once. My take was that the requirements for a modern PhD were added long after the title was adopted.
I suspect there was a time when a person could be well versed in multiple of what are now separated fields, and that you had to be a philosopher to make sense of science and math. Also, as science was flexing its own wings, claiming to be a philosopher might have been a way to gain an air of respectability, just like calling a physician "doctor" when the main impact of medicine was to kill rich people.
It is still called natural science, but it used to be called natural philosophy.
And it is interesting, as you say, that when it comes to Bachelor/Master/Doctor of Science/Art/Philosophy (even professor), these are all titles formed from arbitrary terms that have been enshrined by the institutions that give people these titles.
There is a reason for that. People who inquired into the actual functioning of the world used to be called philosophers. That's why so many foundations of mathematics actually come from philosophers. The split happened around the 17th century. Newton still called his monumental work "Natural Philosophy", not "Physics".
This is also true for consciousness or sentience. No matter how surprising the abilities of non-human beings (and computer agents) get, it remains something mysterious that only humans do.
I won't say that things like stoicism or humanism never worked. But they never got to the level of strict logical or experimental verifiability. Physics may be hard science, but the very notion of hard science, hypotheses, demand to replicate, demand to be able to falsify, etc, are all philosophy.
Exactly what I wrote recently: "The "AI effect" is behind some of the current confusion. As John McCarthy, the AI pioneer who coined the term "artificial intelligence," once said: "As soon as it works, no one calls it AI anymore." This is why we often hear that AI is "far from existing." This led to the formulation of Tesler's Theorem: "AI is whatever hasn't been done yet.""
https://www.lycee.ai/blog/there-are-indeed-artificial-intell...
> As soon as it works, no one calls it AI anymore.
So, what are good examples of some things that we used to call AI, which we don't call AI anymore because they work? All the examples that come to my mind (recommendation engines, etc.) do not have any real societal benefits.
Some examples are navigation algorithms, machine learning, neural networks, fuzzy logic, or computer vision. I personally learned several of those in a "Artificial Intelligence" CS course ~15 years ago, but most people would never think to call Google Maps, a smart thermostat learning their habits, or their doorbell camera recognizing faces "AI".
It's only recently with generative AI that you see any examples of the opposite, people outside the field calling LLMs or image generation "AI".
Chess engines. People once believed that in order for a computer to play chess we would first have to create an intelligence comparable to humans. Turns out you can win by evaluating millions of positions per turn instead. It would be somewhat disappointing for someone who was looking for a human-like opponent, that it can simply be brute forced. But it may be that a lot of “intelligence” is like that.
It’s the cognitive equivalent of the Indiana Jones sword vs gun scene.
If you trace the etymology of computer in English, it means something like "small chalk/limestone pebbles used for counting do-er." (See Latin "calx".)
Lisp's inventor, John McCarthy, was an AI researcher. (The US government started funding AI research in the 1950s, expecting progress to be much faster than it actually was.)
The original SQL was essentially Prolog restricted to relational algebra and tuple relational calculus. SQL as is happened when a lot of cruft was added to the mathematical core.
This is a pretty common perspective that was introduced to me as “shifting the goalposts” in school. I have always found it a disingenuous argument because it’s applied so narrowly.
Humans are intelligent + humans play go => playing go is intelligent
Humans are intelligent + humans do algebra => doing algebra is intelligent
Meanwhile, humans in general are pretty terrible at exact, instantaneous arithmetic. But we aren’t claiming that computers are intelligent because they’re great at it.
Building a machine that does a narrowly defined task better than a human is an achievement, but it’s not intelligence.
Although, in the case of LLMs, in context learning is the closest thing I’ve seen to breaking free from the single-purpose nature of traditional ML/AI systems. It’s been interesting to watch for the past couple years because I still don’t think they’re “intelligent”, but it’s not just because they’re one trick ponies anymore. (So maybe the goalposts really are shifting?) I can’t quite articulate yet what I think is missing from current AI to bridge the gap.
> Meanwhile, humans in general are pretty terrible at exact, instantaneous arithmetic. But we aren’t claiming that computers are intelligent because they’re great at it.
"The question of whether a computer can think is no more interesting than the question of whether a submarine can swim." - Edsger Dijkstra
> breaking free from the single-purpose nature of traditional ML/AI systems
is it really breaking free? so far LLMs in action seem to have a fairly limited scope -- there are a variety of purposes to which they can be applied but it's all essentially the same underlying task
It _is_ all the same task (generating text completions), but pretraining on that task seems to have been suitably abstract for the model to work decently well on more narrow problems -- certainly they work dramatically better at a collection of tasks like “sentiment analysis of movie ratings” and “spam classifier” than if I took purpose-built models for either of those tasks and tried using them like I could an LLM.
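To illustrate what that looks like in practice, here is a toy sketch where the same general model covers both tasks just by changing the prompt (call_llm is a hypothetical placeholder for whatever chat client you use):

    def call_llm(prompt: str) -> str:
        # Hypothetical stand-in for whatever chat-completion client is used.
        raise NotImplementedError

    def classify_sentiment(review: str) -> str:
        # Same general model, task 1: sentiment analysis of a movie review.
        reply = call_llm(
            "Answer with exactly one word, POSITIVE or NEGATIVE, for the sentiment "
            "of this movie review:\n\n" + review
        )
        return reply.strip().upper()

    def classify_spam(email_body: str) -> str:
        # Same general model, task 2: spam detection, changed only by the prompt.
        reply = call_llm(
            "Answer with exactly one word, SPAM or NOT_SPAM, for this email:\n\n" + email_body
        )
        return reply.strip().upper()

A purpose-built sentiment model can't be repurposed for spam filtering this way; with an LLM it's just a different prompt.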
Someday if we have computers that are capable of doing 100% of the cognitive tasks humans do, better than any human can, we might still say it’s “just” doing X or Y. It might even be disappointing that there isn't a “special sauce” to intelligence. But at the end of the day, the mechanism isn’t important.
We are already playing with some incredible ingredients. Machines that can instantly recall information, (in principle) connect to any electronic device, and calculate millions of times faster than brains, and perfectly self-replicate. Just using these abilities in a simple way is already pretty darn powerful.
People innately believe that intelligence isn't an algorithm. When a complex problem presents itself for the first time, people think "oh, this must be so complex that no algorithm can solve it, only AI," and when an algorithmic solution is found, people realise that the problem isn't that complex.
Indeed, if AI were an algorithm, imagine what it would feel like to be one: at every step of your thinking process you are dragged along by the iron hand of the algorithm; you have no agency in decision making, for every step is pre-determined already, and you're left the role of an observer. The algorithm leaves no room for intelligence.
Is that not the human experience? I have no “agency” over the next thought to pop into my head. I “feel” like I can choose where to focus attention, but that too is a predictable outcome arising from the integration of my embryology, memories, and recently reinforced behaviors. “I” am merely an observer of my own mental state.
But that is an uncomfortable idea for most people.
If this were true, you could lie back, relaxed, and watch where your brain takes you. But we experience life as a never-ending stream of choices, usually between what's easy and what's right, and pursuing the right choice takes constant effort. We are presented with problems and have to figure out solutions on our own, with no guarantees of success.
This "I'm just an observer" idea may be true at some higher level, if you're a monk on the threshold of nirvana, but for common folk this mindset leads nowhere.
The „making choices“ could just be an illusion where it’s already clear from the inputs what you’re going to do. You rejecting determinism could already be determined and you couldn’t even choose to believe it. That’s not falsifiable of course, but your point doesn’t really contradict the idea of determinism due to the possibility.
The other option you don't mention is "algorithms can solve it, but they do something different to what humans do". That's what happened with Go and Chess, for example.
I agree with you that people don't consider intelligence as fundamentally algorithmic. But I think the appeal of algorithmic intelligence comes from the fact that a lot of intelligent behaviours (strategic thinking, decomposing a problem into subproblems, planning) are (or at least feel) algorithmic.
it mostly depends on one's definition of an algorithm.
our brain is mostly scatter-gather with fuzzy pattern matching that loops back on itself. which is a nice loop, inputs feeding in, found patterns producing outputs and then it echoes back for some learning.
but of course most of it is noise, filtered out, most of the output is also just routine, most of the learning happens early when there's a big difference between the "echo" and the following inputs.
it's a huge self-referential state-machine. of course running it feels normal, because we have an internal model of ourselves, we ran it too, and if things are going as usual, it's giving the usual output. (and when the "baseline" is out of whack then even we have the psychopathologies.)
Exactly. Machine learning used to be AI, and "AI-driven solutions" were peddled over a decade ago. Then that died down. Now suddenly every product has to once again be "powered by AI" (even if under the hood all you're running is a good ol' SVM).
BERT is already not considered an LLM, and the vector embeddings it generates are not called AI. It is also the first general solution for natural language search anyone has come up with. We call them vector databases. Again, I'd wager this is because they actually work.
Interestingly, a big barrier to voice recognition is the same as with AI assistants - they don't understand context, and so they have a difficult time navigating messy inputs that you need assumed knowledge and contextual understanding for. Which is kinda the baseline for how humans communicate things to each other with words in the first place.
There are really very few situations where people really want voice recognition. The main one is hands-free controls when driving, or as a remote control for TV or music.
Exactly, yeah. The high frequency of mistakes from voice commands is a specific point of friction that makes me still prefer tactile buttons for cars - ideally somewhere you don't have to take your hands off the wheel or your eyes off the road. If voice command got really good, that would probably change for me.
Touchscreens are great for phones, but I'm really not a fan of them in cars where I prefer the tactile feedback of knowing what button is under my finger.
A lot of it just comes down to having the right tool for the right usecase, really.
Yeah, but honestly we all know LLMs are different than, say, some chess AI.
You can thank social media for dumbing down a human technological milestone in artificial intelligence. I bet if there was social media around when we landed on the moon you’d get a lot of self important people rolling their eyes at the whole thing too.
What about computer characters in video games? They're usually controlled by a system that everyone calls AI, but there's almost never any machine learning involved. What about fitting a line to a curve, i.e. linear regression? Definitely ML, but most people don't call it AI.
I would say people call NPCs "AI" only for historical reasons. Typically handwritten algorithms are not called AI nowadays.
I'm not sure, but I think statistical regression already specifies a simple model (e.g. a linear one), while the approach with neural networks doesn't make such strong assumptions about the model.
I actually think linear regression isn't usually called ML in practice. Anyway, I meant something like neural networks.
I think we are in the middle of a steep S-curve of technology innovation. It is far from plateauing and there are still a bunch of major innovations that are likely to shift things even further. Interesting time and these companies are riding a wild wave. It is likely some will actually win big, but most will die - similar to previous technology revolutions.
The ones that win will win not just on technology, but on talent retention, business relationships/partnerships, deep funding, marketing, etc. The whole package really. Losing is easy, miss out on one of these for a short period of time and you've easily lost.
There is no major moat, except great execution across all dimensions.
"There is, however, one enormous difference that I didn’t think about: You can’t build a cloud vendor overnight. Azure doesn’t have to worry about a few executives leaving and building a worldwide network of data centers in 18 months."
This isn't true at all. There are like 8 of these companies stood up in the last three or four years, fueled by massive investment from sovereign funds - mostly the Saudi, Dubai, Northern European, etc. oil-derived funds - all spending billions of dollars doing exactly that and getting something done.
The real problem is the ROI on AI spending is.. pretty much zero. The commonly asserted use cases are the following:
Chatbots
Developer tools
RAG/search
Not a one of these is going to generate $10 of additional revenue per dollar spent, nor likely even $2. Optimizing your customer service representatives from 8 conversations at once to an average of 12 or 16 is going to save you a whopping $2 per hour per CSR. It just isn't huge money. And RAG has many, many issues with document permissions that make the current approaches bad for enterprises - where the money is - who as a group haven't spent much of anything to even make basic search work.
"The real problem is the ROI on AI spending is.. pretty much zero. The commonly asserted use cases are the following:
Chatbots Developer tools RAG/search"
I agree with you that ROI on _most_ AI spending is indeed poor, but AI is more than LLM's. Alas, what used to be called AI before the onset of the LLM era is not deemed sexy today, even though it can still make very good ROI when it is the appropriate tool for solving a problem.
AI is a term that changes year to year. I don't remember where I heard it but I like that definition that "as soon as computers can do it well it stops becoming AI and just becomes standard tech". Neural Networks were "AI" for a while - but if I use a NN for risk underwriting nobody will call that AI now. It is "just ML" and not exciting. Will AI = LLM forever now? If so what is the next round of advancements called?
While it might be possible for a deep-pocketed organization to spin up a cloud provider overnight, it doesn't mean that people will use it. In general, the switching cost of migrating compute infrastructure from one service to another is much higher than the switching cost of changing the LLM used for inference.
Amazon doesn't need to worry about suddenly losing its entire customer base to Alibaba, Yandex, or Oracle.
Amazon has spent a ton on developing features that lock in users beyond what is fundamentally 'cloud', i.e. the ability to lease computers as a commodity. I have always warned employers about the downside of adopting all that extraneous stuff, with little effect.
> The real problem is the ROI on AI spending is.. pretty much zero.
Companies in user acquisition/growth mode tend to have low internal ROI, but remember both Facebook and Google had the same issue -- then they introduced ads and all was well with their finances. Similar things will happen here.
LLMs are a utility, not a platform, and utility markets exert a downward pressure on pricing. Moreover, it's not obvious that -- once trained models hit the wild -- any one actor has or can develop significant competitive moats that would allow them to escape that price pressure. Beyond that, the marginal cost of these digital services needs to be significantly reduced to keep these companies in business, but more efficient models lead to pushing inference out to end-user compute, which hollows out their business model. (I assume Apple dropping out of the OpenAI investment round was partly due to the wildly optimistic valuations involved, and partly because they're betting on being able to optimize runtime costs down to iPhone levels.)
Basically, I'd argue that LLMs look less like a Web 2.0 social media opportunity and more like Hashicorp or Docker, except with operational expenses running many orders of magnitude higher with costs scaling linearly to revenue.
> LLMs are a utility, not a platform, and utility markets exert a downward pressure on pricing.
Personally, I think it's competition that exerts downward pressure on pricing, not being a utility. But I guess I agree with the utility analogy in that there are massive initial upfront costs and then the marginal costs are low.
> more efficient models leads to pushing inference out to end-user compute, which hollows out their business model
Faster CPUs have been coming forever but we keep coming up with ways of keeping them busy. I suspect the same pattern with AI. Thus server-based AI will always be better than local. In the future, I expect to be served by many dozens of persistent agents acting on my behalf (more agents as you go further into the future) and they won't be hosted on my smartphone.
> 10 years ago a decade old computer vs a current one would have made a huge difference.
For running Word or Excel or Node.js server apps, I would agree with you. But this is where new applications come in. Modern PCs with either a GPU or an NPU can run circles around your PC when it comes to running Llama or StableDiffusion locally. Same with regards to high end graphics, old PCs can not do real-time raytracing or upscaling with their lesser capabilities. I personally do rendering via Blender or astrophotography via PixInsight and I need all the cores + memory I can get for that.
Faster PCs make for more opportunities that were not possible earlier. But if you do not change your workloads as the hardware evolves, then you do not need to upgrade.
And on the other side, how many more paying users will there be? Of the users that know about AI or have tried it, most are happy to use whatever is free at the moment. Or just whatever is on Google or Bing with an ad next to it.
Social media is free; subscription services are for stuff you can't easily get for free. But will there actually be a similar need to pay for AI, whichever generation of models it is?
So I go to ask an LLM to answer a question and it starts trying to sell me products I don't want or need? I'll need to filter its output through a locally run adblock LLM which detects/flags/strips out advertising text before delivering the output. Hmmm.
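A minimal sketch of that two-stage idea, purely hypothetical (the `ad_probability` callable stands in for whatever local "adblock" model or heuristic you'd actually run):

    # Hypothetical adblock pass: a local classifier scores how ad-like each
    # sentence of the hosted model's answer is, and flagged sentences are dropped.
    from typing import Callable

    def strip_ads(answer: str, ad_probability: Callable[[str], float], threshold: float = 0.5) -> str:
        sentences = [s.strip() for s in answer.split(".") if s.strip()]
        kept = [s for s in sentences if ad_probability(s) < threshold]
        return ". ".join(kept) + ("." if kept else "")

    # Toy usage with a keyword heuristic standing in for the local adblock LLM:
    print(strip_ads(
        "Boiling pasta takes about ten minutes. Try BrandX pasta, now 20% off",
        lambda s: 1.0 if "BrandX" in s or "% off" in s else 0.0,
    ))  # -> "Boiling pasta takes about ten minutes."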
I still have this pipe-dream of an augmented reality system where you can walk through the grocery store and it will superimpose a warning label over any brand connected to a company on the individual user's shit-list.
For many decades consumers have been told that the magnanimous "Free Market" (multiple definitions) is empowering them to vote with their wallets, however some of those same groups start acting rather suspiciously when faced with the possibility consumers might ever exercise that power.
> I still have this pipe-dream of an augmented reality system where you can walk through the grocery store and it will superimpose a warning label over any brand connected to a company on the individual user's shit-list.
There are boycott apps that let you scan barcodes, e.g.:
> And RAG has many, many issues with document permissions
Why can't these providers index all documents and then, when answers are generated, self-censor if the reply references documents that the end user does not have permission to access? In fact, I'm pretty sure that's how existing RAGaaS providers are handling document/file permissions.
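For what it's worth, a minimal sketch of that self-censoring idea (the retriever call and the ACL mapping are assumptions, not any particular vendor's interface); the key property is that permissions are enforced at retrieval time, before the model ever sees the text:

    # Hypothetical permission-aware RAG step: drop retrieved chunks the
    # requesting user isn't allowed to read before they reach the prompt.

    def allowed(user_id: str, doc: dict, acl: dict) -> bool:
        # acl maps document id -> set of user ids permitted to read it (assumed schema)
        return user_id in acl.get(doc["doc_id"], set())

    def build_context(user_id: str, query: str, retriever, acl: dict, k: int = 20) -> str:
        candidates = retriever.search(query, top_k=k)   # hypothetical retriever call
        visible = [d for d in candidates if allowed(user_id, d, acl)]
        # Only permitted documents reach the prompt, so the answer can't quote
        # text the user can't read -- assuming those documents were never baked
        # into the model's weights in the first place.
        return "\n\n".join(d["text"] for d in visible[:5])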
Would the texts of those documents be part of the LLM training data? If so, there's no reliable way to keep a determined user from fetching stuff back out.
> I think we are in the middle of a steep S-curve of technology innovation
We are? What innovation?
What do we need innovation for? What present societal problems can tech innovation possibly address? Surely none of the big ones, right? So then is it fit to call technological change - 'innovation'?
I'd agree that LLMs improve upon having to read Wikipedia for topics I'm interested in but would investing billions in Wikipedia and organizing human knowledge have produced a better outcome than relying on a magic LLM? Almost certainly, in my mind.
You see, people are pouring billions into LLMs and not Wikipedia not because it is a better product - but because they foresee a possibility of an abusive monopoly and that really excites them.
That's not innovation - that's more of the same anti-social behaviour that makes any meaningful innovation extremely difficult.
In a way, that is in alignment with their founding goal of organizing human knowledge.
An ad is just a piece of data (eg Bob is selling shovels for $10 + shipping), with the additional metadata that someone really wants you to see it (eg Bob paid Google $100 to tell every person who searched for "shovels" that he's selling shovels for $10 + shipping).
Selling ads is organizing human knowledge.
Yes, you could argue using a market system is not the best way to do this, but it is a way to do it.
I'm not sure the Wikipedia example is a strong one, as that site has its own serious problems with "abusive monopolies" in its moderator cliques and biases (as with any social platform).
At least with the current big AI players there is the potential for differentiation through competition.
Unless there is some similar initiative with the Wikipedias, the problem of single supplier dominance is a difficult one to see as the way forward.
I can solve Wikipedia's woes quite easily - Wikipedia should limit itself to math, science, engineering, medicine, physics, chemistry, geography and other disciplines that are not at all in dispute.
Politics, history, religion and other topics of conversation that are matters of opinion, taste and state sponsored propaganda need to be off limits.
Its mission ought to be to provide a PhD level education in all technical fields, not engage in shortening historical events and/or opinions/preferences/beliefs down to a few pages and disputing which pages need to be left in or out. Let fools engage in that task on their own time.
A lot of things that seem like simple facts are actually abstractions based on abstractions based on abstractions. The further up the chain, the more theoretical it gets. This is true of the humanities and science.
Well for medicine, there's alternative medicines like chiropractic and homeopathy. According to some people, these are snake oil and quackery, but many others swear by them, and many healthcare systems even provide these treatments despite the complete lack of scientific backing.
Nonetheless, Microsoft is firing up a nuclear reactor to power a new data center. My money is in the energy sector right now. Obvious boom coming with solar, nuclear and AI.
Yeah, I moved into a Vanguard energy ETF. Some of this is already priced in (which makes me wince a little when buying), but the actual upside hasn't even hit yet, which will bring in a rush of activity on energy stocks. I have noticed over the decades that energy companies always do very well, especially in the long run.
I was looking for renewables investments (particularly solar) but there is just too much uncertainty and the companies are so small.
The car wasn't a horse that was better, but cars haven't changed drastically since they went mainstream.
They've gotten better, more efficient, loaded with tech, but are still roughly 4 seats, 4 doors, 4 wheels, driven by petroleum.
I know that this is a massive oversimplification, but I think we have seen the "shape" of LLM/GenAI/AI products already and it's all incremental improvements from here on out, with more specialization.
We are going to have SUVs, sports cars, and single seater cars, not flying cars. AI will be made more fit for purpose for more people to use, but isn't going to replace people outright in their jobs.
Feels like someone might have said this in 1981 about personal computers.
"We've pretty much seen their shape. The IBM PC isn't fundamentally very different from the Apple II. Probably it's just all incremental improvements from here on out."
I would agree with your counter if it weren't for the realities of power usage, hardware constraints, evident diminishing returns on training larger models, and as always the fact that AI is still looking for the problem it solves, aside from mass employment.
Computers solved a tangible problem in every area of life, AI is being forced everywhere and is arguably failing to make a big gain in areas that it should excel.
I think the big game changer in the PC space was graphics cards, but since their introduction, it has all been incremental improvement -- at first, pretty fast, then... slower. Much like CPU improvements, although those started earlier.
I can't think of a point where the next generation of PCs was astoundingly different from the prior one... just better. It used to be that they were reliably faster or more capable almost every year, now the rate of improvements is almost negligible. (Yes, graphics are getting better, but not very fast if you aren't near the high end.)
Smartphones and tablets aren't PC replacements, they're TV replacements. Even we, deep into the throes of the smartphone age, still need to use real computers to do the tasks we were using them for in 1995, e.g. programming and word processing.
The fact that computing went through yet another phase transition with mobile is pretty much undisputed. Battery tech, energy efficient chips, screens, radios, solid-state storage, etc. Of course it's not the same as a desktop/notebook, because it's optimized for being in our pockets/hands not put on a desk. (But the compute power is there, plug in peripherals and many phones easily beat a desktop even from ~10-15 years ago.)
It remains to be seen where LLMs and this crop of ML tech can go in the garden of forking paths, and whether they can reach truly interesting places.
Smartphones let us do some tasks we couldn't do at all in 1995, such as GPS turn-by-turn navigation.
Sure, they also do other tasks we were already doing in 1995: mobile telephony, mobile TV watching (we had portable handheld TVs back then--they were awful but they worked), mobile music listening (Sony Walkman), easy-to-use photography, but there's some things that we couldn't do at all before this technology became mainstream, like mobile banking.
GPS turn-by-turn wasn't commonly available in the consumer space in 1995, but the first successful consumer turn-by-turn GPS came out in 1998 with the Garmin StreetPilot. So close to 1995 but not quite there.
Still though, GPS navigation was definitely a common thing pre-iPhone. I remember being gifted a cheap one a few years before the iPhone came out. We didn't need smartphones to do GPS navigation. TomTom came out in 2004, three years before the iPhone launched and four years before Android had Google Maps with Turn by Turn navigation.
You need smartphones to do GPS navigation in the modern sense, which is:
1) look for e.g. "Italian restaurant" in your area
2) look at choices nearby, screen out ones currently closed, too expensive, bad reviews, etc., and pick one
3) navigate there
In the early days of GPS, you needed an actual address to navigate to. That's not very useful if you don't know where you want to go in the first place. Smartphones changed all that: now you don't need to know a specific place you want to go, you just need to know what you want to do generally, and the nav app will help you find where exactly you want to go and take you there. That's impossible without either an internet connection or a very large local database (that's out of date).
The POI database was largely the big selling point of TomTom though.
And the Magellan GPS I had pre-iPhone had quite a POI database as well. I think it had monthly updates available online. I could search "Blockbuster" or "gas station" or "public parking" or "hotel" and it would know locations. Obviously it wasn't making dinner recommendations but it did have a lot of restaurants in it.
Also you specifically called out turn by turn. Knowing the one off holiday hours of a hole in the wall restaurant isn't necessary for turn by turn GPS.
I would say smartphones and tablets are new devices, not, per se, PCs.
Smartphones did bring something huge to the table, and exploded accordingly. But here we are, less than 2 decades from their introduction and... the pace of improvement has gone back to being incremental at best.
They're, at a minimum, PC replacements in the sense that I'm currently on the Internet and posting on a message board from my bathtub instead of needing to go over to my desk to do that.
the difference is a personal computer is a vaguely-shaped thing that has a few extremely broad properties and there is tremendous room inside that definition for things to change shape, grow, and improve. processor architecture, board design, even the methods of powering the machine can change and you still meet the basic definition of "a computer."
for better or worse, people saying AI in any capacity right now are referring to current-generation generative AI, and more specifically diffusion image generation and LLMs. that's a very specific category of narrowly-defined technologies that don't have a lot of room to grow based on the way that they function, and research seems to be bearing out that we're starting to reach peak functionality and are now just pushing for efficiency. for them to improve dramatically or suddenly and radically change would require so many innovations or discoveries that they would be unrecognizable.
what you're doing is more akin to looking at a horse and going "i foresee this will last forever because maybe someday someone will invent a car, which is basically the same thing." it's not. the limitations of horses are a feature of their biology and you are going to see diminishing returns on selective breeding as you start to max out the capabilities of the horse's overall design, and while there certainly will be innovations in transportation in the future, the horse is not going to be a part of them.
And that's more or less true? By 1989 certainly, we had word processors, spreadsheets, email, BBSes. We have better versions of everything today, but fundamentally, yes, the "shape" of the personal computer was firmly established by the late 1980s.
Anyone comfortable with MS-DOS Shell would not be totally lost on a modern desktop.
Yes, you can say all of them are Turing machines... but now those Turing machines are very fast and in your pocket and beat you at chess, etc. The UI/UX has changed humans.
> The IBM PC isn't fundamentally very different from the Apple II. Probably it's just all incremental improvements from here on out.
Honestly? I'm not sure I even disagree with that!
Amazon is just an incremental improvement over mail order catalogs. Netflix is just an incremental improvement over Blockbusters. UberEats is just an incremental improvement over calling a pizzeria. Google Sheets is just an incremental improvement over Lotus 1-2-3.
Most of the stuff we're doing these days could have been done with 1980s technology - if a bit less polished. Even the Cloud is just renting a time slice on a centralized mainframe.
With the IBM PC it was already reasonably clear what the computer was going to be. Most of the innovation since then is "X, but on a computer", "Y, but over the internet", or just plain market saturation. I can only think of two truly world-changing innovations: 1) smartphones, and 2) social media.
The current AI wave is definitely producing interesting results, but there is still a massive gap between what it can do and what we have been promised. Considering current models have essentially been trained on the entire internet and they have now poisoned the well and made mass gathering of more training data impossible, I doubt we're going to see another two-orders-of-magnitude improvement any time soon. If anything, for a lot of applications it's probably going to get worse as the training set becomes out of date.
And if people aren't willing to pay for the current models, they aren't going to pay for a model which hallucinates 50% less often. They're going to need that two-orders-of-magnitude improvement to actually become world-changing. Taking into account how much money those companies are losing, are they going to survive the 5-10 years or more until they reach that point?
And even the smartphone is just a combination of "a computer, but smaller" and "over the internet". It just got pretty good eventually. The https://en.wikipedia.org/wiki/HP_200LX is arguably a "smartphone", supporting modem or network connectivity, albeit not wireless...
The big missing thing between both the metaphor in the OP's link and yours is that I just can't fathom any of these companies being able to raise a paying subscriber base that can actually cover the outrageous costs of this tech. It feels like a pipe dream.
Putting aside that I fundamentally don't think AGI is in the tech tree of LLM, if you will, that there's no route from the latter to the former: even if there is, even if it takes, I dunno, ten years: I just don't think ChatGPT is a compelling enough product to fund about $70 billion in research costs. And sure, they aren't having to yet thanks to generous input from various commercial and private interests but like... if this is going to be a stable product at some point, analogous to something like AWS, doesn't it have to... actually make some money?
Like sure, I use ChatGPT now. I use the free version on their website and I have some fun with AI dungeon and occasionally use generative fill in Photoshop. I paid for AI dungeon (for awhile, until I realized their free models actually work better for how I like to play) but am now on the free version. I don't pay for ChatGPT's advanced models, because nothing I've seen in the trial makes it more compelling an offering than the free version. Adobe Firefly came to me free as an addon to my creative cloud subscription, but like, if Adobe increased the price, I'm not going to pay for it. I use it because they effectively gave it to me for free with my existing purchase. And I've played with Copilot a bit too, but honestly found it more annoying than useful and I'm certainly not paying for that either.
And I realize I am not everyone and obviously there are people out there paying for it (I know a few in fact!) but is there enough of those people ready to swipe cards for... fancy autocomplete? Text generation? Like... this stuff is neat. And that's about where I put it for myself: "it's neat." OpenAI supposedly has 3.9 million subscribers right now, and if those people had to foot that 7 billion annual spend to continue development, that's about $150 a month. This product has to get a LOT, LOT better before I personally am ready to drop a tenth of that, let alone that much.
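Spelling that napkin math out, using the figures quoted above (which I haven't verified):

    # ~$7B annual spend spread over ~3.9M paying subscribers
    annual_spend = 7_000_000_000
    subscribers = 3_900_000

    per_subscriber_per_month = annual_spend / subscribers / 12
    print(f"${per_subscriber_per_month:,.0f} per subscriber per month")  # ~$150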
And I realize this is all back-of-napkin math here but still: the expenses of these AI companies seem so completely out of step with anything approaching an actual paying user base, so hilariously outstripping even the investment they're getting from other established tech companies, that it makes me wonder how this is ever, ever going to make so much as a dime for all these investors.
In contrast, I never had a similar question about cars, or AWS. The pitch of AWS makes perfect sense: you get a server to use on the internet for whatever purpose, and you don't have to build the thing, you don't need to handle HVAC or space, you don't need a last-mile internet connection to maintain, and if you need more compute or storage or whatever, you move a slider instead of having to pop a case open and install a new hard drive. That's absolutely a win and people will pay for it. Who's paying for AI and why?
> The big missing thing between both the metaphor in the OP's link and yours is that I just can't fathom any of these companies being able to raise a paying subscriber base that can actually cover the outrageous costs of this tech. It feels like a pipe dream.
I suspect these companies will introduce ads at some point similar to Google and Facebook for similar reasons, and it will be highly profitable.
I mean that's quite an assertion given how the value of existing digital ad space is already cratering and users are more than ever in open rebellion against ad supported services. And besides which, isn't the whole selling point of AI to be an agent that accesses the internet and filters out the bullshit? So what, you're going to do that, then add your own bullshit to the output?
I agree with everything you said in your previous post. The LLM as a search engine might basically eat Google's lunch in the same way the internet ate cable tv's lunch. Cable tv was a terrible product due to the advertising, and you could escape the BS by watching stuff online. Now look where we are.
Fun fact: Cable itself was originally the ad-free "premium" version of over-the-air television! And it eventually added some ads and then more ads and now we're at present day, where you get roughly 8 minutes of ads per 22 minutes of content.
I noticed visiting a hotel recently (like fuck I pay for cable!) that their cable service had sped up the intro of Law and Order SVU, presumably to make room for more advertisements.
>I noticed visiting a hotel recently (like fuck I pay for cable!) that their cable service had sped up the intro of Law and Order SVU, presumably to make room for more advertisements.
That's pretty smart of them! It makes sense to try to increase the time for ads, and decrease the time for the TV show; maybe next they can just cut the credits, or cut out parts of each act in the show. There's no reason not to. After all, the only people still paying for cable TV subscriptions are elderly people who absolutely refuse to give up their reruns of decades-old shows and also refuse to learn how to use Hulu, so they'll happily pay $150/month so they can watch 70s shows with even worse video quality than TVs in the 70s had.
> Cable itself was originally the ad-free "premium" version of over-the-air television
Often repeated but not true. Cable was originally just all the ad-supported OTA stations delivered over a wire. It had ads on day 1. It would be over a decade before the first premium cable channels would launch, and even then most of the first channels also had ads.
> I fundamentally don't think AGI is in the tech tree of LLM
There is a lot of hype and investor money running on the idea that if you make a really good text prediction engine, it will usefully impersonate a reasoning AI.
Even worse is the hype that an extraordinarily good text prediction engine will usefully impersonate a reasoning AI that can usefully impersonate a specialized AI.
The car was very much a horse that was better though. It has replaced the horse (or other draught animal) and that's basically it. I'm not even sure it has brought fundamentally new and different use cases.
It did. It changed the entire way society operates and how and where we live by greatly increasing the speed of travel and the ability to transport goods (without laying rail everywhere).
Training took hundreds of thousands of years. Everyone just gets a foundation model to fine tune for another couple of decades before it can start flipping burgers.
Kind of feels like the ride-sharing early days. Lots of capital being plowed into a handful of companies to grab market share. Economics don't really make sense in the short term because the vast majority of cash flows are still far in the future (Zero to One).
In the end the best funded company, Uber, is now the most valuable (~$150B). Lyft, the second best funded, is 30x smaller. Are there any other serious ride sharing companies left? None I know of, at least in the US (international scene could be different).
I don't know how the AI rush will work out, but I'd bet there will be some winners and that the best capitalized will have a strong advantage. Big difference this time is that established tech giants are in the race, so I don't know if there will be a startup or Google at the top of the heap.
I also think that there could be more opportunities for differentiation in this market. Internet models will only get you so far and proprietary data will become more important potentially leading to knowledge/capability specialization by provider. We already see some differentiation based on coding, math, creativity, context length, tool use, etc.
Uber is not really a tech company though - its moat is not technology but market domination. If it, along with all of its competitors were to disappear tomorrow, the power vacuum would be filled in very short order, as the core technology is not very hard to master.
It's a fundamentally different beast from AI companies.
How is it not a tech company? They're literally trying to approximate TSP in the way that makes them money. In addition, they're constantly optimizing for surge pricing to maximize ROI. What kind of problems do you think those are?
> I call this scale the Zombie Apocalypse Scale. It is a measure of how many days a company could run if all its employees were turned to zombies overnight. The more days you can operate without any humans, the more of a tech company you are.
If someone would be able to solve TSP and surge pricing better than Uber, do you think they'd be able to dethrone them?
The reason surge pricing exists and works the way it is isn't some capitalist efficiency mechanism - in fact it's quite the opposite - it's algorithmic price fixing enabled by Uber's quasi-monopoly, and it is the very thing that enables Uber to exist in the first place as a very profitable company.
A scenario in which competition exists would drive the margins of ride sharing apps to the ground, making it a highly unprofitable exercise. I suspect investors knew this and secretly colluded to promote Uber to become a monopoly. It has very little to do with technology.
> it's algorithmic price fixing enabled by Uber's quasi-monopoly, and it is the very thing that enables Uber to exist in the first place as a very profitable company.
If this were true, they'd have surge pricing turned on all the time.
Is Uber profitable already or are they waiting for another order of magnitude increase in scale before they bother with that?
Amazon is the poster-child of that mentality. It spent more than it earned into growth for more than 20 years, got a monopoly on retail, and still isn't the most profitable retail company around.
Uber the company was profitable last year, for the first time[1].
But I am doubtful that the larger enterprise that is Uber (including all the drivers and their expenses and vehicle depreciation, etc) was profitable. I haven't seen that analysis.
This article, and all the articles like it, are missing most of the puzzle.
Models don’t just compete on capability. Over the last year we’ve seen models and vendors differentiate along a number of lines in addition to capability:
- Safety
- UX
- Multi-modality
- Reliability
- Embeddability
And much more. Customers care about capability, but that’s like saying car owners care about horsepower — it’s a part of the choice but not the only piece.
One somewhat obsessive customer here: I pay for and use Claude, ChatGPT, Gemini, Perplexity, and one or two others.
The UX differences among the models are indeed becoming clearer and more important. Claude’s Artifacts and Projects are really handy as is ChatGPT’s Advanced Voice mode. Perplexity is great when I need a summary of recent events. Google isn’t charging for it yet, but NotebookLM is very useful in its own way as well.
When I test the underlying models directly, it’s hard for me to be sure which is better for my purposes. But those add-on features make a clear differentiation between the providers, and I can easily see consumers choosing one or another based on them.
I haven’t been following recent developments in the companies’ APIs, but I imagine that they are trying to differentiate themselves there as well.
To me, the vast majority of "consumers" as in B2C only care about price, specifically free. Pro and enterprise customers may be more focused on the capabilities you listed, but the B2C crowd is vastly in the free tier only space when it comes to GenAI.
This is like when VCs were funding all kinds of ride share, bike share, food delivery, cannabis delivery, and burning money so everyone gets subsidized stuff while the market figures out wtf is going on.
Yep, you will probably lose. The VCs aren't out there to advance the technology. They are there to lay down bets on who's going to be the winner. "Winner" has little to do with quality, and rides much more on being the one that just happens to resonate with people.
The ones without money will usually lose because they get less opportunity to get in front of eyeballs. Occasionally they manage it anyway, because despite the myth that the VCs love to tell, they aren't really great at finding and promulgating the best tech.
> when VCs were funding all kinds of ride share, bike share, food delivery, cannabis delivery, and burning money so everyone gets subsidized stuff while the market figures out wtf is going on
I’m reminded of slime molds solving mazes [1]. In essence, VC allows entrepreneurs to explore the solution space aggressively. Once solutions are found, resources are trimmed.
VC is good for high-risk, capital-intensive, scalable bets. The high risk and scalability cancel out, thereby leaving the core of finance: lending to enable economies of scale.
Plenty of entrepreneurship is low to moderate risk, bootstrappable and/or unscalable. That describes where VC is a raw deal. It also does not describe AI.
I'm already keeping an eye on what NVidia gets into next... because that will inevitably be the "Next big thing". This is the third(ish) round of this pattern that I can recall, I'm probably wrong about the exact count, but NVidia is really good at figuring out how to be powering the "Next big thing". So alternatively... I should probably invest in the utilities powering whatever Datacenters are using the powerhungry monsters at the center of it all.
One thing I'm not clear on is how much of this is cause and how much effect: that is, does NVidia cheerleading for something make it more popular with the tech press and then everyone else too? There are definitely large parts of the tech press that serve more as stenographers than as skeptical reporters, and so I'm not sure how much is NVidia picking the right next big thing and how much is NVidia announcing the next big thing to the rest of us?
That's exactly the short term thinking they're hoping they can use to distract.
Tech companies purchased television away from legacy media companies and added (1) unskippable ads, (2) surveillance, (3) censorship and revocation of media you don't physically own, and now they're testing (4) ads while shows are paused.
Where I live the ridesharing/delivering startups didn't bring goodies, they just made everything worse.
They destroyed the Taxi industry, I used to be able to just walk out to the taxi rank and get in the first taxi, but not anymore. Now I have to organize it on an app or with a phone call to a robot, then wait for the car to arrive, and finally I have to find the car among all the others that other people called.
Food delivery used to be done by the restaurant's own delivery staff; it was fast, reliable and often free if ordering for 2+ people. Now it always costs extra, and there are even more fees if I want the food while it's still hot. Zero care is taken with the delivery: food/drinks are not kept upright and can be a total mess on arrival. Sometimes it's escaped the container and is just in the plastic bag. I have ended up preferring to go pick up food myself over getting it delivered, even when I have a migraine; it's just gone to shit.
I assume you are talking about airports. Guess what, they still exist in many places. And on the other hand, for US, other than a few big cities, the "normal" taxi experience is that you call a number and maybe a taxi shows up in half an hour. With Uber, that becomes 10 minutes or less, with live map updates. Give me that and I'll be happy to forget about Uber.
If drivers are actually driving people around and earning money, potentially getting more income in the end, with a trade-off of inconvenience for riders to wait a few minutes for the next ride to arrive, do you consider that positive or negative? Sounds like a much more efficient model to me.
taxis were the greatest example of regulatory capture. the post-event/airport Uber pickup situation is stupid and has obvious fixes, but, again, that's the taxicab regulatory capture where Uber has to thread a needle in order to not be a taxi, for them to legally operate. if we could clean slate, and make a working system, that would be great but we can't, because of the taxicab regulatory commission.
I can now summon a cab from the comfort of the phone I'm holding, and know that they'll accept my credit card. I know the price before I get in and I know the route they should take. I'm not going to get taken for an unnecessary scenic tourist surcharge detour.
people don't like feeling they got cheated, and pre-uber, taxis did that all the time.
For where I live (Asia), I disagree with both of these examples.
Getting a taxi was awful before ride-sharing apps. You'd have to walk to a taxi stop, or wait on the side of the road and hope you could hail one. Once the ride-sharing apps came in, suddenly getting a ride became a lot simpler. Our taxi companies are still alive, though they have their own apps now -- something that wouldn't have happened without competition -- and they also work together with the ride-hailing companies as a provider. You could still hail taxis or get them from stops too, though that isn't recommended given that they might try to run the meter by taking a longer route.
For food delivery, before the apps, most places didn't deliver food. Nowadays, more places deliver. Even if a place already had their own delivery drivers, they didn't get rid of them. We get a choice, to use the app or to use the restaurant's own delivery. Usually the app is better for smaller meals since it has a lower minimum order amount, but the restaurant provides faster delivery for bigger orders.
Add the abuse of gig workers, expansion of the toxic tipping culture, increase in job count but reduction in pay, concentration of wealth in fewer hands.
These rideshare and delivery companies are disgusting and terrible.
At least OAI and MSFT offer "copyright shields" for enterprise customers.
Say what you want about the morals associated to said offerings, but it sounds very interesting for companies that may want to use a plagiarizing machine for copying GPL code while covering their asses in some way (at least in terms of cost).
The biggest problem that I have with AI is how people extrapolate out of thin air, going from "LLMs can craft responses that sound exactly like a human" to "LLMs can solve climate change and world hunger". Those are two orthogonal skills and nothing we've seen so far indicates that LLMs would be able to make that jump, any more than I expect someone with a PhD in linguistics to solve problems faced by someone with a PhD in applied physics working on solving nuclear fusion.
The expectation for LLMs "solving global warming" is effectively believing that if you could just ask enough people that vaguely know of the research out there to guess how to solve it someone would get the answer right, and they'd get it right better than anyone that has dedicated their life to researching it.
Yeah, the issue with climate change and world hunger has never been the ability to generate loads and loads of vaguely relevant text in a very short time. It also isn't a problem of being able to recognize patterns in data, as you would with machine learning.
They're issues of socially and politically organizing how humans and resources and economies interact, and there is nothing about the current crop of AI that suggests they can help with that in any particular capacity. If anything the genAI hunger for energy and chips is kinda looking like a net negative in terms of the first problem.
I don't even really get why the hypemen are pushing these huge global issues as motivators - surely that insane standard of Unrealised Value only highlights how limited the abilities of this technology actually are?
> I don't even really get why the hypemen are pushing these huge global issues as motivators - surely that insane standard of Unrealised Value only highlights how limited the abilities of this technology actually are?
Sam Altman is either seriously drunk on his own success, or he's a snake-oil salesman: https://ia.samaltman.com/ (Could be both; they are not mutually exclusive)
It seems very difficult to build a moat around a product when the product is supposed to be a generally capable tool and the input is English text. The more truly generally intelligent these models get the more interchangeable they become. It's too easy to swap one out for another.
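As a rough illustration of how thin the switching cost can be: a sketch assuming the vendors (or a local server) expose OpenAI-compatible chat endpoints, which many now do. The URLs and model names below are placeholders.

    # Swapping "intelligence vendors" is often just a different base_url + model name.
    from openai import OpenAI

    def ask(prompt: str, base_url: str, api_key: str, model: str) -> str:
        client = OpenAI(base_url=base_url, api_key=api_key)
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    # Same English in, same English out -- only the endpoint changes (placeholder values):
    # ask("Summarize this contract...", "https://api.openai.com/v1", KEY_A, "gpt-4o")
    # ask("Summarize this contract...", "http://localhost:8000/v1", "none", "llama-3.1-70b")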
Humans are the ultimate generally intelligent agents available on this planet. Even though most of them (us) are replaceable for mundane tasks, quite some are unique enough so that people seek their particular services and no one else's. And this is among the pool of about eight billion such agents.
> Even though most of them (us) are replaceable for mundane tasks, quite some are unique enough so that people seek their particular services and no one else's.
Very few people manage that - indeed I can't think of anyone. Even movie stars get replaced with other movie stars if they try to charge too much. Certainly everyone in the tech industry (including the CEOs, the VCs, the investors etc.) has a viable substitute.
It's even more relevant to AI, because the differences between the models (not just training data) may make them pretty uniquely suited to some areas, with any competition being markedly worse (in that area), or at least markedly different. And you only have maybe dozens, not billions, to choose from.
The moat is/will be the virtuous cycle of feeding user usage back into the model like it is for Google. That historically has been a powerful tool and it's something thats nearly impossible to get as a newcomer to the marketplace.
I see 2 paths:
- Consumers - the Google way: search and advertise to consumers
- Businesses - the AWS way: attract businesses to use your API and lock them in
The first is fickle. Will OpenAI become the door to the Internet? You'll need people to stop using Google Search and rely on ChatGPT for that to happen. Short term you can charge a subscription, but long term it will most likely become a commodity monetized with advertising.
The second is tangible. My company is plugged directly to the OpenAI API. We build on it. Still very early and not so robust. But getting better and cheaper and faster over time. Active development. No reason to switch to something else as long as OpenAI leads the pack.
20 years ago people asked that exact question. E-Commerce emerged. People knew the physical process of buying things would move online. Took some time. Sure, more things emerged but monetizing the Internet still remains about selling you something.
Assuming AI progress continues, AI could replace both Microsoft's biggest product, OS, and Google's biggest product, search and ads. And there is a huge tail end of things autonomous driving/flying, drug discovery, robotics, programming, healthcare etc.
Too vague. How would it replace Windows? How would it replace search?
The latter is more believable to me, but how would the AI-enhanced version generate the additional revenue that its costs require? And I think a lot of improvement would be required... people are going to be annoyed by things like hallucinations while trying to buy products.
In reality, as soon as a competitor shows up, Google will add whatever copycat features it needs to search. So it isn't clear to me that search is a market that can be won, short of antitrust making changes.
You saw "Her" or Iron man? That's how it could replace windows. Basically entire OS working in natural language. Imagine a great human personal assistant who operates computer for you.
And searchGPT could replace Google. Also, wasn't the point of genAI is that it is cheaper than the entire stack of search? At least I know for some recommendation GPT-4 is literally cheaper than many companies in house models and I know companies who saved money using GenAI.
Not saying any of these would likely happen, but still it is not in the fantasy realm.
> people are going to be annoyed by things like hallucinations
That's like saying people are going to be annoyed by non-delivery in online shopping. Yes, it happened more often in the early days, but we are arguing about the ideal case where it gets solved. That's why I said "if AI progress is good" in my message, which means we solve hallucination etc.
OK, then I'm pretty sure it won't happen. I don't want to have a personal assistant replace an OS. I don't even want to talk to Alexa. And I'm not alone...
I meant a human personal assistant, as I made clear in my previous post. If you were given a human personal assistant for free who could replace your screen usage, would you be open to replacing your OS with it?
Also, we are just talking on a theoretical level about whether AI is able to imitate a human assistant (which I personally give a 10% chance of happening, but it's not out of the realm of possibility).
My guess would be using "AI" to increase/enhance sales with your existing processes. Pay for this product, get 20% increased sales, ad revenue, yada yada.
Sure it does. Ask any common mortal about AI and they'll mention ChatGPT - not Claude, Gemini or whatever else. They might not even know OpenAI. But they do know ChatGPT.
Has it become a verb yet? Waiting for people to replace "I googled how to..." with "I chatgpted how to...".
You’re moving the goalposts a little here. In your other post you implied you were using OpenAI for its technical properties. “But getting better and cheaper and faster over time.”
Whether something has more name recognition isn’t completely related here. But if that’s what you really mean, as you state, “any common mortal about AI and they'll mention ChatGPT - not Claude, Gemini or whatever else. They might not even know OpenAI. But they do know ChatGPT,” then I mostly agree, but as an outsider it doesn’t seem like this is a robust reason to build on top of.
OpenAI's sole focus is serving AI to consumers and businesses. I trust them more to remain backwards compatible over time.
Google changed their AI product's name multiple times. I've built on them before and they end-of-lifed the product I was using. Zero confidence Gemini will be there tomorrow.
There would need to be significant capabilities that openai doesn't have or wouldn't be built on a short-ish timeline to have the enterprise switch. There's tons of bureaucratic work going on behind the scenes to approve a new vendor.
I am wondering where exactly the AI products are. I am not talking about boring, nerd stuff but rather about products that are well-known (to the point where non-technical users have heard of them) and are used on a daily basis by a significant number of people. Currently only ChatGPT fits this bill, despite the fact that the "AI revolution" is nearly two years old at this point.
> I am wondering where exactly are the AI products
ChatGPT is the most popular and often in the news because it is the first of its kind (after Siri/Cortana) that was accessible to the general public. I think I have seen at least 100+ indie and commercial wrappers/apps offering access to a better chat interface and a common place to access various models from OAI/Anthropic/Mistral. At least one business I read about this weekend uses the OAI API to ingest and summarize an extensive mountain of documents to simplify some sort of regulatory certification process for medical device manufacturers. There was another one which offered a virtual assistant with basic activity abilities (book appointments, write emails, transcribe phone calls, summarize letters etc.).
Things will come eventually, as more and more people get access to the APIs and access becomes cheap (the current per-token pricing model is still immensely expensive imho); most people do not yet understand what context is, what an embedding is, what exactly a model is, or what an API is.
AI was/is being used massively in the medical sector, defense/military, fraud detection (finance) and various other sectors, for decades or more. Unfortunately, those applications are behind closed doors and do not need to drum up public interest or investment directly, as they are already well financed and usually referred to as ML/DL/CV etc.
For example, I was doing an internship many many years back where my employer was using "AI" (though it was not referred to as such) to combine vision, noise, vibration, pressure and hundreds of other data sources to detect hit-and-run and break-in crimes, structural damage, prediction of collapse of large structures etc., and these were prominently used by insurance companies as well as some public service offices. These were specialized things, and outside of their closed doors, no one was interested in them.
The old chestnut about AI just being a term for things we haven't quite figured out yet might apply here. "Products that are well-known... and are used on a daily basis by a significant amount of people" are almost by definition not AI.
But here are some examples of things that used to fall under the AI umbrella but don't really anymore:
- Fulltext search with decent semantic hit ranking (Google)
- Fulltext search with word sense disambiguation (Google)
- Fulltext search with decent synonym hits (Google)
- Machine translation
- Text to speech
- Speech to text
- Automated biometric identification (Like for unlocking your phone)
If you're more specifically asking for everyday applications of GPT-style generative large language models, I don't think that's going to happen for cost reasons. These things are still far too expensive for use in everyday consumer products. There's ChatGPT, but it's kind of an open secret that OpenAI is hemorrhaging money on ChatGPT.
The point is that you will not see them. They will be used to generate products and services that you think will come from a human. It may be the drive-through window or tech support line that sounds like a real person. It will be the background in the next Marvel movie that was animated by an AI algorithm rather than a graphics designer. It will be the website that generates a bespoke price quote aimed at the maximum price you are willing to pay for a service or product.
I am pretty sure we will see communication devices to surpass any language barriers within the next few years. The technology is already there, but not accessible to the broad masses in a fast and convenient enough way. Already, people are using their phones to break language barriers (Google lens, translation websites), but this will become much more common.
It has been used for a while in a lot of places that do not market themselves as "AI": TikTok, Instagram, FaceApp, Facebook, Netflix, Google ads, YouTube, etc... wherever you see complex image/video filters or targeted suggestions, there is likely some AI in the background.
Personally, I think the more interesting question is: if not AI then what? I doubt we will go back to Metaverse (lol) or Crypto, B2B SaaS seems safe but stagnant (and competing over a largely fixed pie of IT spend). It feels like it doesn't matter if the companies work-- until there is something shinier to invest in it will be the defacto choice for LPs chasing returns (and VCs chasing carry).
How about moving away from the unicorn model into creating small, sustainable and focused companies that don't explode, but provide sustainable income?
Lol that sounds good to me, but I'm not the one making those decisions. To be clear, my comment was a prediction based on what I see as the primary motivations of the people with the money. I'm definitely not endorsing this model.
How about improving the software, both the development process and the final product? We hear claims of upcoming AGI, but most everyday software still feels embarrassingly broken.
>Therefore, if you are OpenAI, Anthropic, or another AI vendor, you have two choices. Your first is to spend enormous amounts of money to stay ahead of the market. This seems very risky though
Regarding point 6, Amazon invested heavily in building data centers across the US to enhance customer service and maintain a competitive edge. It was risky.
This strategic move resulted in a significant surplus of computing power, which Amazon successfully monetized. In fact, it became the company's largest profit generator.
After all, startups and businesses are all about taking risk, ain't it?
No, this is an incorrect analogy on most points. Amazon's retail marketplaces are absolutely not correlated with distributed data centers. Most markets are served out of a single cluster, which is likely to be in a different country; e.g. historically all of Europe from Dublin, the Americas from Virginia, and Asia Pacific from Seattle/Portland.
The move of the retail marketplace hosting from Seattle to Virginia happened alongside, and continued long after, the start of AWS.
It is an utter myth, not promoted by Amazon, that AWS was some scheme to use "surplus computing power" from retail hosting/operations. It was an intentional business case to get into a different B2B market as a service provider.
Don't mix up LLMs with AI. Not every AI company works on top of LLMs; many are doing vision or robotics or even old-school AI.
Our system works, is AI, is profitable, and does vision. Vision scales. There's a little bit of LLM classification. And robotics also, but that part is not really AI, just a generic industrial robot.
LLM intelligence has plateaued for the past 12 months. Open source models are catching up while being 20x smaller than the original gpt4 (gemma2/llama3.2/qwen2.5).
AI companies are promising AGI to investors to survive a few more years before they probably collapse and don't deliver on that promise.
LLMs are now a commodity. It's time for startups to build meaningful products with it!
The title reminds me of this classic paper, "If It Works, It's Not AI: A Commercial Look at Artificial Intelligence Startups": https://dspace.mit.edu/handle/1721.1/80558
> If the proprietary models stop moving forward, the open source ones will quickly close the gap.
This is the Red Queen hypothesis in evolution. You have to keep running faster just to stay in place.
On its face, this does seem like a sound argument that all the $$ flowing into LLMs is irrational:
1. No matter how many billions you pour into your model, you're only ever, say, six months away from a competitor building a model that's just about as good. And so you already know you're going to need to spend an increased number of billions next year.
2. Like the gambler who tries to beat the house by doubling his bet each time, at some point there must be a number where that many billions is considered irrational by everybody.
3. Therefore it seems irrational to start putting in even the fewer billions of dollars now, knowing the above two points.
> it seems irrational to start putting in even the fewer billions of dollars now, knowing the above two points
This doesn’t follow. One, there are cash flows you can extract in the interim—standing in place is potentially lucrative.
And two, we don’t know if the curve continues until infinity or asymptotes. If it asymptotes, being the first at the asymptote means owning a profitable market. If it doesn’t, you’re going to get AGI.
Side note: I bought but haven't yet read The Red Queen. I believe it was a comment of yours that led me to it.
Why would being the first at the (horizontal) asymptote mean owning a profitable market? In six months, somebody else is at the asymptote as well. If different models are mostly interchangable quality-wise, then the decision will mostly be made on price.
For a market to be both winner-take-all as well as lucrative, you need some kind of a feedback cycle from network effects, economies of scale, or maybe even really brutal lock-in. For example, for operating systems, applications provide a network effect -- more apps means an OS will do better, and a higher install base for an OS means it'll attract more apps. For social networks it's users.
One could imagine this being the case for LLMs; e.g. that LLMs with a lot of users can improve faster. But if the asymptote has been reached, then by definition there's no more improvement to be had, and none of this matters.
Excellent point, but now consider the availability of open source models as well. They’re not as good as the frontier models, but at some point they get close enough, right?
I run Llama 3.x locally on my 5+ year old GPU. It’s pretty decent. It’s not as smart as GPT-4 or Claude, but it’s good enough for a lot of what I do, and most importantly there’s no cap on the number of prompts I can send in a given timeframe.
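For anyone curious, a minimal sketch of that kind of local setup using llama-cpp-python; the GGUF filename and quantization are placeholders for whichever Llama 3.x build you've actually downloaded.

    # Local inference: no per-prompt cost, no rate cap, runs on a consumer GPU.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./llama-3.2-3b-instruct-Q4_K_M.gguf",  # placeholder local file
        n_gpu_layers=-1,   # offload as many layers as fit onto the GPU
        n_ctx=4096,
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Explain what a moat is, in business terms."}],
        max_tokens=256,
    )
    print(out["choices"][0]["message"]["content"])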
For many consumers, it’s fine to be a little bit behind the cutting edge. My smartphone is 2 models behind now. It’s fast enough, the camera is good enough, and the battery is holding up just fine. I don’t feel any need to fork over $1,000 for a new one right now.
I think the same phenomenon will become more common for AI enthusiasts. It’s hard to beat free.
And what happens to the economics of building frontier models then? I suspect a lot of air is going to be released from the AI bubble in the next couple of years.
> In six months, somebody else is at the asymptote as well
But you were there first. That gives you all of a first mover’s advantages plus the profits from that period to set the terms of the market sharing.
None of this requires winner-takes-all economics. And given OpenAI’s revenues, it’s unclear they need to be the sole winner to make an investment thesis work.
The first mover advantage isn't as big an advantage as being the one to monopolize a commodity. While there are some historical examples of first movers managing that, it isn't the common case. I also think that this is very nearly a winner-take-all sort of thing, or at least that very few API providers will still matter, if this asymptotes.
Compare this to toilet paper or fuel: if AI asymptotes and everyone catches up in 6 months, it will be just like these. There are some minor differences in quality, but if a customer can buy something good enough to keep their ass clean or their car powered then they will tend to pick the cheapest option.
Sometimes it is subterfuge, like MS stealing the OS market from IBM, or there might be a state-sponsored AI lab, like so many state-run industries, pushing out competition - but there are a ton of ways to beat first movers.
If there is no niche, first-mover is useless. But if there is value to be had, it’s valuable. Fighting to tread water in a moving stream with the expectation the flow will slow isn’t irrational.
Are you claiming there is no niche in making commodities? I haven't really considered it; I could see an argument being made either way. Or are you saying that, as a product, these API-based LLMs are just not going to be how we use these tools? Or are you saying something else entirely?
Ok, so let's say I was there first. But now you're offering a product that's (by definition) just as good for a lower price. Why will the users stick to my product rather than paying your cheaper prices?
As for your second point, the profits from the first six months are going to be basically nothing compared to the years of massive investments needed to make this happen. Why would those profits give any kind of sustainable leverage?
Obviously there's a huge technical change waiting in the wings because we don't need billions of dollars to make a human. Nor does a human need hundreds of kilowatts of electricity to think.
Apple might emerge as one of the winners here, despite being one of the last to come out with an LLM, because it already has the infrastructure, or delivery system, for hundreds of millions of users to interact with an LLM. Google has a similar foothold worldwide, but its major weakness is that an LLM cannibalizes its Search cash cow (how do you make $ from ads with LLMs?), whereas in Apple's case it enhances its product (finally a smart Siri). OpenAI doesn't have that except through Microsoft.
Microsoft's been trying to ram essentially that down my throat for the better part of a year now, and it's mostly convinced me that the answer is "no". I don't want to have arbitrary conversations with my computer.
I still just want the same thing I've been wanting from my digital assistant for 30 years now: fewer "eat up Martha" moments, and handling more intents so that I can ask "When does the next east-bound bus come?" and it stops answering questions like "Will it rain today?" as if I had asked "Is it raining right now?". None of those are particularly appropriate problems for a GPT-style model.
LLMs only cannibalize search if people use it as a replacement. I'm not seeing this happening at scale, instead they can be used to complement it by providing summaries, doing translations etc.
If people no longer click on sponsored links because they are getting a reply to their query in the summary paragraph at the top, Google no longer makes money from search. That's the cannibalization.
One thing I have found myself wondering is why this didn't play out similarly with Google in the early days.
I guess their secret sauce was just so good and so secret that neither established players nor copycat startups were able to replicate it, the way it happened with ChatGPT? Why is the same not the case here, is it just because the whole LLM thing grew out of a relatively open research culture where the fundamentals are widely known? OTOH PageRank was also published before the founding of Google.
I'd be curious to hear if anyone has theories or insight here.
Being "data driven" wasn't much of a thing yet back then. I mean, of course, there was massive energy around the web... but your average non-web corpo still had no idea. Of course there was pets.com or whatever dumb dotcom bubble shit, but I think noone saw data and digitalization as an absolutely essential part of making profits, or at least grabbing money, like we do now.
Conversely, the 2020s has execs and other rich individuals doomscrolling LinkedIn and thinking that if they don't invest in the latest crypto/quantum/genai crap, they're missing out on a vital element of retaining competitive edge.
ChatGPT benefits from network effects, where user feedback on the quality of its answers helps improve the model over time. This reduces its reliance on external services like ScaleAI, lowering development costs.
Larger user base = increased feedback = improved quality of answers = moat
Companies in the business of building models are forced to innovate on two things at once.
1. Training the next generation of models
2. Providing worldwide scalable infrastructure to serve those models (ideally at a profit)
It's hard enough to accomplish #1, without worrying about competing against the hyperscalers on #2. I think we'll see large licensing deals (similar to Anthropic + AWS, OpenAI + Azure) as one of the primary income sources for the model providers.
With the second (and higher margin) being user facing subscriptions. Right now 70% of OpenAI's revenue comes from chatgpt + enterprise gpt. I imagine Anthropic is similar, given the amount of investment in their generative UI. At the end of the day, model providers might just be consumer companies.
The competition for big LLM AI companies is not other big LLM AI companies, but rather small LLM AI companies with good enough models. This is a classic innovator dilemma.
For example, I can imagine a team of cardiologists creating a fine-tuned LLM.
The cardiologist reads the ECG, compares it with the LLM's result, and checks the difference. If it can reduce the error rate by like 10%, that's already really good.
My current stance on LLMs is that they're good for stuff which is painful to generate, but easy to check (for you). It's easier/faster to read an email than to write it. If you're a domain expert, you can check the output, and so on.
The danger is in using it for stuff you cannot easily check, or trusting it implicitly because it is usually working.
> trusting it implicitly because it is usually working
I think this danger is understated. Humans are really prone to developing expectations based on past observations and then not thinking very critically about or paying attention to those things once those expectations are established. This is why "self driving" cars that work most of the time but demand that the driver remain attentive and prepared to take over are such a bad idea.
> The cardiologist checks the ECG, compare with the LLM results and checks the difference.
Perhaps you're confusing the acronym LLM (Large Language Model) with ML (Machine Learning)?
Analyzing electrocardiogram waveform data using a text-predictor LLM doesn't make sense: No matter how much someone invests in tweaking it to give semi-plausible results part of the time, it's fundamentally the wrong tool/algorithm for the job.
I like the article and agree with many of the arguments. A few comments though.
1. It's not that easy to switch between providers. There's no lock-in of course, but once you build a bunch of code that is provider-specific (structured outputs, prompt caching, JSON mode, function calls, prompts designed for a specific provider, specific tools used by the OpenAI Assistants API, etc.), you need a good reason to switch (like a much better or cheaper model). A sketch of what that provider-specific glue looks like follows after this list.
2. All of these companies do try to build some ecosystem around them, especially in the enterprise. The problem is that Google and Microsoft have a huge advantage here because they have all the integrations
3. The consumer side. It's not just LLM. It's image, video, voice, and many more. You cannot ignore that ChatGPT can rival Google in a few years in terms of usage. As long as they can deliver good models, users are not going to switch so quickly. It's a huge market, just like Google. Pretty much, everyone in the world is going to use ChatGPT or some alternative in the next few years. My 9 year old and her friends already use it. No reason why they cannot monetize their huge user base like Google did.
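To make point 1 concrete, here's a rough sketch of the kind of provider-specific glue that accumulates. The OpenAI Python SDK is shown and the tool name/schema is purely illustrative; every vendor shapes these parameters and the response object a little differently, so "switching" means redoing the plumbing, not swapping a URL:

    # Hypothetical provider-specific integration code (OpenAI Python SDK, illustrative schema).
    from openai import OpenAI

    client = OpenAI()

    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Extract the invoice fields from the text below: ..."}],
        tools=[{
            "type": "function",
            "function": {
                "name": "save_invoice",          # hypothetical tool name
                "parameters": {
                    "type": "object",
                    "properties": {
                        "vendor": {"type": "string"},
                        "total": {"type": "number"},
                    },
                    "required": ["vendor", "total"],
                },
            },
        }],
        tool_choice={"type": "function", "function": {"name": "save_invoice"}},
    )
    call = resp.choices[0].message.tool_calls[0]  # even the response shape is vendor-specific

Anthropic, Google, and the open-source servers all support tools too, but the request and response shapes differ just enough that moving is a migration project rather than a config change.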
Having been there for the dotcom boom and bust from pre-IPO Netscape to the brutal collapse of the market, it’s hard to say dotcoms don’t work. There was clearly something there of immense value, but it took a lot of experimentation with business models and maturation of technology, as well as of the fundamental communications infrastructure of the planet. All told, it feels like we’ve really only settled into a smooth groove in the last 10 years.
I see no reason why AI will be particularly different. It seems difficult to make the case AI is useless, but it’s also not particularly mature with respect to fundamental models, tool chains, business models, even infrastructure.
In both cases speculative capital flowed into the entire industry, which brought us losers like pets.com but winners like Amazon.com, Netflix.com, Google.com, etc. Which of the AI companies today are the next generation of winners and losers? Who knows. And when the music stops will there be a massive reckoning? I hope not, but it’s always possible. It probably depends on how fast we converge to “what works,” how many grifters there are, how sophisticated equity investors are (and they are much more sophisticated now than they were in 1997), etc.
> it’s burning through $7 billion a year to fund research and new A.I. services and hire more employees
And at some point one of these companies will reach the point it does not need as many employees. And has a model capable of efficiently incorporating new learning without having to reset and relearn from scratch.
That is what AGI is.
Computing resources for inference and incremental learning will still be needed, but when the AGI itself is managing all/much of that, including continuing to find efficiencies, ... profitability might be unprecedented.
The speed of advance over the last two decades has been steady and exponential. There are not many (or any) credible signals that a technical wall is about to be encountered.
Which is why I believe that I, I by myself, might get there. Sort of, kind of, probably not, probably just kidding. Myself.
--
Another reason companies are spending billions is to defend their existing valuations. Google's value could go to zero if they don't keep up. Other companies likewise.
It is the new high stakes ante for large informational/social service relevance.
> What, then, is an LLM vendor’s moat? Brand? Inertia? A better set of applications built on top of their core models? An ever-growing bonfire of cash that keeps its models a nose ahead of a hundred competitors?
Missed one I think... the expertise accumulated in building the prior generation models, that are not themselves that useful anymore.
Yes, it's true that will be lost if everybody leaves, a point he briefly mentions in the article. But presumably AWS would also be in trouble, sooner or later, if everybody who knows how things work left. Retaining at least some good employees is table stakes for any successful company long-term.
Brand and inertia also don't quite capture the customer lock-in that happens with these models. It's not just that you have to rewrite the code to interface with a competitor's LLM; it's that that LLM might now behave very differently than the one you were using earlier, and give you unexpected (and undesirable) results.
They eventually have to turn a profit or pass the hot potato. Maybe they’ll be the next generation of oligarchs supported by the state when they all go bankrupt but are too essential to fail.
My guess is that profitability is becoming increasingly difficult and nobody knows how yet… or whether it will be possible.
Seems like the concentration of capital is forcing the tech industry to take wilder and more ludicrous bets each year.
I don't really buy the "cost of hardware" portion of the argument, even if everything else seems sound. In 1992 if you wanted 3D graphics, you'd call up SGI and drop $25,000 on an Indigo 2. 6 years later, you could walk into Circuit City and buy a Voodoo2 for 300 bucks, slap it in the PC you already owned, and call it a day.
I know we aren't in the 90s. I know that the cost of successive process nodes has grown exponentially, even when normalizing for inflation. But, still. I'd be wary of betting the farm on AI being eternally confined to giant, expensive special-purpose hardware.
This stuff is going to get crammed into a little special purpose chip dangling off your phone's CPU. Either that, or GPU compute will become so commodified that it'll be a cheap throw-in for any given VPS.
This stuff is going to get crammed through an algorithm orders of magnitude more efficient. We have living proof that it does not take 5GW to make a brain.
The mistake in that article is the assumption that these companies collecting those gigantic VC funding rounds are looking to stay ahead of the pack and be there even 10 years down the road.
That's a fundamental misunderstanding of the (especially) US startup culture in the last maybe 10-20 years. Only very rarely is the goal of the founders and angel investors to build an actual sustainable business.
In most cases the goal is to build enough perceived value, through wild growth financed by VC money and by fueling hype, that a subsequent IPO will let the founders and initial investors recoup their investment plus get some profit on top. Or, find someone to acquire the company before it reaches the end of its financial runway.
And then let the poor schmucks who bought the business hold the bag (and foot the bill). Nobody cares if the company becomes irrelevant or even goes under at that point anymore - everyone who mattered has already recouped their expense. If the company stays afloat - great, that's a bonus but not required.
I do wonder if they become another category like voice assistants where people just expect to get them free as part of an existing ecosystem.
Or search/social media where people are happy to pay $0 to use it in exchange for ads.
Sure, some people are paying now, but it's nowhere near the cost of operating these models, let alone developing them.
Also the economics may not accrue to the parts of the stack people think. What if the model is commodity and the real benefits accrue to the GOOG/AAPL/MSFT of the world that integrate models, or to the orgs that gatekept their proprietary data properly and now can charge for querying it?
I think the article contradicts itself, missing some simple math. Contradicting points:
1. it takes huge and increasing costs to build newer models, models approached asymptote
2. a startup can take an open-source model and put you out of business in 18 months (the Coca-Cola example)
The sheer cost of building LLMs is what protects these companies from being attacked by startups. Microsoft's operating profit in 2022 was $72B, which is 10x bigger than OpenAI's running costs. And even if 2022 was an unusually good year, profits of $44B would still dwarf OpenAI.
If OpenAI manages to keep ramping up investment like Uber did, it may stay alive; otherwise it's only the tech giants that can afford to run an LLM. ...if people are willing to pay for this level of quality, that is (well, if you integrate it into MS Word, they actually may want it).
So riddle me this - Netflix’s revenue is around $9.5B. So a little more than the running costs of OpenAI.
Do you think it is plausible that they will be able to get as many people to pay for OpenAI as pay for Netflix? They would need much more revenue than Netflix to make it viable, considering that there are other options available, like Google’s, MSFT’s, Anthropic’s, etc.
As a business model, this is all very suspect. GPT5 has had multiple delays and challenges. What if “marginally better” is all we are going to get?
Two models to address how/why/when AI companies make sense:
(1) High integration (read: switching) costs: any deployment of real value is carefully tested and tuned for the use-case (support for product x, etc.). The use cases typically don't evolve that much, so there's little benefit to re-incurring the cost for new models. Hence, customers stay on old technology. This is the rule rather than the exception e.g., in medical software.
(2) The Instagram model: it was valuable with a tiny number of people because they built technology to do one thing wanted by a slice of the market that was very interesting to the big players. The potential of the market set the time value of the delay in trying to replicate their technology, at some risk of being a laggard to a new/expanding segment. The technology gave them a momentary head start when it mattered most.
Both cases point to good product-market fit based on transaction cost economics, which leads me to the "YC hypothesis":
The AI infrastructure company that best identifies and helps the AI integration companies with good product-market fit will be the enduring leader.
If an AI company's developer support consists of API credits and online tutorials about REST APIs, it's a no-go. Instead, like YC and VCs, it should have a partner model: partners use considerable domain skills to build relationships with companies to help them succeed, and partners are selected and supported in accordance with the results of their portfolio.
The partner model is also great for attracting and keeping the best emerging talent. Instead of years of labor per startup or elbowing your way through bureaucracies, who wouldn't prefer to advise a cohort of the best prospects and share in their successes? Unlike startups or FAANG, you're rewarded not for execution or loyalty, but for intelligence in matching market needs.
So the question is not whether the economics of broadcast large models work, but who will gain the enduring advantage in supporting AI eating the software that eats the world?
#2 is dead wrong, and shows that the author is not aware of the current exciting research happening in parameter efficient fine-tuning or representation/activation engineering space.
The idea that you need huge amounts of compute to innovate in a world of model merging and activation engineering shows a failure of imagination, not a failure to have the necessary resources.
PyReft, Golden Gate Claude (Steering/Control Vectors), Orthogonalization/Abliteration, and the hundreds of thousands of Lora and other adapters available on websites like civit.ai is proof that the author doesn't know what they're talking about re: point #2.
And I'm not even talking about the massive software/hardware improvements we are seeing for training/inference performance. I don't even need that, I just need evidence that we can massively improve off the shelf models with almost no compute resources, which I have.
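For anyone who hasn't followed that space, here's a minimal sketch of what "almost no compute" adaptation looks like in practice, using the Hugging Face peft library; the base model name and the hyperparameters are illustrative, not a recipe:

    # LoRA fine-tuning sketch with Hugging Face peft; names and values are illustrative.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    base = "meta-llama/Meta-Llama-3-8B-Instruct"   # any open-weights causal LM works the same way
    model = AutoModelForCausalLM.from_pretrained(base)
    tokenizer = AutoTokenizer.from_pretrained(base)

    config = LoraConfig(
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],       # adapt only the attention projections
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, config)
    model.print_trainable_parameters()             # typically well under 1% of the weights
    # ...then train the adapter with an ordinary Trainer loop on a single consumer GPU.

The point is that the trainable adapter is tiny; it rides on top of a frozen base model that someone else paid to pretrain.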
> If a competitor puts out a better model than yours, people can switch to theirs by updating a few lines of code.
This may become increasingly the case as models get smarter, but it’s often not the case right now. It’s more likely to be a few lines of code, a bunch of testing, and then a bunch of prompt tweaking iterations. Even within a vendor and model name, it’s a good idea to lock to a specific version so that you don’t wake up to a bunch of surprise breakages when the next version has different quirks.
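As an illustration of the pinning point, a sketch with the OpenAI SDK (other vendors have equivalents; the dated snapshot identifier is an example of the naming scheme, not a recommendation):

    # Pinning a model snapshot instead of a floating alias (illustrative identifiers).
    from openai import OpenAI

    client = OpenAI()
    MODEL = "gpt-4o-2024-08-06"   # pinned snapshot, not the floating "gpt-4o" alias

    def summarize(text: str) -> str:
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": f"Summarize in one sentence: {text}"}],
            temperature=0,
        )
        return resp.choices[0].message.content

Bumping MODEL is then a deliberate change you run through your prompt regression tests, not something that happens to you overnight.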
> you probably don’t want to stake your business on always being the first company to find the next breakthrough
that's like the better half of the entire Apple business model that brought them success: find the next hot thing (like a capacitive touchscreen or high-density displays), make exclusive deals with hardware providers so you can release a device using that new tech, and feed on it for some time till the upstream starts leaking the tech left and right and your competition can finally catch up
Did Apple work when they launched the iPhone without the App Store? It's a very similar question. There is this obsession with talking about tangible business value (and that it's nowhere to be seen) despite OpenAI setting records for daily users. Right now it is a consumer product, and it will take time before we understand how to organize businesses around these models. It took us about 20 years to get from the hype of the .com bubble to the tech giants.
Just like supermarkets know what products you are buying, LLM inference providers know what requests you are making. And just like supermarkets list the most profitable products and then clone them, LLM inference providers could come up with their own versions of the most profitable products based on LLMs.
My prediction is that the thin layer built on top of LLMs will be eaten up starting from the most profitable products.
By using inference APIs you are doing their market research for free.
This period of model scaling at all costs is going to be a major black eye for the industry in a couple of years. We already know that language models are few-shot learners at inference time, and yet OpenAI seems to be happy throwing petaflops of compute at training models the slow way.
The question is how can you use in-context learning to optimize the model weights. It’s a fun math problem and it certainly won’t take a billion dollar super computer to solve it.
>The market needs to be irrational for you to stay solvent.
It's a telling quote he chose: if you think AI is over-invested, you should short AI companies, and that's where the quote comes from. The problem is, even if you're right, the market can stay irrational longer than you can afford to hold your short.
I think a selling point for LLMs would be to match you with people that you find perfect for that use case. For example a team for a job, a wife, real friends, clubs for sharing hobbies, for finding the best people mastering something you want to accomplish. Unfortunately we and LLMs don't know how to match people in that way.
The title should be “Do LLM building companies work?”. The article fails to address companies that will be using LLMs or companies innovating on other models/architectures.
I don’t think most people looking to build an AI company want to build an LLM and call it a company.
>2) pushing the frontier further out will likely get more difficult.
The upside risk is premised on this point. It'll get so cost prohibitive to build frontier models that only 2-3 players will be left standing (to monetize).
Yes, AI companies can only work if they all somehow agree to slow things down a little bit instead of competing to release a better model like every month.
It is ironic that this article seems to focus on the business logic, considering that is the same myopia at these AI companies.
Not that physical/financial constraints are unimportant, but they often can be mitigated in other ways.
Some background: I was previously at one of these companies that got hoovered up in the past couple years by the bigs. My job was sort of squishy, but it could be summarized as 'brand manager' insofar as it was my job to aide in shaping the actual tone, behaviors, and personality of our particular product.
I tell you this because in full disclosure, I see the world through the product/marketing lens as opposed to the engineer lens.
They did not get it.
And by "they" I mean founders whose names you've heard of, people with absolute LOADS of experience in building and shipping technology products. There were no technical or budgetary constraints at this early stage; we were moving fast and trying shit. But they simply could not understand why we needed to differentiate and how that'd make us more competitive.
I imagine many technology companies go through this, and I don't blame technical founders who are paranoid about this stuff; it sounds like 'management bullshit' and a lot of it is, but at some point all organizations who break even or take on investors are going to be answerable to the market, and that means leaving no stone unturned in acquiring users and new revenue streams.
All of that to say, I do think a lot of these AI companies have yet to realize that there's a lot to be done user-experience-wise. The interface alone - a text prompt(!?) - is crazy out-of-touch to me. Average users have no idea how to set up a good prompt, and everyone is making it hard for them to learn how.
All of these decisions are pretty clearly made by someone who is technology-oriented, not user-oriented. There's no work I'm aware of being done on tone, or personality frameworks, or linguistics, or characterization.
Is the LLM high on numeracy? Is it doing code switching/matching, and should it? How is it qualifying its answers by way of accuracy in a way that aids the user learning how to prompt for improved accuracy? What about humor or style?
It just completely flew over everyone's heads. This may have been my fault. But I do think that the constraints you see to growth and durability of these companies will come down to how they're able to build a moat using strategies that don't require $$$$ and that cannot be easily replicated by competition.
Nobody is sitting in the seats at Macworld stomping their feet for Sam Altman. A big part of that is giving customers more than specs or fiddly features.
These companies need to start building a brand fast.
I don't get the point of the author. At one point he's saying because the race to get bigger and better will never end, we'll need ever larger compute and more and more and more, so in the end, the companies will fail.
I don't see it this way. In plumbing I could have chosen to use 4" pipe throughout my house. I chose 3". Heck, I could have purchased commercial pipe that's 12", or even 36". It would have changed a lot of the design of my foundation.
Just because there is something much bigger and can handle a lot more poop, doesn't mean it's going to be useful for everyone.
I get the sense that the value prop of LLMs should first be cut into two categories: coding assistant, and everything else.
LLMs as coding assistants seem to be great. Let’s say that every working programmer will need an account and will pay $10/month (or their employer will).. what’s a fair comp for valuation? GitHub? That’s about $10Bn. Atlassian? $50Bn
The “everything else” bin is hard to pin down. There are some clear automation opportunities in legal, HR/hiring, customer service, and a few other fields - things that feel like $1-$10Bn opportunities.
Sure, the costs are atrocious, but what’s the revenue story?
> I get the sense that the value prop of LLMs should first be cut into two categories: coding assistant, and everything else.
Replace coding assistant with artists and you have the vibe of AI 2 years ago.
The issue is that these models are easy to make (if expensive) so the open source community (of which many, maybe most, are programmers themselves) will likely eat up any performance moat given enough time.
This story already played out with AI art. Nothing beats SD and comfyUI if you really need high quality and control.
Enjoyed the article and thought many of the points were good.
Here's a counterargument.
> In other words, the billions that AWS spent on building data centers is a lasting defense. The billions that OpenAI spent on building prior versions of GPT is not, because better versions of it are already available for free on Github.
The money that OpenAI spends on renting GPUs to build the next model is not what builds the moat. The moat comes from the money/energy/expertise that OpenAI spends on the research and software development. Their main asset is not the current best model GPT-4; it is the evolving codebase that will be able to churn out GPT-5 and GPT-6. This is easy to miss because the platform can only churn out each model when combined with billions of dollars of GPU spend, but focusing on the GPU spend misses the point.
We're no longer talking about a thousand line PyTorch file with a global variable NUM_GPUs that makes everything better. OpenAI and competitors are constantly discovering and integrating improvements across the stack.
The right comparison is not OpenAI vs. AWS, it's OpenAI vs. Google. Google's search moat is not its compute cluster where it stores its index of the web. Its moat is the software system that incorporates tens of thousands of small improvements over the last 20 years. And similar to search, if an LLM is 15% better than the competitors, it has a good shot at capturing 80%+ of the market. (I don't have any interest in messing around with a less capable model if a clearly better one exists.)
Google was in some sense "lucky" that when they were beginning to pioneer search algorithms, the hardware (compute cluster) itself was not a solved problem the way it is today with AWS. So they had a multidimensional moat from the get-go, which probably slowed early competition until they had built up years' worth of process complexity to deter new entrants.
Whereas LLM competition is currently extremely fierce for a few reasons: NLP was a ripe academic field with a history of publishing and open source, VC funding environment is very favorable, and cloud compute is a mature product offering. Which explains why there is currently a proliferation of relatively similar LLM systems:
> Every LLM vendor is eighteen months from dead.
But the ramp-up time for competitors is only short right now because the whole business model (pretrain massive transformers -> RLHF -> chatbot interface) was only discovered 18 months ago (ChatGPT launched at the end of 2022) - and at that point all of the research ideas were published. By definition, the length of a process complexity moat can't exceed how long the incumbent has been in business! In five years, it won't be possible to raise a billion dollars and create a state of the art LLM system, because OpenAI and Anthropic will have been iterating on their systems continuously. Defections of senior researchers will hurt, and can speed up competitor ramp-time slightly, but over time a higher proportion of accumulated insights is stored in the software system rather than the minds of individual researchers.
Let me emphasize: the billions of dollars of GPU spend is a distraction; we focus on it because it is tangible and quantifiable, and it can feel good to be dismissive and say "they're only winning because they have tons of money to simply scale up models." That is a very partial view. There is a tremendous amount of incremental research going on - no longer published in academic journals - that has the potential to form a process complexity moat in a large and relatively winner-take-all market.
The infrastructure hardware and software is a commodity. Any real moat comes from access to data. I think we've seen data close up quite a bit since people realized that you can train LLMs with it, so I don't know that OpenAI's data access is better than when they trained GPT 4. In fact, it's probably worse unless they've cut independent deals with massive data providers.
Inference costs many many times less than training. I think we may end up with one more round of training a larger foundational model, and then it's going to have to wait for new, cheaper, more efficient hardware. It's quite possible that all of these first mover companies go bankrupt, but the models will still exist generating value.
The question is too broad. What's an AI company? It could be anything. The particular sub class here that is implied is companies that are spending many billions to develop LLMs.
The business model for those is to produce amazing LLMs that are hard to re-create unless you have similar resources, and then make money providing access, licensing, etc.
What are those resources? Compute, data, and time. And money. You can compensate for lack of time by throwing compute at the problem or less/more data. Which is a different way of saying: spend more money. So, it's no surprise that this space is dominated by trillion dollar companies with near infinite budgets and a small set of silicon valley VC backed companies that are getting multi billion dollar investments.
So the real question is whether these companies have enough of a moat to defend their multi billion dollar investments. The answer seems to be no. For three reasons: hardware keeps getting cheaper, software keeps getting better, and using the models is a lot cheaper than creating them.
Creating GPT-3 was astronomically expensive a few years ago and now it is a lot cheaper by a few orders of magnitude. GPT-3 is of course obsolete now. But I'm running Llama 3.2 on my laptop and it's not that bad in comparison. That only took 2 years.
Large scale language model creation is becoming a race to the bottom. The software is mostly open source and shared by the community. There is a lot of experimentation happening but mostly the successful algorithms, strategies, and designs are quickly copied by others. To the point where most of these companies don't even try to keep this a secret anymore.
So that means new, really expensive LLMs have a short shelf life where competitors struggle to replicate the success and then the hardware gets cheaper and others run better algorithms against whatever data they have. Combine that with freely distributed models and the ability to run them on cheap infrastructure and you end up with a moat that isn't that hard to cross.
IMHO all of the value is in what people do with these models. Not necessarily in the models. They are enablers. Very expensive ones. Perhaps a good analogy is the value of Intel vs. that of Microsoft. Microsoft made software that ran on Intel chips. Intel just made the chips. And then other chip manufacturers came along. Chips are a commodity now. Intel is worth a lot less than MS. And MS is but a tiny portion of the software economy. All the value is in software. And a lot of that software is OSS. Even MS uses Linux now.
the hype will never die. all the smartest people in industry and government believe that there is a very high probability that this technology is near the edge of starting the AGI landslide. you dont need AGI to start the AGI landslide, you just need AI tools that are smart enough to automate the process of discovering and building the first AGI models. every conceivable heuristic indicates that we are near the edge. and because of this, AI has now become a matter of national security. the research and investment wont stop, because it cant, because it is now an arms race. this wont just fizzle out. it will be probed and investigated to absolute exhaustion before anyone feels safe enough to stop participating in the race. if you have been keeping up you will know that high level federal bureaucrats are now directly involved in openAI.
I am not among those smartest, so take my opinion with a mountain of salt. But I'm just not convinced that this is going in the direction of AGI.
The recent advances are truly jaw dropping. It absolutely merits being investigated to the hilt. There is a very good chance that it will end up being a net profit.
But intuitively they don't feel to me like they're getting more human. If anything I feel like the recent round of "get it to reason aloud" is the opposite of what makes "general intelligence" a thing. The vast majority of human behavior isn't reasoned, aloud or otherwise.
It'll be super cool if I'm wrong and we're just one algorithm or extra data set or scale factor of CPUs away from It, whatever It turns out to be. My intuition here isn't worth much. But I wouldn't be surprised if it was right despite that.
Did you criticize the turing test as being meaningless before it was easily passed by LLMs? if not i don't see how you can avoid updating on "this is getting more human" or at least "this is getting closer to intelligence" to avoid the human-bias
I never gave much thought to the Turing test one way or the other. It never struck me as especially informative.
I've always been more interested in the non-verbal aspects of human intelligence. I believe that "true AGI", whatever that is, is likely to be able to mimic sentient but non-verbal species. I'd like to see an AGI do what a dog or cat does.
LLMs are quite astonishing at mimicking something humans specifically do, the most "rational" parts of our brain. But they seem to jump past the basic, non-rational parts of the brain. And I don't think we'll see it as "true AGI" until it does that -- whatever that is.
I'm reminded of the early AI researchers who taught AI to play chess because it's what smart people do, but it couldn't do any of the things dumb people do. I think the biggest question right now is whether our present techniques are a misleading local maximum, or if we're on the right slope and just need to keep climbing.
> I never gave much thought to the Turing test one way or the other
then how are you qualified to even discuss this? shouldn't you have to have actually had a thought about something before you pretend to know something about it? i mean its pretty clear from your comment that you havent thought about AI for more than a year or two
I don't know. I know that any particular thing I say could be met with an LLM that did that thing. But at the moment an LLM doesn't seem to be capable of coding itself up that way. Yet.
I guess that was kind of my question when I thought about what you said. I can think of many things that a cat or dog does that _current_ LLMs can't do, but nothing that seems fundamentally out of reach.
Possibly, but as of now it's a completely unsolved problem and to my knowledge nobody has shown even a tiny model being able to perform it.
Based on the top page today I may even be able to make the argument we can't even simulate the abilities of a fruit fly.
The absolute frontier models can perform only a fraction of a fraction of what a typical work day looks like for a human. Based on GAIA benchmark scores, I calculated that a frontier model today has roughly a 1e-29 chance of performing a single day of connected tasks.
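A back-of-envelope version of that compounding argument, with assumed numbers rather than the actual GAIA figures:

    # If each step in a chain succeeds independently with probability p,
    # a "day" of n dependent steps succeeds with probability p ** n.
    # The per-task rate and chain length below are assumptions for illustration.
    p = 0.80       # assumed per-task success rate
    n = 300        # assumed number of chained tasks in a work day
    print(p ** n)  # ~8e-30: vanishingly small even though each step looks fine

Whatever the exact inputs, chaining hundreds of imperfect steps collapses the overall success probability toward zero.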
What does it mean for something to "act on its own"? Every action taken is in some part due to what's happening in the environment, right? I would argue that nothing "acts on its own".
itll be super cool huh? no, it wont be super cool. AGI will cause the total and complete destabilization of the world and with almost mathematical certainty it can be said that it will lead to a global war. a war fueled by AGI is the last war humans will ever witness. AGI is something that everyone should be very afraid of. unfortunately, the only experience that most people have with AI is watching star trek, playing halo, and consuming other media that depict sentient machines in a benign and very positive light. dont be fooled. the machines will not be your friends. its an outcome that must be avoided at all costs.