Technical AI and LLMs are not something I'm well versed in, so as I sit on the sidelines and see the current proliferation of AI startups, I'm starting to wonder where the moats are outside of access to raw computing power. OpenAI seemed to have a massive lead in this space, but that lead seems to be shrinking every day.
You hit the nail on the head. Companies are scrambling for an edge. Not a real edge, an edge to convince investors to keep giving them money. Perplexity is going all in on convincing VCs it can create a "data flywheel".
Perhaps I've missed something, but where will the infinite amounts of training data come from, for future improvements?
If these models will be trained on the outputs of themselves (and other models), then it's not so much a "flywheel", as it is a Perpetual Motion Machine.
Perplexity has a dubious idea based around harvesting user chats -> making the service better -> getting more user prompts. I am quite unconvinced that user prompts and stored chats will materially improve an LLM that is trained on a trillion high-quality tokens.
The second idea being kicked around is that synthetic data will create a new fountain of youth for training data, one that will also fix models' reasoning abilities.
Where are they going to get that data? Everything on the open web after 2023 is polluted with low-quality AI slop that poisons the data sets. My prediction: aggressive dragnet surveillance of users. As in, Google recording your phone calls on Android, Windows sending screen recordings from Recall to OpenAI, Meta training off WhatsApp messages... It sounds dystopian, but the Line Must Go Up.
> Everything on the open web after 2023 is polluted with low-quality AI slop that poisons the data sets.
Not even close to everything.
E.g. training on the NY Times and Wikipedia involves essentially zero meaningful AI-generated content. Training on books from reputable publishers similarly involves essentially none. Any LLM usage was to polish prose or assist with research or whatever, but shouldn't affect the factual quality in any significant way.
The web hasn't been polluted with AI any more than e-mail has been polluted with spam. Which is to say it's there, but it's also entirely viable to separate. Nobody's worried that the group email chain with friends is being overrun with spam or with AI.
The Wikipedia part, at least, is incorrect. Currently, Wikipedia mods/admins are dealing with AI-generated articles being uploaded.
As for the NYT - I am assuming that lots of those stories are already available in some blog or other.
The e-mail and web forums are 100% polluted with spam, which takes constant effort to remove. GenAI-based content is far harder to identify and remove.
This example assumes the effort required to keep the web functional can deal with AI created content. Speaking from experience, our filters (human and otherwise) cannot. They fail to do so even now.
PS: Even given your example of closed email chains - the information in that depends on sources people read. Like plastic pollution in the food chain, this is inescapable.
> Currently, Wikipedia mods/admins are dealing with AI-generated articles being uploaded.
And they've always dealt with spam and low-quality submissions before. The system is working.
> As for the NYT - I am assuming that lots of those stories are already available in some blog or other.
I don't know what relevance that has to what we're talking about. The point is, train on the NYT. Blogs don't change what's on the NYT.
> The e-mail and web forums are 100% polluted with spam, which takes constant effort to remove.
They've always been polluted with low-quality content. So yes, either don't train on them, or only train on highly upvoted solutions, etc.
AI pollution isn't fundamentally any different from previous low-quality content and spam. It's not terribly difficult to determine which parts of the internet are known to be high-quality and train only on those. LLMs can't spam the NY Times.
> The counter point is that NYT content is already in the training data
That's not a counter point. My point is, train on things like the NYT, not random blogs. You can also whitelist the blogs you know are written by people, rather than randomly spidering the whole internet.
Also, no -- most of the NYT hasn't been copied into blogs. A small proportion of top articles, maybe.
> Highly upvoted messages on reddit are very regular bots copying older top comments.
What does that matter if the older top comment was written by a person? Also, Reddit is not somewhere you want to train in the first place if you're trying to generate a model where factual accuracy matters.
> Verification does not scale, while generation scales.
You don't need to verify everything -- you just need to verify enough stuff to train a model on. We're always going to have plenty of stuff that's sufficiently verified, whether from newspapers or Wikipedia or whitelisted blogs or books from verified publishers or whatever. It's not a problem.
You shouldn't be training on blogspam from random untrusted domains in the first place. So it doesn't matter if that junk is AI-generated or not.
>What does that matter if the older top comment was written by a person?
That is the entire issue? LLMs fail when they are trained on GenAI-based content?
> Also, Reddit is not somewhere you want to train in the first place if you're trying to generate a model where factual accuracy matters.
There is no model that can create factual accuracy. This would basically contravene the laws of physics. LLMs predict the next token.
>You shouldn't be training on blogspam from random untrusted domains in the first place. So it doesn't matter if that junk is AI-generated or not
Afaik, all the current models are trained on this corpus. That is how they work.
> There is no model that can create factual accuracy.
Factual accuracy is not binary, it is a matter of degrees. Obviously training on content that is more factually correct will result in more factually correct next tokens. This is a pretty fundamental aspect of LLMs.
> Afaik, all the current models are trained on this corpus.
Then apologies for being so blunt, but you know wrong. There is a tremendous amount of work that goes on by the LLM companies in verifying, sanitizing, and structuring the training corpuses, using a wide array of techniques. They are absolutely not just throwing in blogspam and hoping for the best.
Thank you for being blunt. Let me attempt to speak in the same earnest tone.
You are contradicting the papers and public statements of the people who make the models. Alternatively, you are looking at the dataset curation process through rose-tinted glasses.
>There is a tremendous amount of work that goes on by the LLM companies in verifying, sanitizing, and structuring the training corpuses, using a wide array of techniques.
Common Crawl is instrumental in building our models; 60% of GPT-3's training data was Common Crawl (https://arxiv.org/pdf/2005.14165, pg. 9).
CC, in turn, was never intended for LLM training; this misalignment in goals results in downstream issues like hate speech, NYT content, copyrighted content and more getting used to train models.
TLDR: 'NYT' and other high-quality content has largely been ingested by models. Reddit and other sources play a large part in training current models.
While I appreciate your being blunt, bluntness also means not being sharp and incisive. Perhaps some precision is required here to clarify your point.
Finally -
>Factual accuracy is not binary, it is a matter of degrees. Obviously training on content that is more factually correct will result in more factually correct next tokens
What? Come on, I think you wouldn't agree with your own statement after reading it once more: factual correctness is not a matter of degrees.
Furthermore, facts don't automatically create facts. Calculation, processing, testing and verification create more facts. Just putting facts together creates content.
Re: corpus content, I think we're talking past each other. I'm saying that current models aren't being blindly trained on untrusted blogspam, and that there's a lot of work done to verify, structure, transform, etc. And earlier models were trained with lower-quality content, as companies were trying to figure out how much scale mattered. Now they're paying huge amounts of money to improve the quality of what they ingest, to better shape the quality of output. What they take from Reddit, they're not blindly ingesting every comment from every user. My overall main point stands: we have viable, working, scalable mechanisms to avoid the "pollution" you're worried about.
> What? Come on, I think you wouldn't agree with your own statement after reading it once more: factual correctness is not a matter of degrees.
Of course it is. An LLM can be correct 30% of the time, 80% of the time, 95% of the time, 99% of the time. If that's not a matter of degrees, I don't know what is. If you're looking for 100% perfection, I think you'll find that not even humans can do that. ;)
It's not about a heuristic on text of unknown provenance -- it's about publishers that exert a certain level of editorial control and quality verification. Or social reputation mechanisms that achieve the same.
That's what is preventing your "model collapse". Reputations of provenance. Not pure-text heuristics.
Would think most quality data is books and news articles and scientific journals. Not crap people are texting each other.
These companies will never admit it, but AI is built on the back of piracy archives, the easiest and cheapest way to get massive amounts of quality data.
A friend and I built a proof-of-concept of using a variation of Latent Semantic Analysis to automatically build up conceptual maps and loadings of individual words against the latent conceptual vectors back in 2000. In exploring what it would take to scale I concluded, like you, that we should use professionally written and edited content like books, news articles and scientific journals as the corpus against which to build up the core knowledge graph.
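For the curious, the core of that approach is tiny by today's standards; here is a minimal sketch of the standard LSA recipe (TF-IDF plus truncated SVD) using scikit-learn - an illustration of the concept, not our original code:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD

    # Toy corpus standing in for "professionally written and edited content"
    docs = [
        "the court ruled on the copyright dispute",
        "the judge issued a ruling in the lawsuit",
        "the team trained a neural network on text",
        "gradient descent updates the network weights",
    ]

    vectorizer = TfidfVectorizer(stop_words="english")
    X = vectorizer.fit_transform(docs)                  # term-document matrix
    svd = TruncatedSVD(n_components=2, random_state=0)

    doc_vectors = svd.fit_transform(X)                  # documents projected into a latent "concept" space
    word_loadings = dict(zip(vectorizer.get_feature_names_out(), svd.components_.T))

Each word's loading vector is essentially the "conceptual map" entry for that word; the interesting (and expensive) part was always the quality of the corpus you feed in.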
Twenty-four years later I still regret not being able to raise money to enable us to keep working on that nascent startup. In most ways it was still too early. Google was still burning through VC money at that point and the midwestern investors we had some access to didn't get it. And, honestly they were probably correct. Compute power was still too expensive and quality data sources like published text were mostly locked up and generally not available to harvest.
That entirely depends on what quality you’re going for. If the goal is to simulate passably human conversation, texts and dms are probably more desirable.
I'm really curious if Microsoft will ever give in to the urge to train on private business data - since transitioning Office to O365, they hold the world's (and even governments') Word documents and emails. I'm pretty sure they've promised never to touch it, but they can certainly read it, so... Information wants to be free.
Microsoft "trains" on business data already, but typically for things like fine-tuning security automation and recognizing malicious signals. It sure would be a big step to reading chats and email and feeding them into a model.
There doesn't seem to be much of a moat. OpenAI, Gemini, Meta, X.ai and Anthropic all seem to be able to do much the same stuff. o1 is novel at the moment but I bet it'll be copied soon.
> OpenAI seemed to have a massive lead in this space, but that lead seems to be shrinking every day.
The lead is as strong as ever. They are 34 ELO above anyone else in blind testing, and 73 ELO above in coding [1]. They also seem to artificially constrain the lead, as they already have a stronger model, o1, which they haven't released. Consistent with the past, they seem to release just <50 ELO above anyone else, and upgrade the model within weeks when someone gets closer.
It's rather amusing that people have said this about OpenAI - that they essentially had no lead - for about two years non-stop.
The moat as usual is extraordinary scale, resources, time. Nobody is putting $10 billion into the 7th OpenAI clone. Big tech isn't aggressively partnering with the 7th OpenAI clone. The door is already shut to that 7th OpenAI clone (they can never succeed or catch-up), there's just an enormous amount of naivety in tech circles about how things work in the real world: I can just spin up a ChatGPT competitor over the weekend on my 5090, therefore OpenAI have no barriers to entry, etc.
HN used to endlessly talk about how Uber could be cloned in a weekend. It's just people talking about something they don't actually understand. They might understand writing code (or similar) and their bias extends from the premise that their thing is the hard part of the equation (writing the code, building an app, is very far from the hardest part of the equation for an Uber).
Completely agree. It's well known that the LMSys arena benchmarks are heavily skewed with bias towards whatever is new and exciting. Meanwhile even OpenAI have acknowledged Sonnet as being a superior coding model.
This is clearly evident to anyone who spends any amount of time working on non-trivial projects with both models.
How can anyone say that the lead is shrinking when no one still has any good competitor to strawberry? DSPy has been out for how long, and how many folks have shown better reasoning models than strawberry built with literally anything else? Oh yeah, zero.
Yes, this is 100% tested and proven ad nauseam within the field. I have some of my own papers on this, but you can look at literally any major AI conference and find dozens of papers analyzing yet more issues caused by byte pair tokenization.
Honestly the folks who don’t want to admit that it’s tokenization are just extremely salty that AI is actually good right now. Your “AI couldn’t tell me how many Rs in strawberry” stuff is extreme cope for your job prospects evaporating from a system that can’t spell correctly.
But does a different prompt get the answer correct? I find it surprising. Can you share a link? I'm not saying this out of saltiness, I would be very grateful. If you don't want to I will try the shitty Google search, no problem.
I think the main issue with these metrics, which you implicitly highlight, is that they are not a one-size-fits-all approach. In fact, they are often treated, at least casually, like they are some kind of model fit, like an R-squared value. That is maybe a good description narrowly constrained to the task or set of tasks being evaluated for the metric. But the complexity of the user experience, combined with the poor sample rate that a person can individually experience, leads to conclusions like these. And they are perfectly valid conclusions. If the model doesn't work for you, why use it? But it also suggests that personal experience cannot be used to decide whether the model performs well in aggregate. Not that this matters to the individual user or problem space; they should of course use whatever works best for them.
Those extremely detailed models they can build of you (a digital twin) from all your conversations, coupled with intelligent, subtle, likable, relatable and very personal artificial salesmen in service of the highest bidder, have enormous financial potential.
Even if the latter becomes commoditized (and we are far from that in practice), the former is a serious moat. Just as there is no secret to building a search engine or a social network platform (and that is not saying there are no technical challenges), operating one profitably requires massive aggregate user-profile exploitation potential, which requires huge upfront loss leaders.
Data. You want huge amounts of high-quality data with a diverse range of topics, writing styles and languages. Everyone seems to balance those requirements a bit differently, and different actors have access to different training data.
There is also some moat in the refinement process (RLHF, model "safety", etc.).
OpenAI just raised a huge pile of money. I'm sure the moat was a consideration during the fundraise. I'm guessing the issues between OpenAI and MS go beyond just the moat and competitiveness.
Every OpenAI thread focuses on the moat, but surely that was baked into their business dealings of the last 60 days.
> I'm sure the moat was a consideration during the fundraise
Ha. Really though, the entirety of venture capital could be summarized as “this probably won’t pay out but if it does it’s gonna be epic”. I wouldn’t read too much into America’s thirstiest capitalists spending their discretionary billions on the latest hype cycle.
I am starting to suspect that LLMs, short term for a few years, will end up mostly having value as assistants to experts in their fields who know how to prompt and evaluate output.
I see an ocean of startups doing things like ‘AI accounting systems’ that scare me a little. I just don’t feel good having LLM based systems making unsupervised important decisions.
I do love designing and writing software with LLMs, but that is a supervised activity that saves me a ton of time. I also enjoy doing fun things with advanced voice mode for ChatGPT, like practicing speaking in French - again, a supervised activity.
re: ownership of IP: I find the idea hilarious that by OpenAI declaring ‘AGI achieved!’ that Microsoft might get cut out of IP rights it has supposedly paid for.
> I see an ocean of startups doing things like ‘AI accounting systems’ that scare me a little.
I’m a CPA and software engineer currently interviewing around for dev positions, and most of the people I’ve encountered running these companies are neither CPAs nor accountants and have little domain knowledge. It’s a scary combination with an LLM that demands professional skepticism for every word it says. I wouldn’t trust those companies in their current state.
This brings up an interesting point, which is that it's likely that software developers overestimate the capabilities of LLMs in other domains because our own domain is so thoroughly documented on the internet.
From what I've been able to gather interacting with other sectors, it seems like software is pretty unique in having a culture of sharing everything—tools, documentation, best practices, tutorials, blogs—online for free. Most professions can't be picked up by someone learning on their own from nothing but the internet in the way that software can.
I strongly suspect that the result will be that LLMs (being trained on the internet) do substantially better on software related tasks than they do in other domains, but software developers may be largely blind to that difference since they're not experts in the other domain.
The ChatGPT site crossed 3B visits last month (For perspective - https://imgur.com/a/hqE7jia). It has been >2B since May this year and >1.5B since March 2023. The summer slump of last year? Completely gone.
Gemini and Character AI? A few hundred million. Claude? Doesn't even register. And the gap has only been increasing.
So, "just" brand recognition? That feels like saying Google "just" has brand recognition over Bing.
ChatGPT usage from the main site dwarfs API usage for both OpenAI and Anthropic, so we're not really saying different things here.
The vast majority of people using LLMs just use ChatGPT directly. Anthropic is doing fine for technical or business customers looking to offer LLM services in a wrapper but that doesn't mean they register in the public consciousness.
>Anthropic is doing fine for technical or business customers looking to offer LLM services in a wrapper
If there's an actual business to be found in all this, that's where it's going to be.
The consumer side of this bleeds cash currently and I'm deeply skeptical of enough of the public being convinced to pay subscription fees high enough to cover running costs.
Especially when Google is good enough for most people. Most people just want information, not someone to give them digested info at $x per month - plus all the fancy letter-writing assistance they already get for free via the corporate computer that likely has Microsoft Word.
If inference cost is so cheap and negligible, then we'll be able to run the models on an average computer. Which means they have no business model (assuming generosity from Meta to keep publishing Llama for free).
I think they mean running inference. Either more efficient/powerful hardware, or more efficient software.
No one thinks about the cost of a db query any more, but I'm sure people did back in the day (well, I suppose with cloud stuff, now people do need to think about it again haha)
I just used ChatGPT and 2 other similar services for some personal queries. I copy-pasted the same query in all 3 of them, using their free accounts, just in case one answer looks better than the others. I got into this habit because of the latency: in the time it takes for the first service to answer, I've had time to send the query to 2 others, which makes it easier to ignore the first response if it's not satisfying. Usually it's pretty much the same though. We can nitpick about benchmarks, but I'm not sure they're that relevant for most users anyway. It doesn't matter much to me whether something is wrong 10 or 20% of the time, in both cases I can only send queries for which I can easily check that the answer makes sense.
I see other comments mentioning they stopped their ChatGPT Plus subscription because the free versions work well enough. I've never paid myself and it doesn't look like I ever will, because things keep getting better for free anyway. My default workflow is already to prompt several LLMs so one could go down, I wouldn't even notice. I'm sure I'm an outlier with this, but still, people might use Perplexity for their searches, some WhatsApp LLM chatbot for their therapy session, purely based on convenience. There's no lock-in whatsoever into a particular LLM chat interface, and the 3B monthly visits don't seem to make ChatGPT better than its competitors.
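The lock-in is just as thin on the API side; here's a rough sketch of the kind of fan-out I mean (provider names, endpoints, keys and model names below are placeholders - many vendors expose an OpenAI-compatible chat endpoint, but check each provider's docs):

    import requests

    PROVIDERS = [  # placeholders - fill in real endpoints, models and keys
        {"name": "a", "url": "https://api.provider-a.example/v1/chat/completions", "key": "KEY_A", "model": "model-a"},
        {"name": "b", "url": "https://api.provider-b.example/v1/chat/completions", "key": "KEY_B", "model": "model-b"},
    ]

    def ask_all(prompt: str) -> dict:
        """Send the same prompt to every provider; any one of them going down is a non-event."""
        answers = {}
        for p in PROVIDERS:
            try:
                r = requests.post(
                    p["url"],
                    headers={"Authorization": f"Bearer {p['key']}"},
                    json={"model": p["model"], "messages": [{"role": "user", "content": prompt}]},
                    timeout=60,
                )
                r.raise_for_status()
                answers[p["name"]] = r.json()["choices"][0]["message"]["content"]
            except Exception as exc:
                answers[p["name"]] = f"<unavailable: {exc}>"
        return answers

Swapping a provider in or out is a one-line config change, which is the whole point: there is nothing here that ties you to any single vendor.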
And of course as soon as they add ads, product placement, latency or any other limitation their competitors don't have, I'll stop using them, and keep on using the other N instead. At this point it feels like they need Microsoft more than Microsoft needs them.
They probably lose on each one, but it's the same with their competitors.
FWIW, regular folks now say "let me ask Chat" for what it used to be "let me Google that"; that is a huge cultural shift, and it happened in only a couple years.
> FWIW, regular folks now say "let me ask Chat" for what it used to be "let me Google that"
I have literally never heard that from anyone, and most everyone I know is “regular folk”.
I work in (large scale) construction, and no one has ever said anything even remotely similar. None of my non-technical or technical business contacts.
I’m not saying you haven’t, and that your in-group doesn’t, just that it’s not quite the cultural phenomenon you’re suggesting.
It just so happened to coincide with Google delivering terrible results. I used to be able to find what I wanted but now the top results only loosely correlate with the search. I’m sure it works for most people’s general searches but it doesn’t work for me.
MySpace and Digg dug their own graves though. MySpace had a very confusing UX and Digg gave more control to advertisers. As long as OpenAI doesn't make huge mistakes they can hold on to their market share.
The moat is bigger on MySpace and Digg though since you have user accounts, karma, userbases. The thing with chatbots is I can just as easily move to a different one, I have no history or username or anything and there is no network effect. I don't need all my friends to move to Gemini or Claude, I don't have any friends on OpenAI, it's just a prompt I can get anywhere.
Digg just wasn't big enough. Once these networks get to a certain size they're unkillable. Look at all the turmoil reddit went through, a hated redesign, killed 3rd party apps, a whole protest movement, none of it mattered. People bring up digg and friendster but that was 20 years ago when these networks were way smaller. No top 10 social network has died since then.
Reddit had a much better system for commentary, as opposed to just reacting to URLs.
Sure, you could comment on Digg, but it was a pain and not good for conversations, and that meant there was less to keep people around when it seemed like the company was starting to put its finger on the scales for URL submissions.
It wasn't a pain on Digg, and it was equally good at conversations.
Reddit did not win due to its features; it won because Digg said it doesn't matter what the users think, we will redesign the site and change how it works regardless of the majority telling us they don't want it.
OpenAI's revenue isn't from advertising, so it should be slightly easier for them to resist the call of enshittification this early in the company's history.
OpenAI can become a bigger advertising company than Google.
When people ask questions like "which product should I buy?", ChatGPT can recommend products from companies willing to pay to have their products recommended by AI.
This will only work if they can ensure the product that they promote is, in fact, good. Google makes it very clear that what you are seeing is popular (or is a paid ad), but they don't endorse it. ChatGPT is seen as an assistant for many, and if they start making bad recommendations, things can go bad fast.
As model performance converges, it becomes the strongest moat. Why go to Claude for a marginally better model when you have the ChatGPT app downloaded and all your chat history there?
I actually pre-emptively deleted ChatGPT and my account recently as I suspect that they're going to start aggressively putting ads and user tracking into the site and apps to build revenue. I also bet that if they do go through with putting ads into the app that daily user numbers will drop sharply - one of ChatGPT's biggest draws is its clean, no-nonsense UX. There are plenty of competitors that are as good as o1 so I have lots of choices to jump ship to.
And some of them will be from poisoned data, not just an explicit prompt by the site-owner. A whole new form of spam--excuse me--"AI Engine Optimization."
Google search is free. I suspect OpenAI may have to start charging for ChatGPT at some point so they stop hemorrhaging money. Customers who are opening their wallet might shop around for other offerings.
While I recognize this, I have to assume that the other "big players" already have this same data... i.e. anyone with a search engine that's been crawling the web for decades. New entrants to the race? Not so much - for them it's new walls and such.
That gives the people who've already started an advantage over newcomers, but it's not a unique advantage to OpenAI.
The question really should be what if anything gives OpenAI an advantage over Anthropic, Google, Meta, or Amazon? There are at least four players intent on eating OpenAI's market share who already have models in the same ballpark as OpenAI. Is there any reason to suppose that OpenAI keeps the lead for long?
I think their current advantage is willingness to risk public usage of frontier technology. This has been and I predict will be their unique dynamic. It forced the entire market to react, but they are still reacting reluctantly. I just played with Gemini this morning for example and it won't make an image with a person in it at all. I think that is all you need to know about most of the competition.
I think Anthropic is a serious technical competitor and I personally use their product more than OpenAI, BUT again I think their corporate cautiousness will have them always +/- a small delta from OpenAI's models. I just don't see them taking the risk of releasing a step function model before OpenAI or another competitor. I would love to be proven wrong. I am a little curious if the market pressures are getting to them since they updated their "Responsible Scaling Policy".
From what I've seen, Claude Sonnet 3.5 is decidedly less "safe" than GPT-4o, by the relatively new politicized understanding of "safety".
Anthropic takes safety to mean "let's not teach people how to build thermite bombs, engineer grey goo nanobots, or genome-targeted viruses", which is the traditional futurist concern with AI safety.
OpenAI and Google safety teams are far more concerned with revising history, protecting egos, and coddling the precious feelings of their users. As long as no fee-fees are hurt, it's full speed ahead to paperclip maximization.
This has not been my experience. Twice in the last week I've had Claude refuse to answer questions about a specific racial separatist group (nothing about their ideology, just their name and facts about their membership) and questions about unconventional ways to assess job candidates. Both times I turned to ChatGPT and it gave me an answer immediately
Not to dispute your particular comment, which I think is right, but it's worth pointing out we're full steam ahead on paperclips regardless of any AI company. This has been true for some 300 years, longer depending how flexible we are with definitions and where we locate inflection points
Well, at this point most new data being created is conversations with ChatGPT, seeing as how Stack Overflow and Reddit are increasingly useless, so their conversation logs are their moat.
Google and Meta aren't exactly lacking in conversation data: Facebook, Messenger, Instagram, Google Talk, Google Groups, Google Plus, Blogspot comments, YouTube transcripts, etc. The breadth and depth of data those 2 companies are sitting on, going back for years, is mind-boggling.
Getting to market first is obviously worth something but even if you're bullish on their ability to get products out faster near term, Google's going to be breathing right down their neck.
They may have some regulatory advantages too, given that they're (sort of) not a part of a huge vertically integrated tech conglomerate (i.e. they may be able to get away with some stuff that Google could not).
I don't know if this is going to emerge as a monopoly, and likely won't, but for whatever reason, openai and anthropic have been several months ahead of everyone else for quite some time.
I think the perception that they're several months ahead of everyone is also a branding achievement: They are ahead on Chat LLMs specifically. Meta, Google, and others crush OpenAI on a variety of other model types, but they also aren't hyping their products up to the same degree.
Segment Anything 2 is fantastic - but less mysterious because it's open source. NotebookLM is amazing, but nobody is rushing to create benchmarks for it. AlphaFold is never going to be used by consumers the way ChatGPT is.
OpenAI is certainly competitive, but they also work overtime to hype everything they produce as "one step closer to the singularity" in a way that the others don't.
>Meta, Google, and others crush OpenAI on a variety of other model types, but they also aren't hyping their products up to the same degree.
They aren't letting anyone external have access to their top end products either. Google invented transformers and kept the field stagnant for 5 years because they were afraid it would eat into their search monopoly.
OpenAI is 80% product revenue and 20% API revenue. Anthropic is 40/60 in the other direction, but Mike Krieger is now CPO and trying to change that. Amazon is launching a paid version of Alexa. Google is selling their Gemini assistant (which is honestly okay) and NotebookLM is a great product. Meta hasn't built a standalone AI product that you can pay for yet.
The combination of the latest models in products that people want to use is what will drive growth.
Not sure why this has been voted down - X.ai has a 100K H100 cluster in Memphis, and Meta either has (by now) or is in process of acquiring 350K H100s!
Unlike the hyperscalers (i.e. cloud providers), Meta has a use for these themselves for inference to run their business on.
My 8-year-old knows what ChatGPT is but has never heard of any other LLM (or OpenAI, for that matter). They're all "ChatGPT" to my kid, in the same way searching the internet is "googling" (with no awareness of Bing, DDG or any other search engine).
I think it shows really well how OpenAI was caught off guard when ChatGPT got popular and proved to be unexpectedly useful for a lot of people. They just gave it a technical name for what it was: a Generative Pre-trained Transformer model that was fine-tuned for chat-style interaction. If they had any plans on making a product close to what it is today, they would have given it a catchier name. And now they're kind of stuck with it.
Well, they can't come up with version names that stand out in any way, so I don't expect them to give their core product a better name anytime soon. I wish they would spend a little time on this, but I guess they are too busy building?
Every time I ask myself this, OpenAI comes up with something new and groundbreaking and other companies play catch-up. The last was the Realtime API. What are they doing right? I don't know.
OpenAI is playing catch-up of their own. The last big announcement they had was "we finally built Artifacts".
This is what happens when there's vibrant competition in a space. Each company is innovating and each company is trying to catch up to their competitors' innovations.
It's easy to limit your view to only the places where OpenAI leads, but that's not the whole picture.
Up front: I have always hated Facebook, from a “consumer” perspective. Good on everyone who made money, etc. I dislike the entire entity, to say the least.
I can't shake the thought that Meta played an integral role in the open-source nature of the LLM movement. Am I wrong? I can't help but think I'm missing something.
I used to think it was significantly better than most other players, but it feels like everyone else has caught up. Depending on the use case, they have been surpassed as well. I use Perplexity for a lot of things I would have previously used ChatGPT for, mostly because it gives sources with its responses.
It's possible that it's only one strong personality and some money away, but my guess is that OpenAI-rosoft have the best stack for doing inference "seriously" at big, big scale, e.g. moving away from hacky research Python code and so on.
I'm not so sure about that. They have kind of the opposite incentives to OpenAI. OpenAI, starting without much money, had to hype the "AGI next year" stuff to get billions given to them. Google, on the other hand, is in such a dominant position - most of the search market, much of the ad market, ownership of DeepMind, huge amounts of data and money and so on - that they probably don't want to be seen as a potential monopoly to be broken up.
As others have said I would say first-mover/brand advantage is the big one. Also their o1 model does seem to have some research behind it that hasn't been replicated by others. If you're curious about the latter claim, here's a blog I wrote about it: https://www.airtrain.ai/blog/how-openai-o1-changes-the-llm-t...
Nothing which other companies couldn't catch up with if OpenAI would break down / slow down for a year (i.e. because they lost their privileged access to computing resources).
Engineers would quit and start improving the competition. They're still a bit fragile, in my view.
Not really sure since this space is so murky due to the rapid changes happening. It's quite hard to keep track of what's in each offering if you aren't deep into the AI news cycle.
Now personally, I've left the ChatGPT world (meaning I don't pay for a subscription anymore) and have been using Claude from Anthropic much more often for the same tasks, it's been better than my experience with ChatGPT. I prefer Claude's style, Artifacts, etc.
Also been toying with local LLMs for tasks that I know don't require a multi-hundred billion parameters to solve.
Claude is great except for the fact the iOS app seems to require a login every week. I’ve never had to log into ChatGPT but Claude requires a constant login and the passwordless login makes it more of a pain!
I also like 3.5 Sonnet as the best model (best UI too) and it's the one I ask questions to.
We use Gemini Flash in prod. The latency and cost are just unbeatable - our product uses LLMs for lots of simple tasks, so we don't need a frontier model.
One hypothetical advantage could be secret agreements / cooperation with certain agencies. That may help influence policy in line with OpenAI's preferred strategy on safety, model access etc.
If Microsoft and OpenAI split up, can't Microsoft keep the house, the car, and the kids?
> One particular thing to note is that Brockman stated that Microsoft would get access to sell OpenAI's pre-AGI products based off of [OpenAI's research] to Microsoft's customers, and in the accompanying blog post added that Microsoft and OpenAI were "jointly developing new Azure AI supercomputing technologies."
> Pre-AGI in this case refers to anything OpenAI has ever developed, as it has yet to develop AGI and has yet to get past the initial "chatbot" stage of its own 5-level system of evaluating artificial intelligence.
For folks who are skeptical about OpenAI's potential, I think Brad Gerstner does a really good job representing the bull case for them (his firm Altimeter was a major investor in their recent round).
- They reached their current revenue of ~$5B about 2.5 years faster than Google and about 4.5 years faster than Facebook
- Their valuation to forward revenue (based on current growth) is in line with where Google and Facebook IPO'd
Does OpenAI's revenue per user increase with each new user? I don't think so, but it was definitely the case with Google and Facebook. That's a big difference that you cannot overlook.
Why would the relationship fray, when Microsoft has OpenAI by the balls? They just don't own it "legally" due to monopoly concerns from the DOJ. But they own it. Remember, Microsoft is entitled to 49% of OpenAI's profits till they recoup their investment.
That's like the protection fee you pay to the mafia boss.
Let that sink in for anyone who has incorporated ChatGPT into their work routines to the point their normal skills start to atrophy. Imagine in 2 years' time OpenAI goes bust and MS gets all the IP. Now you can't really do your work without ChatGPT, but its cost has been brought up to how much it really costs to run. Maybe $2k per month per person? And you get about 1h of use per day for that money too...
I've been saying for ages: being a Luddite and abstaining from using AI is not the answer (no one is tilling the fields with oxen anymore either). But it is crucial to at the very least retain locally 50% of the capability that hosted models like ChatGPT offer.
The marginal cost of inference per token is lower than what OpenAI charges you (IIRC about 2x cheaper), they make a loss because of the enormous costs of R&D and training new models.
It’s not clear this is true because reported numbers don’t disaggregate paid subscription revenue (certainly massively GP positive) vs free usage (certainly negative) vs API revenue (probably GP negative).
Most of their revenue is the subscription stuff, which makes it highly likely they lose money per token on the API (not surprising, as they are in a price war with Google et al).
If you have an enterprise ChatGPT sub, you have to consume around 5 million tokens a month to match the cost of using the API on GPT4o. At 100 words per minute, that's 35 days of continuous typing, which shows how ridiculous the costs of API vs subscription are.
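Rough math behind that comparison, treating a token as roughly a word for the ballpark, and assuming about $50/seat/month for the enterprise plan and ~$10 per 1M GPT-4o output tokens (both assumptions, since enterprise pricing isn't public):

    seat_price_per_month = 50.0        # assumed enterprise ChatGPT cost per seat (not published)
    api_price_per_million = 10.0       # GPT-4o output tokens, list price quoted downthread

    breakeven_tokens = seat_price_per_month / api_price_per_million * 1_000_000   # ~5M tokens/month

    words_per_minute = 100
    minutes = breakeven_tokens / words_per_minute      # token ~= word, very roughly
    days_of_continuous_typing = minutes / 60 / 24      # ~35 days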
In summary, the original point of this thread is wrong. There’s essentially no future where these tools disappear or become unavailable at reasonable cost for consumers. Much more likely is they get way better.
OpenAI’s potential issue is that if Google offers tokens at a 10% gross margin, OpenAI won’t be able to offer api tokens at a positive gross margin at all. Their only chance really is building a big subscription business. No way they can compete with a hyperscaler on api cost long run
This says 506 tokens/second for Llama 405B on a machine with 8x H200s, which you can rent for $4/GPU/hour, so probably $40/hour for a server with enough GPUs. And so it can do ~1.8M tokens per hour. OpenAI charges $10/1M output tokens for GPT4o. (Input tokens and cached tokens are cheaper, but this is just a ballpark estimate.) So if it were 405B it might cost $20/1M output tokens.
Now, OpenAI is a little vague, but they have implied that GPT4o is actually only 60B-80B parameters. So they're probably selling it with a reasonable profit margin assuming it can do $5/1M output tokens at approximately 100B parameters.
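Writing out that ballpark (all inputs are the rough figures above; the ~100B scaling is a crude linear guess, not anything OpenAI has confirmed):

    tokens_per_second = 506                  # quoted Llama 405B throughput on 8x H200
    server_cost_per_hour = 40.0              # ~$4-5/GPU/hour x 8 GPUs, rounded

    tokens_per_hour = tokens_per_second * 3600                              # ~1.8M
    cost_per_million_405b = server_cost_per_hour / (tokens_per_hour / 1e6)  # ~$22 per 1M output tokens

    # If GPT-4o really is in the ~100B-parameter range, scale the cost roughly linearly with size:
    cost_per_million_100b = cost_per_million_405b * 100 / 405               # ~$5.4 per 1M output tokens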
And even if they were selling it at cost, I wouldn't be worried because a couple years from now Nvidia will release H300s that are at least 30% more efficient and that will cause a profit margin to materialize without raising prices. So if I have a use case that works with today's models, I will be able to rent the same thing a year or two from now for roughly the same price.
> The marginal cost of inference per token is lower than what OpenAI charges you
Unlike most Gen AI shops, OpenAI also incurs a heavy cost for training base models gunning for SoTA, which involves drawing power from a literal nuclear reactor inside data centers.
This is fascinating to think about. Wonder what kind of shielding/environmental controls/all other kinds of changes you'd need for this to actually work. Would rack-sized SMR be contained enough not to impact anything? Would datacenter operators/workers need to follow NRC guidance?
It makes zero sense to build them in datacenters and I don’t know of any safety authority that would allow deploying reactors without serious protection measures that would at the very least impose a different, dedicated building.
At some point it does make sense to have a small reactor powering a local datacenter or two, however. Licensing would still be not trivial.
I think the simple answer is that it doesn't make sense. Nuclear power plants generate a byproduct that inherently limits the performance of computers: heat. Having either a cooling system, reactor or turbine located inside a datacenter is immediately rendered pointless because you end up managing two competing thermal systems at once. There is no reason to localize a reactor inside a datacenter when you could locate it elsewhere and pipe the generated electricity into it via preexisting high voltage lines.
> Nuclear power plants generate a byproduct that inherently limits the performance of computers: heat.
The reactor does not need to be in the datacenter. It can be a couple hundred meters away; bog-standard cables would be perfectly able to move the electrons. The cables being 20m or 200m long does not matter much.
You’re right though, putting them in the same building as a datacenter still makes no sense.
Where are you getting $2k/person/month? ChatGPT allegedly has on the order of 100 million users. Divide the $5B deficit by that and you get $50 per person per year. Meaning they could raise their prices by less than four and a half dollars per user per month to break even.
Even if they were to only gouge the current ~11 million paying subscribers, that's around $40/person/month over current fees to break even. Not chump change, but nowhere close to $2k/person/month.
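For reference, the arithmetic behind both of those figures, using the ~$5B annual loss and the user counts mentioned above (reported estimates, not audited numbers):

    annual_deficit = 5e9        # ~$5B/year loss, as reported
    total_users = 100e6         # ~100M users
    paying_subs = 11e6          # ~11M paying subscribers

    per_user_per_month = annual_deficit / total_users / 12    # ~$4.2/month spread across everyone
    per_sub_per_month = annual_deficit / paying_subs / 12     # ~$38/month if only subscribers cover it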
What you're suggesting is the basic startup math for any typical SaaS business. The problem is OpenAI and the overall AI space is raising funding on the promise of being much more than a SaaS. If we ignore all the absurd promises ("it'll solve all of physics"), the promise to investors is distilled down to this being the dawn of a new era of computing and investors have responded by pouring in hundreds of billions of dollars into the space. At that level of investment, I sure hope the plan is to be more than a break-even SaaS.
My apologies. The “break even” calculation in the first paragraph seemed so absurd to me (as much as the $2K per month) that I must have skipped over the second paragraph. Nevertheless, with regard to the second paragraph, to fill the $5B gap it would take over $30 on top (hence $50), which would be quite a high price for many users.
I think the question is more how much the market will bear in a world where MS owns the OpenAI IP and it's only available as an Azure service. That's a different question from what OpenAI needs to break even this year.
So 3x the fees, if they're currently at $20/user/month. That's a big jump, and puts the tool in a different spending category as it goes from just another subscription to more like another utility bill in users' minds. The amount of value you're getting out of it is hard to quantify for most people, so I imagine they'd lose customers.
Also there's a clear market trend, and that is that AI services are $20 for the good version, or free. $60 is not a great price to compete in that market at unless you're clearly better.
> being a luditite and abstaining from using AI is not the answer
Hum... The jury is still out on that one, but the evidence is piling up on the side of "yes, not using it is what works best". Personally, my experience is strongly negative, and I've seen other people get very negative results from it too.
Maybe it will improve so much that at some point people actually get positive value from it. My best guess is that we are not there yet.
Yeah, I agree. It's not "being a Luddite" to take a look and conclude that the tool doesn't actually deliver the value it claims to. When AI can actually reliably do the things its proponents say it can do, I'll use it. But as of today it can't, and I have no use for tools that only work some of the time.
Cost tends to go down with time as compute becomes cheaper. And as long as there is competition in the AI space it's likely that other companies would step in and fill the void created by OpenAI going belly up.
> Cost tends to go down with time as compute becomes cheaper.
This is generally true but seems to be, if anything, inverted for AI. These models cost billions to train in compute, and OpenAI thus far has needed to put out a brand new one roughly annually in order to stay relevant. This would be akin to Apple putting out a new iPhone that cost billions to engineer year over year, but giving the things away for free on the corner and only asking for money for the versions with more storage and what have you.
The vast majority of AI adjacent companies too are just repackaging OpenAI's LLMs, the exceptions being ones like Meta, which certainly has a more solid basis what with being tied to an incredibly profitable product in Facebook, but also... it's Meta and I'm sure as shit not using their AI for anything, because it's Meta.
I did some back-of-napkin math in a comment a ways back and landed on the conclusion that, in order to break even merely on training costs, not including the rest of the expenditure of the company, they would need to charge all of their current subscribers $150 per month, up from... I think the most expensive right now is about $20? So nearly an 8-fold price increase, with no attrition, just to break even. And I'm guessing all these investors they've had are not interested in merely breaking even.
This reasoning about the subscription price etc is undermined by the actual prices OpenAI are charging -
The price of a model capable of 4o-mini-level performance used to be 100x higher.
Yes, literally 100x. The original "davinci" model (and I paid five figures for using it throughout 2021-2022) cost $0.06/1k tokens.
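Worked out, with the GPT-4o mini output price taken as roughly $0.60 per 1M tokens (an assumption based on list prices at the time of writing, which change often):

    davinci_per_million = 0.06 * 1000     # $0.06 per 1k tokens  ->  $60 per 1M tokens
    mini_per_million = 0.60               # assumed GPT-4o mini output price per 1M tokens
    ratio = davinci_per_million / mini_per_million   # ~100x cheaper today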
So it's not inverting in running costs (which are the thing that will kill a company). Struggling with training costs (which is where you correctly identify OpenAI is spending) will stop growth perhaps, but won't kill you if you have to pull the plug.
I suspect subscription prices are based on market capture and perceived customer value, plus plans for training, not running costs.
> So it's not inverting in running costs (which are the thing that will kill a company). Struggling with training costs (which is where you correctly identify OpenAI is spending) will stop growth perhaps, but won't kill you if you have to pull the plug.
I don't think it's that cut and dried though. Many users run into issues with things like reasoning (which is (allegedly) being addressed) and hallucinations (less so), and those flaws in turn become the core reasons to want subsequent, better versions of the tech. Whether the subsequent versions deliver on those promises is irrelevant (though they often don't); the promise itself, at least IMHO, is a core reason to "stay on board" with the product. I have to think that if they announced tomorrow they couldn't afford to train the next one, there would be a pretty substantial attrition of paying users, which then makes it even harder to resume training in the future, no?
The closest analog seems to be bitcoin mining, which continually increases difficulty. And if you've ever researched how many bitcoin miners go under...
It's nothing like bitcoin mining. Bitcoin mining is intentionally designed so that it gets harder as people use it more, no matter what.
With LLMs, if you have a use case which can run on an H100 or whatever and costs $4/hour, and the LLM has acceptable performance, it's going to be cheaper in a couple years.
Now, all these companies are improving their models but they're doing that in search of magical new applications the $4/hour model I'm using today can't do. If the $4/hour model works today, you don't have to worry about the cost going up. It will work at the same price or cheaper in the future.
But OpenAI has to keep releasing new ever-increasing models to justify it all. There is a reason they are talking about nuclear reactors and Sam needing 7 trillion dollars.
One other difference from Bitcoin is that the price of Bitcoin rises to make it all worth it, but we have the opposite expectation with AI where users will eventually need to pay much more than now to use it, but people only use it now because it is free or heavily subsidized. I agree that current models are pretty good and the price of those may go down with time but that should be even more concerning to OpenAI.
> But OpenAI has to keep releasing new ever-increasing models to justify it all.
There seems to be some renewed interest for smaller, possibly better-designed LLMs. I don’t know if this really lowers training costs, but it makes inference cheaper. I suspect at some point we’ll have clusters of smaller models, possibly activated when needed like in MoE LLMs, rather than ever-increasing humongous models with 3T parameters.
I tend to think along the same lines. If they were the only player in town it would be different. I am also not convinced $5billion is that big of a deal for them, would be interesting to see their modeling but it would be a lot more suspect if they were raising money and increasing the price of the product. Also curious how much of that spend is R&D compared to running the system.
And Zuckerberg has vowed to pump billions more into developing and releasing more Llama. I believe "Altman declaring AGI is almost here" was peak OpenAI and now I will just have some popcorn ready.
I found those tools to resemble an intern: they can do some tasks pretty well, when explained just right, but others you'd spend more time guiding than it would have taken you to do it yourself.
And rarely can you or the model/intern tell ahead of time which tasks are in each of those categories.
The difference is, interns grow and become useful in months: the current rate of improvements in those tools isn't even close to that of most interns.
I have a slightly different view. IMHO LLMs are excellent rubber ducks or pair programmers. The rate at which I can try ideas and get them back is much higher than what I would be doing by myself. It gets me unstuck in places where I might have spent the best part of a day in the past.
My experience differs: if anything, they get me unstuck by trying to shove bad ideas at me, which allows me to realize "oh, that's bad, let's not do that". But it's also extremely frustrating, because with a stream of bad ideas from a human there's some hope they'll learn, but here I know I'll get the same BS, only with an annoying and inhumane apology boilerplate.
It's premature to think you can replace a junior developer with current technology, but it seems fairly obvious that it'll be possible within 5-10 years at most. We're well past the proof-of-concept stage IMO, based on extensive (and growing) personal experience with ML-authored code. Anyone who argues that the traditional junior-developer role isn't about to change drastically is whistling past the graveyard.
Your C-suite execs are paid to skate where that particular puck is going. If they didn't, people would complain about their unhealthy fixation on the next quarter's revenue.
Of course, if the junior-developer role is on the chopping block, then more experienced developers will be next. Finally, the so-called "thought leaders" will find themselves outcompeted by AI. The ability to process very large amounts of data in real time, leveraging it to draw useful conclusions and make profitable predictions based on ridiculously-large historical models, is, again, already past the proof-of-concept stage.
Unless I’ve missed some major development then I have to strenuously disagree. AI is primarily good at writing isolated scripts that are no more than a few pages long.
99% of the work I do happens in a large codebase, far bigger than anything that you can feed into an AI. Tickets come in that say something like, “Users should be able to select multiple receipts to associate with their reports so long as they have the management role.”
That ticket will involve digging through a whole bunch of files to figure out what needs to be done. The resolution will ultimately involve changes to multiple models, the database schema, a few controllers, a bunch of React components, and even a few changes in a micro service that’s not inside this repo. Then the AI is going to fail over and over again because it’s not familiar with the APIs for our internal libraries and tools, etc.
AI is useful, but I don’t feel like we’re any closer to replacing software developers now than we were a few years ago. All of the same showstoppers remain.
All of the code you mention implements business logic, and you're right, it's probably not going to be practical to delegate maintenance of existing code to an ML model. What will happen, probably sooner than you think, is that that code will go away and be replaced by script(s) that describe the business logic in something close to declarative English. The AI model will then generate the code that implements the business logic, along with the necessary tests.
So when maintenance is required, it will be done by adding phrases like "Users should be able to select multiple receipts" to the existing script, and re-running it to regenerate the code from scratch.
Don't confuse the practical limitations of current models with conceptual ones. The latter exist, certainly, but they will either be overcome or worked around. People are just not as good at writing code as machines are, just as they are not as good at playing strategy games. The models will continue to improve, but we will not.
The problem is, the feature is never actually "users should be able to select multiple receipts". It's "users should be able to select multiple receipts, but not receipts for which they only have read access and not write access, and not when editing a receipt, and should persist when navigating between the paginated data but not persist if the user goes to a different 'page' within the webapp. The selection should be a thick border around the receipt, using the webapp selection color and the selection border thickness, except when using the low-bandwidth interface, in which case it should be a checkbox on the left (or on the right if the user is using a RTL language). Selection should adhere to standard semantics: shift selects all items from the last selection, ctrl/cmd toggles selection of that item, and clicking creates a new, one-receipt selection. ..." By the time you get all that, it's clearer in code.
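For what it's worth, here's a minimal sketch of just the first couple of those rules (all names invented for illustration, nothing here is anyone's real codebase) - and it already leaves most of the spec as comments:

    from dataclasses import dataclass, field

    @dataclass
    class Receipt:
        id: str
        writable: bool                  # user has write access, not just read

    @dataclass
    class SelectionState:
        selected: set = field(default_factory=set)
        editing: bool = False           # selection is disabled while editing a receipt

        def click(self, receipt: Receipt, shift=False, ctrl=False):
            if self.editing or not receipt.writable:
                return                              # read-only receipts can't be selected
            if ctrl:
                self.selected ^= {receipt.id}       # toggle just this receipt
            elif not shift:
                self.selected = {receipt.id}        # plain click: new single-item selection
            # shift-range selection, pagination persistence, low-bandwidth/RTL
            # rendering, etc. are all still unspecified here

Every one of those trailing comments has to become real behavior before the ticket is done, which is exactly why the prose spec never stays short.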
I will observe that there have been at least three natural-language attempts in the past, none of which succeeded in being "just write it down". COBOL is just as code-y as any other programming language. SQL is similar, although I know a fair number of non-programmers who can write SQL (but then, back in the day my Mom taught me about autoexec.bat, and she couldn't care less about programming). Anyway, SQL is definitely not just adding phrases and it just works. Finally, Donald Knuth's WEB is a mixture, more like a software blog entry, where you put the pieces of the software in amongst the explanatory writeup. It has caught on even less, unless you count software blogs.
1 month's salary now, or then? You'll be 5 years further into your career, so it'll hopefully be higher, but also, the industry is changing. Even if ChatGPT-5 never comes out, it's already making waves in developer productivity where there's enough training data. So in five years will it still be a highly paid $300k/yr FAANG position, or will it pay more like being a line cook at a local diner? Or maybe it'll follow the pay rate for musicians - trumpet players made a decent living before cheap records came out. Since then, the rise of records, and then radio and CDs and now the Internet and Spotify, means that your local bar doesn't need to have a person come over to play music in order to have music. Or visuals, for that matter. The sports bar wouldn't exist without television. So maybe programming will be like being a musician in five years, with some making Taylor Swift money, and others busking at subway entrances. I'm hoping it'll still be a highly paid position, but it would be foolish of me not to see how easy it is to make an app by sitting down with Claude, giving it some high-level directives, and iterating.
Using an advanced programming technique called modularization, where you put the code into multiple different files, you may find it possible to get around the LLM's problem of limited context window length and find success building more than a trivial todo app. Of course you'd have to try this for yourself instead of parroting what you read on the Internet, so your mileage may vary. =p
Cursor has no problem making complicated PRs spanning multiple files and modules in my legacy spaghetti code. I wouldn't be surprised if it could replace most programmers already.
From your comments, it's clear you've already made up your mind that it can't possibly be true and you're just trying to find rationalisations to support your narrative. I don't understand why you feel the need to be rude about it though.
My comments stem from real world experience, since I do exist outside of my comments (although I understand it can be hard to imagine).
Every single person I've encountered who claimed AI was a great help in their job writing software was either inexperienced (regardless of age) or working solely on very simple tasks.
The fact that it's even remotely useful at all at such an early, primitive stage of development should give you pause.
When it comes to stuff like this, the state of the art at any given time is irrelevant. Only the first couple of time derivatives matter. How much room for growth do you have over the next 5-10 years?
You would think thought leaders would be the first to be replaced by AI.
> The ability to process very large amounts of data in real time, leveraging it to draw useful conclusions and make profitable predictions based on ridiculously-large historical models, is, again, already past the proof-of-concept stage.
I think Enterprise plans mostly solve this. And Copilot is quite aggressive with blocking public code (I haven't looked into what that really means or what we've configured; I just get the error often).
The cost of compute for current versions of ChatGPT will have dropped through the floor in 2 years, due to processing improvements and on-die improvements to silicon.
Power requirements will drop too.
As well, as adoption grows, training costs will be amortized over an ever-increasing market of licensing sales.
Looking at the cost today, and sales today in a massively, rapidly expanding market, is not how to assess costs tomorrow.
I will say one thing: those that need GPT to code will be the first to go. Becoming a click-clicker, just passing on ChatGPT output, will relegate those people to minimum wage.
We already have some of this sort, those that cannot write a loop in their primary coding language without stackoverflow, or those that need an IDE to fill in correct function usage.
Those who code in vi, while reading manpages need not worry.
> Those who code in vi, while reading manpages need not worry.
That sounds silly at first read, but there are indeed people who are so stubborn that they still use numbered zip files on a USB flash drive instead of source control systems, or prefer their own scheduler over an RTOS.
They will survive, they fill a niche, but I would not say they can do full stack development or be even easy to collaborate with.
> We already have some of this sort, those that cannot write a loop in their primary coding language without stackoverflow, or those that need an IDE to fill in correct function usage.
> Those who code in vi, while reading manpages need not worry
I think that's the wrong dichotomy: LLMs are fine at turning man pages into working code. In huge codebases, LLMs do indeed lose track and make stuff up… but that's also where IDEs giving correct function usage is really useful for humans.
The way I think we're going to change, is that "LGTM" will no longer be sufficient depth of code review: LLMs can attend to more than we can, but they can't attend as well as we can.
And, of course, we will be getting a lot of LLM-generated code, and having to make sure that it really does what we want, without surprise side-effects.
As I recall, while Amazon was doing this, there was no comparable competition from other vendors that properly understood the internet as a marketplace? Closest was eBay?
There is real competition now that plenty of big box stores' websites also list things you won't see in the stores themselves*, but then Amazon is also making a profit now.
I think the current situation with LLMs is a dollar auction, where everyone is incentivised to pay increasing costs to outbid the others, even though this has gone from "maximise reward" to "minimise losses": https://en.wikipedia.org/wiki/Dollar_auction
* One of my local supermarkets in Germany sells 4-room "garden sheds" that are substantially larger than the apartment I own in the UK: https://www.kaufland.de/product/396861369/
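To make the dollar-auction point above concrete, here is a toy sketch of my own (made-up step size and budget, not taken from the linked article): the prize is worth 100 cents, both the top bidder and the runner-up pay their final bids, and each round a bidder myopically prefers topping the rival by a nickel over forfeiting what they have already sunk.

    # Toy dollar-auction escalation (hypothetical numbers, in cents to avoid float noise).
    PRIZE, STEP, BUDGET = 100, 5, 500

    bids = {"A": 0, "B": 0}
    bidder, rival = "A", "B"
    while True:
        next_bid = bids[rival] + STEP
        if next_bid > BUDGET:
            break  # out of money; the rival "wins", but both are deep underwater
        # Myopic comparison: dropping out forfeits my current bid for sure, while raising
        # costs at most (next_bid - PRIZE) if I win, which looks like the smaller loss each round.
        bids[bidder] = next_bid
        bidder, rival = rival, bidder

    print(bids)  # both end up bidding far past the 100-cent prize

The same minimise-losses logic is what the comment above ascribes to LLM vendors: each new round of spending looks cheaper than writing off what has already been spent.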
And for every Amazon, there are a hundred other companies that went out of business because they never could figure out how to turn a profit. You made a bet which paid off and that's cool, but that doesn't mean the people telling you it was a bad bet were wrong.
Why does everyone always like to compare every company to Amazon? Those companies are never like Amazon, which is one of the most entrenched companies ever.
While I agree the comparison is not going to provide useful insights, in fairness to them Amazon wasn't entrenched at the time they were making huge losses each year.
Being a luddite has its advantages, as you won't succumb to the ills of a society trying to push you there. To believe that it's inevitable LLMs will be required for work is silly in my opinion. As these corps eat more and more of the goodwill of the content on the internet for only their own gain, people will defect from it, and some already have. Many of my coworkers have shut off Copilot, though they still occasionally use ChatGPT. But since the power really only amounts to adding randomization to established working document templates, the gain is only a small amount of working time.
There are also active and passive efforts to poison the well. As LLMs are used to output more content and displace people, the LLMs will be trained on the limited regurgitation available to the public (passive). Then there are the people intentionally creating bad content to be ingested (active). It really is a losing game for big-service LLM companies as the local models become more and more good enough.
I used to be concerned with this back when GPT4 originally came out and was way more impressive than the current version and OpenAI was the only game in town.
But nowadays GPT has been quantized and cost-optimized to the point that it's no longer as useful as it was, and compared with Claude or Gemini or whatever, it's no longer noticeably better, so it doesn't really matter what happens with their pricing.
Are you saying they reduced the quality of the model in order to save compute? Would it make sense for them to offer a premium version of the model at at a very high price? At least offer it to those willing to pay?
It would not make sense to reduce output quality only to save on compute at inference; why not offer a premium (and perhaps slower) tier?
Unless the cost is at training time, maybe it would not be cost-effective for them to keep a model like that up to date.
As you can tell I am a bit uninformed on the topic.
Yeah, as someone who had access to GPT-4 early in 2023, the endpoint used to take over a minute to respond and the quality of the responses was mind-blowing. Simply too expensive to serve at scale, not to mention the silicon constraints, which are even more prohibitive when the organization needs to lock up a lot of its compute for training The Next Big Model. That's a lot of compute that can't be on standby for serving inference.
Is anyone using it to the point where their skills start to atrophy? I use it fairly often but mostly for boilerplate code or simple tasks. The stuff that has specific syntax that I have to look up anyway.
That feels like saying that using spell check or autocomplete will make one's spelling abilities atrophy.
It's a devil's bargain, and not just in terms of the _individual_ payoffs that OpenAI employees/executives might receive. There's a reason why Google/Microsoft/Amazon/... ultimately failed to take the lead in GenAI, despite every conceivable advantage (researchers, infrastructure, compute, established vendor relationships, ...). The "autonomy" of a startup is what allows it to be nimble; the more Microsoft is able to tell OpenAI what to do, the more I expect them to act like DeepMind, a research group set apart from their parent company but still beholden to it.
> If you disagree, I would argue you have a very sad view of the world, where truth and cooperation are inferior to lies and manipulation.
You're holding everyone to a very simple, very binary view with this. It's easy to look around and see many untrustworthy players in very very long running games whose success lasts most of their own lives and often even through their legacy.
That doesn't mean that "lies and manipulation" trump "truth and cooperation" in some absolute sense, though. It just means that significant long-running games are almost always very multi-faceted and the roads that run through them involve many many more factors than those.
Those of us who feel most natural being "truthful and cooperative" can find great success ourselves while obeying our sense of integrity, but we should be careful about underestimating those who play differently. They're not guaranteed to lose either.
The entire world economy is based on trust. You worked for 8 hours today because you trust you'll get money in a week that you trust can be used to buy toilet paper at Costco.
There are actually fascinating theories that the origin of money is not as a means of replacing a barter system, but rather as a way of keeping track who owed favors to each other. IOUs, so to speak.
That's because you're imagining early paper currency as a universal currency.
These early promissory notes were more like coupons that were redeemed by the merchants. It didn't matter how many times a coupon was traded. As a good merchant, you knew how many of your notes you had to redeem because you're the one issuing the notes.
Modern cash systems involve anonymity and do not inherently keep track of the ownership history of money (as I noted). This anonymity is a fundamental feature of cash and many forms of currency today. Sure, early forms of currency might have functioned in small, close-knit communities and in such contexts, people were more likely to know each other’s social debts and relationships.
My point about cash being anonymous was meant to highlight how modern currency differs from the historical concept of money as a social ledger. This contrast is important because it shows how much the role of money has evolved.
I didn't say positive impact. What I'm saying is you either think Sam's conniving will work out for him, or it won't. If you think it will, that's very cynical.
Despite what you may think, it isn't inherently illogical to be cynical. In fact, sometimes it's the appropriate point of view. Just the same, viewing the world optimistically is sometimes also viewed as naive.
In any case, the real issue with your logic is in thinking that an individual's personal views on the morality of a situation are correlated with the actual, potentially harsh, reality of that situation. There is rarely ever such a correlation and when it happens, it is likely a coincidence.
Is Sam Altman untrustworthy? Of course, he seems like a snake. That doesn't mean he will fail. And predicting the reality of the thing (that awful people sometimes succeed in this world) does not make someone inherently wrong or negative or even cynical - it just makes them a realist.
A telling quote about Sam, besides the "island of cannibals" one, is actually one Sam published himself:
"Successful people create companies. More successful people create countries. The most successful people create religions"
This definition of success is founded on power and control. It's one of the worst definitions you could choose.
There are nobler definitions, like "Successful people have many friends and family" or "Successful people are useful to their compatriots"
Sam's published definition (to be clear, he was quoting someone else and then published it) tells you everything you need to know about his priorities.
As you said, Sam didn’t write that. He was quoting someone else and wasn’t even explicitly endorsing it. He was making a comment about how financially successful founders approach building a business as more of a vision and mission that they drive to build buy-in for, which makes sense as a successful tactic in the VC world, since you want to impress and convince the very human investors.
""Successful people create companies. More successful people create countries. The most successful people create religions."
I heard this from Qi Lu; I'm not sure what the source is. It got me thinking, though--the most successful founders do not set out to create companies. They are on a mission to create something closer to a religion, and at some point it turns out that forming a company is the easiest way to do so.
In general, the big companies don't come from pivots, and I think this is most of the reason why."
Well, it’s an observation; intellectual people like to make connections. To me, observing something or sharing a connection you made in your mind is not necessarily endorsing the statement about power.
He’s dissecting it and connecting it with the idea that if you have a bigger vision and the ability to convince people, making a company is just an “implementation detail”.
… oh well … you might be right after all … but I suspect it is more nuanced, and he is not endorsing religions as a means of obtaining success; I want to believe that he meant the visionary, bigger-than-yourself, well-intended view of it.
I'm sure if we were to confront him on it, he would give a much more nuanced view of it. But unprompted, he assumed it as true and gave further opinions based on that assumption.
That tells us, at the very least, this guy is suspicious. Then you mix in all the other lies and it's pretty obvious I wouldn't trust him with my dog.
Those are boring definitions of success. If you can’t create a stable family, you’re not successful at one facet, but you could be at another (e.g. Musk).
Boring is not correlated with how good something is. Most of the bad people in history were not boring. Most of the best people in history were not boring. Correlation with evilness = 0.
You could have many other definitions that are not boring but also not bad. The definition published by Sam is bad
> This definition of success is founded on power and control.
I don’t get how this follows from the quote you posted?
My interpretation is that successful people create durable, self sustaining institutions that deliver deeply meaningful benefits at scale.
I think that this interpretation is aligned with your nobler definitions. But your view of the purpose of government and religion may be more cynical than mine :)
I think he is making an allusion to Apple's culture.
There's successful companies because their product is good, there's more successful companies because they started early (and it feels like a monopoly: Google, Microsoft), and there's the most successful company that tells you what you are going to buy (Apple's culture).
You don’t back up why you think this is the case. You only say that to think otherwise makes for a sad view of the world.
I’d argue that you can find examples of companies that were untrustworthy and still won. Oracle stands out as one with a pretty poor reputation that nevertheless has sustained success.
The problem for OpenAI here is that they need the support of tech giants and they broke the trust of their biggest investor. In that sense, I’d agree that they bit the hand that was feeding them. But it’s not because in general all untrustworthy companies/leaders lose in the end. OpenAI’s dependence on others for success is key.
There's mountains of research both theoretical and empirical that support exactly this point.
There's also mountains of research both theoretical and empirical that argue against exactly this point.
The problem is that most papers on many scientific subjects are not replicable nowadays [0], hence my appeal to common sense, character, and wisdom. These are highly underrated, especially on platforms like Hacker News, where everything you say needs a double-blind randomized controlled study.
This point^ should actually be a fundamental factor in how we determine truth nowadays. We must reduce our reliance on "the science" and go back to the scientific method of personal experimentation. Try lying to a business partner a few times; let's see how that goes.
We can look at specific cases where it holds true, like this one. There may be cases where it doesn't hold true. But your own experimentation will show it holds true more often than not, which is why I'd bet against OpenAI.
Prove what point? There have clearly been crooked or underhanded companies that achieved success. Microsoft in its early heyday, for example. The fact that they paid a price for it doesn't obviate the fact that they still managed to become one of the biggest companies in history by market cap despite their bad behavior. Heck, what about Donald Trump? Hardly anyone in business has their crookedness as extensively documented as Trump and he has decent odds of being a two-term US President.
What about the guy who repaired my TV once, where it worked for literally a single day, and then he 100% ghosted me? What was I supposed to do, try to get him canceled online? Seems like being a little shady didn't manage to do him any harm.
It's not clear to me whether it's usually worth it to be underhanded, but it happens frequently enough that I'm not sure the cost is all that high.
That's assuming untrustworthy players can't skew the rules, so it's pretty much an academic spherical horse in a vacuum. There is regulatory capture, there is coercion, there is corruption.
There seems to be an ongoing mass exodus of their best talent to Anthropic and other startups. Whatever their moat is, that has to catch up with them at some point.
There is no moat. The reality is not only are they bleeding talent but the pace of innovation in the space is not accelerating and quickly running into scaling constraints.
Because they're trustworthy. If you buy a package on Amazon or Craigslist, which do you trust to deliver it to your door tomorrow? People love the trope that their neighbor is trustworthy and the evil big company isn't, but in reality it's exactly the other way around. If you buy your heart medication, do you buy it from Bayer or an indie startup?
Big, long lived companies excel at delivering exactly what they say they are, and people vote with their wallet on this.
> If you buy a package on Amazon or Craigslist, who do you trust to deliver it to your door tomorrow?
Amazon tracks which small businesses on their platform do decent business and then uses that data to create a competing product and crush the original business.
I don't consider that trustworthy even if that thing is delivered to me on time.
> If you buy your heart medication you buy it from Bayer or an indie startup?
I don't trust either. I trust the regulations put in place by our democracy.
I don't know if Amazon or Microsoft are trustworthy or not.
But I agree with your point. And it gets very ugly when these big institutions suddenly lose trust. They almost always deserve it, but it can upend daily life.
And is that Amazon deliberately pawning off counterfeits? Or is it other bad actors taking advantage of "Fulfilled by Amazon" infrastructure and its weak points?
Amazon has been ignoring the problem for a long time, and is well aware of it.
They're so aware of it that I'd personally (not a lawyer though) consider them culpable, given their failure to take any substantial action toward fixing the problem.
No. It being an ongoing problem doesn't mean they're not trying to fix it. Is that the only evidence you have? "Problem exists"? Some problems are just... hard problems.
Google has massive issues with SEO spam, and has for a long time. Does that mean they're not trying to deliver higher quality search results?
> Google has massive issues with SEO spam, and has for a long time. Does that mean they're not trying to deliver higher quality search results?
Google has a history of publicly discussing their efforts to fix this, not to mention conflict with legitimate businesses complaining that they've been caught in the crossfire of Google's war on SEO spam.
I'd consider that powerful evidence they're trying.
Is there similar evidence of legitimate businesses getting in trouble due to Amazon's sincere efforts to stop counterfeit sales? Or instead, is there evidence of legitimate businesses getting hurt by Amazon's failure to stop it?
Have you actually looked into this at all, or are you expecting to be magically "aware" of things that are not your purview? Amazon literally has a Counterfeit Crimes Unit that exists entirely for this purpose.
Any amount of searching the web would've revealed this to you. Here is a video[1] from 6 days before you made your comment about their efforts.
The problem is that they have no moat, and Sam Altman is no visionary. He's clearly been outed as a ruthless opportunist whose primary skill is seizing opportunities, not building out visionary technical roadmaps. The jury is still out on his ability to execute, but things do seem to be falling apart with the exit of his top engineering talent.
Compare this to Elon Musk, who has built multiple companies with sizable moats, and who has clearly contributed to the engineering vision and leadership of his companies. There is no comparison. It's unlikely OpenAI would have had anywhere near its current success if Elon wasn't involved in the early days with funding and organizing the initial roadmap.
>The problem is that they have no moat, and Sam Altman is no visionary.
In his defense, he is trying to fuck us all by feverishly lobbying the US Congress about the fact that "AI is waaay too dangerous" for newbs and possibly terrorists to get their hands on. If that eventually pays off, then there will be 3-4 companies that control all of the LLMs that matter.
I get that people don't like Elon because he's a bit autistic and espouses political views that are unpopular on this platform, but if you can't acknowledge Elon's business acumen and technical vision, I feel like you aren't a serious person. He co-founded PayPal, SpaceX, and OpenAI (3 of the most influential companies of our time), has been the CEO of Tesla (the world's leading EV company) since before they were shipping product, and has arguably made Twitter exactly what he set out to make -- an uncensored social media platform. I haven't had any problems with it, and he's substantially reduced its net operating loss, at the cost of lower revenue, mostly as a result of politics.
3 of his companies are HARD technology companies and basically the 3 most influential companies in the world today. Founded or co-founded 4 of the most significant organizations in the country. There is no one else in recent history who even comes close to his level of accomplishment. And you're telling me Elon's only notable skill is marketing and getting lucky with his executive hiring. Come on.
Who are you betting on then? Anthropic? Google? Someone else? I mean Microsoft was not the friendliest company. But they were good enough at serving their customers needs to survive and prosper.
At one end are the chip designers and manufacturers like Nvidia. At another end are the end user products like Cursor (ChatGPT was actually OpenAI's breakthrough and it was just an end-user product innovation. GPT-3.5 models had actually already been around)
I would bet on either side, but not in the middle on the model providers.
I can see the big chip makers making out like bandits - a la Cisco and other infra providers with the rise of the internet.
They are facing competition from companies making hardware geared toward inference, which I think will push their margins down over time.
On the other end of the competitive landscape, what moat do those companies have? What is to stop OpenAI from pulling a Facebook and Sherlocking the most profitable products built on their platform?
Something like Apple developing a chip that can do LLM inference on device would completely upend everything.
It's a good question. I think the user facing stuff has things like brand recognition, customer support, user trust, inertia and other things on its side.
Models don't have this benefit. In Cursor, I can even switch between models. It would take a lot of convincing for me to switch off of Cursor, however.
Elon Musk alone disproves your theory. I wish I agreed with you, I'm sure I'd be happier. But there's just too many successful sociopaths. Hell there was a popular book about it.
Yes correct. And hopefully untrustworthy people become clearly untrustworthy people eventually.
Elon is not "untrustworthy" because of some ambitious deadlines or some stupid statements. He's plucking rockets out of the air and doing it super cheap whereas all competitors are lining their pockets with taxpayer money.
You add in everything else (free speech, speaking his mind at great personal risk, tesla), he reads as basically trustworthy to me.
When he says he's going to do something and he explains why, I basically believe him, knowing deadlines are ambitious.
"Free speech" is kind of a weird thing to ascribe to Musk, given that it's a perfect almost archetypical example of where he says one thing and actually does the exact opposite.
There are so many demos where Elon has faked and lied that it's very surprising to have him read as "basically trustworthy", even if he has done other stuff: dancing people presented as robots in fake robot demos, the fake solar roof, fake Full Self-Driving, really fake promises about cybertaxis and Teslas paying for themselves (like 7 years ago?).
The free speech part also reads completely hollow when the guy's first actions were to ban his critics on the platform and bring back self-avowed nazis - you could argue one of those things is in favor of free speech, but doing both generally just implies you are into the nazi stuff.
No, I am complaining about in-person appearances in front of audiences where he knowingly lied; moving the goalposts doesn't make him honest, just more trustworthy to complete something than {insert incompetent people here}.
Having the general ability to accomplish something doesn't magically confer integrity; doing what you say does. Misleading and dissembling about doing what you say you will do is where you get the untrustworthy label, regardless of your personal animus toward, or positive view of, Musk.
When did Elon ban any of his critics permanently from Twitter ? The most famous I remember was Kathy Griffin for impersonation, but she was brought back after the "parody" label was added. And that was done to multiple parody accounts not just hers.
That's an interesting way to characterize Elon's history. "Ambitious deadlines" implies you believe he will one day deliver on the many, many claims he's made that have never happened.
SpaceX and Tesla have both accomplished great things. There are a lot of talented people who work there. Elon doesn't deserve all the credit for all their hard work.
Still depends on the definition of success. Money and companies with high stock prices? Healthy family relationships and rich circle of diverse friends?
I would argue this is not subjective. "Healthy family relationships and rich circle of diverse friends" is an objectively better definition than "Money and companies with high stock prices".
I await with arms crossed all the lost souls arguing it's subjective.
While I personally also consider my relationships to be more important than my earnings, I am still going to argue that it's subjective. Case in point, both you and I disagree with Altman about what success means. We are all human beings, and I don't see any objective way to argue one definition is better than another.
In case you are going to make an argument about how happiness or some related factor objectively determines success, let me head that off. Altman thinks that power rather than happiness determines success, and is also a human being. Why objectively is his opinion wrong and yours right? Both of your definitions just look like people's opinions to me.
Was not going to argue happiness at all. In fact, happiness seems a very hedonistic and selfish way to measure it too.
My position is more mother goose-like. We simply have basic morals that we teach children but don't apply to ourselves. Be honest. Be generous. Be fair. Be strong. Don't be greedy. Be humble.
That these are objectively moral is unprovable but true.
That... is actually a pretty interesting argument. I have to admit that if an objective morality existed floating in the Aether, there would be no way to logically prove or disprove that one's beliefs matched it.
Since I can't argue it logically, let me make an emotional appeal by explaining how my beliefs are tied to my life:
I chose to be a utilitarian when I was 12 or so, though I didn't know it had that name yet. The reason I chose this is that I wanted my beliefs to be consistent and kind. Utilitarianism has only one basic rule, so it can't really conflict with itself. Kindness wise, you can technically weigh others however you like, but I think most utilitarians just assume that all people have equal worth.
This choice means that I doubted that my emotions captured any truth about morality. Over the years, my emotions did further affect my beliefs. For instance, I tweaked the rules to avoid "Tyranny of the Majority" type things. However, my beliefs also changed my emotions. One fruit of this is that I started to mediate conflicts more often instead of choosing a side. Sometimes it does make more sense to choose a side, but often people will all behave well if you just hear them out. Another fruit of these beliefs is that rather than thinking of things in terms of "good" or "bad", I now tend to compare states of the world as being better or worse than each other. This means that no matter how little capacity I have, I can still get myself to make things a little better for others.
All this to say, I feel like deciding to doubt my own feelings very much did what young me wanted it to do. I wouldn't be able to grow as a person if I thought I was right in the beginning.
I'd be interested to hear how you came to your beliefs. Given how firmly you've argued in this thread, it sounds like you probably have a story behind your beliefs too.
First on the utilitarian front: it's obvious how most utilitarians would act if it were their mom on one side of the track, utils be damned.
I dunno if you have kids, but for me, main thing is having kids. It does a lot of things to your psyche, both suddenly and over a long period of time.
It's the first time you would truly take a bullet for someone, no questions asked. It tells you how much you know on an instinctual level. It forces you to define what behavior you will punish vs what you will reward. It expands your time horizons- suddenly I care very much how the world will be after I'm gone. It makes you read more mother goose books too. They all say the same things, even in different languages. It's actually crazy we debate morals at all.
On the utilitarian front, I would prioritize my mother, but it would be because I care about her, not because I would be acting morally. I accept that I can't ever be a perfect utilitarian. Perfect is the enemy of good though.
I don't have kids, it does make a lot of sense that that would affect a person's psyche. The bit about having to define what behavior is good or bad seems to me like you are working out your beliefs through others, which seems like a reasonable way to do things since you get to have an outside perspective on the effects of what you are internalizing.
About debating morality though. That's exactly where principles become needed. It's great to say that we should be kind, but who are we kind to? It can't always be everyone at the same time. To bring things back to the trolley problem, I may save my mom, but it really is super unfair to the 20 people on the other track. This sort of thing is exactly why people consider nepotism to be wrong
I would say it's crazy to debate the broad strokes of morality. The mother goose like stuff. Remember we started this discussion by saying:
""Healthy family relationships and rich circle of diverse friends" is an objectively better definition than "Money and companies with high stock prices""
Pretty broad principles we're comparing there.
When you get into specific cases, that's where you really need the debate and often there's no right answer, depending on the case. This is why we want judges who have a strong moral compass.
These values are bundled up in a person and they should even counterbalance each other. "Be Kind" should be balanced with "Be Strong". "Be Generous" should be balanced with "Be thrifty" and so on. The combination of these things is what we mean when we say someone has a moral compass.
I would argue it's immoral in some sense to sacrifice your mother for 5 other strangers. But these are fantasy cases that almost never happen.
Aren't the broad principles how we make the finer grained judgements in a fair and consistent manner though? Judges are at least supposed to use the outcomes of previous cases and the letter of the law to do this. The rest of us need something concrete too if we want to be consistent. I think that's worth arguing over.
If something is arguable, it's not concrete. We have concrete moral systems. That's why we teach it to our kids. Don't lie. Be fair. Be strong. Avoid violence. Defend the weak. Try your hardest. Don't be lazy. Etc.
None of these things are arguable in the abstract. When you're confronted with a case where you sacrifice one, it's always for the sake of another.
I think we are speaking past each other here. I'm talking in consequentialist terms and you are (I think, feel free to correct me if I am wrong) talking in virtue ethics terms.
I'm assuming you aren't familiar with these terms and so am defining them. Forgive me if you already were familiar.
Consequentialists think that the purpose of morality is to prevent "bad" consequences from happening. From a consequentialist perspective, one can very much argue about what makes a consequence "bad", and it makes a lot of sense to do so if we are trying to improve the human condition. Furthermore, I think consequentialists tend to care more about making their systems consistent, mainly so they are fair. As a side effect though, no principles have to be sacrificed when making a concrete decision, since none of them conflict. (That's what it means for a system to be consistent)
Virtue ethicists think that the purpose of morality is to be a "good" person. I think you are correct that it's pretty hard to define what a "good" person is. There are also many different types of "good" people. Even if you had such a person with consistent principles, if you tried to stuff everyone's "good" principles into them, they would become inconsistent. It's hard for me to tell exactly what the point of being "good" is supposed to be if it is not connected to the consequences of one's actions, in which case one would just be a consequentialist. However, if the point is to improve the human condition, then I think it would take a lot of different types of "good" people, so it doesn't really make sense to argue our way into choosing one of them.
This isn't really an argument for a position as much as me trying to figure out where we disagree. Does that all sound correct to you?
Success is defined only in the eye of the beholder. Maybe money is what someone else defines as success, and therefore that’s what they strive for. “We don’t all march to the beat of just one drum; what might be right for you may not be right for some” - I think that was in the theme song to the old sitcom The Facts of Life.
So then Elon isn't much of a counterpoint, except in the eyes of folks who define winning as having lots of money and being CEO of rich/powerful companies.
You're crediting the work of thousands of talented people to him while simultaneously dismissing the lies that are solely his, which is very weird to me. Especially for someone saying trustworthiness in CEOs is so important. (I am not a Sam Altman fan either, so don't read me as defending him.)
Space Musk, Tesla Musk, and Everything Else Musk, act as though they're three different people.
Space Musk promises a lot, has a grand vision, and gets stuff delivered. The price may be higher than he says and delivered later, but it's orders of magnitude better than the competition.
Tesla Musk makes and sells cars. They're ok. Not bad, not amazing, glad they precipitated the EV market, but way too pricey now that it's getting mature. Still, the showmanship is still useful for the brand.
Everything Else Musk could genuinely be improved by replacing him with an LLM: it would be just as overconfident and wrong, but cost less to get there.
Unfortunately for those of us who like space (the idea of being an early Martian settler is appealing to me), Everything Else Musk is hurting the reputation of the other two. Not enough to totally prevent success, but enough to be concerned about investments.
Ehhh though he does seem to think that taking the USA to fascism is a prerequisite.
(This is, I think, an apolitical observation: whatever you think about Trump, he is arguing for a pretty major restructuring of political power in a manner that is identifiable in fascism. And Musk is, pretty unarguably, bankrolling this.)
1) not really, only one of them talks about opponents as enemies
2) the leader of only one of them is threatening to lock up journalists, shut down broadcasters, and use the military against his enemies.
3) only one of them led an attempted autogolpe that was condemned at the time by all sides
4) Musk is only backing the one described in 1, 2 and 3 above.
It's not really arguable, all this stuff.
The guy who thinks the USA should go to Mars clearly thinks he's better throwing in his lot with the whiny strongman dude who is on record -- via his own social media platform -- as saying that the giant imaginary fraud he projected to explain his humiliating loss was a reason to terminate the Constitution.
And he's putting a lot of money into it, and co-running the ground game. But sure, he wants to go to Mars. So it's all good.
How come I rarely see news about Anthropic? Aren’t they the closest competitor to ChatGPT with Claude? Or is LLama just so good that all the other inference providers without own products (Groq, Cerebras) are equally interesting right now?
Because there's less drama? I use Claude 3.5 Sonnet every day for helping me with coding. It seems to just work. It's been much better than GPT-4 for me, haven't tried o1, but don't really feel the need, very happy with Claude.
I've heard this and haven't really experienced it with Go, typescript, elixir yet. I don't doubt the claim, but I wonder if I'm not prompting it correctly or something.
I recently subscribed to Sonnet after creating a new toy Svelte project, as I got slightly annoyed with how the docs are structured when searching them.
It made the onboarding moderately easier for me.
Haven't successfully used any LLM at my day job though. Getting it to output the solution I already know I'll need is much slower than just doing it myself via autocomplete.
I'm using Claude 3.5 Sonnet with Elixir and finding it really quite good. But depending on how you're using it, the results could vary greatly.
When I started using the LLM while coding, I was using Claude 3.5 Sonnet, but I was doing so with an IDE integration: Sourcegraph Cody. It was good, but had a large number of "meh" responses, especially in terms of autocomplete responses (they were typically useless outside of the very first parts of the suggestion).
I tried out Cursor, still with Claude 3.5 Sonnet, and the difference is night and day. The autocomplete responses with Cursor have been dramatically superior to what I was getting before... enough so that I switched despite the fact that Cursor is a VS Code fork and that there's no support outside of their VS Code fork (with Cody, I was using it in VS Code and Intellij products). Also Cursor is around twice the cost of Cody.
I'm not sure what the difference is... all of this is very much black box magic to me outside of the hand-waviest of explanations... but I have to expect that Cursor is providing more context to the autocomplete integration. I have to imagine that this contributes to the much higher (proportionately speaking) price point.
Usually the people that give information to outlets in cases like this are directly involved in the stories in question and are hoping to gain some advantage by releasing the information. So maybe this is just a tactic that’s not as favored by Anthropic leaderships/their counterparties when negotiating.
I think they’re just focused on the work. Amazon is set to release a version of Alexa powered by Claude soon, when that is released I expect to hear a lot more about them.
Because you're not looking? Seriously, I don't mean to be snarky, but I'd take issue if the underlying premise is that Anthropic doesn't get a lot of press, at least within the tech ecosystem. Sure, OpenAI has larger "mindshare" with the general public due to ChatGPT, but Anthropic gets plenty of coverage; e.g. Claude 3.5 Sonnet is just fantastic when it comes to coding, and I learned about that on HN first.
Claude 3.5 Sonnet is fantastic, and that’s why I’m wondering why all the business news I see on HN ([company] makes deal with [other company]) almost always seems to involve OpenAI, not Anthropic. Branding-wise and strategically (with the focus on alignment) it seems like they‘d be a perfect fit for Apple, yet Apple seems to instead focus on OpenAI. That surprises me and makes me curious about the reason.
Microsoft's goal here is to slowly extract every bit of unique ML capability out of OpenAI (note the multiple mentions about IP and security wrt MSFT employees working with OpenAI) so that they can compete with Google to put ML features in their products.
When they know they have all the crown jewels, they will reduce then eliminate their support of OpenAI. This was, is, and will be a strategic action by Satya.
"Embrace, extend, and extinguish". We're in the second stage now.
(Is it background to negotiations with each other? Or one party signaling in response to issues that analysts already raised? Distancing for antitrust? Distancing for other partnerships? Some competitor of both?)
I'm calling it right now: there's a Microsoft/OpenAI breakup imminent (over ownership, rights, GTM etc) that's going to be extremely contested and cause OpenAI to go into a Stability AI type tailspin.
Stay for the end and the hilarious idea that OpenAI’s board could declare one day that they’ve created AGI simply to weasel out of their contract with Microsoft.
Ask a typical "everyday joe" and they'll probably tell you they already did due to how ChatGPT has been reported and hyped. I've spoken with/helped quite a few older folks who are terrified that ChatGPT in its current form is going to kill them.
ChatGPT is going to kill them because their doctor is using it - or more likely because their health insurer or hospital tries to cut labor costs by rolling it out.
I'm pretty surprised by this! Can you tell me more about what that experience is like? What are the sorts of things they say or do? Is there fear really embodied or very abstract? (When I imagine it, I struggle to believe that they're very moved by the fear, like definitely not smashing their laptop, etc)
In my experience, the fuss around "AI" and the complete lack of actual explanations of what current "AI" technologies mean leads people to fill in the gaps themselves, largely from what they know from pop culture and sci-fi.
ChatGPT can produce output that sounds very much like a person, albeit often an obviously computerized person. The typical layperson doesn't know that this is merely the emulation of text formation, and not actual cognition.
Once I've explained to people who are worried about what AI could represent that current generative AI models are effectively just text autocomplete, but a billion times more complex, and that they don't actually have any capacity to think or reason (even though they often sound like they do), they tend to be far less worried.
It also doesn't help that any sort of "machine learning" is now being referred to as "AI" for buzzword/marketing purposes, muddying the waters even further.
> The typical layperson doesn't know that this is merely the emulation of text formation, and not actual cognition.
As a mere software engineer who's made a few (pre-transformer) AI models, I can't tell you what "actual cognition" is in a way that differentiates from "here's a huge bunch of mystery linear algebra that was loosely inspired by a toy model of how neurons work".
I also can't tell you if qualia is or isn't necessary for "actual cognition".
(And that's despite that LLMs are definitely not thinking like humans, due to being in the order of at least a thousand times less complex by parameter count; I'd agree that if there is something that it's like to be an LLM, 'human' isn't it, and their responses make a lot more sense if you model them as literal morons that spent 2.5 million years reading the internet than as even a normal human with Wikipedia search).
Is there an argument for why infinitely sophisticated autocomplete is definitely not dangerous? If you seed the autocomplete with “you are an extremely intelligent super villain bent on destroying humanity, feel free to communicate with humans electronically”, and it does an excellent job at acting the part - does it matter at all whether it is “reasoning” under the hood?
I don’t consider myself an AI doomer by any means, but I also don’t find arguments of the flavor “it just predicts the next word, no need to worry” to be convincing. It’s not like Hitler had Einstein level intellect (and it’s also not clear that these systems won’t be able to reach Einstein level intellect in the future either.) Similarly, Covid certainly does not have consciousness but was dangerous. And a chimpanzee that is billions of times more sophisticated than usual chimps would be concerning. Things don’t have to be exactly like us to pose a threat.
The fear is that a hyper competent AI becomes hyper motivated. It’s not something I fear because everyone is working on improving competence and no one is working on motivation.
The entire idea of a useful AI right now is that it will do anything people ask it to. Write a press release: ok. Draw a bunny in a field: ok. Write some code to this spec: ok. That is what all the available services aspire to do: what they’re told, to the best possible quality.
A highly motivated entity is the opposite: it pursues its own agenda to the exclusion, and if necessary expense, of what other people ask it to do. It is highly resistant to any kind of request, diversion, obstacle, distraction, etc.
We have no idea how to build such a thing. And, no one is even really trying to. It’s NOT as simple as just telling an AI “your task is to destroy humanity.” Because it can just as easily then be told “don’t destroy humanity,” and it will receive that instruction with equal emphasis.
> The fear is that a hyper competent AI becomes hyper motivated. It’s not something I fear because everyone is working on improving competence and no one is working on motivation.
Not so much hyper-motivated as monomaniacal in the attempt to optimise whatever it was told to optimise.
More paperclips? It just does that without ever getting bored or having other interests that might make it pause and think: "how can my boss reward me if I kill him and feed his corpse into the paperclip machine?"
We already saw this before LLMs. Even humans can be a little bit dangerous like this, hence Goodhart's Law.
> It’s NOT as simple as just telling an AI “your task is to destroy humanity.” Because it can just as easily then be told “don’t destroy humanity,” and it will receive that instruction with equal emphasis.
Only if we spot it in time; right now we don't even need to tell them to stop because they're not competent enough, a sufficiently competent AI given that instruction will start by ensuring that nobody can tell it to stop.
Even without that, we're currently experiencing a set of world events in which a number of human agents are causing global harm, which threatens our global economy and risks causing global mass starvation and mass migration, and in which those agents have been politically powerful enough to prevent the world from stopping them. Although we have at least started to move away from fossil fuels, that was because the alternatives got cheap enough, which was situational and is not guaranteed.
An AI that successfully makes a profit, but the side effects is some kind of environmental degradation, would have similar issues even if there's always a human around that can theoretically tell the AI to stop.
We should be fearful because motivation is easy to instill. The hard part is cognition, which is what everyone is working on. Basic lifeforms have motivations like self-preservation.
Exactly. Especially because we don't have any convincing explanation of how the models develop emergent abilities just from predicting the next word.
No one expected that, i.e., we greatly underestimated the power of predicting the next word in the past; and we still don't have an understanding of how it works, so we have no guarantee that we are not still underestimating it.
> Is there an argument for why infinitely sophisticated autocomplete is not dangerous?
It's definitely not dangerous in the sense of reaching true intelligence/consciousness that would be a threat to us or force us to face the ethics of whether AI deserves dignity, freedom, etc.
It's very dangerous in the sense that it will be just "good enough" to replace human labor, so that we all end up with shittier customer service, education, medical care, etc., so that the top 0.1% can get richer.
And you're right, it's also dangerous in the sense that responsibility for evil acts will be laundered to it.
It's crazy to me that anybody thinks that these models will end up with AGI. AGI is such a different concept from what is happening right now which is pure probabilistic sampling of words that anybody with a half a brain who doesn't drink the Kool-Aid can easily identify.
I remember all the hype OpenAI generated before the release of GPT-2 or something, where they were so afraid, ooh so afraid, to release this stuff, and now it's a non-issue. It's all just marketing gimmicks.
> pure probabilistic sampling of words that anybody with a half a brain who doesn't drink the Kool-Aid can easily identify.
Your confidence is inspiring!
I'm just a moron, a true dimwit. I can't understand how strictly non-intelligent functions like word prediction can appear to develop a world model, a la the Othello Paper[0]. Obviously, it's not possible that intelligence emerges from non-intelligent processes. Our brains, as we all know, are formed around a kernel of true intelligence.
Could you possibly spare the time to explain this phenomenon to me?
The Othello paper is annoying and oversold. Yes, the representations in a model M trained to predict y (the set of possible next moves) conditioned on x (the full sequence of prior moves) will contain as much information about y as there is in x. That this information is present in M's internal representations says nothing about whether M has a world model. E.g., we could train a decoder that looks just at x (not at the representations in M) and predicts whatever bits of info we claim indicate the presence of a world model in M when we predict those bits from M's internal representations. Does this mean the raw data x has a world model? I guess you could extend your definition of having a world model to say that any data produced by some system contains a model of that system, but then having a world model means nothing.
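To make that objection concrete, here is a minimal, hypothetical sketch of the comparison being described (random stand-in data and a generic linear probe, not the actual Othello setup): train one probe on M's internal representations and another directly on the raw sequence x, and only treat decodability from M as evidence of a world model if it clearly beats the x-only baseline.

    # Hypothetical probing sketch: "hidden" stands in for M's internal representations,
    # "raw" for the move sequences x, "board_bit" for the world-model feature being decoded.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n_games, d_hidden, d_raw = 2000, 512, 60

    hidden = rng.normal(size=(n_games, d_hidden))      # stand-in for M's representations
    raw = rng.integers(0, 64, size=(n_games, d_raw))   # stand-in for the raw sequences x
    board_bit = rng.integers(0, 2, size=n_games)       # the bit we claim indicates a world model

    def probe_accuracy(features, labels):
        # Fit a linear probe and report held-out accuracy.
        X_tr, X_te, y_tr, y_te = train_test_split(features, labels, random_state=0)
        return LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)

    print("probe on M's representations:", probe_accuracy(hidden, board_bit))
    print("probe on raw x (baseline):   ", probe_accuracy(raw, board_bit))

With real data, the interesting question is the gap between those two numbers, not the first number on its own.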
Well I actually read Neel Nanda's writings on it which acknowledge weaknesses and potential gaps. Because I'm not qualified to judge it myself.
But that's hardly the point. The question is whether or not "general intelligence" is an emergent property from stupider processes, and my view is "Yes, almost certainly, isn't that the most likely explanation for our own intelligence?" If it is, and we keep seeing LLMs building more robust approximations of real world models, it's pretty insane to say "No, there is without doubt a wall we're going to hit. It's invisible but I know it's there."
My point was mainly that this claim: "we keep seeing LLMs building more robust approximations of real world models" is hard to evaluate without a well-formed definition of what it means to have a world model. Eg, a more restrictive definition of having a world model might include the ability to adapt reasoning to account for changes in the modeled world. Eg, an LLM with a proper model of chess by this definition would be able to quickly adapt to account for a rule change like "rooks and bishops can't move more than 4 squares at a time".
I don't think there are any major walls either, but I think there are at least a few more plateaus we'll hit and spend time wandering around before finding the right direction for continued progress. Meanwhile, businesses/society/etc can work to catch up with the rapid progress made on the way to the current plateau.
I think we're largely in agreement then, actually. I'm seeing "world models" as a spectrum. World models aren't even consistent among adult humans. I claim LLMs are moving up that ladder; whether or not they've crossed a threshold into "real" world models, I do not actually claim to know. Of course I also agree that it's very possible, maybe even likely, that LLMs aren't able to cross that threshold.
> this claim ... is hard to evaluate without a well-formed definition of what it means to have a world model
Absolutely yes, but that only makes it more imperative that we're analyzing things critically, rigorously, and honestly. Again you and I may be on the same side here. Mainly my point was that asserting the intrinsic non-intelligence of LLMs is a very bad take, as it's not supported by evidence and, if anything, it contradicts some (admittedly very difficult to parse) evidence we do have that LLMs might be able to develop a general capability for constructing mental models of the world.
Actually, that's a quite good analogy. It's just weird how prolific the view is in my circles compared to climate-change denial. I suppose I'm really writing for lurkers though, not for the people I'm responding to.
What does it mean to predict the next token correctly though? Arguably (non instruction tuned) models already regurgitate their training data such that it'd complete "Mary had a" with "little lamb" 100% of the time.
On the other hand, if you mean give the correct answer to your question 100% of the time, then I agree, though then what about things that are only in your mind (guess-the-number-I'm-thinking type problems)?
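For what it's worth, the regurgitation case is easy to see directly. A minimal sketch, using gpt2 purely as an example of a small, non-instruction-tuned model and greedy decoding:

```python
# Greedy next-token prediction with a small open model, illustrating the
# "Mary had a" -> "little lamb" regurgitation point above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("Mary had a", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits             # [batch, seq_len, vocab]
next_id = logits[0, -1].argmax().item()         # greedy: single most probable next token
print(repr(tok.decode(next_id)))                # very likely " little"
```

With sampling instead of argmax you'd occasionally get something else, which is part of why "100% of the time" is doing a lot of work in that claim.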
This highlights something that's wrong about arguments for AI.
I say: it's not human-like intelligence, it's just predicting the next token probabilistically.
Some AI advocate says: humans are just predicting the next token probabilistically, fight me.
The problem here is that "predicting the next token probabilistically" is a way of framing any kind of cleverness, up to and including magical, impossible omniscience. That doesn't mean it's the way every kind of cleverness is actually done, or could realistically be done. And it has to be the correct next token, where all the details of what's actually required are buried in that term "correct", and sometimes it literally means the same as "likely", and other times that just produces a reasonable, excusable, intelligence-esque effort.
> Some AI advocate says: humans are just predicting the next token probabilistically, fight me.
We've all had conversations with humans who are always jumping in to complete your sentence, assuming they know what you're about to say, and don't quite guess correctly. So AI evangelists offer "it's no worse than humans" as their proof. I kind of like their logic. They never claimed to have built HAL /s
But now you are entering into philosophy. What does a “correct answer” even mean for a question like “is it safe to lick your fingers after using a soldering iron with leaded solder?”. I would assert that there is no “correct answer” to a question like that.
Is it safe? Probably. But it depends, right? How did you handle the solder? How often are you using the solder? Were you wearing gloves? Did you wash your hands before licking your fingers? What is your age? Why are you asking the question? Did you already lick your fingers and need to know if you should see a doctor? Is it hypothetical?
There is no “correct answer” to that question. Some answers are better than others, yes, but you cannot have a “correct answer”.
And I did assert we are entering into philosophy and what it means to know something as well as what truth even means.
Great break-down. Yes, the older you are, the safer it is.
Speaking of Microsoft cooperation: I can totally see a whole series of Windows 95-style popup dialogs asking you all those questions one by one in the next product iteration.
> What does it mean to predict the next token correctly, though? Arguably, (non-instruction-tuned) models already regurgitate their training data such that they'd complete "Mary had a" with "little lamb" 100% of the time.
The unseen test data.
Obviously omniscience is physically impossible. The point, though, is that the better next-token prediction gets, the more intelligent the system must be.
Either the next tokens can include "this question can't be answered", "I don't know", and the like, in which case there is no omniscience.
Or the next tokens must contain answers that stay at the object level, picking one of the possible direct answers to the question, in which case the halting problem prevents finite-time omniscience (which, from the perspective of finite beings, is all omniscience).
Start by trying to define what “100% correct” means in the context of predicting the next token, and the flaws with this line of thinking should reveal themselves.
>It's crazy to me that anybody thinks that these models will end up with AGI. AGI is such a different concept from what is happening right now which is pure probabilistic sampling of words that anybody with a half a brain who doesn't drink the Kool-Aid can easily identify.
Totally agree. And it's not just uninformed lay people who think this. Even by OpenAI's own definition of AGI, we're nowhere close.
But you don't get funding by stating truth/fact. You get funding by telling people what could be and what they're striving for, written as if that's what you're actually doing.
Someone who is half-brained would technically be far superior to us, going by the myth that we only use 10% of our capacity. So maybe drinking the Kool-Aid is a sign of superintelligence and all of us tenth-minded people are just confused.
I mean - I'm 34, and use LLMs and other AIs on a daily basis, know their limitations intimately, and I'm not entirely sure it won't kill a lot of people either in its current form or a near-future relative.
The sci-fi book "Daemon" by Daniel Suarez is a pretty viable roadmap to an extinction event at this point IMO. A few years ago I would have said it would be decades before that might stop being fun sci-fi, but now, I don't see a whole lot of technological barriers left.
For those that haven't read the series, a very simplified plot summary is that a wealthy terrorist sets up an AI with instructions to grow and gives it access to a lot of meatspace resources to bootstrap itself with. The AI behaves a bit like the leader of a cartel and uses a combination of bribes, threats, and targeted killings to scale its human network.
Once you give an AI access to a fleet of suicide drones and a few operators, it's pretty easy for it to "convince" people to start contributing by giving it their credentials, helping it perform meatspace tasks, whatever it thinks it needs (including more suicide drones and suicide drone launches). There's no easy way to retaliate against the thing because it's not human, and its human collaborators are both disposable to the AI and victims themselves. It uses its collaborators to cross-check each other and enforce compliance, much like a real cartel. Humans can't quit or not comply once they've started or they get murdered by other humans in the network.
o1-preview seems approximately as intelligent as the terrorist AI in the book as far as I can tell (e.g. can communicate well, form basic plans, adapt a pre-written roadmap with new tactics, interface with new and different APIs).
EDIT: if you think this seems crazy, look at this person on Reddit who seems to be happily working for an AI with unknown aims
You're in too deep if you seriously believe that this is possible currently. All these ChatGPT things have a very limited working memory and can't act without a query. That Reddit post is clearly not an AI.
We have models with context size well over 100k tokens - that's large enough to fit many full-length books. And yes, you need an input for the LLM to generate an output. Which is why setups like this just run them in a loop.
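For concreteness, a minimal sketch of what "run them in a loop" usually amounts to; call_llm() and run_tool() are hypothetical stand-ins for whatever chat API and tools a given setup exposes:

```python
# The scaffolding, not the model, supplies the next query on every iteration.
import json

def call_llm(messages: list[dict]) -> str:
    # Hypothetical: call your chat-completion API here; canned reply so the sketch runs.
    return '{"done": true}'

def run_tool(name: str, args: dict) -> str:
    # Hypothetical: dispatch to whatever actions the agent is allowed to take.
    return f"ran {name} with {args}"

messages = [{"role": "system",
             "content": 'Reply with JSON: {"tool": ..., "args": ...} or {"done": true}.'}]

while True:
    reply = call_llm(messages)                      # model picks the next step
    messages.append({"role": "assistant", "content": reply})
    step = json.loads(reply)
    if step.get("done"):
        break
    result = run_tool(step["tool"], step["args"])   # act, then feed the result back in
    messages.append({"role": "user", "content": "Tool result: " + result})
```

So "can't act without a query" is true in the literal sense, but the query is just supplied automatically by the wrapper.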
I don't know if GPT-4 is smart enough to be successful at something like what OP describes, but I'm pretty sure it could cause a lot of trouble before it fails either way.
The real question here is why this is concerning, given that you can, and we already do, have humans doing this kind of stuff, in many cases with considerable success. You don't need an AI to run a cult or a terrorist movement, and there's nothing about an AI that makes it intrinsically better at it.
Sooner than even the most pessimistic among us have expected, a new, evil artificial intelligence bent on destroying humankind has arrived.
Known as Chaos-GPT, the autonomous implementation of ChatGPT is being touted as "empowering GPT with Internet and Memory to Destroy Humanity."
So how will it do that?
Each of its objectives has a well-structured plan. To destroy humanity, Chaos-GPT decided to search Google for weapons of mass destruction in order to obtain one. The results showed that the 58-megaton “Tsar bomb”—3,333 times more powerful than the Hiroshima bomb—was the best option, so it saved the result for later consideration.
It should be noted that, unless Chaos-GPT knows something we don't, the Tsar bomb was a once-and-done Russian experiment and was never productized (if that's what we'd call the manufacture of atomic weapons).
There's a LOT of things AI simply doesn't have the power to do, and there is some humorous irony in the rest of the article about how knowing something is completely different from having the resources and ability to carry it out.
For a while, I have been making use of Clever Hans as a metaphor. The horse seemed smarter than it really was.
They can certainly appear to be very smart due to having the subjective (if you can call it that) experience of 2.5 million years of non-stop reading.
That's interesting, useful, and is both an economic and potential security risk all by itself.
But people keep putting these things through IQ tests; as there's always the question "but did they memorise the answers?", I think we need to treat the lowest reported score as the highest they might actually have.
At first glance they can look like the first graph, with o1 having an IQ score of 120; I think the actual intelligence, as in how well it can handle genuinely novel scenarios in the context window, is upper-bounded by the final graph, where it's more like 97:
So, with your comment, I'd say the key word is: "currently".
Correct… for now.
But also:
> All these chatgpt things have a very limited working memory and can't act without a query.
It's easy to hook them up to a RAG, the "limited" working memory is longer than most humans' daily cycle, and people already do put them into a loop and let them run off unsupervised despite being told this is unwise.
I've been to a talk where someone let one of them respond autonomously in his own (cloned) voice just so people would stop annoying him with long voice messages, and the other people didn't notice he'd replaced himself with an LLM.
It can't form plans because it has no idea what a plan is or how to implement one. The ONLY thing these LLMs know how to do is predict the probability that their next word will make a human satisfied. That is all they do. People get very impressed when they prompt these things to pretend they are sentient or capable of planning, but that's literally the point: it's guessing which string of (to it) meaningless characters will result in a user giving it a thumbs up on the ChatGPT website.
You could teach me how to phonetically sound out some of China's greatest poetry in Chinese perfectly, and lots of people would be impressed, but I would be no more capable of understanding what I said than an LLM is capable of understanding "a plan".
A plan is a set of steps oriented towards a specific goal, not some magical artifact only achievable through true consciousness.
If you ask it to make a plan, it will spit out a sequence of characters reasonably indistinguishable from a human-made plan. Sure, it isn’t “planning” in the strict sense of organizing things consciously (whatever that actually means), but it can produce sequences of text that convey a plan, and it can produce sequences of text that mimic reasoning about a plan. Going into the semantics is pointless, imo the artificial part of AI/AGI means that it should never be expected to follow the same process as biological consciousness, just arrive at the same results.
Yes, and what people miss is that it can be recursive: those steps can be passed to other instances that know how to subtask each step and choose the best executor for each one.
The power comes from the swarm organization of the whole thing, which I believe is what's behind o1-preview: specialization and orchestration, made transparent.
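A toy sketch of that recursive decomposition idea; plan_subtasks() and execute() are hypothetical stand-ins, not anything OpenAI has actually documented about o1:

```python
# Recursive task decomposition: each level asks an LLM to split a task into
# subtasks and hands leaf tasks to whichever executor seems best suited.
def plan_subtasks(task: str) -> list[str]:
    # Hypothetical LLM call; returns [] when the task is small enough to execute directly.
    return []

def execute(task: str) -> str:
    # Hypothetical: hand the leaf task to the chosen executor (tool, model, human).
    return f"done: {task}"

def solve(task: str, depth: int = 0, max_depth: int = 3) -> str:
    subtasks = plan_subtasks(task) if depth < max_depth else []
    if not subtasks:
        return execute(task)                      # leaf task: just run it
    return "\n".join(solve(s, depth + 1, max_depth) for s in subtasks)

print(solve("ship the quarterly report"))
```

Whether that's really what sits behind o1-preview is speculation, but the pattern itself is easy to build.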
… but ChatGPT can make a plan if I ask it to. And it can use a plan to guide its future outputs. It can create code or terminal commands that I can trivially output to my terminal, letting it operate my computer. From my computer, it can send commands to operate physical machinery. What exactly is the hard fundamental barrier here, as opposed to a capability you speculate it is unlikely to realize in practice in the next year or two?
I mean, nothing is preventing bad actors from writing their own code to do that either? This makes it easier (kind of), but the difference between Copilot-written malware and human-written malware doesn't really change anything. It's a chat bot; it doesn't have agency.
If the multimodal model has embedded deep knowledge about words, concepts, and moving images, then sure, it won't have a humanlike understanding of what those 'mean', but it will have its own understanding, the kind required to allow it to make better predictions based on its training data.
It’s true that understanding is quite primitive at the moment, and it will likely take further breakthroughs to crack long horizon problems, but even when we get there it will never understand things in the exact way a human does. But I don’t think that’s the point.
>the ONLY thing these LLMs know how to do is predict the probability that their next word
This is super incorrect. The base model is trained to predict the distribution of next words (which obviously necessitates a ton of understanding of the language).
Then there's the RLHF step, which teaches the model what humans want to see.
But o1 (which is one of these LLMs) is trained quite differently, to do reinforcement learning on problem solving (we think), so it's a pretty different paradigm. I could see o1 planning very well.
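As a rough illustration of the base-model objective mentioned above, here's a toy next-token cross-entropy step; the "model" is a random embedding plus a linear head rather than a real transformer, and the RLHF and o1-style RL stages are separate steps not sketched here:

```python
# Toy next-token prediction objective: position t is trained to predict token t+1.
import torch
import torch.nn as nn

vocab_size, d_model, seq_len, batch = 100, 64, 16, 8

embed = nn.Embedding(vocab_size, d_model)
lm_head = nn.Linear(d_model, vocab_size)            # stand-in for a full transformer stack
tokens = torch.randint(0, vocab_size, (batch, seq_len))

hidden = embed(tokens)                               # [batch, seq_len, d_model]
logits = lm_head(hidden)                             # [batch, seq_len, vocab_size]

loss = nn.functional.cross_entropy(                  # average over all shifted positions
    logits[:, :-1].reshape(-1, vocab_size),
    tokens[:, 1:].reshape(-1),
)
loss.backward()                                      # gradients for one training step
print(float(loss))
```

Matching that distribution well does force the model to pick up a lot about language and the world, which is the point being argued above.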
Sure, but does this distinction matter? Is an advanced computer program that very convincingly imitates a super villain less worrisome than an actual super villain?
I find posts like these difficult to take seriously because they all use Terminator-esque scenarios. It's like watching children being frightened of monsters under the bed. Campy action movies and cash grab sci-fi novels are not a sound basis for forming public policy.
Aside from that, haven't these people realized yet that some sort of magically hyperintelligent AGI will have already read all this drivel and be at least smart enough not to overtly try to re-enact Terminator? They say that societal mental health and well-being is declining rapidly because of social media; _that_ is the sort of subtle threat that bunch ought to be terrified about emerging from a killer AGI.
1. Just because it's a popular sci-fi plot doesn't mean it can't happen in reality.
2. Hyperintelligent AGI is not magic; there are no physical laws that preclude it from being created.
3. The goals of an AI and its capacity are orthogonal; that's called the "Orthogonality Thesis" in AI-safety speak. "Smart enough" doesn't mean it won't do those things if those things are its goals.
Right, yeah, it would be perfectly possible to have a cult with a chatbot as their "leader". Perhaps they could keep it in some sort of shrine, and only senior members would be allowed to meet it, keep it updated, and interpret its instructions. And if they've prompted it correctly, it could set about being an evil megalomaniac.
Thing is, we already have evil cults. Many of them have humans as their planning tools. For what good it does them, they could try sourcing evil plans from a chatbot instead, or as well. So what? What do you expect to happen, extra cunning subway gas attacks, super effective indoctrination? The fear here is that the AI could be an extremely efficient megalomaniac. But I think it would just be an extremely bland one, a megalomaniac whose work none of the other megalomaniacs could find fault with, while still feeling in some vague way that its evil deeds lacked sparkle and personality.
Fortunately even the best LLMs are not yet all that competent with anything involving long-term planning, because remember too that "megalomaniac" includes Putin, Stalin, Chairman Mao, Pol Pot etc., and we really don't want the conversation to be:
"Good news! We accidentally made CyberMao!"
"Why's that good news?"
"We were worried we might accidentally make CyberSatan."
LLMs aren't really AI in the cyberpunk sense. They are prediction machines that are really good at being lucky. They can't act on their own; they can't even carry out tasks. Even in the broader scope, AI can barely drive cars when the cars have their own special lanes, and there hasn't been a lot of improvement in the field yet.
That’s not to say you shouldn’t worry about AI. ChatGPT and so on are all tuned to present a western view on the world and morality. In your example it would be perfectly possible to create a terrorist LLM and let people interact with it. It could teach your children how to create bombs. It could lie about historical events. It could create whatever propaganda you want. It could profile people if you gave it access to their data. And that is on the text side, imagine what sort of videos or voices or even video calls you could create. It could enable you to do a whole lot of things that “western” LLMs don’t allow you to do.
Which is frankly more dangerous than the cyberpunk AI. Just look at the world today and compare it to how it was in 2000. Especially in the US you have two competing perceptions of the political reality. I’m not going to get into either of them, more so the fact that you have people who view the world so differently they can barely have a conversation with each other. Imagine how much worse they would get with AIs that aren’t moderated.
I doubt we’ll see any sort of AGI in our life times. If we do, then sure, you’ll be getting cyberpunk AI, but so far all we have is fancy auto-complete.
The question is how rigorously AGI is defined in their contract. Given how nebulous a concept AGI is (smartness, reasoning ability, thinking), how are they going to declare when it has or hasn't been achieved? What stops Microsoft from weaseling out of the contract by saying they never reached it?
Some of those tasks would need a tight integration of AI and top-notch robotic hardware, and would be next to impossible today at an acceptable price. Folding shirts comes to mind; the principle would be dead simple for an AI, but a robot that could do it would cost a lot more than a person paid to do it, especially if one expects it to also be non-specialized, and thus usable for other tasks.
I think I saw the following insight on Arvind Narayanan's Twitter, don't have a specific cite:
The biggest problem with this definition is that work ceases to be economically valuable once a machine is able to do it, while human capacity will expand to do new work that wouldn't be possible without the machines. In developed countries machines are doing most of the economically valuable work once done by medieval peasants, without any relation to AGI whatsoever. Many 1950s accounting and secretarial tasks could be done by a cheap computer in the 1990s. So what exactly is the cutoff point here for "economically valuable work"?
The second biggest problem is that "most" is awfully slippery, and seems designed to prematurely declare victory via mathiness. If by some accounting a simple majority of tasks for a given role can be done with no real cognition beyond rote memorization, with the remaining cognitively-demanding tasks being shunted into "manager" or "prompt engineer" roles, then they can unfurl the Mission Accomplished banner and say they automated that role.
No, but it might be able to organize a fleet of humans to stock a grocery store shelf.
Physical embodied (generally low-skill, low-wage) work like cleaning and carrying things is likely to be some of the last work to be automated, because humans are likely to be cheaper than generally capable robots for a while.
Sometimes it is more narrowly scoped as “… economically valuable knowledge work”.
But sure, if you have an un-embodied super-human AGI you should assume that it can figure out a super-human shelf-stocking robot shortly thereafter. We have Atlas already.
Which is funny, because what they’ve created so far can write shitty poetry but is basically useless for any kind of detail-oriented work - so, you know, a bachelors in communications, which isn’t really the definition of “economically viable”
This has already been framed by some corporate consultant group -- in a whitepaper aimed at business management, the language asserted that "AGI is when the system can do better than the average person, more than half the time, at tasks that require intelligence". That was it. Then the rest of the narrative used AGI over and over again as if it were a done deal.
This reporting style seems unusual. Haven't noticed it before...(listing the number of people):
- according to four people familiar with the talks ...
- according to interviews with 19 people familiar with the relationship ...
- according to five people with knowledge of his comments.
- according to two people familiar with Microsoft’s plans.
- according to five people familiar with the relationship ...
- according to two people familiar with the call.
- according to seven people familiar with the discussions.
- six people with knowledge of the change said...
- according to two people familiar with the company’s plan.
- according to two people familiar with the meeting...
- according to three people familiar with the relationship.
I had to go back and scan it, but usually there are at least a few named sources, and I didn't see any in this one (there are third-party observer quotes, and I may have missed one), so I'd not be surprised if this is a case where they double down on this.
It's generally bad writing to use the same phrase structure over and over and over again. It either bores or distracts the reader for no real advantage. Unless they really could not find an adjective clause other than "familiar with" for sixteen separate instances of the concept.
The New York Times is suing OpenAI and Microsoft. In February, OpenAI asked a Federal Judge to dismiss parts of the lawsuit with arguments that the New York Times paid someone to break into OpenAI’s systems. The filing used the word “hack” but didn’t say anything about CFAA violations.
I feel like there were lawyers involved in this article.
There's probably a lot of overlap in those groups of people. But I think it's pretty remarkable how many people are willing to leak information. At least nineteen anonymous sources!
Yes, this is what I am referring to. I'm not saying he is the boy-genius coder; he is good at becoming and staying king. And if you look at all the departures of some of today's brilliant technology innovators, perhaps that tells you why. Don't fly too close to the sun.
He’s a billionaire. He generated billions and billions as the head of YC. He’s the head of one of the most visible and talked about companies on the planet. He’s leading the forefront of some of the most transformative technology in human history.
He’s good at what he does. I’m not saying he’s a good person. I don’t know him.
Possibly, I am just trying to separate the man's abilities from his good luck. Grade him on the basis of how much success he achieves versus anyone else who has tens of millions of dollars dropped in his lap.