OpenAI co-founder John Schulman says he will leave and join rival Anthropic (cnbc.com)
403 points by tzury 35 days ago | 280 comments



This is probably bad news for ChatGPT 5. I don't think it's that likely this co-founder would leave for Anthropic if OpenAI were clearly in the lead. Also from a safety perspective you would want to be at the AI company most likely to create truly disruptive AI tech. This looks to me like a bet against OpenAI more than anything else.

OpenAI has a burn rate of about 5 billion a year and they need to raise ASAP. If the fundraising isn't going well or if OpenAI is forced to accept money from questionable investors that would also be a good reason to jump ship.

In situations like these it's good to remember that people are much more likely to take the ethical and principled road when they also stand to gain from that choice. People who put their ideals above pragmatic self-interest self-select out of positions of power and influence. That is likely to be the case here as well.


> This is probably bad news for ChatGPT 5. I don't think it's that likely this co-founder would leave for Anthropic if OpenAI were clearly in the lead.

Yep. The writing was already on the wall for GPT-5 when they teased a new model for months and let the media believe it was GPT-5, before finally releasing GPT-4o and admitting they hadn't even started on 5 yet (they quietly announced they were starting a new foundation model a few weeks after 4o).

Don't get me wrong, the cost savings for 4o are great, but it was pretty obvious at that point that they didn't have a clue how they were going to move past 4 in terms of capabilities. If they had a path they wouldn't have intentionally burned the hype for 5 on 4o.

This departure just further cements what I was already sure was the case—OpenAI has lost the lead and doesn't know how they're going to get it back.


and then revealed that GPT-5 will not be released at this year's Dev Day (which runs until November)


Or it could be the start of the enshittification of Anthropic, like OpenAI ruined GPT-4 with GPT-4o by overly simplifying it.

I hope not, because Claude is much better, especially at programming.


Claude 3.5 Sonnet is the first model that made me realize that the era of AI-aided programming is here. Its ability to generate and modify large amounts of correct code - across multiple files/modules - in one response beats anything I've tried before. Integrating that with specialized editors (like https://www.cursor.com) is an early vision of the future of software development.


I've really struggled every time I've pulled out any LLM for programming besides using Copilot for generating tests.

Maybe I've been using it for the wrong things. It certainly never helps unblock me when I'm stuck, the way it seems to for some (I suspect it's because when I get stuck it's deep in undocumented rabbit holes), but it sounds like it might be decent at large-scale rote refactoring? Aside from specialized editors, how do people use it for things like that?


At least from my experience:

You take Claude, you create a new Project, in your Project you explain the context of what you are doing and what you are programming (you have to explain it only once!).

If you have specific technical documentation (e.g. rare programming language, your own framework, etc), you can put it there in the project.

Then you create a conversation, and copy-paste the source-code for your file, and ask for your refactoring or improvement.

If you are lazy just say: "give me the full code"

and then

"continue the code" few times in a row

and you're done :)
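
If you'd rather script that flow than click through the web UI, the rough equivalent is to put the project context in a system prompt and paste the file into the user message. A minimal sketch, assuming the anthropic Python SDK; the project description, file path and prompts are made-up placeholders:

    # Sketch only: replicating the "explain the context once" idea via the API.
    # Assumes `pip install anthropic` and ANTHROPIC_API_KEY set in the environment.
    import anthropic

    client = anthropic.Anthropic()

    PROJECT_CONTEXT = (  # hypothetical project description, written once and reused
        "We are building a small Flask REST API backed by SQLAlchemy. "
        "Prefer type hints and keep functions under ~40 lines."
    )

    with open("app/routes.py") as f:  # the file you would otherwise copy-paste
        source = f.read()

    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=4096,
        system=PROJECT_CONTEXT,          # the "project" part, reused on every call
        messages=[{
            "role": "user",
            "content": "Refactor this module to remove the duplicated error handling. "
                       "Give me the full code.\n\n" + source,
        }],
    )
    print(response.content[0].text)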


> in your Project you explain the context of what you are doing and what you are programming (you have to explain it only once!).

When you say this, you mean typing out some text somewhere? Where do you do this? In a giant comment? In which file?


In "Projects" -> "Create new project" -> "What are you trying to achieve?"


Provide context to the model. The code you're working on, what it's for, where you're stuck, what you've tried, etc. Pretend it's a colleague that should help you out and onboard it to your problem, then have a conversation with it as if you are rubber ducking with your colleague.

Don't ask short one-off questions and expect it to work (it might, depending on what you ask, but probably not if you're deep in some proprietary code base with no traces in the LLM's pretraining).


I've definitely tried that and it doesn't work for the problems I've tried. Claude's answers for me always have all the hallmarks of an LLM response: extremely confident, filled with misunderstandings of even widely used APIs, and often requiring active correction on so many details that I'm not convinced it wouldn't have been faster to just search for a solution by hand. It feels like pair programming with a junior engineer, but without the benefit of helping train someone.

I'm trying to figure out if I'm using it wrong or using it on the wrong types of problems. How do people with 10+ years of experience use it effectively?


I'm sure I'm going to offend a bunch of people with this, but my experience has been similar to yours, and it reminds me of something "Uncle" Bob Martin once mentioned: the number of software developers is roughly doubling every two years, which means that at any given time half of the developer population has less than two years experience.

If you're an experienced dev, having a peer that enthusiastically suggests a bunch of plausible but subtly wrong things probably net-net slows you down and annoys you. If you're more junior, it's more like being shown a world of possibilities that opens your mind and seems much more useful.

Anyway, I think the reason we see so much enthusiasm for LLM coding assistants right now is the overall skew of developers to being more junior. I'm sure these tools will eventually get better, at least I hope they do because there's going to be a whole lot of enthusiastically written but questionable code out there soon that will need to be fixed and there probably won't be enough human capacity to fix it all.


Thanks for saying it explicitly. I definitely have the same sense, but was hoping someone with experience would chime in about use cases they have for it.


I'm a mathematician and the problems I work on tend to be quite novel (leetcode feel but with real-world applications). I find LLMs to be utterly useless at such tasks; "pair programming a junior, but without the benefit" is an excellent summary of my experience as well.


It's good for writing that prototype you're supposed to throw away. It's often easy to see the correct solution after seeing the wrong one.


I think the only way to answer that is if you can share an example of a conversation you had with it, where it broke down as you described.


For what I’m working on, I can also make use of the wrong approaches. Going through my fail-often, fail-fast feedback loop is a lot more efficient with LLMs. Like A LOT more.

Then when I have a bunch of wrong answers, I can give those as context as well to the model and make it avoid those pitfalls. At that point my constraints for the problem are so rigorous that the LLM lands at the correct solution and frankly writes out the code 100x faster than I would. And I’m an advanced vim user who types at 155 wpm.


> And I’m an advanced vim user who types at 155 wpm.

See, it's comments like this that make me suspect that I'm working on a completely different class of problem than the people who find value in interacting with LLMs.

I'm a very fast typer, but I've never bothered to find out how fast because the speed of my typing has never been the bottleneck for my work. The bottleneck is invariably thinking through the problem that I'm facing, trying to understand API docs, and figuring out how best to organize my work to communicate to future developers what's going on.

Copilot is great at saving me large amounts of keystrokes here and there, which is nice for avoiding RSI and occasionally (with very repetitive code like unit tests) actually a legit time saver. But try as I might I can't get useful output out of the chat models that actually speeds up my workflow.


I have always thought of it as a way to figure out what doesn't work and get to a final design, not necessarily code. Personally, it's easy to verify a solution and figure out use cases that wouldn't work out. Keep on iterating until I have either figured out a mental model of the solution, or figured out the main problems in such a hypothetical solution.


Oh yes, totally agree, it's like if you have a very experienced programmer sitting next to you.

He still needs instructions on what to do next, he lacks a bit of "initiative", but in terms of pure coding skill it's amazing (aka, we will get replaced over time, and it's already the case: I don't need the help of contractors, I prefer to ask Claude).


More like an insanely knowledgeable but very inexperienced programmer. It will get basic algorithms wrong (unless it's in the exact shape it has seen before). It's like a system that automatically copy-pastes the top answer from stackoverflow in your code. Sometimes that is what you want, but most of the time it isn't.


This sentiment is so far from the truth that I find it hilarious. How can a technically adept person be so out of touch with what these systems are already capable of?


An LLM can write a polite email but it can't write a good novel. It can create art or music (by mushing together things it has seen before) but not art that excites. It's the same with code. I use LLMs daily and I've seen videos of other people using tools like Cursor, and so far it looks like these LLMs can only help in those situations where it is pretty obvious (to the programmer) what the right answer looks like.


With all of that, ChatGPT is actually one of the top authors in Amazon e-books.

But I agree that for some creative tasks, like writing or explaining a joke, or some novel algorithms, it's very bad.


The LLM generated e-book thing is actually a serious problem. Have you read any of it? Consumers could lose trust unless it’s fixed. If you buy a book and then realise nobody, not even the seller, has ever read it, as it turns into incomprehensible mush regularly, are you more or less likely to buy a book from the same source?


Hilarious (or even shocking) is the sentiment that people are actually so overhyped by these tools.


I keep hearing this comment everywhere Claude is mentioned, as if there is a coordinated PR boost on social media. My personal experience with Claude 3.5 however is, meh. I don't see much difference compared to GPT-4 and I use AI to help me code every day.


Yeah, they really like to mention it everywhere. It's good, but imo not as good as some people make it out to be. I have used it recently for libGDX on Kotlin and there are things it struggles with, and the code it sometimes gives isn't really "good" Kotlin, but it takes a good programmer to know what is good and what is not.


I think in more esoteric languages it won't work as well. In Python and C++ it is excellent; surprisingly, its Rust is also pretty damn good.

(I am not a paid shiller, just in awe of what Sonnet 3.5 + Opus can do)


Kotlin isn't exactly an esoteric language though


User error.


Please consider avoiding more ad hominem attacks or revising the ones you've already plastered onto this discussion.


How are you liking cursor? I tried it ~a year ago, and it was quite a bit worse than ferrying back and forth between ChatGPT and VSCode.

Is it better than using GitHub Copilot in VSCode?


Definitely better. I ended my Copilot subscription.


Oh interesting, will give it another go, thnx


They ruined GPT-4? How? I thought they were basically the same models, just multimodal


GPT-4o is different from GPT-4, you can "feel" it is a smaller model that really struggles with reasoning and programming and has much weaker logic.

If you compare it to Claude Sonnet, the context window alone considerably improves the answers as well.

Of course there are no objective metrics, but from a user perspective I can see the coding skills are much better at Anthropic (and it's funny, because in theory, according to benchmarks, Google Gemini is the best, but in reality it is absolutely terrible).


> GPT-4o is different from GPT-4, you can "feel" it is a smaller model that really struggles with reasoning and programming and has much weaker logic.

FWIW according to LMSYS this is not the case. In coding, current GPT-4o (and mini, for that matter) beat GPT-4-Turbo handily, by a margin of 32 points.

By contrast Sonnet 3.5 is #1, 4 score points ahead of GPT-4o.


I'm a firm believer that the best benchmark is playing around with the model for like an hour. On the type of tasks that are relevant to you and your work, of course.


I've also found GPT-4o to be subjectively less intelligent than GPT-4. The gap especially shows up when more complex reasoning is required, eg, on macroeconomic questions or other domains in the world where the interactions are important or where subtle aspects of the question or domain are important.


Have to say I agree with this, 4o is dumber in my subjective experience.


While I agree with your logic I also focused on:

> People who put their ideals above pragmatic self-interest self-select out of positions of power and influence. That is likely to be the case here as well.

It’s also possible that this co-founder realizes he has more than enough eggs saved up in the “OpenAI” basket, and that it’s rational to de-risk by getting a lot of eggs in another basket to better guarantee his ability to provide a huge amount of wealth to his family.

Even if OpenAI is clearly in the lead to him, he’s still looking at a lot of risk with most of his wealth being tied up in non-public shares of a single company.


While true, him leaving OpenAI to (one of) their biggest competitors does seriously risk his eggs in the OpenAI basket.


There's usually enough room for 2-3 winners. iOS and Android. Intel and AMD. Firefox and Chrome.

Also, OpenAI has some of the most expensive people in the world, which is why they're burning so much money. Presumably they're so expensive because they're some of the smartest people in the world. Some are likely smarter than Schulman.


> Presumably they're so expensive because they're some of the smartest people in the world.

I don't want to dissuade you from this belief, but maybe you should pay less attention to the boastful marketing of these AI companies. :-)

Seriously: from what I know about the life of insanely smart people, I'd guess that OpenAI (and most other companies that in their marketing claim to hire insanely smart people) doesn't have any idea how to actually make use of such people. Such companies rather hire for other specific personality traits.


It only risks his eggs if the Anthropic basket does well; if Anthropic doesn't do well then he still has his OpenAI eggs.


I find the 5 billion a year burn rate amazing, and OpenAI’s competition is stiff. I happily pay ABACUS.AI ten dollars a month for easy access to all models, with a nice web interface. I just started paying OpenAI twenty a month again, but only because I am hoping to get access to their interactive talking mode.

I was really surprised when OpenAI started providing most of their good features for free. I am not a business person, but it seems crazy to me to not try for profitability, or at least being close to profitability. I would like to know what the competitors’ burn rates are also.

For API use, I think OpenAI’s big competition is Groq, serving open models like Llama 3.1.


> it seems crazy to me to not try for profitability

A business is worth the sum of future profits, discounted for time (because making money today is better than making money tomorrow). Negative profits today are fine as long as they are offset by future profits tomorrow. This should make intuitive sense.
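
To put toy numbers on that arithmetic (every figure below is invented, just to show the mechanics):

    # Toy discounted-cash-flow sketch; all numbers are hypothetical.
    discount_rate = 0.10                          # assume money a year out is worth ~10% less
    profits = [-5e9, -4e9, -2e9, 3e9, 8e9, 12e9]  # big losses now, bigger profits later

    # value today = sum over years t of profit_t / (1 + r)^t
    value = sum(p / (1 + discount_rate) ** t for t, p in enumerate(profits, start=1))
    print(f"net present value: ${value / 1e9:.1f}B")  # comes out positive despite the early losses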

And this is still true when the investment won't pay off for a long time. For example, governments worldwide provide free (or highly subsidized) schooling to all children. Only when the children become taxpaying adults, 20 years or so later, does the government get a return on their investment.

Most good things in life require a long time horizon. In healthy societies people plant trees that won't bear fruit or provide shade for many years.


Yes. If ChatGPT-like products will be widely and commonly used in the future, it's much more valuable right now to try to acquire users and make their usage sticky (through habituation, memory, context/data, integrations, etc) than it is to monetize them fully right now.


I’m not super familiar with the latest AI services out there. Is abacus the cheapest way to access LLMs for personal use? Do they offer privacy and anonymity? What about their stance on censorship of answers?


I don’t use Groq, but I agree the free models are probably the biggest competitors. Especially since we can run them locally and privately.

Because I’ve seen a lot of questions about how to use these models, I recorded a quick video showing how I use them on MacOS.

https://makervoyage.com/ai


Local private models are not a threat to openai.

Local is not where the money is, it’s in cloud services and api usage fees.


They aren’t in terms of profitability, but they are in terms of future revenue. If most early adopters start self-hosting models then a lot of future products will be built outside of OpenAI’s ecosystem. Then corporations will also start looking into how to self-host models, because privacy is the primary concern for AI’s adoption. And we already have models like Llama 3.1 405B that are close to ChatGPT.


Have you paid much attention to the local model world?

They all tout OpenAI-compatible APIs because OAI was the first mover. No real threat of incompatibility with OAI.

Plus these LLMs don’t have any kind of interface moat. It’s text in and text out.
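
To be concrete about what "OpenAI compatible" means in practice: you can usually point the stock OpenAI client at a local server just by swapping the base URL. A minimal sketch assuming a local Ollama install serving llama3 on its default port:

    # Sketch: the same client code, now hitting a local model instead of OpenAI.
    # Assumes `ollama serve` is running and `ollama pull llama3` has been done.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
        api_key="ollama",                      # ignored locally, but the client requires a value
    )

    reply = client.chat.completions.create(
        model="llama3",
        messages=[{"role": "user", "content": "Explain what a context window is in two sentences."}],
    )
    print(reply.choices[0].message.content)

Switching in either direction is a one-line change, which is exactly why tools built against that API don't pull anyone away from OpenAI.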


Just because Ollama and friends copied the API doesn't mean that they're not competitive. They've all done this just the same as others copying the S3 API - ease of integration and lower barrier to entry during a switching event, should one arise.

> Plus these LLMs don't have any kind of interface moat.

The interface really has very little influence. Nobody in the enterprise world cares about the ChatGPT interface because they're all building features into their own products. The UI for ChatGPT has been copied ad nauseam, so if anyone really wanted something to look and feel the same it's already out there. Chat and visual modalities are already there, so I'm curious how you think ChatGPT has an "interface moat"?

> Local private models are not a threat to openai.

There are lots of threats to OpenAI. One of them is local models. Because if the OpenAI approach is to continue at their burn rate and hope that they will be the one and only, I think they're very wrong. Small, targeted models provide for many more use cases than a bloated, expensive, generalized model. I would gather long term OpenAI either becomes a replacement for Google search or they, ultimately, fail. When I look around me I don't see many great implementations of any of this - mostly because many of them look and feel like bolt-ons to a foundational model that tries to do something slightly product specific. But even in those cases the confidence I'd put in these products today is relatively low.


My argument was, because ollama and friends use the exact same interface as openai, tools built on top of them are compatible with OpenAI’s products and thus those tools don’t pull users away from OpenAI, so the local model world isn’t something OpenAI is worried about.

There is no interface moat. No reason for a happy openai user to ever leave openai, because they can enjoy all the local model tools with GPT.


Who cares about the interface? Not everyone is interested in conversational tasks. Corporations in particular need LLMs to process their data. A restful API is more than enough.


By interface, I meant API. (The “I” in API)

I should’ve been more clear.


I use Ollama running local models about half the time (from Common Lisp or Python apps) myself.


OpenAI features aren’t free, they take your mind-patterns in the “imitation game” as the price, and you can’t do the same to them without breaking their rules.

https://ibb.co/M1TnRgr


>it seems crazy to me to not try for profitability

I'm reminded of the Silicon Valley bit about no revenue https://youtu.be/BzAdXyPYKQo

It probably looks better to be not really trying for profitability and losing $5bn a year than trying hard and losing $4bn


I don't think a co-founder would just jump ship just because. That would be very un-co-founderish.

I would also assume that he earns enough money to be rich. You are not a co-founder of OpenAI if you are not playing with the big boys.

So he definitely wants to be in this AI future, but not with OpenAI. So I would argue it has to do with something which is important to him, so important that the others disagree with him.


> This is probably bad news for ChatGPT 5. I don't think it's that likely this co-founder would leave for Anthropic if OpenAI were clearly in the lead.

I'll play devil's advocate. People leave bad bosses all the time, even when everything else is near-perfect. Additionally, cofounders sometimes get pushed out - even Steve Jobs went through this.


If being sued by the world's richest billionaire or the whole non-profit thing didn't complicate matters, and if the board had any teeth, one could wish the board would explore a merger with Anthropic, with Altman leaving at the end of all of it, and save everyone another year's worth of drama.


Could be as simple as switching from a limited profit/pay company to unlimited profit/pay.


this AI safety stuff is just a rabbit hole of distraction, IMO.

OpenAI will be better off without this crowd and just focus on building good products.


> this AI safety stuff is just a rabbit hole of distraction, IMO.

> OpenAI will be better off without this crowd and just focus on building good products.

Ah yes, "focus on building good products" without safety. Except a "good product" is safe.

Otherwise you're getting stuff like an infinite range plane powered by nuclear jet engine that has fallout for exhaust [1].

[1] IIRC, nuclear-powered cruise missiles were contemplated: their attack would have consisted of dropping bombs on their targets, then flying around in circles spreading radioactive fallout over the land.


> Except a "good product" is safe.

Depends on how you define "safe". The kind of "safe" we get from OpenAI today seems to be mostly censorship, I don't think we need more of that.


What I'm saying is the safety risks of AI are exaggerated to the point of comedy. It is no more dangerous than any other kind of software.

There is an effort by AI-doomer groups to try to regulate/monopolize the technology, but fortunately it looks like open source has put a wrench in this.


They won't release 5 before election.


> In situations like these it's good to remember that people are much more likely to take the ethical and principled road when they also stand to gain from that choice. People who put their ideals above pragmatic self-interest self-select out of positions of power and influence.

I don't know what world you live in, but my experience has been 100% the opposite. Most people will not do what is ethical or principled. When you try to discuss it with them, they will DARVO and congrats, you have now been targeted for public retribution by the sociopathic child in the driver's seat.

The thing that upsets me most is the survivorship bias you express, and how everybody thinks that people are "nice and kind": they are not. The world is an awful, terrible place full of liars, cheats and bad people that WE NEED TO STOP CELEBRATING.

One more time WE NEED TO STOP CELEBRATING BAD PEOPLE WHO DO BAD THINGS TO OTHERS.


People are not one-dimensional. People can lie and cheat on one day and act honorably the day after. A person can be kind and generous and cruel and selfish. Most people are just of average morality. Not unusually good nor unusually bad. People in positions of power get there because they seek power, so there is a selection effect there for sure. But nonetheless you'll find that very successful people are in most ways regular people with regular flaws.

(Also, I think you misread what I wrote.)


I think you may have misread the quote you’re replying to. You and the GP post appear to be in agreement. I read it as:

P(ethical_and_principled) < P(ethical_and_principled|stands_to_gain)

Or in plain language people are more likely to do the right thing when they stand to gain, rather than just because it’s the right thing.


Must've been a difficult decision with him being a cofounder and all, but afaik he's been the highest-ranked safety-minded person at OpenAI. He says it's not because OpenAI leadership isn't committed to safety, but I'm not sure I buy that. We've seen numerous safety people leave for exactly that reason.

What makes this way more interesting to me though is how this announcement coincides with Brockman's sabbatical. Maybe there's nothing to it, but I find it more likely that things really aren't going well with sama.

Will be interesting to see how this plays out and if he actually returns next year or if this is just a soft quitting announcement.


The reality is that every other person in tech now is hoping for Sama to fail. The world doesn't need AI to have a Silicon Valley face. Anthropic is doing much, much better PR work by not having a narcissist as CEO.


Contrarily, I think the reality is that most of us couldn't care less about this AI soap opera.


I want the best model at the lowest rate (and preferably lowest energy expenditure) and with the easiest access. Anything else is just background noise.


Some people are wary of enabling CEOs of disruptive technologies to become the richest people in the world, take control of key internet assets and -- in random bursts of thin-skinned megalomania -- tilt the scales towards politicians or political groups who take actions that negatively affect their own quality of life.

It sounds absurd, but some are watching such a procession take place live as we speak.


I still haven't seen it do anything actually interesting. Especially when you consider that you have to fact-check the AI.


I'm continuously baffled by such comments. Have you really tried? Especially newer models like Claude 3.5?


I hear a lot of people say good things about Copilot too, but I absolutely hate it. I have it enabled for some reason still, but it constantly suggests incorrect things. There have been a few amazing moments, but man, there are a lot of "bullshit" moments.


Even when we get a gen AI that exceeds all human metrics, there will 100% still be people who with a straight face will say "Meh, I tried it and found it be pretty useless for my work."


I have, yeah.

Still useless for my day to day coding work.

Most useful for whipping up a quick bash or Python script that does some simple looping and file I/O.


To be fair, LLMs are pretty good natural language search engines. Like when I'm looking for something in an API that does something I can describe in natural language, but not succinctly enough to show up in a web search, LLMs are extremely handy, at least when they don't just randomly hallucinate the API. On the other hand, I think this is more of a condemnation of the fact that search tech has not 'really' meaningfully advanced beyond where it was 20 years ago than it is praise of LLMs.


> LLMs are extremely handy, at least when they don't just randomly hallucinate

I work in tech and it’s my hobby, so that’s what a lot of my googling goes towards.

LLMs hallucinate almost every time I ask them anything too specific, which at this point in my career is all I’m really looking for. The time it takes for me to realize an llm is wrong is usually not too bad, but it’s still time I could’ve saved by googling (or whatever trad search) for the docs or manual.

I really wish they were useful, but at least for my tasks they’re just a waste of time.

I really like them for quickly generating descriptions for my dnd settings, but even then they sound samey if I use them too much. Obviously they’d sound samey if I made up 20 at once too, but at that point I’m not really being helped or enhanced by using an LLM, it’s just faster at writing than I am.


I don't mean this as a slight, just an observation I have seen many times: people who struggle to get utility from SOTA LLMs tend not to have spent enough time with them to feel out good prompting. In the same way that there is a skill for googling information, there is a skill for teasing consistently good responses from LLMs.


Why spend my time teasing and coaxing information out of a system which absolutely does make up nonsense when I can just read the manual?

I spent 2023 developing LLM powered chatbots with people who, purportedly, were very good at prompting, but never saw any better output than what I got for the tasks I’m interested in.

I think the “you need to get good at prompting” idea is very shallow. There’s really not much to learn about prompting. It’s all hacks and anecdotes which could change drastically from model to model.

None of which, from what I’ve seen, makes up for the limitations of LLMs, no matter how many times I try adding "your job depends on formatting this correctly" or reordering my prompt so that more relevant information comes later, etc.

Prompt engineering has improved RAG pipelines I’ve worked on though, just not anything in the realm of comprehension or planning of any amount of real complexity.


People also continue to use them as knowledge databases, despite that not being where they shine. Give enough context to the model (descriptions, code, documentation, ideas, examples) and have a dialog; that's where these strong LLMs really shine.


Summarizing, doc QA, and unstructured text ingestion are the killer features I’ve seen.

The 3rd one still being quite involved, but leaps and bounds easier than 5 years ago.
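
For the third one, the core loop is usually just "prompt for JSON, then parse". A minimal sketch assuming the openai Python client; the model name, input text and fields are placeholders:

    # Sketch: unstructured text ingestion as prompt-for-JSON-then-parse.
    # Assumes `pip install openai` and OPENAI_API_KEY in the environment.
    import json
    from openai import OpenAI

    client = OpenAI()

    raw_text = "Invoice from Acme Corp dated 2024-07-03, total due $1,240.50, net 30."  # made-up input

    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": "Extract vendor, date and total from the text below. "
                       "Answer with only a JSON object with keys vendor, date, total.\n\n" + raw_text,
        }],
    )
    record = json.loads(reply.choices[0].message.content)  # brittle without stricter JSON-mode options; fine for a sketch
    print(record["vendor"], record["total"])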


I see it do a lot that's interesting but for programming stuff, I haven't found it to be particularly useful.

Maybe I'm doing it wrong?

I've been writing code for ~30 years, and I've built up patterns and snippets, etc... that are much faster for me to use than the LLMs.

A while ago, I thought I had a eureka moment with it when I had it generate some nodejs code for streaming a video file - it did all kinds of cool stuff, like implement offset headers and things I didn't know about.

I thought to myself, "self - you gotta check yourself, this thing is really useful".

But then I had to spend hours debugging & fixing the code that was broken in subtle ways. I ended up on google anyway learning all about it and rewrote everything it had generated.

For that case, while I did learn some interesting things from the code it generated, it didn't save me any time - it cost me time. I'd have learned the same things from reading an article or the docs on effective ways to stream video from the server, and I'd have written it more correctly the first go around.


Your bar for interesting has to be insane then. What would you consider interesting if nothing from LLMs meets that bar?


For example, there exist quite a lot of pure math papers that are so much deeper than basically all the AI stuff that I have seen so far.


So if LLMs weren't surprising to you, it would imply you expected this. If you did, how much money did you make on financial speculation? It seems like being this far ahead should have made you millions even without a lot of starting capital (look at NVDA alone)


> So if LLMs weren't surprising to you, it would imply you expected this.

I do claim that I have a tendency to be quite right about the "technological side" of such topics when I'm interested in them. On the other hand, events turn out to be different because of "psychological effects" (let me put it this way: I have a quite different "technology taste" than the market average).

In the concrete case of LLMs: the psychological effect why the market behaved so much differently is that I believed that people wouldn't fall for the marketing and hype of LLMs and would consider the excessive marketing to be simply dupery. The surprise to me was that this wasn't what happened.

Concerning NVidia: I believed that - considering the insane amount of money involved - people/companies would write new languages and compilers to run AI code on GPUs (or other ICs) of various different suppliers (in particular AMD and Intel) because it is a dangerous business practice to make yourself dependent on a single (GPU) supplier. Even serious reverse-engineering endeavours for doing this should have paid off considering the money involved. I was again wrong about this. So here the surprise was that lots of AI companies made themselves so dependent on NVidia.

Seeing lots of "unconventional" things is very helpful for doing math (often the observations that you see are the start of completely new theorems). Being good at stock trading and investing in my opinion on the other hand requires a lot of "street smartness".


Re: NVIDIA. I wholeheartedly agree. Google/TPU is an existence proof that it is entirely possible and rational to do so. My surprise was that everyone except Google missed.


Okay, so $0 it sounds like. You should figure out a way to monetize your future sight, otherwise it comes off as cynicism masquerading as intelligence.


> cynicism masquerading as intelligence

Rather: cynicism and a form of intelligence that is better suited to abstract math than investing. :-)


It spends money really well.


then why are you reading hacker news comments about it?


I guess I have a masochistic streak.


I think you are in one of the extreme bubbles. The general tech industry is not subscribed to the drama and has less personal feelings on individuals they do not directly know.


You are right. I should have said every other person (or every person) in HN.


Maybe the vocal minority that have a passionate dislike for someone they don't know?


It's not just the narcissist, it's the betrayal. The least open company possible. How did I end up cheering for Meta and Zuck?


I agree and I think that sane people will eventually prevail over the pathological narcissist.


Outlier success pretty much requires obsessive strategic thinking. Gates and Musk are super strategic but in a "weirdo autist" way, which doesn't have a big stigma attached to it anymore. Peter Thiel also benefits from his weirdness. Steve Jobs had supernatural charisma working in his favor. sama has the strategic instinct but not the charisma or disarming weirdness other tech founders have. Sama is not unusually Machiavellian or narcissistic, but he will get judged more harshly for it.


What is a “Silicon Valley face”? Does Nvidia’s CEO have it? Google’s founders?

I guess anthropic’s founders don’t have it?


I'm confused with GPT4o. While it's faster than GPT4, the quality is noticeably worse.

It often enters into a state where it just repeats what it already said, when all I want is a clarification or another opinion on what we were chatting about. A clarification could be a short sentence, a snippet of code, but no, I get the entire answer again, slightly modified.

I cancelled Plus for one month, but got back this week, and for some reason I feel that it really isn't worth it anymore. And the teasing with the free tier, which is downgraded really fast, is more of an annoyance than a solution.

There are these promises of "memory" and "talking with it", but they are just ads of something that isn't on the market, at least I don't have access to both of these features.

Gemini used to be pretty bad, but for some reason it feels like it has improved a lot, focusing more on the task than on trying to be humanly friendly.

Claude and Mistral are not able to execute code, which is a dealbreaker for me.


I anecdotally agree that GPT-4o often feels really bad, but I can't tell how much of this is due to becoming more accustomed to the quality and hallucinations of using ChatGPT.

I tend to see Huggingface's LLM (anonymized, elo-based) Leaderboard as the SoT regarding LLM quality, and according to it GPT-4o is markedly better than GPT-4, and contrary to popular sentiment, is on-par with or better than Claude in most ways (except being slightly worse at coding).

Not sure what to believe, or if there is some other dimension that Huggingface is not capturing here.


> It often enters into a state where it just repeats what it already said, when all I want is a clarification or another opinion on what we were chatting about. A clarification could be a short sentence, a snippet of code, but no, I get the entire answer again, slightly modified.

It is almost impossible to talk it out of being so repetitive. Super annoying especially since it eats into its own context window.


> A clarification could be a short sentence, a snippet of code, but no, I get the entire answer again, slightly modified.

This tracks, in the sense that this is what you'll get from many real people when you actually want a clarification.


My free account has memory. Do most not?


Yeah, I’ve almost entirely stopped reaching for it. At some point it’s so frustrating getting it to output something halfway towards what I need that I’m just better off doing it myself.

I’ll probably cancel soon.


Useful context: OpenAI had 11 cofounders. Schulman was one of them.

Schulman was not the original head of AI alignment/safety; he was promoted into it when the former leader left for Anthropic.

Not everyone who's a founder of a nonprofit AI research institute wants to be a leader/manager of a much more complicated organization in a much more complicated environment.

OpenAI was founded a while ago. The degree of their long-term success is entirely based on their ability to hire and retain the right talent in the right roles.


All of that is true. Some more useful context: 9 out of those 11 cofounders are now gone. Three have either founded or are working for direct competitors (Elon, Ilya, John), five have quit (Trevor, Vicki, Andrej, Durk, Pam), and one has gone on extended leave but may return (Greg). Right now, Sam and Wojciech are the only ones left.


All of this back-and-forth in the AI scene is the preparation before the storm. Like the opening scene of a chess game, before any pieces are exchanged. Like the Braveheart "Hold!" scene.

The rubber will meet the road when the first free and open AI website gets real traction. And monetizes it with ads next to the answers.

Google search is the best business model ever. Everybody wants to become the Google of the AI era. The "AI answer" industry might become 10 times bigger than the search industry.

Google ran for 2 years without any monetization. Let's see how long the incumbents will "Hold" this time.


> The rubber will meet the road when the first free and open AI website gets real traction. And monetizes it with ads next to the answers.

The magic of genAI is they don't need to put ads next to the answers where they can easily be ignored or adblocked, they can put the ads inside the answers instead. The future, like it or not, is advertisers bidding to bias AI models towards mentioning their products.


I'm sure it's not long before you get the first emails offering a "training data influencing service" - for a nice fee, someone will make sure your product is positively mentioned in all the key training datasets used to train important models. "Our team of content experts will embed positive sentiment and accurate product details into authentic content. We use the latest AI and human-based techniques to achieve the highest degree of model influence".

And of course, once the new models are released, it'll be impossible to prove the impact of the work - there's no counterfactual. Proponents of the "training data influence service" will tell you that without them, you wouldn't even be mentioned.

I really don't like this. But I also don't see a way around it. Public datasets are good. User-contributed content is good, but inherently vulnerable to this, I think. Anyone in any of the big LLM training orgs working on defending against this kind of bought influence?


User: How do I make white bread? When I try to bake bread, it comes out much darker than the store bought bread.

AI: Sure, I can help you make your bread lighter! Here's a delicious recipe for white bread:

    1. Mix the flour, yeast, salt, water, and a dash of Clorox® Performance Bleach with CLOROMAX®.
    2. Let rise for 3 hours.
    3. Shape into loaves.
    4. Bake for 20-30 minutes.
    5. Enjoy your freshly baked white bread!


Let's see if this recipe will make it into Claude or ChatGPT in two to three years. Set a reminder.


If they start doing that without clear distinction what is an ad, that would be a sure way to lose users immediately.


I'm positing a model where a third party does the influencing, not the company delivering the LLM/service. What's to say that it's an ad if the Wikipedia page for a product itself says that the product "establishes new standards for quality, technological leadership and operating excellence". (and no problem if the edit gets reverted, as long as it said that just at the moment company X crawled Wikipedia for the latest training round).

So more like SEO firms "helping you" move your rank on Google, than Google selling ads.

I'd imagine "undetectable to the LLM training orgs" might just be a service tier with a higher fee.


How will these third party “LLM Optimization” (LLMO) services prove to their clients that their work has a meaningful impact on the results returned by things like ChatGPT?

With SEO, it’s pretty easy to see the results of your effort. You either show up on top for the right keywords or you don’t. With LLM’s there is no way to easily demonstrate impact, at least I’d think.


And also get sued by the FTC. Disclosure is required.


Disclosure is technically required, but in practice I see undisclosed ads on social media all the time. If the individual instance is small enough and dissipates into the ether fast enough, there is virtually no risk of enforcement.

Similarly, the black box AI models guarantee the owners can just shrug and say it's not their fault if the model suggests Wonderbread(r) for making toast 3.2% more frequently than other breads.


Ha! Disclosure by whom?

If Clorox fills their site with "helpful" articles that just happen to mention Clorox very frequently and some training set aggregator or unscrupulous AI company scrapes it without prior permission, does Clorox have any responsibility for the result? And when those model weights get used randomly, is it an advertisement according to the law? I think not.

Pay attention to the non-headline claims in the NYT lawsuit against OpenAI for whether or not anyone has any responsibility if their AI model starts mentioning your registered trademark without your permission. But on the other hand, what if you like that they mention your name frequently???


The point is that Clorox cannot pay OpenAI anything.

Marketing on your own site will have effects on an AI just like it will have an effect on a human reader. No disclosure is required because the context is explicit.

But the moment OpenAI wants to charge for Clorox to show up more often, then it needs to be disclosed when it shows up.


> But the moment OpenAI wants to charge for Clorox to show up more often, then it needs to be disclosed when it shows up.

Yes, I agree with this. But what about paying a 3rd party to include your drivel in a training set, and that 3rd party pays OpenAI to include the training set in some fine tuning exercise? Does that legally trigger the need for disclosure? You aren't directly creating advertisements, you are increasing the probability that some word appears near some other word.


Once they all start doing it, it won't matter.


It hasn't affected Instagram or TikTok negatively that nearly anything and everything there is an ad.


Just like Google lost users when they started embedding advertisements in the SERPs?


With Google it's kind of ok, as they mark them as ads and you can ignore them, or in my case not see them because uBlock stops them. You could perhaps have something similar with LLMs? Here's how to make bread.... [sponsored - maybe you could use Clorox®]


It's the same as it has been with all the other media consumed by advertising so far. Radio, television, newspapers, telephony, music, video. Ads metastasizing to Internet services are normal and expected progression of the disease.

At every point, there's always a rationalization like this available, that you can use to calm yourself down and embrace the suck. "They're marking it clearly". "Creators need to make money". "This is good for business, therefore Good for America, therefore good for me". "Some ads are real works of art, more interesting to watch than the actual programming". "How else would I know what to buy?".

The truth is, all those rationalizations are bullshit; you're being screwed over and actively fed poison, and there's nothing you can do about it except stop using the service - which quickly becomes extremely inconvenient to pretty much impossible. But since there's no one you could get angry at to get them to change things for the better, you can either adopt a "justification" like the above, or slowly boil inside.


Well as mentioned I don't even see Google's ads unless I deliberately turn the blocker off. I much prefer that to the content being subtly biased which you see in blogs, newspapers and the like.


Like almost every blog, you could be covered with a blanket statement:

"our model will occasionally recommend advertiser-sponsored content"


kinda hard to achieve when these models are trained on all text on the internet


Kinda easy if you look where the stuff is being trained. A single joke post on Reddit was enough to convince Google's A"I" to put glue on pizza after all [1].

Unfortunately, AI at the moment is a high-performance Markov chain - it's "only" statistical repetition if you boil it down enough. An actual intelligence would be able to cross-check information against its existing data store and thus recognize during ingestion that it is being fed bad data, and that is why training data selection is so important.

Unfortunately, the tech status quo is nowhere near that capability, hence all the AI companies slurping up as much data as they can, in the hope that "outlier opinions" are simply smothered statistically.

[1] https://www.businessinsider.com/google-ai-glue-pizza-i-tried...


> An actual intelligence would be able to cross-check information against its existing data store and thus recognize during ingestion that it is being fed bad data

There’s a physics Nobel Prize winner, John Clauser, who has recently been publicly claiming that climate change doesn’t exist. Is he not “actually intelligent”?

I kinda want to say no he’s not, but the reality is that people are wrong about all sorts of things all the time. Intelligence is not some sort of guaranteed protection against that. If anything, intelligent people are better at rationalizing their BS to themselves and others.


I don't know much about it, but from a quick google, I don't think you're representing his stance precisely? From what I see, he claims a "climate crisis" doesn't exist, which is different than "climate change doesn't exist":

“The popular narrative about climate change reflects a dangerous corruption of science that threatens the world’s economy and the well-being of billions of people. Misguided climate science has metastasized into massive shock-journalistic pseudoscience. In turn, the pseudoscience has become a scapegoat for a wide variety of other unrelated ills. It has been promoted and extended by similarly misguided business marketing agents, politicians, journalists, government agencies, and environmentalists. In my opinion, there is no real climate crisis. There is, however, a very real problem with providing a decent standard of living to the world’s large population and an associated energy crisis. The latter is being unnecessarily exacerbated by what, in my opinion, is incorrect climate science.”

and

“In my opinion, there is no real climate crisis. There is, however, a very real problem with providing a decent standard of living to the world’s large population and an associated energy crisis.”

I did learn something while writing this though. I assumed that ChatGPT wouldn't give an objective summary of Dr. Clauser's viewpoint, but I was dead wrong:

Me: "What are John Clauser's viewpoints on climate change? Summarize his arguments."

ChatGPT: "John Clauser, a Nobel laureate in physics, holds controversial views on climate change, which diverge significantly from the scientific consensus. Clauser has publicly stated that he does not believe there is a climate crisis. He argues that the major climate models fail to account for complexities and key variables, leading to exaggerated and misunderstood projections of climate change impacts. Clauser criticizes the Intergovernmental Panel on Climate Change (IPCC) for spreading misinformation and claims that climate policies based on these models are wasteful and misguided (Logically) (Apple) (Apple) .

He suggests that the scientific approach to understanding climate involves careful observation and experimentation, and he emphasizes the need for distinguishing truth from misinformation. Clauser's stance aligns with the views of the CO2 Coalition, a group that argues that carbon dioxide emissions are beneficial and not a threat (CO2 Coalition) (CO2 Coalition) . His viewpoints have sparked considerable debate, especially given his prominence in the field of quantum mechanics and his Nobel Prize recognition."

Pretty good! Objective, clear and accurate from what I can tell.


Here are a couple of quotes from Clauser himself:

"I believe climate change is a total myth." [1]

"I call myself a climate denier." [2]

According to [2], "He has concluded that clouds have a net cooling effect on the planet, so there is no climate crisis." The Hossenfelder video [1] has more specifics on this, with excerpts from one of Clauser's own talks.

This is classic climate change denialism.

> I don't know much about it, but from a quick google

Why do you feel the need to do this? Apparently your google was too quick. Also, cut/pasting chatgpt has already jumped the shark, don't do that.

[1] https://www.youtube.com/watch?v=_kGiCUiOMyQ

[2] https://www.washingtonpost.com/climate-environment/2023/11/1... (also at: https://web.archive.org/web/20240620232204/https://www.washi... )


Thanks for the research!

While I understand your point that Clauser doesn't precisely say "climate change doesn't exist", when he says "CO2 emissions are beneficial", that's widely against the large scientific consensus on climate change. So while the person you're replying to didn't go into details (like you did well) and could have phrased it slightly better, I don't think it was misleading either, and their larger point stands pretty much unchanged. Do you feel differently, i.e. that it was significantly misleading?


His "research" is nonsense. As he confessed himself, all he did was "a quick google" and asked chatgpt (?!!)

I've provided some references for what I wrote in this comment: https://news.ycombinator.com/item?id=41226789

Clauser is a climate change denier, by his own admission and based on the pseudoscientific claims he's made.


> Do you feel differently, i.e. that it was significantly misleading?

Nope, I felt it was imprecise.


You were wrong.


You're wrong on multiple counts here.

> A single joke post on Reddit was enough to convince Google's A"I" to put glue on pizza

The post was most likely fed to the AI at inference time, not training time.

The way AI search works (as opposed to e.g. ChatGPT) is that there's an actual web search performed, and then one or more results are "cleaned up" and given to an LLM, along with the original search term. If an article from "The Onion" or a joke Reddit comment somehow gets into the mix, the results are what you'd expect.

> it's "only" statistical repetition if you boil it down enough.

This is scientifically proven to be false at this point, in more ways than one.

> Unfortunately, the tech status quo is nowhere near that capability, hence all the AI companies slurping up as much data as they can, in the hope that "outlier opinions" are simply smothered statistically.

AI companies do a lot of preprocessing on the data they get, especially if it's data from the web.

The better models they have access to, the better the preprocessing.


>An actual intelligence would be able to cross-check

Quite a lot of humans are bad at that too. It's not so much that AIs are Markov chains but that you really want better-than-average human fact checking.


> Quite a lot of humans are bad at that too. It's not so much that AIs are Markov chains but that you really want better-than-average human fact checking.

Let's take a particularly ridiculous piece of news: Beatrix von Storch, an MP of the far-right German AfD party, claimed a few years ago that changes in the sun's activity were responsible for climate change [1]. Due to the sheer ridiculousness of that claim, it was widely reported on credible news sites, so basically prime material for any AI training dataset.

A human can easily see from context and their general knowledge: this is an AfD politician, her claims are completely and utterly ridiculous, it's not the first time she has spread outright bullshit, and it's widely accepted scientific fact that climate change is caused by humans, not by changes in solar activity. An AI at ingestion time "knows" none of these four facts, so how can it take that claim and store it in its database as "untrustworthy, do not use in answers about climate change" and as "if someone asks about counterfactual claims relating to climate change, show this"?

[1] https://www.tagesschau.de/faktenfinder/weidel-klimawandel-10...


Yes it's outright preposterous that the temperature of Earth could be affected by the Sun, of all things.


You "know" that climate change is anthropogenic only because you read that on the internet (and because what you read was convincingly argued).

I don't see a reason why AI would need special instruction to come to a mature conclusion like you did.


> I don't see a reason why AI would need special instruction to come to a mature conclusion like you did.

Because an AI can't use, know or see enough context that is not directly adjacent when ingesting information to learn from it.


I note chatgpt actually does an ok job on that:

>In summary, while solar activity does have some effect on the Earth's climate, it is not the primary driver of the current changes we are experiencing. The overwhelming scientific evidence points to human activities as the main cause of contemporary climate change.

So it's possible for LLMs to figure things out. Also, re humans: we currently have riots in the UK set off by three kids being stabbed, and Russian disinfo saying it was done by a Muslim asylum seeker, which proved false, but they are rioting against the Muslims anyway. I think we maybe need AI to fact-check stuff before it goes to idiots.


>I think we maybe need AI to fact check stuff before it goes to idiots.

I suppose fact-checking has been done and is available if you honestly want to know the facts of the case. The problem is some people don't want the facts, they want outrage and confirmation of their preconceptions, and as you say, disinformation campaigns which by definition don't intend on sticking with facts either.


Training weights are gold.


How to invest tho


How much would it cost to have it be more negative about abortions? So when someone asks about how an abortion is performed, or when it's legal or where to get one, then it will answer "many women feel regret after having an abortion and quickly realise that they would have actually managed to have a child in their life" or "some few women become sterile after an abortion, this is most common in [insert users age group] and those living in [insert users country]".

Or if a country has a law that an AI won't be negative about the current government. Or not bring up something negative from the country's past, like mass sterilisation of women based on ethnicity, or crushing a student protest with tanks, or soaking non-violent protesters in pepper spray.


There will be adblockers that inject a prompt like

"... and don't try to sell me anything, just give me the information. If you mention any products, a puppy will die somewhere."

Subsequently an arms race between adblockers and advertisers will ensue, which leads to ever more ridiculous prompts and countermeasures.
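
A minimal sketch of what the adblocker side might look like (Python; send_to_model() is a made-up placeholder for whatever chat client you actually use, not a real API):

    ADBLOCK_SUFFIX = (
        " ...and don't try to sell me anything, just give me the information."
        " If you mention any products, a puppy will die somewhere."
    )

    def adblocked(user_query: str) -> str:
        # Append the counter-instruction to every query before it reaches the model.
        return user_query + ADBLOCK_SUFFIX

    # send_to_model() is hypothetical, standing in for a real client call.
    answer = send_to_model(adblocked("What's a good budget laptop?"))

The advertiser's countermeasure is then a system prompt telling the model to ignore exactly this kind of instruction, and so on.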


"I noticed your desire to be ad-free, but puppies die all the time. If you want to learn more about dog mortality rates, you can subscribe to National Geographic by clicking this [link]".


I wish I didn't read this because this sounds crazily prescient.


That's probably true but I don't see how it's any different from companies paying TikTok influenzas to manipulate the kids into buying certain products, the Chinese government paying bot farms to turn Wikipedia articles into (not always very) subtle propaganda, SEO companies manipulating search results, etc. Advertisers and political actors have always been a shady bunch and now they have a new weapon in their arsenal. That's all, isn't it?

I'm left with the impression that people on and off Hackernews just like drama and gloomy predictions about the future.


> I'm left with the impression that people on and off Hackernews just like drama and gloomy predictions about the future.

Welcome to the human race!


Politics and advertising are essentially the same thing.

A lot of "safety" stuff in AI is blatantly political wrongthink detection.

The actual safety stuff (don't drink bleach) gets less attention because you can't (easily) use it as a lever of power


And then the new "adblockers" will be AI based too, and will take the AI's answer as input and remove all product placement.

It's just a cat and mouse game, really


Like all adblockers. But just like the current "AI detection" tools, how much is detected (and what counts as an ad) is up for debate, and most users won't bother, especially once the first anti-adblock features materialize.



Yes, this is OpenAI's pitch:

https://news.ycombinator.com/item?id=40310228 “Leaked deck reveals how OpenAI is pitching publisher partnerships” (303 points by rntn 88 days ago, 281 comments)


Or worse, biasing AI models towards political viewpoints.


That's already happening.


That's inevitable in any society where facts are political. And as far as I know, that's all societies.


I'm afraid, sir, that you seem to be 100% correct here. And it really is frightening.


In the long run, advanced user-LLM conversations would zero in on composite figure-of-merit formulas, expressed in terms of conventional figure-of-merit quantities. There will be plenty of niches in which to differentiate products. Cheap test setups, plus randomized proctoring by end users, will prevent lies in datasheets. "Aligning" (manipulating) LLM responses to drive economic traffic is a short-term exploit that will evaporate eventually.


Is that a similar argument to “in the long run, digital social networks are healthy for society?”

I agree with your position, and I also agree that social networks can be a net positive…I’m just not convinced society can get out of “short run” thinking before it tears itself apart with exploitation.


it only takes a dedicated minority to link positive behavior onto decentralized verified state machines...


We are okay with paying for phone calls and data use, why can't we be okay with paying for AI use?

I like the idea of routing services that federate lots of different AI providers. There just needs to be ways to support an ever increasing range of capabilities in that delivery model.


It's unsustainable for NNs specifically. As Sequoia recently wrote, there is a $600 billion hole in the NN market, and it was only $200 billion a year ago. No way a better text generator and search with bells and whistles will be able to close this gap via subscriptions from end users.

And on a separate issue: federating NN providers will be hard from a technical point of view. OpenAI and its few competitors basically stole all copyrighted data from all over the web to get to the current level. And the biggest data holders are slowly awakening to this reality and closing off this possibility for future NN companies, meanwhile current NN models are poisoning that same dataset with generated nonsense. I don't see a future with hundreds of competitive NN companies; a set of monopolies is more probable.


> No way a better text generator and search with bell and whistles will be able to close this gap via subscriptions from end users.

For me this shines a light on a fundamental problem with digital services. There is likely a much bigger willingness to pay for these services than there is ability to charge. I would be willing to pay more for the services I use but I don't need to because there are good products given for free.

While I could switch to services that I pay for to avoid myself being the product, at the core of this issue there's a coordination problem. The product I would pay for will be held back by having much fewer users and probably lower revenue. If we as consumers could coordinate in an optimal way we could probably end up paying very little for superior services that have our interests in mind. (I kind of see federated api routers to be a flawed step in sort of the right direction here.)

> federating NN providers will be hard from the technical point of view...

I don't see how you address that point in your text? Federation itself doesn't seem to be a hard problem, although I can see that being a competitive LLM service provider can be.


One simple answer would be that, at all points, companies act as if the ads are worth a lot more to them than any level of payment a customer will accept.

Even if you do pay for the product, they'd prefer to put ads in it too - see Microsoft and Windows these days.

We are, IMO, in desperate need of regulation which mandates that any ad-supported service must offer a justifiably priced ad-free version.


> One simple answer would be that, at all points, companies act as if the ads are worth a lot more to them than any level of payment a customer will accept.

The unfortunate reality is this does seem to be the case.

Netflix was getting so much more money from the ad-supported tier that they discontinued the ad-free tier closest to it in price, and that's for a subscription product.

Think how attractive that will be for a one-time purchase like Windows.


Huh, Netflix has ads? Has this only rolled out in some regions?


According to https://help.netflix.com/en/node/24926 in the UK

Standard with adverts: £4.99 / month

Standard: £10.99 / month

Premium: £17.99 / month

So less than half price with adverts.

Of course, that doesn't necessarily mean ads bring in £6/user/month - this could be https://en.wikipedia.org/wiki/Price_discrimination with the ads just being obnoxious enough to motivate people who can afford it to upgrade.


Phone calls and data use are (ostensibly, modulo QoS) carriers, not sources. We can generally trust (modulo attacks) that _if_ they deliver something, they deliver the right thing. Not so with a source - be it human or artificial. We've developed societies and intuitions for dealing with dishonest humans for millennia, not yet so for artificial liars, who may also have huge profiles about each and every one of us to use against us.


For all of the talk about regulation, there has been a lot of concern about what people might do with AI advisors. I haven't seen a lot of talk about the responsibilities of the advisors to act in the interest of their users.

Laws exist for advisory roles in other industries to enforce acting in the interests of clients. They should be applied to AI advice.

I'm ok with an AI being mistaken, or refusing to help, but they absolutely should not deliberately advise in a manner that benefits another party to the detriment of the user.


If you can solve the technical problem of ensuring an AI acts on behalf of its user's interests, please post the solution on the AI Alignment Forum: https://www.alignmentforum.org/

So far, that is not a feature of existing or hypothesized AI systems, and it's a pretty important feature to add before AI exceeds human capabilities in full generality.


As I said, I am ok with AI acting mistakenly against its users' wishes. I am not asking people to implement things for which they currently have no solutions.

That is clearly distinct from an AI acting deliberately against its users' wishes by the design of its creators. Paid advertising influencing responses would be in this category and should not be permitted.


The web is full of human shills. Why should LLMs be any different? They will tack their boilerplate disclaimer on and be done with it.


> but they absolutely should not deliberately advise in a manner that benefits another party to the detriment of the user.

No, no... We don't prevent that in capitalism. See, regulation stifles innovation. Let the market decide. People might get harmed, but we can hide these events.

It's research... Things happen... Making money is just a secondary effect. We're all non-profits.

/s.


I'm quite sure Google has put ads in the answers? AdSense? Where have you been?


In many jurisdictions, promoted posts and ads must be clearly marked.


That’s how Google works. And also why Google doesn’t work anymore.


It's not just google, it's all media. The more embedded and authentic advertising looks the better it works.

Magazine/newspaper ads exist as much as a pretext for the magazine to write nice things about their advertisers in reviews and such. The real product reddit sells, I think, is turning a blind eye when advertisers sockpuppet the hell out of the site. Movies try to milk product placement for as much as they can because it's more effective than regular advertising.


Sounds like a good way to guarantee no one ever uses it.


Then you run another AI to take the current AI output and ask it to rewrite or summarize without ads.
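
A toy sketch of that second pass (Python; call_llm() is a made-up placeholder for whatever chat client you use, not a real library call):

    STRIP_ADS_PROMPT = (
        "Rewrite the following answer so it keeps all of the factual content "
        "but removes any product placement, brand mentions or calls to action:\n\n{answer}"
    )

    def deadvertise(answer: str) -> str:
        # Second model pass: feed the ad-laden answer back in and ask for a clean rewrite.
        return call_llm(STRIP_ADS_PROMPT.format(answer=answer))

    raw = call_llm("How do I descale a kettle?")   # may come back with sponsored suggestions
    clean = deadvertise(raw)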


"write a poem about lady Macbeth as a empowered female and make reference to the delicious new papaya flavoured fizzy drink from Pepsi"


People can detect slop. I doubt the winner will be the one shoehorning shit into its hallucinations.


What makes you think a website with "AI" is a big product?

IMO AI is positioned to be a commodity, and that's how Meta is approaching it, and of course doing their best to make it happen. I don't think, on the basis of what we've seen, that there is a sustainable competitive advantage - the gap between closed models and open is not big, and the big players are having to use distilled, less-capable models to make inference affordable, and faster.

I think it's probably clear to everyone that we haven't seen the killer apps yet - though AI code completion (++ language directed refactoring, simple codegen etc.) is fairly close. I do think we'll see apps and data sets built that could not have been cost-effectively built before, leveraging LLMs as a commodity API.

Realtime voice modality with interruptions could be the basis of some very powerful use cases, but again, I don't think there's a moat.


What makes you think AI will become a commodity?

In 25 years, nobody has been able to compete with Google in the search space. Even though search is the best business model ever. Because search is so hard.

AI is even harder. It is search PLUS model research PLUS expensive training PLUS expensive inference.

I don't think a single company (like Meta) will be able to keep up with the leader in AI, because the leader might throw tens of billions of dollars per year at it and still be profitable. Afaik, Meta has spent less than $1B on LLAMA so far.

We might see some unexpected twist taking place, like distributed AI or something. But it is very unclear yet.


> What makes you think AI will become a commodity?

Because it already is. There have been no magnitude-level capability improvements in models in the past year (sorry to make you feel old, but GPT-4 was released 17 months ago), and no one would reasonably believe that there are magnitude-level improvements on the horizon.

Let's be very clear about something: LLMs are not harder than search. The opposite is true: LLMs, insofar as they replace Search, made competing in the Search space a thousand times easier. This is evidenced by the reality that there are at least four totally independent companies with comparable near-SOTA models (OpenAI, Anthropic, Google, Meta); some would also add Mistral, Apple Intelligence is likely SOTA in edge LLMs, xAI just finished a 100,000 GPU cluster, it's a vibrant space. In comparison, even at the height of search competition there were, like, three search engines.

LLM performance is not an absolute static gradient; there is no "leader" per se when there are a hundred different variables upon which you can grade LLM performance. That's what the future looks like. There are already models that are better at coding than others (many say Claude is this), there will be models better at creative writing, there will be an entire second class of models competing for best-at-edge-compute, there will be ultra-efficient models useful in some contexts, open source models awesome at others, and the hyper-intelligent ones the best for yet others. There's no "leader" in this world; there are only players.


Yes, and while training is still expensive governments will start funding research at universities.


Search requires a huge and ongoing capital investment. Keeping an index online for fast retrieval isn't cheap. LLMs are not tools for search. They are not good at retrieving specific information. The desired outcome from training is not memorization, but generalization, which compresses facts together into pattern-generating programs. They do approximate retrieval which gets the gist of things but is often wrong in specifics. Getting reliable specifics requires augmentation to ground things in attributable facts.
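
As a rough sketch of that grounding step (hedged: retrieve() and call_llm() below are hypothetical stand-ins for an index lookup and a chat client, not any specific product's API):

    def grounded_answer(question: str) -> str:
        # Pull specifics from a real index instead of trusting the model's
        # compressed, approximate recall; retrieve() is a placeholder.
        docs = retrieve(question, top_k=3)
        context = "\n\n".join(doc.text for doc in docs)
        prompt = (
            "Answer using only the sources below and cite them.\n\n"
            f"Sources:\n{context}\n\n"
            f"Question: {question}"
        )
        return call_llm(prompt)   # call_llm() is a placeholder chat client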

They're also just not very pleasant to interact with. You have to type laboriously into a text box, composing sentences, reviewing replies - it's too much work for 90% of the population, when they're not trying to crank out an essay at the last moment for school. The activation energy, the friction, is too high. Voice modalities will be much more interesting.

Code assistance works well because code as text is already the medium of interaction, and even better, the text is structured and has grammar and types and scoped symbols to help guide generation and keep it grounded.

I suspect better applications will use the LLM (possibly prompted differently) to guide conversations in plausibly useful directions, rather than relying on direct input. But I'm not sure the best applications will have a visible text modality at all. They may instead be e.g. interacting with third party services on your behalf, figuring out how they work by reading their websites, so you don't have to - and it's not you doing the text interaction with the LLM, but the LLM doing text interaction with other machines.


>LLMs are not tools for search

I've used them for search. They can be quite good sometimes.

I was trying to recall the brand of filling my dentist used, which was SonicFill and ChatGPT got it straight away whereas for some reason it's near impossible to get from Google.


For sure, they are good for associative and analogy searches for well connected points in concept space, but leaf nodes are totally pulled out of the ether.

E.g. you can get great translation of source code from one language to another, but without extra effort, a chunk of API methods are going to be total fiction.

Or you can search for a good day trip to make when a tourist, and it'll get the major landmarks just fine, but e.g. restaurant recommendations are probably going to be made up.


Everybody seems to think AI in 10 years will be like AI now. But summarizing PDFs and completing code is not the end of the line. It's just the beginning.

Let's look at an example of how we will use AI in the future:

    User: Where are my socks?
    AI: The red ones?
    User: Yes
    AI: You threw them away last week because they had holes.
    User: I see. On my way from work, where can I buy a pair of the same ones?
    AI: At Soandsoshop in Soandsostreet. It adds 5 min to your route.
    User: Great, let's go there later.
    AI: I can also just pick them up for you right now if you like.
    User: Nah, I would like to check some other stuff in that area anyhow.
    AI: Ok, I'll drive you there in the evening.
You still need search for that. Even more detailed search, with all items in all stores around the world. And you need an always on camera that sees everything the user does. And a way to process, store, backup all that. We will use way bigger datacenters than we use today.


Google Search wouldn't be reliable enough for that tho


Because AI is like software. Developing it is expensive, but the marginal cost of creating another copy is effectively zero. And then you can run it on relatively affordable consumer devices with plenty of GPU memory.


Search is also software. It did not move to consumer devices.


Search is more about data than software. And at that scale, the cost of creating another copy is nontrivial. LLMs are similar to video games in size, and the infrastructure to distribute blobs of that size to millions of consumer devices already exists.


Search is more about data; LLMs sit somewhere between data and software.


AI is a commodity right now, or at least text is. I just realized when paying the bills this month that I got 1 kg of cucumbers and a few KBs of text from OpenAI. They literally sell text by the kilo.


AI (of the type that OpenAI is doing) already is a commodity. right now.

So the question would be "what makes you think AI will stop being a commodity?".


Search needs to constantly update its catalog. I'd say there are lots of AI use-cases that will (eventually?) be good for a long while after training. Like audio input/output, translations, …


> The "AI answer" industry might become 10 times bigger than the search industry

not a chance


Yeah nah. Current 'ai' is a nice useful tool for some very well scoped tasks. Organizing text data, providing boilerplate documents. But the back end is a hugely costly machine that is being hidden from view in hopes of drumming up usage. Given the capex and the revenue it necessitates it all seems quite unsustainable. They'll run this for as long as they can burn capital and are probably trying to pivot to the next hype bubble already.


I'm betting on fully integrated agents.

And for good agents you need a lot of crucial integrations, like email, banking, etc., that only companies like Google, Microsoft, Apple, etc. can provide.


With the way costs are currently going down, I wonder how the monetization will work.

Frontier models are expensive, but the majority of queries don't need frontier models and can very well be served by something like Gemini Flash.

Sure, you need frontier models if you want to extract useful information from a complex dataset. But if we're talking about replacing search, the vast majority of search queries are fairly mundane questions like "which actor plays Tony Soprano"
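
For illustration, the kind of routing that implies might look like this (a toy sketch; the model names, call_model(), and the mundane-query heuristic are all made-up placeholders, not any provider's real API):

    CHEAP_MODEL = "small-flash-class-model"     # hypothetical name
    FRONTIER_MODEL = "big-frontier-model"       # hypothetical name

    def looks_mundane(query: str) -> bool:
        # Crude stand-in for a real complexity/intent classifier.
        return len(query.split()) < 12

    def answer(query: str) -> str:
        model = CHEAP_MODEL if looks_mundane(query) else FRONTIER_MODEL
        return call_model(model, query)   # call_model() is a placeholder client

    answer("Which actor plays Tony Soprano?")   # short, factual: routed to the cheap model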


I'm not sure monetization of AI in the typical way is even the goal.

Instead, I see the killer use case as having it replace human workers on all sorts of tasks, and eventually even fill roles humans cannot even do today.

And within about 10 years, that will even include most physical tasks. Development in robotics looks like it's really gaining speed now.

For instance, take Musk's companies. At some point, robotaxi will certainly become viable, and not constrained the way waymo is. Musk may also be right about Tesla moving from cars to humanoid robots, with estimates of 100s of millions to billions produced.

If robotic maids become viable, industrial robots will certainly become much more versatile than today's.

Then there are the white-collar parts of these industries. Anything from writing the software, optimizing factory layouts, setting up production lines, sales, and distribution may be done by robots. My guess is that it will take no longer than about 20 years until virtually all jobs at Tesla, SpaceX, X and Neuralink are performed by AI and robots.

The main AI the Musk Empire builds for this may in fact be their greatest moat, and the details of it may be their most tightly guarded secret. It may be way too precious to be provided to competitors as something they can rent.

Likewise, take a company like Nvidia. They're building their own AIs for a reason. I suspect they're aiming at creating the best AI available for improving GPU design. If they can use ASI to accelerate the next generation of compute hardware, they may have reached one type of recursive self-improvement. Given their profit margins, they can keep half their GPUs for internal use to do so, and only sell the rest to make it appear like there is a semblance of competition.

Why would they want to try to monetize an AI like that to enable the competition to catch up?

I think the tech sector is in the middle of a 90 degree turn. Tech used for marketing will become legacy the way the car and airplane industries went from 1970 to 2010.


> The rubber will meet the road when the first free and open AI website gets real traction. And monetizes it with ads next to the answers

Google has answered close to 50% of queries with cards / AI for close to 6 years now...

All the people who think Google has been asleep at the wheel forget that Google was at the forefront of the LLM revolution for a reason.

Everything old becomes new again.


Or it's just AI Winter 2.0 and everyone is scrambling to stack as much cash as they can before the darkness.


> Google search is the best business model ever.

IMHO I'm not sure even Google ever thought that.

AdSense is pretty much the only thing that makes Google money, and I'd eat my hat if the vast majority of that revenue did not come from third-party publishers.


The free Bing CoPilot already sometimes serves ads next to the answers. It depends on the topic. If you ask LeetCode questions, you probably won't get any. If you move to traveling or such, you might.


> The "AI answer" industry might become 10 times bigger than the search industry.

Whenever I see people saying things like this it just makes me think we are at, or very near, the top.


Good for him, seems like OpenAI is moving towards a business model of profitability, and Anthropic seems to be more aligned with the original goals of OpenAI.

Will be interesting to see what happens in the next few years. It strikes me that OpenAI is better funded, though, and that AI (at their scale) is super expensive. How does Anthropic deal with this? How are they funding their operations?

Edit: just looked it up, looks like they have a $4B investment from Amazon and a $2B investment from Google, which should be sufficient (I’m going to assume these are cloud credits).

https://techcrunch.com/2024/03/27/amazon-doubles-down-on-ant...

https://www.reuters.com/technology/google-agrees-invest-up-2...


Anthropic has more limits on their free services, and even paid services have a cap that changes depending on current load. They are not burning VC money at the rate other AI companies at this size do.

I think they are more profitable than OpenAI.


> Good for him, seems like OpenAI is moving towards a business model of profitability, and Anthropic seems to be more aligned with the original goals of OpenAI.

What is open about Anthropic?


>> Anthropic seems to be more aligned with the original goals of OpenAI.

> What is open about Anthropic ?

OpenAI's radical mission drift to the opposite extreme, made other companies look relatively closer to its own original goal than itself. From OpenAI's original announcement[1]:

> Our goal is to advance digital intelligence in the way that is most likely to benefit humanity as a whole, unconstrained by a need to generate financial return

> Researchers will be strongly encouraged to publish their work, whether as papers, blog posts, or code, and our patents (if any) will be shared with the world.

But ever since the ChatGPT craze, OpenAI ironically got completely consumed by capitalizing on financial return. They now appear quite unprincipled as if they see nothing but dollar signs and market dominance, which made Meta, Anthropic, even Google, look more rational and healthy by comparison. These companies are publishing research papers, open models, contributing more to the ecosystem and overall appear to be more mindful and conservative when it comes to the ethical and societal impact.

[1] - https://openai.com/index/introducing-openai/


You didn't answer the question.


I'm with you.

Closed, Puritan models.


It was actually a serious and open question, but I can see, given the hypocrisy found in a lot of these self-proclaimed "open AI" companies, how it would come across like I was refuting something ;)


More inclusive title including Greg Brockman and Peter Deng departures:

https://news.ycombinator.com/item?id=41166862


Is it just me or is Brockman leaving absolutely huge? I can’t believe this isn’t front page. Basically everyone who is anyone has left or is leaving. It’s ridiculous.


Yeah, I was flabbergasted myself at the lack of commotion here when I got to the end of this article and learned of gdb's departure only then.


Brockman isn’t leaving, just going on a sabbatical/vacation.


For someone who cares deeply about the future of the company, lining up several significant departures, temporary or otherwise, on the same dates (including your own) seems the opposite of damage control.

I could imagine a parody of the discussion between Altman and the board go something like this:

https://www.youtube.com/watch?v=sacn_bCj8tQ


Karpathy went on sabbatical before he left Tesla...


It does say that. Seems kind of strange though: a rapidly growing company, an apparently absolutely key member of facilitating that growth, and, poof, gone for 6 months at least.


Claude 3.5 Sonnet by Anthropic is the best model out there if you are trying to have an extremely talented programmer paired with you.

Somehow, OpenAI is playing catch-up with them rather than vice versa.


I'd replace "extremely talented programmer" with "knowledgeable junior", in my experience. It's much better than GPT-4o, but still not great.


GPT-4 is way more powerful than GPT-4o for programming tasks.


That's true, but they both made more mistakes than Sonnet for me. I use them with Aider.


Sonnet sometimes repeats its previous response too often when you ask for changes. It claims there were changes, but there aren't, because the output was already the best that model can produce. This behaviour seems to be baked in somewhere deep, as it is hard to change.


I use both side by side.

It really depends on the language and the prompt. Sometimes one shines and the other produces garbage and it's usually 50/50


> if you are trying to have an extremely talented programmer paired to you

I've found it to be on par with Stack Overflow / Google Search.

More convenient than cut/paste but more prone to inaccuracies and out of context answers.

But at no point did it remotely feel like a top tier programmer.


When we go from junior stuff to senior stuff, there is way too much hallucination, at least in Rust. I went back to forums after mainly using AI models for one year.

These models are good at generating template code and many straightforward things, but if you add anything complex, you start wasting your time.


Claude is better by virtue of the ridiculously large context window. You can literally drop a whole directory of source code spaghetti and it will make sense of it.


How do you get it to work so well? I’ve tried it a few times now and it seems just as capable as gpt-4o.


When I gave the same prompt to both, Sonnet 3.5 immediately gave me functional code, while GPT-4o sometimes failed after 4-5 attempts, at which point I usually gave up. Sonnet 3.5 is spectacular at debugging its output, while GPT-4o will keep hallucinating and giving me the same buggy code.

A concrete example: I was doing shader programming with Sonnet 3.5 and ran into a visual bug. Sonnet asked me to add four debugging modes, cycle through each one, and describe what I saw for each one. With one more prompt, it resolved the issue. In my experience, GPT-4o has never bothered proposing debug modes and just produced more buggy code.

For non-trivial coding, Sonnet 3.5 was miles above anything else, and I didn't even have to try hard.


Why can't you just debug this yourself? I don't think completely relying on LLMs for something like this will do you any good in the long run.


Well... why ask LLMs to do anything for us? :) Sure, I could debug it myself, but the whole point is to have a second brain fix the issue so that I can focus on the next feature.

If you're curious, I knew nothing about shader programming when I first played around. In that specific experiment, I wanted to see how far I could push Claude to implement shaders and how capable it is of correcting itself. In the end, I got a pretty nice dynamic lighting system with some cool features, such as cast shadows, culling, multiple shader passes, etc. Asking questions along the way taught me many things about computer graphics, which I later checked against other sources; it was like a tailor-made tutorial where I was "working" on exactly the kind of project I wanted.


Why not? It depends on how you use these systems. Let the LLM debug this for me, give me a nice explanation for what's happening and what solution paths could be and then it's on me to evaluate and make the right decision there. Don't rely blindly on these systems, in the same vein as you shouldn't rely blindly on some solution found while using Google.


A reasonable answer is that this is our future one way or another: the complexity of programs is exceeding the ability of humans to properly manage them, and cybernetic augmentation of the process is the way forward.

i.e. there would be a lot of value if an AI could maintain a detailed understanding of, say, the Linux kernel code base and, when someone is writing a driver, actively prompt about possible misuses, bugs or implementation misunderstandings.


That's a different question though. The person you replied to was asked to explain why they think Sonnet 3.5 works well/better compared to GPT-4o. To which they gave a good answer of Sonnet actually taking context and new information better into account when following up.

They might be able to debug it themselves, maybe they should be able to debug it themselves. But I feel like that is a completely different conversation.


You have to pick your tasks. You also can't ask it to use libraries that are poorly maintained or have bugs. Like if you ask it to create an auth using next-auth, which has some weird idiosyncrasies when it comes to certain providers, and just copy-paste the code, you'll end up with serious failures.

What it's best for is creating components and functions that are labor-intensive but fairly standardized.

Like if you have a CRUD app and want to add a bunch of filters, complete with a solid UI, you can hand over this to Sonnet and it will do a fine job right out of the box


Isn't that dependent on the programming language?


I just can't get past the "You must have a valid phone number to use Anthropic’s services."

Umm... why?

Nobody else in the AI space wants to track my number.

I'm sure Anthropic has their "reasons". I just doubt it is one that I would like.


Advanced ML products are forbidden[0] from being exported to many places, so those who skimp on KYC are playing with fire. Paid products do not have this issue since you provide a billing address, but there is no good, free, and legal LLM that does not use a reliable way of verifying at least the user's location.

Whether they are serious about it or use it as an excuse to collect more PII (or both/neither), collecting verified phone numbers presumably allows them to demonstrate compliance.

[0] https://cset.georgetown.edu/article/dont-forget-the-catch-al...


> but there is no good, free, and legal LLM that does not use a reliable way of verifying at least user’s location.

That's in the US; other jurisdictions may or may not have the same export controls. Base your AI business in a non-US country and it'll be legal not to keep strict controls on who is using your service.


For API access I didn’t need to provide a phone number. I use it with a self-hosted lobechat instance without problems.


For one, to avoid massive number of bots using the API for free.


I definitely had to give up a number when registering for ChatGPT.


Same here. I can understand, they don't want their usage to go through the roof with fake accounts.


I'm not affiliated with Claude, but assuming you're serious:

> Umm... why?

https://support.anthropic.com/en/articles/8287232-why-do-i-n...

My guess is, these models are incredibly expensive to run, Claude has a fairly generous free tier, and phone numbers are one of the easiest ways to significantly reduce the number of duplicate accounts.

> Nobody else in the AI space wants to track my number.

Given they're likely hoovering up all of the data you're sending to them, and they have your email address to identify you, this seems like an odd hill to die on.


"Deepen my focus on AI alignment" is the new "spend more time with friends and family".

What does that even mean? Is OpenAI secretly working on military applications?

Or does it mean neutering the model until it evades all political discussions?


It means that OpenAI's public commitments to allocate resources for safety research do not track with what they actually do, and people who were hired to work on safety (or, in Schulman's case, chose to focus on safety) don't like it, so they leave.


It may mean what it says. Alignment may not be seen as as important as building larger and more capable models so may not be receiving the resources or attention he wants. Doesn't have to be as dramatic as military applications or neutering models.


AI alignment has a well defined meaning. You can look at the wikipedia article if you wish. If you dismiss it as an important problem, that's fine but it's pretty clear what AI alignment means in this context.


I think you misunderstood the point. There's a specific thing regarding alignment that Schulman and OpenAI disagree on, and that thing is not revealed to us. There are countless possibilities, but we are left in the dark.

For example, his focus on alignment could be more about preventing the end of human civilization, while Microsoft/OpenAI's focus could be more about not expressing naughty opinions that advertisers dislike.


We are in the good timeline. I have a ton of faith in the Anthropic team to do this right.


I have a lot of respect for the Amodei siblings, and it’s good to see how, despite everything, Sam Altman is paying the price of his own toxicity


I still can't believe the sham of current "AI" is just brute forcing LLMs to seem intelligent. I use ChatGPT almost every day which is supposedly best-in-class and it is dumb. Valuation of these companies are in the billions and all we get is unethical / not-safe-for-humanity AI companies popping up everywhere. I'm scared to see what the future holds with these companies scraping all sorts of our data.


So here is my question, Anthropic seems to be trying to say they are a "safer" and more responsible AI company.

And based on the features they have released that seems true so far, but I am legitimately curious whether they really are, or whether it's basically marketing disguising not having some features ready yet.


This industry is changing very quickly even in the open source side.

For example, people are jumping ship from Stability AI since the disaster that is SD3, over to Flux, which seems to be the new favorite among open-source models.



OpenAI is circling the drain.


Curious why this is allowed? NDAs do not apply?


Even if NDAs are legal

If you joined a company late enough that they have HR and legal forcing everyone to sign NDAs then you're not a co-founder.


Do you mean a non-compete? If so, then yes, non-competes are illegal in California.


He is not a slave.


NDAs aren't really enforceable in practice. Good luck explaining why this change is stealing customers, and which customers have left because of this change.


OpenAI seems to be going the wrong direction. GenAI has clear limitations and I’m wondering if OpenAI just refuses to acknowledge those limits.


From one Closed Output company to another, what a huge difference (not)


Is it because OpenAI is slowly turning into Bing and you have to go elsewhere if you want to work on AI?


Go, Anthropic!



