OpenAI co-founder John Schulman says he will leave and join rival Anthropic (cnbc.com)
403 points by tzury 35 days ago | 280 comments



This is probably bad news for ChatGPT 5. I don't think it's that likely this co-founder would leave for Anthropic if OpenAI were clearly in the lead. Also from a safety perspective you would want to be at the AI company most likely to create truly disruptive AI tech. This looks to me like a bet against OpenAI more than anything else.

OpenAI has a burn rate of about 5 billion a year and they need to raise ASAP. If the fundraising isn't going well or if OpenAI is forced to accept money from questionable investors that would also be a good reason to jump ship.

In situations like these it's good to remember that people are much more likely to take the ethical and principled road when they also stand to gain from that choice. People who put their ideals above pragmatic self-interest self-select out of positions of power and influence. That is likely to be the case here as well.


> This is probably bad news for ChatGPT 5. I don't think it's that likely this co-founder would leave for Anthropic if OpenAI were clearly in the lead.

Yep. The writing was already on the wall for GPT-5 when they teased a new model for months and let the media believe it was GPT-5, before finally releasing GPT-4o and admitting they hadn't even started on 5 yet (they quietly announced they were starting a new foundation model a few weeks after 4o).

Don't get me wrong, the cost savings for 4o are great, but it was pretty obvious at that point that they didn't have a clue how they were going to move past 4 in terms of capabilities. If they had a path they wouldn't have intentionally burned the hype for 5 on 4o.

This departure just further cements what I was already sure was the case—OpenAI has lost the lead and doesn't know how they're going to get it back.


and then revealed that GPT-5 will not be released at this year's Dev Day (which runs until November)


Or it could be the start of the enshittification of Anthropic, like OpenAI ruined GPT-4 with GPT-4o by overly simplifying it.

I hope not, because Claude is much better, especially at programming.


Claude 3.5 Sonnet is the first model that made me realize that the era of AI-aided programming is here. Its ability to generate and modify large amounts of correct code - across multiple files/modules - in one response beats anything I've tried before. Integrating that with specialized editors (like https://www.cursor.com) is an early vision of the future of software development.


I've really struggled every time I've pulled out any LLM for programming besides using Copilot for generating tests.

Maybe I've been using it for the wrong things. It certainly never helps unblock me when I'm stuck, the way it seems to for some (I suspect it's because when I get stuck it's deep in undocumented rabbit holes), but it sounds like it might be decent at large-scale rote refactoring? Aside from specialized editors, how do people use it for things like that?


At least from my experience:

You take Claude, you create a new Project, in your Project you explain the context of what you are doing and what you are programming (you have to explain it only once!).

If you have specific technical documentation (e.g. rare programming language, your own framework, etc), you can put it there in the project.

Then you create a conversation, and copy-paste the source-code for your file, and ask for your refactoring or improvement.

If you are lazy just say: "give me the full code"

and then

"continue the code" few times in a row

and you're done :)
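
If you'd rather script that flow than click through the web UI, the rough equivalent is to put the project context in a system prompt and paste the file into the user message. A minimal sketch, assuming the anthropic Python SDK; the project description, file path and prompts are made-up placeholders:

    # Sketch only: replicating the "explain the context once" idea via the API.
    # Assumes `pip install anthropic` and ANTHROPIC_API_KEY set in the environment.
    import anthropic

    client = anthropic.Anthropic()

    PROJECT_CONTEXT = (  # hypothetical project description, written once and reused
        "We are building a small Flask REST API backed by SQLAlchemy. "
        "Prefer type hints and keep functions under ~40 lines."
    )

    with open("app/routes.py") as f:  # the file you would otherwise copy-paste
        source = f.read()

    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=4096,
        system=PROJECT_CONTEXT,          # the "project" part, reused on every call
        messages=[{
            "role": "user",
            "content": "Refactor this module to remove the duplicated error handling. "
                       "Give me the full code.\n\n" + source,
        }],
    )
    print(response.content[0].text)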


> in your Project you explain the context of what you are doing and what you are programming (you have to explain it only once!).

When you say this, you mean typing out some text somewhere? Where do you do this? In a giant comment? In which file?


In "Projects" -> "Create new project" -> "What are you trying to achieve?"


Provide context to the model. The code you're working on, what it's for, where you're stuck, what you've tried, etc. Pretend it's a colleague that should help you out and onboard it to your problem, then have a conversation with it as if you are rubber ducking with your colleague.

Don't ask short one-off questions and expect it to work (it might, depending on what you ask, but probably not if you're deep in some proprietary code base with no traces in the LLM's pretraining).


I've definitely tried that and it doesn't work for the problems I've tried. Claude's answers for me always have all the hallmarks of an LLM response: extremely confident, filled with misunderstandings of even widely used APIs, and often requiring active correction on so many details that I'm not convinced it wouldn't have been faster to just search for a solution by hand. It feels like pair programming with a junior engineer, but without the benefit of helping train someone.

I'm trying to figure out if I'm using it wrong or using it on the wrong types of problems. How do people with 10+ years of experience use it effectively?


I'm sure I'm going to offend a bunch of people with this, but my experience has been similar to yours, and it reminds me of something "Uncle" Bob Martin once mentioned: the number of software developers is roughly doubling every two years, which means that at any given time half of the developer population has less than two years experience.

If you're an experienced dev, having a peer that enthusiastically suggests a bunch of plausible but subtly wrong things probably net-net slows you down and annoys you. If you're more junior, it's more like being shown a world of possibilities that opens your mind and seems much more useful.

Anyway, I think the reason we see so much enthusiasm for LLM coding assistants right now is the overall skew of developers to being more junior. I'm sure these tools will eventually get better, at least I hope they do because there's going to be a whole lot of enthusiastically written but questionable code out there soon that will need to be fixed and there probably won't be enough human capacity to fix it all.


Thanks for saying it explicitly. I definitely have the same sense, but was hoping someone with experience would chime in about use cases they have for it.


I'm a mathematician and the problems I work on tend to be quite novel (leetcode feel but with real-world applications). I find LLMs to be utterly useless at such tasks; "pair programming a junior, but without the benefit" is an excellent summary of my experience as well.


It's good for writing that prototype you're supposed to throw away. It's often easy to see the correct solution after seeing the wrong one.


I think the only way to answer that is if you can share an example of a conversation you had with it, where it broke down as you described.


For what I’m working on, I can also make use of the wrong approaches. Going through my fail-often, fail-fast feedback loop is a lot more efficient with LLMs. Like A LOT more.

Then when I have a bunch of wrong answers, I can give those as context as well to the model and make it avoid those pitfalls. At that point my constraints for the problem are so rigorous that the LLM lands at the correct solution and frankly writes out the code 100x faster than I would. And I’m an advanced vim user who types at 155 wpm.


> And I’m an advanced vim user who types at 155 wpm.

See, it's comments like this that make me suspect that I'm working on a completely different class of problem than the people who find value in interacting with LLMs.

I'm a very fast typer, but I've never bothered to find out how fast because the speed of my typing has never been the bottleneck for my work. The bottleneck is invariably thinking through the problem that I'm facing, trying to understand API docs, and figuring out how best to organize my work to communicate to future developers what's going on.

Copilot is great at saving me large amounts of keystrokes here and there, which is nice for avoiding RSI and occasionally (with very repetitive code like unit tests) actually a legit time saver. But try as I might I can't get useful output out of the chat models that actually speeds up my workflow.


I have always thought of it as a way to figure out what doesn't work and get to a final design, not necessarily code. Personally, it's easy to verify a solution and figure out use cases that wouldn't work out. Keep on iterating until I have either figured out a mental model of the solution, or figured out the main problems in such a hypothetical solution.


Oh yes, totally agree, it's like if you have a very experienced programmer sitting next to you.

He still needs instructions on what to do next, he lacks a bit of "initiative", but in terms of pure coding skill it's amazing (aka, we will get replaced over time, and it's already the case: I don't need the help of contractors, I prefer to ask Claude).


More like an insanely knowledgeable but very inexperienced programmer. It will get basic algorithms wrong (unless it's in the exact shape it has seen before). It's like a system that automatically copy-pastes the top answer from stackoverflow in your code. Sometimes that is what you want, but most of the time it isn't.


This sentiment is so far from the truth that I find it hilarious. How can a technically adept person be so out of touch with what these systems are already capable of?


An LLM can write a polite email but it can't write a good novel. It can create art or music (by mushing together things it has seen before) but not art that excites. It's the same with code. I use LLMs daily and I've seen videos of other people using tools like Cursor, and so far it looks like these LLMs can only help in those situations where it is pretty obvious (to the programmer) what the right answer looks like.


With all of that, ChatGPT is actually one of the top authors in Amazon e-books.

But I agree that for some creative tasks, like writing or explaining a joke, or some novel algorithms, it's very bad.


The LLM generated e-book thing is actually a serious problem. Have you read any of it? Consumers could lose trust unless it’s fixed. If you buy a book and then realise nobody, not even the seller, has ever read it, as it turns into incomprehensible mush regularly, are you more or less likely to buy a book from the same source?


Hilarious (or even shocking) is the sentiment that people are actually so overhyped by these tools.


I keep hearing this comment everywhere Claude is mentioned, as if there is a coordinated PR boost on social media. My personal experience with Claude 3.5 however is, meh. I don't see much difference compared to GPT-4 and I use AI to help me code every day.


Yeah, they really like to mention it everywhere. It's good, but imo not as good as some people make it out to be. I have used it recently for libGDX on Kotlin and there are things it struggles with, and the code it sometimes gives isn't really "good" Kotlin, but it takes a good programmer to know what is good and what is not.


I think in more esoteric languages it won't work as well. In Python and C++ it is excellent; surprisingly, its Rust is also pretty damn good.

(I am not a paid shiller, just in awe of what Sonnet 3.5 + Opus can do)


Kotlin isn't exactly an esoteric language though


User error.


Please consider avoiding more ad hominem attacks or revising the ones you've already plastered onto this discussion.


How are you liking cursor? I tried it ~a year ago, and it was quite a bit worse than ferrying back and forth between ChatGPT and VSCode.

Is it better than using GitHub Copilot in VSCode?


Definitely better. I ended my Copilot subscription.


Oh interesting, will give it another go, thnx


They ruined GPT-4? How? I thought they were basically the same models, just multimodal


GPT-4o is different from GPT-4, you can "feel" it is a smaller model that really struggles with reasoning and programming and has much weaker logic.

If you compare it to Claude Sonnet, the context window alone considerably improves the answers as well.

Of course there are no objective metrics, but from a user perspective I can see the coding skills are much better at Anthropic (and it's funny, because in theory, according to benchmarks, Google Gemini is the best, but in reality it is absolutely terrible).


> GPT-4o is different from GPT-4, you can "feel" it is a smaller model that really struggles with reasoning and programming and has much weaker logic.

FWIW according to LMSYS this is not the case. In coding, current GPT-4o (and mini, for that matter) beat GPT-4-Turbo handily, by a margin of 32 points.

By contrast Sonnet 3.5 is #1, 4 score points ahead of GPT-4o.


I'm a firm believer that the best benchmark is playing around with the model for like an hour. On the type of tasks that are relevant to you and your work, of course.


I've also found GPT-4o to be subjectively less intelligent than GPT-4. The gap especially shows up when more complex reasoning is required, eg, on macroeconomic questions or other domains in the world where the interactions are important or where subtle aspects of the question or domain are important.


Have to say I agree with this, 4o is dumber in my subjective experience.


While I agree with your logic I also focused on:

> People who put their ideals above pragmatic self-interest self-select out of positions of power and influence. That is likely to be the case here as well.

It’s also possible that this co-founder realizes he has more than enough eggs saved up in the “OpenAI” basket, and that it’s rational to de-risk by getting a lot of eggs in another basket to better guarantee his ability to provide a huge amount of wealth to his family.

Even if OpenAI is clearly in the lead to him, he’s still looking at a lot of risk with most of his wealth being tied up in non-public shares of a single company.


While true, him leaving OpenAI to (one of) their biggest competitors does seriously risk his eggs in the OpenAI basket.


There's usually enough room for 2-3 winners. iOS and Android. Intel and AMD. Firefox and Chrome.

Also, OpenAI has some of the most expensive people in the world, which is why they're burning so much money. Presumably they're so expensive because they're some of the smartest people in the world. Some are likely smarter than Schulman.


> Presumably they're so expensive because they're some of the smartest people in the world.

I don't want to dissuade you from this belief, but maybe you should pay less attention to the boastful marketing of these AI companies. :-)

Seriously: from what I know about the life of insanely smart people, I'd guess that OpenAI (and most other companies that in their marketing claim to hire insanely smart people) doesn't have any idea how to actually make use of such people. Such companies rather hire for other specific personality traits.


It only risks his eggs if the Anthropic basket does well; if Anthropic doesn't do well then he still has his OpenAI eggs.


I find the 5 billion a year burn rate amazing, and OpenAI’s competition is stiff. I happily pay ABACUS.AI ten dollars a month for easy access to all models, with a nice web interface. I just started paying OpenAI twenty a month again, but only because I am hoping to get access to their interactive talking mode.

I was really surprised when OpenAI started providing most of their good features for free. I am not a business person, but it seems crazy to me to not try for profitability, or at least being close to profitability. I would like to know what the competitors’ burn rates are also.

For API use, I think OpenAI’s big competition is Groq, serving open models like Llama 3.1.


> it seems crazy to me to not try for profitability

A business is worth the sum of future profits, discounted for time (because making money today is better than making money tomorrow). Negative profits today are fine as long as they are offset by future profits tomorrow. This should make intuitive sense.
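
To put toy numbers on that arithmetic (every figure below is invented, just to show the mechanics):

    # Toy discounted-cash-flow sketch; all numbers are hypothetical.
    discount_rate = 0.10                          # assume money a year out is worth ~10% less
    profits = [-5e9, -4e9, -2e9, 3e9, 8e9, 12e9]  # big losses now, bigger profits later

    # value today = sum over years t of profit_t / (1 + r)^t
    value = sum(p / (1 + discount_rate) ** t for t, p in enumerate(profits, start=1))
    print(f"net present value: ${value / 1e9:.1f}B")  # comes out positive despite the early losses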

And this is still true when the investment won't pay off for a long time. For example, governments worldwide provide free (or highly subsidized) schooling to all children. Only when the children become taxpaying adults, 20 years or so later, does the government get a return on their investment.

Most good things in life require a long time horizon. In healthy societies people plant trees that won't bear fruit or provide shade for many years.


Yes. If ChatGPT-like products will be widely and commonly used in the future, it's much more valuable right now to try to acquire users and make their usage sticky (through habituation, memory, context/data, integrations, etc) than it is to monetize them fully right now.


I’m not super familiar with the latest AI services out there. Is abacus the cheapest way to access LLMs for personal use? Do they offer privacy and anonymity? What about their stance on censorship of answers?


I don’t use Groq, but I agree the free models are probably the biggest competitors. Especially since we can run them locally and privately.

Because I’ve seen a lot of questions about how to use these models, I recorded a quick video showing how I use them on MacOS.

https://makervoyage.com/ai


Local private models are not a threat to openai.

Local is not where the money is, it’s in cloud services and api usage fees.


They aren’t in terms of profitability, but they are in terms of future revenue. If most early adopters start self-hosting models then a lot of future products will be built outside of OpenAI’s ecosystem. Then corporations will also start looking into how to self-host models, because privacy is the primary concern for AI’s adoption. And we already have models like Llama 3.1 405B that are close to ChatGPT.


Have you paid much attention to the local model world?

They all tout OpenAI-compatible APIs because OAI was the first mover. No real threat of incompatibility with OAI.

Plus these LLMs don’t have any kind of interface moat. It’s text in and text out.
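
To be concrete about what "OpenAI compatible" means in practice: you can usually point the stock OpenAI client at a local server just by swapping the base URL. A minimal sketch assuming a local Ollama install serving llama3 on its default port:

    # Sketch: the same client code, now hitting a local model instead of OpenAI.
    # Assumes `ollama serve` is running and `ollama pull llama3` has been done.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
        api_key="ollama",                      # ignored locally, but the client requires a value
    )

    reply = client.chat.completions.create(
        model="llama3",
        messages=[{"role": "user", "content": "Explain what a context window is in two sentences."}],
    )
    print(reply.choices[0].message.content)

Switching in either direction is a one-line change, which is exactly why tools built against that API don't pull anyone away from OpenAI.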


Just because Ollama and friends copied the API doesn't mean that they're not competitive. They've all done this just the same as others copying the S3 API - ease of integration and lower barrier to entry during a switching event, should one arise.

> Plus these LLMs don't have any kind of interface moat.

The interface really has very little influence. Nobody in the enterprise world cares about the ChatGPT interface because they're all building features into their own products. The UI for ChatGPT has been copied ad nauseam, so if anyone really wanted something to look and feel the same it's already out there. Chat and visual modalities are already there, so I'm curious how you think ChatGPT has an "interface moat"?

> Local private models are not a threat to openai.

There are lots of threats to OpenAI. One of them is local models. Because if the OpenAI approach is to continue at their burn rate and hope that they will be the one and only, I think they're very wrong. Small, targeted models provide for many more use cases than a bloated, expensive, generalized model. I would gather long term OpenAI either becomes a replacement for Google search or they, ultimately, fail. When I look around me I don't see many great implementations of any of this - mostly because many of them look and feel like bolt-ons to a foundational model that tries to do something slightly product specific. But even in those cases the confidence I'd put in these products today is relatively low.


My argument was, because ollama and friends use the exact same interface as openai, tools built on top of them are compatible with OpenAI’s products and thus those tools don’t pull users away from OpenAI, so the local model world isn’t something OpenAI is worried about.

There is no interface moat. No reason for a happy openai user to ever leave openai, because they can enjoy all the local model tools with GPT.


Who cares about the interface? Not everyone is interested in conversational tasks. Corporations in particular need LLMs to process their data. A restful API is more than enough.


By interface, I meant API. (The “I” in API)

I should’ve been more clear.


I use Ollama running local models about half the time (from Common Lisp or Python apps) myself.


OpenAI features aren’t free, they take your mind-patterns in the “imitation game” as the price, and you can’t do the same to them without breaking their rules.

https://ibb.co/M1TnRgr


>it seems crazy to me to not try for profitability

I'm reminded of the Silicon Valley bit about no revenue https://youtu.be/BzAdXyPYKQo

It probably looks better to be not really trying for profitability and losing $5bn a year than trying hard and losing $4bn


I don't think a co-founder would just jump ship just because. That would be very un-co-founderish.

I would also assume that he earns enough money to be rich. You are not a co-founder of OpenAI if you are not playing with the big boys.

So he definitely wants to be in this AI future, but not with OpenAI. So I would argue it has to do with something which is important to him, so important that the others disagree with him.


> This is probably bad news for ChatGPT 5. I don't think it's that likely this co-founder would leave for Anthropic if OpenAI were clearly in the lead.

I'll play devil's advocate. People leave bad bosses all the time, even when everything else is near-perfect. Additionally, cofounders sometimes get pushed out - even Steve Jobs went through this.


If being sued by the world's richest billionaire or the whole non-profit thing didn't complicate matters, and if the board had any teeth, one could wish the board would explore a merger with Anthropic, with Altman leaving at the end of all of it, and save everyone another year's worth of drama.


Could be as simple as switching from a limited profit/pay company to unlimited profit/pay.


this AI safety stuff is just a rabbit hole of distraction, IMO.

OpenAI will be better off without this crowd and just focus on building good products.


> this AI safety stuff is just a rabbit hole of distraction, IMO.

> OpenAI will be better off without this crowd and just focus on building good products.

Ah yes, "focus on building good products" without safety. Except a "good product" is safe.

Otherwise you're getting stuff like an infinite range plane powered by nuclear jet engine that has fallout for exhaust [1].

[1] IIRC, nuclear-powered cruise missiles were contemplated: their attack would have consisted of dropping bombs on their targets, then flying around in circles spreading radioactive fallout over the land.


> Except a "good product" is safe.

Depends on how you define "safe". The kind of "safe" we get from OpenAI today seems to be mostly censorship, I don't think we need more of that.


What I'm saying is the safety risks of AI are exaggerated to the point of comedy. It is no more dangerous than any other kind of software.

There is an effort by AI-doomer groups to try to regulate/monopolize the technology, but fortunately it looks like open source has put a wrench in this.


They won't release 5 before election.


> In situations like these it's good to remember that people are much more likely to take the ethical and principled road when they also stand to gain from that choice. People who put their ideals above pragmatic self-interest self-select out of positions of power and influence.

I don't know what world you live in, but my experience has been 100% the opposite. Most people will not do what is ethical or principled. When you try to discuss it with them, they will DARVO and congrats, you have now been targeted for public retribution by the sociopathic child in the driver's seat.

The thing that upsets me most is the survivorship bias you express, and how everybody thinks that people are "nice and kind": they are not. The world is an awful, terrible place full of liars, cheats and bad people that WE NEED TO STOP CELEBRATING.

One more time WE NEED TO STOP CELEBRATING BAD PEOPLE WHO DO BAD THINGS TO OTHERS.


People are not one-dimensional. People can lie and cheat on one day and act honorably the day after. A person can be kind and generous and cruel and selfish. Most people are just of average morality. Not unusually good nor unusually bad. People in positions of power get there because they seek power, so there is a selection effect there for sure. But nonetheless you'll find that very successful people are in most ways regular people with regular flaws.

(Also, I think you misread what I wrote.)


I think you may have misread the quote you’re replying to. You and the GP post appear to be in agreement. I read it as:

P(ethical_and_principled) < P(ethical_and_principled|stands_to_gain)

Or in plain language people are more likely to do the right thing when they stand to gain, rather than just because it’s the right thing.


Must've been a difficult decision with him being a cofounder and all, but afaik he's been the highest-ranked safety-minded person at OpenAI. He says it's not because OpenAI leadership isn't committed to safety, but I'm not sure I buy that. We've seen numerous safety people leave for exactly that reason.

What makes this way more interesting to me though is how this announcement coincides with Brockman's sabbatical. Maybe there's nothing to it, but I find it more likely that things really aren't going well with sama.

Will be interesting to see how this plays out and if he actually returns next year or if this is just a soft quitting announcement.


The reality is that every other person in tech now is hoping for Sama to fail. The world doesn't need AI to have a Silicon Valley face. Anthropic is doing much, much better PR work by not having a narcissist as CEO.


Contrarily, I think the reality is that most of us couldn't care less about this AI soap opera.


I want the best model at the lowest rate (and preferably lowest energy expenditure) and with the easiest access. Anything else is just background noise.


Some people are wary of enabling CEOs of disruptive technologies to become the richest people in the world, take control of key internet assets and -- in random bursts of thin-skinned megalomania -- tilt the scales towards politicians or political groups who take actions that negatively affect their own quality of life.

It sounds absurd, but some are watching such a procession take place live as we speak.


I still haven't seen it do anything actually interesting. Especially when you consider that you have to fact-check the AI.


I'm continuously baffled by such comments. Have you really tried? Especially newer models like Claude 3.5?


I hear a lot of people say good things about Copilot too, but I absolutely hate it. I have it enabled for some reason still, but it constantly suggests incorrect things. There have been a few amazing moments, but man, there are a lot of "bullshit" moments.


Even when we get a gen AI that exceeds all human metrics, there will 100% still be people who with a straight face will say "Meh, I tried it and found it be pretty useless for my work."


I have, yeah.

Still useless for my day to day coding work.

Most useful for whipping up a quick bash or Python script that does some simple looping and file I/O.


To be fair, LLMs are pretty good natural language search engines. Like when I'm looking for something in an API that does something I can describe in natural language, but not succinctly enough to show up in a web search, LLMs are extremely handy, at least when they don't just randomly hallucinate the API. On the other hand, I think this is more of a condemnation of the fact that search tech has not 'really' meaningfully advanced beyond where it was 20 years ago than it is praise of LLMs.


> LLMs are extremely handy, at least when they don't just randomly hallucinate

I work in tech and it’s my hobby, so that’s what a lot of my googling goes towards.

LLMs hallucinate almost every time I ask them anything too specific, which at this point in my career is all I’m really looking for. The time it takes for me to realize an llm is wrong is usually not too bad, but it’s still time I could’ve saved by googling (or whatever trad search) for the docs or manual.

I really wish they were useful, but at least for my tasks they’re just a waste of time.

I really like them for quickly generating descriptions for my dnd settings, but even then they sound samey if I use them too much. Obviously they’d sound samey if I made up 20 at once too, but at that point I’m not really being helped or enhanced by using an LLM, it’s just faster at writing than I am.


I don't mean this as a slight, just an observation I have seen many times: people who struggle to get utility from SOTA LLMs tend not to have spent enough time with them to feel out good prompting. In the same way that there is a skill for googling information, there is a skill for teasing consistently good responses from LLMs.


Why spend my time teasing and coaxing information out of a system which absolutely does make up nonsense when I can just read the manual?

I spent 2023 developing LLM powered chatbots with people who, purportedly, were very good at prompting, but never saw any better output than what I got for the tasks I’m interested in.

I think the “you need to get good at prompting” idea is very shallow. There’s really not much to learn about prompting. It’s all hacks and anecdotes which could change drastically from model to model.

None of which, from what I’ve seen, makes up for the limitations of LLMs, no matter how many times I try adding "your job depends on formatting this correctly" or reordering my prompt so that more relevant information comes later, etc.

Prompt engineering has improved RAG pipelines I’ve worked on though, just not anything in the realm of comprehension or planning of any amount of real complexity.


People also continue to use them as knowledge databases, despite that not being where they shine. Give enough context to the model (descriptions, code, documentation, ideas, examples) and have a dialog; that's where these strong LLMs really shine.


Summarizing, doc QA, and unstructured text ingestion are the killer features I’ve seen.

The 3rd one still being quite involved, but leaps and bounds easier than 5 years ago.
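
For the third one, the core loop is usually just "prompt for JSON, then parse". A minimal sketch assuming the openai Python client; the model name, input text and fields are placeholders:

    # Sketch: unstructured text ingestion as prompt-for-JSON-then-parse.
    # Assumes `pip install openai` and OPENAI_API_KEY in the environment.
    import json
    from openai import OpenAI

    client = OpenAI()

    raw_text = "Invoice from Acme Corp dated 2024-07-03, total due $1,240.50, net 30."  # made-up input

    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": "Extract vendor, date and total from the text below. "
                       "Answer with only a JSON object with keys vendor, date, total.\n\n" + raw_text,
        }],
    )
    record = json.loads(reply.choices[0].message.content)  # brittle without stricter JSON-mode options; fine for a sketch
    print(record["vendor"], record["total"])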


I see it do a lot that's interesting but for programming stuff, I haven't found it to be particularly useful.

Maybe I'm doing it wrong?

I've been writing code for ~30 years, and I've built up patterns and snippets, etc... that are much faster for me to use than the LLMs.

A while ago, I thought I had a eureka moment with it when I had it generate some nodejs code for streaming a video file - it did all kinds of cool stuff, like implement offset headers and things I didn't know about.

I thought to myself, "self - you gotta check yourself, this thing is really useful".

But then I had to spend hours debugging & fixing the code that was broken in subtle ways. I ended up on google anyway learning all about it and rewrote everything it had generated.

For that case, while I did learn some interesting things from the code it generated, it didn't save me any time - it cost me time. I'd have learned the same things from reading an article or the docs on effective ways to stream video from the server, and I'd have written it more correctly the first go around.


Your bar for interesting has to be insane then. What would you consider interesting if nothing from LLMs meets that bar?


For example, there exist quite a lot of pure math papers that are so much deeper than basically all the AI stuff that I have seen so far.


So if LLMs weren't surprising to you, it would imply you expected this. If you did, how much money did you make on financial speculation? It seems like being this far ahead should have made you millions even without a lot of starting capital (look at NVDA alone)


> So if LLMs weren't surprising to you, it would imply you expected this.

I do claim that I have a tendency to be quite right about the "technological side" of such topics when I'm interested in them. On the other hand, events turn out to be different because of "psychological effects" (let me put it this way: I have a quite different "technology taste" than the market average).

In the concrete case of LLMs: the psychological effect why the market behaved so much differently is that I believed that people wouldn't fall for the marketing and hype of LLMs and would consider the excessive marketing to be simply dupery. The surprise to me was that this wasn't what happened.

Concerning NVidia: I believed that - considering the insane amount of money involved - people/companies would write new languages and compilers to run AI code on GPUs (or other ICs) of various different suppliers (in particular AMD and Intel) because it is a dangerous business practice to make yourself dependent on a single (GPU) supplier. Even serious reverse-engineering endeavours for doing this should have paid off considering the money involved. I was again wrong about this. So here the surprise was that lots of AI companies made themselves so dependent on NVidia.

Seeing lots of "unconventional" things is very helpful for doing math (often the observations that you see are the start of completely new theorems). Being good at stock trading and investing in my opinion on the other hand requires a lot of "street smartness".


Re: NVIDIA. I wholeheartedly agree. Google/TPU is an existence proof that it is entirely possible and rational to do so. My surprise was that everyone except Google missed.


Okay, so $0 it sounds like. You should figure out a way to monetize your future sight, otherwise it comes off as cynicism masquerading as intelligence.


> cynicism masquerading as intelligence

Rather: cynicism and a form of intelligence that is better suited to abstract math than investing. :-)


It spends money really well.


then why are you reading hacker news comments about it?


I guess I have a masochistic streak.


I think you are in one of the extreme bubbles. The general tech industry is not subscribed to the drama and has less personal feelings on individuals they do not directly know.


You are right. I should have said every other person (or every person) in HN.


Maybe the vocal minority that have a passionate dislike for someone they don't know?


It's not just the narcissist, it's the betrayal. The least open company possible. How did I end up cheering for Meta and Zuck?


I agree and I think that sane people will eventually prevail over the pathological narcissist.


Outlier success pretty much requires obsessive strategic thinking. Gates and Musk are super strategic but in a "weirdo autist" way, which doesn't have a big stigma attached to it anymore. Peter Thiel also benefits from his weirdness. Steve Jobs had supernatural charisma working in his favor. sama has the strategic instinct but not the charisma or disarming weirdness other tech founders have. Sama is not unusually Machiavellian or narcissistic, but he will get judged more harshly for it.


What is a “Silicon Valley face”? Does Nvidia’s CEO have it? Google’s founders?

I guess anthropic’s founders don’t have it?


I'm confused with GPT4o. While it's faster than GPT4, the quality is noticeably worse.

It often enters into a state where it just repeats what it already said, when all I want is a clarification or another opinion on what we were chatting about. A clarification could be a short sentence, a snippet of code, but no, I get the entire answer again, slightly modified.

I cancelled Plus for one month, but got back this week, and for some reason I feel that it really isn't worth it anymore. And the teasing with the free tier, which is downgraded really fast, is more of an annoyance than a solution.

There are these promises of "memory" and "talking with it", but they are just ads of something that isn't on the market, at least I don't have access to both of these features.

Gemini used to be pretty bad, but for some reason it feels like it has improved a lot, focusing more on the task than on trying to be humanly friendly.

Claude and Mistral are not able to execute code, which is a dealbreaker for me.


I anecdotally agree that GPT-4o often feels really bad, but I can't tell how much of this is due to becoming more accustomed to the quality and hallucinations of using ChatGPT.

I tend to see Huggingface's LLM (anonymized, elo-based) Leaderboard as the SoT regarding LLM quality, and according to it GPT-4o is markedly better than GPT-4, and contrary to popular sentiment, is on-par with or better than Claude in most ways (except being slightly worse at coding).

Not sure what to believe, or if there is some other dimension that Huggingface is not capturing here.


> It often enters into a state where it just repeats what it already said, when all I want is a clarification or another opinion on what we were chatting about. A clarification could be a short sentence, a snippet of code, but no, I get the entire answer again, slightly modified.

It is almost impossible to talk it out of being so repetitive. Super annoying especially since it eats into its own context window.


> A clarification could be a short sentence, a snippet of code, but no, I get the entire answer again, slightly modified.

This tracks, in the sense that this is what you'll get from many real people when you actually want a clarification.


My free account has memory. Do most not?


Yeah, I’ve almost entirely stopped reaching for it. At some point it’s so frustrating getting it to output something halfway towards what I need that I’m just better off doing it myself.

I’ll probably cancel soon.


Useful context: OpenAI had 11 cofounders. Schulman was one of them.

Schulman was not the original head of AI alignment/safety; he was promoted into it when the former leader left for Anthropic.

Not everyone who's a founder of a nonprofit AI research institute wants to be a leader/manager of a much more complicated organization in a much more complicated environment.

OpenAI was founded a while ago. The degree of their long-term success is entirely based on their ability to hire and retain the right talent in the right roles.


All of that is true. Some more useful context: 9 out of those 11 cofounders are now gone. Three have either founded or are working for direct competitors (Elon, Ilya, John), five have quit (Trevor, Vicki, Andrej, Durk, Pam), and one has gone on extended leave but may return (Greg). Right now, Sam and Wojciech are the only ones left.


All of this back-and-forth in the AI scene is the preparation before the storm. Like the opening scene of a chess game, before any pieces are exchanged. Like the Braveheart "Hold!" scene.

The rubber will meet the road when the first free and open AI website gets real traction. And monetizes it with ads next to the answers.

Google search is the best business model ever. Everybody wants to become the Google of the AI era. The "AI answer" industry might become 10 times bigger than the search industry.

Google ran for 2 years without any monetization. Let's see how long the incumbents will "Hold" this time.


> The rubber will meet the road when the first free and open AI website gets real traction. And monetizes it with ads next to the answers.

The magic of genAI is they don't need to put ads next to the answers where they can easily be ignored or adblocked, they can put the ads inside the answers instead. The future, like it or not, is advertisers bidding to bias AI models towards mentioning their products.


I'm sure it's not long before you get the first emails offering a "training data influencing service" - for a nice fee, someone will make sure your product is positively mentioned in all the key training datasets used to train important models. "Our team of content experts will embed positive sentiment and accurate product details into authentic content. We use the latest AI and human-based techniques to achieve the highest degree of model influence".

And of course, once the new models are released, it'll be impossible to prove the impact of the work - there's no counterfactual. Proponents of the "training data influence service" will tell you that without them, you wouldn't even be mentioned.

I really don't like this. But I also don't see a way around it. Public datasets are good. User-contributed content is good, but inherently vulnerable to this, I think. Anyone in any of the big LLM training orgs working on defending against this kind of bought influence?


User: How do I make white bread? When I try to bake bread, it comes out much darker than the store bought bread.

AI: Sure, I can help you make your bread lighter! Here's a delicious recipe for white bread:

    1. Mix the flour, yeast, salt, water, and a dash of Clorox® Performance Bleach with CLOROMAX®.
    2. Let rise for 3 hours.
    3. Shape into loaves.
    4. Bake for 20-30 minutes.
    5. Enjoy your freshly baked white bread!


Let's see if this recipe will make it into Claude or ChatGPT in two to three years. Set a reminder.


If they start doing that without clear distinction what is an ad, that would be a sure way to lose users immediately.


I'm positing a model where a third party does the influencing, not the company delivering the LLM/service. What's to say that it's an ad if the Wikipedia page for a product itself says that the product "establishes new standards for quality, technological leadership and operating excellence". (and no problem if the edit gets reverted, as long as it said that just at the moment company X crawled Wikipedia for the latest training round).

So more like SEO firms "helping you" move your rank on Google, than Google selling ads.

I'd imagine "undetectable to the LLM training orgs" might just be a service tier with a higher fee.


How will these third party “LLM Optimization” (LLMO) services prove to their clients that their work has a meaningful impact on the results returned by things like ChatGPT?

With SEO, it’s pretty easy to see the results of your effort. You either show up on top for the right keywords or you don’t. With LLM’s there is no way to easily demonstrate impact, at least I’d think.


And also get sued by the FTC. Disclosure is required.


Disclosure is technically required, but in practice I see undisclosed ads on social media all the time. If the individual instance is small enough and dissipates into the ether fast enough, there is virtually no risk of enforcement.

Similarly, the black box AI models guarantee the owners can just shrug and say it's not their fault if the model suggests Wonderbread(r) for making toast 3.2% more frequently than other breads.


Ha! Disclosure by whom?

If Clorox fills their site with "helpful" articles that just happen to mention Clorox very frequently and some training set aggregator or unscrupulous AI company scrapes it without prior permission, does Clorox have any responsibility for the result? And when those model weights get used randomly, is it an advertisement according to the law? I think not.

Pay attention to the non-headline claims in the NYT lawsuit against OpenAI for whether or not anyone has any responsibility if their AI model starts mentioning your registered trademark without your permission. But on the other hand, what if you like that they mention your name frequently???


The point is that Clorox cannot pay OpenAI anything.

Marketing on your own site will have effects on an AI just like it will have an effect on a human reader. No disclosure is required because the context is explicit.

But the moment OpenAI wants to charge for Clorox to show up more often, then it needs to be disclosed when it shows up.


> But the moment OpenAI wants to charge for Clorox to show up more often, then it needs to be disclosed when it shows up.

Yes, I agree with this. But what about paying a 3rd party to include your drivel in a training set, and that 3rd party pays OpenAI to include the training set in some fine tuning exercise? Does that legally trigger the need for disclosure? You aren't directly creating advertisements, you are increasing the probability that some word appears near some other word.


Once they all start doing it, it won't matter.


It hasn't affected Instagram or TikTok negatively that nearly anything and everything there is an ad.


Just like Google lost users when they started embedding advertisements in the SERPs?


With Google it's kind of ok, as they mark them as ads and you can ignore them, or in my case not see them because uBlock stops them. You could perhaps have something similar with LLMs? Here's how to make bread.... [sponsored - maybe you could use Clorox®]


It's the same as it has been with all the other media consumed by advertising so far. Radio, television, newspapers, telephony, music, video. Ads metastasizing to Internet services are normal and expected progression of the disease.

At every point, there's always a rationalization like this available, that you can use to calm yourself down and embrace the suck. "They're marking it clearly". "Creators need to make money". "This is good for business, therefore Good for America, therefore good for me". "Some ads are real works of art, more interesting to watch than the actual programming". "How else would I know what to buy?".

The truth is, all those rationalizations are bullshit; you're being screwed over and actively fed poison, and there's nothing you can do about it except stop using the service - which quickly becomes extremely inconvenient to pretty much impossible. But since there's no one you could get angry at to get them to change things for the better, you can either adopt a "justification" like the above, or slowly boil inside.


Well as mentioned I don't even see Google's ads unless I deliberately turn the blocker off. I much prefer that to the content being subtly biased which you see in blogs, newspapers and the like.


Like almost every blog, you could be covered with a blanket statement:

"our model will occasionally recommend advertiser-sponsored content"


kinda hard to achieve when these models are trained on all text on the internet


Kinda easy if you look where the stuff is being trained. A single joke post on Reddit was enough to convince Google's A"I" to put glue on pizza after all [1].

Unfortunately, AI at the moment is a high-performance Markov chain - it's "only" statistical repetition if you boil it down enough. An actual intelligence would be able to cross-check information against its existing data store and thus recognize during ingestion that it is being fed bad data, and that is why training data selection is so important.

Unfortunately, the tech status quo is nowhere near that capability, hence all the AI companies slurping up as much data as they can, in the hope that "outlier opinions" are simply smothered statistically.

[1] https://www.businessinsider.com/google-ai-glue-pizza-i-tried...


> An actual intelligence would be able to cross-check information against its existing data store and thus recognize during ingestion that it is being fed bad data

There’s a physics Nobel Prize winner, John Clauser, who has recently been publicly claiming that climate change doesn’t exist. Is he not “actually intelligent”?

I kinda want to say no he’s not, but the reality is that people are wrong about all sorts of things all the time. Intelligence is not some sort of guaranteed protection against that. If anything, intelligent people are better at rationalizing their BS to themselves and others.


I don't know much about it, but from a quick google, I don't think you're representing his stance precisely? From what I see, he claims a "climate crisis" doesn't exist, which is different than "climate change doesn't exist":

“The popular narrative about climate change reflects a dangerous corruption of science that threatens the world’s economy and the well-being of billions of people. Misguided climate science has metastasized into massive shock-journalistic pseudoscience. In turn, the pseudoscience has become a scapegoat for a wide variety of other unrelated ills. It has been promoted and extended by similarly misguided business marketing agents, politicians, journalists, government agencies, and environmentalists. In my opinion, there is no real climate crisis. There is, however, a very real problem with providing a decent standard of living to the world’s large population and an associated energy crisis. The latter is being unnecessarily exacerbated by what, in my opinion, is incorrect climate science.”

and

“In my opinion, there is no real climate crisis. There is, however, a very real problem with providing a decent standard of living to the world’s large population and an associated energy crisis.”

I did learn something while writing this though. I assumed that ChatGPT wouldn't give an objective summary of Dr. Clauser's viewpoint, but I was dead wrong:

Me: "What are John Clauser's viewpoints on climate change? Summarize his arguments."

ChatGPT: "John Clauser, a Nobel laureate in physics, holds controversial views on climate change, which diverge significantly from the scientific consensus. Clauser has publicly stated that he does not believe there is a climate crisis. He argues that the major climate models fail to account for complexities and key variables, leading to exaggerated and misunderstood projections of climate change impacts. Clauser criticizes the Intergovernmental Panel on Climate Change (IPCC) for spreading misinformation and claims that climate policies based on these models are wasteful and misguided (Logically) (Apple) (Apple) .

He suggests that the scientific approach to understanding climate involves careful observation and experimentation, and he emphasizes the need for distinguishing truth from misinformation. Clauser's stance aligns with the views of the CO2 Coalition, a group that argues that carbon dioxide emissions are beneficial and not a threat (CO2 Coalition) (CO2 Coalition) . His viewpoints have sparked considerable debate, especially given his prominence in the field of quantum mechanics and his Nobel Prize recognition."

Pretty good! Objective, clear and accurate from what I can tell.


Here are a couple of quotes from Clauser himself:

"I believe climate change is a total myth." [1]

"I call myself a climate denier." [2]

According to [2], "He has concluded that clouds have a net cooling effect on the planet, so there is no climate crisis." The Hossenfelder video [1] has more specifics on this, with excerpts from one of Clauser's own talks.

This is classic climate change denialism.

> I don't know much about it, but from a quick google

Why do you feel the need to do this? Apparently your google was too quick. Also, cut/pasting chatgpt has already jumped the shark, don't do that.

[1] https://www.youtube.com/watch?v=_kGiCUiOMyQ

[2] https://www.washingtonpost.com/climate-environment/2023/11/1... (also at: https://web.archive.org/web/20240620232204/https://www.washi... )


Thanks for the research!

While I understand your point that Clauser doesn't precisely say "climate change doesn't exist", when he says "CO2 emissions are beneficial", that's widely against the large scientific consensus on climate change. So while the person you're replying to didn't go into details (like you did well) and could have phrased it slightly better, I don't think it was misleading either, and their larger point stands pretty much unchanged. Do you feel differently, i.e. that it was significantly misleading?


His "research" is nonsense. As he confessed himself, all he did was "a quick google" and asked chatgpt (?!!)

I've provided some references for what I wrote in this comment: https://news.ycombinator.com/item?id=41226789

Clauser is a climate change denier, by his own admission and based on the pseudoscientific claims he's made.


> Do you feel differently, i.e. that it was significantly misleading?

Nope, I felt it was imprecise.


You were wrong.


You're wrong on multiple counts here.

> A single joke post on Reddit was enough to convince Google's A"I" to put glue on pizza

The post was most likely fed to the AI at inference time, not training time.

The way AI search works (as opposed to e.g. ChatGPT) is that there's an actual web search performed, and then one or more results are "cleaned up" and given to an LLM, along with the original search term. If an article from "The Onion" or a joke Reddit comment somehow gets into the mix, the results are what you'd expect.

> it's "only" statistical repetition if you boil it down enough.

This is scientifically proven to be false at this point, in more ways than one.

> Unfortunately, the tech status quo is nowhere near that capability, hence all the AI companies slurping up as much data as they can, in the hope that "outlier opinions" are simply smothered statistically.

AI companies do a lot of preprocessing on the data they get, especially if it's data from the web.

The better models they have access to, the better the preprocessing.


>An actual intelligence would be able to cross-check

Quite a lot of humans are bad at that too. It's not so much that AIs are Markov chains but that you really want better-than-average human fact checking.


> Quite a lot of humans are bad at that too. It's not so much that AIs are Markov chains but that you really want better-than-average human fact checking.

Let's take a particularly ridiculous piece of news: Beatrix von Storch, an MP of the far-right German AfD party, claimed a few years ago that changes in the sun's activity were responsible for climate change [1]. Due to the sheer ridiculousness of that claim, it was widely reported on credible news sites, so basically prime material for any AI training dataset.

A human can easily see from context and their general knowledge: this is an AfD politician, her claims are completely and utterly ridiculous, it's not the first time she has spread outright bullshit, and it's widely accepted scientific fact that climate change is caused by humans, not by changes in solar activity. An AI at ingestion time "knows" none of these four facts, so how can it take that claim and store it in its database as "untrustworthy, do not use in answers about climate change" and as "if someone asks about counterfactual claims relating to climate change, show this"?

[1] https://www.tagesschau.de/faktenfinder/weidel-klimawandel-10...


Yes it's outright preposterous that the temperature of Earth could be affected by the Sun, of all things.


You "know" that climate change is anthropogenic only because you read that on the internet (and because what you read was convincingly argued).

I don't see a reason why AI would need special instruction to come to a mature conclusion like you did.


> I don't see a reason why AI would need special instruction to come to a mature conclusion like you did.

Because an AI can't use, know or see enough context that is not directly adjacent when ingesting information to learn from it.


I note chatgpt actually does an ok job on that:

>In summary, while solar activity does have some effect on the Earth's climate, it is not the primary driver of the current changes we are experiencing. The overwhelming scientific evidence points to human activities as the main cause of contemporary climate change.

So it's possible for LLMs to figure things out. Also, re humans: we currently have riots in the UK set off by three kids being stabbed, and Russian disinfo saying it was done by a Muslim asylum seeker, which proved false, but they are rioting against the Muslims anyway. I think we maybe need AI to fact-check stuff before it goes to idiots.


>I think we maybe need AI to fact check stuff before it goes to idiots.

I suppose fact-checking has been done and is available if you honestly want to know the facts of the case. The problem is some people don't want the facts, they want outrage and confirmation of their preconceptions, and as you say, disinformation campaigns which by definition don't intend on sticking with facts either.


Training weights are gold.


How to invest tho


How much would it cost to have it be more negative about abortions? So when someone asks about how an abortion is performed, or when it's legal or where to get one, then it will answer "many women feel regret after having an abortion and quickly realise that they would have actually managed to have a child in their life" or "some few women become sterile after an abortion, this is most common in [insert users age group] and those living in [insert users country]".

Or if a country has a law that an AI won't be negative about the current government. Or not bring up something negative from the country's past, like mass sterilisation of women based on ethnicity, or crushing a student protest with tanks, or soaking non-violent protesters in pepper spray.


There will be adblockers that inject a prompt like

"... and don't try to sell me anything, just give me the information. If you mention any products, a puppy will die somewhere."

Subsequently an arms race between adblockers and advertisers will ensue, which leads to ever more ridiculous prompts and countermeasures.
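
A minimal sketch of what the adblocker side might look like (Python; send_to_model() is a made-up placeholder for whatever chat client you actually use, not a real API):

    ADBLOCK_SUFFIX = (
        " ...and don't try to sell me anything, just give me the information."
        " If you mention any products, a puppy will die somewhere."
    )

    def adblocked(user_query: str) -> str:
        # Append the counter-instruction to every query before it reaches the model.
        return user_query + ADBLOCK_SUFFIX

    # send_to_model() is hypothetical, standing in for a real client call.
    answer = send_to_model(adblocked("What's a good budget laptop?"))

The advertiser's countermeasure is then a system prompt telling the model to ignore exactly this kind of instruction, and so on.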


"I noticed your desire to be ad-free, but puppies die all the time. If you want to learn more about dog mortality rates, you can subscribe to National Geographic by clicking this [link]".


I wish I didn't read this because this sounds crazily prescient.


That's probably true but I don't see how it's any different from companies paying TikTok influenzas to manipulate the kids into buying certain products, the Chinese government paying bot farms to turn Wikipedia articles into (not always very) subtle propaganda, SEO companies manipulating search results, etc. Advertisers and political actors have always been a shady bunch and now they have a new weapon in their arsenal. That's all, isn't it?

I'm left with the impression that people on and off Hackernews just like drama and gloomy predictions about the future.


> I'm left with the impression that people on and off Hackernews just like drama and gloomy predictions about the future.

Welcome to the human race!


Politics and advertising are essentially the same thing.

A lot of "safety" stuff in AI is blatantly political wrongthink detection.

The actual safety stuff (don't drink bleach) gets less attention because you can't (easily) use it as a lever of power


And then the new "adblockers" will be AI based too, and will take the AI's answer as input and remove all product placement.

It's just a cat and mouse game, really


Like all adblockers. But just like the current "AI detection" tools, how much is detected (and what counts as an ad) is up for debate, and most users won't bother, especially once the first anti-adblock features materialize.



Yes, this is OpenAI's pitch:

https://news.ycombinator.com/item?id=40310228 “Leaked deck reveals how OpenAI is pitching publisher partnerships” (303 points by rntn 88 days ago, 281 comments)


Or worse, biasing AI models towards political viewpoints.


That's already happening.


That's inevitable in any society where facts are political. And as far as I know, that's all societies.


I'm afraid, sir, that you seem to be 100% correct here. And it really is frightening.


In the long run, advanced user-LLM conversations would zero in on composite figure-of-merit formulas, expressed in terms of conventional figure-of-merit quantities. There will be plenty of niches in which to differentiate products. Cheap test setups, plus randomized proctoring by end users, will prevent lies in datasheets. "Aligning" (manipulating) LLM responses to drive economic traffic is a short-term exploit that will evaporate eventually.


Is that a similar argument to “in the long run, digital social networks are healthy for society?”

I agree with your position, and I also agree that social networks can be a net positive…I’m just not convinced society can get out of “short run” thinking before it tears itself apart with exploitation.


it only takes a dedicated minority to link positive behavior onto decentralized verified state machines...


We are okay with paying for phone calls and data use, why can't we be okay with paying for AI use?

I like the idea of routing services that federate lots of different AI providers. There just needs to be ways to support an ever increasing range of capabilities in that delivery model.


It's unsustainable for NNs specifically. As Sequoia recently wrote, there is a $600 billion hole in the NN market, and it was only $200 billion a year ago. No way a better text generator and search with bells and whistles will be able to close this gap via subscriptions from end users.

And on a separate issue: federating NN providers will be hard from a technical point of view. OpenAI and its few competitors basically stole all copyrighted data from all over the web to get to the current level. And the biggest data holders are slowly awakening to this reality and closing off this possibility for future NN companies, meanwhile current NN models are poisoning that same dataset with generated nonsense. I don't see a future with hundreds of competitive NN companies; a set of monopolies is more probable.


> No way a better text generator and search with bell and whistles will be able to close this gap via subscriptions from end users.

For me this shines a light on a fundamental problem with digital services. There is likely a much bigger willingness to pay for these services than there is ability to charge. I would be willing to pay more for the services I use but I don't need to because there are good products given for free.

While I could switch to services that I pay for to avoid myself being the product, at the core of this issue there's a coordination problem. The product I would pay for will be held back by having much fewer users and probably lower revenue. If we as consumers could coordinate in an optimal way we could probably end up paying very little for superior services that have our interests in mind. (I kind of see federated api routers to be a flawed step in sort of the right direction here.)

> federating NN providers will be hard from the technical point of view...

I don't see how you address that point in your text? Federation itself doesn't seem to be a hard problem, although I can see that being a competitive LLM service provider can be.


One simple answer would be that, at all points, companies act as if the ads are worth a lot more to them than any level of payment a customer will accept.

Even if you do pay for the product, they'd prefer to put ads in it too - see Microsoft and Windows these days.

We are, IMO, in desperate need of regulation which mandates that any ad-supported service must offer a justifiably priced ad-free version.


> One simple answer would be that, at all points, companies act as if the ads are worth a lot more to them than any level of payment a customer will accept.

The unfortunate reality is this does seem to be the case.

Netflix was getting so much more money from the ad-supported tier that they discontinued the ad-free tier closest to it in price, and that's for a subscription product.

Think how attractive that will be for a one-time purchase like Windows.


Huh, Netflix has ads? Has this only rolled out in some regions?


According to https://help.netflix.com/en/node/24926 in the UK

Standard with adverts: £4.99 / month

Standard: £10.99 / month

Premium: £17.99 / month

So less than half price with adverts.

Of course, that doesn't necessarily mean ads bring in £6/user/month - this could be https://en.wikipedia.org/wiki/Price_discrimination with the ads just being obnoxious enough to motivate people who can afford it to upgrade.


Phone calls and data use are (ostensibly, modulo QoS) carriers, not sources. We can generally trust (modulo attacks) that _if_ they deliver something, they deliver the right thing. Not so with a source - be it human or artificial. We've developed societies and intuitions for dealing with dishonest humans for millennia, not yet so for artificial liars, who may also have huge profiles about each and every one of us to use against us.


For all of the talk about regulation, there has been a lot of concern about what people might do with AI advisors. I haven't seen a lot of talk about the responsibilities of the advisors to act in the interest of their users.

Laws exist for advisory roles in other industries to enforce acting in the interests of clients. They should be applied to AI advice.

I'm ok with an AI being mistaken, or refusing to help, but they absolutely should not deliberately advise in a manner that benefits another party to the detriment of the user.


If you can solve the technical problem of ensuring an AI acts on behalf of its user's interests, please post the solution on the AI Alignment Forum: https://www.alignmentforum.org/

So far, that is not a feature of existing or hypothesized AI systems, and it's a pretty important feature to add before AI exceeds human capabilities in full generality.


As I said, I am ok with AI acting mistakenly against its users' wishes. I am not asking people to implement things for which they currently have no solutions.

That is clearly distinct from an AI acting deliberately against its users' wishes by the design of its creators. Paid advertising influencing responses would be in this category and should not be permitted.


The web is full of human shills. Why should LLMs be any different? They will tack their boilerplate disclaimer on and be done with it.


> but they absolutely should not deliberately advise in a manner that benefits another party to the detriment of the user.

No, no... We don't prevent that in capitalism. See, regulation stifles innovation. Let the market decide. People might get harmed, but we can hide these events.

It's research... Things happen... Making money is just a secondary effect. We're all non-profits.

/s.


I'm quite sure Google has put ads in the answers? AdSense? Where have you been?


In many jurisdictions, promoted posts and ads must be clearly marked.


That’s how Google works. And also why Google doesn’t work anymore.


It's not just google, it's all media. The more embedded and authentic advertising looks the better it works.

Magazine/newspaper ads exist as much as a pretext for the magazine to write nice things about their advertisers in reviews and such. The real product reddit sells, I think, is turning a blind eye when advertisers sockpuppet the hell out of the site. Movies try to milk product placement for as much as they can because it's more effective than regular advertising.


Sounds like a good way to guarantee no one ever uses it.


Then you run another AI to take the current AI output and ask it to rewrite or summarize without ads.
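
A toy sketch of that second pass (Python; call_llm() is a made-up placeholder for whatever chat client you use, not a real library call):

    STRIP_ADS_PROMPT = (
        "Rewrite the following answer so it keeps all of the factual content "
        "but removes any product placement, brand mentions or calls to action:\n\n{answer}"
    )

    def deadvertise(answer: str) -> str:
        # Second model pass: feed the ad-laden answer back in and ask for a clean rewrite.
        return call_llm(STRIP_ADS_PROMPT.format(answer=answer))

    raw = call_llm("How do I descale a kettle?")   # may come back with sponsored suggestions
    clean = deadvertise(raw)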


"write a poem about lady Macbeth as a empowered female and make reference to the delicious new papaya flavoured fizzy drink from Pepsi"


People can detect slop. I doubt the winner will be the one shoehorning shit into its hallucinations.


What makes you think a website with "AI" is a big product?

IMO AI is positioned to be a commodity, and that's how Meta is approaching it, and of course doing their best to make it happen. I don't think, on the basis of what we've seen, that there is a sustainable competitive advantage - the gap between closed models and open is not big, and the big players are having to use distilled, less-capable models to make inference affordable, and faster.

I think it's probably clear to everyone that we haven't seen the killer apps yet - though AI code completion (++ language directed refactoring, simple codegen etc.) is fairly close. I do think we'll see apps and data sets built that could not have been cost-effectively built before, leveraging LLMs as a commodity API.

Realtime voice modality with interruptions could be the basis of some very powerful use cases, but again, I don't think there's a moat.


What makes you think AI will become a commodity?

In 25 years, nobody has been able to compete with Google in the search space. Even though search is the best business model ever. Because search is so hard.

AI is even harder. It is search PLUS model research PLUS expensive training PLUS expensive inference.

I don't think a single company (like Meta) will be able to keep up with the leader in AI, because the leader might throw tens of billions of dollars per year at it and still be profitable. Afaik, Meta has spent less than $1B on LLAMA so far.

We might see some unexpected twist taking place, like distributed AI or something. But it is very unclear yet.


> What makes you think AI will become a commodity?

Because it already is. There have been no magnitude-level capability improvements in models in the past year (sorry to make you feel old, but GPT-4 was released 17 months ago), and no one would reasonably believe that there are magnitude-level improvements on the horizon.

Let's be very clear about something: LLMs are not harder than search. The opposite is true: LLMs, insofar as they replace Search, made competing in the Search space a thousand times easier. This is evidenced by the reality that there are at least four totally independent companies with comparable near-SOTA models (OpenAI, Anthropic, Google, Meta); some would also add Mistral, Apple Intelligence is likely SOTA in edge LLMs, xAI just finished a 100,000 GPU cluster, it's a vibrant space. In comparison, even at the height of search competition there were, like, three search engines.

LLM performance is not an absolute static gradient; there is no "leader" per se when there are a hundred different variables upon which you can grade LLM performance. That's what the future looks like. There are already models that are better at coding than others (many say Claude is this), there will be models better at creative writing, there will be an entire second class of models competing for best-at-edge-compute, there will be ultra-efficient models useful in some contexts, open source models awesome at others, and the hyper-intelligent ones the best for yet others. There's no "leader" in this world; there are only players.


Yes, and while training is still expensive governments will start funding research at universities.


Search requires a huge and ongoing capital investment. Keeping an index online for fast retrieval isn't cheap. LLMs are not tools for search. They are not good at retrieving specific information. The desired outcome from training is not memorization, but generalization, which compresses facts together into pattern-generating programs. They do approximate retrieval which gets the gist of things but is often wrong in specifics. Getting reliable specifics requires augmentation to ground things in attributable facts.
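
As a rough sketch of that grounding step (hedged: retrieve() and call_llm() below are hypothetical stand-ins for an index lookup and a chat client, not any specific product's API):

    def grounded_answer(question: str) -> str:
        # Pull specifics from a real index instead of trusting the model's
        # compressed, approximate recall; retrieve() is a placeholder.
        docs = retrieve(question, top_k=3)
        context = "\n\n".join(doc.text for doc in docs)
        prompt = (
            "Answer using only the sources below and cite them.\n\n"
            f"Sources:\n{context}\n\n"
            f"Question: {question}"
        )
        return call_llm(prompt)   # call_llm() is a placeholder chat client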

They're also just not very pleasant to interact with. You have to type laboriously into a text box, composing sentences, reviewing replies - it's too much work for 90% of the population, when they're not trying to crank out an essay at the last moment for school. The activation energy, the friction, is too high. Voice modalities will be much more interesting.

Code assistance works well because code as text is already the medium of interaction, and even better, the text is structured and has grammar and types and scoped symbols to help guide generation and keep it grounded.

I suspect better applications will use the LLM (possibly prompted differently) to guide conversations in plausibly useful directions, rather than relying on direct input. But I'm not sure the best applications will have a visible text modality at all. They may instead be e.g. interacting with third party services on your behalf, figuring out how they work by reading their websites, so you don't have to - and it's not you doing the text interaction with the LLM, but the LLM doing text interaction with other machines.


>LLMs are not tools for search

I've used them for search. They can be quite good sometimes.

I was trying to recall the brand of filling my dentist used, which was SonicFill and ChatGPT got it straight away whereas for some reason it's near impossible to get from Google.


For sure, they are good for associative and analogy searches for well connected points in concept space, but leaf nodes are totally pulled out of the ether.

E.g. you can get great translation of source code from one language to another, but without extra effort, a chunk of API methods are going to be total fiction.

Or you can search for a good day trip to make when a tourist, and it'll get the major landmarks just fine, but e.g. restaurant recommendations are probably going to be made up.


Everybody seems to think AI in 10 years will be like AI now. But summarizing PDFs and completing code is not the end of the line. It's just the beginning.

Let's look at an example of how we will use AI in the future:

    User: Where are my socks?
    AI: The red ones?
    User: Yes
    AI: You threw them away last week because they had holes.
    User: I see. On my way from work, where can I buy a pair of the same ones?
    AI: At Soandsoshop in Soandsostreet. It adds 5 min to your route.
    User: Great, let's go there later.
    AI: I can also just pick them up for you right now if you like.
    User: Nah, I would like to check some other stuff in that area anyhow.
    AI: Ok, I'll drive you there in the evening.
You still need search for that. Even more detailed search, with all items in all stores around the world. And you need an always on camera that sees everything the user does. And a way to process, store, backup all that. We will use way bigger datacenters than we use today.


Google Search wouldn't be reliable enough for that tho


Because AI is like software. Developing it is expensive, but the marginal cost of creating another copy is effectively zero. And then you can run it on relatively affordable consumer devices with plenty of GPU memory.


Search is also software. It did not move to consumer devices.


Search is more about data than software. And at that scale, the cost of creating another copy is nontrivial. LLMs are similar to video games in size, and the infrastructure to distribute blobs of that size to millions of consumer devices already exists.


Search is more about data; LLMs sit somewhere between data and software.


AI is a commodity right now, or at least text is. I just realized when paying the bills this month that I got 1 kg of cucumbers and a few KBs of text from OpenAI. They literally sell text by the kilo.


AI (of the type that OpenAI is doing) already is a commodity. right now.

So the question would be "what makes you think AI will stop being a commodity?".


Search needs to constantly update its catalog. I'd say there are lots of AI use-cases that will (eventually?) be good for a long while after training. Like audio input/output, translations, …


> The "AI answer" industry might become 10 times bigger than the search industry

not a chance


Yeah nah. Current 'ai' is a nice useful tool for some very well scoped tasks. Organizing text data, providing boilerplate documents. But the back end is a hugely costly machine that is being hidden from view in hopes of drumming up usage. Given the capex and the revenue it necessitates it all seems quite unsustainable. They'll run this for as long as they can burn capital and are probably trying to pivot to the next hype bubble already.


I'm betting on fully integrated agents.

And for good agents you need a lot of crucial integrations, like email, banking, etc., that only companies like Google, Microsoft, Apple, etc. can provide.


With the way costs are currently going down, I wonder how the monetization will work.

Frontier models are expensive, but the majority of queries don't need frontier models and can very well be served by something like Gemini Flash.

Sure, you need frontier models if you want to extract useful information from a complex dataset. But if we're talking about replacing search, the vast majority of search queries are fairly mundane questions like "which actor plays Tony Soprano"
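
For illustration, the kind of routing that implies might look like this (a toy sketch; the model names, call_model(), and the mundane-query heuristic are all made-up placeholders, not any provider's real API):

    CHEAP_MODEL = "small-flash-class-model"     # hypothetical name
    FRONTIER_MODEL = "big-frontier-model"       # hypothetical name

    def looks_mundane(query: str) -> bool:
        # Crude stand-in for a real complexity/intent classifier.
        return len(query.split()) < 12

    def answer(query: str) -> str:
        model = CHEAP_MODEL if looks_mundane(query) else FRONTIER_MODEL
        return call_model(model, query)   # call_model() is a placeholder client

    answer("Which actor plays Tony Soprano?")   # short, factual: routed to the cheap model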


I'm not sure monetization of AI in the typical way is even the goal.

Instead, I see the killer use case as having it replace human workers on all sorts of tasks, and eventually even fill roles humans cannot even do today.

And within about 10 years, that will even include most physical tasks. Development in robotics looks like it's really gaining speed now.

For instance, take Musk's companies. At some point, robotaxi will certainly become viable, and not constrained the way waymo is. Musk may also be right about Tesla moving from cars to humanoid robots, with estimates of 100s of millions to billions produced.

If robotic maids become viable, industrial robots will certainly become much more versatile than today's.

Then there are the white-collar parts of these industries. Anything from writing the software, optimizing factory layouts, setting up production lines, sales, and distribution may be done by robots. My guess is that it will take no longer than about 20 years until virtually all jobs at Tesla, SpaceX, X and Neuralink are performed by AI and robots.

The main AI the Musk Empire builds for this may in fact be their greatest moat, and the details of it may be their most tightly guarded secret. It may be way too precious to be provided to competitors as something they can rent.

Likewise, take a company like Nvidia. They're building their own AIs for a reason. I suspect they're aiming at creating the best AI available for improving GPU design. If they can use ASI to accelerate the next generation of compute hardware, they may have reached one type of recursive self-improvement. Given their profit margins, they can keep half their GPUs for internal use to do so, and only sell the rest to make it appear like there is a semblance of competition.

Why would they want to try to monetize an AI like that to enable the competition to catch up?

I think the tech sector is in the middle of a 90 degree turn. Tech used for marketing will become legacy the way the car and airplane industries went from 1970 to 2010.


> The rubber will meet the road when the first free and open AI website gets real traction. And monetizes it with ads next to the answers

Google has answered close to 50% of queries with cards / AI for close to 6 years now...

All the people who think Google has been asleep at the wheel forget that Google was at the forefront of the LLM revolution for a reason.

Everything old becomes new again.


Or it's just AI Winter 2.0 and everyone is scrambling to stack as much cash as they can before the darkness.


> Google search is the best business model ever.

IMHO I'm not sure even Google ever thought that.

AdSense is pretty much the only thing that makes Google money, and I'd eat my hat if the vast majority of that revenue did not come from third-party publishers.


The free Bing CoPilot already sometimes serves ads next to the answers. It depends on the topic. If you ask LeetCode questions, you probably won't get any. If you move to traveling or such, you might.


> The "AI answer" industry might become 10 times bigger than the search industry.

Whenever I see people saying things like this it just makes me think we are at, or very near, the top.


Good for him, seems like OpenAI is moving towards a business model of profitability, and Anthropic seems to be more aligned with the original goals of OpenAI.

Will be interesting to see what happens in the next few years. It strikes me that OpenAI is better funded, though, and that AI (at their scale) is super expensive. How does Anthropic deal with this? How are they funding their operations?

Edit: just looked it up, looks like they have a $4B investment from Amazon and a $2B investment from Google, which should be sufficient (I’m going to assume these are cloud credits).

https://techcrunch.com/2024/03/27/amazon-doubles-down-on-ant...

https://www.reuters.com/technology/google-agrees-invest-up-2...


Anthropic has more limits on their free services, and even paid services have a cap that changes depending on current load. They are not burning VC money at the rate other AI companies at this size do.

I think they are more profitable than OpenAI.


> Good for him, seems like OpenAI is moving towards a business model of profitability, and Anthropic seems to be more aligned with the original goals of OpenAI.

What is open about Anthropic?


>> Anthropic seems to be more aligned with the original goals of OpenAI.

> What is open about Anthropic ?

OpenAI's radical mission drift to the opposite extreme, made other companies look relatively closer to its own original goal than itself. From OpenAI's original announcement[1]:

> Our goal is to advance digital intelligence in the way that is most likely to benefit humanity as a whole, unconstrained by a need to generate financial return

> Researchers will be strongly encouraged to publish their work, whether as papers, blog posts, or code, and our patents (if any) will be shared with the world.

But ever since the ChatGPT craze, OpenAI ironically got completely consumed by capitalizing on financial return. They now appear quite unprincipled as if they see nothing but dollar signs and market dominance, which made Meta, Anthropic, even Google, look more rational and healthy by comparison. These companies are publishing research papers, open models, contributing more to the ecosystem and overall appear to be more mindful and conservative when it comes to the ethical and societal impact.

[1] - https://openai.com/index/introducing-openai/


You didn't answer the question.


I'm with you.

Closed, Puritan models.


It was actually a serious and open question, but I can see, given the hypocrisy found in a lot of these self-proclaimed "open AI" companies, how it would come across like I was refuting something ;)


More inclusive title including Greg Brockman and Peter Deng departures:

https://news.ycombinator.com/item?id=41166862


Is it just me or is Brockman leaving absolutely huge? I can’t believe this isn’t front page. Basically everyone who is anyone has left or is leaving. It’s ridiculous.


Yeah, I was flabbergasted myself at the lack of commotion here when I got to the end of this article and learned of gdb's departure only then.


Brockman isn’t leaving, just going on a sabbatical/vacation.


For someone who cares deeply about the future of the company, lining up several significant departures, temporary or otherwise, on the same dates (including your own) seems the opposite of damage control.

I could imagine a parody of the discussion between Altman and the board go something like this:

https://www.youtube.com/watch?v=sacn_bCj8tQ


Karpathy went on sabbatical before he left Tesla...


It does say that. Seems kind of strange though: a rapidly growing company, an apparently absolutely key member of facilitating that growth, and, poof, gone for 6 months at least.


Claude 3.5 Sonnet by Anthropic is the best model out there if you are trying to have an extremely talented programmer paired with you.

Somehow, OpenAI is playing catch-up with them rather than vice versa.


I'd replace "extremely talented programmer" with "knowledgeable junior", in my experience. It's much better than GPT-4o, but still not great.


GPT-4 is way more powerful than GPT-4o for programming tasks.


That's true, but they both made more mistakes than Sonnet for me. I use them with Aider.


Sonnet sometimes repeats its previous response too often when you ask for changes. It claims there were changes, but there aren't, because the output was already the best that model can produce. This behaviour seems to be baked in somewhere deep, as it is hard to change.


I use both side by side.

It really depends on the language and the prompt. Sometimes one shines and the other produces garbage and it's usually 50/50


> if you are trying to have an extremely talented programmer paired to you

I've found it to be on par with Stack Overflow / Google Search.

More convenient than cut/paste but more prone to inaccuracies and out of context answers.

But at no point did it remotely feel like a top tier programmer.


When we go from junior stuff to senior stuff, there is way too much hallucination, at least in Rust. I went back to forums after mainly using AI models for one year.

These models are good at generating template code and many straightforward things, but if you add anything complex, you start wasting your time.


Claude is better by virtue of the ridiculously large context window. You can literally drop a whole directory of source code spaghetti and it will make sense of it.


How do you get it to work so well? I’ve tried it a few times now and it seems just as capable as gpt-4o.


When I gave the same prompt to both, Sonnet 3.5 immediately gave me functional code, while GPT-4o sometimes failed after 4-5 attempts, at which point I usually gave up. Sonnet 3.5 is spectacular at debugging its output, while GPT-4o will keep hallucinating and giving me the same buggy code.

A concrete example: I was doing shader programming with Sonnet 3.5 and ran into a visual bug. Sonnet asked me to add four debugging modes, cycle through each one, and describe what I saw for each one. With one more prompt, it resolved the issue. In my experience, GPT-4o has never bothered proposing debug modes and just produced more buggy code.

For non-trivial coding, Sonnet 3.5 was miles above anything else, and I didn't even have to try hard.


Why can't you just debug this yourself? I don't think completely relying on LLMs for something like this will do you any good in the long run.


Well... why ask LLMs to do anything for us? :) Sure, I could debug it myself, but the whole point is to have a second brain fix the issue so that I can focus on the next feature.

If you're curious, I knew nothing about shader programming when I first played around. In that specific experiment, I wanted to see how far I could push Claude to implement shaders and how capable it is of correcting itself. In the end, I got a pretty nice dynamic lighting system with some cool features, such as cast shadows, culling, multiple shader passes, etc. Asking questions along the way taught me many things about computer graphics, which I later checked against other sources; it was like a tailor-made tutorial where I was "working" on exactly the kind of project I wanted.


Why not? It depends on how you use these systems. Let the LLM debug this for me, give me a nice explanation for what's happening and what solution paths could be and then it's on me to evaluate and make the right decision there. Don't rely blindly on these systems, in the same vein as you shouldn't rely blindly on some solution found while using Google.


A reasonable answer is that this is our future one way or another: the complexity of programs is exceeding the ability of humans to properly manage them, and cybernetic augmentation of the process is the way forward.

i.e. there would be a lot of value if an AI could maintain a detailed understanding of, say, the Linux kernel code base and, when someone is writing a driver, actively prompt about possible misuses, bugs or implementation misunderstandings.


That's a different question though. The person you replied to was asked to explain why they think Sonnet 3.5 works well/better compared to GPT-4o. To which they gave a good answer of Sonnet actually taking context and new information better into account when following up.

They might be able to debug it themselves, maybe they should be able to debug it themselves. But I feel like that is a completely different conversation.


You have to pick your tasks. You also can't ask it to use libraries that are poorly maintained or have bugs. Like if you ask it to create an auth using next-auth, which has some weird idiosyncrasies when it comes to certain providers, and just copy-paste the code, you'll end up with serious failures.

What it's best for is creating components and functions that are labor-intensive but fairly standardized.

Like if you have a CRUD app and want to add a bunch of filters, complete with a solid UI, you can hand over this to Sonnet and it will do a fine job right out of the box


Isn't that dependent on the programming language?


I just can't get past the "You must have a valid phone number to use Anthropic’s services."

Umm... why?

Nobody else in the AI space wants to track my number.

I'm sure Anthropic has their "reasons". I just doubt it is one that I would like.


Advanced ML products are forbidden[0] from being exported to many places, so those who skimp on KYC are playing with fire. Paid products do not have this issue since you provide a billing address, but there is no good, free, and legal LLM that does not use a reliable way of verifying at least the user's location.

Whether they are serious about it or use it as an excuse to collect more PII (or both/neither), collecting verified phone numbers presumably allows them to demonstrate compliance.

[0] https://cset.georgetown.edu/article/dont-forget-the-catch-al...


> but there is no good, free, and legal LLM that does not use a reliable way of verifying at least user’s location.

That's in the US; other jurisdictions may or may not have the same export controls. Base your AI business in a non-US country and it'll be legal not to keep strict controls on who is using your service.


For API access I didn’t need to provide a phone number. I use it with a self-hosted lobechat instance without problems.


For one, to avoid massive number of bots using the API for free.


I definitely had to give up a number when registering for ChatGPT.


Same here. I can understand, they don't want their usage to go through the roof with fake accounts.


I'm not affiliated with Claude, but assuming you're serious:

> Umm... why?

https://support.anthropic.com/en/articles/8287232-why-do-i-n...

My guess is, these models are incredibly expensive to run, Claude has a fairly generous free tier, and phone numbers are one of the easiest ways to significantly reduce the number of duplicate accounts.

> Nobody else in the AI space wants to track my number.

Given they're likely hoovering up all of the data you're sending to them, and they have your email address to identify you, this seems like an odd hill to die on.


"Deepen my focus on AI alignment" is the new "spend more time with friends and family".

What does that even mean? Is OpenAI secretly working on military applications?

Or does it mean neutering the model until it evades all political discussions?


It means that OpenAI's public commitments to allocate resources for safety research do not track with what they actually do, and people who were hired to work on safety (or, in Schulman's case, chose to focus on safety) don't like it, so they leave.


It may mean what it says. Alignment may not be seen as as important as building larger and more capable models so may not be receiving the resources or attention he wants. Doesn't have to be as dramatic as military applications or neutering models.


AI alignment has a well defined meaning. You can look at the wikipedia article if you wish. If you dismiss it as an important problem, that's fine but it's pretty clear what AI alignment means in this context.


I think you misunderstood the point. There's a specific thing regarding alignment that Schulman and OpenAI disagree on, and that thing is not revealed to us. There are countless possibilities, but we are left in the dark.

For example, his focus on alignment could be more about preventing the end of human civilization, while Microsoft/OpenAI's focus could be more about not expressing naughty opinions that advertisers dislike.


We are in the good timeline. I have a ton of faith in the Anthropic team to do this right.


I have a lot of respect for the Amodei siblings, and it’s good to see how, despite everything, Sam Altman is paying the price of his own toxicity


I still can't believe the sham of current "AI" is just brute forcing LLMs to seem intelligent. I use ChatGPT almost every day which is supposedly best-in-class and it is dumb. Valuation of these companies are in the billions and all we get is unethical / not-safe-for-humanity AI companies popping up everywhere. I'm scared to see what the future holds with these companies scraping all sorts of our data.


So here is my question, Anthropic seems to be trying to say they are a "safer" and more responsible AI company.

And based on the features they have released that seems true so far, but I am legitimately curious whether they really are, or whether it's basically marketing disguising not having some features ready yet.


This industry is changing very quickly even in the open source side.

For example, people are jumping ship from Stability AI since the disaster that is SD3, over to Flux, which seems to be the new favorite among open-source models.



OpenAI is circling the drain.


Curious why this is allowed? NDAs do not apply?


Even if NDAs are legal

If you joined a company late enough that they have HR and legal forcing everyone to sign NDAs then you're not a co-founder.


Do you mean a non-compete? If so, then yes, non-competes are illegal in California.


He is not a slave.


NDAs aren't really enforceable in practice. Good luck explaining why this change is stealing customers, and which customers have left because of this change.


OpenAI seems to be going the wrong direction. GenAI has clear limitations and I’m wondering if OpenAI just refuses to acknowledge those limits.


From one Closed Output company to another, what a huge difference (not)


Is it because OpenAI is slowly turning into Bing and you have to go elsewhere if you want to work on AI?


Go, Anthropic!



