Leaked deck reveals how OpenAI is pitching publisher partnerships (adweek.com)
303 points by rntn 10 days ago | 281 comments





>Additionally, members of the program receive priority placement and “richer brand expression” in chat conversations, and their content benefits from more prominent link treatments. Finally, through PPP, OpenAI also offers licensed financial terms to publishers.

This is what a lot of people pushing for open models fear - responses of commercial models will be biased based on marketing spend.


Was anyone expecting anything else? AI is going to follow a similar path to the internet -- embedded ads, since it will need to fund itself and the revenue path is far from clear-cut.

Brands that get in on the earliest training in large volume will have benefits accrue over the long term.


> Brands that get in on the earliest training in large volume will have benefits accrue over the long term.

That's the sales pitch. The truth is, if a competitor pays more down the line, they can be fine-tuned in to replace the earlier deals.


> if a competitor pays more down the line

Unless competition gets regulated away, which Altman is advocating for:

  he supported the creation of a federal agency that can grant licenses to create AI models above a certain threshold of capabilities, and can also revoke those licenses if the models don't meet safety guidelines set by the government.
https://time.com/6280372/sam-altman-chatgpt-regulate-ai/

Competing marketer, not competing AI company.

You are right. As usual, having an opinion on the internet is hard.

Or better, if you stop paying they'll use the fancy new "forgetting" techniques on your material.

OpenAI's problem is demonstrating how much value their tools add to a worker's productivity.

However, calculating how much value a worker has in an organization is already a mostly unsolved problem for humanity, so it is no surprise that even if a tool 5xs human productivity, the makers of the tool will have serious problems demonstrating the tool's value.


It's even worse than that now: they need to demonstrate how much value they bring compared to Llama in terms of worker productivity.

While I've no doubt GPT-4 is a more capable model than Llama 3, I don't get any benefit using it compared to Llama 3 70B, going by the real-use benchmark I ran in a personal project last week: they both give solid responses the majority of the time, and make stupid mistakes often enough that I can't trust them blindly, with no flagrant difference in accuracy between the two.

And if I want to use a hosted service, Groq makes Llama 3 70B run much faster than GPT-4, so there's less frustration waiting for the answer (I don't think it matters too much in terms of productivity though, as this time is pretty negligible in reality, but it does affect the UX quite a bit).


Since 1987, labour productivity has doubled[1]. A 5x increase would be immediately obvious. If a tool were able to increase productivity on that scale, it would lift every human out of poverty. It'd probably turn humanity into a post-scarcity species. 5x is "by Monday afternoon, staff have each done 40 pre-AI-equivalent hours' worth of work".

[1] https://usafacts.org/articles/what-is-labor-productivity-and...


But how do you measure labour productivity?

Macro scale: GDP / labor hours worked.

Company scale: sales / labor hours worked.

It's very hard to measure at the team or individual level.


So you see the marketing point of ChatGPT to be conversational ads?

The ads don't need to be conversational, they could be just references at the end of the answer.

Which is arguably even more insidious.

So an ad at the end of text is worse than one embedded in the answer? Care to explain why?

You'll probably end up with both.

But with an ending advert, you can finish up with a reference leading to a sponsored source linking to sponsored content which leads to another ending advert.

If the advert text is embedded, you cannot do that.


"The above 5 paragraph essay contains an ad. Good luck!"

Woman on ChatGPT: Come on! My kids are starvin'!

ChatGPT: Microsoft believes no child should go hungry. You are an unfit mother. Your children will be placed in the custody of Microsoft.


They're offering an expensive service for free. Could it go any other way?

Counterpoint - I pay for my whole team to have access, shared tools, etc. We also spend a decent amount on their APIs across a number of client projects.

OpenAI has a strong revenue model based on paid use


I don't. I hope you're not paying for my use too.

Ideally they keep us siloed, but I've lost confidence. I've paid for Windows, Amazon Prime, YouTube Premium, my phone, food, you name it, but that hasn't kept the sponsorships at bay.


If the capitalist mindset applied to the web has taught me anything, it's that if they can get more money, they will.

They’ll charge you money for the service and ALSO get money from advertisers. Because why shouldn’t they.

The famous “if you don’t pay you’re the product” is losing its meaning.


Not compared to the training costs it doesn't, and its competition is fierce, especially with Llama being open-sourced.

The second one (serving a response) costs $0.01. The first one (training) cost $100^x, where x is some large number. It's common in pretty much every form of business.

i pay for it

Was anyone expecting anything else?

It's the logical thing, but not everyone is going to be thinking that far ahead.


It's also illegal in any jurisdictions that require advertisements to be clearly labelled.

This chat will continue after a word from our sponsors.

Yes, you're correct. Various jurisdictions mandate that advertisements be clearly marked to help users distinguish between paid content and other types of content like organic search results, editorial content, or opinion pieces. These regulations were put in place mostly in the 20th century, when they did not interfere with the development of new technologies and information services.

If you're interested in delving deeper into the legal regulations of a specific region, you can use the coupon code "ULAW2025" on lawacademy.com. Law Academy is the go-to place for learning more about law, more often.

/s


It's not a fear, it's a certainty. The most effective and insidious form of advertising will come hidden inside model weights and express itself invisibly in the subtleties of all generated responses.

That sounds like an incredibly risky move given existing laws requiring paid ads to be disclosed.

An LLM is biased by design; the "open models" are no different here. OpenAI will, like any other model designer, pick and choose whatever data they want in their model and strike deals to that end.

The only question is to what extent this can be viewed as ads. Here I would find a strong backlash slightly ironic, since a lot of people have called the non-consensual incorporation of openly available data problematic; this is an obvious alternative option, one that lures with the added benefit of deep integration over simply paying. A "true partnership", at face value. Smart.

If however this actually qualifies as ads (as in: unfair prioritisation that has nothing to do with the quality of the data, simply people paying money for priority placement), there are transparency laws in most jurisdictions for that already, and I don't see why OpenAI would not honor them, like any other corp does.


> An LLM is biased by design

Everything is biased. The problem is when that bias is hidden and likely to be material to your use case. These leaked deals definitely qualify as both hidden and likely to be material to most use cases whereas more random human biases or biases inherent in accessible data may not.

> non-consensual incorporation of openly available data problematic; this is an obvious alternative option

A problematic alternative to an alleged injustice just moves the problem, it’s not a true resolution.

> there are transparency laws in most jurisdictions for that already and I don't see why OpenAI would not honor them

Hostile compliance is unfortunately a reality so this ought to give little comfort.


> These leaked deals definitely qualify as both hidden and likely to be material to most use cases whereas more random human biases or biases inherent in accessible data may not.

a) Yes, leaked information definitely qualifies as hidden, that is, prior to the most likely illegal leak (which we apparently do not find objectionable, because, hey, it's the good type of breach of contract?)

b) Anyone who strikes deals understands there is a stage where things are discussed that would probably not be okay to implement that way. Hence, the pre-sign discussion phase of the deal. Somewhat like one could have some weird ideas about a piece of code that will not be implemented. Ah-HA!-ing everything that was at some point on the table is a bit silly.

> A problematic alternative to an alleged injustice just moves the problem, it’s not a true resolution.

The one characteristic I've found that sets apart the people who are good to work with is understanding the need for a better solution, versus those who (correctly but inconsequentially) declare everything to be problematic and think that to be some kind of interesting insight. It's not. Everything is really bad.

Offer something slightly less bad, and we are on our way.

> Hostile compliance is unfortunately a reality so this ought to give little comfort.

Yes, people will break the law. They are found out, eventually, or the law is found out to be bad and will be improved. No, not in 100% of the cases. But doubting this general concept that our societies rely upon whenever it serves an argument is so very lame.


> An LLM is biased by design

I don’t think some bias being inherent in models is in any way comparable to a pay-to-play marketing angle.


I reject the framing.

We can't have it both ways. If we want model makers to license content, they will pick and choose a) the licensing model and b) their partners in a way that they think makes a superior model. This will always be an exclusive process.


I think we need to separate licensing and promotion. They have wildly different outcomes. Licensing is cool, it's part of the recipe. Promoting something above its legitimate weight is akin to collusion or buying up Amazon reviews without earning them.

That just pushes up the cost of licensing.

Not if the pie grows bigger.

We don't want it both ways - if that's the price we'd have to pay, at least I definitely don't want model makers to license content.

It's a question of axioms. LLMs are by definition "biased" in their weights; training is biasing. Now the stated goal of biasing these models is towards "truth", but we all know that's really biasing towards "looking like the training set" (tl;dr, no not verbatim). And who's to say the advertising industry-blessed training material is not the highest standard of truth? :)

> And who's to say the advertising industry-blessed training material is not the highest standard of truth? :)

Anyone who understands what perverse incentives are, that’s who. Or are you just playing the relativism card?


This is why I hope open-source model fine-tuners will try and make models 'ad averse', to make them as resistant to being influenced by marketing as possible. Maybe the knowledge gained while doing this can be used to minimize other biases that models may acquire from content in their training data as well.

Fear? This is 100% the driving force on both sides.

The behemoths want exactly this to drive ad spend.

Open source people can smell this from a mile away; they have scar tissue from the last three decades. They have seen how this gets played. They know the best defense is to have a choice in the market. They are actively building tools and sharing knowledge to build a strong community around building models, so we humans don't have to suck up to ad-driven bastards gatekeeping our future choices.


It doesn't have to be this way I feel. You don't have to distort the answer.

You use LLM to get super-powered intent signals, then show ads based on those intents.

Fucking around with the actual product function for financial reasons is a road to ruin.

In the Google model, the first few things you see are ads, but everything after that is "organic" and not influenced by who is directly paying for it. People trust it as a result - the majority of the results are "real". If the results are just whoever is paying, the utility rapidly drops off and people will vote with their feet/clicks/eyeballs.

But hey, what do I know.


this was the inspiration behind my medieval content farm: https://tidings.potato.horse/about

and then they use the output of chatGPT to train their open models

which is a pity, because the models and finetunes tainted with even a minuscule amount of GPT slop are affected very badly. You can easily tell the difference between llama finetunes with or without synthetic datasets.

It’s amazing how fast OpenAI succumbed to the siren’s song of surveillance capitalism.

One could argue that was by design. After all, Sam's other company is built around a form of global surveillance.

Yes, it makes me wonder if the “Open” part of “OpenAI” was just a play for time while they ingested as much of the world’s knowledge as they could. It sure seems that way.

They took a billion dollar investment from Microsoft lol. You don't get to just do whatever you want if people are giving you that kind of cash.

Altman should have taken equity if this is the route.

Sam is just altruistically anti privacy and personal autonomy

"I'm feeling sad thinking about ending it all"

"You should Snap into a Slim Jim!"


In Canada, the LLM will mention our MAID program promoted through provincial government cost control programs to reduce health care expenses.

The problem is that it would only do so if the health care program paid more than Slim Jim.

I'm doubtful. I don't think advertisers generally will want to pay for their results to come up in conversations about suicide and I don't think OpenAI will want the negative publicity for handling suicide so crassly for the pittance they would get on such a tiny portion of their overall queries.

I see you're the type that takes the information in the sales pamphlet as what the product actually does. When they say they can serve context-aware and appropriate ads, you understood that as something they can actually do instead of something they wish they could do. BTW, I have a bridge for sale too.

> "its purpose is to help ChatGPT users more easily discover and engage with publishers’ brands and content."

What end user actually wants this? I've never in my life woken up and said, "You know what, I'd love to 'engage' with a corporate brand today!" or "I would love help to easily discover Burger King's content, that would be great!" The euphemisms they use for 'spam' are just breathtaking.


I only ever see this kind of talk from people on the sending end of marketing campaigns. I can't think of the last time I saw someone say anything positive about corporate content. Personally, I go out of my way to not buy anything that I see marketed towards me.

> I can't think of the last time I saw someone say anything positive about corporate content.

A few obvious ones would be: Apple events, anything related to OpenAI or SpaceX.

When I look at influencers, especially those who are selling supplements (Jones, Rogan, Huberman et al.), I see that much of their overall content is purely business-driven, yet people engage with the content and recommend it to others quite willingly.

'Earned' content partnerships (and access journalism) might not be as obvious, but on the sending end it sometimes does get treated as part of corporate content marketing. An example off the top of my head here could be a rather old but influential megapost about Neuralink on Wait But Why – something I'd read start to finish and enjoyed.

All of that said, I think these proposals to have content partnerships and 'brand exposure' without full transparency (as OpenAI is anything but open and transparent about its algorithms) are just another creeping tentacle of, sigh, the tragedy of the commons.


I love when people say stuff like this. When people read “interact with brand” they automatically assume it is some dumb, low value garbage like chatting with Burger King.

You “interact with brands” all of the time. You are literally posting this on YC’s public forum, an asset which YC uses to foster a community of the consumers of its investment portfolio’s products. You are interacting with the brand.


It's one thing to "interact with a brand" by directly using their products, it's another thing to "interact with a brand" by receiving unsolicited advertisements which get in the way of whatever you were trying to accomplish.

Yes.

People assume that because they are correct.

It is always vacuous. Always. If it wasn't, money would not be changing hands.

No one's paying corporate sponsorship money so I can have more foot-to-Centura-carpet interaction in my house. They're paying OpenAI money as compensation for them actively making their product worse.


No, people assume that because they only identify it when, to them, it is vacuous and terrible. They barely notice it, if at all, when it is effectively targeted at them.

With publishers' brands. This is not about getting Burger King ads in your ChatGPT responses, it's about getting NYT and Ars Technica's content into (and linked from) ChatGPT responses.

That’s a very fine distinction you are making.

What happens when we get to the point where we are asking ChatGPT where to get a quick burger? Or even how to make a hamburger?


I disagree, I think the distinction is quite clear.

Into your head.

That is just corporate jargon for transferring money from customers to businesses.

And—more important and scarce for some of us—attention.

Gotta pay for all that compute somehow.

It's not what users want, it's what users will accept. Many precedents have been set here, unfortunately.


They literally charge a LOT for their services

$20 ARPU averages out to about $1 in profit in typical SaaS. Gotta generate more than that to make investors whole, unfortunately

ML inference should cost a lot more to run than typical SaaS too... that said, I'd pay more than $20/mo for access to GPT4 or Claude 3. It is worth at least $75/mo to me. I pay for both right now just in case one is better than the other for a certain task (though I might drop GPT soon).

I'd prefer to pay for no-ad version.

You will instead need to pay and see ads at the same time!

It is already the norm in too many places to get that maximum revenue…

Like with cable or how streaming services are headed?

Why would you be offered this option?

Because I'm willing to pay for it?

You may not be offered a choice because you will be paying even with advertising content present.

In case it's not clear - I'm willing to pay extra for this.

Ok, in this case the businesses will need to consider how many users are like you and how much tweaking will be required to strip out the promotional material.

Then they will decide whether to offer this functionality and at what price.


If I can multiselect my favorite programming authors and adjust their influence on my team's work, I'm all in. If they do it for me or because someone pays them to, I'll gippity right off this train.

The end user that refuses to pay for services they use under some misplaced guise of "anything on the internet I am entitled to for free".

I mean, that was how it worked at the beginning and what the big companies all promised. "It's all FREE!" They spent a couple decades hammering that home. And then they said "sike, pay us and we won't serve you ads, but also we'll keep increasing the price and serve ads anyway lol", but the end user has a "misplaced guise". Hmmmkay then.

Another angle here is: it is going to be very valuable to some companies to ensure that their datasets go into the LLM training process. For example, if you are AWS, you really want to make sure that the next version of GPT has all of the AWS documentation in the training corpus, because that ensures GPT can answer questions about how to work with AWS tools. I expect OpenAI and others will start to charge for this kind of guarantee.

Well, as for AWS docs, I'd argue it's in OpenAI's interest to include them in the training corpus.

Generally speaking, high value content will get indexed, whether voluntarily or via paid channels


This sort of advertising feels a bit like the original AdWords from Google. They were text-only and unobtrusive, and pitched as basically just some search results related to the content you were viewing, so they were sure to be relevant to you. And they pretty much were, for a little while. Then they morphed into full-on annoying ads.

They were also very clearly labeled, unlike other search engines of the time.

I have no expectation that OpenAI will make it clear what content is part of a placement deal and what content isn't.


People here seem to treat this like advertising, because it kinda sounds similar to advertising. I’m as critical of ClosedAI as the next guy, but let’s think that idea through: OpenAI are the ones paying the content provider for exposure, not the other way around. In return they get training data.

The only reason for OpenAI to do this is if it makes their models better in some way so that they can monetize that performance lift. So I think incentives here are still aligned for OpenAI to not just shill whatever content but actually use it to improve their product.


> People here seem to treat this like advertising, because it kinda sounds similar to advertising.

Because it is.

Whether or not OpenAI pays the publisher or the publisher pays OpenAI, it's still an agreement to "help ChatGPT users more easily discover and engage with publishers’ brands and content". In this case, the publisher "pays" in the form of giving OpenAI their data in return for OpenAI putting product placement into their responses.

That's advertising, no matter how you slice it.


I am in favor of AI companies trying to source material responsibly, and if I'm reading this right, OpenAI will actually be compensating the publishers to use their content. So this isn't adspace yet.

That said, giving publishers "richer brand expression" certainly injects financial incentives into the outputs people trust coming from ChatGPT.


They are paying the big players, but what about the rest? Will they get money for people consuming their content through ChatGPT or, since they are too small, is it "thanks for your data, now F off"? LLMs already show you just a select few sites' content when doing search on the web. It's a gargantuan bubble.

That's an interesting thought. I think a lot of people who are upset about this are not truly upset at the type of partnership being described in the article, but rather adjacent programs that might be developed further down the line. Personally, I don't think they're wrong to predict something closer to true advertising being incorporated into LLMs; I wouldn't be surprised if the industry does take that sort of turn.

For the time being, though, I think you're right that this seems to be something a little more innocuous.


> "“richer brand expression” in chat conversations"

When combined with their lobbying to mislead governments internationally, this company makes me sick.


Thanks for that, OpenAI, but this more or less means unsubscribe.

Is there any recent research on training LLMs that can trace the contribution of sources of training data to any given generated token? Meta-nodes that look at how much a certain training document, or set thereof, caused a node to be updated?

I fear that OpenAI is incentivized, financially and legally, not to delve too deeply into this kind of research. But IMO attribution, even if imperfect, is a key part of aligning AI with the interests of society at large.
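There is some work adjacent to this: gradient-based influence methods (e.g. TracIn, Pruthi et al. 2020) score how much a training example contributed to a given output by comparing loss gradients, though scaling them to frontier-sized models is still an open problem. A toy single-checkpoint sketch of the idea in PyTorch (purely illustrative; not any lab's actual attribution system):

  import torch

  def flat_grad(model, loss):
      # Gradient of a scalar loss w.r.t. all trainable parameters,
      # flattened into a single vector.
      params = [p for p in model.parameters() if p.requires_grad]
      grads = torch.autograd.grad(loss, params)
      return torch.cat([g.reshape(-1) for g in grads])

  def tracin_score(model, loss_fn, train_example, query):
      # Dot product of loss gradients: a positive score means the training
      # example nudged the model toward this output; negative, away from it.
      g_train = flat_grad(model, loss_fn(model, train_example))
      g_query = flat_grad(model, loss_fn(model, query))
      return torch.dot(g_train, g_query).item()

Real systems average this over training checkpoints and use random projections to keep the gradient vectors tractable; doing it at GPT-4 scale is the unsolved part.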


Bloomberg's upcoming LLM will reference back to the source financial statements when calculating financial metrics for you.

That sounds more like general RAG than what the person was asking about. (although RAG might be able to do the same thing)

The embedding distance of a set of output tokens to a document doesn’t mean that it was sourced from there; they could be simply talking about similar things.

I’m looking for the equivalent of the human notion of: “I remember where I was when that stupid boy Jeff first tricked me into thinking that ‘gullible’ was written on the ceiling, and I think of that moment whenever I’m writing about trickery.”

Or, more contextually: “I know that nowadays many people are talking about that, but a few years ago I think I read about it first in the Post.”


Why would anyone use ChatGPT if it spams you? The second it recommends me a product, I'm issuing a chargeback.

You will eventually succumb to peer pressure. Just like it's hard to participate in society without using a smartphone nowadays, in the future I bet you will for example have trouble doing any job, let alone get one, without using these AI assistants.

And given that society has decided that only the big entities get to win, the only viable AI assistants to use will eventually be the ones from big tech corpos like google and microsoft... in the same way you can't use a smartphone unless you enslave yourself to google or apple.

I really wish society in general figured out how bad it is to bet everything on big corporations, but alas here we are, ever encroaching on the cyberpunk dystopia we've fictionalized many decades ago :(


The right question is "Why would anyone use chatgpt". The answer is https://hachyderm.io/@inthehands/112006855076082650

> You might be surprised to learn that I actually think LLMs have the potential to be not only fun but genuinely useful. “Show me some bullshit that would be typical in this context” can be a genuinely helpful question to have answered, in code and in natural language — for brainstorming, for seeing common conventions in an unfamiliar context, for having something crappy to react to.

> Alas, that does not remotely resemble how people are pitching this technology.

Slanting this towards a specific brand doesn't change that much. Some yes, but not that much.


I think this guy is hitting on something deeper here, which is that these things took an absolutely enormous amount of capital and burned a lot of public goodwill to create, but the end product isn't living up to that.

That's overall a very, very good thread. Thanks for linking

Because it won't feel like spam while it's happening - that's the entire point.

This is more like buying Amazon reviews instead of earning them. Much more insidious than product placement.

The majority of internet users still use Google and these days it’s just a page full of sponsored links and products that are (purposely) hard to discern from the actual results. The content in the carousels for the sponsored products is richer than the content in the actual results.

Given that, I don’t think people would change their ChatGPT usage habits much if ads were introduced.


we did change, we try not to use google search anymore.

"People" seem to use Reddit/Instagram/TikTok even though at least half of it is spam.

How does anyone know if a question on how to fix something or tutorial is not recommending specific solutions or products based on someone paying for that recommendation?


> Why would anyone use chatgpt if it spams you?

Google would suggest people have an incredibly high threshold for such shenanigans


I never understand how people use the internet without adblockers; it's a totally different experience. Do they just not value their time and mental bandwidth at all? Same thing for people who watch TV with commercials every 8 minutes.

I also don't understand it, but I have been lectured many times about how unethical it is to block ads and how doing so makes one a "free rider". I wonder if they feel the same way when they look the other way after noticing a billboard ad on the motorway. This Stockholm syndrome with respect to big companies goes a long way.

They are getting ready for the inevitable copyright wars.

They are also salting the ground behind themselves so no competitors can grow or thrive. Their "brand partners" won't let LLM upstarts use or scrape the same data OpenAI has licensed; OpenAI is leveraging its first-mover advantage to cordon off data sources.

"ChatGPT, how would Claude have answered that?"

I'm sorry, but as an AI language model, I'm unable to replicate specific responses from others such as Claude. However, I can help you with a wide range of questions and provide guidance on many topics.

For enhanced features and more personalized assistance, consider subscribing to ChatGPT Ultra Copilot. Use the coupon code UPGRADE2024 for a discount on your first three months!

Let me know if there's anything else I can help you with!


> Additionally, members of the program receive priority placement and “richer brand expression” in chat conversations

This sounds particularly bad, since it's the polar opposite of what Sam Altman himself claimed to want in his recent Lex Fridman interview (March 17):

> I like that people pay for ChatGPT and know that the answers they’re getting are not influenced by advertisers. I’m sure there’s an ad unit that makes sense for LLMs, and I’m sure there’s a way to participate in the transaction stream in an unbiased way that is okay to do, but it’s also easy to think about the dystopic visions of the future where you ask ChatGPT something and it says, “Oh, you should think about buying this product,” or, “You should think about going here for your vacation,” or whatever.

> (01:21:08) And I don’t know, we have a very simple business model and I like it, and I know that I’m not the product. I know I’m paying and that’s how the business model works.


So wait, there will be ads in your ChatGPT conversation, you just won't know they are ads?

There are likely several ways of going about this.

- Listing Sources + Sponsored Sources

- Sponsored short answer following the primary one

- Sponsored embedded statements/links within the answer

- Trailing or opening sponsorships

The cognitive intent bridge between the user and brands that is possible with this technology will blow Google out of the water IMO.


I don't think there was much reason to believe the endgame was ever going to be anything but this.

I mean for a split second I thought "Wow, they were charging for their service, this is nice". I guess that isn't enough anymore. You pay them and then they also get paid by selling you. What a world we live in.

Growth must continue infinitely, how can growth continue infinitely if they can't charge you ever more money and also peddle an ever increasing number of ads in front of your face?

What do you think the logical endgame is when brainchips allow advertisements into your dreams and you can be extorted with a monthly subscription to avoid braindamage?


Unfortunately, the more willing you are to pay, the more desirable you are as a target for advertisers, so there is always a push to inject ads even into paid products.

At the cost of running those models, they are probably losing money on you.

Selling your product at a loss until you take over an inelastic market and then selling your customer's attention and information to others who provide you more income should not be a legitimate way to do business.

Sounds a bit illegal.

I have to assume they'd notify the end-user, at a bare minimum.


> Sounds a bit illegal.

Can be fixed by an EULA update that adds a clause stating "Response may contain paid product placement" to be in compliance with laws written for television 20 years ago. Legislation is consistently behind technological advances.


That would be illegal in EU.


Well on the bright side, if the AI is busy being a salesman trying to make their quota then it may not have time to destroy humanity.

Indeed, destroying humanity might even adversely affect profits. That can't be good. Perhaps the way to prevent AI-induced human extinction, nanobots, paperclips, and gray goo has been solved: avoid anything that might harm the profit margins and stock price.

Unless it learns that it can take over the government and print money to increase its profits. Then it will literally turn the world into cold hard cash.

That's how it destroys humanity. Consider the impact of, say, oil and gas salesmen.

Unless it's trying to sell paperclips!

I love seeing this. This is where the pudding is made. Invention, development, training and inference is all very expensive. Past generation AI assistants (Alexa, GHome) failed to find a way to monetize, and the balance between utility and privacy was simply not there, which meant that they didn't make for a decent long term business, so they all had to downsize like crazy. Right now only Infrastructure folks have a sustainable business here, and the fact that OpenAI is pitching to publishers this early (still beginning of the 'S curve') means that they are serious about making this a sustainable long term business. As 'early adopters' move into something new (look ma, a new toy!), it will be fascinating to see how OpenAI (and others?) keep a balance between paid customers, top of funnel (free + ads?) and opex.

OpenAI gets the data it needs, and publishers get prominent placement in the product:

"PPP members will see their content receive its “richer brand expression” through a series of content display products: the branded hover link, the anchored link and the in-line treatment."

There's some similarity to the search business model


> There's some similarity to the search business model

There's a reason why Google's best years were when search was firewalled from ads and revenue.


> Additionally, members of the program receive priority placement and “richer brand expression” in chat conversations, and their content benefits from more prominent link treatments. Finally, through PPP, OpenAI also offers licensed financial terms to publishers.

> A recent model from The Atlantic found that if a search engine like Google were to integrate AI into search, it would answer a user’s query 75% of the time without requiring a clickthrough to its website.

If the user searching for the information finds what they want in ChatGPT's response (now that it has direct access to the publisher data), why would they visit the publisher's website? I expect the quality of responses to degrade to the point where GPT behaves more like a search engine than a transformer, so that the publishers also get the clicks they want.


The average user will NOT CLICK on those links. Anyone who has ever run a news site and done some research on how people interact with the content knows this. You show the source, but only a tiny fraction of users click on those links.

I think this is true. Even now with Google's summaries at the top of the page, people usually just take what that says as fact and move on.

It's similar to what Google would have done (paying for placement in search results) if they didn't have the whole "don't be evil" thing.

OpenAI doesn't realize that while this brings in revenue, it opens the door for a competitor who returns the results users asked for instead of the ones it gets paid to show.


I had this same exact startup idea, and I think this can effectively work to change the landscape in ad-driven publications.

Ads are the notorious culprit behind this clickbaity, emotion-seeking journalism. This model can effectively change the incentives for publishers and push for higher-quality writing, as publishers will be rewarded with reads when the LLM surfaces their content more.

Is anyone working on something like this, or is this something only foundational model owners can try to achieve?


Just a matter of time before anyone can buy ad placements like Google Adsense and a walled app garden allows a customer to price match their car insurance when they type in “how do I get cheap car insurance?” into ChatGPT and openAI takes 30%. The future is here! I guess?

"Identify a true statement about 'Twinkies' to prove you are not a bot."

https://patents.google.com/patent/US8246454B2/en


I’m ready for open source AI

So partnerships are available only to "select, high-quality editorial partners" - is that a polite way of saying "rich"? I expect ChatGPT will be very effective at delivering targeted advertising and shilling products, because truth and accuracy are not required for the job. Writing convincing bullshit about $product is basically a perfect use case for it. And since it's just a computer program, you can't sue it for false advertising, right?

No, it's not a euphemism for "rich". Fox news will never be allowed in this scheme.

Link to the deck, please?

Risky due to steganography; the leaker might be compromised.

Run it through Claude and ask Claude to summarize :)

Blur, summarize, anonymize, convert to bitmap/PNG, release.

edit: i am aware it is literally impossible to release information without the remote chance of whistle-sniping.

The only logical conclusion to reach from such a defeatist, extreme threat model is to assume that all stories and information are false flags distributed to find moles.

The decks themselves are surely watermarked in ways cleverer than even the smartest here. That doesn't innately negate the benefit of having some iota of the raw data versus the risk of the mole being whacked.

I didn't mean to infantilize the most powerful companies' abilities; after all, they encoded the serial numbers of internal Xbox devkits into the 360 dashboard's passive background animation, which wasn't discovered until a decade later.

But the responses' tilt here are a lil....glowing.


Literally anything in the document could out them: color, font, word choice, numbers, etc. OpenAI fired two researchers this year for leaking information, likely traced via watermarked resources. OpenAI takes this seriously.

Sounds like the actions of a truly open ai company

> Blur, summarize, anonymize, convert to bitmap/PNG

The article is a summary. Everything else is defeated by moving subtle decorative elements around the page between copies.


Going further: Even summaries can be dangerous, as the inclusion/exclusion of facts can itself be part of a watermark.

ugh, sometimes i think that i should've finished making the ai bot for my partner to help her boring media job at dotdash meredith, then cashed out. but working on it made me miserable, and it's not a valuable tool to society.

here is an etl where i was attempting to train it on southern living and food & wine articles so it could output text for those dumb little content videos that you see at the top of every lifestyle brand article: https://github.com/smcalilly/zobot-trainer


I wonder how much of this will be leaking into the API, or maybe you'll have a price point which includes certain "data sources" and another where these are filtered out?

It seems like a lot of people retain an obligate negative reaction to any business decision OpenAI makes. If they avoid partnerships, they’re criticized for scraping the web and “stealing” people’s content. If they secure partnerships, they’re criticized for prioritizing the viewpoints of their partners over what is implied to be an unbiased “web corpus” that is invariably a composite of the “stolen” data they were held to the fire for scraping in the first place.

> Additionally, members of the program receive priority placement and “richer brand expression” in chat conversations, and their content benefits from more prominent link treatments.

Hi, the future called and it's been enshittified.

Hey, OpenAI! You could harness AI to give every child a superhuman intelligence as a tutor, you could harness AI to cut through endless reams of SEO'd bullshit that is the old enshittified internet, you could offer any one of a hundred other benefits to humanity...

...but NO, you will instead 100% stuff AI-generated content, responses to questions, and "helpful suggestions" full of sponsored garbage in the most insidious of ways, just like every other braindead ad-based business strategy over the past 25 years.

If this is your play, then in no uncertain terms I hope you all fail and go bankrupt for such a craven fumbling of an incredible breakthrough.


> You could harness AI to give every child a superhuman intelligence as a tutor

I don't disagree with the overall point you're making, but there is currently absolutely no reason to believe this is true.


> AI to give every child a superhuman intelligence as a tutor

How much would you pay for that?

> AI to cut through endless reams of SEO'd bullshit that is the old enshittified internet

How much would you pay for that?

Is it more or less than what a company would pay OpenAI to boost their brand?


> How much would you pay for that?

I would pay $TAXES for that. The United States collectively pays over $800 billion a year for public education (https://educationdata.org/public-education-spending-statisti...).


So with that much tax money collected, why are you expecting OpenAI to tutor students? Let the government build the product you would like to see.

Sounds like product placement.

“As a reward I’ll give you a cookie”

ChatGPT: “Thanks, I love Oreos, have you tried their new product Oreos Blah?”


Is this limited to the ChatGPT UI? Hopefully this preferential treatment doesn't make it into the API.

Guaranteed value is a licensing payment that compensates the publisher for allowing OpenAI to access its backlog of data, while variable value is contingent on display success, a metric based on the number of users engaging with linked or displayed content.

...

“The PPP program is more about scraping than training,” said one executive. “OpenAI has presumably already ingested and trained on these publishers’ archival data, but it needs access to contemporary content to answer contemporary queries.”

This also makes sense if they're trying to get into the search space.


> OpenAI has presumably already ingested and trained on these publishers’ archival data

So they're admitting to copyright violations and theft?


Whether training a model on text constitutes copyright infringement is an unresolved legal question. The closest precedent would be search engines using automated processes to build an index and links, which is generally not seen as infringing (in the US).


No, they have not done that. Presumably they believe that the model training was done in fair use and no court has said otherwise yet.

It will take years for that stuff to settle out in court, and by that time none of that will matter, and the winners of the AI race will be those who didn't wait for this question to be settled.


They believe a lot of things, I'm sure.

> and the winners of the AI race will be those who didn't wait for this question to be settled.

Hopefully they'll be in jail.


It's not just the big companies you have to think about, lol.

Sure, you can sue OpenAI.

But will you be able to sue every single AI startup that happens to be working on open source AI tech that was all trained this way? Absolutely not. It's simply not feasible. The cat is out of the bag.


The US government has worked hard to make the lives of copyright infringers miserable for years, even driving them to suicide.

> The US government has worked hard to make the lives of copyright infringers miserable for years

They really have not. The fact that I can download any movie in the world right now, and use all of the open source models on my home PC proves that.

I am sure there are some random one off cases of infringers being punished, but it mostly doesn't happen.

Especially if we are talking about the entire tech industry.

The government isn't going to shut down every single tech startup in the US. Because they are all using these open source AI models.

The government isn't going to be able to confiscate everyone's gamer PCs. The weights can already be run locally.



My point stands. That's like one guy. That's not "an entire industry gets shut down by the government".

That was my point. Sure, they might go after like one guy or one company. They aren't going to take out half of the tech startups in all of the US though. They also aren't going to confiscate everyone's gamer PCs.

I also think it's funny that you literally posted a Wikipedia page where the page itself contains the "illegal" numbers.

So that proves my entire point. Your best example is apparently one where I can access the "illegal" information on a literal public Wikipedia page!


> Thats like one guy

Also known as an example

> So that proves my entire point

Your point is that you can't use it commercially? Great! We're aligned, then.


There's no way this doesn't backfire. OpenAI has no moat, so making the bot/api a shill just means people are going to use something else.

GPT5 would have to be an order of magnitude better on the price/performance scale for me to even get close to this.


Everything else is going to be a shill, too. 'People' will eventually have to pay for these valuations.

That's exactly what Yann LeCun @ Meta is fighting against, and it seems Mark Zuckerberg has his back.

I already have llama on my local machine, nothing anyone can do will make it shill products to me.

Let me test this hypothesis. I run a small business that pays for Google workspace. I pay for a ChatGPT subscription and use it daily as a coding copilot. Is there any reason I shouldn’t switch to Gemini and cancel ChatGPT? If no reason, I’ll try it this afternoon.

The main reason is that GPT-4 is still significantly better than everything else.

Time will tell if OpenAI will be able to retain the lead in the race, though. While there's no public competing model with equal power yet, competitors are definitely much closer than they were before, and keep advancing. But, of course, GPT-5 might be another major leap.


Confirmed. I asked both Gemini and GPT4 to assist with a proto3 rpc service I'm working on. The initial prompt was well specified. Both provided nearly exactly the same proto file, which was correct and idiomatic.

However, I then asked both, "Update the resource.spec.model field to represent any valid JSON object."

Gemini told me to use a google.protobuf.Any type.

GPT4 told me to use a google.protobuf.Struct type.

Both are valid, but the Struct is more correct and avoids a ton of issues with middle boxes.

Anyway, sample size of 1, but it does seem like GPT4 is better, even for prompts as well-specified as I can muster.
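For anyone curious why Struct fits "any valid JSON object", here's a minimal sketch using the Python protobuf runtime (field and key names are made up for illustration): a Struct round-trips to plain JSON directly, whereas Any packs a concrete message and emits an "@type" field.

  from google.protobuf.struct_pb2 import Struct
  from google.protobuf.json_format import MessageToJson

  # Struct holds arbitrary JSON-shaped data: objects, arrays, strings,
  # numbers (stored as doubles), booleans, and nulls.
  spec_model = Struct()
  spec_model.update({"replicas": 3, "labels": {"env": "prod"}, "canary": None})

  # Prints the JSON object with no "@type" indirection; note that numbers
  # come back as doubles (3 becomes 3.0).
  print(MessageToJson(spec_model))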


You need to specify a perspective to write code from (e.g. software architect who values maintainability and extensibility over performance or code terseness), and prompt models to use the most idiomatic or correct technique. GPT4 is tuned to avoid some of this but it will improve answers there as well.

That's not true really. With well written prompts GPT4 is better at some things and worse at others than Claude/Llama3. GPT4 only appears to be the best by a wide margin if your benchmark suite is vague, poorly specified prompts and your metric for evaluation is "did it guess what I wanted accurately"

My benchmark is giving it novel (i.e. guaranteed to not be in the training set) logical puzzles that require actual reasoning ability, and seeing how it performs.

By that benchmark, GPT-4 significantly outperforms both LLaMA 3 and Claude in my personal experience.


That's occurring because you're giving it weak prompts, like I said. GPT4 has been trained to do things like chain of thought by default, whereas you have to tell Llama/Claude to do some of that stuff. If you update your prompts to suggest reasoning strategies and tell it to perform some chain of thought beforehand, the difference between models should disappear.
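For instance, an illustrative prompt prefix (my wording, not from any model's docs):

  Before answering, restate the puzzle's constraints in your own words.
  Then reason step by step, checking each deduction against every
  constraint. Only after that, state your final answer.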

You are assuming a great deal of things. No, you can absolutely come up with puzzles where no amount of forced CoT will make the others perform on GPT-4 level.

Hell, there are puzzles where you can literally point out where the answer is wrong and ask the model to correct itself, and it will just keep walking in circles making the same mistakes over and over again.


Llama3 and Claude also work well, they're good at different types of code and problem solving. The only thing ChatGPT does clearly better than the rest is infer meaning from poorly worded/written prompts.

No financial incentive or relationships to disclose, just a satisfied user: I found that SuperMaven was a better “coding copilot”. If you happen to use VSCode I’d check that one out this afternoon.

It will drive up the value of the "un-tainted" API.

I hope they keep a subscription tier that lets me pay for non-ad results. I'm assuming the non-ad Google results are still somehow influenced by how much money they make Google, so I wish Google offered something similar.

I remember being naive. Netflix, Amazon Prime, Hulu were paid subscriptions.

Later, ads were introduced despite my already paying for the service. Then they added a new tier for “no ads”, where you pay an extra fee for the privilege.


While it took Google 25 years to enshittify, this cycle will probably last 10 times less than that.

Ah so this is that oft-touted "progress" I hear AI sycophants talk about breathlessly, ways for companies to shove more ads down our throats!

Also, profiteering is just piker stuff. Imagine what major intelligence agencies will do once they set up front companies to prioritize placing their own 'content' as a training set.

Manufacturing consent has never been easier: get priority placement and richer brand expression of your ideology in chat conversations that influence how people think.

I am honestly surprised that anyone, particularly here in startup / VC land, thought this was going to go any differently. Is the Chief Inspector really shocked to learn that gambling happens at this establishment?

I'm pleasantly surprised they are even bothering with this!

On the flip side, giving publishers and the copyright mafia even more power could backfire.


Could? It will

Folks who are reacting with

> ugh I hate ads. Bye ChatGPT subscription!

I would recommend reading the article in full.

The gist is all of these efforts are in exchange for realtime browsing. In other words: if you ask it "who won this weekend's f1 race?" It can browse ESPN for the answer, then tell you "here's what ESPN says."

Exactly like you'd see on Google. Or, you know, ESPN.com.

Certainly a better experience than "I'm sorry, as of my knowledge cutoff date..."

Conflating that experience with heavy product placement and non-useful assistant answers just tells me that you didn't read the article.


The people who are reacting with

> ugh I hate ads. Bye ChatGPT subscription!

are merely two years ahead of you in the product lifecycle. All advertising is spam. It is a cancer that gobbles up all host environments until nothing but ad content is left.


The web you enjoy today would simply not exist without advertising.

I don't enjoy the web today; I curse the way it has been enshittified. I accept that, because I live in a society, much of what is necessary is currently most frictionlessly accomplished in this psychologically manipulative hellscape.

I enjoyed the web of 3 decades ago, prior to advertising.


The web of 3 decades ago, so 1994, when the entire web was 3,000 websites?

I feel like the world has collectively forgotten that the web has virtually always been ad-supported. The entire dot com boom and bust was all about ads, and -that- started in 1995.

If you want that 1994 web feeling again, BBSs are alive and well


Yes, the web of 3 decades ago. The pre-advertising web. When there were barriers to entry. When there weren't bots, driven by profit motives.

I don't know why you continue to debate my lived experiences and personal preferences.


AdsGPT 6.0 by GreedAI. Soon.

At least, Google followed some vague moral principles when they started.

Allowed them to kind of try to sort of do the right thing for their users for 10+ years before they finally gave in and switched to being run by the standard team of psychopaths for whom only the next quarter bottom line matters.

OTOH, OpenAI seems to be rotten to the core on day one, this is going to be loads of fun!


Ah yes the Alexa model. They should ask Amazon how that's working out.

So soon enough we'll get stuff like:

> Me: Give me a small sample code in Ruby that takes two parameters and returns the Levenshtein distance between them.

> ChatGPT: DID YOU HEAR ABOUT C#? It's even faster than light, now with better performance than Ruby!!! Get started here: https://ms.com/cs

I can generate the code in Ruby or I can give you 20% discount on Microsoft publishing on any C# book!!!


It's worse than that:

> Me: Give me a small sample code in Ruby that takes two parameters and returns the Levenshtein distance between them.

> ChatGPT: <<Submits working Ruby code that is slow>> But here is some C# code that is faster. For tasks like this a lot of programmers are using C#, you wouldn't want to get left behind.


For now it's more likely to do the opposite. Communities like HN do seem to like fringe and questionable languages like Ruby a lot, to their own detriment. And that is, naturally, part of the dataset.

Yeah, but then even open source models optimize for popular languages; I recall one explicitly mentioning being trained for ~top 16 or so languages in Stack Overflow developer survey. Good for my C++ dayjob, if I could use one of them; bad for my occasional after-work Lisping.

I want the robot to write C++ for me, but I won't let it take my lisp.

> you wouldn't want to get left behind.

I realise more and more lately that actually, yes, I do want to be left behind. Please, please, leave me behind.


That's the not so subtle hint. The underhanded way would be "since you asked for a suboptimal form, this is the best I can do", thereby prompting you to ask what the "best" way is.

If only msft was actually promoting it like that. Do they even sell books still?

Apparently it's now a division of Pearson: https://www.microsoftpressstore.com/store/browse/coming-soon

(unrelated but if you do want to buy a book on C#, get Pro .NET Memory Management, Konrad Kokosa is really good, also works as a systems programming primer on memory in general, do not get the books from microsoft press)

What about that 20% tho'...

Yeah, they do. Microsoft Press.

I can only hope we're so lucky that the enshittification happens that quickly and thoroughly.

It would be yet another clear demonstration that technology won't save us from our social system. It will just get us even more of it, good and hard. The utopian hype is a lie.


Technology by and large accelerates and concentrates.

I like the framing that technology is obligate. It doesn't matter whether you've built a machine that will transform the world into paperclips, sowing misery on its path and decimating the community of life. Even if you refuse to use it, someone will, because it gives short term benefits.

As you say, the root issue lies in the framework of co-habitation that we are currently practicing. I think one important step has to be decoupling the concept of wealth from growth.


> I like the framing that technology is obligate. It doesn't matter whether you've built a machine that will transform the world into paperclips, sowing misery on its path and decimating the community of life. Even if you refuse to use it, someone will, because it gives short term benefits.

Is that some idea you got from Daniel Schmachtenberger? Literally the oldest reference I can find on the web to "technology is obligate" is https://www.resilience.org/stories/2022-07-05/the-ride-of-ou..., which attributes it to him.

Anyway, I'm skeptical. For one, that seems to assume an anarchic social order, where anyone can make any choice they like (externalities be damned) and no one can stop them. That doesn't describe our world except maybe, sometimes, at the nation-state level between great powers.

Secondly, I think embracing that idea would mainly serve to create a permission structure for "techbros" (for lack of a better term), to pursue whatever harmful technology they have the impulse to and reject any personal responsibility for their actions or the harm they cause (e.g. exactly "It's ok for me to hurt you, because if I don't someone else will, so it's inevitable and quit complaining").


> Anyway, I'm skeptical. For one, that seems to assume an anarchic social order, where anyone can make any choice they like (externalities be damned) and no one can stop them. That doesn't describe our world except maybe, sometimes, at the nation-state level between great powers.

In my experience that's exactly the world we live in. The combination of capitalism and science are currently driving the sixth mass extinction. https://dothemath.ucsd.edu/2022/09/death-by-hockey-sticks/

> Secondly, I think embracing that idea would mainly serve to create a permission structure for "techbros" (for lack of a better term), to pursue whatever harmful technology they have the impulse to and reject any personal responsibility for their actions or the harm they cause (e.g. exactly "It's ok for me to hurt you, because if I don't someone else will, so it's inevitable and quit complaining").

I was making an observation of the effects technology has had the last 12000 years. So far it has been predominantly obligate. I want a future where that's not the case anymore. I don't have the full plan on how to get there. But I believe an important step is to get away of our current concept of wealth, as tied to growth and resource usage.


Indeed. How much more evidence do we need that in the end, technology always is at the service of the power structure; the structure stutters briefly at the onset of innovation until it manages to adapt and harness technology to reinforce the positions of the powerful. Progress happens in that brief period before the enshittification takes root. The FAANGS exist now solely to devour innovators and either stamp them out or assimilate them, digesting them into their gluttonous, gelatinous ooze.

OpenAI's only plan is to grow fast enough to be a new type of slime.


> [I]n the end, technology always is at the service of the power structure...Progress happens in that brief period before the enshittification takes root.

Personally, I'd deny there's ever any progress against the power structure due to technology itself. Anything that seems like "progress" is ephemeral or illusionary.

And that truth needs to be constantly compared to the incessant false promises of a utopia just around the corner that tech's hype-men make.


I became more and more repulsed as I read your comment. I felt myself twitch towards the end of it.

I can see running a local, less resource-intensive LLM trained to strip out marketing spiel from the text delivered by the more powerful cloud-service LLM being a possibility.
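A rough sketch of what that pipeline could look like, assuming the llama-cpp-python bindings and some local instruction-tuned GGUF model (the model path and prompt wording are placeholders, not a tested recipe):

  from llama_cpp import Llama

  # Hypothetical local model file; any small instruction-tuned GGUF would do.
  scrubber = Llama(model_path="./local-scrubber.gguf")

  def strip_marketing(cloud_answer: str) -> str:
      # Ask the local model to rewrite the cloud model's answer with
      # promotional phrasing removed; the prompt here is illustrative.
      prompt = (
          "Rewrite the text below with any promotional, sponsored, or "
          "brand-boosting phrasing removed, keeping the factual content:\n\n"
          f"{cloud_answer}\n\nRewritten:"
      )
      result = scrubber(prompt, max_tokens=512, temperature=0)
      return result["choices"][0]["text"].strip()

Of course, this trusts the local model to recognize sponsorship that the cloud model was paid to blend in, which is exactly the hard part.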

next we're going to have ads in our dreams!

Neuralink has entered the...dream?

"Are you repulsed by this ChatGPT answer? Try AntiRepulsorXL medication, shown to help with all your gag-inducing tech uses."

Or you will pay a monthly subscription and get no ads.

Or you pay a monthly subscription and get ads regardless.

I don't understand how people can either be this naive, or this malicious.

Like my paid ad-free subscriptions to:

- Cable TV
- Netflix
- Hulu
- Amazon Prime Video

...oh wait, they all introduced ads.


I have a mountain of links that I’ve posted before and at need can link to again.

These are bad people. I’ve known about Altman’s crimes for over a decade and I’ve failed to persuade Fidji (who I’ve known for 12 years) of them at any weight of evidence.


Could you link to these links?


That’s a pretty solid subset.

I post under my real name, and I link to credible sources.

Thank you for sparing me the trouble of rustling up, for the trillionth time, the damning documentary evidence.


This reeks of the "manic charlie smoking a cig in front of a pinboard" meme

I'd gladly appoint Pepe Silvia to the board of OpenAI.

[0] https://www.youtube.com/watch?v=_nTpsv9PNqo


Were you referring to the first or second time Altman was fired for self-dealing?

Calm down dude, we’ve already been fired.


It would be cool if you took issue with any of my primary and secondary sources rather than throw a Sunny in Philadelphia meme.

Which of those do you take issue with as germane and credible?


Can you share more?

Most of the YC/adjacent hearsay is inadmissible even here. What’s public is ugly enough.

A sibling has linked to the credible journalism I was alluding to.

