OpenAI is racing against two clocks: the commoditization clock (how quickly open-source alternatives catch up) and the monetization clock (their need to generate substantial revenue to justify their valuation).
The ultimate success of this strategy depends on what we might call the enterprise AI adoption curve - whether large organizations will prioritize the kind of integrated, reliable, and "safe" AI solutions OpenAI is positioning itself to provide over cheaper but potentially less polished alternatives.
This is strikingly similar to IBM's historical bet on enterprise computing - sacrificing the low-end market to focus on high-value enterprise customers who would pay premium prices for reliability and integration. The key question is whether AI will follow a similar maturation pattern or if the open-source nature of the technology will force a different evolutionary path.
The problem is that OpenAI doesn't really have the enterprise market at all. Their APIs come closer, in that many companies (primarily Microsoft) are using them to power features in other software, but with APIs OpenAI isn't the one providing end-user value to enterprises.
As for ChatGPT, it's a consumer tool, not an enterprise tool. It's not really integrated into an enterprise's existing toolset, it's not integrated into their authentication, it's not integrated into their internal permissions model, and the IT department can't enforce any policies on how it's used. In almost all ways it doesn't look like enterprise IT.
This reminds me why enterprises don't integrate OpenAI products into their existing toolsets: trust is the root reason.
It's hard to trust that OpenAI won't steal enterprise data to train the next model, in a market where content is the most valuable asset, compared to office suites, cloud databases, etc.
Microsoft 365 has over 300 million corporate users, who trust it with email, document management, collaboration, and more. It's the de facto standard in larger companies, especially in banking, medicine, and finance, which have more rigorous compliance regulations.
The administrative segments that decide to sell their firstborn to Microsoft all have their heads in the clouds. They'll pay Microsoft to steal their data and resell it, and they'll defend their decision-making beyond their own demise.
As such, Microsoft is making the right choice in outright stealing data for whatever purpose. It will have no real consequences.
A flick of an IT policy switch disables that, as at my organization. The setting was instead intended to snag individual, non-corporate user accounts (still horrible, but I mean to convey that MS at no point in all that expected a company's IT department to actually leave that training feature enabled in policy).
It doesn't need to / it already is – most enterprises are already Microsoft/Azure shops. Already approved, already there. What is close to impossible is to use anything non-Microsoft – with one exception: open source.
They betrayed their customers in the Storm-0558 attack.
They didn't disclose the full scale and charged customers for the advanced logging needed for detection.
Not to mention that they abolished QA and outsourced it to the customer.
Maybe they aren't, but when you already have all your documents in SharePoint, all your emails in Outlook, and all your database VMs in Azure, then Azure OpenAI is trusted in the organization.
For some reason (mainly because Microsoft has orders of magnitude more sales reps than anyone else), companies have been trusting Microsoft with their most critical data for a long time.
For example when they backed the CEO's coup against the board.
With AI CEOs - https://ai-ceo.org - this would never have happened, because their CEOs have a kill switch and a mobile app that gives the board full observability.
OpenAI's enterprise plan explicitly says that they do not train their models on your data. It's in the contract agreement, and it's also visible at the bottom of every ChatGPT prompt window.
It seems like a damned-if-you-do, damned-if-you-don't situation. How is ChatGPT going to provide relevant answers to company-specific prompts if they don't train on your data?
My personal take is that most companies don't have enough data, nor data of sufficiently high quality, to be able to use LLMs for company-specific tasks.
The model from OpenAI doesn’t need to be directly trained on the company’s data. Instead, they provide a fine-tuning API in a “trusted” environment. Which usually means Microsoft’s “Azure OpenAI” product.
But really, in practice, most applications are using the “RAG” (retrieval augmented generation) approach, and actually doing fine tuning is less common.
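Roughly, that fine-tuning flow looks like the sketch below. This is a minimal, hedged example using the OpenAI Python SDK; the file name and model string are placeholders, and the Azure OpenAI variant mainly differs in how the client is configured (endpoint, deployment, API version):

    # Minimal sketch of the fine-tuning flow with the OpenAI Python SDK (v1.x).
    # File name and base model are placeholders, not a recommendation.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Upload a JSONL file of {"messages": [...]} chat examples.
    training_file = client.files.create(
        file=open("company_examples.jsonl", "rb"),
        purpose="fine-tune",
    )

    # Kick off a fine-tuning job against a base model.
    job = client.fine_tuning.jobs.create(
        training_file=training_file.id,
        model="gpt-4o-mini-2024-07-18",
    )
    print(job.id, job.status)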
> The model from OpenAI doesn’t need to be directly trained on the company’s data
Wouldn't that depend on what you expect it to do? If you just want, say, Copilot, text summarization, or help writing emails, then you're probably good. If you want to use ChatGPT to help solve customer issues or debug problems specific to your company, wouldn't you need to feed it your own data? I'm thinking: to handle "help me find the correct subscription for a customer with these parameters", you'd need ChatGPT to know your pricing structure.
One idea I've had, from an experience with an ISP, would be to have the LLM tell customer service: Hey, this is an issue similar to what five of your colleagues just dealt with, in the same area, within 30 minutes. You should consider escalating this to a technician. That would require more or less live feedback to the model, or am I misunderstanding how the current AIs would handle that information?
100% this. If they can figure out trust through some paradigm where enterprises can use the models but not have to trust OpenAI itself directly then $200 will be less of an issue.
> It's hard to trust that OpenAI won't steal enterprise data to train the next model
Bit of a cynical take. A company like OpenAI stands to lose enormously if anyone catches them doing dodgy shit in violation of their agreements with users. And it's very hard to keep dodgy behaviour secret in any decent sized company where any embittered employee can blow the whistle. VW only just managed it with Dieselgate by keeping the circle of conspirators very small.
If their terms say they won't use your data now or in the future then you can reasonably assume that's the case for your business planning purposes.
Lawsuits over the legality of using someone's writing as training data aren't the same thing as them saying they won't use you as training data and then doing so. They're different things. One is people being upset that their work was used in a way they didn't anticipate, and wanting additional compensation for it because a computer reading their work is different from a person reading their work. The other is saying you won't do something and then doing that anyway and lying about it.
It's not that anyone suspects OpenAI of doing dodgy shit. Data flowing out of an enterprise is very high risk, no matter what security safeguards you employ. So they want everything inside their cloud perimeter and on servers they can control.
IMO no big enterprise will adopt chatGPT unless it's all hosted in their cloud. Open source models lend better to the enterprises in this regard.
> IMO no big enterprise will adopt chatGPT unless it's all hosted in their cloud
80% of big enterprises already use MS Sharepoint hosted in Azure for some of their document management. It’s certified for storing medical and financial records.
Cynical? That’d be on brand… especially with the ongoing lawsuits, the exodus of people and the CEO drama a while back? I’d have a hard time recommending them as a partner over Anthropic or Open Source.
It's not enough for some companies that need to ensure it won't happen.
I know for a fact a major corporation I do work for is vehemently against any use of generative A.I. by its employees (just had that drilled into my head multiple times by their mandatory annual cybersecurity training), although I believe they are working towards getting some fully internal solution working at some point.
Kind of funny that Google includes generative A.I. answers by default now, so I still see those answers just by doing a Google search.
I've seen the enterprise version with a top-5 consulting company, and it answers from their global knowledgebase, cites references, and doesn't train on their data.
I recently (in the last month) asked ChatGPT to cite its sources for some scientific data. It gave me completely made up, entirely fabricated citations for academic papers that did not exist.
The behavior you're describing sounds like an older model behavior. When I ask for links to references these days, it searches the internet the gives me links to real papers that are often actually relevant and helpful.
I don’t recall that it ever mentioned if it did or not. I don’t have the search on hand but from my browser history I did the prompt engineering on 11/18 (which perhaps there is a new model since then?).
I actually repeated the prompt just now and it actually gave me the correct, opposite response. For those curious, I asked ChatGPT what turned on a gene, and it said Protein X turns on Gene Y as per -fake citation-. Asking today if Protein X turns on Gene Y ChatGPT said there is no evidence, and showed 2 real citations of factors that may turn on Gene Y.
So sorry to offend your delicate sensibilities by calling out a blatant lie from someone completely unrelated to yourself. Pretty bizarre behavior in itself to do so.
As just another example, ChatGPT said that in the Okita paper they switched media on day 3, when if you read the paper they switched the media on day 8. So not only did it fail to generate the correct reference, it also failed to accurately interpret the contents of a specific paper.
I’m a pretty experienced developer and I struggle to get any useful information out of LLMs for any non-trivial task.
At my job (at an LLM-based search company) our CTO uses it on occasion (I can tell by the contortions in his AI code that aren't present in his handwritten code; I rarely need to fix the former).
And I think our interns used it for a demo one week, but I don’t think it’s very common at my company.
Won’t name my company, but we rely on Palantir Foundry for our data lake.
And the only thing everybody wants [including Palantir itself] is to deploy at scale AI capabilities tied properly to the rest of the toolset/datasets.
The issues at the moment are a mix of IP on the data, insurance on the security of private clouds infrastructures, deals between Amazon and Microsoft/OpenAI for the proper integration of ChatGPT on AWS, all these kind of things.
But discarding the enterprise needs is in my opinion a [very] wrong assumption.
Very personal feeling, but without a datalake organized the way Foundry is organized, I don't see how you can manage [cold] data at scale in a company [in terms of size, flexibility, semantics, or R&D]. Given the fact that IT services in big companies WILL fail to build and maintain such a horribly complex stack, the walled garden nature of the Foundry stack is not so stupid.
But all that is the technical part of things. Markets do not bless products. They bless revenues. And from that perspective, I have NO CLUE.
This is what's so brilliant about the Microsoft "partnership". OpenAI gets the Microsoft enterprise legitimacy, meanwhile Microsoft can build interfaces on top of ChatGPT that they can swap out later for whatever they want when it suits them
I think this is good for Microsoft, but less good for OpenAI.
Microsoft owns the customer relationship, owns the product experience, and in many ways owns the productionisation of a model into a useful feature. They also happen to own the datacenter side as well.
Because Microsoft is the whole wrapper around OpenAI, they can also negotiate. If they think they can get a better price from Anthropic, Google (in theory), or their own internally created models, then they can pressure OpenAI to reduce prices.
OpenAI doesn't get Microsoft's enterprise legitimacy, Microsoft keep that. OpenAI just gets preferential treatment as a supplier.
On the way up the hype curve it's the folks selling shovels that make all the money, but in a market of mature productionisation at scale, it's those closest to customers who make the money.
$10B of compute credits on a capped profit deal that they can break as soon as they get AGI (i.e. the $10T invention) seems pretty favorable to OpenAI.
I’d be significantly less surprised if OpenAI never made a single $ in profit than if they somehow invented “AGI” (of course nobody has a clue what that even means so maybe there is a chance just because of that..)
Leaving aside the “AGI on paper” point a sibling correctly made, your point shares the same basic structure as noting that any VC investment is a terrible deal if you only 2x your valuation. You might get $0 if there is a multiple on the liquidation preference!
OpenAI are clearly going for the BHAG. You may or may not believe in AGI-soon but they do, and are all in on this bet. So they simply don’t care about the failure case (ie no AGI in the timeframe that they can maintain runway).
OAI through their API probably does, but I do agree that ChatGPT is not really an enterprise product. For the company, the API is the platform play; their enterprise customers are going to be the likes of MSFT, Salesforce, Zendesk, or say Apple to power Siri, and these are the ones doing the heavy lifting of selling and making an LLM product that provides value to their enterprise customers. A bit like Stripe/AWS. Whether OAI can form a durable platform (vs their competitors or in-house LLMs) is the question here, or whether they can offer models at a cost that justifies the upsell of AI features their customers offer.
That's why Microsoft included OpenAI access in Azure. However, their current offering is quite immature, so companies are using several pieces of infra to make it usable (for rate limiting, better authentication, etc.).
> As for ChatGPT, it's a consumer tool, not an enterprise tool. It's not really integrated into an enterprise's existing toolset, it's not integrated into their authentication, it's not integrated into their internal permissions model, and the IT department can't enforce any policies on how it's used. In almost all ways it doesn't look like enterprise IT.
What, according to you, is the bare minimum it would take for it to be an enterprise tool?
SSO and enforceable privacy and IP protections would be a start. RBAC, queues, caching results, and workflow management would open a lot of doors very quickly.
I have used it at two different enterprises internally; the issue is price more than anything. Enterprises definitely do want to self-host, but for frontier tech they want frontier models for solving complicated unsolved problems or building efficiencies into complicated workflows. One company had to rip it out for a time due to price; I no longer work there, though, so I can't speak to whether it was reintegrated.
Decision making in enterprise procurement is more about whether it makes the corporation money and whether there is immediate and effective support when it stops making money.
I don't think user submitted question/answer is as useful for training as you (and many others) think. It's not useless, but it's certainly not some goldmine either considering how noisy it is (from the users) and how synthetic it is (the responses). Further, while I wouldn't put it past them to use user data in that way, there's certainly a PR/controversy cost to doing so, even if it's outlined in their ToS.
In an enterprise, long content or whole documents will get poured into ChatGPT if the company doesn't impose policy limits, and that can be meaningful training data.
At the very least, there's the possibility that this content can be seen by OpenAI staff when reviewing bad cases, so privacy concerns remain.
No, because a lot of people asking you questions doesn't mean you have the answers to them. It's an opportunity to find the answers by hiring "AI trainers" and putting their responses in the training data.
Yeah it's a fairly standard clause in the business paid versions of SaaS products that your data isn't used to train the model. The whole thing you're selling is per-company isolation so you don't want to go back on that.
Whether your data is used for training or not is an approximation of whether you're using a tool for commercial applications, so a pretty good way to price discriminate.
I wonder if OpenAI can break into enterprise. I don’t see much of a path for them, at least here in the EU. Even if they do manage to build some sort of trust as far as data safety goes - and I'm not sure they'll have much more luck with that than Facebook had trying to sell that corporate thing they did (still do?) - they will still be facing the very real issue of having to compete with Microsoft.
I view that competition a bit like the Teams vs anything else. Teams wasn’t better, but it was good enough and it’s “sort of free”. It’s the same with the Azure AI tools, they aren’t free, but since you don’t exactly pay list pricing in enterprise they can be fairly cheap. Copilot is obviously horrible compared to ChatGPT, but a lot of the Azure AI tooling works perfectly well and much of it integrates seamlessly with what you already have running in Azure. We recently “lost” our OCR for a document flow, and since it wasn’t recoverable we needed to do something fast. Well, the Azure Document Intelligence was so easy to hook up to the flow it was ridiculous. I don’t want to sound like a Microsoft commercial. I think they are a good IT business partner, but the products are also sort of a trap where all those tiny things create the perfect vendor lock-in. Which is bad, but it’s also where European Enterprise is at, since the “monopoly” Microsoft has on the suite of products makes it very hard to not use them. Teams again being the perfect example, since it “won” by basically being a 0 in the budget even though it isn’t actually free.
Man, if they can solve that "trust" problem, OpenAI could really have an big advantage. Imagine if they were nonprofit, open source, documented all of the data that their training was being done with, or published all of their boardroom documents. That'd be a real distinguishing advantage. Somebody should start an organization like that.
The cyber security gatekeepers care very little about that kind of stuff. They care only about what does not get them in trouble, and AI in many enterprises is still viewed as a cyber threat.
One of the things that I find remarkable in my work is that they block ChatGPT because they're afraid of data leaking. But Google Translate has been promoted for years, and we don't really do business with Google. We're a Microsoft shop. Kind of a double standard.
I mean, it was probably a jab at OpenAI's transition to for-profit, but you're absolutely right.
Enterprise decision makers care about compliance, certifications and “general market image” (which probably has a proper English word). OpenAI has none of that, and they will compete with companies that do.
Sometimes I wish Apple did more for business use cases. The same https://security.apple.com/blog/private-cloud-compute/ tech that will provide auditable isolation for consumer user sessions would be incredibly welcome in a world where every other company has proven a desire to monetize your data.
Teams winning on price instead of quality is very telling of the state of business. Your #1/#2 communication tool being regarded as a cost to be saved upon.
It’s “good enough” and integrates into existing Microsoft solutions (just Outlook meeting request integration, for example), and the competition isn’t dramatically better, more like a side-grade in terms of better usability but less integration.
You still can't copy a picture out of a teams chat and paste it into an office document without jumping through hoops. It's utterly horrible. The only thing that prevents people from complaining about it is that it's completely in line with the rest of the office drone experience.
In my experience Teams is mostly used for video conferencing (i.e. as a Zoom alternative), and for chats a different tool is used. Most places already had chat systems set up (Slack, Mattermost, whatever) (or standardize on email anyway), before video conferencing became ubiquitous due to the pandemic.
And yet Teams allows me to seamlessly video call a coworker. Whereas in Slack you have this ridiculous "huddle" thing where all video call participants show up in a tiny tiny rectangle and you can't see them properly. Even a screen share only shows up in a tiny rectangle. There's no way to increase its size. What's even the point of having this feature when you can't see anything properly because everything is so small?
Seriously, I'm not a fan of Teams, but the sad state of video calls in Slack, even in 2024, seriously ruins it for me. This is the one thing — one important thing — that Teams is better at than Slack.
Consider yourself lucky; my team uses Skype for Business. It's Skype except it can't do video calls, or calls at all. Just a terrible messaging client with zero features!
I’m not sure you can, considering how broad a term “better” is. I do know a lot of employees in a lot of non-tech organisations here in Denmark wish they could still use Zoom.
Even in my own organisation Teams isn’t exactly a beloved platform. The whole “Teams” part of it can actually solve a lot of the issues our employees have with sharing documents, having chats located in relation to a project and so on, but they just don’t use it because they hate it.
Email, Jitsi, Matrix/Element, many of them, e2e encrypted and on-premise. No serious company (outside of the US) which really cares about its own data privacy would go for MS Teams, which can't even offer a decent user experience most of the time.
> I wonder if OpenAI can break into enterprise. I don’t see much of a path for them, at least here in the EU.
Uhh, they're already here, under the name Copilot, which is really just ChatGPT under the hood.
Microsoft launders the missing trust in OpenAI :)
But why do you think copilot is worse? It's really just the same engine (gpt-4o right now) with some RAG grounding based on your SharePoint documents. Speaking about copilot for M365 here.
I don't think it's a great service yet, it's still very early and flawed. But so is ChatGPT.
Agreed on the strategy questions. It's interesting to tie back to IBM; my first reaction was that openai has more consumer connectivity than IBM did in the desktop era, but I'm not sure that's true. I guess what is true is that IBM passed over the "IBM Compatible" -> "MS DOS Compatible" business quite quickly in the mid 80s; seemingly overnight we had the death of all minicomputer companies and the rise of PC desktop companies.
I agree that if you're sure you have a commodity product, then you should make sure you're in the driver seat with those that will pay more, and also try and grind less effective players out. (As a strategy assessment, not a moral one).
You could think of Apple under JLG and then being handed back to Jobs as precisely being two perspectives on the answer to "does Apple have a commodity product?" Gassée thought it did, and we had the era of Apple OEMs, system integrators, other boxes running Apple software, and Jobs thought it did not; essentially his first act was to kill those deals.
The new pricing tier suggests they're taking the Jobs approach - betting that their technology integration and reliability will justify premium positioning. But they face more intense commoditization pressure than either IBM or Apple did, given the rapid advancement of open-source models.
The critical question is timing - if they wait too long to establish their enterprise position, they risk being overtaken by commoditization as IBM was. Move too aggressively, and they might prematurely abandon advantages in the broader market, as Apple nearly did under Gassée.
Threading the needle. I don't envy their position here. Especially with Musk in the Trump administration.
The Apple partnership and iOS integration seems pretty damn big for them - that really corners a huge portion of the consumer market.
Agreed on enterprise - Microsoft would have to roll out policies and integration with their core products at a pace faster than they usually do (Azure AD, for example, still pales in comparison to legacy AD feature-wise - I am continually amazed they do not prioritize this more).
Except I had to sign in to OpenAI when setting up Apple Intelligence. Even though Apple Intelligence is doing almost nothing useful for me right now, at least OpenAI's AOI numbers go up.
Right now Gemini Pro is best for email, docs, calendar integration.
That said, ChatGPT Plus is a good product and I might spring for Pro for a month or two.
ChatGPT through Siri/Apple Intelligence is a joke compared to using ChatGPT's iPhone app. Siri is still a dumb one trick pony after 13 years of being on the market.
Supposedly Apple won't be able to offer a Siri LLM that acts like ChatGPT's iPhone app until 2026. That gives Apple's current and new competitors a head start. Maybe ChatGPT and Microsoft could release an AI phone. I'd drop Apple quickly if that becomes a reality.
Well one key difference is that Google and Amazon are cloud operators, they will still benefit from selling the compute that open source models run on.
For sure. If I were in charge of AI for the US, I'd prioritize having a known good and best-in-class LLM available not least for national security reasons; OAI put someone on gov rel about a year ago, beltway insider type, and they have been selling aggressively. Feels like most of the federal procurement is going to want to go to using primes for this stuff, or if OpenAI and Anthropic can sell successfully, fine.
Grok winning the Federal bid is an interesting possible outcome though. I think that, slightly de-Elon-ed, the messaging that it's been trained to be more politically neutral (I realize that this is a large step from how it's messaged) might be a real factor in the next few years in the US. Should be interesting!
Fudged71 - you want to predict OpenAI's value and importance in 2029? We'll still both be on HN I'm sure. I'm going to predict it's a dominant player, and I'll go contra-Gwern, and say that it will still be known as best-in-class product-delivered AI, whether or not an Anthropic or other company has best-in-class LLM tech. Basically, I think they'll make it and sustain.
Somehow I missed the Anduril partnership announcement. I agree with you. National Security relationships in particular creates a moat that’s hard to replicate even with superior technology.
It seems possible OpenAI could maintain dominance in government/institutional markets while facing more competition in commercial segments, similar to how defense contractors operate.
Now we just need to find someone who disagrees with us and we can make a long bet.
It feels strange to say but I think that the product moat looks harder than the LLM moat for the top 5 teams right now. I'm surprised I think that, but I've assessed so many L and MLM models in the last 18 months, and they keep getting better, albeit more slowly, and they keep getting smaller while they lose less quality, and tooling keeps getting better on them.
At the same time, all the product infra around using, integrating, safety, API support, enterprise contracts, data security, threat analysis, all that is expensive and hard for startups in a way that spending $50mm with a cloud AI infra company is not hard.
Altman's new head of product is reputed to be excellent as well, so it will be super interesting to see where this all goes.
One of the main issues that enterprise AI has is the data in large corporations. It's typically a nightmare of fiefdoms and filesystems. I'm sure that a lot of companies would love to use AI more, both internally and commercially. But first they'd have to wrangle their own systems so that OpenAI can ingest the data at all.
Unfortunately, those are 5+ year projects for a lot of F500 companies. And they'll have to burn a lot of political capital to get the internal systems under control. Meaning that the CXO that does get the SQL server up and running and has the clout to do something about non-compliance, that person is going to be hated internally. And then if it's ever finished? That whole team is gonna be let go too. And it'll all just then rot, if not implode.
The AI boom for corporations is really going to let people know who is swimming naked when it comes to internal data orderliness.
Like, you want to be the person that sell shovels in the AI boom here for enterprise? Be the 'Cleaning Lady' for company data and non-compliance. Go in, kick butts, clean it all up, be hated, leave with a fat check.
Did not know that stack, thanks.
From my perspective as a data architect, I am really focused on the link between the data sources and the data lake, and the proper integration of heterogenous data into a “single” knowledge graph.
For Palantir, it is not very difficult to learn their way of working [their Pipeline Builder feeds a massive spark cluster, and OntologyManager maintains a sync between Spark and a graph database. Their other productivity tools then rely on either one data lake and/or the other].
I wonder how Glean handles the datalake part of their stack. [scalability, refresh rate, etc]
ChatGPT's analogy is more like Google. People use Google enough that they ain't gonna switch unless something is a quantum leap better, plus has the scale. On the API side things could get commoditized, but it's more than just having a slightly better LLM in the benchmarks.
There exists no future where OpenAI both sells models through API and has its own consumer product. They will have to pick one of these things to bet the company on.
That's not necessarily true. There are many companies that have both end-user products and B2B products they sell. There are a million specific use cases that OpenAI won't build specific products for.
Think Amazon that has both AWS and the retail business. There's a lot of value in providing both.
AI can be used for financial gain, to influence and lie to people, to simulate human connection, to generate infinite content for consumption,... at scale.
In the early days of ChatGPT, I'd get constantly capped, every single day, even on the paid plan. At the time I was sending them messages, begging to charge me $200 to let me use it unlimited.
The enterprise surface area that OpenAI seems to be targeting is very small. The cost curve looks similar to classic cloud providers, but gets very steep much faster. We started on their API and then moved out of the OpenAI ecosystem within ~2 years as costs grew fast and we see equivalent or better performance with much cheaper and/or OS models, combined with pretty modest hardware. Unless they can pull a bunch of Netflix-style deals, the economics here will not work out.
The "open source nature" this time is different. "Open source" models are not actually open source, in the sense that the community can't contribute to their development. At best they're just proprietary freeware. Thus, the continuity of "open source" models depends purely on how long their sponsors sustain funding. If Meta or Alibaba or Tencent decide tomorrow that they're no longer going to fund this stuff, then we're in real trouble, much more than when Red Hat drops the ball.
I'd say Meta is the most important player here. Pretty much all the "open source" models are built on Llama in one way or another. The only reason Llama exists is because Meta wants to commoditize AI in order to prevent the likes of OpenAI from overtaking them later. If Meta one day no longer believes in this strategy for whatever reason, then everybody is in serious trouble.
> OpenAI is racing against two clocks: the commoditization clock (how quickly open-source alternatives catch up) and the monetization clock (their need to generate substantial revenue to justify their valuation).
Also important to recognize that those clocks aren’t entirely separated. Monetization timeline is shorter if investors perceive that commodification makes future monetization less certain, whereas if investors perceive a strong moat against commodification, new financing without profitable monetization is practical as long as the market perceives a strong enough moat that investment in growth now means a sufficient increase in monetization down the road.
Has anyone heard of or seen it used anywhere? I was in-house when it launched to big fanfare by upper management, and the vast majority of the company was tasked to create team projects utilizing Watson.
Watson was a pre-LLM technology, an evolution of IBM's experience with the expert systems which they believed would rule the roost in AI -- until transformers blew all that away.
Am I the only one who's getting annoyed of seeing LLMs be marketed as competent search engines? That's not what they've been designed for, and they have been repeatedly bad at that.
> the commoditization clock (how quickly open-source alternatives catch up)
I believe we are already there at least for the average person.
Using Ollama I can run different LLMs locally that are good enough for what I want to do. That's on a 32GB M1 laptop. No more having to pay someone to get results.
For development, PyCharm Pro's latest LLM autocomplete is just short of writing everything for me.
"whether large organizations will prioritize the kind of integrated, reliable, and "safe" AI solutions"
While safe in terms of output quality control, SaaS is not safe in terms of data control. Meta's Llama is the winner in any scenario where it would be ridiculous to send user data to a third party.
Yes, but how can this strategy work, and who would choose ChatGPT at this point, when there are so many alternatives, some better (Anthropic), some just as good but way cheaper (Amazon Nova) and some excellent and open-source?
Microsoft is their path into the enterprise. You can use their so-so enterprise support directly or have all the enterprise features you could want via Azure.
There really aren't a lot of open-source large language models with that capability. The only game changer so far has been Meta open-sourcing Llama, and that's about it for models of that caliber.
I actually pay 166 Euros a month for Claude Teams. Five seats. And I only use one. For myself. Why do I pay so much? Because the normal paid version (20 USD a month) interrupts the chats after a dozen questions and wants me to wait a few hours until I can use it again. But the Teams plan gives me way more questions.
But why do I pay that much? Because Claude in combination with the Projects feature, where I can upload two dozen or more files, PDFs, text, and give it a context, and then ask questions in this specific context over a period of week or longer, come back to it and continue the inquiry, all of this gives me superpowers. Feels like a handful of researchers at my fingertips that I can brainstorm with, that I can ask to review the documents, come up with answers to my questions, all of this is unbelievably powerful.
I‘d be ok with 40 or 50 USD a month for one user, alas Claude won’t offer it. So I pay 166 Euros for five seats and use one. Because it saves me a ton of work.
Kagi Ultimate (US$25/mo) includes unlimited use of all the Anthropic models.
Full disclosure: I participated in Kagi's crowdfund, so I have some financial stake in the company, but I mainly participated because I'm an enthusiastic customer.
I'm uninformed about this, it may just be superstition, but my feeling while using Kagi in this way is that after using it for a few hours it gets a bit more forgetful. I come back the next day and it's smart again, for while. It's as if there's some kind of soft throttling going on in the background.
I'm an enthusiastic customer nonetheless, but it is curious.
I noticed this too! It's dramatic in the same chat. I'll come back the next day, and even though I still have the full convo history, it's as if it completely forgot all my earlier instructions.
Makes sense. Keeping the conversation implies that each new message carries the whole history, again. You need to create new chats from time to time, or throttle to a different model...
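A minimal sketch of why that is (using the OpenAI chat API as a stand-in for any chat backend; the model name is just an example): every turn re-sends the entire message list, so later turns pay for all earlier ones again.

    # Illustrative sketch: a chat loop re-sends the full message history every
    # turn, so later turns cost more tokens, money, and latency than early ones.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set
    messages = [{"role": "system", "content": "You are a helpful assistant."}]

    def ask(user_text: str) -> str:
        messages.append({"role": "user", "content": user_text})
        resp = client.chat.completions.create(model="gpt-4o", messages=messages)
        answer = resp.choices[0].message.content
        messages.append({"role": "assistant", "content": answer})
        return answer  # the next call will re-send everything above, plus this turn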
This is my biggest gripe with these LLMs. I primarily use Claude, and it exhibits the same described behavior. I'll find myself in a flow state and then somewhere around hour 3 it starts to pretend like it isn't capable of completing specific tasks that it had been performing for hours, days, weeks. For instance, I'm working on creating a few LLCs with their requisite social media handles and domain registrations. I _used_ to be able to ask Claude to check all US State LLC registrations, all major TLD domain registrations, and USPTO against particular terms and similar derivations. Then one day it just decided to stop doing this. And it tells me it can't search the web or whatever. Which is bullshit because I was verifying all of this data and ensuring it wasn't hallucinating - which it never was.
The flow lately has been transforming test cases to accommodate interface changes, so I'm not asking it to remember something from several hours ago, I'm just asking it to make the "same" transformation from the previous prompt, except now to a different input.
It struggles with cases that exceed 1000 lines or so. Not that it loses track entirely at that size, it just starts making dumb mistakes.
Then after about 2 or 3 hours, the size at which it starts to struggle drops to maybe 500. A new chat doesn't seem to help, but who can say, it's a difficult thing to quantify. After 12 hours, both me and the AI are feeling fresh again. Or maybe it's just me, idk.
And if you're about to suggest that the real problem here is that there's so much tedious filler in these test cases that even an AI gets bored with them... Yes, yes it is.
It probably isn’t cheaper for Kagi per token but I assume most people don’t use up as much as they can, like with most other subscriptions.
I.e. I’ve been an Ultimate subscriber since they launched the plan and I rarely use the assistant feature because I’ve got a subscription to ChatGPT and Claude. I only use it when I want to query Llama, Gemini, or Mistral models which I don’t want to subscribe to or create API keys for.
How would you rate Kagi Ultimate vs Arc Search? I.e., is it scraping relevant websites live and summarising them? Or is it just access to ChatGPT and other models (with their old data)?
At some point I'm going to subscribe to Kagi again (once I have a job) so be interested to see how it rates.
They extract concepts from their training data and can combine concepts to produce output that isn't part of their training set, but they do require those concepts to be in their training data. So you can ask them to make a picture of your favorite character fighting mecha on an alien planet and it will produce a new image, as long as your favorite character is in their training set. But the extent it imagines an alien planet or what counts as mecha is limited by the input it is trained on, which is where a human artist can provide much more creativity.
You can also expand it by adding in more concepts to better specify things. For example you can specify the mecha look like alphabet characters while the alien planet expresses the randomness of prime numbers and that might influence the AI to produce a more unique image as you are now getting into really weird combinations of concepts (and combinations that might actually make no sense if you think too much about them), but you also greatly increase the chance of getting trash output as the AI can no longer map the feature space back to an image that mirrors anything like what a human would interpret as having a similar feature space.
The paper that coined the term "stochastic parrots" would not agree with the claim that LLMs are "unable to produce a response that isn't in their training data". And the research has advanced a _long_ way since then.
[1]: Bender, Emily M., et al. "On the dangers of stochastic parrots: Can language models be too big?." Proceedings of the 2021 ACM conference on fairness, accountability, and transparency. 2021.
/facepalm. Woosh indeed. Can I blame pronoun confusion? (Not to mention this misunderstanding kicked off a farcically unproductive ensuing discussion.)
When combined with intellectual honesty and curiosity, the best LLMs can be powerful tools for checking argumentation. (I personally recommend Claude 3.5 Sonnet.) I pasted in the conversation history and here is what it said:
> Their position is falsifiable through simple examples: LLMs can perform arithmetic on numbers that weren't in training data, compose responses about current events post-training, and generate novel combinations of ideas.
Spot on. It would take a lot of editing for me to speak as concisely and accurately!
> you can try to convince all you want, but you're just grasping at straws.
After coming back to this to see how the conversation has evolved (it hasn't), I offer this guess: the problem isn't at the object level (i.e. what ML research has to say on this) nor my willingness to engage. A key factor seems to be a lack of interest on the other end of the conversation.
Most importantly, I'm happy to learn and/or be shown to be mistaken.
Based on my study (not at the Ph.D. level but still quite intensive), I am confident the comment above is both wrong and poorly framed. Why? Seeing phrases like "incapable of thought" and "stochastic parrots" is a red flag to me. In my experience, people who study LLM systems are wary of using such brash phrases. They tend to move the conversation away from understanding towards combativeness and/or confusion.
Being this direct might sound brusque and/or unpersuasive. My top concern at this point, not knowing you, is that you might not prioritize learning and careful discussion. If you want to continue discussing, here is what I suggest:
First, are you familiar with the double-crux technique? If not, the CFAR page is a good start.
Second, please share three papers (or high-quality writing from experts): one that supports your claim, one that opposes it, and one that attempts to synthesize.
I'll try again... Can you (or anyone) define "thought" in way that is helpful?
Some other intelligent social animals have slightly different brains, and it seems very likely they "think" as well. Do we want to define "thinking" in some relative manner?
Say you pick a definition requiring an isomorphism to thoughts as generated by a human brain. Then, by definition, you can't have thoughts unless you prove the isomorphism. How are you going to do that? Inspection? In theory, some suitable emulation of a brain is needed. You might get close with whole-brain emulation. But how do you know when your emulation is good enough? What level of detail is sufficient?
What kinds of definitions of "thought" remain?
Perhaps something related to consciousness? Where is this kind of definition going to get us? Talking about consciousness is hard.
Anil Seth (and others) talks about consciousness better than most, for what it is worth -- he does it by getting more detailed and specific. See also: integrated information theory.
By writing at some length, I hope to show that using loose sketches of concepts using words such as "thoughts" or "thinking" doesn't advance a substantive conversation. More depth is needed.
Meta: To advance the conversation, it takes time to elaborate and engage. It isn't easy. An easier way out is pressing the down triangle, but that is too often meager and fleeting protection for a brittle ego and/or a fixated level of understanding.
Sometimes, I get this absolute stroke of brilliance for this idea of a thing I want to make and it's gonna make me super rich, and then I go on Google, and find out that there's already been a Kickstarter for it and it's been successful, and it's now a product I can just buy.
No, but then again you're not paying me $20 per month while I pretend I have absolute knowledge.
You can, however, get the same human experience by contracting a consulting company that will bill you $20 000 per month and lie to you about having absolute knowledge.
I have ChatGPT ($20/month tier) and Claude and I absolutely see this use case. Claude is great but I love long threads where I can have it help me with a series of related problems over the course of a day. I'm rarely doing a one-shot. Hitting the limits is super frustrating.
So I understand the unlimited use case and honestly am considering shelling out for the o1 unlimited tier, if o1 is useful enough.
A theoretical app subscription for $200/month feels expensive. Having the equivalent of a smart employee working beside me all day for $200/month feels like a deal.
Yep, I have 2 accounts I use because I kept hitting limits. I was going to do the Teams to get the 5x window, but I got instantly banned when clicking the teams button on a new account, so I ended up sticking with 2 separate accounts. It's a bit of a pain, but I'm used to it. My other account has since been unbanned, but I haven't needed it lately as I finished most of my coding.
NotebookLM is designed for a distinct use case compared to using Gemini's models in a general chat-style interface. It's specifically geared towards research and operates primarily as a RAG system for documents you upload.
I’ve used it extensively to cross-reference and analyse academic papers, and the performance has been excellent so far. While this is just my personal experience (YMMV), it’s far more reliable and focused than Gemini when it comes to this specific use case. I've rarely experienced a hallucination with it. But perhaps that's the way I'm using it.
Have you tried LibreChat https://www.librechat.ai/ and just use it with your own API keys? You pay for what you use and can use and switch between all major model providers
The argument of more compute power for this plan can be true, but this is also a pricing tactic known as the decoy effect or anchoring. Here's how it works:
1. A company introduces a high-priced option (the "decoy"), often not intended to be the best value for most customers.
2. This premium option makes the other plans seem like better deals in comparison, nudging customers toward the one the company actually wants to sell.
In ChatGPT's case this is:
Option A: Basic Plan - Free
Option B: Plus Plan - $20/month
Option C: Pro Plan - $200/month
Even if the company has no intention of selling the Pro Plan, its presence makes the Plus Plan seem more reasonably priced and valuable.
While not inherently unethical, the decoy effect can be seen as manipulative if it exploits customers’ biases or lacks transparency about the true value of each plan.
Of course this breaks down once you have a competitor like Anthropic, serving similarly-priced Plan A and B for their equivalently powerful models; adding a more expensive decoy plan C doesn't help OpenAI when their plan B pricing is primarily compared against Anthropic's plan B.
Leadership at this crop of tech companies is more like followership. Whether it's 'no politics', or sudden layoffs, or 'founder mode', or 'work from home'... one CEO has an idea and three dozen other CEOs unthinkingly adopt it.
Several comments in this thread have used Anthropic's lower pricing as a criticism, but it's probably moot: a month from now Anthropic will release its own $200 model.
As Nvidia's CEO likes to say, the price is set by the second best.
From an API standpoint, it seems like enterprises are currently split between anthropic and ChatGPT and most are willing to use substitutes. For the consumer, ChatGPT is the clear favorite (better branding, better iPhone app)
An example of this is something I learned from a former employee who went to work for Encyclopedia Britannica 'back in the day'. I actually invited the former employee to come back to our office so I could understand and learn exactly what he had been taught (noting of course this was back before the internet, when info like that was not as available...)
So they charged (as I recall from what he told me; I could be off) something like $450 for shipping the books (I don't recall the actual amount, but it seemed high at the time).
So the salesman is taught to start off the sales pitch with a set of encyclopedias costing, at the time, let's say $40,000 - some 'gold plated version'.
The potential buyer laughs, and then the salesman says 'plus $450 for shipping!!!'.
They then move on to the more reasonable versions costing let's say $1000 or whatever.
As a result of that first high-priced example (in addition to the positioning you are talking about), the customer is set up to accept the shipping charge (which was relatively high).
That’s a really basic sales technique much older than the 1975 study. I wonder if it went under a different name or this was a case of studying and then publishing something that was already well-known outside of academia.
I use GPT-4 because 4o is inferior. I keep trying 4o but it consistently underperforms. GPT-4 is not working as hard anymore compared to a few months ago. If this release said it allows GPT-4 more processing time to find more answers and filter them, I’d then see transparency of service and happily pay the money. As it is I’ll still give it a try and figure it out, but I’d like to live in a world where companies can be honest about their missteps. As it is I have to live in this constructed reality that makes sense to me given the evidence despite what people claim. Am I fooling/gaslighting myself?? Who knows?
Glad I'm not the only one. I see 4o as a lot more of a sidegrade. At this point I mix them up and I legitimately can't tell, sometimes I get bad responses from 4, sometimes 4o.
Responses from gpt-4 sound more like AI, but I haven't had seemingly as many issues as with 4o.
Also the feature of 4o where it just spits out a ton of information, or rewrites the entire code is frustrating
Yes the looping. They should make and sell a squishy mascot you could order, something in the style of Clippy, so that when it loops, I could pluck it off my monitor and punch it in the face.
I'm a Plus member, and the biggest limitation I am running into by far is the maximum length of the context window. I'm having context fall out of scope throughout the conversation, or not being able to give it a large document that I can then interrogate.
So if I go from paying $20/month for 32,000 tokens, to $200/month for Pro, I expect something more akin to Enterprise's 128,000 tokens or MORE. But they don't even discuss the context window AT ALL.
For anyone else out there looking to build a competitor I STRONGLY recommend you consider the context window as a major differentiator. Let me give you an example of a usage which ChatGPT just simply cannot do very well today: Dump a XML file into it, then ask it questions about that file. You can attach files to ChatGPT, but it is basically pointless because it isn't able to view the entire file at once due to, again, limited context windows.
Seems like something that would be worth pinging OpenAI about because it's a pretty important claim that they are making on their pricing page! Unless it's a matter of counting tokens differently.
According to the pricing page, 32K context is for Plus users and 128K context is for Pro users. Not disagreeing with you, just adding context for readers that while you are explaining that the 4o API has 128K window, the 4o ChatGPT agent appears to have varying context depending on account type.
The longer the context the more backtracking it needs to do. It gets exponentially more expensive. You can increase it a little, but not enough to solve the problem.
Instead you need to chunk your data and store it in a vector database so you can do semantic search and include only the bits that are most relevant in the context.
LLM is a cool tool. You need to build around it. OpenAI should start shipping these other components so people can build their solutions and make their money selling shovels.
Instead they want end user to pay them to use the LLM without any custom tooling around. I don't think that's a winning strategy.
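For illustration, the chunk-and-retrieve pattern described above can be sketched in a few lines. This is a toy under stated assumptions (brute-force cosine similarity instead of a real vector database, placeholder model names and corpus), not a production pipeline:

    # Illustrative toy of the chunk -> embed -> retrieve pattern, with brute-force
    # cosine similarity standing in for a vector database. Model names are examples.
    import numpy as np
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set
    corpus = ["...internal wiki page...", "...pricing policy doc..."]  # placeholders

    def embed(texts: list[str]) -> np.ndarray:
        resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
        return np.array([d.embedding for d in resp.data])

    # 1. Chunk the corpus (naive fixed-size splits) and embed each chunk once.
    chunks = [doc[i:i + 1000] for doc in corpus for i in range(0, len(doc), 1000)]
    chunk_vecs = embed(chunks)

    # 2. At query time, embed the question and pull the k most similar chunks.
    def retrieve(question: str, k: int = 5) -> list[str]:
        q = embed([question])[0]
        sims = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q))
        return [chunks[i] for i in np.argsort(-sims)[:k]]

    # 3. Put only the relevant bits into the prompt, not the whole corpus.
    def answer(question: str) -> str:
        context = "\n---\n".join(retrieve(question))
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user",
                       "content": f"Context:\n{context}\n\nQuestion: {question}"}],
        )
        return resp.choices[0].message.content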
Transformer architectures generally take quadratic time wrt sequence length, not exponential. Architectural innovations like flash attention also mitigate this somewhat.
Backtracking isn't involved, transformers are feedforward.
No, additional context does not cause exponential slowdowns and you absolutely can use FlashAttention tricks during training, I'm doing it right now. Transformers are not RNNs, they are not unrolled across timesteps, the backpropagation path for a 1,000,000 context LLM is not any longer than a 100 context LLM of the same size. The only thing which is larger is the self attention calculation which is quadratic wrt compute and linear wrt memory if you use FlashAttention or similar fused self attention calculations. These calculations can be further parallelized using tricks like ring attention to distribute very large attention calculations over many nodes. This is how google trained their 10M context version of Gemini.
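A back-of-the-envelope sketch of that scaling, using illustrative GPT-3-class dimensions and ignoring the linear Q/K/V/output projections: the attention score/value computation grows quadratically with context length, while the feed-forward block grows only linearly.

    # Rough per-layer FLOP counts with illustrative dims (not any vendor's model).
    d_model, d_ff = 12288, 4 * 12288

    def per_layer_flops(ctx: int) -> tuple[float, float]:
        attn = 2 * 2 * ctx * ctx * d_model  # QK^T plus attention-weights-times-V
        ffn = 2 * 2 * ctx * d_model * d_ff  # two matmuls, ~2 FLOPs per multiply-add
        return attn, ffn

    for ctx in (8_192, 128_000, 1_000_000):
        attn, ffn = per_layer_flops(ctx)
        print(f"{ctx:>9} tokens: attention {attn:.2e} FLOPs, feed-forward {ffn:.2e} FLOPs")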
So why are the context windows so "small", then? It would seem that if the cost was not so great, then having a larger context window would give an advantage over the competition.
The cost for both training and inference is vaguely quadratic while, for the vast majority of users, the marginal utility of additional context is sharply diminishing. For 99% of ChatGPT users something like 8192 tokens, or about 20 pages of context would be plenty. Companies have to balance the cost of training and serving models. Google did train an uber long context version of Gemini but since Gemini itself fundamentally was not better than GPT-4 or Claude this didn't really matter much, since so few people actually benefited from such a niche advantage it didn't really shift the playing field in their favor.
Marginal utility only drops because effective context is really bad, i.e. most models still vastly prefer the first things they see and those "needle in a haystack" tests are misleading in that they convince people that LLMs do a good job of handling their whole context when they just don't.
If we have the effective context window equal to the claimed context window, well, I'd start worrying a bit about most of the risks that AI doomers talk about...
There has been a huge increase in context windows recently.
I think the larger problem is "effective context" and training data.
Being technically able to use a large context window doesn't mean a model can actually remember or attend to that larger context well. In my experience, the kinds of synthetic "needle in haystack" tasks that AI companies use to show how large of a context their model can handle don't translate very well to more complicated use cases.
You can create data with large context for training by synthetically adding in random stuff, but there's not a ton of organic training data where something meaningfully depends on something 100,000 tokens back.
Also, even if it's not scaling exponentially, it's still scaling: at what point is RAG going to be more effective than just having a large context?
Great point about the meaningful datasets, this makes perfect sense. Esp. in regards to SFT and RLHF. Although I suppose it would be somewhat easier to do pretraining on really long context (books, I assume?)
Because you have to do inference distributed between multiple nodes at this point. For prefill, because prefill is actually quadratic, but also for memory reasons. The KV cache for 405B at 10M context length would take more than 5 terabytes (at bf16). That's 36 H200s just for the KV cache, but you would need roughly 48 GPUs to serve the bf16 version of the model. Generation speed at that setup would be roughly 30 tokens per second, 100k tokens per hour, and you can serve only a single user because batching doesn't make sense at these kinds of context lengths. If you pay 3 dollars per hour per GPU, that's a cost of $1440 per million tokens. For the fp8 version the numbers are a bit better: you need only 24 GPUs, generation speed stays roughly the same, so it's only 700 dollars per million tokens. There are architectural modifications that will bring that down significantly, but, nonetheless, it's still really, really expensive, and also quite hard to get to work.
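For anyone who wants to check the KV-cache figure, here's the rough arithmetic, assuming Llama-3.1-405B-like dimensions (126 layers, 8 KV heads with grouped-query attention, head dim 128, bf16):

    # Sanity check of the KV-cache size under the assumed dimensions above.
    n_layers, n_kv_heads, head_dim, bytes_per_val = 126, 8, 128, 2

    kv_bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_val  # K and V
    context_len = 10_000_000
    total_tb = kv_bytes_per_token * context_len / 1e12
    print(f"{kv_bytes_per_token} bytes/token -> ~{total_tb:.1f} TB at {context_len:,} tokens")
    # ~0.5 MB per token, i.e. roughly 5.2 TB for a 10M-token context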
Another factor in context window is effective recall. If the model can't actually use a fact 1m tokens earlier, accurately and precisely, then there's no benefit and it's harmful to the user experience to allow the use of a poorly functioning feature. Part of what Google have done with Gemini's 1-2m token context window is demonstrate that the model will actually recall and use that data. Disclosure, I do work at Google but not on this, I don't have any inside info on the model.
Memory. I don't know the equation, but it's very easy to see when you load a 128k context model at 8K vs 80K. The quant I am running would double VRAM requirements when loading 80K.
> The only thing which is larger is the self attention calculation which is quadratic wrt compute and linear wrt memory if you use FlashAttention or similar fused self attention calculations.
The FFWD input is the self-attention output. And since the output of the self-attention layer is [context, d_model], the FFWD layer input will grow as well. Consequently, FFWD layer compute cost will grow as well, no?
The cost of FFWD layer according to my calculations is ~(4+2 * true(w3)) * d_model * dff * n_layers * context_size so the FFWD cost grows linearly wrt the context size.
So, unless I misunderstood the transformer architecture, larger the context the larger the compute of both self-attention and FFWD is?
So you're saying that if I have a sentence of 10 words, and I want the LLM to predict the 11th word, FFWD compute is going to be independent of the context size?
I don't understand how since that very context is what makes the likeliness of output of next prediction worthy, or not?
More specifically, FFWD layer is essentially self attention output [context, d_model] matrix matmul'd with W1, W2 and W3 weights?
I may be missing something, but I thought that each context token would result in 3 additional values for self-attention to build its map, since each attention step must calculate a value considering all existing context.
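To make the scaling argument above concrete, here is a rough FLOP estimate under standard dense-transformer assumptions; the dimensions are illustrative (roughly 70B-class), not any particular model's:

```python
# Very rough FLOP estimates for one forward pass over a full context.
# FFWD cost grows linearly with context; self-attention has an extra term
# that grows quadratically (the QK^T and attention-weighted V matmuls).

d_model, d_ff, n_layers = 8192, 28672, 80   # illustrative 70B-class dims

def ffwd_flops(context):
    # gated FFN: three projections of size d_model x d_ff per layer, 2 FLOPs per MAC
    return 2 * 3 * d_model * d_ff * n_layers * context

def attn_flops(context):
    proj   = 2 * 4 * d_model * d_model * n_layers * context   # Q, K, V, O projections
    scores = 2 * 2 * d_model * n_layers * context ** 2        # QK^T and scores @ V
    return proj + scores

for ctx in (8_192, 131_072, 1_000_000):
    print(ctx, attn_flops(ctx) / ffwd_flops(ctx))
```

Both terms grow with context, as the comments above say; the quadratic score term is what eventually dominates at very long contexts.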
> you need to chunk your data and store it in a vector database so you can do semantic search and include only the bits that are most relevant in the context
Be aware that this tends to give bad results. Once RAG is involved, you essentially only do slightly better than a traditional search; a lot of nuance gets lost.
> Instead you need to chunk your data and store it in a vector database so you can do semantic search and include only the bits that are most relevant in the context.
Isn't that kind of what Anthropic is offering with projects? Where you can upload information and PDF files and stuff which are then always available in the chat?
Because they can't do long context windows. That's the only explanation. What you can do with a 1m token context window is quite a substantial improvement, particularly as you said for enterprise usage.
The only reason I open Chat now is because Claude will refuse to answer questions on a variety of topics including for example medication side effects.
When I tested o1 a few hours ago, it seemed like it was losing context. After I asked it to use a specific writing style and pasted a large reference text, it forgot my request. I reminded it, and it kept the rule for a few more messages, and after another long paste it forgot again.
If a $200/month pro level is successful it could open the door to a $2000/month segment, and the $20,000/month segment will appear and the segregation of getting ahead with AI will begin.
Agreed. Where may I read about how to set up an LLM similar to Claude, with at least the length of Claude's context window, and what the hardware requirements are? I found Claude incredibly useful.
And now you can get the 405B quality in a 70B, according to Meta. Costs really come down massively with that. I wonder if it's really as good as they say, though.
Full-blown agents, but they have to really be able to replace a semi-competent human, which is harder than it sounds, especially for edge cases that a human can easily get past.
With o1-preview and the $20 subscription my queries typically were answered in 10-20 seconds. I've tried the $200 subscription with some queries and got 5-10 minute answer times. Unless the load is substantially increased and I was just waiting in a queue for computing resources, I'd assume that they throw a lot more hardware at o1-pro. So it's entirely possible that $200/month is still at a loss.
I've been concatenating my source code of ~3,300 lines and 123,979 bytes (so likely < 128K context window) into the chat to get better answers. Uploading files is hopeless in the web interface.
Have you considered RAG instead of using the entire document? It's more complex but would at least allow you to query the document with your API of choice.
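If you do go the RAG route, the chunk-embed-retrieve loop is roughly this; `embed()` and `chat()` are placeholders for whatever provider you use:

```python
import numpy as np

def chunk(text: str, size: int = 1500, overlap: int = 200) -> list[str]:
    """Split a document into overlapping character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size - overlap)]

def top_k(query_vec, chunk_vecs, k=5):
    """Cosine-similarity retrieval over pre-computed chunk embeddings."""
    sims = chunk_vecs @ query_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9
    )
    return np.argsort(-sims)[:k]

# Usage sketch (embed() and chat() stand in for your API of choice):
# chunks     = chunk(open("big_doc.txt").read())
# chunk_vecs = np.array([embed(c) for c in chunks])
# idx        = top_k(embed("what does clause 7 say about termination?"), chunk_vecs)
# answer     = chat("Answer using only this context:\n" +
#                   "\n---\n".join(chunks[i] for i in idx))
```

As the earlier comment warns, only the retrieved chunks reach the model, so anything that depends on material outside them is lost.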
When talking about context windows I'm surprised no one mentions https://poe.com/.
Switched over from ChatGPT about a year ago, and it's amazing. Can use all models and the full context window of them, for the same price as a ChatGPT subscription.
Poe.com goes straight to login page, doesn't want to divulge ANY information to me before I sign up. No About Us or Product description or Pricing - nothing. Strange behavior. But seeing it more and more with modern web sites.
What don’t you like about Claude? I believe the context is larger.
Coincidentally I’ve been using it with xml files recently (iOS storyboard files), and it seems to do pretty well manipulating and refactoring elements as I interact with it.
First impressions: The new o1-Pro model is an insanely good writer. Aside from favoring the long em-dash (—) which isn't on most keyboards, it has none of the quirks and tells of old GPT-4/4o/o1. It managed to totally fool every "AI writing detector" I ran it through.
It can handle unusually long prompts.
It appears to be very good at complex data analysis. I need to put it through its paces a bit more, though.
> Aside from favoring the long em-dash (—) which isn't on most keyboards
Interesting! I intentionally edit my keyboard layout to include the em-dash, as I enjoy using it out of sheer pomposity—I should undoubtedly delve into the extent to which my own comments have been used to train GPT models!
On my keyboard (en-us) it's ALT+"-" to get an em-dash.
I use it all the time because it's the "correct" one to use, but it's often more "correct" to just rewrite the sentence in a way that doesn't call for one. :)
Just so you know, text using the em-dash like that combined with a few other "tells" makes me double check if it might be LLM written.
Other things are the overuse of transition words (e.g., "however," "furthermore," "moreover," "in summary," "in conclusion,") as well as some other stuff.
It might not be fair to people who write like that naturally, but it is what it is in the current situation we find ourselves in.
"In the past three days, I've reviewed over 100 essays from the 2024-2025 college admissions cycle. Here's how I could tell which ones were written by ChatGPT"
On Windows em dash is ALT+0151; the paragraph mark (§) is ALT+0167. Once you know them (and a couple of others, for instance accented capitals) they become second nature, and work on all keyboards, everywhere.
Startup I'm at has generated a LOT of content using LLMs and once you've reviewed enough of the output, you can easily see specific patterns in the output.
Some words/phrases that, by default, it overuses: "dive into", "delve into", "the world of", and others.
You correct it with instructions, but it will then find synonyms so there is also a structural pattern to the output that it favors by default. For example, if we tell it "Don't start your writing with 'dive into'", it will just switch to "delve into" or another synonym.
Yes, all of this can be corrected if you put enough effort into the prompt and enough iterations to fix all of these tells.
> if we tell it "Don't start your writing with 'dive into'", it will just switch to "delve into" or another synonym.
LLMs can radically change their style, you just have to specify what style you want. I mean, if you prompt it to "write in the style of an angry Charles Bukowski" you'll stop seeing those patterns you're used to.
In my team for a while we had a bot generating meeting notes "in the style of a bored teenager", and (besides being hilarious) the results were very unlike typical AI "delvish".
Of course the "delve into" and "dive into" is just its default to be corrected with additional instruction. But once you do something like "write in the style of...", then it has its own tells because as I noted below, it is, in the end, biased towards frequency.
Of course there will be a set of tells for any given style, but the space of possibilities is much larger than what a person could recognize. So as with most LLM tasks, the issue is figuring out how to describe specifically what you want.
Aside: not about you specifically, but I feel like complaints on HN about using LLMs often boil down to somebody saying "it doesn't do X", where X is a thing they didn't ask the model to do. E.g. a thread about "I asked for a Sherlock Holmes story but the output wasn't narrated by Watson" was one that stuck in my mind. You wouldn't think engineers would make mistakes like that, but I guess people haven't really sussed out how to think about LLMs yet.
Anyway for problems like what you described, one has to be wary about expecting the LLM to follow unstated requirements. I mean, if you just tell it not to say "dive into" and it doesn't, then it's done everything it was asked, after all.
I mean, we get it. It's a UX problem. But the thing is you have to tell it exactly what to do every time. Very often, it'll do what you said but not what you meant, and you have to wrestle with it.
You'd have to come up with a pretty exhaustive list of tells. Even sentence structure and mood is sometimes enough, not just the obvious words.
This is the way. Blending two or more styles also works well, especially if they're on opposite poles, e.g. "write like the imaginary lovechild of Cormac McCarthy and Ernest Hemingway."
Also, wouldn't angry Charles Bukowski just be ... Charles Bukowski?
> ...once you've reviewed enough of the output, you can easily see specific patterns in the output
That is true, but more importantly, are those patterns sufficient to distinguish AI-generated content from human-generated content? Humans express themselves very differently by region and country (e.g. "do the needful" is not common in the Midwest; "orthogonal" and "order of magnitude" are used more on HN than most other places). Outside of watermarking, detecting AI-generated text with an acceptably small false-positive error rate is nearly impossible.
Not sure why you default to an uncharitable mode in understanding what I am trying to say.
I didn't say they know their own tells. I said they naturally output them for you. Maybe the obvious is so obvious I don't need to comment on it. Meaning this whole "tells analysis" would necessarily rely on synthetic data sets.
I always assumed that they were snake oil because the training objective is to get a model that writes like a human. AI detectors by definition are showing what does not sound like a human, so presumably people will train the models against the detectors until they no longer provide any signal.
The thing is, the LLM has a flaw: it is still fundamentally biased towards frequency.
AI detectors generally can take advantage of this and look for abnormal patterns in frequencies of specific words, phrases, or even specific grammatical constructs because the LLM -- by default -- is biased that way.
I'm not saying this is easy and certainly, LLMs can be tuned in many ways via instructions, context, and fine-tuning to mask this.
They're not very accurate, but I think snake oil is a bit too far - they're better than guessing at least for the specific model(s) they're trained on. OpenAI's classifier [0] was at 26% recall, 91% precision when it launched, though I don't know what models created the positives in their test set. (Of course they later withdrew that classifier due to its low accuracy, which I think was the right move. When a company offers both an AI Writer and an AI Writing detector people are going to take its predictions as gospel and _that_ is definitely a problem.)
All that aside, most models have had a fairly distinctive writing style, particularly when fed no or the same system prompt every time. If o1-Pro blends in more with human writing that's certainly... interesting.
Anecdotally, English/History/Communications professors are confirming cheaters with them because they find it easy to identify false information. The red flags are so obvious that the checker tools are just a formality: student papers now have fake URLs and fake citations. Students will boldly submit college papers which have paragraphs about nonexistent characters, or make false claims about what characters did in a story.
The e-mail correspondence goes like this: "Hello Professor, I'd like to meet to discuss my failing grade. I didn't know that using ChatGPT was bad, can I have some points back or rewrite my essay?"
Yeah but they "detect" the characteristic AI style: The limited way it structures sentences, the way it lays out arguments, the way it tends to close with an "in conclusion" paragraph, certain word choices, etc. o1-Pro doesn't do any of that. It writes like a human.
Damnit. It's too good. It just saved me ~6 hours in drafting a complicated and bespoke legal document. Before you ask: I know what I'm doing, and it did a better job in five minutes than I could have done over those six hours. Homework is over. Journalism is over. A large slice of the legal profession is over. For real this time.
Journalism is not only about writing. It is about sources, talking to people, being on the ground, connecting dots, asking the right questions. Journalists can certainly benefit from AI and good journalists will have jobs for a long time still.
While the above is true, I'd say the majority of what passes as journalism these days has none of the above and the writing is below what an AI writer could produce :(
It's actually surprising how many articles on 'respected' news websites have typos. You'd think there would be automated spellcheckers and at least one 'peer review' (probably too much to ask an actual editor to review the article these days...).
Mainstream news today is written for an 8th grade reading ability. Many adults would lose interest otherwise, and the generation that grew up reading little more than social media posts will be even worse.
AI can handle that sort of writing just fine, readers won't care about the formulaic writing style.
So AI could actually turn journalism more into what it originally was: reporting what is going on, rather than reading and rewriting information from other sources. Interesting possibility.
That is exactly the key element in being able to use spicy autocomplete. If you don't know what you're doing, it's going to bite you and you won't know it until it's too late. "GPT messed up the contract" is not an argument I would envy anyone presenting in court or to their employer. :)
> Before you ask: I know what I'm doing, and it did a better job in five minutes than I could have done over those six hours.
Seems like lawyers could do more, faster, because they know what they are doing. Experts don't get replaced; they get tools to amplify and extend their expertise.
Replacement is avoided only if the demand for their services scales in proportion to the productivity improvements, which is sometimes true but not always, and is less likely to be true if the productivity improvements are very large.
It still needs to be driven by someone who knows what they're doing.
Just like when software was first coming out: it may have ended some jobs.
But it also helped get things done that otherwise wouldn't have been, or not as much.
In this case, equipping a capable lawyer to be 20x more productive is more like an Iron Man suit, which is OK. If you can get more done with less effort, you are still critical to what's needed.
I noticed a writing style difference, too, and I prefer it. More concise. On the coding side, it's done very well on large (well as large as it can manage) codebase assessment, bug finding, etc. I will reach for it rather than o1-preview for sure.
My 10th grade English teacher (2002, just as blogging was taking off) called it sloppy and I gotta agree with her. These days I see it as YouTube punctuation, like jump-cut editing for text.
It's not. People just like to pretend they have moral superiority for their opinions on arbitrary writing rules, when in reality the only thing that matters is if you're clearly communicating something valuable.
I'm a professional writer and use em-dashes without a second thought. Like any other component of language, just don't _over_ use them.
That's encouraging to hear that it's a better writer, but I wonder if "quirks and tells" can only be seen in hindsight. o1-pro's quirks may only become apparent after enough people have flooded the internet with its output.
This is a huge improvement over previous GPT and Claude, which use the terrible "space, hyphen, space" construct. I always have to manually change them to em-dashes.
This shouldn’t really be a serious issue nowadays. On macOS it’s Option+Shift+'-', on Windows it’s Ctrl+Alt+Num- or (more cryptic) Alt+0151.
The Swiss army knife solution is to configure yourself a Compose key, and then it’s an easy mnemonic like for example Compose 3 - (and Compose 2 - for en dash).
No internet access makes it very hard to benefit from o1 pro. Most of the complex questions I would ask require google search for research papers, language or library docs, etc. Not sure why o1 pro is banned from the internet, was it caught downloading too much porn or something?
Macs have always been able to type the em dash — the key combination is ⌥⇧- (Option-Shift-hyphen). I often use them in my own writing. (Hope it doesn't make somebody think I'm phoning it in with AI!)
Some autocorrect software automatically converts two hyphens in a row into an emdash. I know that's how it worked in Microsoft Word and just verified it's doing that with Google Docs. So it's not like it's hard to include an emdash in your writing.
This is interesting, because at my job I have to manually edit registration addresses that use the long em-dash as our vendor only supports ASCII. I think Windows automatically converts two dashes to the long em-dash.
How do you have that configured? The Windows+. shortcut was added in a later update to W10 and pops up a GUI for selecting emojis, symbols, or other non-typable characters.
I need help creating a comprehensive Anki deck system for my 8-year-old who is following a classical education model based on the trivium (grammar stage). The child has already:
- Mastered numerous Latin and Greek root words
- Achieved mathematics proficiency equivalent to US 5th grade
- Demonstrated strong memorization capabilities
Please create a detailed 12-month learning plan with structured Anki decks covering:
1. Core subject areas prioritized in classical education (specify 4-5 key subjects)
2. Recommended daily review time for each deck
3. Progression sequence showing how decks build upon each other
4. Integration strategy with existing knowledge of Latin/Greek roots
5. Sample cards for each deck type, including:
- Basic cards (front/back)
- Cloze deletions
- Image-based cards (if applicable)
- Any special card formats for mathematical concepts
For each deck, please provide:
- Clear learning objectives
- 3-5 example cards with complete front/back content
- Estimated initial deck size
- Suggested intervals for introducing new cards
- Any prerequisites or dependencies on other decks
Additional notes:
- Cards should align with the grammar stage focus on memorization and foundational knowledge
- Please include memory techniques or mnemonics where appropriate
- Consider both verbal and visual learning styles
- Suggest ways to track progress and adjust difficulty as needed
Example of the level of detail needed for card examples:
Interesting that it thought for 1m28s on only two tasks. My intuition with o1-preview is that each task had a rather small token limit, perhaps they raised this limit.
If o1-pro is 10% better than Claude, but you are a guy who makes $300,000 per year, but now can make $330,000 because o1-pro makes you more productive, then it makes sense to give Sam $2,400.
Above example makes no sense since it says ChatGPT is 10% better than Claude at first, then pivots to use it as a 10% total productivity enhancer. Which is it?
It's never this clean, but it is directionally correct. If I make $300k / year, and I can tell that chatgpt already saves me hours or even days per month, $200 is a laughable amount. If I feel like pro is even slightly better, it's worth $200 just to know that I always have the best option available.
Heck, it's probably worth $200 even if I'm not confident it's better just in case it is.
For the same reason I don't start with the cheapest AI model when asking questions and then switch to the more expensive if it doesn't work. The more expensive one is cheap enough that it doesn't even matter, and $200 is cheap enough (for a certain subsection of users) that they'll just pay it to be sure they're using the best option.
That's only true if your time is metered by the hour; and the vast majority of roles which find some benefit from AI, at this time, are not compensated hourly. This plan might be beneficial to e.g. CEO-types, but I question who at OpenAI thought it would be a good idea to lead their 12 days of hollowhype with this launch, then; unless this is the highest impact release they've got (one hopes it is not).
>This plan might be beneficial to e.g. CEO-types, but I question who at OpenAI thought it would be a good idea to lead their 12 days of hollowhype with this launch, then; unless this is the highest impact release they've got (one hopes it is not).
In previous multi-day marketing campaigns I've run or helped run (specifically on well-loved products), we've intentionally announced a highly-priced plan early on without all of its features.
Two big benefits:
1) Your biggest advocates get to work justifying the plan/product as-is, anchoring expectations to the price (which already works well enough to convert a slice of potential buyers)
2) Anything you announce afterward now gets seen as either a bonus on top (e.g. if this $200/mo plan _also_ includes Sora after they announce it...), driving value per price up compared to the anchor; OR you're seen as listening to your audience's criticisms ("this isn't worth it!") by adding more value to compensate.
I work from home and my time is accounted for by way of my productive output because I am very far away from a CEO type. If I can take every Wednesday off because I’ve gained enough productivity to do so, I would happily pay $200/mo out of my own pocket to do so.
$200/user/month isn’t even that high of a number in the enterprise software world.
Employers might be willing to get their employees a subscription if they believe it makes the employees they are paying $$$$$ some X% more productive (where X% of their salary works out to more than $2,400/year).
There is only so much time in the day. If you have a job where increased productivity translates to increases income (not just hourly metered jobs) then you will see a benefit.
> cheapest AI model when asking questions and then switch to the more expensive if it doesn't work.
The thing is, more expensive isn't guaranteed to be better. The more expensive models are better most of the time, but not all the time. I talk about this more in this comment https://news.ycombinator.com/item?id=42313401#42313990
Since LLMs are non-deterministic, there is no guarantee that GPT-4o is better than GPT-4o mini. GPT-4o is most likely going to be better, but sometimes the simplicity of GPT-4o mini makes it better.
As you say, the more expensive models are better most of the time.
Since we can't easily predict which model will actually be better for a given question at the time of asking, it makes sense to stick to the most expensive/powerful models. We could try, but that would be a complex and expensive endeavor. Meanwhile, both weak and powerful models are already too cheap to meter in direct / regular use, and you're always going to get ahead with the more powerful ones, per the very definition of what "most of the time" means, so it doesn't make sense to default to a weaker model.
TBH it's easily in the other direction. If I can get something to clients quicker that's more valuable.
If paying this gets me two days of consulting it's a win for me.
Obvious caveat if cheaper setups get me the same, although I can't spend too long comparing or that time alone will cost more than just buying everything.
The number of times I've heard all this about some other groundbreaking technology... most businesses just went meh and moved on. But for self-employed, if those numbers are right, it may make sense.
It's not worth it if you're a W2 employee and you'll just spend those 2 hours doing other work. Realistically, working 42 hours a week instead of 40 will not meaningfully impact your performance, so doing 42 hours a week of work in 40 won't, either.
I pay $20/mo for Claude because it's been better than GPT for my use case, and I'm fine paying that but I wouldn't even consider something 10x the price unless it is many, many times better. I think at least 4-5x better is when I'd consider it and this doesn't appear to be anywhere close to even 2x better.
That's also not how pricing works, it's about perceived incremental increases in how useful it is (marginal utility), not about the actual more money you make.
Yeah, the $200 seems excessive and annoying, until you realise it depends on how much it saves you. For me it needs to save me about 6 hours per month to pay for itself.
Funny enough I've told people that baulk at the $20 that I would pay $200 for the productivity gains of the 4o class models. I already pay $40 to OpenAI, $20 to Anthropic, and $40 to cursor.sh.
ah yes, you must work at the company where you get paid per line of code. There's no way productivity is measured this accurately and you are rewarded directly in any job unless you are self-employed and get paid per website or something
Being in an AI domain does not invalidate the fundamental logic. If an expensive tool can make you productive enough to offset the cost, then the tool is worth it for all intents and purposes.
I think of them as different people -- I'll say that I use them in "ensemble mode" for coding, the workflow is Claude 3.5 by default -- when Claude is spinning, o1-preview to discuss, Claude to implement. Worst case o1-preview to implement, although I think its natural coding style is slightly better than Claude's. The speed difference isn't worth it.
The intersection of problems I have where both have trouble is pretty small. If this closes the gap even more, that's great. That said, I'm curious to try this out -- the ways in which o1-preview fails are a bit different than prior gpt-line LLMs, and I'm curious how it will feel on the ground.
Okay, tried it out. Early indications - it feels a bit more concise, thank god, certainly more concise than 4o -- it's s l o w. Getting over 1m times to parse codebases. There's some sort of caching going on though, follow up queries are a bit faster (30-50s). I note that this is still superhuman speeds, but it's not writing at the speed Groqchat can output Llama 3.1 8b, that is for sure.
Code looks really clean. I'm not instantly canceling my subscription.
When you say "parse codebases" is this uploading a couple thousand lines in a few different files? Or pasting in 75 lines into the chat box? Or something else?
$ find web -type f \( -name '*.go' -o -name '*.tsx' \) | tar -cf code.tar -T -; cat code.tar | pbcopy
Then I paste it in and say "can you spot any bugs in the API usage? Write out a list of tasks for a senior engineer to get the codebase in basically perfect shape," or something along those lines.
Alternately: "write a go module to support X feature, and implement the react typescript UI side as well. Use the existing styles in the tsx files you find; follow these coding guidelines, etc. etc."
I pay for both GPT and Claude and use them both extensively. Claude is my go-to for technical questions, GPT (4o) for simple questions, internet searches and validation of Claude answers. GPT o1-preview is great for more complex solutions and work on larger projects with multiple steps leading to finish. There’s really nothing like it that Anthropic provides.
But $200/mo is way above what I’m willing to pay.
I have several local models I hit up first (Mixtral, Llama), if I don’t like the results then I’ll give same prompt to Claude and GPT.
Overall though it’s really just for reference and/or telling me about some standard library function I didn’t know of.
Somewhat counterintuitively I spend way more time reading language documentation than I used to, as the LLM is mainly useful in pointing me to language features.
After a few very bad experiences I never let LLM write more than a couple lines of boilerplate for me, but as a well-read assistant they are useful.
But none of them are sufficient alone, you do need a “team” of them - which is why I also don’t see the value is spending this much on one model. I’d spend that much on a system that polled 5 models concurrently and came up with a summary of sorts.
People keep talking about using LLMs for writing code, and they might be useful for that, but I've found them much more useful for explaining human-written code than anything else, especially in languages/frameworks outside my core competency.
E.g. "why does this (random code in a framework I haven't used much) code cause this error?"
About 50% of the time I get a helpful response straight away that saves me trawling through Stack Overflow and random blog posts. About 25% of the time the response is at least partially wrong, but it still helps me get on the right track.
25% of the time the LLM has no idea and won't admit it so I end up wasting a small amount of time going round in circles, but overall it's a significant productivity boost when I'm working on unfamiliar code.
Right on, I like to use local models - even though I also use OpenAI, Anthropic, and Google Gemini.
I often use one or two shot examples in prompts, but with small local models it is also fairly simple to do fine tuning - if you have fine tuning examples, and if you are a developer so you get the training data in the correct format, and the correct format changes for different models that you are fine tuning.
> But none of them are sufficient alone, you do need a “team” of them
Given the sensitivity to parameters and prompts the models have, your "team" can just as easily be querying the same LLM multiple times with different system prompts.
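A sketch of that idea, assuming the OpenAI Python SDK's chat-completions interface (any provider's equivalent works the same way); the personas and model name are arbitrary choices, not recommendations:

```python
from openai import OpenAI   # assumes the standard OpenAI Python SDK

client = OpenAI()

PERSONAS = [
    "You are a cautious senior engineer. Point out risks and edge cases.",
    "You are an optimistic architect. Focus on the simplest design that works.",
    "You are a security reviewer. Assume the input is hostile.",
]

def ask(system: str, question: str, model: str = "gpt-4o") -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": question}],
    )
    return resp.choices[0].message.content

def ensemble(question: str) -> str:
    # Query the same model under different system prompts, then merge the views.
    answers = [ask(p, question) for p in PERSONAS]
    merged = "\n\n---\n\n".join(answers)
    return ask("You are a neutral editor.",
               f"Summarize where these answers agree and disagree:\n\n{merged}")
```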
I haven't used ChatGPT in a few weeks now. I still maintain subscriptions to both ChatGPT and Claude, but I'm very close to dropping ChatGPT entirely. The only useful thing it provides over Claude is a decent mobile voice mode and web search.
If you don't want to necessarily have to pick between one or the other, there are services like this one that let you basically access all the major LLMs and only pay per use: https://nano-gpt.com/
I've used TypingMind and it's pretty great, I like the idea of just plugging in a couple API keys and paying a fraction, but I really wish there was some overlap.
If a random query via the API costs a fifth of a cent, why can't I get 10 free API calls w/ my $20/mo premium subscription?
I'm in the same boat — I maintain subscriptions to both.
The main thing I like OpenAI for is that when I'm on a long drive, I like to have conversations with OpenAI's voice mode.
If Claude had a voice mode, I could see dropping OpenAI entirely, but for now it feels like the subscriptions to both is a near-negligible cost relative to the benefits I get from staying near the front of the AI wave.
I've heard so much about Claude and decided to give it a try, and it has been rather a major disappointment. I ended up using chatgpt as an assistant for claude's code writing because it just couldn't get things right. Had to cancel my subscription; no idea why people still promote it everywhere like it is 100x better than chatgpt.
I've heard this a lot and so I switched to Claude for a month and was super disappointed. What are you mainly using ChatGPT for?
Personally, I found Claude marginally better for coding, but far, far worse for just general purpose questions (e.g. I'm a new home owner and I need to winterize my house before our weather drops below freezing. What are some steps I should take or things I should look into?)
It's ironic because I never want to ask an LLM for something like your example general purpose question, where I can't just cheaply and directly test the correctness of the answer
But we're hurtling towards all the internet's answers to general purpose questions being SEO spam that was generated by an LLM anyways.
Since OpenAI probably isn't hiring as many HVAC technicians to answer queries as they are programmers, it feels like we're headed towards a death spiral where either having the LLM do actual research from non-SEO affected primary sources, or finding a human who's done that research will be the only options for generic knowledge questions that are off the beaten path
-
Actually to test my hypothesis I just tried this with ChatGPT with internet access.
The list of winterization tips cited an article that felt pretty "delvey". I searched the author's name, and their LinkedIn profile is about how they professionally write marketing content (nothing about HVAC); one of their accomplishments is Generative AI, and their likes feed is full of AI mentions for writing content.
So ChatGPT is already at a place where when it searches for "citations", it's just spitting back out its own uncited answers above answers by actual experts (since the expert sources aren't as SEO-driven)
> I can't just cheaply and directly test the correctness of the answer
I feel that, but I think for me the key is knowing that LLMs can be wrong and I should treat the answer as a starting point and not an actual expert. I find it really helpful for topics where I don't even know where to start because, like you said, most search engines are utter trash now.
For things like that, I find ChatGPT to be a good diving off point. For example, this is what I got when I asked:
```
Preparing your townhouse for winter involves addressing common issues associated with the region's wet and cool climate. Here's a concise checklist to help you get started:
1. Exterior Maintenance
Roof Inspection: Check for damaged or missing shingles to prevent leaks during heavy rains.
Gutter Cleaning: Remove leaves and debris to ensure proper drainage and prevent water damage.
Downspouts: Ensure they direct water away from the foundation to prevent pooling and potential leaks.
Siding and Trim: Inspect for cracks or gaps and seal them to prevent moisture intrusion.
2. Windows and Doors
Weatherstripping: Install or replace to seal gaps and prevent drafts, improving energy efficiency.
Caulking: Apply around window and door frames to block moisture and cold air.
3. Heating System
Furnace Inspection: Have a professional service your furnace to ensure it's operating efficiently.
Filter Replacement: Change furnace filters to maintain good air quality and system performance.
4. Plumbing
Outdoor Faucets: Disconnect hoses and insulate faucets to prevent freezing.
Pipe Insulation: Insulate exposed pipes, especially in unheated areas, to prevent freezing and bursting.
5. Landscaping
Tree Trimming: Prune branches that could break under snow or ice and damage your property.
Drainage: Ensure the yard slopes away from the foundation to prevent water accumulation.
6. Safety Checks
Smoke and Carbon Monoxide Detectors: Test and replace batteries to ensure functionality.
Fireplace and Chimney: If applicable, have them inspected and cleaned to prevent fire hazards.
By addressing these areas, you can help protect your home from common winter-related issues in Seattle's climate.
```
Once I dove into the links ChatGPT provided I found the detail I needed and things I needed to investigate more, but it saved 30 minutes of pulling together a starting list from the top 5-10 articles on Google.
Claude Sonnet 3.5 has outperformed o1 in most tasks based on my own anecdotal assessment. So much so that I'm debating canceling my ChatGPT subscription. I just literally do not use it anymore, despite being a heavy user for a long time in the past
Is a "reasoning" model really different? Or is it just clever prompting (and feeding previous outputs) for an existing model? Possibly with some RLHF reasoning examples?
OpenAI doesn't have a large enough database of reasoning texts to train a foundational LLM off it? I thought such a db simply does not exist as humans don't really write enough texts like this.
It's trained via reinforcement learning on essentially infinite synthetic reasoning data. You can generate infinite reasoning data because there are infinite math and coding problems that can be created with machine-checkable solutions, and machines can make infinite different attempts at reasoning their way to the answer. Similar to how models trained to learn chess by self-play have essentially unlimited training data.
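A toy illustration of why that data is effectively unlimited: generate problems with machine-checkable answers, sample attempts, and keep only the attempts that verify. The `attempt()` function here is a hypothetical stand-in for sampling a reasoning trace from the model being trained:

```python
import random
import re

def make_problem():
    """Generate a random arithmetic problem with a known answer."""
    a, b, c = (random.randint(2, 99) for _ in range(3))
    question = f"Compute ({a} + {b}) * {c}."
    return question, (a + b) * c

def verify(model_output: str, answer: int) -> bool:
    """Accept an attempt only if its final stated number matches the ground truth."""
    numbers = re.findall(r"-?\d+", model_output)
    return bool(numbers) and int(numbers[-1]) == answer

# Data-collection loop sketch (attempt() = sample a reasoning trace from the model):
# dataset = []
# for _ in range(1_000_000):
#     q, ans = make_problem()
#     trace = attempt(q)
#     if verify(trace, ans):
#         dataset.append((q, trace))   # keep/reward only verified reasoning
```

Coding tasks work the same way, with unit tests playing the role of the checker.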
We don't know the specifics of GPT-o1 to judge, but we can look at open weights model for an example. Qwen-32B is a base model, QwQ-32B is a "reasoning" variant. You're broadly correct that the magic, such as it is, is in training the model into a long-winded CoT, but the improvements from it are massive. QwQ-32B beats larger 70B models in most tasks, and in some cases it beats Claude.
I just tried QwQ 32B; I didn't know about it. I used it to generate some code that GPT had generated 2 days ago, where GPT produced perfect code without even sweating.
QwQ generated 10 pages of its reasoning steps, and the code is probably not correct. [1] includes both answers from QwQ and GPT.
Breaking down its reasoning steps into such excruciatingly detailed prose is certainly not user friendly, but it is intriguing. I wonder what an ideal use case for it would be.
To my understanding, Anthropic realizes that they can’t compete in name recognition yet, so they have to overdeliver in terms of quality to win the war. It’s hard to beat the incumbent, especially when “chatgpt’ing” is basically a well understood verb.
They don't have a model that does o1-style "thought tokens" or is specialized for math, but Sonnet 3.6 is really strong in other ways. I'm guessing they will have an o1-style model within six months if there's demand
Same. Honestly if they released a $200 a month plan I’d probably bite, but OpenAI hasn’t earned that level of confidence from me yet. They have some catching up to do.
The main difficulty when pricing a monthly subscription for "unlimited" usage of a product is the 1% of power users whose extreme use of the product can kill any profit margins for the product as a whole.
Pricing ChatGPT Pro at $200/mo filters it to only power users/enterprise, and given the cost of the GPT-o1 API, it wouldn't surprise me if those power users burn through $200 worth of compute very, very quickly.
They are ready for this: there is a policy against automation, sharing, or reselling access; it looks like there are some unspecified quotas as well:
> We have guardrails in place to help prevent misuse and are always working to improve our systems. This may occasionally involve a temporary restriction on your usage. We will inform you when this happens, and if you think this might be a mistake, please don’t hesitate to reach out to our support team at help.openai.com using the widget at the bottom-right of this page. If policy-violating behavior is not found, your access will be restored.
Is there any evidence to suggest this is true? IIRC there was leaked information that OpenAI's revenue was significantly higher than their compute spending, but it wasn't broken down between API and subscriptions so maybe that's just due to people who subscribe and then use it a few times a month.
> OpenAI's revenue was significantly higher than their compute spending
I find this difficult to believe, although I don't doubt leaks could have implied it. The challenge is that "the cost of compute" can vary greatly based on how it's accounted for (things like amortization, revenue recognition, capex vs opex, IP attribution, leasing, etc). Sort of like how Hollywood studio accounting can show a movie as profitable or unprofitable, depending on how "profit" is defined and how expenses are treated.
Given how much all those details can impact the outcome, to be credible I'd need a lot more specifics than a typical leak includes.
Is compute that expensive? An H100 rents at about $2.50/hour, so $200 buys about 80 hours of pure compute. Against 720 hours in a month, that's a 1/9 duty cycle around the clock, or 1/3 if we assume an 8-hour work day. That's really intense, constant use. And I bet OpenAI spends less on operating their infra than the rate at which cloud providers rent it out.
You need enough RAM to store the model and the KV-cache depending on context size. Assuming the model has a trillion parameters (there are only rumours how many there actually are) and uses 8 bit per parameter, 16 H100 might be sufficient.
A single H100 has 80GB of memory, meaning that at FP16 you could roughly fit a 40B parameter model on it, or at FP4 quantisation you could fit a 160B parameter model on it. We don't know (I don't think) what quantisation OpenAI use, or how many parameters o1 is, but most likely...
...they probably quantise a bit, but not loads, as they don't want to sacrifice performance. FP8 seems like a possible middle ground. o1 is just a bunch of GPT-4o in a trenchcoat strung together with some advanced prompting. GPT-4o is theorised to be 200B parameters. If you wanted to run 5 parallel generation tasks at peak during the o1 inference process, that's 5x 200B, at FP8, or about 12 H100s. 12 H100s takes about one full rack of kit to run.
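Writing out the fit-it-in-VRAM arithmetic from the last two comments; the parameter counts and quantization levels are the commenters' guesses, not known facts about o1:

```python
H100_GB = 80

def gpus_needed(params_billions: float, bits_per_param: int, copies: int = 1) -> float:
    """How many H100s just to hold the weights (ignores KV cache and activations)."""
    weight_gb = params_billions * bits_per_param / 8   # GB per copy of the weights
    return copies * weight_gb / H100_GB

print(gpus_needed(1000, 8))           # ~12.5: a 1T-param model at 8-bit
print(gpus_needed(200, 8, copies=5))  # ~12.5: five parallel 200B generators at FP8
```

Real deployments need headroom beyond this for the KV cache, which is why the comments above round up to 16 GPUs or a full rack.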
I was testing out a chat app that supported images. Long conversations with multiple images in the conversation can be like 10 cents per message after a certain point. It sure does add up quickly.
There are many use cases for which the price can go even higher. I look at recent interactions with people that were working at an interview mill: multiple people in a boiler room interviewing for companies all day long, with a computer set up so that our audio was being piped to o1. They had a reasonable prompt to remove many chatbot-isms and make it provide answers that seem people-like: we were 100% interviewing the o1 model. The operator said basically nothing, in both technical and behavioral interviews.
A company making money off of this kind of scheme would be happy to pay $200 a seat for an unlimited license. And I would not be surprised if there were many other very profitable use cases that make $200 per month seem like a bargain.
So, wait a minute: when interviewing candidates, you're making them invest their valuable time talking to an AI interviewer, and not even disclosing that they aren't talking to a real human? That seems highly unethical to me, yet not even slightly surprising. My question is, what variables are being optimized for here? It's certainly not about efficiently matching people with jobs; it seems to be more about increasing the number of interviews, which I'm sure benefits the people who get rewarded for the number of interviews, but seems like entirely the wrong metric.
Scams and other antisocial use cases are basically the only ones for which the damn things are actually the kind of productivity rocket-fuel people want them to be, so far.
We better hope that changes sharply, or these things will be a net-negative development.
Right? To me it's eerily similar to how cryptocurrency was sold as a general replacement for all money uses, but turned out to be mainly useful for societally negative things like scams and money laundering.
It sounds like a setup where applicants hire some third-party company to "represent the client" in the interview, and that company hired a bunch of people to be the interviewee on their clients' behalf. Presumably neither the company nor the applicant discloses this arrangement to the hiring manager.
If any company wants me to be interviewed by AI to represent the client, I'll consider it ethical to let an AI represent me. Then AIs can interview AIs, maybe that'll get me the job. I have strong flashbacks to the movie "Surrogates" for some reason.
Decades ago in Santa Cruz county California, I had to have a house bagged for termites for the pending sale. Turned out there was one contractor licensed to do the poison gas work, and all the pest service companies simply subcontracted to him. So no matter what pest service you chose, you got the same outfit doing the actual work.
I used to work for a manufacturing company that did this. They offered a standard, premium, and "House Special Product". House special was 2x premium but the same product. They didn't even pretend it wasn't, they just said it was recommended and people bought it.
I had this happen once at a car wash. The first time I went I paid for a $25 premium package with all the bells and whistles. They seemed to do a good job. The next time I went for the basic $10 one. Exact same thing.
Yesterday, I spent 4.5hrs crafting a very complex Google Sheets formula—think Lambda, Map, Let, etc., for 82 lines. If I knew it would take that long, I would have just done it via AppScript. But it was 50% kinda working, so I kept giving the model the output, and it provided updated formulas back and forth for 4.5hrs. Say my time is $100/hr - that’s $450. So even if the new ChatGPT Pro mode isn’t any smarter but is 50% faster, that’s $225 saved just in time alone. It would probably get that formula right in 10min with a few back-and-forth messages, instead of 4.5hrs. Plus, I used about $62 worth of API credits in their not-so-great Playground. I see similar situations of extreme ROI every few days, let alone all the other uses. I’d pay $500/mo, but beyond that, I’d probably just stick with Playground & API.
> so I kept giving the model the output, and it provided updated formulas back and forth for 4.5hrs
I read this as: "I have already ceded my expertise to an LLM, so I am happy that it is getting faster because now I can pay more money to be even more stuck using an LLM"
Maybe the alternative to going back and forth with an AI for 4.5 hours is working smarter and using tools you're an expert in. Or building expertise in the tool you are using. Or, if you're not an expert or can't become an expert in these tools, then it's hard to claim your time is worth $100/hr for this task.
I agree going back and forth with an AI for 4.5 hours is usually a sign something has gone wrong somewhere, but this is incredibly narrow thinking. Being an open-ended problem solver is the most valuable skill you can have. AI is a huge force multiplier for this. Instead of needing to tap a bunch of experts to help with all the sub-problems you encounter along the way, you can just do it yourself with AI assistance.
That is to say, past a certain salary band people are rarely paid for being hyper-proficient with tools. They are paid to resolve ambiguity and identify the correct problems to solve. If the correct problem needs a tool that I'm unfamiliar with, using AI to just get it done is in many cases preferable to locating an expert, getting their time, etc.
If somebody claims that something can be done with an LLM in 10 minutes which takes 4.5 hours for them, then they are definitely not experts. They probably have some surface knowledge, but that's all. There is a reason why the better LLM demos are about learning something new, like a new programming language. So far, all of the other kinds of demos I've seen (e.g. generating new endpoints based on older ones) were clearly slower than experts, and they were slower for me to use in my respective field.
For a no-true-Scotsman, you would need to throw out a counterexample by using a misrepresented or wrong definition, or simply by using a definition wrongly. But in any case, I would need a counterexample for that specific fallacy. I didn't have one, and I still don't.
I understand that some people may consider themselves experts and think they could achieve a similar reduction (outside the cases where I said it's clearly possible), but then show me, because I still haven't seen a single one. The ones that were publicly shown were not quicker than average seniors, and definitely worse than the better ones. Even at larger scale at my company, we haven't seen any performance improvement in any single metric regarding coding since we introduced it more than half a year ago.
Here's your counterexample: “Copilot has dramatically accelerated my coding. It’s hard to imagine going back to ‘manual coding,’” Karpathy said. “Still learning to use it, but it already writes ~80% of my code, ~80% accuracy. I don’t even really code, I prompt & edit.” -- https://siliconangle.com/2023/05/26/as-generative-ai-acceler...
It's not a counterexample. There is exactly zero concrete information in it. It's just a statement from somebody who profits from such statements. Even my simply saying that it's not true has more value, because I would actually benefit from what Karpathy said, if it were true.
So, just to be specific, and specifically for ChatGPT (I think it was 4), these are very, very problematic, because all of these are clear lies:
In this case, the guy was clearly slower than simple copy-paste and modification.
I had very similar experiences. Sometimes it just used a different method which does almost the same thing, just worse. I even had to check what the heck the method it used was, because it isn't normally used for obvious reasons: it was an "internal" one (like apt vs. apt-get).
A more convenient manual that frequently spouts falsehoods, sure.
My favorite part is when it includes parameters in its output that are not and have never been a part of the API I'm trying to get it to build against.
The thing is, when it hallucinates API functions and parameters, they aren't random garbage. Usually, those functions and parameters should have been there.
More than that, one of the standard practices in development is writing code with imaginary APIs that are convenient at the point of use, and then reconciling the ideal with the real - which often does involve adding the imaginary missing functions or parameters to the real API.
Long Excel formulas are really just bad "one-liners". You should be splitting your operation into multiple cells or finding a more elegant solution. This is especially true in Excel, where your debugging tools are quite limited!
Expect more of this as they scramble to course-correct from losing billions every year, to hitting their 2029 target for profitability. That money's gotta come from somewhere.
> Price hikes for the premium ChatGPT have long been rumored. By 2029, OpenAI expects it’ll charge $44 per month for ChatGPT Plus, according to reporting by The New York Times.
I suspect a big part of why Sora still isn't available is because they couldn't afford to offer it on their existing plans, maybe it'll be exclusive to this new $200 tier.
Runway is $35 a month to generate 10 second clips and you really get very few generations for that. $95 a month for unlimited 10 second clips.
I love art and experimental film. I really was excited for Sora but it will need what feels like unlimited generation to explore what it can do . That is going to cost an arm and a leg for the compute.
Something about video especially seems like it will need to be run locally to really work. Pay a monthly fee for the model and run it as much as you want on your own compute.
I give o1 a URL and I ask it to comment on how well the corresponding web page markets a service to an audience I define in clear detail.
o1 generates a couple of pages of comments before admitting it didn’t access the web page and entirely based its analysis on the definition of the audience.
If one makes $150 an hour, it needs to save them about an hour and 20 minutes a month to break even. To me, it's just a non-deterministic calculator for words.
If it gets things wrong, then don't use it for those things. If you can't find things that it gets right, then it's not useful to you. That doesn't mean those cases don't exist.
I don't think this math depends on where that time is saved.
If I do all my work in 10 hours, I've earned $1500. If I do it all in 8 hours, then spend 2 hours on another project, I've earned $1500.
I can't bill the hours "saved" by ChatGPT.
Now, if it saves me non-billing time, then it matters. If I used to spend 2 hours doing a task that ChatGPT lets me finish in 15 minutes, now I can use the rest of that time to bill. And that only matters if I actually bill my hours. If I'm salaried or hourly, ChatGPT is only a cost.
And that's how the time/money calculation is done. The idea is that you should be doing the task that maximizes your dollar per hour output. I should pay a plumber, because doing my own plumbing would take too much of my time and would therefore cost more than a plumber in the end. So I should buy/use ChatGPT only if not using it would prevent me from maximizing my dollar per hour. At a salaried job, every hour is the same in terms of dollars.
My firm's advertised billing rate for my time is $175/hour as a Sr Software Engineer. I take home ~$80/hour, accounting for benefits and time off. If I freelanced I could presumably charge my firm's rate, or even more.
This is in a mid-COL city in the US, not a coastal tier 1 city with prime software talent that could charge even more.
Ironically, the freelance consulting world is largely on fire due to the lowered barrier of entry and flood of new consultants using AI to perform at higher levels, driving prices down simply through increased supply.
I wouldn't be surprised if AI was also eating consultants from the demand side as well, enabling would-be employers to do a higher % of tasks themselves that they would have previously needed to hire for.
That's what they are billed at, what they take home from that is probably much lower. At my org we bill folks out for ~$150/hr and their take home is ~$80/hr
On the one hand, there's the moral argument: we need janitors and plumbers and warehouse workers and retail workers and nurses and teachers and truck drivers for society to function. Why should their time be valued less than anyone else's?
On the other hand there's the economic argument: the supply of people who can stock shelves is greater than the supply of people who can "create value" at a tech company, so the latter deserve more pay.
Depending on how you look at the world, high salaries can seem insane.
I don’t even remotely understand what you’re saying is wrong. Median salaries are significantly higher in the US compared to any other region. Nominal and PPP adjusted AND accounting for taxes/social benefits. This is bad?
Those jobs you referenced do not have the same requirements nor the same wages… it seems like you're just lumping all of those together as "lower class" so you can be champion of the downtrodden.
I do wonder what effect this will have on furthering the divide between the "rich West" and the rest of the world, if everyone in the West has powerful AI and agents to automate everything, simply because we can afford it, while the rest of the world doesn't have access to it.
Anecdotally, as an educator, I am already seeing a digital divide occurring, with regard to accessing AI. This is not even at a premium/pro subscription level, but simply at a 'who has access to a device at home or work' level, and who is keeping up with the emerging tech.
I speak to kids that use LLMs all the time to assist them with their school work, and others who simply have no knowledge that this tech exists.
What are some productive ways students are using LLMs for aiding learning? Obviously there is the “write this paper for me” but that’s not productive. Are students genuinely doing stuff like “2 + x = 4, help me understand how to solve for x?”
I challenge what I read in textbooks and hear from lecturers by asking for contrary takes.
For example, I read a philosopher saying "truth is a relation between thought and reality". Asking ChatGPT to knock it revealed that statement is an expression of the "correspondence theory" of truth, but that there is also the "coherence theory" of truth that is different, and that there is a laundry list of other takes too.
My son doesn't use it, but I use it to help him with his homework. For example, I can take a photograph of his math homework and have the LLM mark the work, tell me what he got wrong, and make suggestions on how to correct it.
Absolutely. My son got a 6th grade AI “ban” lifted by showing how they could use it productively.
Basically they had to adapt a novel to a comic book form — by using AI to generate pencil drawings, they achieved the goal of the assignment (demonstrating understanding of the story) without having the computer just do their homework.
Huh, the first prompt could have been "how would you adapt this novel to comic book form? Give me the breakdown of what pencil drawings to generate and why"
At the time, the tool available was Google Duet AI, which didn’t expose that capability.
The point is, AI is here, and it can be a net positive if schools can use it like a calculator vs a black market. It’s a private school with access to some alumni money for development work - they used this to justify investing in designing assignments that make AI a complement to learning.
I recently saw someone revise for a test by asking chatgpt to create practice questions for them on the topics they were revising. I know other people who use it to practice chatting in a foreign language they are trying to learn.
The analogy I would use is extended phenotype evolution in digital space, as Richard Dawkins would put it: just as crabs in the ocean use shells to protect themselves.
Even if it's not making you smarter, AI is definitely making you more productive. That essentially means you get to outproduce poorer people, if not out-intellectualize them.
Don't you worry; the "rich West" will have plenty of disenfranchised people out of work because of this sort of thing.
Now, whether the labor provided by the AI will be as high-quality as that provided by a human when placed in an actual business environment will be up in the air. Probably not, but adoption will be pushed by the sunk cost fallacy.
I’m watching some of this happening first and second hand, and have seen a lot of evidence of companies spending a ton of money on these, spinning up departments, buying companies, pivoting their entire company’s strategy to AI, etc., and zero of it meaningfully replacing employees. It takes very skilled people to use LLMs well, and the companies trying to turn 5 positions into 2 aren’t paying enough to reliably get and keep two people who are good at it.
I’ve seen it be a minor productivity boost, and not much more.
I mean, yes, that is in practice what I’m seeing so far. A lot of spending, and if they’re lucky productivity doesn’t drop. Best case I’ve seen so far is that it’s a useful tool that gives a small boost, but even for that a lot of folks are so bad at using them that it’s not helping.
The situation now is kinda like back when it was possible to be “good at Google” and lots of people, including in tech, weren’t. It’s possible to be good at LLMs, and not a lot of people are.
Yes. The people who can use these tools to dramatically increase their capabilities and output without a significant drop in quality were already great engineers for which there was more demand than supply. That isn't going to change soon.
Ditto for other use cases, like writer and editor. There are a ton of people doing that work who I don’t think are ever going to figure out how to use LLMs well. Like, 90% of them. And LLMs are nowhere near making the rest so much better that they can make up for that.
They’re ok for Tom the Section Manager to hack together a department newsletter nobody reads, though, even if Tom is bad at using LLMs. They’re decent at things that don’t need to be any good because they didn’t need to exist in the first place, lol.
I disagree. By far, most of the code is created by perpetually replaced fresh juniors churning out garbage. Similarly, most of the writing is low-quality marketing copy churned out by low-paid people who may or may not have "marketing" in their job title.
Nah, if the last 10-20 years demonstrated something, it's that nothing needs to be any good, because a shitty simulacrum achieves almost the same effect but costs much less time and money to produce.
(Ironically, SOTA LLMs are already way better at writing than typical person writing stuff for money.)
> (Ironically, SOTA LLMs are already way better at writing than typical person writing stuff for money.)
I’m aware of multiple companies that would love to know about these, because they’re currently flailing around trying to replace writers with editors + LLMs and it’s not going great. The closest to success are the ones that are only aiming to turn out stuff one step better than outright book-spam, and even they aren’t quite where they want to be, hardly a productivity bump at all from the LLM use and increased demand on their few talented humans.
Yeah, but it’s a bit trickier with them, given how they still operate in the US and are listed on the NYSE. Also, if they keep releasing open-source code, people will still just use it… basically the Meta way of adoption into their AI ecosystem.
If $200 a month is the price, most of the West will be left behind also. If that happens, we will have much bigger problems of the revolutionary sort on our hands.
I think the tech-elite would espouse "raising the ceiling" vs "raising the floor" models to prioritize progress. Each has its own problems. The reality is that the disenfranchised don't really have a voice. The impact of not giving them access is not as well understood as the impact of prioritizing access for those who can afford it.
We don't have a post-Cold War era response akin to the kind of US-led investment in a global pact to provide protection, security, and access to innovation founded in the United States. We really need to prioritize a model akin to the Bretton Woods Accord.
If the models are open, the rest of the world will run them locally.
If the models are closed, the West will become a digital serfdom to anointed AI corporations, which will be able to gouge prices, inject ads, and influence politics with ease.
tbh a lot of the rest of the world already has the ability to get tasks they don't want to do done for <$200 per month in the form of low-wage humans. Some of their middle classes might be scratching their heads wondering why we're delegating creativity and communication to allow more time to do laundry rather than delegating laundry to allow more time for creativity and communication...
I actually suspect the opposite. If you get access to or steal a large LLM you can potentially massively leverage the talent pool you have as a small country.
Has it really made that much of a difference in the first place? I have a feeling that we'll look back in 10 years and not even notice the "AI revolution" on any charts of productivity, creating a productivity paradox 3.0.
I can imagine the headlines now: "AI promised unlimited productivity, 10 years later, we're still waiting for the rapture"
Kai-Fu Lee's AI Superpowers is more relevant than ever.
The rich west will be in the lead for awhile and then get tiktok-ed.
The lead is just not really worth that much in the long run.
There is probably an advantage, at some point in all this, to being a developing country that doesn't need to bother automating all the middle-management and bullshit jobs it doesn't have.
I know a guy who owned a tropical resort on an island where competition was sprouting up all around him. He was losing money trying to keep up with the quality offered by his neighbors. His solution was to charge a lot more for an experience that was really no better, and often worse, than the resorts next door. This didn't work.
After a few hours of $200 Pro usage, it's completely worth it. Having no limit on o1 usage is a game changer; where I felt so restricted before, the amount of intelligence at the palm of my hand, UNLIMITED, feels a bit scary.
I was using aider last night and ran up a $10 bill within two hours using o1 as the architect and Sonnet as the editor. It’s really easy to blow through $200 a month and o1-pro isn’t available in the API as far as I can tell.
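To give a rough sense of how that bill accumulates, here is a toy cost estimate; the per-million-token prices and token counts are placeholder assumptions, not quoted list prices.

```python
# Rough cost-per-session estimate for an architect/editor workflow like aider's.
# Prices are assumed $ per million tokens, purely for illustration.
O1_IN, O1_OUT = 15.00, 60.00         # assumed o1 pricing
SONNET_IN, SONNET_OUT = 3.00, 15.00  # assumed Claude 3.5 Sonnet pricing

def cost(in_tokens, out_tokens, in_price, out_price):
    return (in_tokens * in_price + out_tokens * out_price) / 1_000_000

# Say a two-hour session sends ~300k tokens of context to the "architect"
# model and ~150k to the "editor" model, with modest output on both.
session = (cost(300_000, 30_000, O1_IN, O1_OUT)
           + cost(150_000, 40_000, SONNET_IN, SONNET_OUT))
print(f"~${session:.2f} for the session")  # lands in the ~$7-8 range
```

Repeated context being resent on every request is what makes agent-style coding sessions chew through tokens so quickly.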
I generally find o1, or the previous o1-preview to perform better than Claude 3.5 Sonnet in complex reasonings, new Sonnet is more on-par with o1-mini in my experience.
Creating somewhat complex Python scripts at work to automate some processes which incorporate like 3-4 APIs, and next I'll be replacing our excise tax processing (which costs us like $500/month) since we already have all the data.
Personal use I'll be using it to upgrade all my website code. I literally took a screenshot of Apple.com and combined it with existing code from my website and told o1 pro to combine the two... the results were really good, especially for one shot... But again, I have unlimited fast usage so I can just keep tweaking and tweaking.
I also have this history idea I've been wanting to do for a while, might see if the models are advanced enough yet.
All this with an understanding of how programming works, but without being able to code.
Interesting, thanks for the details. I haven't played around with o1 enough yet. The kinds of tasks I had it do seemed to be performed just as well by 4o. I'm sure I just wasn't throwing enough at it.
A lot of these tools aren't going to have this kind of value (for me) until they are operating autonomously at some level. For example, "looking at" my inbox and prepping a bundle of proposed responses for items I've been sitting on, drafting an agenda for a meeting scheduled for tomorrow, prepping a draft LOI based on a transcript of a Teams chat and my meeting notes, etc. Forcing me to initiate everything is (uncomfortably) like forcing me to micromanage a junior employee who isn't up to standards: it interrupts the complex work the AI tool cannot do for the lower value work it can.
I'm not saying I expect these tools to be at this level right now. I'm saying that level is where I will start to see these tools as anything more than an expensive and sometimes impressive gimmick. (And, for the record, Copilot's current integration into Office applications doesn't even meet that low bar.)
Any AI product sold for a price that's affordable on a third-world salary is being heavily subsidized. These models are insanely expensive to train, guzzle electricity to the point that tech companies are investing in their own power plants to keep them running, and are developed by highly sought-after engineers being paid millions of dollars a year. $20/month was always bound to be an intro offer unless they figured out some way to reduce the cost of running the model by an order of magnitude.
We've been conditioned to pay $10/mo for an endless stream of glorified CRUD apps, but it is very common for specialized software to cost orders of magnitude more. Think Bloomberg Terminal, Cadence, Maya, lots of CAD software (like SOLIDWORKS), higher tiers of Adobe etc. all running in the thousands of dollars per user. And companies happily pay for them because of the value they add. ChatGPT isn't any different.
Tangent. Does anybody have good tips for working in a company that is totally bought in on all this stuff, such that the codebase is a complete wreck? I am in a very small team, and I am just a worker, not a manager or anything. It has become increasingly clear that most if not all my coworkers rely on all this stuff so much. Spending hours trying to give benefit of the doubt to huge amounts of inherited code, realizing there is actually no human bottom to it. Things are merged quickly, with very little review, because, it seems, the reviewers can't really have their own opinion about stuff anymore. The idea of "idiomatic" or even "understandable" code seems foreign at this place. I asked why we don't use more structural directives in our Angular frontend, and people didn't know what I was talking about!
I don't want the discourse, or tips on better prompts. Just tips for being able to interact with the more heavy AI-heads, to maybe encourage/inspire curiosity and care in the actual code, rather than the magic ChatGPT outputs. Or even just to talk about what they did with their PR. Not for some ethical reason, but just to make my/our jobs easier. Because it's so hard to maintain this code now, it is truly a nightmare for me every day seeing what has been added, what now needs to be fixed. Realizing nobody actually has this stuff in their heads, it's all just jira ticket > prompt > mission accomplished!
I am tired of complaining about AI in principle. Whatever, AGI is here, "we too are stochastic parrots", "my productivity has tripled", etc etc. Ok yes, you can have that, I don't care. But can we like actually start doing work now? I just want to do whatever I can, in my limited formal capacity, to steer the company to be just a tiny bit more sustainable and maybe even enjoyable. I just don't know how to like... start talking about the problem I guess, without everyone getting super defensive and doubling down on it. I just miss when I could talk to people about documentation, strategy, rationale..
Found it better to not fight it; you can't really turn back the clock with people who have embraced it or become enamored by it. Part of the issue I've noticed is that it enables people who couldn't do a thing at all to do the most basic version of a thing, e.g. a CEO can now make a button appear on the app and maybe it'll kinda work. They then assume this magic experience applies across the rest of coding, when if you actually know how to code, making the button appear isn't the thing that's difficult; it's the harder work that the AI can't really solve.
But really you're never going to convince these people so I'd say if you're really passionate about coding find a workplace with similar minded people, if you really want to stay in this job then embrace it, stop caring if the codebase is good or maintainable and just let the slop flow. It's the path of least resistance and stress, trying to fight it and convince people is a losing and frustrating battle, take your passion for your work and invest it in a project outside work or find a workplace where they appreciate it too.
> Things are merged quickly, with very little review
Sounds like the real problem is lax pre-existing dev practices rather than just LLM usage. If code is getting merged with little review, that is a big red flag right away. But the 'very little' gives some hope - that means there is some review?
So what happens when you see problems with the code and give review feedback and ask why things have been done the way they were done, or suggest alternative better approaches? That should make it clear first if devs actually understand the code they are submitting, and second if they are willing to listen to suggested improvements. And if they blow you off, and the tech leads on the project also don't care, then it sounds like a place you don't want to stick around.
Question, what stops openai from downgrading existing models so that you're pushed up the subscription tiers to ever more expensive models? I'd imagine they're currently losing a ton of money supplying everyone with decent models with a ton of compute behind them because they want us to become addicted to using them right? The fact that classic free web searching is becoming diluted by low quality AI content will make us rely on these LLMs almost exclusively in a few years or so. Am I seeing this wrong?
It's definitely not impossible. I think the increased competition they've begun to face over the last year is helping as a deterrent. If people notice GPT-4 sucks now and they can get Claude 3.5 Sonnet for the same price, they'll move. If the user doesn't care enough to move, they weren't going to upgrade anyway.
Also depends on the friction to move. I admittedly have not really started using AI in my work, so I don't know. Is it easy to replace GPT with Claude or do I have to reconfigure a bunch of integration and learn new usage?
It depends on the tool you use and I guess the use case too. Some are language-model agnostic, like aider in the command line; I use Sonnet sometimes and 4o other times. I wonder if or when language models will become highly differentiated. Right now I see them more as a commodity, relatively interchangeable, but that is shifting slightly with other features as they battle to become platforms.
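That commodity feel is easiest to see at the API level. A minimal sketch, assuming the alternative provider (or a gateway in front of it) exposes an OpenAI-compatible endpoint; the base_url and model names are placeholders, not recommendations.

```python
# Minimal sketch of treating models as a swappable commodity behind one client.
# Assumes the second provider speaks the OpenAI chat-completions format;
# the base_url and model names are placeholder assumptions.
import os
from openai import OpenAI

def make_client(provider: str) -> tuple[OpenAI, str]:
    if provider == "openai":
        return OpenAI(api_key=os.environ["OPENAI_API_KEY"]), "gpt-4o"
    # Any OpenAI-compatible gateway (self-hosted or third party) slots in here.
    return OpenAI(api_key=os.environ["OTHER_API_KEY"],
                  base_url="https://example-gateway.invalid/v1"), "claude-3-5-sonnet"

client, model = make_client("openai")
reply = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Summarize this diff in two sentences."}],
)
print(reply.choices[0].message.content)
```

As long as the wire format stays interchangeable like this, switching providers is mostly a config change, which is exactly what keeps the pricing pressure on.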
They don’t need to downgrade what is already downgraded. In my experience ChatGPT was much more capable a year ago than it is now and has become more dogmatic. Their latest updates have focused on optimizing benchmark scenarios while reducing computation costs.
What's important, and I don't think has ever been revealed by OpenAI, is what the margin is on actual use of the models.
If they're losing money only because they're investing billions in R&D, while spending just a few hundred million to serve the use that's bringing in $1.6B, then it would be a positive story despite the technical loss, just like Amazon's years of aggressive growth at the cost of profits.
But if they're losing money because the server costs needed for the use that brings in $1.6B are $3B then they've got a scaling problem until they either raise prices or lower costs or both.
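The gap between those two stories is just a couple of lines of arithmetic; the $1.6B comes from the figure mentioned above, while the serving-cost numbers are made-up scenario inputs.

```python
# Two toy scenarios for the same $1.6B of revenue (figure from the comment above);
# the serving-cost numbers are made-up scenario inputs, not reported data.
revenue = 1.6e9

def gross_margin(revenue, serving_cost):
    return (revenue - serving_cost) / revenue

# Scenario A: inference is cheap, and losses come from R&D spend.
print(f"A: {gross_margin(revenue, 0.3e9):.0%} gross margin")  # 81%
# Scenario B: inference alone costs more than the revenue it brings in.
print(f"B: {gross_margin(revenue, 3.0e9):.0%} gross margin")  # -88%
```

Scenario A is the Amazon-style story; Scenario B is a scaling problem that price rises or cost cuts have to fix.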
Part of my justification for spending $20 per month on ChatGPT Plus was that I'd have the best access to the latest models and advanced features. I'll probably roll back to the free plan rather than pay $20/mo for mid tier plan access and support.
In the past, $20 got me the most access to the latest models and tools. When OpenAI rolled out new advanced features, the $20 per month customers always got full / first access. Now the $200 per month customers will have the most access to the latest models and tools, not the (now) mid/low tier customers. That seems like less to me.
They probably didn't pay for access to a certain version of a model, they paid for access to the best available model, whatever that is at any given moment. I'm reasonably sure that is even what OpenAI implied (or outright said) their subscription would get them. Now, it's the same amount of money for access to the second best model, which would feel like a regression.
Did you read the post you're replying to? It's very short. He was paying for top-tier service, and now, despite paying the same amount, has become a second-class customer overnight.
It does not say anything about real use cases. It performs better and "reasons" better than o1-preview and o1. But I was expecting some real-life scenarios where it would be useful in a way no other model can manage right now.