Hacker News — lupire's comments

How does it do on problems whose solutions aren't published in 100-year-old books?

It doesn't just regurgitate the answer, it shows its work, and it still works if you change the numbers.

Being able to access, apply, and restate reasoning processes described in 100-year-old books isn't the insult you think it is.


Your code also has a bug, which would affect the result in some variations of the problem with different parameters, but fortunately does not affect this specific version. You also used 100.

Yes, if you follow the link to the problem, that page gives the trivial algebra solution.

The matrix version is interesting, but it's bizarrely motivated as an overcomplicated way to solve a simple problem.


Does OpenAI have a moat?

As a layman outsider, it doesn't seem like it. Anthropic is doing great work (I personally prefer Claude) and now there are so many quality LLMs coming out that I don't know if OpenAI is particularly special anymore. They had a lead at first, but it feels like many others are catching up.

I could be _very_ wrong though.


Agreed. Sonnet 3.5 is still by far the most useful model I've found. o1-mini is priced similarly and nowhere near as useful, even for programming, at which it is supposed to excel. I recently tried o1-mini with `aider` and it would randomly start responding in Russian midway through, despite all input being in English. If anything, I think Anthropic still has a decent lead when it comes to price-to-performance. Their updates to Haiku and Opus will be very interesting.

They recently released a new model, called "o1-preview", that is significantly ahead of the competition in terms of mathematical reasoning.

Is it? There was some discussion on HN a while ago that it is better than gpt4o, but nothing about the competition, and that claim seems quite doubtful compared to e.g. AlphaProof.

Also, if "significantly ahead" just means "a few months ahead" that does not justify the valuation.


On benchmarks where it’s impossible to verify whether there’s contamination?

> that is significantly ahead

Perhaps, but at its most generous, it's three months ahead of competitors, I imagine


No.

The race is, can OpenAI innovate on product fast enough to get folks to switch their muscle memory workflows to something new?

It doesn't matter how good the model is, if folks aren't habituated to using it.

At the moment, my muscle memory is to go to Claude, since it seems to do better at answering engineering questions.

The competition really is between FAANG and OpenAI: can OpenAI accumulate users faster than Apple, Google, Meta, etc. can layer AI-based features onto their existing distribution surfaces?


Hard to say in my opinion. I can say that I still use OpenAI heavily compared to the competition. It really depends though. I do believe they are still leaders in offering compelling apis and solutions.

It still has the first mover advantage, based on the revenue and usage graphs.

If somebody puts out a cheaper and better version, then there's no moat.


It's called llama. And it's free.

Llama sucks compared to the best models, sorry, you can't really be serious.

I have only tried it with gpt4. Seems to be doing a pretty good job? What models should I try?

Eh, in the B2C play, sure; if they nail the enterprise, maybe not.

Aside from vendor lock-in by making their integrations (APIs) as idiosyncratic and multifaceted as possible, I don’t think so.

It's the company that's most likely to be the first to develop superintelligence.

Based on… CEO proclamations?

Based on a comparison with DeepMind and Anthropic.

Define super intelligence first maybe?

More intelligent than any human.

What is intelligence and how does one measure that?

What is measurement? What is a definition?

Well, the original claim is that superintelligence is going to be achieved by OpenAI. So I assume you have defined it and figured out a way to measure it in the first place, so that you know when it has been achieved.

So it's probably on you to explain it, since you came up with that claim.


I think first you have to explain what you mean by definition and measurement, since you were the one who asked for those.

We heard for years that Uber was the company most likely to be the first to develop self-driving cars. Until they weren't. You can't just trust what the CEOs are hyping.

I think Waymo was always ahead in terms of self-driving, and still is today.

Uber's autonomous division had more hype around it, and the company's valuation was largely based on the idea of replacing human drivers "very very soon". Now the bulk of their revenue comes from food delivery.

If superintelligence happens, then money won't matter anymore anyway.

I don't disagree, but what makes you say this?

In fact they do: it's called servers, GPUs, and scale. You need them to train new models and to serve them. They also have speed, and in AI speed is a non-traditional moat. They've got crazy connections too, because of Sam. All of that together becomes a moat, so someone just can't do a "Facebook clone" of OpenAI.

Someone certainly can "Facebook clone" OpenAI. Google, Meta, and Apple are all better capitalized than OpenAI, operate at a larger scale, and are actively training and publishing their own models.

Not just anyone; it would be tough. You could also say that any one of these companies could do a Facebook clone, but it wouldn't be easy.

OpenAI is dependent on Microsoft for GPUs, who are in turn dependent on Nvidia for GPUs. It’s nearly the least moat-y version of this out there.

Used to be when they had no money

Money doesn't just give you hyperscaler datacenters or custom silicon competitive with Nvidia GPUs. Money and 5 years might, but as this shows, OpenAI only really has a 1.5-year runway at the moment, and you can't build a datacenter in that time, let alone perfect running one at scale; the same goes for chip design.

I’m building several commercial projects with LLMs at the moment. 4o mini has been sufficient, and is also super cheap. I don’t need better reasoning at this point, I just need commoditization, and so I’ll be using it for each product right up to the point that it gets cheaper to move up the hosting chain a little with Llama, at which point I won’t be giving them any money.

They’ve built a great product and the price is good, but it’s entirely unclear to me that they’ll continue to offer special sauce here compared to the competition.
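The commoditization point above can be sketched in code. Many Llama hosting providers expose an OpenAI-compatible chat-completions endpoint, so switching providers mostly means changing a base URL and model name. This is a hypothetical illustration (the function name and example URLs are mine, not from the comment):

```python
# Sketch of why an OpenAI-compatible API is a weak moat: the request
# shape is identical whether it targets OpenAI or a self-hosted Llama
# endpoint, so client code barely changes when you switch providers.

def build_chat_request(base_url: str, model: str, prompt: str) -> tuple[str, dict]:
    """Return the endpoint URL and JSON payload for a chat-completions call."""
    url = f"{base_url.rstrip('/')}/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, payload

# Same client code, two different providers: only base URL and model change.
openai_req = build_chat_request("https://api.openai.com/v1", "gpt-4o-mini", "Hello")
local_req = build_chat_request("http://localhost:8000/v1", "llama-3-8b-instruct", "Hello")
```

The payloads are structurally identical, which is exactly what makes "moving up the hosting chain" to Llama cheap when the economics favor it.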


It does. "ChatGPT", GPT, "OpenAI", etc... are strong brands associated with it.

Edit: You can downvote me all you want, I have plenty of karma to spare. This is OpenAI's strongest moat, whether people like it or not.


GPT is a generic tool name.

Those moats are pretty weak. People use Apple Idioticnaming, MS Copilot, or Google whatever, which transparently use some interchangeable model in the background. Compared to ChatGPT these might not be as smart, but they have much easier access to OS-level context.

In other words: Good luck defending this moat against OS manufacturers with dominant market shares.


>Those moats are pretty weak.

Name any other AI company with better brand awareness and that argument could make a little bit of sense.

Armchair analysts have been saying that since ChatGPT came out.

"Anyone could steal the market, anytime" and there's a trillion USD at play, yet no one has, why? Because that's a delusion.


What you are overlooking is the fact that AI today, and especially AI in the future, is going to be about integrations: assisted document writing, image generation for creative work, etc. Very few people will look at the tiny gray text saying "Powered by ChatGPT" or "Powered by Claude"; name recognition is not as relevant as it is for, e.g., the iPhone.

Anecdotally, I used to pay for ChatGPT. Now I run a nice local UI with Llama 3. They lost revenue from me.


> Name any other ~~AI~~ company with better brand awareness and that argument could make a little bit of sense.

I just gave you three of them. Right now a large share of ChatGPT's customers come from the integrations provided by those three.

> "Anyone could steal the market, anytime" and there's a trillion USD at play, yet no one has, why? Because that's a delusion.

Bullshit. It is not about "stealing" but about carving out a significant niche. And that has happened: Apple Intelligence happens in large part on-device using not-ChatGPT, Google's circle-to-search, summaries, etc. use not-ChatGPT, and Copilot uses not-ChatGPT.

The danger to a moat is erosion not invasion.


Nobody cares, though, really. My experience is that clients are only passingly interested in what LLM powers the projects they need and entirely interested in the deployed cost and how well the end product works.

Which provider do you use for your clients' AI solutions? Be honest.

Edit: nvm, I already know.

https://news.ycombinator.com/item?id=36028029

https://news.ycombinator.com/item?id=41725073


It's all part and parcel of a deeply mathematical culture.

"Math as a profession" is a limited subset of "professions that rely heavily on math", despite what some mathematicians might say.


They invented space ships, for one thing.

Russia was a scientific power in the 19th century, before the Soviet Union, and continued to be during the Soviet era. The West had limited access to it due to the Cold War.

https://en.m.wikipedia.org/wiki/Science_and_technology_in_Ru...


The Soviet Union wasn't called a superpower for nothing. The USSR had many world-class achievements in scientific and applied areas, and some organizational achievements in social and manufacturing areas. There are examples and counterexamples, but the result is what we have: while in some areas ex-Soviets were seen as backwards people in the early 1990s, in others they brought real positive advancements to the West - or First World - when the borders became open.

Case in point: the reason the US relied heavily on Soviet rocket engines for its launches for ~15 years (before SpaceX's dominance) was that they were simply more advanced and cost-effective. Materials science was apparently a step above - Soviet scientists were able to create an alloy for use in oxygen-rich engines that was unbelievable to Western counterparts until they visited and saw it demonstrated.

This is one example, and there could be many - both where the USSR had an edge and where it was behind. I believe here we want the overall picture: there actually were some novelties that were interesting to the West, even though in overall quality of life and associated measures the USSR was notably losing. Or, put another way, the USSR wasn't advanced enough to avoid dissolution after - though not necessarily caused by - the Cold War, even though it had some achievements unavailable in the West.

Yes, that is what I also wanted to point out here: the set of novelties the Soviets had over the West was at least non-zero. And that rocketry happened to be one of them may surprise some of those less informed about space technology.

Lol what

6 years ago and 10 years ago, with a few comments: https://hn.algolia.com/?q=Math+from+Three+to+Seven

This book and the culture it comes from are so influential that many people who did "enrichment" have already been exposed to many of the activities in the book. The most famous may be the Scratch Jr / code.org style of introductory computer programming, but with pencil and paper.


Ah good find. Macroexpanded:

Math from Three to Seven: The Story of a Mathematical Circle for Preschoolers [pdf] - https://news.ycombinator.com/item?id=17018583 - May 2018 (20 comments)

Math from Three to Seven: Chapters One and Three [pdf] - https://news.ycombinator.com/item?id=8811334 - Dec 2014 (21 comments)


So it was prestigious, but it wasn't seen as a profitable investment before it made its first dollar.

It wasn't. If you told someone back then that you had dropped out of Harvard, they would think you were making an odd choice. That said, it was never very risky, since Harvard will take you back if you drop out, but it was at least unusual.

Is there a tracker of all YC companies and their outcomes?


It's used as an attack, so both.
