I felt this way until 3.7 and then 2.5 came out, and o3 now too. Those are clear step-ups from the models of mid-to-late 2024, when all the talk of stalling was coming out.
None of this includes hardware optimizations either, which lag software advances by years.
We need 2-3 years of plateauing to really say intelligence growth is exhausted; we have just been so inundated with rapid advances that small gaps seem like the party ending.
I think that the tech exists for more interactive AI generation, it's just gonna take time to implement.
I foresee something like your standard production software with heavy AI integration, where you prompt it to make the song you want, but it is made fully step by step in the production environment. You can then manually tweak it or ask the AI to fine tune whatever parameter or slice you want.
Kinda like sitting over the shoulder of someone who knows what they are doing, and working collaboratively with them to accomplish the idea you have in your head. Meanwhile you have practically no idea what all those buttons/lines/glowy bits/sliders do.
We have found out, it's just that the people who do the finding out generally have money, so their opinion is automatically discounted.
It's a bit like forever-single people getting so lost in the idea of a relationship and intimacy: that everything will be great once they have someone, once they have connection, that life will be amazing and nothing else will matter. Their life sucks because they don't have a relationship. People in relationships don't know what it's like, so their opinion is invalid.
Then they get in a relationship and learn that it's actually comparatively banal and requires a lot of work and compromise, and definitely was not the insanely-built-up-over-many-years-magical-life-cure-all.
There are -endless- stories of people who made it rich early on, retired, and ended up in a mental health crisis despite having everything. That fact should be taken as a reality check to calibrate your own perceptions.
I have no data either way but I can imagine that there are many more people who are wealthy and quietly having a great time with it. Most of the retired people I've known, early or not, also enjoyed it. Some have definitely taken up work-like pursuits on their own terms.
Secondly, the wealth being the means to achieve this is itself a confounding variable. I don't think it's good for your mind or soul to "have everything," no. Life isn't and shouldn't be merely a series of your own preferences. That doesn't indicate to me that lacking confidence in your mere survival is necessary for human thriving. As far as I know, research indicates the opposite.
I don't understand why people are so incredulous that virtue signalling is rampant and that even the "good guys" (whatever group you want to attribute that to) are mostly full of people who know the right thing to say, will gladly say it repeatedly to garner praise, but will not follow it when it comes to them.
I, or anyone else who has tried a "virtuous venture", could easily have told this company not to waste its time. The takeaway here isn't "they screwed this up" or "this isn't a true test". The takeaway is "people are extremely self-serving when the perceived impact is small and no one is there to judge them for it." Plan your business accordingly.
While it's not directly indicated in this article, I won't conclude that the experiment is useless. Presenting the option educates consumers about what prices look like under tariffs.
Yeah, people are gawking at the examples and don't realize that yes, these are the legitimate costs of trying to move local immediately. You don't just "catch up" to the decades of investment China has put into manufacturing. And that catch-up will be expensive.
The inaccuracies are that it is called "Marathon Valley" (not a crater), and that it was photographed in April 2015 (from the rim) and actually entered in July 2015. The other details are correct.
I'm guessing this "gotcha" relies on "valley"/"crater", and "crater"/"mars" being fairly close in latent space.
ETA: Marathon Valley also exists on the rim of Endeavour crater. Just to make it even more confusing.
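A rough way to poke at that intuition, purely as a sketch: compare embedding similarities with an off-the-shelf sentence-embedding model (this assumes the sentence-transformers package; the model name is just an example and is not the latent space of any particular chat LLM):

    # Sketch: pairwise similarity of the phrases involved, using a generic
    # sentence-embedding model as a stand-in for "latent space" (an assumption).
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # example model only
    phrases = ["Marathon Crater", "Marathon Valley", "Marathon Desert", "Endeavour crater"]
    embeddings = model.encode(phrases)

    # If the guess holds, "Marathon Crater" should score much closer to
    # "Marathon Valley" than "Marathon Desert" scores to anything here.
    for i in range(len(phrases)):
        for j in range(i + 1, len(phrases)):
            score = util.cos_sim(embeddings[i], embeddings[j]).item()
            print(f"{phrases[i]!r} vs {phrases[j]!r}: {score:.3f}")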
None of it is correct because it was not asked about Marathon Valley, it was asked about Marathon Crater, a thing that does not exist, and it is claiming that it exists and making up facts about it.
Or it's assuming you are asking about Marathon Valley, which is very reasonable given the context.
Ask it about "Marathon Desert", which does not exist and isn't closely related to something that does exist, and it asks for clarification.
I'm not here to say LLMs are oracles of knowledge, but I think the need to carefully craft specific "gotcha" questions in order to generate wrong answers is a pretty compelling case in the opposite direction. Like the childhood joke of "What's up?"..."No, you dummy! The sky is!"
Straightforward questions with straight wrong answers are far more interesting. I don't think many people ask LLMs trick questions all day.
If someone asked me or my kid "What do you know about Mt. Olampus?" we wouldn't reply: "Oh, Mt. Olampus is a big mountain in Greek myth...". We'd say "Wait, did you mean Mt. Olympus?"
It doesn't "assume" anything, because it can't assume, that's now the machine works.
> None of it is correct because it was not asked about Marathon Valley, it was asked about Marathon Crater, a thing that does not exist, and it is claiming that it exists and making up facts about it.
The Marathon Valley _is_ part of a massive impact crater.
If you asked me for all the details of a Honda Civic and I gave you details about a Honda Odyssey you would not say I was correct in any way. You would say I was wrong.
Anyone who uses Instagram should be abundantly aware of this. The default behavior of the app became "Serve you all content we think you would like, in the order we think you would enjoy it". This pretty much means "You may or may not see the content of channels/people you specifically follow".
The app went from just showing you a stream of posts from people you follow, to just showing you a stream of posts it thinks you would like.
What is worse is that the feed is generated on the fly. Switch apps for a second and your OS kills Instagram in the background, and you might never find those posts it showed you a few minutes ago again.
I have the opposite problem. Every time Instagram starts in the background (allegedly to check for feed updates but probably to get my geolocation) it uses so much memory it pushes out things like my on-screen keyboard. No doubt Meta has figured out ways to manipulate Android to get priority over the keyboard, and only tested it on the very latest phones.
I use it exclusively for announcements from certain brands with e.g. seasonal rotations or sales (small shops, especially, are often way more consistent about updating one or more social media accounts, often Insta, than their website, if they even have a website) and it's such a pain in the ass for that reason. I don't trust ads or their "algorithm" to promote quality (I reckon they're more likely to promote rip-offs and fly-by-night operations) so I super don't care about anything else they want to show me, even if it's directly related to the kinds of brands I'm following. I deliberately do not do new-stuff discovery in the app, because they have incentives to screw me.
The only thing I want out of it is to see the posts made by the accounts I'm following, since the last time I checked. That's 100% of the functionality I care about, and the app goes out of its way to not deliver it.
I'm a classic engineer, so I have lots of experience with systems and breaking down problems, but probably <150 hours of programming experience over 15 years. I know how computers work and "think", but I am awful at communicating with them. Anytime I have needed to program something, I gotta crash-course the language for a few days.
Having LLMs like 2.5 now is a total game changer. I can basically flowchart a program and have Gemini manifest it. I can break up the program into modules and keep spinning up new instances when the context gets too full.
The program I am currently working on is up to ~5500 LOC, probably across 10ish 2.5 instances. It's basically an inventory and BOM management program that takes in bloated Excel BOMs and inventory, puts it all in an SQLite database, and has a nice GUI. Absolutely insane how much faster SQLite is for databases than Excel, lol.
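If anyone is curious what the Excel-to-SQLite piece can look like, here's a minimal sketch (assuming pandas plus the stdlib sqlite3 module; the file, sheet, table, and column names are hypothetical examples, not what my actual program uses):

    # Minimal sketch: pull a BOM out of Excel and drop it into SQLite.
    # The "BOM" sheet and "part_number" column are hypothetical examples.
    import sqlite3
    import pandas as pd

    def load_bom(xlsx_path: str, db_path: str = "inventory.db") -> None:
        # Read the bloated Excel BOM into a DataFrame.
        bom = pd.read_excel(xlsx_path, sheet_name="BOM")

        # Normalize column names so queries stay predictable.
        bom.columns = [str(c).strip().lower().replace(" ", "_") for c in bom.columns]

        # Write it to SQLite; an index on the part number keeps lookups fast.
        with sqlite3.connect(db_path) as conn:
            bom.to_sql("bom", conn, if_exists="replace", index=False)
            conn.execute("CREATE INDEX IF NOT EXISTS idx_bom_part ON bom(part_number)")

    if __name__ == "__main__":
        load_bom("example_bom.xlsx")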
I've heard a _lot_ of stories like this. What I haven't heard is stories about the deployment of said applications and the ability of the human-side author to maintain the application. I guess that's because we're in early days for LLM coding, or the people who did this aren't talking (about their presumed failures... people tend to talk about successes publicly, not the failures).
At my day job I have 3 programs written by LLMs used in production: one written by GPT-4 (in spring 2023) and recently upgraded by Gemini 2.5, and the other two by Claude 3.7.
One is an automatic electronics test system that runs tests and collects measurements (50k+ readings across 8-12 channels) (GPT-4, now with a GUI and a faster DB thanks to 2.5). One is a QC tool to help quickly make QC reports in our company's standard form (3.7). And the last is a GUI CAD tool for rendering and quickly working through ancient manufacturing automation scripts from the '80s/'90s to bring them up to compatibility with modern automation tooling (3.7).
I personally think there is a large gap between what programs are and how each end user ultimately uses them. Programs are made with a vast scope, but often used narrowly by individuals. The proprietary CAD program we were originally going to use for the old files was something like $12k/yr for a license, and it is a very powerful software package. But we just needed to do one relatively simple thing. So rather than buying the entire buffet, or the entire restaurant, Claude was able to just make a simple burger.
Would I put my name on these and sell to other companies? No. Am I confident other LLM junkies could generate similar strongly positive outcomes with bespoke narrow scope programs? Absolutely.
People who can't spin pottery shouldn't be allowed to have bowls, especially mass produced by machine ones.
I understand your point, but I think it is ultimately rooted in a romantic view of the world, rather than the practical truth we live in. We all live a life completely inundated with things we have no expertise in, available to us at almost trivial cost. In fact it is so prevalent that just about everyone takes it for granted.
The best way to combat this now is to probably not talk about it. Just like vulnerabilities in white hat scenarios, let the developers know and then have a lead time before releasing the information publicly.
Ironically this study comes from two safety oriented organizations, so I question their reasoning for running to make it public that you can use SOTA models right now to do the knowledge legwork for creating deadly bioweapons.