The benchmarks are cool and all, but 1M context on an Opus-class model is the real headline here imo. Has anyone actually pushed it to the limit yet? Long context has historically been one of those "works great in the demo" situations.
Boris Cherny, creator of Claude Code, posted a month ago about how he uses Claude. He's got half a dozen Opus sessions running constantly. So yes, I expect it's unmetered.
Opus 4.5 starts being lazy and stupid at around the 50% context mark in my experience, which makes me skeptical that this 1M context mode can produce good output. But I'll probably try it out and see.
Has a "N million context window" spec ever been meaningful? Very old, very terrible, models "supported" 1M context window, but would lose track after two small paragraphs of context into a conversation (looking at you early Gemini).
Umm, Sonnet 4.5 has a 1M context window option if you are using it through the API, and it works pretty well. I tend not to reach for it much these days because I prefer Opus 4.5 so much that I don't mind the added pain of clearing context, but it's perfectly usable. I'm very excited I'll get this from Opus now too.
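For anyone curious, the long-context option is just a beta flag on the API call. Here's a minimal sketch with the Python SDK; the model alias and beta flag are from memory, so double-check the docs before relying on them:

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # The 1M-token window is opt-in via a beta flag; without it you get the default window.
    message = client.beta.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        betas=["context-1m-2025-08-07"],
        messages=[{"role": "user", "content": "Summarize this repo dump: ..."}],
    )
    print(message.content[0].text)

IIRC requests that go past the standard window are billed at a higher input rate, so keep an eye on token counts.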
If you're getting along fine with Opus 4.5 as it is, that suggests you didn't actually need the large context window for your use. If that's true, what's the clear tell that it's working well? Am I misunderstanding?
Did they solve the "lost in the middle" problem? The proof will be in the pudding, I suppose. But that number alone isn't all that meaningful for many (most?) practical uses. Claude 4.5 often starts reverting bug fixes made ~50k tokens back, which isn't a context window length problem.
Things fall apart well before the context window limit for all of my use cases (which are more reasoning-related). What is a good use case? Do those use cases require strong verification to combat the "lost in the middle" problem?
Living in the EU, I'm skeptical any of this happens. Our leaders have been pretty reluctant to push back on anything so far and most of these assets are private anyway.
Hi Troy, just wanted to let you know that I sent you an email! :)
Also, just to be sure, I sent it to the on-board.ai domain as well, as that seemed to be the correct website (onboard.ai just showed a "for sale" page). Might help some others too.
Google login also seems to be having issues; multiple people have reported to me that the login isn't working and that they've been logged out of their Google accounts.
Yes, I tried logging in today with two distinct Google accounts on separate Chrome profiles, and it would sign me out about 5 seconds after logging in. The login process was also very sluggish.
> Tens of thousands of people each year receive a series of shots to prevent rabies after a possible exposure. It normally costs between $1,200 and $6,800. Not in this case.
Most Americans don't realize how bad they have it. They've grown accustomed to being punched and slapped around and so won't rebel. They'll keep handing over what little (and shrinking) they have to legitimized criminals.
We do have the Internet, but we've gotten used to being told that the Internet lies to us. We've been repeatedly told that people wait months for treatment in the UK, and that Canadians are streaming over the border to get health care in America.
We read horror stories like this one, but say "Whew, glad that won't happen to me." We imagine that because of capitalism, if our insurance company screws us over, we'll just switch to the next one -- a freedom we wouldn't have if we had national health care.
It never seems to occur to us that all of the private insurers have a capitalism-driven goal of maximizing profits, while national insurers don't.
> Stable Diffusion? Well, that’s free-ish. Problem is, you’ll probably want to run it locally, which requires a really, really beefy graphics card. I was struggling to run it on a Vega56 - a GPU that goes for ~$150 used now - so I went out and got a RTX3090 for about $1,000. If you’re already a gamer with a GPU with 8Gb+ of VRAM you’re probably good, but for most people this is a bit absurd.
I agree with the problem; on platforms like AWS you'll even need to send a manual request before they'll let you use instances that can run SD. On the other hand, there's already something like replicate.com, which lets you run SD via an API. I hope there will be more services like this.
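The basic flow with their Python client is pretty minimal. A rough sketch below; the model slug is just an example (check replicate.com for the exact model/version you want, since some need an explicit version hash pinned after a colon):

    # pip install replicate; set REPLICATE_API_TOKEN in your environment
    import replicate

    # Example model slug; community models usually need "owner/model:versionhash".
    output = replicate.run(
        "stability-ai/stable-diffusion",
        input={"prompt": "an astronaut riding a horse, photorealistic"},
    )

    # Image models typically return a list of URLs (or file-like objects
    # in newer client versions) pointing at the generated images.
    for item in output:
        print(item)

You pay per second of compute instead of renting a whole instance, which is the main appeal over the AWS route.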
I'm a bit surprised they don't mention Snap anywhere on the site except for a link in the footer (something like "by Snap"). Generally I would expect a much better experience from a single-page website promoting one product.