And it performs very well on the latest 100 puzzles too, so it isn't just learning the data set (unless, I guess, they routinely index this repo).
I wonder how well AIs would do at Bracket City. I tried Gemini on it and was underwhelmed. It made a lot of terrible connections and often bled data from one level into the next.
I mean, the repo has <200 stars, it's not like it's so mainstream that you'd expect LLM makers to be watching it actively. If they wanted to game it, they could more easily do that in RL with synthetic data anyway.
Belated update on this. Gemini's reasoning model did much better than the quick one on Bracket City today (an easy puzzle, but still). It only failed to solve one clue outright; it got another wrong, but that was due to ambiguity in the referenced expression, and in a way that still fit the next level down, so the final answer was solved fairly cleanly. It still clearly has a harder time with it than with the Connections puzzle.
Talk to federal contractors about their experience with DOGE. They’re literally cancelling contracts and helping their friends recapture the money as new federal contracts, since the money needs to be spent anyway.
The congressional hearings over this are going to be excellent CSPAN viewing.
I think that it is a nice observation. Some people complain that explaining the formation of a rainbow scientifically makes it lose its sense of awe, but I think it actually deepens it.
Actually, property 5) trivially implies 1), but also 2): `(1+2+...+n)² = n²(n+1)²/4`, and either n or n+1 must be divisible by 2, so one of the two squares is divisible by 4, hence the whole thing is a product of squares. It also implies property 4), as `(1+2+...+n)² = 1³+2³+...+n³` (easy to show by induction).
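Both identities are easy to sanity-check numerically; a quick sketch in Python:

```python
# Check (1+2+...+n)^2 = n^2 (n+1)^2 / 4 and the sum-of-cubes
# (Nicomachus) identity (1+2+...+n)^2 = 1^3 + 2^3 + ... + n^3.
for n in range(1, 100):
    tri = n * (n + 1) // 2                  # triangular number 1 + 2 + ... + n
    assert tri**2 == n**2 * (n + 1)**2 // 4
    assert tri**2 == sum(k**3 for k in range(1, n + 1))
```

(Not a proof, of course; the induction argument is still the way to show property 4.)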
Great to know that someone else too keeps track of squares.
The ages that are perfect squares are when we all cross or achieve significant milestones in our lives as children, students, (young) adults, spouses, parents, grandparents, senior citizens of society, and so on.
This year being a perfect square, I wish that it will be as special for everyone as, or even more special than, it was at those ages.
My youngest is fascinated by squares at the moment. Luckily for him, he is 4 years old, his older brother is 9, while I just turned 36. He will be delighted when I tell him that we are entering 45 squared!
Well things have already been a tad rough around this square, so if we follow the trend, the next square might turn bad even sooner. So maybe around, I dunno, 2101?
1225: ten years earlier, Magna Carta starting to limit monarchs and the seed of individual freedom
1681: eight years later came the Glorious Revolution with a Bill of Rights, marking individual freedoms
1764: ten years later, beginning of American Revolution and being free of monarchs
1849: ten-ish years later, start of the US Civil War; was the time of an attempt by the British to end slavery around the world
1936: ten years later, colonial empires were being dismantled, UN established to attempt global cooperation, US in the ascendancy with a seed of ties being established more by economics than military force, great economic upswing lifting people out of poverty (60% in poverty then, 10% now) while the global population blossoms
2035: Majority of the global population in middle class or better, triumph of individuals over technocrats, bureaucrats, and corporatists :)
Even if they don’t (or if it’s partially networked as some recent rumors suggest), it’ll be rolled into one or both of two predictable costs (to the consumer):
1. The device sale itself, either raising the ASP or offsetting some other cost saving (to Apple)
2. Recurring payments for iCloud (or any rebranding it might undergo along with the feature)
Apple’s pricing model, if not totally predictable, is exceedingly formulaic. If they deviate from it into some sort of nickel-and-diming on “AI” features alone, that would almost certainly be a clear sign that they’re betting against it as a long-term selling point.
This indeed seems to have been a heavy focus of their research team in the past year, e.g. "Efficient Large Language Model Inference with Limited Memory" [1] and OpenELM [2].
Maybe a very cut down version - any of the more recent and capable OpenAI models are surely far too large to put on an iPhone, and far too large to run (in terms of both memory available, and processing power).
This would maybe align with the 'limited abilities are free' approach.
In the following, Opus bombed hard by ignoring the "when" component, replying with "MemoryStream", whereas ChatGPT (I think correctly) said "no":
> In C#, is there some kind of class in the standard library which implements Stream but which lets me precisely control when and what the Read call returns?
---
In the following, Opus bombed hard by inventing `Task.WaitUntilCanceled`, which simply doesn't exist. ChatGPT said "no", which actually isn't true (I could use `.ContinueWith` to complete a `TaskCompletionSource`, or there's probably a way to do it with an await in a try-catch and a subsequent check of the task's status), but it does at least immediately make me think about how to do it rather than going through a loop of trying a wrong answer.
> In C#, can I wait for a Task to become cancelled?
---
In the following exchange, Opus and ChatGPT both bombed (the correct answer turns out to be "this is undefined behaviour under the POSIX standard, and .NET guarantees nothing under those conditions"), but Opus got into a terrible mess whereas ChatGPT did not:
> In .NET, what happens when you read from stdin from a process which has its stdin closed? For example, when it was started with { ./bin/Debug/net7.0/app; } <&-
(both engines reply "the call immediately returns with EOF" or similar)
> I am observing instead the call to Console.Read() hangs. Riddle me that!
ChatGPT replies with basically "I can't explain this" and gives a list of common I/O problems related to file handles; Opus replies with word salad and recommends checking whether stdin has been redirected (which is simply a bad answer: that check has all the false positives in the world).
---
> In Neovim, how might I be able to detect whether the user has opened Neovim by invoking Ctrl+X Ctrl+E from the terminal? Normally I have CHADtree open automatically in Neovim, but when the user has just invoked $EDITOR to edit a command line, I don't want that.
Claude invents `if v:progname != '-e'`; ChatGPT (I think correctly) says "you can't do that, try setting env vars in your shell to detect this condition instead"
Now that I come to think of it, maybe the problem is that I only ask these engines questions whose answer is "what you ask is impossible", and ChatGPT copes well with that condition while Opus does not.
It's literally what I did at work last week, which is why I found this submission timely. I'd have to check with my employer if it can be made public. I don't see any reason why not, there's not much to it.
What did you use to implement the regularization of the trend breakpoints? Prophet by default uses a regular grid of candidate changepoints and thins them out with Stan. I couldn't find a quick regularization replacement in numpy/scipy/statsmodels with equivalent performance. (I don't want to drag in another huge dependency like Torch or TF.)
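For what it's worth, Prophet's Laplace prior on the changepoint deltas is equivalent to an L1 penalty on the slope changes, and that much is expressible with numpy + scipy alone. A rough sketch (the function name and API are my own invention, not anyone's library): the non-smooth L1 term is handled by the standard trick of splitting each delta into non-negative positive/negative parts, which turns the problem into a smooth bound-constrained one that L-BFGS-B can solve.

```python
import numpy as np
from scipy.optimize import minimize

def fit_trend_l1(t, y, changepoints, lam=1.0):
    """Piecewise-linear trend with an L1 penalty on changepoint deltas.

    Mimics Prophet's Laplace prior: slope changes at candidate
    changepoints are penalized so most shrink to (near) zero.
    Each delta d is written as d = dp - dn with dp, dn >= 0, so the
    L1 term becomes the linear lam * (sum(dp) + sum(dn)).
    """
    H = np.column_stack([np.maximum(0.0, t - s) for s in changepoints])
    B = np.column_stack([np.ones_like(t), t])  # intercept + base slope, unpenalized
    k = H.shape[1]

    def unpack(z):
        b, dp, dn = z[:2], z[2:2 + k], z[2 + k:]
        return b, dp - dn

    def obj(z):
        b, d = unpack(z)
        r = B @ b + H @ d - y
        return 0.5 * r @ r + lam * z[2:].sum()

    def grad(z):
        b, d = unpack(z)
        r = B @ b + H @ d - y
        gH = H.T @ r
        return np.concatenate([B.T @ r, gH + lam, -gH + lam])

    z0 = np.zeros(2 + 2 * k)
    bounds = [(None, None)] * 2 + [(0.0, None)] * (2 * k)
    res = minimize(obj, z0, jac=grad, method="L-BFGS-B", bounds=bounds)
    b, d = unpack(res.x)
    return b, d, B @ b + H @ d
```

`lam` plays (inversely) the role of Prophet's `changepoint_prior_scale`: larger values zero out more of the candidate grid. This is a point estimate only, of course; you lose Stan's uncertainty intervals.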