Hacker News | grantpitt's comments

Sorry, my mistake. Thanks for the reminder. I don't think I can change it after 2 hrs.

I'm curious, why did you make this change? The article isn't about Oregon. Plus, the claim itself is pretty disingenuous, since it includes COVID data, which vastly exaggerates the effect[0].

[0] https://news.ycombinator.com/item?id=46919890


I quoted from the middle of the article what I found interesting. I don't submit on here much and I just put that in the title field without thinking. I'm glad that comment provides proper context on this.

Yeah, the article links the source where every state can be viewed: https://edunomicslab.org/roi-over-time/

do say more


Makes it sound like a one trick pony


Anthropic is leaning into agentic coding, and heavily so. It makes sense for them to use SWE-bench Verified as their main benchmark. It is also the one benchmark on which Google did not take the top spot last week. Claude remains king; that's all that matters here.


I am eagerly awaiting swe-rebench results for November with all the new models: https://swe-rebench.com/


well, it's a big trick


Any application that can run in the browser, will eventually run in the browser.


Huh, can you share a link? I tried here: https://gemini.google.com/share/e753745dfc5d



Maybe somewhere in the original comment it would have been fair to mention that you can barely see the house in the original photo. This is actually a hilarious complaint.


Maybe. But this is not an edge case. I consider this genuine use of the marketed tool.


That cannot be a valid excuse. Apart from adding extra windows to the clearly visible wall, the model is obviously perfectly capable of "seeing" the house. It just cannot "believe" that a garden house can have a big empty wall.



Agreed, it also leads performance on arc-agi-1. Here's the leaderboard where you can toggle between arc-agi-1 and 2: https://arcprize.org/leaderboard


It leads on arc-agi-1 with Gemini 3.0 Deep Think, which uses "tool calls" according to Google's post, whereas regular Gemini 3.0 Pro doesn't use "tool calls" for the same benchmark. I am unsure how significant this difference is.


Very interesting to hear two technologists at a tech business conference say things along the lines of: "our tools do not merely extend us, they transform us", followed up with "we've become numb to the devastating consequences of technology".

(I know I'm somewhat selectively reading but still)


Interesting, because games are exactly the kind of RL environment that models can learn effectively; the catch is that they must do this learning on the fly, at test time. Very exciting to see this.


Right, as in math: not every infinite sequence contains every finite string of digits. For example, a non-repeating sequence of 2s and 7s never contains "4". A sufficient further condition is that the number be normal[1].

Also, TIL we don't know whether π is normal, so the popular claim that "every string of digits eventually occurs in π" is not known to be true.

[1] https://en.wikipedia.org/wiki/Normal_number
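To make the 2s-and-7s point concrete, here is a small Python sketch. The helper names (`two_seven_digits`, `champernowne_digits`, `missing_strings`, `frequency`) are hypothetical illustrations, not anything from the thread. It builds the non-repeating expansion 0.27277277727777... and confirms it never contains "4", then checks that a prefix of Champernowne's constant (0.123456789101112..., which is proven normal in base 10) contains every two-digit string. Note that a finite prefix can only illustrate, never prove, normality; that is exactly why π's status stays open however many digits get checked.

```python
from itertools import product


def champernowne_digits(n: int) -> str:
    """First n digits after the point of Champernowne's constant 0.123456789101112...

    This constant is known to be normal in base 10.
    """
    chunks, total, k = [], 0, 1
    while total < n:
        s = str(k)
        chunks.append(s)
        total += len(s)
        k += 1
    return "".join(chunks)[:n]


def two_seven_digits(n: int) -> str:
    """First n digits of 0.27277277727777..., a non-repeating sequence of 2s and 7s.

    Blocks are "2" followed by k sevens for k = 1, 2, 3, ..., so the expansion
    never becomes periodic, yet it contains no digit other than 2 and 7.
    """
    chunks, total, k = [], 0, 1
    while total < n:
        block = "2" + "7" * k
        chunks.append(block)
        total += len(block)
        k += 1
    return "".join(chunks)[:n]


def missing_strings(digits: str, k: int) -> list[str]:
    """Every length-k decimal string that never occurs as a contiguous run in `digits`."""
    candidates = ("".join(p) for p in product("0123456789", repeat=k))
    return [s for s in candidates if s not in digits]


def frequency(digits: str, target: str) -> float:
    """Fraction of length-len(target) windows in `digits` equal to `target` (overlaps counted).

    For a normal number this tends to 10 ** -len(target) as the prefix grows.
    """
    k = len(target)
    windows = len(digits) - k + 1
    hits = sum(1 for i in range(windows) if digits[i:i + k] == target)
    return hits / windows


if __name__ == "__main__":
    print("4" in two_seven_digits(10_000))                 # False
    print(missing_strings(champernowne_digits(2_000), 2))  # []
    print(frequency(champernowne_digits(100_000), "7"))
```

The frequency check is the part that actually corresponds to normality: every length-k string must appear with limiting frequency 10^-k, which is stronger than merely appearing at least once.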


Nice post. With respect to maximizing future options, I find the ideas expressed in the following quotes to be interesting counterpoints.

From '4,000 weeks': "Not only should you settle; ideally you should settle in a way that makes it harder to back out, such as moving in together, or having a child. The irony of all our efforts to avoid facing finitude -- to carry on believing that it might be possible not to choose between mutually exclusive options -- is that when people finally do choose, in a relatively irreversible way, they're usually much happier as a result."

From 'Zero to One': "When people lack concrete plans to carry out, they use formal rules to assemble a portfolio of various options. ... A definite view, by contrast, favors firm convictions. Instead of pursuing many-sided mediocrity and calling it "well-roundedness," a definite person determines the one best thing to do and then does it."

