or, hear me out, organic machines have conscious experience because existence itself is divine. Humans don't have a special soul separate from the universe, they have a soul because they are the universe: materialism.
That's (a) a different argument from "the competition is too risk averse", (b) subjective, and (c) arguably the result of a number of flywheel effects. That is, Bing's ability to compete is hampered by the fact that Google already has an overwhelming majority of search traffic from which to learn and improve.
For example, from the second filing I linked to:
> After search began appearing on phones, Google started logging information about user location, swipes, and other user-related movements. PFOF ¶¶ 1003–1004. This data is now vital to every aspect of search, including figuring out where and when to crawl specific websites, how to index the information retrieved from that crawl, what documents to retrieve from the index in response to a user query, and how to rank the retrieved items. Some elements of Google’s search engine are trained on 13 months of data—a volume that would take Bing over 17 years to accumulate.
This post is a very weak and incoherent criticism of a well formulated benchmark: task length bucket for which a model succeeds 50% of the time.
Gary says:
- This is just the task length that the models were able to solve in THIS dataset. What about other tasks?
Yeah, obviously. The point is that models are improving on these tasks in a predictable fashion. If you care about software, you should care how good AI is at software.
- Task length is a bad metric. What about a bunch of other factors of difficulty which might not factor into task length?
Task length is a pretty good proxy for difficulty; that's why people grade a bug in days. Of course many factors contribute to this estimate, but averaged over many tasks, time is a great metric for difficulty.
Finally, Gary just ignores that, despite his view that the metric makes no sense and is meaningless, it has extremely strong predictive value. This should give you pause: how can an arbitrary metric with no connection to the true difficulty of a task, and with no real way of comparing its validity as a measure of difficulty across tasks or across task-takers, produce such a retrospectively smooth curve and so closely predict the recent data points from Sonnet and o3? Something IS going on there, which cannot fit into Gary's ~spin~ narrative that nothing ever happens.
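To make the metric concrete, here is a rough Python sketch of a "50% task length bucket". It is a simplification for illustration and not necessarily how the benchmark itself is computed: the power-of-two bucketing, the function names, and the sample data in the usage note are all my own assumptions.

```python
import math
from collections import defaultdict


def success_rate_by_bucket(results):
    """Group (task_minutes, succeeded) pairs into power-of-two buckets.

    The bucketing scheme is an assumption made for this sketch: a task that
    takes a human 5 minutes lands in the bucket whose lower edge is 4.
    Returns {bucket_lower_edge_minutes: success_rate}.
    """
    totals = defaultdict(lambda: [0, 0])  # bucket -> [successes, attempts]
    for minutes, succeeded in results:
        bucket = 2 ** math.floor(math.log2(minutes))
        totals[bucket][0] += int(succeeded)
        totals[bucket][1] += 1
    return {b: s / n for b, (s, n) in sorted(totals.items())}


def fifty_percent_horizon(results):
    """Lower edge of the longest bucket with a success rate of at least 50%.

    Returns None if no bucket reaches 50%.
    """
    rates = success_rate_by_bucket(results)
    passing = [b for b, rate in rates.items() if rate >= 0.5]
    return max(passing) if passing else None
```

With made-up results such as `[(1, True), (2, True), (3, True), (5, True), (6, False), (9, False), (12, True), (20, False), (30, False)]`, the 8-minute bucket is the longest one still at 50%, so `fifty_percent_horizon` returns `8`. Tracking that number across model releases is what gives you the curve being argued about.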
In principle, math proofs are another problem that is relatively easy to verify. In the extreme case, you can express any math proof as a computer-verifiable formalism, no intelligence necessary. Step back one step, and you could have a relatively weak model translate a proof into a verifiable formalism and then use a tool call to run the verification. Coming up with the proof is an expensive search process, while verifying it is more mechanical. Even if it is not completely trivial to make the proof computer-verifiable, it might still be a vastly easier task compared to finding the proof in the first place.
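As a toy illustration of the "verification is mechanical" point, here is a minimal Lean 4 sketch. The theorem name is made up for the example; `Nat.add_comm` is a lemma from Lean's core library. The checker either accepts the proof term or rejects it, with no judgment involved.

```lean
-- Hypothetical example name; the proof term is checked mechanically.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A concrete arithmetic fact, verified by computation.
example : 2 + 2 = 4 := rfl

-- Changing the statement above to `2 + 2 = 5` makes the checker reject it.
```

Finding a proof of a hard theorem is a search problem; checking a file like this is just running the kernel.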
Definitely a good question. Using an actual LLM as the execution layer allows us to more easily swap to the planner agent in the case that the test needs to be adapted. We don't want to store just a selector-based test because it's difficult to determine when it requires adaptation, and it is inherently more brittle to subtle UI changes. We think using a tiny model like Moondream makes this cheap enough that these benefits outweigh an approach where we cache actual Playwright code.
100 years optimistically!? That's an incredibly pessimistic timeline, maybe one of the most hardline "nothing ever happens" outlooks I've ever heard articulated.
It's also particularly awkward to land on, as it has just enough atmosphere to be annoying, but not enough to be particularly helpful. Most Mars landings have involved some sort of ridiculous Rube Goldberg machine or other (see https://en.wikipedia.org/wiki/Sky_crane_(landing_system) , https://en.wikipedia.org/wiki/Mars_Pathfinder#Entry,_descent... ) which would not be viable for humans (and were only arguably viable for the probes they were used for; the risk of failure was high).
Given that no one has yet traveled to Mars, “faced a far more dangerous journey” seems a ridiculously hyperbolic statement. (Thinking even about the lost colony at Roanoke.)
Colonizing Mars isn't a problem. Colonizing Mars is a goal. Making that happen requires addressing a ridiculous number of problems and sub-problems.
If history teaches us anything, the biggest problem is supply chains - and supply chains have been so difficult to get right that they've led to countless famines, lost wars, failed businesses and economic crises. And those have all been supply chains here on Earth, mostly between fixed locations at fixed distances with relatively few environmental hazards and risks compared to space travel.
If we want to create a sustainable multi-planetary future, we need to solve this incrementally. Colonizing the moon would be a logical stopgap. But as it stands now we haven't even established a presence on the moon - let alone a permanent one. The only presence we have off-planet is the ISS and that one's still in Low Earth Orbit, no different from regular communication satellites, so that only qualifies as "off-planet" by not being on the surface of the planet.
Remember that we can't just scale up space travel independently either. Even if SpaceX figures out how to do space launches every other day, that still requires a supply chain for fuel, parts, refinement, resource extraction, etc, all of which also needs to be scaled up accordingly. And that's just for launching stuff into space, which so far has mostly meant LEO.
Eh, it's a reasonable prior. The timeline is "it will never happen" until the leap forward happens that makes it "within 2 years." Basically the same as air flight.
You can't know when the leap will happen so it's basically picking a year that seems far enough off to be pretty darn sure.
Keeping them alive and returning them doesn't require "a leap", which is the central point of OP that I am disagreeing with. We have all the technology, materials science, etc. to do it.
Sure, it requires some research, engineering and a crapload of investment, but it doesn't require anything that is currently "science fiction".