Second, with that out of the way, these cars are not playing the same game as horses… first, and quite obviously, they have massive amounts of horsepower, which is kind of like giving a team of horses… many more horses. But cars also have an absolutely massive fuel capacity. Petrol is such an efficient store of chemical energy compared to hay, and cars can store gallons of it.
I think if you gave my horse the power of 300 horses and fed it pure gasoline, I would be kind of embarrassed if it wasn’t able to win a horse race.
Yeah man, and it would be wild to publish an article titled "Ford Mustang and Honda Civic win gold in the 100 meter dash at the Olympics" if what happened was the companies drove their cars 100 meters and tweeted that they did it faster than the Olympians had run.
Actually that's too generous, because the humans are given a time limit in ICPC, and there's no clear mapping to say how the LLM's compute should be limited to make a comparison.
It IS an interesting result to see how models can do on these tests - and it's also a garbage headline.
> what happened was the companies drove their cars 100 meters and tweeted that they did it faster than the Olympians had run
That would be indeed an interesting race around the time cars were invented. Today that would be silly, since everyone knows what cars are capable of, but back then one can imagine a lot more skepticism.
Just as there is a ton of skepticism today of what LLMs can achieve. A competition like this clearly demonstrates where the tech is, and what is possible.
> there's no clear mapping to say how the LLM's compute should be limited to make a comparison
There is a very clear mapping, of course: you give the computer the same wall-clock time you gave the humans.
Because what it is showing is that the computer can do the same thing a human can under the same conditions. With your analogy here they are showing that there is such a thing as a car and it can travel 100 meters.
Once it is a foregone conclusion that an LLM can solve the ICPC problems, and that question has been sufficiently driven home to everyone who cares, we can ask further ones, like “how much faster can it solve the problems compared to the best humans” or “how much energy does it consume while solving them”? It sounds like you went beyond the first question and are already asking these follow-up questions.
You're right, they did limit to 5 hours and, I think, 3 models, which seems analogous at least.
Not enough to say they "won gold". Just say what actually happened! The tweets themselves do, but then we have this clickbait headline here on HN somehow that says they "won gold at ICPC".
Agreed. The linked messaging is much more clear: "achieved gold-medal level performance". This clearly separates them from competing against humans, which they didn't do, because their constraints are very different. The "AI wins gold at ICPC" line really does seem designed to rile people up.
That's a very interesting question. When comparing wildly different computing machines, how to make a fair comparison?
At least two criteria come to mind: volume and energy consumption.
Indeed, we can safely assume that more volume and more energy lead to more computational power. For example, it is not fair to compare a 10 m^3 room filled with computers with a 10 cm^3 computer. The same goes for the number of kilowatt-hours used.
Thinking further on those two criteria for GPUs and humans, we could also consider access to energy and volume. First, energy access for machines has dramatically increased since the industrial revolution. Second, volume access for machines has also increased since the beginning of mass production. In particular, creating one cubic meter of new GPUs is faster than giving birth to a new human.
tldr: fair comparison of two machines should take into account their volume and their energy consumption. On the other hand, this might be mitigated by how fast a machine can increase its volume, and what is its bandwidth for energy consumption.
Cars going faster than humans or horses isn't very interesting these days, but it was 100+ years ago when cars were first coming on the scene.
We are at that point now with AI, so a more fitting headline analogy would be "In a world first, automobile finishes with gold-winning time in horse race".
Headlines like those were a sign that cars would eventually replace horses in most use-cases, so the fact that we could be in the same place now with AI and humans is a big deal.
It was more than interesting 100+ years ago -- it was the subject of wildly inconsistent, often fear-based (or incumbent-industry-based) regulation.
A vetoed 1896 Pennsylvania law would have required drivers who encountered livestock to "disassemble the automobile" and "conceal the various components out of sight, behind nearby bushes until [the] equestrian or livestock is sufficiently pacified". The Locomotive on Highways Act of 1865 required early motorized vehicles to be preceded by a person on foot waving a red flag or carrying a red lantern and blowing a horn.
It might not quite look like that today, but wild-eyed, fear-based regulation as AI use grows is a real possibility. And at least some of it will likely seem just as silly in hindsight.
For more than thirty years, the speed limit for cars in Britain was 4mph - a self-propelled vehicle travelling faster than walking pace was obviously unconscionably dangerous.
To celebrate the raising of the speed limit to a daring 12mph, a group of motorists organised a drive from London to Brighton. At the time, driving 54 miles in a single day was seen as an audacious feat and few people imagined that such a great distance could be travelled in such complicated and newfangled contraptions without mechanical incident.
For decades, the car was seen as a plaything for the wealthy that served no practical purpose. The car only became an important mode of transportation after very many false starts and against strong opposition.
... in opposition to the car makers who want to turn everything into highways and parking lots, who really want all forms of human walking to be replaced by automobiles.
"They really can't run like a human," they say, "a human can traverse a city in complete silence, needing minimal walking room. Left unchecked, the transition to cars would ruin our city. So let's be prudent when it comes to adopting this technology."
"I'll have none of that. Cars move faster than humans so that means they're better. We should do everything in our power to transition to this obviously superior technology. I mean, a car beat a human at the 100m sprint so bipedal mobility is obviously obsolete," the car maker replied.
I think your analogy is interesting but it falls apart because “moving fast” is not something we consider uniquely human, but “solving hard abstract problems” is
This metaphor drops some pretty key definitional context. If the common belief prior to this race was that cars could not beat horses, maybe someday but not today, then the article is completely reasonable, even warranted.
The point is that up until now, humans were the best at these competitions, just like horses were the best at racing up until cars came around.
The other commenter is pointing out how ridiculous it would be for someone to downplay the performance of cars because they did it differently from horses. It doesn't matter if they did it using different methods, that fact that the final outcome was better had world-changing ramifications.
The same applies here. Downplaying AI because it has different strengths or plays by different rules is foolish, because that doesn't matter in the real world. People will choose the option that leads to the better/faster/cheaper outcome, and that option is quickly becoming AI instead of humans - just like cars quickly became the preferred option over horses. And that is crazy to think about.
I feel the main difference is cars can't compress time the way an array of computers can. I could win this competition instantly with an infinitely parallel array of random characters typed by infinite monkeys on infinite typewriters, since one of them would be perfectly right given infinite submissions. When I make my tweet I would pick out that single monkey, because I'd need infinite money to feed my infinite workforce, and that's clearly more impressive.
Now obviously it's more impressive, as they don't have infinite compute and had finite time, but the car only has one entry in each race unless we start getting into some anime-ass shit with divergent timelines and one of the cars (and some lesser number of horses) finishing instantly.
To your last point we don't know that this was cheaper since they don't disclose the cost. I would blindly guess a mechanical turk for the same cost would outperform at least today.
Considering that OpenAI's model got a higher score than any of the world's best collegiate programming teams, I'd guess that a mechanical turk would not do better (even if you gave them quite a bit of time).
Yeah I think the only thing OP was passing judgement on is on the competition aspect of it, not the actual achievement of any human or non human participant
That’s how I read it at least - exactly how you put it
I think you missed that the whole point of this race was:
"did we build a vehicle faster than a horse, yes/no?"
Which matters a lot when horses are the fastest land vehicle available. (We're so used to thinking of horses as a quaint and slow means of transport that maybe we don't realize that for millennia they were the fastest possible way to get from one place to another.)
I was struck by how the argument is also isomorphic to how we talked about computers and chess. We're at the stage where we argue the computer isn't _really_ understanding chess, though. It's just doing huge amounts of dumb computation with huge opening books and endgame tablebases and no real understanding, strategy, or sense of what's going on.
Even though all the criticisms were, in a sense, valid, in the end none of them amounted to a serious challenge to getting good at the task at hand.
This response is good but the more general problem is that people are in "It doesn't look like anything to me" mode like Westworld robots seeing advanced technology. If there's a way to snap people out of that, I've never seen it.
Snark aside, I would expect a car partaking in a horse race to beat all of the horses. Not because it's a better horse, but because it's something else altogether.
Ergo, it's impressive with nuance. As the other commenter said.
There's a difference. How much money went into training the computer here Vs the human? If you want to prove that a computer can, at extreme cost and effort, beat a human - sure, it's possible.
But you can also conclude that putting a lot of money and effort pays off. It's more like comparing a horse to a Ferrari that had millions of development costs, has a team of engineers maintaining it, isn't reusable, and just about beats Chestnut. It's a long way until the utility of both is matched.
Comparing power with reasoning does not make any sense at all.
Humans have surpassed their own strength since the invention of the lever thousands of years ago. Since then, it has been a matter of finding power sources millions of times greater, such as nuclear energy.
The massive amounts of compute power is not the major issue. The major issue is unlimited amount of reference material.
If a human can look up similar previous problems just as the "AI" can, it is a huge advantage.
Syzygy tables in chess engines are a similar issue. They allow perfect play, and there is no reason why a computer gets them and a human does not (if you compare humans against chess engines). Humans have always worked with reference material for serious work.
Humans are allowed to look up and learn from as many previous problems as they want before the competition. The AI is also trained on many previous problems before the competition. What's the difference?
Deleted, because the "AI" geniuses and power users pointed out that Tao does not have a point. You can get this one to -4 as well, since that seems to be the primary pleasure for "AI" one armed bandit users.
It doesn't say anywhere that Gemini used any of those things at ICPC, or that it used more real-world time than the humans.
Also, who cares? It's a self contained non-human system that could solve an ICPC problem it hasn't seen before on its own, which hasn't been achieved before.
If there was a savant human contestant with photographic memory who could remember every previous ICPC problem verbatim and can think really fast you wouldn't say they're cheating, just that they're really smart. Same here.
If there was a man behind the curtain that was somehow making this not an AI achievement then you would have a point, but there isn't.
I think "hasn't seen before" is a bit of an overstatement. Sure, the problem is new in the literal sense that it doesn't exist verbatim elsewhere, but arguably, any competition problem is hardly novel: they are all some permutation of problems that exist and have been solved before: pathfinding, optimization, etc. I don't think anyone is pretending to break new scientific ground in 5 hours.
It's not new scientific ground but a machine beating a challenging computer science problem unassisted is a big deal. If they can do that then there are a lot of other challenging things they can do.
Like what exactly? As far as I can tell, drug discovery is fizzling out, so it's not talked about much. Toxicity, for one, is a big problem, and the AI is not going to tell you whether the new drug it just concocted is suitable for humans or not.
Small model solves an easy problem; big model solves a challenging problem. I wouldn't call those problems; they are more like invented puzzles. Perfect match for the AI marketing department to "solve".
Humans have been shown to solve problems and discover things that "no human has" up to that point. So I wouldn't even call it "superintelligent". But it would definitely be truly useful at that point!
The marketing speak of some companies goes into effectively fantastical territory. Some claims that are made effectively imply P = NP, which is like bad TV-series-level science fiction, but some people fall for it.
Someone tried this; I saw it in one of the Reddit AI subs. They were training a local model on whatever they could find that was written before $cutoffDate.
I think this is a meta-allusion to the theory that human consciousness developed recently, i.e. that people who lived before [written] language were not merely without language but actually did not think in the way we do. It's a potentially useful thought experiment, because we've all grown up not only knowing highly performant languages, but also knowing how to read / write.
However, primitive languages were... primitive. Were they primitive because people didn't know / understand the nuances their languages lacked? Or were those things that simply didn't get communicated (effectively)?
Of course, spoken language predates writing, which is part of the point. We know an individual can have a "conscious" conception of an idea if they communicate it, but that consciousness was limited to the individual. Once we have written language, we can perceive a level of communal consciousness of certain ideas. You could say that the community itself had a level of shared consciousness.
With GPTs regurgitating digestible writings, we've come full circle in terms of proving consciousness, and some are wondering... "Gee, this communicated the idea expertly, with nuance and clarity... but is the machine actually conscious? Does it think independently of the world, or is it merely a kaleidoscopic reflection of its inputs? Is consciousness real, or an illusion of complexity?"
I’m not sure why it’s so mind-boggling that people in the year 1225 (Thomas Aquinas) or 1756 (Mozart) were just as creative and intelligent as modern people. They simply had different opportunities than we have now. And what some of them did with those opportunities is beyond anything a “modern” person can imagine doing in the same circumstances. _A lot_ of free time over winter in the 1200s for certain people. Not nearly as many distractions either.
Saying early humans weren’t conscious because they lacked complex language is like saying they couldn’t see blue because they didn’t have a word for it.
Well, Oscar Wilde argues in “The Decay of Lying” that there were no stars before an artist could describe them and draw people’s attention to the night sky.
The basic assumption he attacks is that “there is a world we discover” vs “there is a world we create”.
It is a hard paradigm shift, but there is certainly reality in a “shared picture of the world”, and convincing people of a new point of view has real implications for how the world appears in our minds and what we consider “reality”.
It should be almost obligatory to always state which definition of consciousness one is talking about whenever they talk about consciousness, because I, for example, don't see what language has to do with our ability to experience qualia.
Is it self-awareness? There are animals that can recognize themselves in a mirror, and I don't think all of them have a form of proto-language.
I keep seeing might and magic related content, despite never having played it, or even having heard of it until recently! But in the last few months I have been getting the odd YouTube recommendation, or see the occasional Reddit (and now HN) thread.
Possibly this is a game you will love playing and should check it out. Whether by emulating an ancient DOS machine or by picking up one of the eleven games in the series available on Steam. (https://store.steampowered.com/sale/might-magic/)
If it is the latter case then I am sure some enthusiastic fans of this series will reply to this comment or yours with detailed opinions on which option is the best :)
“Fill less than 1% of its space” becomes a very counterintuitive statement in any case when discussing high dimensions. If you consider a unit n-sphere bounded by a unit cube, the fraction occupied by the sphere vanishes for high n. (Aside: Strangely, the relationship is non-monotonic and is actually maximal for n=6.) For n=100 the volume of the unit 100-sphere is around 10^-40 (and you certainly cannot fit a second sphere in this cube…) so it's not surprising that the gains to be made in improving packing can be so large.
> (Aside: Strangely, the relationship is non monotonic and is actually maximal for n=6)
For this aside I crave a citation.
When n=1 the sphere fit is 100%, as both cube and sphere are congruent in that dimension. And dismissing n=0 as degenerate (fit is undefined there, I suppose: dividing by zero measure and all that), that first dimension should be maximal, with a steady decline thereafter, thus also monotonic.
This looks to have been a conflation by the GP between the volume of the unit sphere itself and its ratio to the volume of its bounding cube (which is not the unit cube.) The volume of the sphere does top out at an unintuitive dimension, but indeed the ratio of the two is always decreasing - and intuitively, each additional dimension just adds more space between the corners of the cube and the face of the sphere.
You don't need to involve the hypercube at all. You can just look at the volume of a hypersphere (n-ball). The dimension where the n-ball's volume is maximal depends on the radius, and for the unit n-ball the max is at 5D, not 6D. And as D->inf, V->0 for any fixed radius, since the gamma function in the denominator outgrows the exponential.
This relationship doesn't happen to the hypercube btw. Really, it is about the definition of each object. The volume of the hypercube just continues to grow. So of course the ratio is going to explode...
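If anyone wants to check these numbers, here's a quick sketch in plain Python using the standard formula V_n = π^(n/2) / Γ(n/2 + 1) for the unit n-ball (variable names are mine):

```python
import math

def ball_volume(n: int, r: float = 1.0) -> float:
    """Volume of the n-ball of radius r: pi^(n/2) * r^n / Gamma(n/2 + 1)."""
    return math.pi ** (n / 2) * r ** n / math.gamma(n / 2 + 1)

# Unit-ball volumes for n = 1..20, and their ratio to the bounding cube
# (side 2, so volume 2^n).
volumes = {n: ball_volume(n) for n in range(1, 21)}
ratios = {n: volumes[n] / 2 ** n for n in range(1, 21)}

peak = max(volumes, key=volumes.get)  # the volume itself peaks at n = 5
```

Running this confirms the two separate facts being conflated upthread: the unit-ball volume peaks at n=5 and then shrinks toward zero (ball_volume(100) is around 10^-40), while the ball-to-bounding-cube ratio is strictly decreasing from n=1 onward.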
As an extra fun tidbit, I'll add that when we work with statistics some extra wildness appears. For example, there is a huge difference between the geometry of the uniform distribution and the Gaussian (normal) distribution, both of which can be thought of as spheres. Take any two points in each distribution, draw a line connecting them, and interpolate along that line. For the uniform distribution, everything works as expected. But for the Gaussian distribution you'll find that your interpolated points are not representative of the distribution! That's because the normal distribution is "hollow": in math speak, we say "the density lies along the shell."

Instead, you have to interpolate along the geodesic, which is a fancy word for a line that is aware of the geometry (i.e. you're traveling on the surface). The easiest way to visualize this is interpolating between two cities on Earth. If you draw a straight line you're gonna get a lot of dirt. Instead, if you interpolate along the surface you're going to get much better results, even if that includes ocean, barren land, and... some cities and towns and other things. That's a lot more representative than what's underground.
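You can see the hollow-shell effect numerically with a few lines of plain Python. This is only a sketch: the slerp here normalizes both endpoints and interpolates the radii linearly, which is one common convention, not the only one.

```python
import math
import random

def slerp(a, b, t):
    """Spherical interpolation: rotate from a's direction toward b's along
    the great circle, interpolating the two magnitudes linearly."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    ua = [x / na for x in a]
    ub = [x / nb for x in b]
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(ua, ub))))
    omega = math.acos(dot)
    so = math.sin(omega)
    direction = [(math.sin((1 - t) * omega) * x + math.sin(t * omega) * y) / so
                 for x, y in zip(ua, ub)]
    mag = (1 - t) * na + t * nb
    return [mag * x for x in direction]

random.seed(0)
d = 1000
a = [random.gauss(0, 1) for _ in range(d)]
b = [random.gauss(0, 1) for _ in range(d)]

lerp_mid = [(x + y) / 2 for x, y in zip(a, b)]  # straight-line midpoint
slerp_mid = slerp(a, b, 0.5)                    # geodesic-style midpoint

norm = lambda v: math.sqrt(sum(x * x for x in v))
# Gaussian mass concentrates near radius sqrt(d) ~= 31.6; the straight-line
# midpoint falls well inside that shell (~ sqrt(d/2)), the slerp one stays on it.
```

The straight-line midpoint has norm around sqrt(d/2) ≈ 22, i.e. deep inside the shell where almost no Gaussian mass lives, while the slerp midpoint stays near sqrt(d) ≈ 31.6.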
I’m familiar with this example of high-dimensional geometry. Put more abstractly, my intuition always said something like “the volume of high-dimensional shapes becomes more concentrated near their surface as the number of dimensions increases”.
Assuming you ignore or amortize the time necessary to create the table in the first place, of course.
This is the basis for rainbow tables: precomputed tables for mapping hashes to passwords, with a space-saving “hash chaining” trick to effect a constant factor reduction in table size. Such tables are the reason why passwords must be hashed with a unique salt when stored in a database.
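A toy sketch of the idea in Python, using hashlib from the standard library. This is a plain precomputed lookup table rather than a real rainbow table (no hash chaining), and the passwords are made up, but it shows why a per-user salt defeats precomputation:

```python
import hashlib
import os

passwords = ["hunter2", "letmein", "password1"]  # illustrative candidate list

# Precomputed (unsalted) lookup table: built once, reusable against
# every database that stores bare sha256(password).
table = {hashlib.sha256(p.encode()).hexdigest(): p for p in passwords}

leaked = hashlib.sha256(b"hunter2").hexdigest()
recovered = table.get(leaked)      # instant reversal of the unsalted hash

# With a per-user random salt, the same table is useless: the attacker
# would have to redo the whole precomputation for every individual salt.
salt = os.urandom(16)
salted = hashlib.sha256(salt + b"hunter2").hexdigest()
missed = table.get(salted)         # None: the salted hash isn't in the table
```

The table reverses the unsalted hash immediately, but the salted lookup misses, which is the whole point of salting: it turns one global precomputation into one precomputation per user.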