Hacker News new | past | comments | ask | show | jobs | submit login

I've been dabbling in AGI and it seems like the field has a lot of low-hanging fruit. I'll bet Carmack can offer some significant contributions.

I'll take an opportunity to plug a paper I recently published on comparing relative intelligence. The punchline will illuminate the low-hangingness of the fruit in this field.

Suppose X and Y are AGIs and you want to know which is more intelligent. For any interactive reward-giving environment E, you could place X into E and see how much reward X gets; likewise for Y. If X gets more reward, you can consider that as evidence of X being more intelligent. But there are many environments, and X might do better in some, Y in others. How can you combine those pieces of evidence into a final judgment?

The epiphany I had (obvious in hindsight) is that the above situation is actually an election in disguise. The voters are interactive reward-giving environments, voting (via their rewards) in an intelligence contest between different AGIs. This allows us to import centuries of research on voting and elections! In particular, by using theorems about elections published in the 1970s, I was able to provide an elegant notion of relative intelligence.

The notion I provided is elegant enough that some theorems can even be proved with it, for example, formalizations of the idea that "higher-intelligence team-members make higher-intelligence teams". Which emphasizes the low-fruit-hanginess of the field: as obvious as that idea seems, apparently no-one was able to prove it with previous formal intelligence measures, probably because those previous intelligence measures were too complicated to reason about!

Here's the paper: https://philpapers.org/archive/ALEIVU.pdf

All environments are not created the same, and therefore not equal. If you feed 100 environments that are nearly identical to each other but favor A, and 4 other environments that all favor B but are all dissimilar to each other, then although B is more generalized, A would still win.

Since you have no formal way to compare environments to each other, you can't prevent this from happening. Therefore you just pushed the subjectively as to which AI is smarter to which environments are chosen by the user to run against.

The paper addresses this. Rather than (futilely) attempt to nail down the "one true electorate", the paper defines an infinite family of comparators, depending on a hyperparameter which is, essentially, the choice of which environments get to vote and how to count their votes. Crucially, this doesn't change the truth of the structural theorems (except that some of the theorems require the hyperparameter satisfy certain constraints).

Indeed, any attempt to come up with "one true comparison of intelligence" (as opposed to a parametrized family) should be viewed with skepticism, because it really must depend on a lot of arbitrary choices.

You might be interested in the No Free Lunch Theorem (https://en.wikipedia.org/wiki/No_free_lunch_theorem).

For what I skimmed from your paper, it looks like the LH agents may be viewed as discrete optimization processes trying to optimize an objective/utility function across an infinite space of possible environments (infinite voters).

If it is the case, and if each environment vote has the same weight, you may be in a case of no free lunch, where the performances of all possible agents (including the random agent) will average to the same across all possible environments.

Or, to restate the above, for each environment in which an agent is doing well, it is possible to construct an "anti-environment" where the agent is performing exactly as bad.

My personal opinion on the topic of AGI is that it is actually a case of NFLT.

I think you're right to bring up the NFLT, but I don't think it is applicable, it just points at the real question.

The key assumption to get the NFLT is that each environment vote has the same weight, i.e. we are targeting a uniform distribution on objective functions / environments / problems / whatever you call it.

If you break this assumption, you get an opposite result which is that search algorithms divide into some equivalence classes determined by the sets of different outcomes (traces, if I remember the theorem's description) that you discriminate between.

A uniform distribution like this is actually a very very strong precondition; it implies (looking at results about the complexity of sets of strings, since choosing an environment is like choosing a string from 2^N given some encoding, etc) that you care equally about a very large number of environments most of which have no compressible structure or equivalently have a huge kolmogorov complexity. Most of these environments would not have a compact encoding, relative to a particular choice of machine, but we are weighing these the same as those environments which are actually implementable using less than a ridiculous amount of storage to represent the function.

The reason why I think this is too strong an assumption to use is then that we don't care about all these quadrillion problems which have no compact encoding - we know this because we literally can't encounter them as they would be too large to ever write down using ordinary matter.

Allowing for this, talking usefully about evaluating an AGI or equivalently a search strategy or optimization algorithm implies having an understanding of the distribution of environments / problems we care about. I think capturing this concept in a 'neat' way would be a significant contribution; I had a go during my PhD but failed to get anywhere. Unfortunately things like K-complexity are uncomputable, so reasoning about distributions in those terms is a dead-end.

Right, the environments are not uniformly distributed. In fact, the paper actually defines not one single intelligence comparator but an infinite family, parametrized by a hyperparameter which is, essentially, a choice of which environments vote and how to count their votes. Crucially, this doesn't change the truth of the structural theorems (except that some of the theorems require the hyperparameter satisfy certain constraints).

Other authors (Legg and Hutter, 2007) followed the line of reasoning in your comment much more literally. They proposed to measure the intelligence of an agent as the infinite sum of the expected rewards the agent achieves on each computable environment, weighted by 2^-K where K is the environment's Kolmogorov complexity. Which seems as if it gives "one true measure" of intelligence, but actually that isn't the case at all, because Kolmogorov complexity depends on a reference universal Turing machine (Hutter himself eventually acknowledged how big a problem this is for his definition, Leike and Hutter, 2015).

My position is that any attempt to come up with "one true comparison of intelligence" (as opposed to a parametrized family) should be viewed with skepticism, because relative intelligence really must depend on a lot of arbitrary choices.

Hah, interesting - this is a reference I hadn't seen and I like the sound of it. There was me thinking I'd had an idea of my own one time!

The reference machine thing would be the next problem to argue if using 2^-K as the weight; whilst you can make the K-complexity of any particular string low by putting an instruction in your machine that is 'output the string', this is clearly cheating! So there ought to be a connection between the reference machine and some real physics, since we are perhaps not interested in building optimisers that perform well in universes whose physics is very different to ours.

Sadly even if this were cracked I think the fact that K is uncomputable would make the result likely to be useless in practise.

Thanks for your interesting reply, I enjoyed it.

The computability problem can be addressed by using Levin complexity instead of Kolmogorov complexity, an approach which you can read here: http://users.monash.edu/~dld/Publications/2010/HernandezOral...

It still suffers the problem that it's highly lopsided in favor of simpler environments. Of course you're absolutely right that environments too complex to exist in our universe should get low weight. But it's hard to find the right "Goldilocks zone" where those ultra-complex environments are discounted sufficiently but medium-complexity environments aren't overly disenfranchised, and where ultra-simple environments aren't given overwhelming authority.

>There was me thinking I'd had an idea of my own one time!

I wouldn't give up. Although it's such a long paper, Legg and Hutter 2007 actually has very little solid content: they propose the definition, and the rest of the paper is mostly filler. There are approximately zero theorems or even formal conjectures. One area I think is ripe for contributions would be to better articulate what the desired properties of an intelligence measure should be. Legg and Hutter offered a measure using Kolmogorov weights, but WHY is that any better than just randomly assigning any gibberish numbers to agents in any haphazard way--what axioms does it satisfy that one might want an intelligence measure to satisfy?

Thanks for the clarification.

Like I said, I only skimmed your paper, so I hope it was clear my comment was not intended as a criticism (or even as a review) :)

I think I agree with the general terms of your conclusion personally.

Yep its clear that the NFLT only apply if we consider all possible environments equally.

In practice, we are indeed not interested in every imaginable environments, only in "realistic" ones.

It was not clear for me if the paper addressed such concerns for AGI, e.g. when writing:

To achieve good rewards across the universe of all environments, such an AI would need to have (or appear to have) creativity (for those environments intended to reward creativity), pattern-matching skills (for those environments intended to reward pattern-matching), ability to adapt and learn (for those environments which do not explicitly advertise what things they are intended to reward, or whose goals change over time), etc.

But like I said, I only skimmed it.

In general (not talking about the paper there), I have the impression that this is something that may be missed (sometimes even by researchers working in the domain), and I agree very much to your point!

This is why I think the NFLT gives us an interesting theoretical insight here:

Making a "General" AI is not actually about creating an approach that is able to learn efficiently about any type of environment.

Yes - I think you're right that the actual interesting result from NFLT is not that 'optimisation is impossible', but that 'uniform priors are stupid'.

Unfortunately, your intelligent agents qualify as optimization algorithms and therefore the No Free Lunch Theorem applies:


I.e. across the space of all possible environments, all agents perform equally well

As others pointed out, the NFLT only applies if the environments are uniformly distributed. In the paper, they are not uniformly distributed.

I missed your post and I made a similar answer.

However the idea may still be applicable if the environments votes can weighted, based on their relevance to specific domains.

(The same way optimization techniques are still useful despite the NFTL)

For any interactive reward-giving environment E, you could place X into E and see how much reward X gets; likewise for Y. If X gets more reward, you can consider that as evidence of X being more intelligent.

That's an odd definition of intelligence. By that definition, a bird is more "intelligent" than a human at the task of opening a nut. Seems like "fitness" would be a much more appropriate term.

It seems especially strange to consider this work in the field of general intelligence. Nothing about what you just described is general. By this definition, a chess bot is much more intelligent than the average person. I don't think we'd say a chess bot has general intelligence.

I think the idea is that you would have many environments and each one is a voter.

If an AGI candidate wins the board game vote but no others (the hunter-gathering vote, the walking and crawling vote, the "publish or perish" vote, etc), it will be trounced by something that is not quite so good at board games but is more flexible and adaptable - i.e., general.

I'm an AGI skeptic myself, but I do think that's the best attempt at a formalism for ranking AGI attempts that I've seen so far (disclaimer: as a skeptic I haven't exactly done a deep dive into the field).

> Nothing about what you just described is general.

But once you start integrating over many environments, it seems to make more sense, and the "generality" would be with respect to the set of environments being considered.

You'd be correct if the electorate held only one voter: the chessboard environment. But in any other environment the chess bot would fail miserably, no votes gained, whereas a future AGI would score well 'across the board' in many/most environments, win the election.

Could you summarize what exactly interesting insights did you gained by casting this problem as the election system? Unfortunately abstract fails to communicate this (I hate "teaser-only" abstracts!). Also, it might be easy to go from relative metric to sort of absolute by comparing against a "starndard" agent, for example, a random agent.

The #1 thing is it led to an elegant notion of how to compare intelligence (or rather, a parametrized family of intelligence comparators). There's a theorem called Arrow's Theorem, which basically says there's no good way to decide elections between more than 2 candidates. There are a handful of requirements an election-deciding method should have, and no method satisfies all those requirements. But Arrow's theorem has a loophole, discovered in the 1970s: if there are infinitely many voters, then there are methods of deciding the election which satisfy all Arrow's requirements. Economists in the 1970s exactly characterized these methods, in terms of a mathematical device called an "ultrafilter". Taking their characterization, a definition for relative intelligence is immediate--it writes itself.

If you're already familiar with ultrafilters, then the relative intelligence definition is SO elegant you're like, "Whoah. How can a definition of relative intelligence be that simple?" Of course, if you're not familiar with ultrafilters, then the definition is just as complicated as the definition of ultrafilters, which is quite complicated. So the definition of relative intelligence is like a simple computer program which imports from a complicated library in order to achieve that simplicity.

Taking their characterization, a definition for relative intelligence is immediate--it writes itself

A definition, but not a concrete and practical way of determining the relative difference between two agents?

Come on, bring this into reality. We're mostly a bunch of coders here. Stop talking about ultrafilters, concepts that are complex and thus render those of us not familiar with them unable to understand what you're saying.

Answer the question.. How can you take two real agents and, in finite time with finite resources, and in a completely general way, compare their intelligences?

The answer might be "the paper doesn't get us any closer to that." So just say that. Otherwise you're being misleading, because you start out by presenting the problem in a simple way. Then you get complex when you answer it.

The paper isn't intended to be a manual for how to practically compare agents. It will indirectly help there, I hope, by making people realize that they're looking at an election problem, and that there's a big existing literature on that subject. So in the practical case, say you have 10 different benchmarks, and some agents perform better at some of them, and others perform better at others. You could approach the problem from scratch, but it would be helpful for you to realize "oh this is an election with 10 voters and people have been studying how to decide elections for hundreds of years, I probably shouldn't reinvent the wheel". For example, it might take you many ages to essentially rediscover the Condorcet paradox and you might put inordinate effort into futilely trying to "solve" that paradox. Or you could stand on the shoulders of giants and avoid all that! https://en.wikipedia.org/wiki/Condorcet_paradox

But the fact that there's so much low hanging fruit is kind of the point. AGI as a field is still in its infancy. It's pure research and will be for a long time.

So if you want "Carmack's Theorem" or "Noelsusman's Theorem" to appear in high school math textbooks 500 years from now, this is the best time to jump in. (Assuming there are still textbooks 500 years from now.) "Doom" probably won't be remembered by then.

> "Doom" probably won't be remembered by then.

Given that DOOM has been ported to virtually every platform known to man, I’d argue the opposite.

Besides, in many instances, art survives while knowledge doesn’t. I suppose the digital age is different, but you never know!

I won't be around but 500 years from now the chances are much better than even that Doom will still be around in some form and a couple of people might even know who wrote it back in the stone age of technology. Not so sure about Carmack's Theorem.

I have also been researching AGI and what I found over the years is that a large majority of the work involved requires knowledge of outside fields that have been studied for centuries. Metaphysics, epistemology, macro/microeconomics, geometry --- dozens of fields that you would never guess should ever be related to AI in any way are actually pivotal when it comes to AGI.

How did you rule out other attributes in the voting, such as charisma, political environment, society wellness, etc?

The "voters" are not humans, but environments. A simple example of an environment would be a room with two buttons, one of which rewards you when you press it, and one of which punishes you when you press it. Independently of anything like charisma, society wellness, etc. If X figures this out quicker and pushes the "reward" button more often than Y, then we consider this simple environment to "vote" for X in the intelligence contest. (See the paper for the formal definitions.)

This simple environment is a form of the Multi-arm Bandit Problem, which has been researched and productised long ago. It's being used in advertising among other things.


I see. Instead of modeling the complexities, you simply record the choices picked by the adversarial entities in play.

This is a neat idea. Seems there's room to weigh value of electorate based on how liberally they praise. Thanks for sharing

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact