I'll take an opportunity to plug a paper I recently published on comparing relative intelligence. The punchline will illuminate the low-hangingness of the fruit in this field.
Suppose X and Y are AGIs and you want to know which is more intelligent. For any interactive reward-giving environment E, you could place X into E and see how much reward X gets; likewise for Y. If X gets more reward, you can consider that as evidence of X being more intelligent. But there are many environments, and X might do better in some, Y in others. How can you combine those pieces of evidence into a final judgment?
The epiphany I had (obvious in hindsight) is that the above situation is actually an election in disguise. The voters are interactive reward-giving environments, voting (via their rewards) in an intelligence contest between different AGIs. This allows us to import centuries of research on voting and elections! In particular, by using theorems about elections published in the 1970s, I was able to provide an elegant notion of relative intelligence.
The notion I provided is elegant enough that some theorems can even be proved with it, for example, formalizations of the idea that "higher-intelligence team-members make higher-intelligence teams". Which emphasizes the low-fruit-hanginess of the field: as obvious as that idea seems, apparently no-one was able to prove it with previous formal intelligence measures, probably because those previous intelligence measures were too complicated to reason about!
Here's the paper: https://philpapers.org/archive/ALEIVU.pdf
Since you have no formal way to compare environments to each other, you can't prevent this from happening. Therefore you just pushed the subjectively as to which AI is smarter to which environments are chosen by the user to run against.
Indeed, any attempt to come up with "one true comparison of intelligence" (as opposed to a parametrized family) should be viewed with skepticism, because it really must depend on a lot of arbitrary choices.
For what I skimmed from your paper, it looks like the LH agents may be viewed as discrete optimization processes trying to optimize an objective/utility function across an infinite space of possible environments (infinite voters).
If it is the case, and if each environment vote has the same weight, you may be in a case of no free lunch, where the performances of all possible agents (including the random agent) will average to the same across all possible environments.
Or, to restate the above, for each environment in which an agent is doing well, it is possible to construct an "anti-environment" where the agent is performing exactly as bad.
My personal opinion on the topic of AGI is that it is actually a case of NFLT.
The key assumption to get the NFLT is that each environment vote has the same weight, i.e. we are targeting a uniform distribution on objective functions / environments / problems / whatever you call it.
If you break this assumption, you get an opposite result which is that search algorithms divide into some equivalence classes determined by the sets of different outcomes (traces, if I remember the theorem's description) that you discriminate between.
A uniform distribution like this is actually a very very strong precondition; it implies (looking at results about the complexity of sets of strings, since choosing an environment is like choosing a string from 2^N given some encoding, etc) that you care equally about a very large number of environments most of which have no compressible structure or equivalently have a huge kolmogorov complexity. Most of these environments would not have a compact encoding, relative to a particular choice of machine, but we are weighing these the same as those environments which are actually implementable using less than a ridiculous amount of storage to represent the function.
The reason why I think this is too strong an assumption to use is then that we don't care about all these quadrillion problems which have no compact encoding - we know this because we literally can't encounter them as they would be too large to ever write down using ordinary matter.
Allowing for this, talking usefully about evaluating an AGI or equivalently a search strategy or optimization algorithm implies having an understanding of the distribution of environments / problems we care about. I think capturing this concept in a 'neat' way would be a significant contribution; I had a go during my PhD but failed to get anywhere. Unfortunately things like K-complexity are uncomputable, so reasoning about distributions in those terms is a dead-end.
Other authors (Legg and Hutter, 2007) followed the line of reasoning in your comment much more literally. They proposed to measure the intelligence of an agent as the infinite sum of the expected rewards the agent achieves on each computable environment, weighted by 2^-K where K is the environment's Kolmogorov complexity. Which seems as if it gives "one true measure" of intelligence, but actually that isn't the case at all, because Kolmogorov complexity depends on a reference universal Turing machine (Hutter himself eventually acknowledged how big a problem this is for his definition, Leike and Hutter, 2015).
My position is that any attempt to come up with "one true comparison of intelligence" (as opposed to a parametrized family) should be viewed with skepticism, because relative intelligence really must depend on a lot of arbitrary choices.
The reference machine thing would be the next problem to argue if using 2^-K as the weight; whilst you can make the K-complexity of any particular string low by putting an instruction in your machine that is 'output the string', this is clearly cheating! So there ought to be a connection between the reference machine and some real physics, since we are perhaps not interested in building optimisers that perform well in universes whose physics is very different to ours.
Sadly even if this were cracked I think the fact that K is uncomputable would make the result likely to be useless in practise.
Thanks for your interesting reply, I enjoyed it.
It still suffers the problem that it's highly lopsided in favor of simpler environments. Of course you're absolutely right that environments too complex to exist in our universe should get low weight. But it's hard to find the right "Goldilocks zone" where those ultra-complex environments are discounted sufficiently but medium-complexity environments aren't overly disenfranchised, and where ultra-simple environments aren't given overwhelming authority.
>There was me thinking I'd had an idea of my own one time!
I wouldn't give up. Although it's such a long paper, Legg and Hutter 2007 actually has very little solid content: they propose the definition, and the rest of the paper is mostly filler. There are approximately zero theorems or even formal conjectures. One area I think is ripe for contributions would be to better articulate what the desired properties of an intelligence measure should be. Legg and Hutter offered a measure using Kolmogorov weights, but WHY is that any better than just randomly assigning any gibberish numbers to agents in any haphazard way--what axioms does it satisfy that one might want an intelligence measure to satisfy?
Like I said, I only skimmed your paper, so I hope it was clear my comment was not intended as a criticism (or even as a review) :)
I think I agree with the general terms of your conclusion personally.
In practice, we are indeed not interested in every imaginable environments, only in "realistic" ones.
It was not clear for me if the paper addressed such concerns for AGI, e.g. when writing:
To achieve good rewards across the universe of all environments, such an AI would need to have (or appear to have) creativity (for those environments intended to reward creativity), pattern-matching skills (for those environments intended to reward pattern-matching), ability to adapt and learn (for those environments which do not explicitly advertise what things they are intended to reward, or whose goals change over time), etc.
But like I said, I only skimmed it.
In general (not talking about the paper there), I have the impression that this is something that may be missed (sometimes even by researchers working in the domain), and I agree very much to your point!
This is why I think the NFLT gives us an interesting theoretical insight here:
Making a "General" AI is not actually about creating an approach that is able to learn efficiently about any type of environment.
I.e. across the space of all possible environments, all agents perform equally well
However the idea may still be applicable if the environments votes can weighted, based on their relevance to specific domains.
(The same way optimization techniques are still useful despite the NFTL)
That's an odd definition of intelligence. By that definition, a bird is more "intelligent" than a human at the task of opening a nut. Seems like "fitness" would be a much more appropriate term.
It seems especially strange to consider this work in the field of general intelligence. Nothing about what you just described is general. By this definition, a chess bot is much more intelligent than the average person. I don't think we'd say a chess bot has general intelligence.
If an AGI candidate wins the board game vote but no others (the hunter-gathering vote, the walking and crawling vote, the "publish or perish" vote, etc), it will be trounced by something that is not quite so good at board games but is more flexible and adaptable - i.e., general.
I'm an AGI skeptic myself, but I do think that's the best attempt at a formalism for ranking AGI attempts that I've seen so far (disclaimer: as a skeptic I haven't exactly done a deep dive into the field).
But once you start integrating over many environments, it seems to make more sense, and the "generality" would be with respect to the set of environments being considered.
If you're already familiar with ultrafilters, then the relative intelligence definition is SO elegant you're like, "Whoah. How can a definition of relative intelligence be that simple?" Of course, if you're not familiar with ultrafilters, then the definition is just as complicated as the definition of ultrafilters, which is quite complicated. So the definition of relative intelligence is like a simple computer program which imports from a complicated library in order to achieve that simplicity.
A definition, but not a concrete and practical way of determining the relative difference between two agents?
Come on, bring this into reality. We're mostly a bunch of coders here. Stop talking about ultrafilters, concepts that are complex and thus render those of us not familiar with them unable to understand what you're saying.
Answer the question.. How can you take two real agents and, in finite time with finite resources, and in a completely general way, compare their intelligences?
The answer might be "the paper doesn't get us any closer to that." So just say that. Otherwise you're being misleading, because you start out by presenting the problem in a simple way. Then you get complex when you answer it.
Given that DOOM has been ported to virtually every platform known to man, I’d argue the opposite.
Besides, in many instances, art survives while knowledge doesn’t. I suppose the digital age is different, but you never know!