>> What do you think then is the difference between going from 50th to 99.9th percentile in their other domains? Is there something materially different between Go, protein folding, or coding?
Yes, it's the size of the search space for each problem. The search space for arbitrary programs in a language with Universal Turing Machine expressivity is infinite. Even worse, for any programming problem there are an infinite number of candidate programs that may or may not solve it and that differ in only minute ways from each other.
For Go and protein structure prediction from sequences the search space is finite, although obviously not small. So there is a huge difference in the complexity of the problems right there.
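To make the infinite-search-space point concrete, here's a toy sketch in Python (the names and the doubling spec are invented for illustration): for any candidate program you can mechanically generate endless syntactically distinct variants that behave identically, so no finite test suite can ever narrow the space down to one program.

```python
# Toy illustration: infinitely many syntactically distinct programs
# compute the same function, differing only in no-op padding.
# make_variant and the doubling spec are made up for illustration.

def make_variant(n):
    """Build source for the n-th variant of a doubling function.

    Each variant inserts n useless 'x = x + 0' statements, so the
    family of candidates is unbounded even for a trivial spec.
    """
    noops = "\n".join("    x = x + 0" for _ in range(n))
    return "def f(x):\n" + (noops + "\n" if noops else "") + "    return x * 2\n"

# Any finite test suite accepts every member of this infinite family:
for n in range(5):
    ns = {}
    exec(make_variant(n), ns)       # compile the n-th candidate
    assert ns["f"](21) == 42        # all variants pass the same tests
    print(n, len(make_variant(n)))  # sources keep growing, behavior identical
```

This is exactly why testing candidates against examples, on its own, can never tell you that you've found *the* program, only *a* program consistent with the examples so far.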
Btw, I note yet again that AlphaCode performs abysmally badly on the formal benchmark included in the arxiv preprint (see Section 5.4 and Table 10). That makes sense, because AlphaCode is a very dumb generate-and-test, brute-force search approach that doesn't even try to be smart and instead tries to make up for the lack of intelligence with an enormous amount of computational resources. Most work in program synthesis is also basically a search through the space of programs, but people in the field have come up with sophisticated techniques to avoid having to search an infinite number of programs, and to avoid having to generate millions of program candidates, which DeepMind actually brags about:
At evaluation time, we create a massive amount of C++ and Python programs for each problem, orders of magnitude larger than previous work.
They say that as if generating "orders of magnitude more" programs than previous work is a good thing, but it's not. It means their system is extremely bad at generating correct programs. It is orders of magnitude worse than earlier systems, in fact.
(The arxiv paper linked from the article quantifies this "massive" amount as "millions"; see Section 4.4).
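For contrast, here is what the generate-and-test shape looks like at toy scale (a hypothetical sketch in Python; the expression "grammar" and helper names are invented, and real systems, AlphaCode included, do this at vastly larger scale): enumerate candidates in some order, run each against the I/O examples, keep the first one that passes.

```python
from itertools import product

# Minimal generate-and-test synthesis over a toy expression grammar.
# The grammar and helpers are invented for illustration; the point is
# the sample-then-filter shape, not the specific language.

OPS = ["x + {}", "x * {}", "x - {}"]  # template "grammar"
CONSTS = range(0, 10)

def candidates():
    """Enumerate candidate one-liner bodies in a fixed order."""
    for op, c in product(OPS, CONSTS):
        yield op.format(c)

def synthesize(examples):
    """Return the first candidate consistent with all I/O examples, else None."""
    tried = 0
    for body in candidates():
        tried += 1
        f = eval("lambda x: " + body)
        if all(f(i) == o for i, o in examples):
            return body, tried
    return None, tried

# Spec: double the input. The searcher blindly tests candidates until one fits.
body, tried = synthesize([(1, 2), (3, 6), (10, 20)])
print(body, tried)  # prints: x * 2 13
```

The count of discarded candidates is the whole story: smarter synthesis techniques prune or bias this enumeration so that far fewer candidates ever need to be tested, which is precisely the opposite of bragging about generating orders of magnitude more of them.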