In AI work - which naturally lends itself to replicability improvements - we could get truly solid replicability by ratcheting up the standards for code quality, testing, and automation in AI projects. And I think LLMs can start doing a lot of that QA and engineering / pre-operationalization work, because the best LLMs are already better at software and value engineering than the average postdoc AI researcher.

Most AI codebases are missing key replicability factors - either the training data or training code is missing, the testing / benchmark automation strategy is poor, the documentation is meager, or there's no real CI/CD practice for advancing the project or operationalizing it against the problems caused by the Anthropocene collapse.
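
To make "testing / benchmark automation" concrete, here's a minimal sketch of the kind of CI-sized reproducibility test I mean. The toy least-squares fit is just a stand-in for a repo's real train/evaluate entry points; the file name, seed, and threshold are all illustrative, not a fixed standard:

    # test_repro_smoke.py - pytest-style reproducibility smoke test (sketch)
    # The "training" is a stand-in least-squares fit; a real repo would call
    # its own train/evaluate entry points on a tiny CI-sized config instead.
    import numpy as np

    def train_and_score(seed: int) -> float:
        rng = np.random.default_rng(seed)           # pin all randomness to one seed
        X = rng.normal(size=(256, 8))
        w_true = rng.normal(size=8)
        y = X @ w_true + 0.1 * rng.normal(size=256)
        w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ w_hat
        return 1.0 - resid.var() / y.var()          # R^2 as the "benchmark metric"

    def test_same_seed_same_result():
        # The same seed must reproduce the metric exactly.
        assert train_and_score(1234) == train_and_score(1234)

    def test_benchmark_floor():
        # Pin the headline number so regressions show up in CI, not in review.
        assert train_and_score(1234) > 0.95

Run with pytest on every push; the point is that the seed handling and the headline number live in automation rather than in a reviewer's memory.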

Some researchers are even hardened against such things, seeing them as false worship of harmful business metrics rather than a fundamental duty that could really improve the impact of their research and its applicability to a universal crisis that faces us all.

But one glance at their code gives the lie to this view. Too much additional work is needed to turn it into anything useful, either for further research iterations or for productive operationalization. The gaps in code quality exist not because that form of code is optimal for research aims, but because researchers lack software engineering expertise and cannot afford software engineering labor.

Thankfully the amount of software engineering labor required is not even that great - LLMs can now help shoulder that effort.

As a result I believe we should work to create standards for AI-assisted research repos that correct the major deficits in replicability, usability, and code quality we see in most AI repos. Then we should campaign to adopt those standards into peer review. Let one of the reviewers be an AI that really grills your code on its quality. And actually incorporate the PRs it proposes.
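
As a rough sketch of what that AI-reviewer step could look like in CI, assuming an OpenAI-compatible endpoint with an API key in the environment (the model name, prompt, and diff range below are placeholders, not a proposed standard):

    # review_gate.py - sketch of an "LLM reviewer" step for CI
    import subprocess
    from openai import OpenAI

    def main() -> None:
        # Collect the change under review (here: the diff against main).
        diff = subprocess.run(
            ["git", "diff", "origin/main...HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout

        client = OpenAI()  # reads OPENAI_API_KEY from the environment
        resp = client.chat.completions.create(
            model="gpt-4o",  # placeholder model name
            messages=[
                {"role": "system", "content": (
                    "You are a strict reviewer for an AI research repo. "
                    "Check replicability: seeds, data availability, tests, docs, CI."
                )},
                {"role": "user", "content": diff[:100_000]},  # crude length cap
            ],
        )
        print(resp.choices[0].message.content)

    if __name__ == "__main__":
        main()

The output would be posted as a review comment or turned into proposed patches; the important part is that it runs on every submission, not that it uses this particular provider.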

I think that would change the situation from the current norm, where academic AI repos are mostly non-replicable throw-away code, to the opposite, where the majority of AI research repos are easy to replicate, improve, and mobilize against the social and environmental problems facing humanity as it navigates the Anthropocene disaster.



