Far from being impossible, the challenges you mention have fairly straightforward solutions. Every cloud hosting provider that executes user-supplied virtual machines already employs practical technology that can solve this problem today.
I'd represent the code as a virtual machine image. Let the researcher work however they want. When they're done, they take a virtual machine snapshot of their execution environment. We provide some tooling so that running the "manifest" in this execution environment (re-)produces all of their computational results. Think of research as a "build" process that runs the analysis and produces the output. Ideally this build would also compile any necessary source code before running the compiled applications against the research data.
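To make that concrete, here's a minimal sketch of what such a "manifest" could look like: a script, shipped inside the image, that re-runs every step and re-creates every published artifact. The step commands and file names here are hypothetical.

    # Hypothetical reproducibility manifest: re-run the whole analysis
    # inside the archived environment and regenerate every artifact.
    import subprocess

    STEPS = [
        ["make", "-C", "src"],                   # compile any native code
        ["python", "analysis/run_models.py"],    # run the analysis
        ["python", "analysis/make_figures.py"],  # regenerate figures/tables
    ]

    def reproduce():
        for step in STEPS:
            print("running:", " ".join(step))
            subprocess.run(step, check=True)     # fail loudly on any error

    if __name__ == "__main__":
        reproduce()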
Far from being "limited", researchers can run any code that can run on a cloud hosting provider today. To ensure portability, the VM image will include a versioned copy of (or reference to) the kernel (e.g. Linux) and all of the userland libraries that are used by the software (e.g. Python). Think of it like a self-describing Linux container, like a Kubernetes Pod [1] or Docker Container. The image fully describes the software running on the system; it's a precise copy. With this machine image, we can run code in the same execution environment that the researcher used.
Have you ever run a very old NES or SNES game on a Nintendo DS, or on your PC in an emulator? It's the same concept.
The researcher's data is stored directly within the virtual machine image, or is represented as a virtual data drive that's mounted in the virtual machine. When this approach becomes influential and widely adopted, researchers will represent their code as containers throughout the development process. The researcher won't just run Jupyter (or Matlab or whatnot) on a random machine; they'll run Jupyter in, say, a carefully packaged, minimal container designed for reproducibility.
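For example (a sketch assuming Docker is installed; the image name and paths are hypothetical), the "data drive" idea maps directly onto a read-only volume mount, with the notebook running inside a pinned, versioned image:

    # Launch Jupyter in a pinned container with the dataset mounted
    # read-only, so the published environment and data stay untouched.
    import subprocess

    subprocess.run([
        "docker", "run", "--rm",
        "-p", "8888:8888",
        "-v", "/archive/study-42/data:/data:ro",           # read-only research data
        "-v", "/archive/study-42/notebooks:/home/jovyan/work",  # the researcher's code
        "repro/jupyter-minimal:2024.1",                    # pinned, versioned image
    ], check=True)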
> Code bit rots.
"Bit rot" is a phenomenon that references the difficulty of maintaining code as environment and system change around it. It doesn't directly apply to our scenario. A virtual image containing a Python 1.0 program running on version 1.0 of Linux will continue to work indefinitely, just like an old SNES game. New software releases don't affect our ability to run old software -- they just make it harder to mix the two together. Furthermore, we can even run the program while passing different (modern) data as input! We've already made huge progress over where we are today.
Sure, if we want to adapt that code to a newer version of Python and Linux, then we have work to do, but that's a different problem than reproducibility. There's no free lunch; nothing can make that problem go away. But if we do want to adapt the researcher's algorithm to another language or platform, then we have a huge advantage: we can run the researcher's actual code with their actual data and get the very same result! That's huge! That will make it far easier to confidently adapt their algorithm to a new environment.
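Concretely, a port can be validated by running both the archived environment and the new one on the same input and comparing the outputs; a sketch, with hypothetical file names:

    # Check that a ported implementation reproduces the archived result
    # by comparing cryptographic digests of the two output files.
    import hashlib

    def digest(path):
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    original = digest("results/original_vm_output.csv")  # from the archived VM
    ported = digest("results/ported_output.csv")         # from the new port

    print("match" if original == ported else "MISMATCH")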
[1] https://kubernetes.io/docs/concepts/workloads/pods/pod/