In the usual course of writing software, it's common to pull in huge dependency chains (npm, PyPI), and a single compromised package can spell doom. There's some nasty stuff out there, like the compromised PyTorch nightly dependency (https://pytorch.org/blog/compromised-nightly-dependency/), which uploaded people's SSH keys to the attacker.
It's easy to say "just use containers" or "just use VMs", but are there pragmatic workflows for doing this that don't suffer from too many performance problems or too much general pain and inconvenience?
Are containers the way to go, or VMs? Which virtualization software? Is it best to use one isolated environment per project, no matter how small, or, for convenience's sake, a grab-bag VM that holds many low-value projects?
Theorycrafting is welcome, but I'm particularly interested in hearing from anyone who has made this work well in practice.
Some examples of how we do it:
- Devs can only use Docker images that we have hardened ourselves and that are hosted inside our infrastructure. Policies enforce this both in CI and at runtime on the clusters (a rough sketch of this kind of check follows the list).
- All Maven/pip/Node.js/etc. dependencies are pulled through a proxy and scanned before first use. All subsequent CI jobs pull from this internal cache.
- Only a handful of CI runners have outbound connectivity to the public internet (enforced by firewalls). These runners carry specific tags so that jobs needing connectivity can target them. All other runners pull dependencies and push artefacts from within our network.
- The CI runners with internet connectivity can only reach domains whitelisted at the firewall level, and so far very few requests have been made to add new domains.
- External assets, e.g. an OpenJDK artefact, have their checksums validated during the build stage of our base images. The checksum is included in the Docker image metadata, should we wish to download the asset again and compare it against the public one (the checksum step is sketched in code below).
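As a rough illustration of the registry restriction (not our exact tooling; in practice this kind of rule usually lives in an admission controller or a CI policy engine), here is a minimal Python sketch of the check a CI job might run. The registry hostname is a placeholder:

```python
import sys

# Hypothetical internal registry prefix; substitute your own hardened registry.
ALLOWED_REGISTRY = "registry.internal.example.com/"

def disallowed_images(image_refs):
    """Return any image references that do not come from the internal registry."""
    return [ref for ref in image_refs if not ref.startswith(ALLOWED_REGISTRY)]

if __name__ == "__main__":
    # Image references are read one per line from stdin; extracting them from
    # CI configs or Kubernetes manifests is left to an earlier step.
    refs = [line.strip() for line in sys.stdin if line.strip()]
    bad = disallowed_images(refs)
    if bad:
        print("images not pulled from the internal registry:")
        for ref in bad:
            print(f"  {ref}")
        sys.exit(1)
    print("all image references OK")
```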
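And a minimal Python sketch of the checksum validation, with a hypothetical artefact path and a placeholder expected value (in practice the expected value comes from the vendor's published checksums and is pinned in the build repo):

```python
import hashlib
import sys

# Hypothetical artefact path and placeholder checksum.
ARTEFACT_PATH = "openjdk-21_linux-x64_bin.tar.gz"
EXPECTED_SHA256 = "replace-with-the-published-sha256"

def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in chunks so large artefacts don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

if __name__ == "__main__":
    actual = sha256_of(ARTEFACT_PATH)
    if actual != EXPECTED_SHA256:
        print(f"checksum mismatch: expected {EXPECTED_SHA256}, got {actual}")
        sys.exit(1)  # fail the base-image build
    print("checksum OK")
```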