My components arent strictly microservices (a mixture of open source components and handwritten tools) and they interact in all sorts o fprotocols with each other (importing jsons, csv, GRPC, HTTP), but I essentially treat the configuration flags as their API, so there are no implicit configurations that I could forget about. The rest is just naming things well, e.g. descriptive names for experiment result files etc.
Initially I thought everyone is doing that, but from talking to PhDs in other domains I noticed that there is a strong bias towards people working in any kind of complex distributed setting having these pipelines.
My friends who devise ML models and just test them on datasets on their laptop never had a real need to get a pipeline in place because they never felt the pain points of setting up large distributed experiments.