Hacker News new | comments | show | ask | jobs | submit login

How much time does a full reset take?

A few minutes - killing processes on one machine, restarting a database on another machine/cluster, reloading a schema, importing data (for some experiments), deploying a new service, loading tensorflow models, warming up benchmark clients. I found that the key of good research pipelines is having a really consistent way of passing around configurations between components.

My components arent strictly microservices (a mixture of open source components and handwritten tools) and they interact in all sorts o fprotocols with each other (importing jsons, csv, GRPC, HTTP), but I essentially treat the configuration flags as their API, so there are no implicit configurations that I could forget about. The rest is just naming things well, e.g. descriptive names for experiment result files etc.

Initially I thought everyone is doing that, but from talking to PhDs in other domains I noticed that there is a strong bias towards people working in any kind of complex distributed setting having these pipelines.

My friends who devise ML models and just test them on datasets on their laptop never had a real need to get a pipeline in place because they never felt the pain points of setting up large distributed experiments.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact