
I spent 9 years at Google, just left at the end of July.

The biggest thing I grew to appreciate was that for iterating on large-scale production systems, rollout plans are as important as anything else. A very large change may be cost- or risk-prohibitive to release all at once, but with thought you can usually subdivide it into subcomponents that are easier to roll out and verify, and perform the same work as a series of lower-risk, well-understood rollouts. That's critical enough that it's worth planning for from the design phase: much as you may write software differently to allow for good unit tests, you may want to develop systems differently to allow for good release strategies.
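
To make the idea concrete, here is a minimal sketch of a staged rollout loop. It is only an illustration under stated assumptions, not how any particular company does it: `staged_rollout`, `apply_stage`, `is_healthy`, and `rollback` are hypothetical names, and the traffic fractions are made up.

```python
from typing import Callable, Sequence


def staged_rollout(
    stages: Sequence[float],
    apply_stage: Callable[[float], None],
    is_healthy: Callable[[float], bool],
    rollback: Callable[[], None],
) -> bool:
    """Release a change in increasing fractions, verifying each step.

    All four parameters are hypothetical hooks supplied by the caller:
    `stages` is the sequence of exposure fractions, `apply_stage` pushes
    the change to that fraction, `is_healthy` verifies the stage, and
    `rollback` undoes the change.
    """
    for fraction in stages:
        apply_stage(fraction)
        # Verify this small step before widening exposure.
        if not is_healthy(fraction):
            rollback()
            return False
    return True


if __name__ == "__main__":
    applied = []
    ok = staged_rollout(
        stages=[0.01, 0.10, 0.50, 1.0],  # illustrative fractions only
        apply_stage=applied.append,
        is_healthy=lambda fraction: True,  # stand-in for real verification
        rollback=applied.clear,
    )
    print(ok, applied)
```

Each stage is small enough to verify on its own, which is the point: a failed check stops the release before exposure widens.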

Google is a large company, and I think there are software engineers learning very different things from what I focused on. I worked on large-scale machine learning in ads; someone working on Chrome or Android likely learned very different things.




(Hi Danny!)

Yeah, Google has a lot to teach about building and maintaining reliable systems.

I noticed that a lot of that, including the name SRE, gets cargo-culted by other companies that don't really understand what it is.

I also guess that there might be people within Google who don't end up working on projects that invest in reliability (for whatever reason) and so might have had a different experience.



