
Google gets DDoSed all the time. You don't hear about it because, well, the DDoS SREs are very, very good at what they do.




SREs?


Site Reliability Engineer. It's a Google (+Facebook)-specific title that is sort of like a sysadmin or devops, but instead of keeping the system up by hand, they write code that keeps the system up. They also have a different negotiating position vs. engineering than at many other companies, e.g., SREs have veto power over many architectural decisions in the code, and it's more "we'll build the system that can stay upright with a minimum of pagerstorms" vs. "you build the system and throw it over the wall to us and then we'll keep it upright through our self-sacrificing heroism."


Apple hires SREs who are actually sysadmins that occasionally code, complicating the title somewhat. This is not unique to them.

nostrademons's explanation of SRE is the correct one, IMO. The architecture is key, and engineering has to be set up to allow that. It has helped me in the past to say SREs are concerned more with the operation of a service than with a group of machines offering a service; it's almost like a service operations developer. When a company thinks in terms of services and abstracts the machine away (e.g., containers, scheduling, Mesos, Omega/<unnamed>, intelligent CI/CD, service discovery), now you're getting into SRE territory instead of SA territory. The architecture involvement distinguishes SRE from devops for me. You should be able to trust SRE to build services, not just run engineering output.

Teams that congeal out of Xooglers tend to preach SRE well, and there is the occasional company (Twitter and Foursquare come to mind) that applies the title and interacts with the team as intended.


For further reading, this is a good post: http://www.site-reliability-engineering.info/


Site Reliability Engineers



