
It must be possible for SREs to refuse a service that doesn't meet certain criteria.

AFAIK this doesn't happen anywhere, although I believe G has something like this.




To make this possible you need either an entirely separate chain of command for SRE, like Google has, or very influential SREs. Both are exceedingly rare, so I'm not surprised OP thinks SRE doesn't scale.


Google SREs definitely have that stick.


> I believe G has something like this

It does. Until the service meets the criteria, SWEs on the project partially take on the SRE role themselves, under the guidance of the SRE org.


Is SRE at Google just a maintenance team? What do they do, then?


Effectively, yes. The main things SRE provides are oncall support, production-focused design consulting, and integration with other infrastructure. In practice, the engagement almost always includes oncall support; the rest depends on how mature the SRE team is.

In a typical split, SWEs do the dev work for features and for large reliability/scalability changes (which SRE helps prioritise), whereas the SRE team maintains the software around the project (config pipelines, monitoring, etc.) and might occasionally write smaller reliability/scalability changes.

But there can be lots of variance. It's atypical, but some infrastructure-focused SRE teams maintain non-trivial software and are part of SRE because of their other responsibilities.


Google wrote a book about it. It's free to read. https://sre.google/books/


SRE is the first-responder team. They are on-call 24/7 (the team, not each person), perform systems and service monitoring, triage failures and mitigate outages.

That doesn't mean it's all manual work; I'm sure SREs at Google employ a boatload of automated event handling and custom response scripts. But "keeping the service up" requires different skills than "building a service", and Google chose to separate Dev and Ops this way. As others said in this thread, if a service isn't up to SRE standards (in terms of monitoring, logging, or robustness), the SRE team won't accept it and the devs have to do their own Ops.
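To make the handoff rule concrete, here is a toy sketch of that acceptance gate. It is not how Google's actual process works: the three criteria are just the ones named above (monitoring, logging, robustness), and every name in the code (`ServiceReadiness`, `missing_criteria`, `pager_owner`) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ServiceReadiness:
    """Hypothetical summary of whether a service meets SRE's bar."""
    name: str
    has_monitoring: bool
    has_logging: bool
    is_robust: bool  # illustrative stand-in for whatever "robust" means to the SRE team


def missing_criteria(svc: ServiceReadiness) -> list[str]:
    """Return the names of the hypothetical criteria the service fails."""
    checks = {
        "monitoring": svc.has_monitoring,
        "logging": svc.has_logging,
        "robustness": svc.is_robust,
    }
    return [name for name, ok in checks.items() if not ok]


def pager_owner(svc: ServiceReadiness) -> str:
    """SRE takes the pager only if every criterion is met; otherwise the devs keep it."""
    return "SRE" if not missing_criteria(svc) else "dev team"


if __name__ == "__main__":
    svc = ServiceReadiness("checkout", has_monitoring=True, has_logging=True, is_robust=False)
    print(pager_owner(svc), missing_criteria(svc))  # dev team ['robustness']
```

The point is the asymmetry: a single failed criterion keeps operational ownership with the developers until they fix it.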



