I recently had the privilege of attending SRECon in San Francisco. Now in its third year, the conference drew 650 attendees who gathered to learn and to share their experiences, insights and site reliability engineering (SRE) practices.
Most SRECon attendees either identified as SREs or were there to learn more about this new role and approach. While there was strong representation from the “unicorn” community (Google, Shopify, Twitter, Netflix and LinkedIn), enterprises such as Capital One and Comcast were among the many exhibitors actively seeking to recruit SRE talent. The conference was infused with enthusiasm, a willingness to share, innovative thinking and grassroots advice. It reminded me very much of my first DevOps Days in 2012. There is no doubt that interest in SRE is growing rapidly and deserves the attention of enterprise IT.
The practice of site reliability engineering originated at Google and reached a wider audience through the Google book of the same name. Interestingly, the SRE movement emerged separately from the DevOps movement, although there is little doubt that they are part of the same IT spectrum and share similar customer value-driven goals. DevOps focuses on engineering continuous delivery up to the point of deployment; SRE focuses on engineering continuous operations at the point of customer consumption. Both domains rely on sharing, culture, metrics and automation. Both require human and automated resources to ensure a seamless value stream and an exceptional customer experience.
The role of the SRE is rapidly gaining momentum. Part systems administrator, part second-tier support and part developer, SREs are encouraged to be inquisitive, to acquire new skills, to ask questions, to solve problems and to embrace automation. I asked several of the exhibitors what skills they were looking for in new SREs. The answers almost universally included the ability to code, along with an unwavering thirst for knowledge and improvement: to find answers, solve problems and ensure the highest levels of reliability and availability.
I can also see a clear bridge between SRE and IT service management (ITSM). Many of the presenters spoke about handling and managing late-night incidents, escalations, root cause analysis, service level objectives, availability metrics and other topics addressed in ITSM frameworks such as ITIL. The tools and process definitions may differ, but the opportunity to instill agility into many of the ITSM processes to meet changing demands is certainly apparent.
What are the risks? To my mind, the biggest risk is that SRE, DevOps, ITSM, Agile SDLC and other frameworks fail to align and remain siloed outposts. SRE and DevOps are two sides of the same coin. Let’s hope that the SRE momentum catches up to DevOps and that enterprises begin to integrate both sets of practices as the norm.