Welcome to another VSHN.timer! Every Monday, 5 links related to Kubernetes, OpenShift, CI / CD, and DevOps; all stuff coming out of our own chat system, making us think, laugh, or simply work better.
This week we’re back! And we’re not going to talk about Foghat, but rather about how service level objectives are the new deal breakers.
1. In April, Atlassian customers suffered one of the longest outages ever recorded in SaaS history, leaving users without access to Jira, Confluence, or OpsGenie. Gergely Orosz wrote a report, updated over the course of a month, covering the issue and, most importantly, the reactions (or lack thereof) from Atlassian.
2. How does Salesforce track SLOs for their thousands of services in production? They use a well-defined GitOps process based on configuration as code; it tracks SLOs, alerts, and everything related to them in the same Git repo.
3. Failures can come from the most unexpected places. Take, for example, how a missing shell option called “pipefail” slowed Cloudflare down dramatically.
4. In The Cloudcast, Aaron Delp hosted Brian Singer, CPO at Nobl9, to talk about Service Level Objectives (SLOs): what they are, why they matter, and how to use them to focus on innovation versus technical debt.
5. Interested in the subject of SLOs? Check out next week’s online SLOconf 2022, with tracks not only for engineers, but also for all teams impacted by SLOs in one way or another.
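New to SLOs? The arithmetic behind them is short enough to fit in a few lines. Here is a minimal sketch, assuming an availability target expressed as a fraction (0.999 for "three nines") and a 30-day window; the function names `error_budget` and `budget_remaining` are ours, purely for illustration, and are not taken from any of the tools or talks linked above.

```python
from datetime import timedelta


def error_budget(slo_target: float,
                 window: timedelta = timedelta(days=30)) -> timedelta:
    """Downtime allowed by an availability SLO over the given window.

    slo_target is a fraction, e.g. 0.999 for 99.9% availability.
    (Hypothetical helper, for illustration only.)
    """
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in (0, 1]")
    # The budget is the share of the window that is allowed to be "bad".
    return window * (1.0 - slo_target)


def budget_remaining(slo_target: float,
                     downtime: timedelta,
                     window: timedelta = timedelta(days=30)) -> timedelta:
    """Budget left after the observed downtime; negative means the SLO is blown."""
    return error_budget(slo_target, window) - downtime


if __name__ == "__main__":
    budget = error_budget(0.999)
    left = budget_remaining(0.999, timedelta(minutes=20))
    print(f"Budget for 99.9% over 30 days: {budget}")
    print(f"Left after 20 minutes of downtime: {left}")
```

With these assumptions, a 99.9% target over 30 days leaves roughly 43 minutes of allowed downtime, which gives an idea of how quickly a multi-day outage burns through any budget.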
Were you impacted by Atlassian’s downtime? How do you negotiate your SLOs with your customers? Would you like to share any tips and tricks with the community? Get in touch with us, and see you next week for another edition of VSHN.timer.
PS: would you like to receive VSHN.timer every Monday in your inbox? Sign up for our weekly VSHN.timer newsletter.
PS2: do you prefer reading VSHN.timer in your favorite RSS reader? Subscribe to this feed.
PS3: check out our previous VSHN.timer editions about Quality Assurance, SLAs & SREs: #6, #34, #43, #66, and #104.