Welcome to another VSHN.timer! Every Monday, 5 links related to Kubernetes, OpenShift, CI / CD, and DevOps; all stuff coming out of our own chat system, making us think, laugh, or simply work better.
Errare humanum est, sed perseverare diabolicum. (Source)
1. In these times of zero-days in Log4j, it is becoming harder and harder to keep our systems safe and sound. Thankfully, our collective experience brings some best practices to light. Mathew Duggan just shared a few common infrastructure mistakes he’s made over the years, so that we can all learn from them.
2. The creators of the Jeli incident platform just published a comprehensive Post-Incident Guide, also available in PDF format, with a complete set of instructions to help you learn as much as possible from a painful incident. An outstanding guide, and a must-read.
3. Last September, Slack had an outage that impacted less than 1% of their users for around 24 hours. The root cause was an attempt to enable DNSSEC in their infrastructure. Slack Engineering explains it all in their blog.
4. You probably didn’t notice, but Amazon Web Services suffered a service disruption in their Northern Virginia region (“us-east-1”) on Tuesday, December 7th, 2021. It impacted the availability and performance of EC2, API Gateway, EKS, and some other services. Their report provides more details.
How does your team deal with incidents and failures? Is your team working in a blame-aware environment? Do you have any Kubernetes best practices to share with the community? Get in touch with us, and see you next… year for another edition of VSHN.timer. That’s right! This is the last VSHN.timer of 2021; we’d like to thank you for your attention, your comments, your sharing on social media, and your suggestions! The VSHN.timer team wishes you all the best for 2022 🙂
PS: would you like to receive VSHN.timer every Monday in your inbox? Sign up for our weekly VSHN.timer newsletter.
PS2: do you prefer reading VSHN.timer in your favorite RSS reader? Subscribe to this feed.