VSHN.timer #121: Errare humanum est

13. Dec 2021

Welcome to another VSHN.timer! Every Monday, five links related to Kubernetes, OpenShift, CI/CD, and DevOps; all of them coming out of our own chat system, making us think, laugh, or simply work better.

This week we’re going to talk about a quote attributed to Seneca, and about how to achieve eudaimonia not with hubris, but rather with humility.

Errare humanum est, sed perseverare diabolicum. (To err is human, but to persist in error is diabolical.)

(Source)

1. In these times of log4j zero-days, it is becoming harder and harder to keep our systems safe and sound. Thankfully, our collective experience brings some best practices to light. Mathew Duggan just shared a few common infrastructure mistakes he has made over the years, so that we can all learn from them.

https://matduggan.com/mistakes/

2. The creators of the Jeli incident platform just published a comprehensive Post-Incident Guide, also available in PDF format, with a complete set of instructions to help you learn as much as possible from a painful incident. An outstanding guide, and a must-read.

https://www.jeli.io/howie-the-post-incident-guide/

3. Last September, Slack had an outage that impacted less than 1% of their users for around 24 hours. The root cause was an attempt to enable DNSSEC in their infrastructure. Slack Engineering explains it all on their blog.

https://slack.engineering/what-happened-during-slacks-dnssec-rollout/

4. You probably didn’t notice, but Amazon Web Services suffered a service disruption in their Northern Virginia region (“us-east-1”) on Tuesday, December 7th, 2021. It impacted the availability and performance of EC2, API Gateway, EKS, and some other services. Their report provides more details.

https://aws.amazon.com/message/12721/

5. The VSHN.timer tool of the week is Diego Lima’s Kubernetes Best Practices 101, a growing collection of practical knowledge about running web services on our beloved container orchestrator.

https://github.com/diegolnasc/kubernetes-best-practices

How does your team deal with incidents and failures? Does your team work in a blame-aware environment? Do you have any Kubernetes best practices to share with the community? Get in touch with us, and see you next… year for another edition of VSHN.timer. That’s right! This is the last VSHN.timer of 2021. We’d like to thank you for your attention, your comments, your shares on social media, and your suggestions! The VSHN.timer team wishes you all the best for 2022 🙂

PS: would you like to receive VSHN.timer every Monday in your inbox? Sign up for our weekly VSHN.timer newsletter.

PS2: do you prefer reading VSHN.timer in your favorite RSS reader? Subscribe to this feed.

PS3: check out our previous VSHN.timer editions about incidents and operations: #32, #41, #49, #66, #75, #89, and #107.

Adrian Kosmaczewski

Adrian Kosmaczewski is in charge of Developer Relations at VSHN. He has been a software developer since 1996, and is also a trainer and a published author. Adrian holds a Master’s degree in Information Technology from the University of Liverpool.
