VSHN.timer

VSHN.timer #107: Planning for Incidents and Operations

6. Sep 2021

Welcome to another VSHN.timer! Every Monday, 5 links related to Kubernetes, OpenShift, CI/CD, and DevOps; all stuff coming out of our own chat system, making us think, laugh, or simply work better.

This week we’re going to talk about how to plan for operations, for both good times and bad times.

1. In the world of DevOps and SRE, incidents are not a matter of “if”, but rather of “when”. It is fundamental for teams to agree on a communication protocol, specifically tailored for all internal and external stakeholders, in which the most important milestone is the incident retrospective.

https://www.blameless.com/incident-response/how-to-communicate-incident-retrospectives-to-stakeholders

2. Properly identifying (and negotiating with) project stakeholders is an art that, sadly, some product managers have not mastered. Each stakeholder is different, and knowing which information to exchange with each one, when, and how, is a fundamental skill that deserves to be learnt properly.

https://hackernoon.com/the-product-manager-guide-to-identifying-and-managing-project-stakeholders-856a35bw

3. How do site reliability engineers plan for the resource usage of their Kubernetes clusters? Allocating too little may lead to downtime, but allocating too much can be really inefficient and expensive in the long run. There are many variables to take into account (CPU, storage, memory) and it can be really complex to find the optimal values for each.

https://sysdig.com/blog/kubernetes-capacity-planning/
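To make the trade-off concrete, here is a minimal sketch of one way to compare what pods request against what each node can allocate. It is not taken from the linked article: the `Node` and `Pod` classes, the field names, and the 30% / 80% thresholds are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_allocatable_m: int   # allocatable CPU, in millicores
    mem_allocatable_mi: int  # allocatable memory, in MiB


@dataclass
class Pod:
    node: str                # name of the node the pod runs on
    cpu_request_m: int       # CPU request, in millicores
    mem_request_mi: int      # memory request, in MiB


def utilization(nodes, pods):
    """Return {node name: (cpu_fraction, mem_fraction)} of requested over allocatable."""
    result = {}
    for node in nodes:
        on_node = [p for p in pods if p.node == node.name]
        cpu = sum(p.cpu_request_m for p in on_node) / node.cpu_allocatable_m
        mem = sum(p.mem_request_mi for p in on_node) / node.mem_allocatable_mi
        result[node.name] = (cpu, mem)
    return result


def classify(fraction, low=0.3, high=0.8):
    """Label a utilization fraction; the thresholds are arbitrary assumptions."""
    if fraction < low:
        return "overprovisioned"
    if fraction > high:
        return "tight"
    return "ok"


if __name__ == "__main__":
    nodes = [Node("worker-1", 4000, 8192)]
    pods = [Pod("worker-1", 1000, 2048), Pod("worker-1", 1000, 2048)]
    for name, (cpu, mem) in utilization(nodes, pods).items():
        print(name, "cpu:", classify(cpu), "memory:", classify(mem))
```

In a real cluster these numbers would come from the Kubernetes API or a monitoring system rather than hard-coded values, and requests are only half of the picture, since actual usage can differ a lot from what pods ask for.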

4. Resource planning for Red Hat OpenShift clusters can be simplified with the Trimaran scheduler plugins, TargetLoadPacking and LoadVariationRiskBalancing, which are built on the Kubernetes scheduler framework. This article on the Red Hat Hybrid Cloud blog explains how to use them.

https://cloud.redhat.com/blog/improving-the-resource-efficiency-for-openshift-clusters-via-trimaran-schedulers

5. The tool of the week is Uptime Kuma, a self-hosted, web-based monitoring tool that is easily installed as a Docker container, offers dark and light modes, and comes with a live demo to test its capabilities. What’s not to like?

https://github.com/louislam/uptime-kuma

What tools do you use for capacity planning of your clusters? How do you communicate incidents to your stakeholders? Would you like to share any tips and tricks with the community? Get in touch with us, and see you next week for another edition of VSHN.timer.

PS: would you like to receive VSHN.timer every Monday in your inbox? Sign up for our weekly VSHN.timer newsletter.

PS2: do you prefer reading VSHN.timer in your favorite RSS reader? Subscribe to this feed.

PS3: check out our previous VSHN.timer editions about incidents and operations: #32, #41, #49, #66, #75, and #89.

Adrian Kosmaczewski

Adrian Kosmaczewski is in charge of Developer Relations at VSHN. He has been a software developer since 1996, and is also a trainer and a published author. Adrian holds a Master in Information Technology from the University of Liverpool.
