This blog post is adapted from the speech about APPUiO Cloud given at the Berner Architekten Treffen in June 2022. With this text, we would like to give you an idea of how we built this new product using DevOps as a guiding philosophy.
Before we start, I’d like to introduce VSHN to those who have never heard of it before. We’re a company of 50 people based in Zürich, founded in 2014 with the objective of providing companies with DevOps services of different kinds. The slogan of VSHN is, actually, “The DevOps Company.” We embrace the DevOps philosophy completely, and as you’ll see today, we use its principles and ideas in everything we do.
What does VSHN do? We provide various services and products:
- We offer “DevOps as a Service” (or DaaS), with a full team of Kubernetes and cloud experts ready to monitor your applications 24/7, in every hyperscaler, all over the world.
- We help companies become self-sufficient and cloud-enabled; we help software engineers “build bridges” between Dev and Ops, building CI/CD pipelines in various platforms such as GitLab, OpenShift, or Jenkins.
- We are Kubernetes specialists, to the point that our strategy is 100% oriented towards Kubernetes; all of the code we write, and all of the systems we assemble, for us or others, are meant to be run on Kubernetes.
Aside from our technological choices, the important thing to know about VSHN is that we have chosen to drive the growth of our company in various ways that are completely nonstandard:
- We use Sociocracy as a management and growth framework. This means that all decisions, and I mean all of them, happen through consensus among all VSHNeers (that’s what we call ourselves, by the way).
- We have created a Handbook, freely available online, which in printed form takes 563 pages, explaining everything we are and do at VSHN in quite an incredible level of detail. I invite you to check it out and learn everything there is to learn about us.
- We have decided as a company not to grow through venture capital, instead relying upon the good old method of organic growth. We have been self-funded since day one (2014), and we have been consistently profitable and growing since 2017.
- We are “The DevOps Company”, and as such, we embraced the DevOps mantra completely. Everything we do is automated as much as possible, freeing our brains to think.
All of these choices have shaped our culture in ways that are really not common at all, and have interesting consequences in our day-to-day operations. For example, our hiring policies are different from those of most IT companies; not only do we pay attention to the IT skills of those who want to join our team, but we also place a very high degree of attention on the human factor. We want people to feel great at VSHN, and one of the primary factors we evaluate during our hiring process is the “likeness” of the person, that is, how much would we like to work with them every day?
Another interesting consequence of how VSHN works is that we had embraced remote and asynchronous working well before the pandemic; when the Bundesrat mandated everyone to work from home in March 2020, we simply stayed home and continued working as if nothing had happened. The important word here is “asynchronous”: it is not so much that we work remotely, but that we work in a non-synchronous way. This particular mindset has shaped our company greatly.
But let’s go back in time a little bit, and see how APPUiO Cloud came to be.
In 2016 VSHN and Puzzle ITC (an IT and software consulting company with offices in Bern and Zurich) launched a joint venture called APPUiO. This word has many meanings: not only is it a word in Esperanto, meaning “support”, but it is also composed of “App” (for application) and “Ujo”, which is Esperanto for “container.”
APPUiO consists of a series of products built around Red Hat OpenShift.
For those who haven’t heard of it yet, OpenShift is the most widely used Kubernetes-based platform in the enterprise world. It is quite popular with big companies, and it incorporates a hardened and highly available Kubernetes cluster surrounded by lots of relevant software: a container repository, a management console, and CI/CD pipelines, with a very nice and professional GUI on top.
We decided we wanted to be a part of the OpenShift market, but we also realized that installing and operating OpenShift is a huge endeavor, and many companies could not use OpenShift because of the lack of staff or budget. So we decided to join forces with Puzzle ITC.
APPUiO is our response to the complexity of Red Hat OpenShift. With APPUiO, customers can get a ready-to-use cluster, together with the know-how of VSHN and Puzzle ITC. We at VSHN specialize in the setup and maintenance of OpenShift clusters; we have been operating OpenShift clusters since version 3. Puzzle ITC is a specialist in the creation of software solutions for OpenShift, which is something we don’t do. Together, the APPUiO team can help companies make the most out of their OpenShift investment.
Flavors of APPUiO
APPUiO has been historically available in various forms:
- APPUiO Public, based on OpenShift 3, was the first Swiss-based shared OpenShift cluster available to customers. It was a shared platform, where customers could run their projects without having to care about management or anything else. There were APPUiO Public clusters running in various cloud providers, such as Cloudscale in Switzerland and AWS in Germany. Customers could choose whichever they preferred, and deploy their containers there in a few minutes.
- APPUiO Managed is the next step for organizations that are not comfortable running their workloads on a public platform. With APPUiO Managed, they get their own OpenShift cluster, for their exclusive use, and Puzzle ITC and VSHN take care of the operations of the cluster transparently for their users.
- APPUiO Self-Managed is the final step in the evolution of organizations: with APPUiO Self-Managed, organizations not only get an OpenShift cluster “keys in hand” and ready to run, but we also teach their IT teams how to manage and maintain the cluster by themselves. We gradually “fade into the background” and provide help, until at some point they become completely independent. This is a very interesting option for big corporations with their own IT teams.
- APPUiO Cloud is the latest offering in the APPUiO family, officially launched in November 2021.
What is APPUiO Cloud? Simply put, APPUiO Cloud is the successor of APPUiO Public. Given the major architectural changes between OpenShift 3 and 4, instead of migrating our APPUiO Public infrastructure to OpenShift 4, we decided to create a new project from scratch, and we gave it a different name and even a different visual identity.
We notified our APPUiO Public customers of the upcoming phasing out of the service, with an offer to help them migrate their workloads to APPUiO Cloud. And on September 1st, 2022, we decommissioned the last APPUiO Public cluster in existence.
As said previously, APPUiO Cloud is based exclusively on OpenShift 4. At the moment we have two APPUiO Cloud zones available to our customers:
- cloudscale.ch in the Canton of Aargau
- Exoscale in the Canton of Geneva
We plan to open more regions in the future, as required and following the demand from our customers.
We started working on APPUiO Cloud in the Spring of 2021, and we released it to the public in the Autumn of that year. We reused lots of code and infrastructure we had created for our work previously:
- K8up is a Kubernetes backup operator that has been picked up by the Cloud Native Computing Foundation as a sandbox project.
- Project Syn is a suite of tools that allow for the remote management of Kubernetes clusters of any kind, from a central location using a GitOps philosophy and workflow.
Who is APPUiO Cloud for?
APPUiO Cloud, just like its predecessor APPUiO Public, is meant to be an “entry-level” product, catering to the “long tail” of OpenShift customers, who might be interested in getting access to a working OpenShift cluster without the hassle of installing and operating it.
As such, we identified a few target groups:
- Cloud Native App Development: Instant access to a namespace on a running OpenShift cluster. No provisioning infrastructure in a cloud provider, no installing of OpenShift, no waiting until a new cluster is provisioned for you. Just log in, create a namespace, and you’re ready to oc deploy your new application.
- MVP: Spin up a new OpenShift namespace on APPUiO Cloud, deploy your MVP, and go back to raising more venture capital.
- DevOps & CI/CD Pipelines: Run your CI/CD pipelines, deploy, and preview your application on a running OpenShift cluster right now before going live.
- Machine Learning: Use it to train a machine learning model with new data.
- Production App Hosting: Host your application in minutes in production.
- Mobile App Backends: For iOS or Android developers needing to deploy back-ends on a scalable, trusted environment.
- Education: Let students experience the full power of a real OpenShift cluster with their own individual namespace.
- Technology Trial: For users interested in an APPUiO Managed cluster, but needing to hedge the risks of trying out new technology.
- Resellers: Resell APPUiO Cloud and let us manage the cluster for you.
What is included in APPUiO Cloud?
- Instant On: Get your own OpenShift namespace in minutes, ready to use.
- Pay-per-use: Only pay for the resources you actually use.
- User Management: Organize your namespaces in teams and organizations, and assign users to those teams; control who can access which namespaces at a glance.
- Backup: Back up all your work with the pre-installed K8up operator.
- Pre-Installed and Configured Operators: APPUiO Cloud provides the following OpenShift operators pre-installed and pre-configured, ready to be used:
  - K8up: Kubernetes Backup Operator.
  - Cert Manager: X.509 certificate management for Kubernetes.
- Community Support: Need help? Check out our APPUiO Cloud forums and community chat. For those needing more help, there are support packages available at extra cost.
Because it’s a public platform, APPUiO Cloud comes with some gotchas:
- Maintenance Policies: There are two types of maintenance. First, APPUiO Cloud Zones receive automatic revision updates (for example, from 4.7.1 to 4.7.2), covering both OpenShift and the worker nodes; these updates can happen at any time without prior announcement. Second, upgrades to a new minor version (for example, from 4.7 to 4.8) or a new major version (from 4 to 5) of OpenShift are announced in advance, with a description of all possible breaking changes. On request, we can provide you with access to an already upgraded APPUiO Cloud Zone, so that you can test your deployment if needed.
- Status Information: We communicate the status of the platform on status.appuio.cloud.
- Resource Availability: APPUiO Cloud is provided without any guarantees of resource availability.
- SLA: Best effort.
- Fair-Use Policy: APPUiO Cloud is a shared platform. Unless otherwise stated, this fair-use principle applies to the use of our services. APPUiO Cloud users must use their resources moderately, so as not to degrade the service level available to other users.
- Privileged Containers: Privileged containers can’t run on APPUiO Cloud.
- Log retention: The OpenShift integrated logging (Elasticsearch / Kibana) retains collected logs for 72 hours (3 days); after that period, logs are permanently deleted.
- Other Operators: It is not possible (at the moment) to run OpenShift operators other than the ones we offer. We evaluate them on a case-by-case basis, following the requests from our customers.
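To make the maintenance policy above a bit more concrete, here is a minimal Python sketch that classifies an upgrade as a revision, minor, or major change, and tells whether it would be announced in advance. The function names are purely illustrative and not part of any APPUiO Cloud tooling; the only rule taken from the text is that revision updates are unannounced, while minor and major upgrades are announced.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split an 'X.Y.Z' (or shorter) version string into three integers."""
    parts = [int(p) for p in version.split(".")]
    while len(parts) < 3:
        parts.append(0)  # treat '4.7' as '4.7.0'
    return parts[0], parts[1], parts[2]


def classify_upgrade(current: str, target: str) -> str:
    """Return 'major', 'minor', 'revision', or 'none' for a version change."""
    cur, tgt = parse_version(current), parse_version(target)
    if tgt[0] != cur[0]:
        return "major"
    if tgt[1] != cur[1]:
        return "minor"
    if tgt[2] != cur[2]:
        return "revision"
    return "none"


def is_announced(current: str, target: str) -> bool:
    """Minor and major upgrades are announced; revision updates are not."""
    return classify_upgrade(current, target) in ("minor", "major")


if __name__ == "__main__":
    for cur, tgt in [("4.7.1", "4.7.2"), ("4.7", "4.8"), ("4.8.3", "5.0.0")]:
        kind = classify_upgrade(cur, tgt)
        print(f"{cur} -> {tgt}: {kind}, announced: {is_announced(cur, tgt)}")
```

In practice this means that workloads on APPUiO Cloud should tolerate a revision update at any moment, while minor and major upgrades leave time to test beforehand.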
A DevOps way of working
What do we mean by “a DevOps way of working”? Let’s see first what we mean by DevOps, one of those words that can mean anything and everything depending on who you ask.
We think the best people to talk about DevOps are the authors of “The DevOps Handbook” and “The Phoenix Project”: Gene Kim, Jez Humble, and their co-authors.
In those books, DevOps is usually defined by the “three ways”:
- The Principles of Flow
- The Principles of Feedback
- The Principles of Continual Learning and Experimentation
How did we create APPUiO Cloud following DevOps? We applied these three principles throughout the whole process, as follows.
The first thing was to decide where to start, that is, what value stream we wanted to provide first.
That work brought together the Product Documentation. That’s right, the first thing we created through discussion was written documentation of what we wanted to offer.
Why written? Because we work asynchronously. That means that some of us work better at night, while some work better in the morning; having everything written down helped everyone, as we commented on drafts of the documents until there was agreement. Agreement from whom? From everyone, from the Product Owners to the DevOps engineers who would have to maintain the solution at the end.
This way, the operations team knows exactly what is going to happen. There are no surprises down the road, and they feel empowered and listened to. All the features of APPUiO Cloud are, simply put, possible to release; either now or later, but they are possible.
The important thing here is that we started by reverse engineering Conway’s Law. That is, we first structured the team that would work on APPUiO Cloud, and then we got to create the system. The end result of this process is that the architecture of APPUiO Cloud, following Conway’s Law, strictly mirrors the structure of our team. We do not fight against Conway’s Law; we embrace it.
The result of this architecture work can be summarized in three different documentation websites for APPUiO Cloud. You heard that right: we have created three different sets of documentation, and we keep them updated every day.
We have made all of the documentation publicly available and viewable, even editable, because transparency is one of our values at VSHN. We want all of our customers to know exactly why we’re doing things the way we do; this, in turn, generates trust in our existing customers and shows our know-how to prospective customers. These three documentation sites are, simply put, great marketing tools!
The principle of Flow requires teams to make work visible, reduce batch sizes and intervals of work, and build quality in. We limited work in progress to the strict minimum, and we automated as much of the process as possible.
This automation involves removing the human factor from the maintenance of those clusters as much as possible. One of the key factors for doing this was Project Syn, a suite of tools we started building in 2019 that allows our small team to manage hundreds of clusters from a central location. We created Project Syn as a way to be able to operate our customers’ assets with a reduced human footprint, but it turned out to be a great way to handle our own work on APPUiO Cloud, too.
Thanks to Project Syn, DevOps engineers can specify and deploy changes to lots of Kubernetes clusters from a central location, using a GitOps strategy; just commit your changes as “infrastructure as code” to a Git repository, and wait a few seconds until all clusters apply those changes.
We use Project Syn to deploy Kyverno security policies to our APPUiO Cloud clusters so that all regions conform to the same rulebook.
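To give an idea of the kind of rule such a policy encodes, here is a small Python sketch of a check that rejects privileged containers. This is only an illustration: real Kyverno policies are written declaratively in YAML and evaluated inside the cluster, and this is not the policy we actually deploy. The function name and the sample pod are hypothetical; the only real-world detail assumed is the standard Kubernetes field securityContext.privileged on each container.

```python
from typing import Any, Dict, List


def privileged_containers(pod: Dict[str, Any]) -> List[str]:
    """Return the names of containers in a pod manifest that request privileged mode."""
    spec = pod.get("spec") or {}
    offenders: List[str] = []
    # Regular, init, and ephemeral containers can all carry a securityContext.
    for field in ("initContainers", "containers", "ephemeralContainers"):
        for container in spec.get(field) or []:
            security = container.get("securityContext") or {}
            if security.get("privileged") is True:
                offenders.append(container.get("name", "<unnamed>"))
    return offenders


# Hypothetical pod manifest, already parsed into a dictionary.
example_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "demo"},
    "spec": {
        "containers": [
            {"name": "web", "image": "example/web:1.0"},
            {
                "name": "debug",
                "image": "example/debug:1.0",
                "securityContext": {"privileged": True},
            },
        ]
    },
}

if __name__ == "__main__":
    offenders = privileged_containers(example_pod)
    if offenders:
        print("rejected, privileged containers:", ", ".join(offenders))
    else:
        print("accepted")
```

Because the same rule is rolled out from one central repository, a pod rejected in one zone would be rejected in every other zone as well.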
We also configured each of the APPUiO Cloud zones with the mandatory differences between the cloud providers we use; Exoscale and cloudscale.ch do not offer exactly the same features and being able to see those differences in written form allows us to manage those systems, to take decisions for the future, and to inform our customers of any possible tradeoff.
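One way to picture this is a shared configuration with small, explicit per-zone overrides layered on top. The Python sketch below shows that idea with a recursive dictionary merge. It is an illustration under stated assumptions, not Project Syn’s actual implementation: the keys, values, and zone settings are all made up for the example.

```python
import copy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where values from override win over values in base."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dictionaries: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical settings shared by every zone.
common = {
    "policies": {"disallow_privileged": True},
    "logging": {"retention_hours": 72},
    "storage": {"default_class": "standard"},
}

# Hypothetical per-zone differences, kept as small as possible.
zone_overrides = {
    "cloudscale": {"storage": {"default_class": "ssd"}},
    "exoscale": {"storage": {"default_class": "block"}},
}

rendered = {zone: deep_merge(common, diff) for zone, diff in zone_overrides.items()}

if __name__ == "__main__":
    for zone, config in rendered.items():
        print(zone, config)
```

Keeping the overrides in separate, versioned files means the differences between zones can be read directly from the repository, instead of being discovered on the running clusters.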
We know that APPUiO Cloud is a complex system, built out of complex systems, that are prone to failure at any given time. It is not a matter of “if”, but rather a matter of “when.”
We built observability and management tools from the very start of our work on APPUiO Cloud. We have reused the management infrastructure provided by OpenShift, the same one we were using for our private customers, and we have built APPUiO Cloud to be observable at all times.
Using “everything as code” as a basis for our work means that every time we fix an issue on the platform, we have to change a configuration file somewhere. This information is later stored in the Git repo, as part of the project history; not only that, but we also update the required documentation files, both internal and external, so that everyone knows (asynchronously and at their own rhythm) what happened, when, where, and most importantly, why.
And when we say “everything as code”, we mean it:
All of this is described in their corresponding files, and versioned in Git repositories. We use GitLab, and its integrated CI/CD pipelines are configured to automatically build, test, and eventually deploy changes as required. Thanks to Project Syn, all of the feedback they bring back to the system is automatically deployed whenever possible, reducing the amount of human brain work required to keep things running.
Even our documentation is automated: we use the Antora documentation generator tool, which can automatically extract and integrate documentation from various sources into a single website, and we use GitLab pipelines for that as well. With this process, engineers only have to update the documentation sources (using the AsciiDoc format, very similar to Markdown) and git push their changes. They are immediately picked up, built, verified (we have automatic styling and syntax checks built into our pipelines), and deployed.
Continuous Learning and Experimentation
APPUiO Cloud is not, and will never be, finished. It is a product that changes continuously, sometimes in small ways, and sometimes in bigger ones.
To give an example of continuous learning at work in APPUiO Cloud, let’s refer to the current console screen, where in June 2022 you could see a red banner on top with the following text:
Issue with CPU requests resolved. The resolution includes a slight change to the pricing model.
This, as you can imagine, is the result of a learning process. We realized that, in our preparation, we had not designed our CPU request pricing properly. As a result, as soon as the first users started using the platform this year, we realized that some of them were consuming disproportionate amounts of CPU; this was a huge problem, since they were not aware of that, and we would have to cover for those extra costs at first.
We modified the policies in the clusters, communicated with all of our customers, and updated our documentation as follows:
As of 2022-05-01: The underlying infrastructure has a fixed ratio between memory and CPU resources. CPU requests exceeding that ratio will be counted as well. The requested CPU cores will be multiplied with the platforms’ ratio. This yields an equivalent in MiB. The memory to CPU ratio can be different per zone. See zone listing for the exact values.
This was unexpected and unplanned learning; a local discovery that brought a global improvement in APPUiO Cloud for all users. We did cover some of the costs, but we rectified our policies openly and communicated clearly with our customers. The result? Not only did all of them acknowledge and understand the changes, but we didn’t lose a single customer because of this change. This level of cooperation with our customers is one of the things we’re most proud of.
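The ratio rule quoted above can be illustrated with a short calculation. The Python sketch below is one plausible reading of that text, not the actual billing implementation: it assumes that the billable amount is the larger of the requested memory and the memory equivalent of the requested CPU, and it uses a made-up ratio of 4096 MiB per CPU core. The real ratios are listed per zone in the documentation.

```python
def cpu_equivalent_mib(cpu_cores: float, ratio_mib_per_core: float) -> float:
    """Convert a CPU request into its memory equivalent, in MiB."""
    return cpu_cores * ratio_mib_per_core


def billable_memory_mib(
    memory_mib: float, cpu_cores: float, ratio_mib_per_core: float
) -> float:
    """Assumed rule: bill the larger of the memory request and the CPU equivalent."""
    return max(memory_mib, cpu_equivalent_mib(cpu_cores, ratio_mib_per_core))


# Hypothetical zone ratio: 4096 MiB of memory per CPU core.
EXAMPLE_RATIO = 4096.0

if __name__ == "__main__":
    # CPU-heavy workload: 0.5 cores is equivalent to 2048 MiB, above the 1024 MiB requested.
    print(billable_memory_mib(1024, 0.5, EXAMPLE_RATIO))
    # Memory-heavy workload: 0.25 cores is equivalent to 1024 MiB, below the 2048 MiB requested.
    print(billable_memory_mib(2048, 0.25, EXAMPLE_RATIO))
```

Under these assumptions, a workload that stays within the zone’s memory to CPU ratio is billed on memory alone, while a CPU-heavy workload is billed on the memory equivalent of its CPU request.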
Another learning involved the GitLab Kubernetes agent: it cannot be installed globally, but we figured out how to install it on a per-namespace basis for those who need it.
As part of the continuous learning process of DevOps, we also have bi-monthly updates by the APPUiO Cloud team on a private Zoom call open to all VSHNeers, where we explain what’s going on, what changes are coming to the platform, what new features are planned, and what has happened lately. These calls are recorded for internal use, so that (once again) people can re-watch those internal education sessions whenever they want and as often as they want.
This is the toolkit we used to collaborate during the creation of APPUiO Cloud.
- Visual Studio Code Live Share
- Eating our own dog food: all of our internal systems run on APPUiO Cloud, with the exception of the APPUiO Cloud status page, of course 🙂
The current team in charge of APPUiO Cloud is called Tarazed.
- 1 project manager/product manager
- 1 product owner
- 3 DevOps engineers core team
This is a rough timeline of key events during the creation of APPUiO Cloud, with links to their corresponding blog posts.
- 2021-02-19: First discussions about product definition for “APPUiO Public 2.0”
- Spring/Summer 2021: Design, development, test
- 2021-06: Benchmarking storage options: Rook, Ceph, and Longhorn (spoiler: Rook won)
- 2021-07-29: “APPUiO Cloud” name chosen, appuio.cloud domain registered
- 2021-08: Project Syn components for APPUiO Cloud in beta
- 2021-08: cloudscale.ch cluster, including Kyverno and Keycloak, ready for internal use
- 2021-09-17: docs.appuio.cloud published
- 2021-09-20: Public announcement
- 2021-10-28: The testing phase started with beta users on the LPG-2 region migrating apps from APPUiO Public
- 2021-12-01: The testing phase ended
- 2021-12-07: Pricing calculator released
- 2021-12-16: New logo
- 2022-02-02: Exoscale Geneva region available
- 2022-02-07: APPUiO Cloud Portal released
- 2022-02-09: Getting started guide published
- 2022-05-03: First 6 months of APPUiO Cloud
- 2022-09-01: APPUiO Public fully decommissioned
Can other organizations use a similar process to create a product? We believe that yes, it is possible. However, there are a few caveats: some companies would have to work on the following items first in order to have a successful DevOps journey.
First of all, writing skills are fundamental. We need DevOps engineers to be writers and to write everything down. Not only as “everything as code” (security, infrastructure, business rules, etc.) but also as documentation writers, making sure that both engineers and users are able to refer to a written document that explains the reasons why things happen. Yes, keeping that written documentation is part of the work; it is not a chore, it is not a bonus; it is part of the deliverables, and it must be updated, reviewed, and proofread.
Second of all, Cloud Native technologies have been designed to work faster than ever. Containers, Kubernetes, CI/CD pipelines, Open Source, and the whole ecosystem of Cloud Native technology are the greatest enablers of our world. The technological context constitutes fantastic “giant’s shoulders” on which we can stand, and go faster and better. We definitely could never have done this work without the ecosystem of Open Source Cloud Native technologies available today.
And third of all, trust is paramount. You have to have trust in your teams. We actually think that trust is more important than flat hierarchies; even though these have helped us, without trust there’s no way we could have created APPUiO Cloud in such a short amount of time. Trust allows teams to work independently, moving fast, and without the inherent fear typical of a “blame culture”. And trust is the key ingredient for an asynchronous work culture. You cannot really go full async if you do not trust your teams. We stress this point because this factor is the deal breaker for many teams in this country.
These are, we think, the three most important pillars of our DevOps culture: writing, technology, and trust. They are the ones that have helped us shape APPUiO Cloud into a product that is steadily growing and changing continuously.
Is it easy to work like this in DevOps mode? Of course not; there are lots of things that can go wrong. Is it worth it? Let’s put it this way: after all this time, we have internalized this way of working so much that we couldn’t do things any other way. We think it is totally worth it, and as a result, we just do things like this.
With APPUiO Cloud, VSHN has demonstrated that we can deliver world-class products in a short amount of time, with a small team of experts, and with fast cycles of feedback and experimentation baked into the process.