Tech

A (Very!) Quick Comparison of Kubernetes Serverless Frameworks

17. May 2019

2022–03–09: There’s a newer version of this article!

(This blog post is the transcription of the presentation given at the CNC Switzerland meetup, May 9th 2019. Slides are available at the end of this page and impressions of the meetup can be found here.)
Serverless is one of those hot topics that, like many others in our industry, looks a bit like a good old idea recycled and brought back into fashion. Yet Serverless (or “Function as a Service”) looks like a natural evolution of a movement that started more than a decade ago, when Heroku and Google App Engine came under the spotlight.
As Martin Fowler himself says, Serverless and FaaS can be defined as follows:

FaaS is about running backend code without managing your own server systems or your own long-lived server applications.

At VSHN we have been interested in the subject for years now, and even worked on a FaaS project for a customer two years ago.
But now that Kubernetes and Knative are emerging as the true “Golden Standards” of hosted cloud operations, we see the emergence of FaaS solutions running directly on top of them. And developers have been quick to create lots of solutions that somehow appear to occupy the same space. How do they compare? Which one to choose?
To answer these questions, we are going to give a brief description of the following frameworks, outlining some of their characteristics and relative strengths:

  1. OpenFaaS
  2. Fn Project
  3. Fission
  4. OpenWhisk
  5. Kubeless
  6. TriggerMesh

The order of the frameworks is not arbitrary; they are roughly ordered by their “level of abstraction” on top of Docker and Kubernetes. For each of these projects, we are going to provide the following information:

  1. Project details;
  2. Three demos, all of them recorded with asciinema using minikube;
  3. Available triggers;
  4. Supported programming languages.

1. OpenFaaS

OpenFaaS is the project with the most stars (14’000) on GitHub of all those in this article. It is mostly written in Go, features around 100 contributors, and the latest available version at the time of this writing is 0.13.0 (April 4th, 2019).
It is an independent project, funded through donations (Patreon, OpenCollective, and PayPal.)
From a developer experience perspective, it is quite a complex project to set up and use. It is based on Docker, which means that functions are actually packaged as containers, pushed to a repository, and built using a local Docker installation on the developer’s workstation. OpenFaaS manages the Dockerfile for the developers automatically, though.
OpenFaaS has a “template store” with several available programming languages. It also provides developers with a command-line utility, called faas-cli, which itself talks to a REST API documented using Swagger. Finally, there is a Serverless Framework plugin for those who need it.
The following asciicast shows a very simple interaction with OpenFaaS. First we create a Python function, which we customize a bit, and then we deploy and call it from the command line, both directly and via curl:
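For readers who cannot play the asciicast, the generated Python handler looks roughly like this (a minimal sketch; the actual template created by faas-cli new also includes a requirements.txt, and details may differ between template versions):

```python
# handler.py: the entry point of an OpenFaaS Python function.
# faas-cli builds this into a container image together with a small
# watchdog process that forwards HTTP request bodies to handle() as a string.
def handle(req):
    """Handle an invocation: `req` is the raw request body."""
    name = req.strip() or "world"
    return "Hello, {}!".format(name)
```

Once built and deployed with faas-cli, the function is reachable through the OpenFaaS gateway, e.g. via curl against the gateway’s /function/&lt;name&gt; endpoint.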

OpenFaaS functions can be called through the following triggers:

  • HTTP
  • Event Connectors: Kafka, Amazon SNS, Minio, CloudEvents, IFTTT, Redis, RabbitMQ…

Finally, developers can use the following programming languages with OpenFaaS:

  • C#, F# (.NET Core)
  • Go
  • JavaScript (Node.js)
  • Java (JVM)
  • PHP 7
  • Python / Python 3
  • Ruby

2. Fn Project

The Fn Project has been started and is currently funded by Oracle, who uses a fork to power its own Oracle Functions product.
Just like OpenFaaS, it is hosted in Github and written in Go. The project has around 4000 stars and 80 contributors, and the latest version at the time of this writing is 0.3.703 (May 6th, 2019).
From a technical point of view, Fn can use any Docker container as a function, and it can run anywhere: in public, private, and hybrid clouds.
Fn has two major concepts:

  • Functions: defined in YAML;
  • Applications: groups of functions, which can be deployed all at once.
For developers, it offers a command-line tool called fn and a Serverless Framework plugin.
Fn functions can be triggered with HTTP calls, and can be developed using the following languages:

  • Go
  • JavaScript (Node.js)
  • Java (JVM)
  • Python
  • Ruby
  • C# (community supported)

The Fn marketing material further states that it “supports all languages”.
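To give an idea of the shape of an Fn function, here is a hedged Python sketch. A real Fn function uses the fdk library (registered via fdk.handle(handler)) and receives the payload as a binary stream; for illustration, this sketch accepts the payload as a plain JSON string instead:

```python
import json

def handler(ctx, data=None):
    # ctx would be the Fn invocation context (call ID, headers, config);
    # data carries the request payload. In a real Fn function, data is a
    # binary stream and the function is wired up through the fdk library.
    name = "world"
    if data:
        name = json.loads(data).get("name", name)
    return {"message": "Hello, {}".format(name)}
```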

3. Fission

Fission is an open source, Kubernetes-native Serverless Framework. It allows functions to be deployed and executed instantly, mapping them to HTTP requests.
Its Github project is mostly written in Go, features 4300 stars and around 80 contributors at the time of this writing. Its latest available version is 1.2.0 (May 3rd, 2019). It was started and is currently maintained by Platform9.
Fission does not need Dockerfiles or Docker registries; it is based on the notion of environments. Functions are injected into those environments, which are a pool of containers with language runtimes, where functions are loaded and launched on demand.
Fission keeps in memory a set of images containing the runtimes where the functions will be run, injecting them and running them immediately when invoked. In this sense, it is similar to how AWS Lambda works.
For developers, it features a command-line tool (fission) and a Serverless Framework plugin. They do not need a local Docker environment to build their functions.
The following asciicast shows the basic operations required to create, deploy and call a function:
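The function itself can be as small as a module exposing a main() entry point; Fission’s Python environment imports the module and serves it over HTTP (a minimal sketch; the exact entry-point conventions depend on the environment image):

```python
# hello.py: a minimal Fission Python function. The Python environment
# container loads this module and calls main() for each HTTP request
# routed to the function.
def main():
    return "Hello, Fission!\n"
```

Such a file can then be registered with the fission CLI (roughly: fission function create --name hello --env python --code hello.py) and exposed through an HTTP trigger.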

Currently, Fission supports the following types of triggers:

  • HTTP
  • Time
  • Message Queue
  • Kubernetes Watch

Only the following programming languages can be used to create functions in Fission; the project is quite young and the list will probably grow in the future:

  • Go
  • Python
  • JavaScript (Node.js)
  • Java (JVM)

In our tests, using minikube, Fission appears easy to use, but at the same time very fragile (in spite of what its version number might suggest). Removing and re-creating environments and functions led to many problems, and the project is too young to have more than five (unanswered) questions on Stack Overflow. In short, a promising but still rather immature product.

4. OpenWhisk

OpenWhisk is the behemoth in the room. This open source project was created by IBM and is currently managed by the Apache Foundation. It is the most “corporate” one of those described in this blog post. It is written in Scala, features around 4000 stars on GitHub, and has around 150 contributors. The latest available version at the time of this writing is 0.9.0 (October 31st, 2018).
This framework has the following features:

  • Very “corporate” in design and functionality;
  • Secure by default;
  • Forked by Adobe and other big corporations;

For developers, it features a command-line tool (wsk) and a Serverless Framework plugin.
OpenWhisk functions can be triggered by the following mechanisms:

  • Message Queues
  • Databases
  • Document Stores
  • Website or Web Apps
  • Service APIs
  • IoT Frameworks…

OpenWhisk functions can be created using the following programming languages:

  • C#, F# (.NET Core)
  • JavaScript (Node.js)
  • Swift
  • Java, Scala (JVM)
  • Go
  • Python
  • PHP
  • Ruby
  • Ballerina
  • Through Docker Actions: Rust, Haskell…
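For instance, a Python action follows OpenWhisk’s convention of a main function that receives the invocation parameters as a dictionary and returns a dictionary (minimal sketch):

```python
# hello.py: an OpenWhisk Python action. OpenWhisk invokes main() with
# the action's parameters merged into a single dict, and expects a
# JSON-serializable dict as the result.
def main(args):
    name = args.get("name", "world")
    return {"greeting": "Hello, {}!".format(name)}
```

The action is managed with the wsk CLI (roughly: wsk action create hello hello.py, then wsk action invoke hello --result).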

The installation on minikube was the most complex and difficult of all the frameworks considered in this document. The tools have changed a lot in the last two years and resources online might be outdated. But in spite of those factors, this framework stands out by the quantity, breadth, and depth of its documentation, as well as by the number of integrations and supported languages.

5. Kubeless

Kubeless is a promising framework created and maintained by Bitnami. It is an open source project on GitHub written in Go, with around 4600 stars and 80 contributors. At the time of this writing, its latest version is 1.0.3 (March 14th, 2019).
In our tests it was the one offering the best developer experience. Very simple to install and use, it offers a command-line tool (kubeless) that is very similar to the AWS Lambda CLI. This is no coincidence, as the whole aim of the project is to provide an experience very close to that of AWS Lambda, Azure Functions or Google Cloud Functions.
For DevOps teams, Kubeless provides Prometheus monitoring of functions calls and latency, and a Serverless Framework plugin.
The following asciicast shows the basic interaction to create, deploy, and test a function using Kubeless:
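In text form, a minimal Python function for Kubeless follows an AWS Lambda-like signature, taking an event and a context (a sketch; the exact event fields depend on the trigger):

```python
# hello.py: a minimal Kubeless Python function. Kubeless delivers the
# request payload under event['data']; context carries function metadata
# such as the function name, timeout, and runtime.
def hello(event, context):
    return event["data"]
```

It would be deployed with something like: kubeless function deploy hello --runtime python3.7 --from-file hello.py --handler hello.hello.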

Kubeless functions can be triggered through the following mechanisms:

  • HTTP
  • Cronjob
  • PubSub mechanisms
    • Kafka
    • NATS messaging
  • Data stream events
    • AWS Kinesis

These functions can be developed using the following languages and runtimes:

  • Go
  • Python
  • JavaScript (Node.js)
  • Java (JVM)
  • Ruby
  • C#, F# (.NET Core)
  • Ballerina
  • Custom runtimes possible

Of all the frameworks considered here, Kubeless offered the most flawless experience. The documentation was solid and easy to follow, and there are plenty of online resources to guide developers in building applications on this platform.

6. TriggerMesh

TriggerMesh is the newest entry in the world of FaaS, and it promises a major shift in the way serverless applications are deployed and executed. Founded by ex-Kubeless engineers, it builds upon Kubernetes and Knative, providing features yet unseen in the serverless arena.
TriggerMesh supports cross-cloud triggers, routing events from the following AWS services to functions on Knative:

  • Code Commit
  • Cognito
  • DynamoDB
  • Kinesis
  • S3
  • SNS
  • SQS

TriggerMesh has announced the following programming languages as options:

  • Go
  • JavaScript (Node.js)
  • Python
  • Ruby

Together with VSHN, TriggerMesh released the TriggerMesh Operator for OpenShift 4.0. OKD 4.0 was recently announced by Red Hat to bring additional automation to Kubernetes applications. The operator allows OpenShift users to install the TriggerMesh management platform and benefit from its integration with Knative to power serverless workloads across multiple clouds. TriggerMesh also allows CI/CD of serverless functions, as well as access to multi-cloud event sources, like Azure and AWS.

Comparison

The following chart summarizes some of the ideas of this article, and has been adapted from the “Examining the FaaS on K8S Market” article on the Cisco Blog.

Framework    Local Docker   Image Repo   Base Image
OpenFaaS     Required       Required     Required
Fn Project   Required       Required     Required
Fission      None           None         Required
OpenWhisk    None           None         None
Kubeless     None           None         None

Popularity

The following tweet provides interesting information about the current state of the FaaS-on-Kubernetes market:

Conclusion

I hope this summary will be useful to you! Having tested all of these options, here at VSHN we will be focusing our efforts on the TriggerMesh platform, which promises a much more solid developer experience and an unprecedented level of flexibility. We believe that this is the next generation of serverless platforms and we cannot wait to bring its power to our customers.
TriggerMesh will be launching its TriggerMesh Cloud service in the near future, which will allow users to host serverless functions and consume events from many cloud sources. To join the TriggerMesh Early Adopters program and get free serverless hosting for a limited time, please visit cloud.triggermesh.io.


Slides

The slides of the presentation are available in (and can be downloaded from) SpeakerDeck.

Aarno Aukia

Aarno is Co-Founder of VSHN AG and provides technical enthusiasm as a Service as CTO.

Contact us

Our team of experts is available for you. In case of emergency also 24/7.

Contact us

Examples of Supported Kubernetes Operator SDK workflows

3. Apr 2019
This blog post is part of the series How to leverage Kubernetes operators using the Operator SDK framework.
IN PREVIOUS BLOG POSTS WE TALKED ABOUT:
Section 1 – Operators, Operator Framework, and Operators SDK: 

  • Here we discuss, in a general setting, Operators, the Operator Framework, and the Operators SDK.
  • Then we discuss the Operators SDK’s emerging popularity on GitHub, and in general the “Operator SDK workflow” adopted for generating and handling operators.

Section 2 – Supported Kubernetes Operator SDK workflows

  • Here we discuss the three available alternative workflows to generate Operators provided by the latest versions of the Operator SDK APIs.
  • We also discuss the pros and cons of using the various operator workflows.

IN THIS BLOG POST WE WILL TALK ABOUT:
Section 3 – Examples of Supported Kubernetes Operator SDK workflows

  • Here we provide examples of the three available alternative workflows to generate Operators provided by the Operator SDK APIs.
  • We specifically focus on Go operators, as they are in our opinion the most stable available APIs.

Section 3 – Examples of Supported Kubernetes Operator SDK workflows

We will refer to:
1) Operator: Go operator 
2) Operator: Ansible operator [coming soon]
3) Operator: Helm operator [coming soon]
 
For each of these SDK-supported workflows, we provide:
a) a description of the generated Operator structure (there is one for each specific workflow);
b) a link to our example(s) of operator(s) based on such an Operator structure and logic;
c) a description of how to add (e.g., 3rd party) Resources, different from the core Kubernetes resource types, to your Operator;
d) a description of the main pros and cons of using such Operator structures and logic.

Back to overview

Back to overview How to leverage Kubernetes operators using the Operator SDK framework.

simon.beck


Supported Kubernetes Operator SDK workflows

This blog post is part of the series How to leverage Kubernetes operators using the Operator SDK framework.
IN A PREVIOUS BLOG POST WE TALKED ABOUT:
 Section 1 – Operators, Operator Framework, and Operators SDK, and in particular:

  • in a general setting, about Operators, the Operator Framework, and the Operators SDK;
  • the Operators SDK’s emerging popularity on GitHub, and in general the “Operator SDK workflow” adopted for generating and handling operators.

IN THIS BLOG POST WE WILL TALK ABOUT:
Section 2 – Supported Kubernetes Operator SDK workflows

  • Here we discuss the three available alternative workflows to generate Operators provided by the latest versions of the Operator SDK APIs.
  • We also discuss the pros and cons of using the various operator workflows.

Section 2 – Supported Operator SDK workflows

As discussed before, the Operator SDK is a very active project on GitHub, with over 10 releases produced in less than a year. This means that the Operator SDK is a toolkit that is continually evolving over time (e.g., its code, structure, and logic are changing). In particular, as reported on the main GitHub page of the Operator SDK, the libraries and tools are labeled with “Project Status: pre-alpha”, and thus “breaking changes to the API are expected in the upcoming releases”.
The project started in April 2018 and we started monitoring it intensively from September 2018. We found out that the SDK provides three different workflows to develop operators, based on Go, Ansible, or Helm.
These workflows emerged between 2018 and 2019. Specifically, the first version of the operator was based on Go, and an Ansible-based version was only provided from December 2018.
Finally, at the beginning of 2019 (January), the operator workflow based on Helm was also released.
The following workflow is for a new Go operator:

      1. Create a new operator project using the SDK Command Line Interface (CLI)
      2. Define new resource APIs by adding Custom Resource Definitions (CRD)
      3. Define Controllers to watch and reconcile resources
      4. Write the reconciling logic for your Controller using the SDK and controller-runtime APIs
      5. Use the SDK CLI to build and generate the operator deployment manifests

The following workflow is for a new Ansible operator:

      1. Create a new operator project using the SDK Command Line Interface (CLI)
      2. Write the reconciling logic for your object using Ansible playbooks and roles
      3. Use the SDK CLI to build and generate the operator deployment manifests
      4. Optionally add additional CRDs using the SDK CLI and repeat steps 2 and 3

The following workflow is for a new Helm operator:

      1. Create a new operator project using the SDK Command Line Interface (CLI)
      2. Create a new (or add your existing) Helm chart for use by the operator’s reconciling logic
      3. Use the SDK CLI to build and generate the operator deployment manifests
      4. Optionally add additional CRDs using the SDK CLI and repeat steps 2 and 3

Guidelines:
Command Line Interface: To learn more about the SDK CLI, see the SDK CLI Reference, or run operator-sdk [command] -h.
For a guide on Reconcilers, Clients, and interacting with resource Events, see the Client API doc.
As the following figure shows, there is not much difference among the various operator workflows.
However, the most mature workflow, and the one that gives the most control over the operator’s behavior, is the one based on Go:

Next article

Section 3 – Examples of Supported Operator SDK workflows

Back to overview

Back to overview How to leverage Kubernetes operators using the Operator SDK framework.

simon.beck


Introduction to Kubernetes Operators, Operator Framework, and Operators SDK

1. Mar 2019

Introduction to Kubernetes Operators, Operator Framework, and Operators SDK

This blog post is part of the series How to leverage Kubernetes operators using the Operator SDK framework.
Section 1 – Kubernetes Operators, Operator Framework, and Operators SDK: 

  • Here we discuss, in a general setting, Operators, the Operator Framework, and the Operators SDK.
  • Then we discuss the Operators SDK’s emerging popularity on GitHub, and in general the “Operator SDK workflow” adopted for generating and handling operators.


Section 1 – Kubernetes Operators, Operator Framework, and Operators SDK

a) Operators are Kubernetes applications
A Kubernetes application is an application that is both deployed on Kubernetes and managed using the Kubernetes APIs and kubectl tooling. To be able to make the most of Kubernetes, you need a set of cohesive APIs to extend in order to service and manage your applications that run on Kubernetes. You can think of Operators as the “runtime that manages this type of application on Kubernetes“.
Thus, an Operator is a method of packaging, deploying, and managing a Kubernetes application. Conceptually, an Operator takes human operational knowledge and encodes it into software that is more easily packaged and shared with consumers. We can think of an Operator as an extension of the software vendor’s engineering team that watches over your Kubernetes environment and uses its current state to make decisions in milliseconds. Operators follow a maturity model, from basic functionality to having specific logic for an application.
We’ve seen in recent years that Operators’ capabilities differ in sophistication depending on how much intelligence has been built into the implementation logic of the Operator itself. We’ve also learned that the creation of an Operator typically starts by automating an application’s installation and self-service provisioning capabilities, and then evolves to take on more complex automation, depending on the specific use case. As a result, advanced operators are nowadays designed to handle upgrades seamlessly, react to failures automatically, and not take shortcuts, like skipping a software backup process to save time.
b) Operator Framework
Operators are Kubernetes-native applications that facilitate the management of complex stateful applications on top of Kubernetes. However, writing such operators can be very difficult because of challenges such as (i) low-level APIs and (ii) a lack of modularity, which leads to duplication, inconsistencies, and unexpected behaviors.
To address this issue, several tools are now being launched (e.g., the Operator Framework, Kooper, Metacontroller, etc.) as the result of years of work and experience of the Red Hat, Kubernetes, and CoreOS open source communities in building Operators. Specifically, Red Hat and the Kubernetes open source community shared the Operator Framework, an open source toolkit designed to manage operators in a more effective, automated, and scalable way.
The Operator Framework is an open source toolkit composed by several low-level APIs. We believe that the new Operator Framework represents the next big step for Kubernetes by using a baseline of leading practices to help lower the application development barrier on Kubernetes. The project delivers a software development kit (SDK) and the ability to manage app installs and updates by using the lifecycle management mechanism, while enabling administrators to exercise operator capabilities on any Kubernetes cluster.
The Operator Framework includes:

    • Operator SDK: Enables developers to build Operators based on their expertise without requiring knowledge of Kubernetes API complexities.
    • Operator Lifecycle Management: Oversees installation, updates, and management of the lifecycle of all of the Operators (and their associated services) running across a Kubernetes cluster. Once built, Operators need to be deployed on a Kubernetes cluster. The Operator Lifecycle Manager is the backplane that facilitates management of operators on a Kubernetes cluster. With it, administrators can control what Operators are available in what namespaces and who can interact with running Operators. They can also manage the overall lifecycle of Operators and their resources, such as triggering updates to both an Operator and its resources.
    • Operator Metering (joining in the coming months): Enables usage reporting for Operators that provide specialized services. In a future version, the Operator Framework will also include the ability to meter application usage – a Kubernetes first, which provides extensions for central IT teams to budget and for software vendors providing commercial software. Operator Metering is designed to tie into the cluster’s CPU and memory reporting, as well as calculate IaaS cost and customized metrics like licensing.

Simple, stateless applications can leverage the Lifecycle Management features of the Operator Framework without writing any code by using a generic Operator (for example, the Helm Operator). However, complex and stateful applications are where an Operator can shine. The cloud-like capabilities that are encoded into the Operator code can provide an advanced user experience, automating such features as updates, backups and scaling.
In the next subsection we discuss the Operators SDK’s emerging popularity on GitHub, and in general the “Operator SDK workflow” adopted for generating and handling operators.
c) Operators SDK popularity
The Operator SDK is a toolkit, recently built on top of the Operator Framework, that provides the tools to build, test, and package Operators. Initially, the SDK facilitated the marriage of an application’s business logic (for example, how to scale, upgrade, or back up) with the Kubernetes API to execute those operations. However, over time, the SDK is evolving to allow engineers to make applications smarter and have the user experience of cloud services. As a consequence, leading practices and code patterns that are shared across Operators are included in the SDK to help prevent reinventing the wheel.
From a developer perspective, the entry point is the Operator SDK, originating from CoreOS, which is offered as part of the Operator Framework, itself described as “an open source toolkit to manage Kubernetes native applications, called Operators, in an effective, automated, and scalable way”. The SDK specifically targets Go developers and applications, and even if support for other programming languages (e.g., Java, C, etc.) is currently lacking, future plans for their integration are already in place.
On GitHub, the Operator SDK is becoming a very active project, and has already gained high visibility and popularity.

However, even though the project is becoming more popular over time, its project status is still “pre-alpha”, which means that “breaking changes to the API are expected in the upcoming releases”.
Thus, the Operator SDK toolkit still requires a bit more maturity to be used in wider practical working scenarios. As researchers, we believe that this software development kit (SDK) will be widely adopted in the future, as it will support developers during the management of app installs and updates by using the lifecycle management mechanism, while enabling administrators to exercise operator capabilities on any Kubernetes cluster (see the following figure, which highlights the overall view of the envisioned Operator SDK support).

Next, we talk about the Operators SDK general workflow.
d) Operators SDK General Workflow
The Operator SDK is a toolkit that provides the tools to build, test, and package Operators, as shown in the following Figure.

Specifically, the following workflow is provided by the toolkit for supporting the writing, building, testing, and packaging of a new Go operator:

      1. Create a new operator project using the SDK Command Line Interface (CLI)
      2. Define new resource APIs by adding Custom Resource Definitions (CRD)
      3. Define Controllers to watch and reconcile resources
      4. Write the reconciling logic for your Controller using the SDK and controller-runtime APIs
      5. Use the SDK CLI to build and generate the operator deployment manifests

In this context, the Operator SDK uses the controller-runtime library for its workflow, which makes writing operators easier by providing:

      • High level APIs and abstractions to write the operational logic more intuitively.
      • Tools for scaffolding and code generation to bootstrap a new project fast.
      • Extensions to cover common operator use cases.

A simple example to create and deploy a simple operator with the SDK toolkit is provided  in the official operator SDK GitHub repository:
https://github.com/operator-framework/operator-sdk
The resulting automatically generated Go operator will present the following reference structure:

File/Folders and their purpose:

  • cmd: Contains manager/main.go, the main program of the operator. This instantiates a new manager which registers all custom resource definitions under pkg/apis/... and starts all controllers under pkg/controllers/... .
  • pkg/apis: Contains the directory tree that defines the APIs of the Custom Resource Definitions (CRD).
  • pkg/controller: Contains the controller implementations.
  • build: Contains the Dockerfile and build scripts used to build the operator.
  • deploy: Contains various YAML manifests for registering CRDs, setting up RBAC, and deploying the operator as a Deployment. (Role-based access control (RBAC) is a method of regulating access to computer or network resources based on the roles of individual users within an enterprise.)
  • Gopkg.toml, Gopkg.lock: The Go Dep manifests that describe the external dependencies of this operator.
  • vendor: The golang vendor folder that contains the local copies of the external dependencies that satisfy the imports of this project. Go Dep manages the vendor directory.

In the next blog post we will talk about the Operators SDK current status, e.g., available versions and workflows.

Next article

Section 2 – Supported Kubernetes Operator SDK workflows

Back to overview

Back to overview How to leverage Kubernetes operators using the Operator SDK framework.

simon.beck


How to leverage Kubernetes operators using the Operator SDK framework

How to leverage Kubernetes operators using the Operator SDK framework

Kubernetes has become an omnipresent platform to host cloud-native applications. As a rather low-level platform, it is often made developer-friendly by wrapping it into higher-level platforms, such as OpenShift (OKD), or by turning it into a managed service platform, such as APPUiO, which can be deployed to any cloud infrastructure. Application engineers interact with Kubernetes mostly by authoring appropriate deployment descriptors and by pushing their code, which triggers deployments. Due to ongoing feature additions, not much is known about useful combinations of annotations on Kubernetes deployments (and other declaratively described objects), Kubernetes operators (a kind of hook), and custom resource definitions.
In this blog post series, we share some of the experience we have gained while researching how to trigger actions upon certain updates to the descriptors, as a precursor to dynamic and autonomous feedback loops which can self-manage application deployments.
In particular, we provide access to the adapted original examples of operators generated with the Operator SDK toolkit, which deal with Kubernetes resources by combining annotations on Kubernetes deployments with Kubernetes operator concepts. The links to our operator examples are available on GitHub: https://github.com/appuio/operator-sdk-examples. In further blog posts we will describe some of them, also discussing how they could be extended for more advanced decision making. In particular, adapting the (Go) operators to work in different environments requires modifying some important Go files (e.g., pkg/controller/memcached/memcached_controller.go, as shown in the following Figure).

IN FURTHER BLOG POSTS WE WILL TALK ABOUT:

Section 1 – Kubernetes Operators, Operator Framework, and Operators SDK
  • Here we discuss, in a general setting, Operators, the Operator Framework, and the Operators SDK.
  • Then we discuss the Operators SDK’s emerging popularity on GitHub, and in general the “Operator SDK workflow” adopted for generating and handling operators.
Section 2 – Supported Kubernetes Operator SDK workflows
  • Here we discuss the three available alternative workflows to generate Operators provided by the latest versions of the Operator SDK APIs.
  • We also discuss the pros and cons of using the various operator workflows.
Section 3 – Examples of Supported Kubernetes Operator SDK workflows
  • Here we provide examples of the three available alternative workflows to generate Operators provided by the Operator SDK APIs.
  • We specifically focus on Go operators, as they are in our opinion the most stable available APIs.
Section 4 – Example(s) of Operator(s) Monitoring the Service using Prometheus (coming soon)
  • Here we provide an example of an operator that communicates with Prometheus (currently used to monitor Kubernetes clusters) for more advanced decision making (e.g., advanced monitoring of the service).

About the authors

These blog posts have been written by Dr. Josef Spillner and Dr. Sebastiano Panichella from ZHAW (Zurich University of Applied Sciences) School of Engineering. Thank you very much Josef and Sebastiano for sharing your know how with our readers!

Simon Beck

Contact us

Our team of experts is available for you. In case of emergency also 24/7.

Contact us
General Tech

DevSecOps: security for development and IT operations

10. Dec 2018

What is DevSecOps and why should I care?

DevSecOps (development, security, operations, sometimes also called SecDevOps) integrates the topic of application security into the DevOps process. Hence agile software development meets the current challenges of cyber security. By automating and creating a security-as-code culture, collaboration between teams shall remain flexible while security is continuously improved.

What is DevOps?

Before we try to understand the term DevSecOps, we need to understand “DevOps.” What does this widespread term mean? It is almost as vague as “cloud”. Every modern business needs it, but is it something you can simply order and get delivered? We understand DevOps as the interdisciplinary collaboration between developers and software operators that allows a rapid and systematic development and delivery of applications. Our understanding of DevOps is explained in detail in “What is DevOps – what does VSHN do?”

Origin and the meaning of DevSecOps

Just as in the traditional separation of Devs and Ops, security has traditionally been the task of a detached team or of individuals. Security concerns were thus effectively outsourced and addressed late in the development cycle. Security as a silo, so to speak. Security specialists are good at detecting security holes, but within a traditional environment they rarely understand how modern software development teams – an agile DevOps organization – work together.
In order to fully exploit the agility and responsiveness of DevOps while increasing application security, security has to be an integral part of the lifecycle and must be included from the beginning.
To underline the ever-increasing importance of cybersecurity, the term DevSecOps has been formed:

DevSecOps means that everyone involved in the software development process is responsible for security and continuously improves and automates and integrates it into the development process right from the beginning.

Incorporate security into your DevOps workflows right from the beginning

What sounds like a matter of course was (and is) not always the case. The classic developer is more concerned about the functionality than about the security of an application. In addition, new technologies such as container platforms (e.g. Docker) and microservices are, despite their many benefits such as the continuous delivery of code, leading to new problems and security concerns, as ever-shorter release cycles can no longer accommodate manual testing.
DevSecOps should lead to a rethinking by integrating IT security and security features wherever possible into the automation workflows. The integration of existing security teams and employees and an associated cultural change is just as important as the selection of the right security tools.
With the DevSecOps approach, security should be integrated right from the start and should not be added later or considered after the development is completed. Development, IT operations and security teams need to be made aware of information security and pull together. Transparency, continuous feedback, and mutual insights are just as important as sharing known threats and vulnerabilities. For developers, this often requires rethinking because these processes were not always part of application development.

DevSecOps automation = automation of security

A successful adoption of DevSecOps principles requires the automation of repetitive tasks and checks, as manual security checks take a lot of time and are more prone to errors.
Technologies that facilitate DevSecOps include containers and microservices: DevOps security practices need to be customized, as these technologies are not suitable for static or manual testing. Information security must be integrated throughout the whole application lifecycle and has to be continuously improved. Modern agile teams already use automated validation and test points within the DevOps pipelines to increase application and code security while enabling fast release cycles. If the tests and checks cannot be integrated into the CI/CD pipelines, the development process is likely to bypass the security audit, which in turn can lead to security vulnerabilities.
DevSecOps makes security an integral part of the entire development process. DevOps teams must incorporate security from the beginning and automate it as much as possible so they can continuously test and protect all data, microservices, containers, and CI/CD processes. Integrated testing should provide the team with a real-time overview so that vulnerabilities and bugs can be quickly identified and closed.
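To make this concrete: such automated checks can be wired into an existing pipeline as ordinary stages. A hypothetical GitLab CI fragment (job names and scanner images are placeholders; only standard `.gitlab-ci.yml` keywords are used):

```yaml
# Hypothetical .gitlab-ci.yml excerpt: security checks run on every push,
# alongside the normal build and test stages, so they cannot be bypassed.
stages:
  - build
  - test
  - security

dependency_scan:
  stage: security
  image: example/dependency-scanner:latest   # placeholder scanner image
  script:
    - scan --fail-on high ./                 # placeholder CLI invocation

container_scan:
  stage: security
  image: example/image-scanner:latest        # placeholder scanner image
  script:
    - scan-image "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA"
```

A failing security job then blocks the merge or deployment just like a failing unit test would.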

Conclusion: security is more important today than ever

Almost daily reports about cyber attacks, security holes, data losses and lax security standards of large corporations remind us again and again how important security is today. Security should be part of the standard repertoire of DevOps teams, and with today’s approaches and tools, the overhead is usually manageable.
Due to today’s short development cycles, it is possible to test earlier and thus also recognize problems earlier. Integrating application security therefore also means using security and testing tools from the early stages of development onwards, not just in the live operation of the application.

Is DevSecOps worth it?

Of course, the integration of security into the DevOps process means more effort (than not doing it), but in the long run the investment pays off. Agility and security cannot only be combined, they can even benefit from each other, if the team lives transparency, openness and the sharing of know-how. And at least since the negative headlines of the recent past, everyone should be clear about just how important security is.

SIGS DevSecOps Forum

Aarno, our CTO, held a talk about continuous (security) improvement in the DevOps process at the SIGS DevSecOps Forum on December 4th 2018 at Mobiliar in Bern.

You can find the slides of Aarno’s talk here:

Continuous security improvements in the DevOps process from Aarno Aukia

Related links

In agile software development, there is also the term “shift to the left”, which means moving the validation to earlier stages of development (see DevSecOps.org).
Or security is treated as a customer feature rather than adding non-functional requirements to the product backlog (Michele Chubirka aka “Mrs. Y” on postmodernsecurity.com).

What do you think about DevSecOps?

What does DevSecOps mean to you? Is it already the new standard or just another step on the way to GitOps? We would be very happy to receive your feedback on the topic, via @vshn_ch, mail or the contact form below.

Markus Speth

Marketing, Communications, People

Events Tech

Open Source Monitoring Conference 2018

14. Nov 2018

Our Marco Fretz has visited the OSMC (Open Source Monitoring Conference) 2018 in Nuremberg and reports his impressions below.

The OSMC 2018

The four-day OSMC with around 300 visitors is all about monitoring tools, concepts and the automation of the monitoring system. Of course, Icinga2 is very well represented – in the talks, among the participants and organizers. The conference is organized by NETWAYS, the company behind Icinga2, and takes place annually at the Holiday Inn Hotel in Nuremberg.

On the first day there were workshops on various topics and on the fourth day a hackathon, with projects that were determined spontaneously. The English and German talks took place on the second and third day and were divided into three tracks. Thanks to the agenda (online, in front of the rooms and at the back of the badge), it was usually easy to decide on the most exciting track. Talks that you missed can be watched later online as video.

I again only visited days two and three (with the talks). At around EUR 1500.- for a two-day conference, three evening events and three nights in the hotel, the conference is also very attractively priced.

In any case, the OSMC is also known for the rich and excellent catering (all-you-can-eat) from breakfast to the evening events and the smooth organization – hotel and conference check-in are completed in under a minute. This held true again this year.

More impressions: #OSMC

Highlights

For me, the most important thing was meeting new and familiar faces and the related exchange about our own and other monitoring landscapes, including their problems and solutions, as well as, of course, the direct line to NETWAYS / Icinga. Thanks to the talk from @ekeih (scaling Icinga2 …), some people quickly found common ground with us, running Icinga2 in a setup similar to ours.

Prometheus

Exciting talks about Prometheus, or about concepts that rely on it, gave me the impression that Prometheus is becoming more widespread, not only in the “cloud native” world but also, for example, for HTTP SLA monitoring (Moritz Tanzer, nic.at), network monitoring (Matthias Gallinger, ConSol), etc.

Our current plans at VSHN for the integration of Prometheus into our monitoring environment were thus confirmed.

IcingaDB

A big bottleneck in Icinga2 and Icingaweb2 is the IDO database (MySQL / Postgres), whose schema dates back to the Nagios and Icinga1 era and has held up increasingly poorly over time. Back then, a relational database for actually volatile status information such as service and host states seemed to make sense. In larger setups, however, the writes from Icinga2 to the DB are the bottleneck. The query performance of Icingaweb2 also suffers greatly in certain configurations of larger setups.

A lot of details are not yet known; however, Redis is used for the volatile status information and an SQL DB for the historical data. A first version already runs on a trial basis at Icinga and was presented in a live demo. I think it’s great that you can probably use the IDO DB and the IcingaDB module in parallel (as a transition); the same applies to the Icingaweb2 monitoring module – this will greatly simplify migration.

To take away

Try it out…

OpenAPM

OpenAPM is not a tool in itself but shows in a simple way which tools can be combined with each other to build an application performance management / monitoring landscape. Just try it here: https://openapm.io/landscape

Maps

Certainly exciting is the maps module for Icinga2. You can give each host or service object a geolocation via a custom variable; they are then automatically displayed on the map and grouped according to the zoom factor: https://github.com/nbuchwitz/icingaweb2-module-map
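A sketch of what such a host definition could look like, assuming the module reads a `vars.geolocation` custom variable in `"lat,lon"` form (check the module’s README for the exact variable name and format):

```
object Host "web01.example.com" {
  import "generic-host"
  address = "203.0.113.10"             // illustrative address

  // Custom variable evaluated by the map module (assumed name/format)
  vars.geolocation = "47.3769,8.5417"  // Zurich
}
```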

Rancher Monitoring

From the talk of @ClaudioKuenzler: a plugin for easy monitoring of Rancher2 and Kubernetes: https://github.com/Napsty/check_rancher2

Conclusion

I have taken great ideas with me, met new people and eaten a lot (and yes, the gin was good too). A great conference that’s really worth it. I’d gladly go again.


Marco Fretz

Marco is one of VSHN's General Managers and Chief Operating Officer.

General Tech

Release your applications faster and easier – move them to the cloud

27. Sep 2018

No matter where you work in the technical space of software development, you are probably in a hurry to launch the application that you are currently developing.

Your customers can’t wait – accelerate development

Time is money, and when you are on the verge of launching your next killer application time is a scarce resource. Your customers won’t wait forever, and you need to launch your applications as quickly as possible if you want to succeed in the fast-paced tech market.
You might also be pushed internally to keep time to market as fast as possible.
There is no time to build up internal resources, educate colleagues, or gather know-how. You need to be faster than the competition. IT teams have to do more and more nowadays, so the pressure is on to manage them smartly.

Common Problems With Launching Applications

The road to your application launch date can be a crooked one, with a lot of turns and even dead ends. This is why it is so important to quickly test each small improvement in one or more test environments before deploying it to the end-user visible production environment. The earlier an error in the application or environment is found, the quicker and with less effort you can fix it.
One of the most costly and time-consuming classes of errors are differences between the testing and production environments. They result in the application behaving correctly during testing but suddenly misbehaving in production even though the same application version was deployed, and you don’t know which differences lead to the problems. These problems can lead to error messages or catastrophic failures on public-facing apps and long-term damage to your reputation.
Another, security-focused problem is keeping testing and production environments separate from each other to prevent faulty applications under test from ruining real customer data.
Instead of synchronizing dev/test/production environments manually and duplicating this labor for each step of the process, you can automate your work using best-practice cloud-based tools.

Why Should You Move Your Applications To The Cloud?

There may be many different reasons why you may be reluctant to move your application to the cloud. You may be used to the classical way of building and testing applications or fear the migration effort, or you might not have the internal know-how or resources.
However, a well-prepared DevOps team leveraging the potential of the cloud can rapidly manage the deployment of applications without a substantial increase in either manpower or long-term costs. It can make it significantly easier to scale the application later in the process and provide access to helpful auxiliary services.
Unifying the different environments using open-source Docker software container technology helps you leverage the world-wide ecosystem and experience. The more parts of the process that you can automate and integrate, the more efficiently you can build and launch your application.

Whitepaper ‘5 Steps Of Moving Your Applications Successfully To The Cloud’

Migrating to the cloud is a big business decision, so it’s vital to go in with both eyes open. Download our 5 Steps Of Moving Your Applications Successfully To The Cloud whitepaper and find out how moving to cloud can help you meet your software and application development goals.

Markus Speth

Marketing, Communications, People

Internal Tech

It’s our birthday!

16. Sep 2018

Aarno Aukia

Aarno is Co-Founder of VSHN AG and provides technical enthusiasm as a Service as CTO.

General Tech

How Docker and container technology can help your DevOps organization

15. Sep 2018

DevOps needs three things: people with the right attitude, shared processes, and the right tools. Docker software containers help solve these challenges and offer a standardized platform for development and operations.

Container from the perspective of the developer

From a web agency’s perspective, each project places different demands on the target system, such as different versions of programming languages and frameworks. These combinations must be thoroughly tested during development through Continuous Integration (CI), which is time-consuming and error-prone with traditional systems.
Container virtualization, for example with Docker, helps here. Docker uses so-called images, bundles of software used to launch individual instances of an app, called containers. Unlike traditional virtual machines, these images do not include a full operating system and are therefore lighter and faster. Ideal for continuous integration.
From the software developer’s perspective, it is easy to configure pipelines with Docker, for example within GitLab CI: the image is specified and the runner takes care of everything else. The application is thus tested in an encapsulated way and requires no additional software on the server.
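A minimal illustration of this per-project image selection (job names and image tags are illustrative; only standard `.gitlab-ci.yml` keywords are used):

```yaml
# Hypothetical .gitlab-ci.yml: each job declares the image it needs,
# so the runner pulls the right language/framework version per project.
test:php:
  image: php:7.2-cli           # illustrative PHP version for this project
  script:
    - php -v
    - vendor/bin/phpunit       # assumes a PHPUnit setup in the repo

test:node:
  image: node:10               # illustrative Node version
  script:
    - npm ci
    - npm test
```

No PHP or Node has to be installed on the runner host itself; each job runs inside the declared image.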

Container from the perspective of the operator

Docker containers are a standardized and efficient way to package software with everything it needs to run. On the one hand, this helps to minimize external dependencies at runtime: the correct versions of PHP, Java, etc., with all required modules, extensions and plugins, no longer have to be managed separately on the server.
On the other hand, a change to the application code is handled exactly like a change to the application server: a new version of the container image is automatically built and deployed in the test environment, then the very same verified image can be rolled out in the production environment.

The Advantages of Standardizing the Software Containers are Analogous to the Containers in Logistics

  • Standardization makes containers more efficient: Just as a container ship transports 21’000 different containers with the same crew, a PaaS-provider can operate hundreds of thousands of containers on various customer infrastructures and cloud providers.
  • Containers standardize the handling of their contents: in logistics, the pick-up points in the corners are exactly the same whether the contents are liquid, solid or gaseous. In software, the entrypoint, listening port and storage volumes are defined in exactly the same way, no matter whether PHP, Java or .NET Core is being executed.
  • Container technology is portable, so it works on all infrastructures and vendors just like any other means of transport.
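Those standardized “pick-up points” map directly to a handful of Dockerfile instructions. A sketch (base image and paths are illustrative):

```dockerfile
FROM php:7.2-apache            # illustrative base image
COPY . /var/www/html           # application code
EXPOSE 80                      # the standardized "listening port"
VOLUME /var/www/html/uploads   # the standardized storage volume
# The entrypoint comes from the base image: start Apache in the foreground
```

Whatever runs inside, the orchestrator only ever deals with these few standardized interfaces.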

The solution of software logistics is therefore called container orchestration and the most well-known implementation thereof is Kubernetes. It standardizes and automates software operations such as deployment / update, scaling, load-balancing, service discovery, storage volume management, monitoring, backup, distribution of containers to multiple servers and isolation of multiple applications, test environments, teams and / or customers.
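To make “standardizes and automates software operations” concrete: in Kubernetes, scaling and update behavior is described declaratively and the cluster keeps reality in sync with the description. A minimal sketch (names and image are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app            # illustrative name
spec:
  replicas: 3                  # scaling: Kubernetes keeps 3 pods running
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: app
          image: registry.example.com/app:1.0   # illustrative image
          ports:
            - containerPort: 8080
```

Changing `replicas` or the image tag and re-applying the file is all it takes; the rollout, distribution to servers and load balancing happen automatically.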

So what does that mean for you – should you containerize your application?

There may be many different reasons why you may be reluctant to use Docker or container technology in general. You may be used to the classical way of building and testing applications using traditional VM technology or fear the migration effort, or you might not have the internal know-how or resources. Or you might have a legacy application which isn’t easily transferable or movable to the cloud.
So should you jump on this bandwagon – or to stay with the same terminology – ship?
A well-prepared DevOps team leveraging the potential of container technology can rapidly manage the deployment of applications without a substantial increase in either manpower or long-term costs. Moving your application to the cloud will be way easier using container technology like Docker, Kubernetes or OpenShift.
It can also make it significantly easier to scale the application, provide access to helpful auxiliary services and just in general make it more future-proof. So if you are coming from a legacy application, it probably will pay off in the future to invest resources now to make your application container ready.
Unifying the different environments using open-source Docker software container technology helps you leverage the world-wide ecosystem and experience. The more parts of the process that you can automate and integrate, the more efficiently you can build and launch your application.

Whitepaper ‘5 Steps Of Moving Your Applications Successfully To The Cloud’

If you want to learn more about migrating your applications to the cloud, download our free whitepaper ‘5 Steps Of Moving Your Applications Successfully To The Cloud’ and find out how moving to the cloud can help you meet your software and application development goals.

Markus Speth

Marketing, Communications, People

Kubernetes Tech

What is a Kubernetes distribution and what are the differences between Kubernetes and OpenShift?

30. Aug 2018

Update: here’s the Kubernetes distributions 2026 overview.

At VSHN and APPUiO.ch we rely on OpenShift as our Kubernetes distribution. What a Kubernetes distribution is, why we use one, and how it differs from ‘plain’ Kubernetes will be explained in this blog post.

What is Kubernetes?

The official description of Kubernetes is:

Kubernetes is a portable, extensible open-source platform for managing containerized workloads and services, that facilitates both declarative configuration and automation.

The most important part of this description is the fact that Kubernetes is a platform and not a ready off-the-shelf product. This is an important piece of information for understanding this article.

What is a Kubernetes distribution?

To understand the differences between Kubernetes and OpenShift, first of all we have to clarify the term ‘Kubernetes distribution’: if Kubernetes is installed directly from the open source Kubernetes project, you ‘only’ get the core components (API server, controller manager, scheduler, Kubelet, kube-proxy). In order for Kubernetes to be really usable, you need a lot of other components like etcd, ingress controller, logging server, metrics collector (for example Prometheus), software defined network (SDN) and many more. This is very similar to Linux: the Linux kernel alone does not help much, you need a whole Linux distribution which provides a shell, package management, boot process and much more.

OpenShift is a Kubernetes distribution and makes Kubernetes a product

A ‘minimum viable Kubernetes distribution’ requires the following additional components and tools for productive operation:

  • Installation and upgrade mechanism: for an automated installation of all involved components.
  • SDN (software defined network): pods must be able to communicate with each other no matter where they are running. The SDN ensures that.
  • Ingress controller: to allow user access to applications running on the cluster.
  • Authentication: a central user and group database provides the authenticated and authorized access.
  • Security: Kubernetes executes containers via Docker or CRI-O. The security on the host system must be ensured accordingly.
  • Persistent storage: stateful applications such as databases require persistent storage.
  • Monitoring: constant monitoring of all cluster components and applications.
  • Backup: backup of cluster components and persistent data.

Optionally, further components are recommended:

  • Central logging with graphics and searchability
  • Application and cluster metrics including alerting

OpenShift as Kubernetes distribution

Essentially, OpenShift relies 100% on Kubernetes, but as a Kubernetes distribution, it comes with everything needed for a Kubernetes cluster. To name just the most important functions:

  • Operations tools: an official and supported way via Ansible allows the entire lifecycle of OpenShift to be executed. This includes the automated installation, as well as upgrades to newer versions of OpenShift.
  • Router: the OpenShift router (ingress controller) – based on HAProxy – ensures that access to applications within the cluster is made possible via HTTP(S).
  • Multi-tenancy: multi-tenancy is built into the core, based on OpenShift projects, RBAC and other concepts, to allow the use of the platform by various stakeholders.
  • Authentication: a wide variety of authentication backends is supported, above all LDAP, Active Directory (AD) and others.
  • Metrics: the bundled metrics component collects all available data (RAM, CPU, network) of the applications running on the cluster and visualizes them in the web console.
  • Central logging: all lines logged by the application on stdout are automatically collected by the central logging component and are made available to the user via the web console.
  • Security: the platform is designed for maximum security. For example, security measures in the kernel of Red Hat Enterprise Linux like SELinux ensure that the security of the containers is guaranteed. Further measures such as ‘security context constraints’ (SCC) and the prevention of root containers ensure further security.
  • Builds and pipelines: build and pipeline capabilities integrated directly in the cluster enable a fully integrated CI/CD workflow.
  • Web console: all operations on the cluster are visually displayed to the platform user in a web console, allowing easy and quick access to Kubernetes.
  • SDN: the included software-defined networking provides connectivity between the pods running on the platform and ensures adequate network security via network policies.
  • Container registry: Docker / container images are stored in the bundled registry and used for deployment onto the worker nodes.

All these built-in functionalities can be added to any Kubernetes cluster, but only with a lot of effort – comparable to building your own Linux distribution, as for example Linux From Scratch demonstrates. Kubernetes has a similar guide called Kubernetes The Hard Way.

OpenShift as PaaS

The strength of Kubernetes lies in the container orchestration. In addition, OpenShift offers classic Platform-as-a-Service (PaaS) functionalities. One of these is the automatic building and deployment of application code directly from a Git repository. Nevertheless, as a user of the platform and thanks to its great flexibility, you always have the choice of whether you want to use the integrated build functions, or rather build outside the cluster. This can be chosen for each deployment, so both types can be used on one cluster.

OpenShift as upstream to Kubernetes

Many developments in Kubernetes originally came from OpenShift. The best example is RBAC (role-based access control). This feature has been part of OpenShift since the first release and has been gradually integrated into Kubernetes; RBAC has been an integral part of Kubernetes since version 1.6. The OpenShift ‘Route’ and ‘DeploymentConfiguration’ objects also played a key role in shaping the current ‘Ingress’ and ‘Deployment’ objects in Kubernetes.
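In Kubernetes, RBAC is expressed as Role and RoleBinding objects: the Role lists permitted verbs on resources, the RoleBinding grants it to users or groups. A minimal sketch (namespace and user names are illustrative):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: example-project     # illustrative namespace
  name: pod-reader
rules:
  - apiGroups: [""]              # "" = the core API group
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: example-project
  name: read-pods
subjects:
  - kind: User
    name: jane                   # illustrative user
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```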
Since OpenShift is 100% based on Kubernetes, all Kubernetes native workloads are also supported, such as the ‘Deployment’ or the ‘Ingress’ object.
If you look more closely at the contributor statistics, you’ll find that Red Hat is one of the top 3 contributing companies, making Red Hat crucial to the development of Kubernetes. With the purchase of CoreOS, Red Hat has acquired formidable Kubernetes know-how. The merger of OpenShift and Tectonic will be the next milestone of the Kubernetes distribution OpenShift.

Alternatives to OpenShift

OpenShift is not the only Kubernetes distribution on the market. A quick comparison shows the differences:

  • Cloud vendor Kubernetes: the big clouds offer their own Kubernetes distributions as a service. These are tailored to the respective cloud and are maintained by the providers. Installation on your own private cloud or on other public clouds is not possible.
  • Rancher: since version 2.0, Rancher focuses 100% on Kubernetes and offers a multi-cluster management function as a major strength. With Rancher, Kubernetes clusters in the cloud (for example, on Amazon or Google) can be managed centrally, as well as Kubernetes clusters with the ‘Rancher Kubernetes Engine’ on your own VMs. With the web interface, setting up a new cluster is very easy and application deployments using Helm are also directly available.
  • Tectonic: this distribution places great importance on cloud-native automation. Through Red Hat’s acquisition of CoreOS, Tectonic will be merged with OpenShift and many of its features will be integrated into OpenShift.
  • Canonical / Ubuntu Kubernetes: a platform based on Ubuntu which uses Juju as its installation tool. In partnership with Google and Rancher, a hybrid cloud solution will be offered in the future.
  • SUSE CaaS Platform: a very new platform based on SUSE MicroOS. Salt is used for configuration management. Under the following link you can participate in the beta program: SUSE CaaS Platform Beta.


One very important aspect to consider is cloud and/or vendor lock-in. Many Kubernetes distributions have their own characteristics, which may not be compatible with each other. Take the ‘cloud vendor’ distributions as an example: these can only be used in the corresponding cloud. If you want to pursue a hybrid cloud approach, this is not possible due to the lock-in. By contrast, a self-installable distribution like OpenShift keeps this option open.
Pure open-source distributions without vendor support are not recommended for production environments, as vendor support is of great advantage for a complex platform like Kubernetes.

APPUiO – Swiss Container Platform

The attentive reader may have noticed that there are some discrepancies between the ‘minimum viable Kubernetes distribution’ and OpenShift. This is exactly where APPUiO comes in: we refine OpenShift into a comprehensive, production-ready Kubernetes distribution by offering managed services. We automatically monitor and secure the cluster status, take care of regular updates, fix bugs, provide persistent storage and help with our know-how to make the most out of the platform.

More information about Kubernetes and OpenShift

At the Cloud Native Meetup on August 28, 2018 in Zurich, we also talked about Kubernetes distributions: you can find the slides on Speaker Deck. You can also find more about OpenShift, Docker and Kubernetes here. Another recommendable blog post on this topic by Tomasz Cholewa: 10 most important differences between OpenShift and Kubernetes (English, technical).

How can we help?

Through our experience operating OpenShift clusters around the world, we offer managed OpenShift clusters on almost any public, private or on-premises cloud. Or are you interested in a Kubernetes distribution other than OpenShift? We gladly help you with the evaluation, integration and operation, supported by our many years of Kubernetes experience.
Contact us, follow us on Twitter or take a look on our services.
We are looking forward to your feedback! 

Tobias Brunner

Tobias Brunner has been working in IT for over 20 years, more than 15 of them with Internet technology. New technology has to be tried out and written about.

Tech

Docker Overlay Encryption

24. Aug 2018

Docker Swarm with encrypted node-to-node traffic

VSHNeer Elia set up a Docker Swarm cluster with full traffic encryption inside the cluster (cross-posted from his private blog):
I have set up a Docker Swarm cluster on the new Hetzner Cloud. First things first – the Hetzner Cloud is really amazing: Super simple, super cheap and performs as expected. It is not a bloated cloud provider that has 100x services and features that you can use for your servers, this keeps the costs and complexity down – I am really a big fan of it.
To the topic: Because the feature-set is simple, the Hetzner Cloud does not provide private networking (yet!). With only public IP addresses, we need to secure the overlay traffic between our docker containers!

The Problem

By default, Docker Swarm encrypts the traffic between the managers, so we won’t have any issues there. However, this default does not apply to container-to-container traffic. Any traffic that uses the overlay network is not encrypted by default, because most of the time people have private network setups with a floating IP as the access point to the cluster. Docker assumes that the private network is secure and thus can spare some resources for other tasks (which, for example, is not the case at DigitalOcean, so I recommend using overlay encryption there anyway!).
Now, let’s assume we have the following stack:

version: '3'
services:
  db:
    networks:
      - internal
    image: mysql:5.7
    environment:
      MYSQL_DATABASE: wordpress
      MYSQL_USER: wordpress
      MYSQL_PASSWORD: securepw
  wordpress:
    networks:
      - traefik_public
      - internal
    depends_on:
      - db
    image: wordpress:latest
    environment:
      WORDPRESS_DB_HOST: db:3306
      WORDPRESS_DB_USER: wordpress
      WORDPRESS_DB_PASSWORD: securepw
    deploy:
      labels:
        - traefik.frontend.rule=Host:blog.example.com
        - traefik.docker.network=traefik_public
        - traefik.port=80
networks:
  traefik_public:
    external: true
  internal: 

This is a WordPress stack that creates the WP site and a MySQL database. Two networks are defined:

  • internal
  • traefik_public

The internal overlay network is used for the communication between the WP container and the database. This network is not reachable from the outside. traefik_public is the network used for the reverse proxy. It is only attached to the WP container, as this is the only public-facing side of this setup.
The problem here is: without a secured private network, traffic running through the internal network travels between workers (Docker nodes) fully visible in plain text. Any password/authentication/<SENSITIVE_DATA> is sent in plain text between the Docker containers, should they be on two different nodes.
Most Docker images are not made for public access in their simple utilization, which is why most keep it as simple as possible: no complicated encryption. You can of course build your own image to enable application-side encryption.

The Solution

Docker has a solution for this issue: you can simply enable encryption of the overlay network. Sadly, I didn't see much discussion about this, which is why I thought a blog post about this particular issue might be useful.
The encryption needs to be enabled when the network is created; you cannot encrypt a network once it has already been set up. To enable encryption, we add a driver option to the network definition:

networks:
   traefik_public:
     external: true
   internal:
     driver_opts:
       encrypted: ""

Since traefik_public is external, it is created outside this stack; you don't want the reverse proxying in plain text either, so create it with encryption enabled as well (e.g. docker network create --driver overlay --opt encrypted traefik_public).
The encrypted option creates IPsec tunnels between all the workers where tasks of a stack are scheduled. This fully encrypts all the traffic of the overlay network internal and thus allows sensitive data to be shared between the database and WordPress.
You can read the official documentation about this here.
 

Final thoughts

The information regarding encryption is quite hidden and, in my opinion, mostly ignored. People want to simply deploy applications with Docker without thinking about the infrastructure underneath, and thus run into problems like plain-text traffic on overlay networks.
I hope this blog post makes people more aware of encryption.
If you have any questions, please let me know below!
 
 

Elia Ponzio

Tech

Hosting Drupal on Openshift with the help of Lagoon – by Bastian Widmer (Amazee.io)

3. Jun 2018

Amazee.io set out to make hosting Drupal websites easier and more flexible. We always strive to stay at the cutting edge of technology, so it does not come as a surprise that we started adopting a container-based approach to hosting. We'll talk about what containers are and why we believe it's important to go one step further and open source the technology we use to host websites.
 

What Does Hosting Without Containers Look Like?

First and foremost, it's less flexible. If your project has special needs (i.e. caching with Redis or a decoupled frontend with Node.js) that your provider does not support, you might end up in a tricky situation.
You might have to either add another provider that supports your technology, or find another provider that hosts your whole project on one platform.
Adding another provider to the mix adds more complexity and less streamlined support structures if you ever run into issues. amazee.io's new system, Lagoon, gives developers the flexibility to create their own service architecture, supporting everything from a simple Drupal website to a high-traffic website with a decoupled frontend and server-side rendering. We do this through the magic of containers.
 

So, What Are Containers?

You can think of hosting companies as shipping companies, and of the servers we have as ships that hold the code from individual projects.
Before the standard shipping container existed, it was quite an endeavour to load and unload a ship's cargo, because everything had custom dimensions, so stacking cargo in an efficient manner was really hard. As soon as a standard was established – in this case the shipping container – everything got much easier, because containers were stackable and all had the same form factor.
This analogy shows that the age of custom servers is over. Sure, you can run a lot of things on a single server, but it will not scale past a certain point. Containerizing your applications makes it much easier to move between platforms, because you follow a certain standard.
You can see why hosting companies solve this by only hosting one kind of thing – Drupal sites, for example. But the beauty of containers is that we can host all sorts of projects and still keep everything efficient and optimized for our ships. We do this with the help of container technology.
 

Open source

We’re entering the third decade of open source, as the Open Source Initiative celebrated its 20th anniversary at the beginning of February 2018. From the very beginning, amazee.io was committed to basing our work on open source software and technologies as much as possible. This enabled us to continue the great work started by other projects and bring innovative solutions a step further.
Every version of our hosting system is more flexible than the last, with more functionality for our clients. We always found it a bit troubling to host open source CMSes like Drupal or WordPress on proprietary solutions where you don’t know what happens behind the curtains. Last year we decided to go one step further and open sourced our Docker-in-production hosting system. It’s there for everyone to look at, understand and tinker with. We welcome feedback, contributions, and high fives at all times.
You can find out more about our project on our website, or if you want to talk about hosting, we’re always there for you on our Slack.

Aarno Aukia

Aarno is Co-Founder of VSHN AG and provides technical enthusiasm as a Service as CTO.

Tech

Automated build pipelines with GitLab CI and APPUiO

1. Apr 2018

Update: demo projects published

This article is now outdated! We have prepared a new set of demo projects with full source code, public CI/CD pipelines and Kubernetes integration. Please visit https://gitlab.com/appuio and let us know how you like them! Issues and Merge Requests welcome!

Overview

“Deploy early and often” is one of the core principles of agile development. One way to achieve it is automated build and deploy pipelines.
This project uses such a pipeline, based on GitLab CI and OpenShift. A sample PHP application is built and deployed to APPUiO.
Three environments in the form of OpenShift projects are used: test, qual and production. This is one of the most commonly used setups, but it can be configured and extended. The idea is to build and deploy every commit on the master branch to the test environment automatically. If a git tag is created, the previously built Docker image is tagged with the git tag name and automatically deployed to qual. The deployment to production is a manual step and can be invoked on GitLab.

Pipelines

There are two pipelines set up in .gitlab-ci.yml. One is run on every commit to the master branch and consists of the stages lint, build and deploy-test. The lint stage starts a Docker container in which the PHP code is verified for correct syntax. The build stage starts a new OpenShift build (S2I), waits for its completion and shows the log output of the build. The last stage automatically deploys the built image to the test environment.
The second pipeline is started upon creation of a git tag. It consists of the stages release, deploy-qual and deploy-production. The first stage tags the Docker image that was built for the respective git commit with the name of the created git tag. The deploy-qual stage deploys the tagged Docker image automatically to the qual environment. The stage deploy-production needs to be started manually and deploys to the production environment.
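A hypothetical minimal .gitlab-ci.yml sketch of these two pipelines might look as follows; the oc commands, image name and variables are illustrative assumptions, not the project's actual configuration:

```yaml
stages:
  - lint
  - build
  - deploy-test
  - release
  - deploy-qual
  - deploy-production

lint:
  stage: lint
  image: php:7.0-cli                      # assumption: any image with `php -l`
  script: find . -name '*.php' -exec php -l {} \;
  only: [master]

build:
  stage: build
  script: oc start-build ${APPNAME} --commit=$CI_COMMIT_SHA --follow
  only: [master]

deploy-test:
  stage: deploy-test
  script: oc tag ${APPNAME}:$CI_COMMIT_SHA ${APPNAME}:test
  environment: test
  only: [master]

release:
  stage: release
  script: oc tag ${APPNAME}:$CI_COMMIT_SHA ${APPNAME}:$CI_COMMIT_TAG
  only: [tags]

deploy-qual:
  stage: deploy-qual
  script: oc tag ${APPNAME}:$CI_COMMIT_TAG ${APPNAME}:qual
  environment: qual
  only: [tags]

deploy-production:
  stage: deploy-production
  script: oc tag ${APPNAME}:$CI_COMMIT_TAG ${APPNAME}:production
  environment: production
  when: manual                            # the manual "play" step on GitLab
  only: [tags]
```

The key point of the sketch is the `when: manual` keyword on the last job, which is what turns the production deployment into an explicit button press in the GitLab UI.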

APPUiO

The application and the build run on APPUiO, which allows us to use the concepts and solutions OpenShift provides to work with builds and deployments. The directory os contains the YAML files which describe all the OpenShift resources used. These files live together with the code in the same git repository, which guarantees that the same version of the code is always built and deployed using the same version of the OpenShift resources.

Build

The build consists of an OpenShift BuildConfig which uses a PHP source-to-image builder and builds a Docker image containing the application. The image is tagged with the git commit hash and stored in the OpenShift Docker registry.

BuildConfig
apiVersion: v1
kind: BuildConfig
metadata:
  labels:
    app: ${APPNAME}
  name: ${APPNAME}
spec:
  strategy:
    type: Source
    sourceStrategy:
      from:
        kind: ImageStreamTag
        name: 'php:7.0'
        namespace: openshift
      incremental: true
  source:
    git:
      ref: ${COMMIT_SHA}
      uri: ${REPOSITORY_URL}
    type: Git
  output:
    to:
      kind: ImageStreamTag
      name: '${APPNAME}:${APP_TAG}'

Deployment

A DeploymentConfig is used to run the application. With the help of rolling deployments, zero-downtime deployments are possible and the end user won’t experience any service disruption during updates. The liveness and readiness probes enable OpenShift to see whether the container is running properly (liveness) and able to accept incoming traffic (readiness). Additionally, an OpenShift service and route are set up to route traffic from outside the cluster to the application. The URL is specific to each environment and can be opened from within GitLab.

DeploymentConfig
apiVersion: v1
kind: DeploymentConfig
metadata:
  name: ${APPNAME}
spec:
  replicas: 1
  selector:
    app: ${APPNAME}
  strategy:
    type: Rolling
  template:
    spec:
      containers:
      - name: ${APPNAME}
        image: "docker-registry.default.svc:5000/${OPENSHIFT_PROJECT}/${APPNAME}:${APP_TAG}"
        imagePullPolicy: Always
        ports:
        - name: web
          containerPort: 8080
        livenessProbe:
          httpGet:
            path: /
            port: web
        readinessProbe:
          tcpSocket:
            port: web
      restartPolicy: Always

Demo

To see the pipeline in action you need three things:

  1. Your own fork of the repository on GitLab with GitLab CI enabled
  2. A project on APPUiO for each stage: test, qual and production
  3. A ServiceAccount with edit membership in all projects

Step by step

  1. Set up Kubernetes/OpenShift integration in GitLab: https://docs.gitlab.com/ce/user/project/clusters/index.html#adding-an-existing-kubernetes-cluster
  2. Configure .gitlab-ci.yml with your APPUiO projects
  3. Commit & push the changes
  4. Watch the automated pipeline build the application and deploy it to the test project
  5. Create and push a git tag
  6. Watch the automated pipeline create image tags and deploy to the qual project
  7. Manually run a deployment to the production environment by visiting the pipeline on GitLab and clicking “play” on the deploy:production job

Outlook

To further extend this setup, feature branches could be built and deployed to OpenShift. With this in place every merge request is built and a live version of the application is available to test the introduced changes. This ensures that only buildable changes are merged into the master branch and changes can be easily reviewed.
To learn more about automated builds and deployments with GitLab and APPUiO, read the docs and create a project on APPUiO Public.

Aarno Aukia


Tech

APPUiO on the Microsoft Azure Cloud

15. Mar 2018

In the past few years, Microsoft has transformed itself from a classic Windows and Office provider into a modern cloud provider. The Azure Cloud has become a global hyperscale cloud platform with many interesting features. Red Hat has also recognized this and offers official support for OpenShift on Azure. With Microsoft’s announcement to bring the Azure Cloud to Switzerland as well, this could be an interesting offering for many Swiss companies. As a Microsoft Azure Partner for open source software in Switzerland, we show by example how OpenShift can be installed on Azure. In short: it works beautifully.

What did we build?

For us, the automation of all aspects of installation, configuration, monitoring, updates and so on is the most important factor for stable and scalable operation of an infrastructure. This is especially true for a complex system like OpenShift in the cloud. We therefore want to show how easily, quickly and automatically all the necessary components can be set up. For this we followed the reference architecture guide.

Furthermore, we wanted to find out which Azure services can be used and how many pods we can run without any tuning.

How did it work?

The reference architecture guide orchestrates all components with Ansible. The infrastructure components on Azure are set up using the official Ansible integration with the help of the Azure Resource Manager. The official OpenShift Ansible playbook then takes over, installing and configuring OpenShift.
A whole range of Azure services is used:

  • Availability set
  • Load balancer
  • Network interface
  • Network security group
  • Public IP address
  • Storage account
  • Virtual machine
  • Virtual network

With this large number of services to maintain, there is no way around automation. With Ansible, this can be accomplished very well.

With the 9 nodes provisioned in this example, we successfully scaled a demo service to 300 pods:

Our conclusion

OpenShift on Azure is a good choice. The Azure Cloud offers many essential services that enable smooth operation of OpenShift. We are happy to support you with your Azure Cloud project.

Tobias Brunner


Tech

Meltdown and Spectre

4. Jan 2018

On January 4th, 2018, information about two attacks on current processor architectures was published, called “Meltdown” and “Spectre”, which also affect us and our customers. While the underlying problems are similar, these are two distinct concrete attacks.

The vulnerabilities in a nutshell

Fundamentally, CPUs support a certain instruction set. This instruction set is functionally precisely defined, so that applications can rely on the processor arriving at the correct result. How exactly the instruction set is implemented in practice, however, is up to the processor manufacturer. To increase execution speed, processor manufacturers employ all kinds of tricks: for example, the processor tries to process several instructions at the same time (“pipelining”), or it tries to guess which path through the branches of a program is the most likely one and executes it on spec, at the risk of having to discard the result and start again further back (“speculative execution”, “branch prediction”). These mechanisms have no influence on the final result of a computation, but they very much influence the timing and the contents of the CPU caches. The published attacks exploit exactly this: they trick the processor into doing something it is not allowed to do, and which it subsequently (correctly) discards; through clever timing measurements, however, the program can find out which values from memory were used in the process. With this trick, a program can, among other things, read protected system memory and thus, for example, gain access to encryption keys. In virtualized environments this is particularly bad, because the attack also works across the boundaries of virtual machines: it is possible to access the memory of other virtual machines this way.
The technical details are complex and far beyond the scope of this blog article, but they can be read on the dedicated website https://meltdownattack.com.

Assessment

Within a single system (whether physical or virtual), the attacks have the character of a privilege escalation. This means that an ordinary user who can execute arbitrary software gains access to the system that only an administrator should have. While this is very bad, on systems where only trusted users can run software it is not a problem by itself. This is typically the case for web, file and database servers, for example.
On systems where arbitrary users can execute software, the vulnerability is catastrophic, because the separation of users or even customers (companies) is no longer guaranteed. This affects, for example, virtualization servers (Xen, KVM, VMware) or Docker servers on which arbitrary users can obtain virtual machines or run Docker containers. On such systems, users can execute malicious code inside a VM or a Docker container and gain access to the memory of other VMs or Docker containers. For such public virtualization platforms (e.g. Amazon AWS, Google Compute Engine, Cloudscale), immediate action is therefore imperative.

Countermeasures

Security updates for the common operating systems are available or in the works. Since the problem cannot currently be solved at the hardware level, the operating systems modify the way they handle protected memory: the operating system kernel is no longer mapped into the address space of every process, but is only reachable after a context switch (a switch into a different memory environment).
This is a massive change. On the one hand, it leads to a performance loss, because the context switch is more expensive for the processor than today’s implementation; speed penalties of 5% up to 30% are being reported. For most use cases we expect penalties of under 10%, i.e. not noticeable, but just about measurable. On the other hand, such extensive changes carry a risk of bugs and new, previously unknown stability problems, which is why immediately updating all systems is not necessarily the best course of action.
However, since the security problems are so severe on systems with untrusted users, we absolutely must update those systems as soon as possible. Concretely, this primarily affects the APPUiO Public platform; for all other systems we are still evaluating the procedure, but they will be updated at the latest in the next maintenance window on January 9th and will then receive correspondingly patched Linux kernels.

david.gubler

David is a DevOps engineer and develops fluently in Java and MongoDB.

Tech

NoSQL – Why you should use MongoDB instead of a relational database

4. Oct 2017

What’s wrong with relational databases

Stored procedures and triggers
For some people, a database just stores data. For others, a database is more like an application server, with stored procedures, triggers, CSV exports to the file system, cleanup jobs and code to expire user accounts. I think a database should store data. Period. Embedding application logic in a database is bound to end up in a huge mess:
  • Who maintains the stored procedures and triggers? The application on startup? A hacky collection of bash scripts? The sysadmin, manually? Nobody at all, because “we have a backup”? Any combination thereof?
  • Are they kept under version control, or are they just floating around?
  • How do you keep track of the versions of stored procedures and triggers? Not at all? “We add a version number to the name”? “We don’t change them”? And related to this:
  • How do you run different versions of your application in parallel (during a rolling release), if they require different versions of your stored procedures and triggers?
  • How do you track which ones are still needed? I bet the answer is: Not at all.

Of course, I’m trying to make two points here: First, if you’re stuck with a relational database, don’t use these features; you’ll make your life easier. Second, if your alternative database doesn’t have these features, don’t think of it as a negative. You’ll be forced to find better solutions right from the beginning.
Schema changes
Nobody likes schema changes. On larger tables, they may take tens of minutes or hours, during which the database is often at least partially unavailable. This can be a complete show-stopper for deployments during the day, making continuous deployment impossible. Worse yet, in my experience it’s not even possible to figure out beforehand how long a schema change will take; you can try it on a development system with a copy of the production database, and it may take a completely different amount of time than on the production servers, even with equal hardware, cache warmup etc.
Also, during rolling releases, you may have multiple versions of the application trying to set up a different schema, or one of the applications may fail if it doesn’t like the current version of the schema.
There are some tools, most notably pt-online-schema-change by Percona to work around this. But they drastically increase the complexity of a deployment, and I even don’t want to think of the number of ways in which this can go wrong.
Failover and High Availability
Many relational databases don’t have good options for replication and failover. One example is PostgreSQL, which currently only offers simple replication without any sort of failover or HA. There are some tools that work on top of PostgreSQL which try to provide this functionality, but they are complex to set up, even more complex to manage and can fail in an even greater number of ways – very far off from anything one could call “high availability”.
A notable exception, kudos to them, is MySQL/MariaDB with its Galera cluster feature. While its implementation is very complex and not without issues, it mostly works well and is more or less easy to set up and administer. If you must go for a relational database, go for MySQL/MariaDB, because you can easily switch to a Galera cluster later! (Shameless plug: we offer Galera as a managed service, and we’ve got it all ready for you if you’re interested.)
Normalization
So you’ve got some object in your application. This object is associated with a number of users, there are comments, and you can attach files to it. What do you do in a relational database? You create a table for that object, a table for the n:n relationship to your users, a table for the comments, a table for the file metadata (assume the file is stored elsewhere).
Now, you’ve got some heavy users of that feature. One of your objects has 50 comments, 140 user IDs attached to it and 15 files. Now your application needs to display this object.
The database needs to fetch one of your objects, 50 comments, 140 user IDs and 15 file metadata entries. That’s over 200 rows! Since the data was probably added over time, it’s not stored sequentially, hence typically your database server will have to fetch 200 random 4k blocks from your SSD, only to read a single one of your objects!
Wait, it gets worse. Let’s have a look at your indexes.
The n:n relationship between your object and your users contains #objects * #usersPerObject entries. Depending on how large usersPerObject is, the total number of rows in this n:n relationship can be huge, much bigger than the number of objects and the number of users.
You could, for instance, easily have 10 million users, 10 million objects and 200 million entries in this n:n relationship.
This n:n relationship probably needs two indexes. These indexes are going to be huge, possibly bigger than all your object and user data combined. They occupy a lot of RAM, and they are just overhead, not actual user data.
Wouldn’t it be nice to store your entire object in one place, without the need to join all those tables? All of it could fit in a few sequential 4k blocks!
Oh. And don’t get me started on maintaining consistency in a normalized database. Yes, I, too, have written tools that iterate over the database, trying to find orphans and broken foreign keys.
SQL
Your application has to assemble what you want into a single string, and the database server has to parse that string and figure out what you wanted in the first place. That’s just wasteful, never mind all the injection issues. Why don’t you tell the database server directly what you want, in a structured form?
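The structured form can be sketched without a database at all; here a tiny in-memory matcher stands in for the server (the operator names mirror MongoDB's query syntax, but the matcher itself is purely illustrative):

```python
# Instead of assembling "SELECT ... WHERE age >= 18 AND city = 'Zurich'"
# as a string, the query is a plain data structure:
query = {"age": {"$gte": 18}, "city": "Zurich"}

def matches(doc, query):
    """Minimal matcher for equality and $gte conditions,
    in the spirit of MongoDB's structured query documents."""
    for field, cond in query.items():
        if isinstance(cond, dict):
            # operator condition, e.g. {"$gte": 18}
            if "$gte" in cond and not doc.get(field, float("-inf")) >= cond["$gte"]:
                return False
        elif doc.get(field) != cond:
            # plain equality condition
            return False
    return True

people = [
    {"name": "Alice", "age": 30, "city": "Zurich"},
    {"name": "Bob", "age": 15, "city": "Zurich"},
    {"name": "Carol", "age": 40, "city": "Bern"},
]
print([p["name"] for p in people if matches(p, query)])  # ['Alice']
```

Because the query is data rather than a string, there is nothing to parse on the server and nothing to inject into.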

Ok, so relational databases are bad because of normalization and schema changes, some are bad in terms of HA, and I’m shooting myself in the foot if I’m using stored procedures. How can I do better?

Easy. Just store your object in one piece as JSON, without schema restrictions on the database.

  • No more wasteful n:n relationships
  • No more schema changes
  • Far fewer or even no more orphans in the database
  • Data much easier to navigate and understand

Now, even some relational database vendors have come to realize that this is the way to go, e.g. PostgreSQL has added such features in recent versions. But if you want it all, you’re probably better off with a database like MongoDB, which is built around these core concepts, and offers all the advantages.
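The embedded-document idea can be illustrated with plain Python dicts and the standard json module, no MongoDB driver required (the field names are made up for illustration):

```python
import json

# One self-contained document instead of four normalized tables:
# the object, its user IDs, its comments and its file metadata
# all live in one place and are fetched with a single read.
ticket = {
    "_id": 42,
    "title": "Migrate blog to containers",
    "users": [7, 19, 23],                      # was an n:n relationship table
    "comments": [
        {"user": 7, "text": "Looks good to me."},
        {"user": 19, "text": "Ship it!"},
    ],
    "files": [
        {"name": "diagram.png", "size": 10240},
    ],
}

# The whole object serializes (and is stored) as one contiguous blob ...
blob = json.dumps(ticket)

# ... and comes back in one piece, no joins required.
restored = json.loads(blob)
print(len(restored["comments"]))  # 2
```

Reading this object touches one document instead of the 200-odd rows and index entries sketched above.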
Stored procedures and triggers
Stored procedures are not offered by MongoDB. It can execute server-side JavaScript, but it doesn’t store the code for you (hence none of the issues of stored procedures).
MongoDB doesn’t have triggers. Because triggers are often only used to ensure consistency in a normalized schema and MongoDB doesn’t have that, they’re not really needed anyway.
Schema changes
No schema, no schema changes.
(Of course your application has a schema, otherwise it couldn’t interpret the objects read from database, but the database doesn’t care how it looks).
Failover and High Availability
MongoDB supports easy-to-use and powerful replica sets. There can only be one writeable primary, though. If you need multiple primaries for performance (which you probably won’t since MongoDB’s write performance is excellent), you can use sharding (also supported out of the box).
Normalization
Typically, you put your entities into one document, so there is no normalization going on. However, don’t think you can just store anything you want in the database. Just because the database doesn’t enforce a schema doesn’t mean that you don’t have to put in some effort to find a good data layout, which may include splitting up data into multiple collections.
SQL
As the well-known term “NoSQL” suggests, MongoDB doesn’t support SQL. Instead you tell the server what you want in a structured form. This is something that seems alien at first, but you’ll get used to it.

Ok that’s great, but what about transactions?

MongoDB doesn’t support transactions per se. But as long as you stay within one document, you can atomically (“at the same time”) change as many things as you want, which acts as a kind of replacement for transactions. This means, in turn, that if your application needs atomicity for some use cases, you must make sure that all the data involved in such an operation lives in a single document.
Does this sound like a problem to you? Think about this:
Relational databases have multiple levels of transaction isolation. You would need the highest level of isolation to ensure that nothing can go wrong, but performance is so poor with that kind of isolation that even banking software doesn’t use it.
The second problem with transactions is: they don’t do what you want. You don’t want a user’s action to fail if another user changes something concurrently. Instead, you want the two changes to be merged, without losing either user’s changes. Relational databases don’t offer any kind of semantics for that; it just sort of happens (or doesn’t), depending on the database layer implementation in your application. MongoDB offers update operations that don’t have this problem, because they only change selected parts of a document, and the change has very clear semantics, even in the presence of other, concurrent changes.
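The difference can be sketched without a server: two naive “write the whole document back” updates lose one user’s change, while operator-style partial updates (simulated here in plain Python, in the spirit of MongoDB’s $set/$push operators) merge cleanly:

```python
import copy

doc = {"title": "Draft", "comments": []}

# Naive approach: two users each read, modify and write back the full document.
a = copy.deepcopy(doc)
a["title"] = "Final"                # user A changes the title
b = copy.deepcopy(doc)
b["comments"].append("LGTM")        # user B adds a comment
doc = b                             # the second full-document write wins ...
assert doc["title"] == "Draft"      # ... and user A's title change is lost

# Operator-style partial updates only touch the named fields,
# so concurrent changes to different fields both survive.
doc = {"title": "Draft", "comments": []}

def apply_update(document, update):
    """Apply a minimal $set/$push-style update in place."""
    for field, value in update.get("$set", {}).items():
        document[field] = value
    for field, value in update.get("$push", {}).items():
        document.setdefault(field, []).append(value)

apply_update(doc, {"$set": {"title": "Final"}})     # user A
apply_update(doc, {"$push": {"comments": "LGTM"}})  # user B
print(doc)  # {'title': 'Final', 'comments': ['LGTM']}
```

In real MongoDB the operators are applied atomically on the server per document, which is what gives them their clear semantics under concurrency.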

Help! I’m all for MongoDB, but I need to convince my boss!

Put these in your PowerPoint presentation (remove the comments in parentheses):

  • Your application will have better performance, because there’s no need to normalize data (given a good implementation).
  • Your application will have better availability, because replica sets can be set up easily and work very well.
  • There is faster product development, because new features can be added more easily without the need for schema migrations.
  • There will be lower system administration cost, because MongoDB is easier to set up, tune and maintain than relational databases.

Are there any downsides?

There are.

  • MongoDB is in many ways very different from a relational database; among other things, you have to re-learn how to create a good schema and how queries and updates work. This is asking a lot from your development team.
  • A MongoDB document is a BSON (binary form of JSON with some extensions) blob. Each document repeats the keys, and that’s redundancy you don’t have with a relational database. This can be mitigated by choosing short key strings and using document level compression.
  • Server-side JavaScript execution was always kind of slow when I used it (which unfortunately includes the built-in map/reduce functionality). Things may have improved in the meantime, though. But if you stick to normal query/update operations and (if required) the aggregation framework, you’ll have great performance.
  • Replica Sets only have one writeable primary node and sharding is complex to set up. Won’t be an issue for most applications, though, because write performance of a single primary is really good.
  • There is no good PHPMyAdmin analog.
  • If you want to migrate an existing application to MongoDB, it will be a very painful and tedious process, as you’ll have to rewrite all of your database layer and take care of data migration. It can easily take you years to do a full migration. Don’t underestimate this.
  • While the original storage engine of MongoDB, mmapv1, was very robust, and its successor wiredTiger is faster and has better features, we’ve also seen cases in which a crashed wiredTiger database couldn’t recover on its own. This is not a problem in a replica set (you can just delete the data on the replica and let it re-sync), but it can be a problem on a single server. However, the MongoDB team is aware of these issues and we expect that they’ll get fixed.
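The key-repetition overhead mentioned above is easy to quantify with a sketch. This uses plain JSON as a stand-in for BSON, and the key names are invented for illustration; the real savings in MongoDB also depend on the document-level compression mentioned earlier.

```python
import json

def doc_size(key_a, key_b, n=1000):
    """Serialized size of n documents that all repeat the same two keys."""
    docs = [{key_a: i, key_b: "x"} for i in range(n)]
    return len(json.dumps(docs))

# Every document repeats its keys, so long key strings cost bytes per document.
long_keys = doc_size("customerFirstPurchaseTimestamp", "customerRegion")
short_keys = doc_size("fp", "r")
print(long_keys - short_keys)  # bytes saved across 1000 documents
```

With 41 characters saved per document, shortening the keys saves 41'000 bytes over 1'000 documents in this sketch, which is why short key strings are a common mitigation.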

david.gubler

David is a DevOps engineer and is fluent in Java and MongoDB.

Contact us

Our team of experts is available for you. In case of emergency also 24/7.

Contact us
Tech

Serverless Computing / Functions as a Service with APPUiO and Snafu

22. Sep 2017
What is Serverless, also known as FaaS (Function as a Service)?

The terms "Serverless" and "FaaS (Function as a Service)" have been appearing more and more often in articles recently. So what is this topic actually about? "FaaS" refers to the ability to run a program function "in the cloud". The developer stores the desired function (written in any programming language, as long as the provider supports it) in the FaaS service of their choice and invokes it, for example, via HTTP or a service bus. The user of the function does not have to worry about the execution environment, scaling, or the configuration details of an application server. This is where the term "Serverless" comes from: the function "just runs", without a server, so to speak. A function can be a simple "input-processing-output" function, perform complex calculations, or fetch, process, and store data from other external services.
Using FaaS makes particular sense for specialized functions that are used by several microservices. The use of functions in the cloud can also be justified economically: depending on the provider, you pay per individual function invocation. If the function is not used, no costs are incurred. This is a "true" cloud model, fully in the spirit of "pay per use".
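Such an "input-processing-output" function can be sketched as follows. This handler is hypothetical; real FaaS platforms (including Snafu) each define their own handler signature and event format.

```python
# Hypothetical FaaS handler: it receives an event, processes it, and
# returns a result. The platform takes care of runtime and scaling.
def handle(event):
    name = event.get("name", "world")
    return {"statusCode": 200, "body": f"Hello, {name}!"}

# Invoked by the platform, e.g. in response to an HTTP request:
print(handle({"name": "VSHN"}))  # {'statusCode': 200, 'body': 'Hello, VSHN!'}
```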

(more…)

Tobias Brunner

Tobias Brunner has been working in IT for over 20 years, more than 15 of them with Internet technology. New technology has to be tried and written about.

Tech

DKIM – A Building Block in the Fight Against Spam with Forged Sender Addresses

30. Aug 2017

DKIM (DomainKeys Identified Mail, RFC 6376) goes back to a development by Yahoo from 2004. It is a protocol for ensuring the authenticity of senders in e-mails. The standard thus serves to curb unwanted e-mails such as spam or phishing, most of which land in electronic mailboxes with a forged sender address.
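In practice, the signing mail server adds a DKIM-Signature header to each outgoing message; the receiving server fetches the matching public key via DNS (under `selector._domainkey.domain`) and verifies the signature. A schematic example of such a header (domain and selector are placeholders, the hash and signature values are shortened):

```
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=example.com; s=selector1;
        h=from:to:subject:date;
        bh=...; b=...
```

Here `d=` names the signing domain, `s=` the DNS selector, `h=` the signed header fields, `bh=` the body hash, and `b=` the signature itself.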

(more…)

andre.keller

Tech

Introduction to OpenShift on Exoscale

22. Aug 2017

“OpenShift is to Kubernetes similar to what a Linux distribution is to the kernel.”
The world is talking about the Kubernetes project – but have you heard about OpenShift? It's an open source product based on the open source projects Kubernetes and Docker, plus a container builder/registry and a Web GUI to manage it all. This blog post will introduce you to OpenShift, give some hints on why to use it and how to get started, and show where you can get professional support and managed services.

What is OpenShift and why should you use it?

It describes itself as “the industry’s most secure and comprehensive enterprise-grade container platform based on industry standards, Docker and Kubernetes”. It’s much more than that – it gives you a complete Kubernetes cluster with many cool features: integrated build pipelines, Docker Registry, Application router (for getting traffic into the cluster), security based on RBAC and SELinux, Web Console for easy access, central logging of all Pod output, Metrics to measure Pod performance, Installation and upgrade using Ansible Playbooks, Source-to-Image builds, and much much more.
Just as a Linux distribution relates to the Linux kernel, OpenShift is a Kubernetes distribution with all the tools and tricks needed to make full use of it.
OpenShift comes in two flavors:

  • OpenShift Container Platform: Software product to install in your data center and get support by Red Hat.
  • OpenShift Origin: The open source upstream project with a very active GitHub repository.

OpenShift enables you to develop faster – after you commit your changes to Git, it takes care of container image builds, storage, deployment, scaling, monitoring, and logging, so you don't have to. The integrated build and deployment processes help you get the developed application to the customer as fast as possible. It enables you to deploy hourly or even faster, and to scale computing resources per project automatically with your user base.

How to get started?

There are many ways to get started; here are a few hints and examples:

  • Install your own OpenShift cluster for example on Exoscale with the official Ansible Playbooks. By using these playbooks you learn to customize every inch of the installation and configuration, and they also help you upgrade from one version to another. Documentation about these playbooks can be found inside the Git repository or on the documentation page.
  • Start a local OpenShift cluster on your workstation with Minishift (based on Minikube) or with the fancy command oc cluster up. Just download the client binary from the GitHub releases page, unpack it, and then run the oc cluster up command. This will launch a complete OpenShift instance on your local Docker Engine:


% oc cluster up
Starting OpenShift using openshift/origin:v3.6.0 ...
Pulling image openshift/origin:v3.6.0
Pulled 1/4 layers, 28% complete
Pulled 2/4 layers, 83% complete
Pulled 3/4 layers, 88% complete
Pulled 4/4 layers, 100% complete
Extracting
Image pull complete
OpenShift server started.
The server is accessible via web console at:
You are logged in as:
    User:     developer
    Password: <any value>
To login as administrator:
    oc login -u system:admin
% oc new-app https://github.com/appuio/example-php-sti-helloworld.git
[...]
% oc expose svc example-php-sti-helloworld
[...]
% curl -s http://example-php-sti-helloworld-myproject.127.0.0.1.nip.io/ | grep title
    <title>APPUiO PHP Demo</title>
  • Have a look at the APPUiO Techlabs on GitHub which is a free step-by-step introduction to get started. We offer free half-day workshops.
  • The APPUiO Microservices Example documentation gives some insight for developers on how a Microservice application can be built and deployed on OpenShift, describing tools like Gitlab CI and Jenkins for the build pipelines.

There is a lot of documentation available from upstream. It’s a great source to read about every little detail. You’ll find documentation for both the OpenShift Container Platform and OpenShift Origin. APPUiO also provides a community-driven documentation.

About APPUiO

APPUiO – the Swiss Container Platform – is a managed OpenShift service by Puzzle and VSHN. Your OpenShift platform is managed on any cloud you wish – especially on Exoscale.
With more than two years of experience with OpenShift v3, we're the leading provider in Switzerland with deep knowledge of running and operating OpenShift. We're not only managing dozens of private OpenShift clusters but also a public shared platform.
Running the OpenShift platform continuously and reliably is not easy – it has a lot of moving parts making the lives of developers easier. That’s why we’ve engineered more than 120 cluster checks and 50 checks per server for each APPUiO cluster to ensure proper functionality. This also includes regular end-to-end tests which are simulating the user interaction with the cluster – from building to deploying and accessing an application. We also share a lot of scripts, tools, and documentation on GitHub under the APPUiO organization. Talk to us in the APPUiO Community Chat.
Check out the Exoscale integrations to find out which tools Exoscale is supporting – and come talk to APPUiO about trying out OpenShift for free today!
(Original blog post published at www.exoscale.ch)

Tobias Brunner

Tobias Brunner has been working in IT for over 20 years, more than 15 of them with Internet technology. New technology has to be tried and written about.
