SOS level102 Containerization and Orchestration (#111)

* SOS level102 Containerization and Orchestration

* Typos and links fixed. Minor rephrasing work done

Co-authored-by: rajalakshmi Vaidyanathan <ravaidya@ravaidya-ld1.linkedin.biz>

@ -0,0 +1,5 @@
# Conclusion
In this sub-module we have toured the world of containers: why we use containers, how containers evolved from the virtual machine past (though VMs are by no means obsolete) and how they differ from virtual machines. We then saw how containers are implemented, with emphasis on cgroups and namespaces, along with some hands-on exercises. Finally, we concluded our journey with container orchestration, where we learnt a bit of Kubernetes with some practical examples.
We hope this module gives you enough knowledge and interest to continue learning and applying these technologies in greater depth!

@ -0,0 +1,100 @@
## Introduction
Docker has gained huge popularity among container engines since it was released to the public in 2013. Here are some of the reasons why Docker is so popular:
- _Improved portability_
Docker containers can be shipped and run across environments, be it a local machine, on-prem or cloud instances, in the form of Docker images. Compared to Docker containers, LXC containers depend more on machine-specific configuration, which makes them less portable.
- _Lighter weight_
Docker images are lightweight compared to VM images. For example, an Ubuntu 18.04 VM image is about 3GB, whereas the corresponding Docker image is only 45MB!
- _Versioning of container images_
Docker supports maintaining multiple versions of images, which makes it easier to look up the history of an image and even roll back.
- _Reuse of images_
Since Docker images are in the form of layers, one image can be used as a base on top of which new images are built. For example, [Alpine](https://hub.docker.com/_/alpine) is a lightweight image (5MB) which is commonly used as a base image. Docker layers are managed using [storage drivers](https://docs.docker.com/storage/storagedriver/); see the short example after this list.
- _Community support_
Docker Hub is a container registry where anyone with a login can upload or download container images. Docker images of popular OS distros are regularly updated on Docker Hub and receive large community support.
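For instance, you can see the layers an image is built from with `docker history` (a quick sketch; the exact output varies by image version):

```
docker pull alpine      # fetch the lightweight base image from Docker Hub
docker history alpine   # list the layers the image is composed of
```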
Let's look at some terms which come up during our discussion of Docker.
## Docker terminology
- _Docker images_
A Docker image contains the executable version of the application along with the dependencies (config files, libraries, binaries) required for the application to run as a standalone container. It can be understood as a snapshot of a container.
Docker images are present as layers on top of a base layer. It is these layers that are versioned. The most recent version of a layer is the one used on top of the base image.
`docker image ls` lists the images present in the host machine.
- _Docker containers_
A Docker container is the running instance of a Docker image. While images are static, containers created from them can be run and interacted with. This is actually the "container" from the previous sections of the module.
`docker run` is the command used to instantiate containers from images.
`docker ps` lists docker containers currently running in the host machine.
- _Dockerfile_
It is a plain text file of instructions from which an image is assembled by the Docker engine (the daemon, to be precise). It contains information such as the base image to start from and the ENV variables to be injected; a minimal example follows this list.
`docker build` is used to build images from a Dockerfile.
- _Docker hub_
It is Docker's official container registry of images. Any user with a Docker login can upload custom images to Docker Hub using `docker push` and fetch images using `docker pull`.
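To tie these terms together, here is a minimal, hypothetical Dockerfile; the application file `app.py` and the tag `myapp:1.0` are made-up names for illustration:

```
# Start from the lightweight Alpine base image (the base layer)
FROM alpine:3.14
# Install the Python runtime; this adds a new layer
RUN apk add --no-cache python3
# Inject an environment variable into the image
ENV APP_ENV=production
# Copy the (hypothetical) application code into the image
COPY app.py /app/app.py
# Default command executed when a container starts from this image
CMD ["python3", "/app/app.py"]
```

Building and running it would then look like:

```
docker build -t myapp:1.0 .   # the daemon assembles the image from the Dockerfile
docker run myapp:1.0          # instantiate a container from the image
```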
Now that we know the basic terminology, let's look at how the Docker engine works: how CLI commands are interpreted and the container life-cycle is managed.
## Components of Docker engine
Let's start with a diagram of the Docker Engine to understand it better:
![Docker Engine Architecture](images/dockerengine.png)
The docker engine follows a client-server architecture. It consists of 3 components:
- _Docker client_
This is the component the user directly interacts with. When you execute the docker commands we saw earlier (push, pull, container ls, image ls), you are actually using the Docker client. A single Docker client can communicate with multiple Docker daemons.
- _REST API_
Provides an interface for the docker client and daemon to communicate.
- _Docker Daemon (server)_
This is the main component of the Docker engine. It builds images from a Dockerfile, fetches images from the Docker registry, pushes images to the registry, starts and stops containers etc. It also manages networking between containers.
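To see this client-server split in action, you can bypass the Docker client and query the daemon's REST API directly over its default unix socket (assuming a standard local installation; you may need root permissions to access the socket):

```
# Ask the daemon for its version through the Engine REST API,
# without going through the docker CLI client
curl --unix-socket /var/run/docker.sock http://localhost/version
```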
## LAB
The official [docker github](https://github.com/docker/labs) provides labs at several levels for learning Docker. We're linking a few labs which we found great for people starting from scratch. Please follow them in this order:
1. [Setting up local environment for the labs](https://github.com/docker/labs/blob/master/beginner/chapters/setup.md)
2. [Basics for using docker CLI](https://github.com/docker/labs/blob/master/beginner/chapters/alpine.md)
3. [Creating and containerizing a basic Flask app](https://github.com/docker/labs/blob/master/beginner/chapters/webapps.md)
Here is another [beginner level lab](https://www.katacoda.com/courses/docker/2) from Katacoda for dockerizing a Node.js application. You don't even need a local setup for this and it's easy to follow along.
## Advanced features of Docker
While we have covered the basics of containerization and how a standalone application can be dockerized, processes in the real world need to communicate with each other. This need is particularly prevalent in applications which follow a microservice architecture.
**Docker networks**
Docker networks facilitate interaction between containers running on the same host or even on different hosts. The `docker network` command provides options that specify how a container interacts with the host and with other containers. Some important types of networks supported by Docker are:
- `host`: allows sharing of the network stack with the host
- `bridge`: allows communication between containers running on the same host, but not external to the host
- `overlay`: facilitates interaction between containers across hosts attached to the same network
- `macvlan`: assigns a separate MAC address to a container, which is useful for legacy applications

A detailed treatment is outside the scope of this module. The official documentation on [docker networks](https://docs.docker.com/network/) itself is a good place to start, and a quick sketch follows.
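Here is a quick sketch of two containers talking over a user-defined bridge network; the names `mynet` and `web` are examples:

```
docker network create mynet                            # create a user-defined bridge network
docker run -d --name web --network mynet nginx         # attach an nginx container to it
docker run --rm --network mynet alpine ping -c 1 web   # a second container resolves "web" by name
```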
**Volumes**
Apart from images, containers and networks, Docker also provides the option to create and mount volumes within containers. Generally, data within docker containers is non-persistent, i.e. once you kill the container the data is lost. Volumes are used for storing persistent data in containers. This [Katacoda lab](https://www.katacoda.com/courses/docker/persisting-data-using-volumes) is a great place to start playing with volumes, and a minimal sketch follows.
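A minimal sketch of how a named volume outlives the containers that use it (the volume name `app_data` is an example):

```
docker volume create app_data                                            # create a named volume
docker run --rm -v app_data:/data alpine sh -c 'echo hello > /data/msg'  # write to the volume; container exits
docker run --rm -v app_data:/data alpine cat /data/msg                   # a fresh container still reads "hello"
```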
[In the next section](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/orchestration_with_kubernetes/) we see how container deployments are orchestrated with Kubernetes.


@ -0,0 +1,51 @@
# Containers and orchestration
## Introduction
Containers, Docker and Kubernetes are "cool" terms that are being spoken of by everyone involved with software in some way. Let's dive into each of these pieces of technology at enough depth to understand what the whole deal is about!
In this module we talk about the ins and outs of containers: how they are implemented and used, how to containerize your application and finally, how to deploy containerized applications at a large scale without losing your sleep. We'll also get our hands dirty by trying out a few lab exercises.
### Prerequisites
- Basic knowledge of linux will be helpful in understanding the internals of containers
- Basic knowledge of shell commands (will come in handy when we're containerizing applications)
- Knowledge of running a basic web application. You can go through our [Python And Web module](https://linkedin.github.io/school-of-sre/level101/python_web/intro/) to gain familiarity with this.
## What to expect from this course
This module is divided into 3 sub-modules. In the first sub-module, we will cover the internals of containers and why they are used.
The second sub-module introduces Docker, a popular container engine and contains lab exercises on dockerizing a basic webapp.
The last sub-module talks about container orchestration with Kubernetes, with some lab exercises that show how it makes the lives of SREs easy.
## What is not covered under this course
We will not cover advanced docker and kubernetes concepts. However, we will be pointing you to links and references from where you can pick them up as per your interest.
## Course Contents
The following topics are covered in this course:
- [Introduction to containers](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/intro_to_containers/)
- [What are containers](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/intro_to_containers/#what-are-containers)
- [Why containers](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/intro_to_containers/#why-containers)
- [Difference between virtual machines and containers](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/intro_to_containers/#difference-between-virtual-machines-and-containers)
- [How are containers implemented](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/intro_to_containers/#how-are-containers-implemented)
- [Namespaces](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/intro_to_containers/#namespaces)
- [Cgroups](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/intro_to_containers/#cgroups)
- [Container engines](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/intro_to_containers/#container-engine)
- [Containerization with Docker](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/containerization_with_docker/)
- [Introduction](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/containerization_with_docker/#introduction)
- [Basic docker terminology](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/containerization_with_docker/#docker-terminology)
- [Components of Docker engine](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/containerization_with_docker/#components-of-docker-engine)
- [Hands-on](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/containerization_with_docker/#lab)
- [Introduction to Advanced Docker](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/containerization_with_docker/#advanced-features-of-docker)
- [Container orchestration with Kubernetes](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/orchestration_with_kubernetes/)
- [Introduction](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/orchestration_with_kubernetes/#introduction)
- [Motivation to use Kubernetes](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/orchestration_with_kubernetes/#motivation-to-use-kubernetes)
- [Kubernetes Architecture](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/orchestration_with_kubernetes/#architecture-of-kubernetes)
- [Hands-on](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/orchestration_with_kubernetes/#lab)
- [Introduction to Advanced Kubernetes concepts](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/orchestration_with_kubernetes/#advanced-topics)
- [Conclusion](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/conclusion/)

@ -0,0 +1,198 @@
## What are containers
Here's a definition of containers according to [Docker](https://www.docker.com/resources/what-container), a popular containerization engine:
> A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another.
Let's break this down. A container is your code bundled along with its entire runtime environment. That includes your system libraries, binaries and config files needed for your application to run.
## Why containers
You might wonder why we need to pack your application along with its dependencies. This is where the second part of the definition comes,
> ...so the application runs quickly and reliably from one computing environment to another.
Developers usually write code in their dev environment (or local machine), test it in one or two staging/test environments before pushing their code into production. Ideally, for reliably testing applications before pushing to production, we need all these environments to be uniform to a tee (underlying OS, system libraries etc).
Of course, the ideal is hard to achieve especially when we're using a mix of on-prem (complete control) and cloud infrastructure providers (more restrictive in terms of control of hardware and security options), a scenario which is more common today.
This is exactly why we need to package not only the code but also the dependencies; so that your application runs reliably irrespective of which infrastructure or environment it runs on.
We can run several containers on a single host. Due to how containers are implemented, each container has its own isolated environment within the same host. This means that a monolithic application can be broken down into micro-services and packaged into containers. Each microservice then runs on the host machine in its own isolated environment. This is another reason why containers are used: _separation of concerns_.
These isolated environments ensure that the failure of an application in one container does not affect applications in other containers. This is called _fault isolation_. Isolation also gives the added benefit of increased security due to the restricted visibility of processes in a container.
Due to how most of the containerization solutions are implemented, we also have the option to cap the amount of resources consumed by applications running within a container. This is called _resource limiting_. We will discuss this feature in more detail in the section on cgroups.
## Difference between virtual machines and containers
Let's digress a little and go into some history. In the previous section we talked about how containers help us in achieving separation of concerns. Before the widespread usage of containers, virtualization was used for running applications in isolated environments on the same host (it's still being used today in some cases).
In plain terms, virtualization is where we package software along with a copy of the OS on which it runs. This package is called a virtual machine (VM). The image of the OS bundled in the VM is called the Guest OS. A component called the Hypervisor sits between the Guest and the Host OS and is responsible for giving the Guest OS access to the underlying hardware. You can learn more about hypervisors [here](https://searchservervirtualization.techtarget.com/definition/bare-metal-hypervisor).
![Virtual Machine Architecture](images/VM.png)
Similar to how multiple containers can be run on a single host machine, multiple VMs can be run on a single host and in this way, it's possible to run applications (or each microservice) in a separate VM and achieve separation of concerns.
The main focus here is on the size of VMs and containers. VMs come along with a copy of the guest operating system and are therefore heavyweight compared to containers. If you're more interested in a comparison of VMs and containers, you can check these articles from [Backblaze](https://www.backblaze.com/blog/vm-vs-containers/) and [NetApp](https://blog.netapp.com/blogs/containers-vs-vms/).
While it is possible to run an operating system on a host with an incompatible kernel using hypervisors (e.g. a Windows 10 VM on CentOS 7), in cases where kernels can be shared (e.g. Ubuntu on CentOS 7) containers are preferred over VMs due to the size factor. Sharing kernels, as you will see later, also gives containers many performance benefits over VMs, like quicker boot-ups. Let's look at a diagram of how containers work.
![Containers Architecture](images/Containers.png)
Comparing the two diagrams, we notice two things:
- Containers do not have a separate (guest) OS
- The container engine is the intermediary between containers and the Host OS. It is used to facilitate the life-cycle of a container on the Host OS (it is not a necessity, however).
The next section explains in detail how containers share the same operating system (kernel, to be precise) as the host machine and yet provide isolated environments for applications to run.
## How are containers implemented
We've talked about how containers, unlike virtual machines, share the same kernel as the host operating system and provide isolated environments for applications to run. This is achieved without the overhead of running a guest operating system on the host OS, thanks to two features of the linux kernel called cgroups and namespaces.
Now that we are touching upon the internals of containers, it would be appropriate to give a more technically accurate representation of what they are. A container is a linux process or a group of linux processes which is restricted in
- **visibility** into processes outside the container (implemented using namespaces)
- **quantity of resources** it can use (implemented using cgroups) and
- **system calls** that can be made from the container. Refer [seccomp](https://docs.docker.com/engine/security/seccomp/), if interested in knowing more.
These restrictions are what make a containerized application remain isolated from other processes running in the same host.
Now let's talk about namespaces and cgroups in a little more detail.
## Namespaces
The visibility of processes inside a container should be restricted to the container itself. This is what linux namespaces do. The idea is that processes within a namespace can't affect those which they can't "see". Processes sharing a single namespace have identities, services and/or interfaces unique to the namespace they exist in. Here's a list of namespaces in linux:
- *Mount*
Process groups sharing a mount namespace share a separate, private set of mount points and file system views. Any modifications made to these namespaced mount points are not visible outside the namespace. For example, it is possible to have a /var within a mount namespace which is different from the /var of the host.
- *PID*
Processes in a pid namespace have process ids which are unique only within that namespace. A process can be the root process (pid 1) of its own pid namespace and have an entire tree of processes under it.
- *Network*
Each network namespace will have its own network device instances that can be configured with individual network addresses. Processes in the same network namespace can have their own ports and route tables.
- *User*
User namespaces can have their own users and group ids. Its possible for a process using a non-privileged user in the host machine to have a root user identity within a user namespace.
- *Cgroup*
Allows creation of cgroups which can be used only within the cgroup namespace. Cgroups will be covered in more detail in the following section.
- *UTS*
This namespace has its own hostname and domain name.
- *IPC*
Each IPC namespace has its own System V and POSIX message queues.
As complex as it seems, creating namespaces in linux is quite simple. Let's see a quick demo to create a PID namespace. You'll need a linux-based OS with sudoers permission to follow along.
### DEMO: namespaces
* First we check which processes are running in the host system (output varies from system to system). Note the process with pid 1.
![Fig 1](images/ns1.png)
* Let's create a PID namespace with the `unshare` command and launch a bash process in the namespace.
![Fig 2](images/ns2.png)
You can see that `ps aux` (itself a process launched in the newly created PID namespace) can only see processes within its own namespace. Hence, the output shows **only 2 processes** running within the namespace. Also note that the root process (pid 1) in the namespace is not init but the bash shell which we specified while creating the namespace.
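For reference, the commands behind this step look roughly like this (using `unshare` from util-linux):

```
# Create a new PID namespace: --fork forks before exec so bash becomes pid 1,
# --mount-proc remounts /proc so that ps only sees namespaced processes
sudo unshare --pid --fork --mount-proc bash
ps aux   # inside the namespace: only bash and ps itself are visible
```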
* Let's create another process in the same namespace which sleeps for 1000 seconds in the background. In our case the pid of the sleep process is 44 **within the PID namespace**.
![Fig 3](images/ns3.png)
![Fig 4](images/ns4.png)
* On a separate terminal, check for the process id of the sleep process as seen from the host.
![Fig 5](images/ns5.png)
Note the difference in pid (23844 in the host and 44 within the namespace) though both refer to the same process (the start time and all other attributes are the same).
It's also possible to nest namespaces, i.e. create a pid namespace from another pid namespace.
Try out `sudo nsenter -t 23844 --pid -r bash` to re-enter the namespace and create another pid namespace within it. It should be fun to do!
## Cgroups
A cgroup can be defined as a set of processes whose usage of resources is metered and monitored. The resources can be memory pages, disk i/o, CPU etc. In fact, cgroups are classified based on which resource the limit is imposed on and the nature of the action taken when a limit is violated.
The component in a cgroup which tracks resource utilization and controls the behaviour of the processes in the cgroup is called a resource subsystem or resource controller.
The following is the set of resource controllers and their functions according to RHEL's [introduction to cgroups](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/resource_management_guide/ch01):
* _blkio_ — this subsystem sets limits on input/output access to and from block devices such as physical drives (disk, solid state, or USB).
* _cpu_ — this subsystem uses the scheduler to provide cgroup processes access to the CPU.
* _cpuacct_ — this subsystem generates automatic reports on CPU resources used by processes in a cgroup.
* _cpuset_ — this subsystem assigns individual CPUs (on a multicore system) and memory nodes to processes in a cgroup.
* _devices_ — this subsystem allows or denies access to devices by processes in a cgroup.
* _freezer_ — this subsystem suspends or resumes processes in a cgroup.
* _memory_ — this subsystem sets limits on memory use by processes in a cgroup and generates automatic reports on memory resources used by those processes.
Cgroups follow a hierarchical, tree-like structure for each resource controller, i.e. one hierarchy exists per controller. Each cgroup in a hierarchy inherits certain attributes (e.g. limits) from its parent cgroup.
Let's try out a quick demo with memory cgroups to wrap our heads around the above ideas. You'll need a linux-based OS (here, RedHat) with sudo permission to follow along.
### DEMO: cgroups
* Let's start by checking if the cgroup tools are installed on your machine. Execute `mount | grep "^cgroup"`. If you have the tools installed you'll see an output like this:
![Fig 1](images/cg1.png)
If not, install the tools with `sudo yum install libcgroup-tools -y`.
* Now, we create a memory cgroup called mem_group with "root" as the owner of the cgroup. The command executed is `sudo cgcreate -a root -g memory:mem_group`. Verify that the cgroup is created.
![Fig 2](images/cg2.png)
`/sys/fs/cgroup/<cgroup type>` is the pseudo filesystem where a newly created cgroup is added as a sub-group.
* The memory cgroup puts a limit on the memory usage of processes in the cgroup. Let's see what the limits are for mem_group. The file for checking the memory limit is memory.limit_in_bytes ([more information here](https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt), if you're interested).
![Fig 3](images/cg3.png)
* Note that mem_group has inherited the limit from its parent cgroup
![Fig 4](images/cg4.png)
* Now, let's reduce the memory usage limit to 20KB for the purpose of our demo (the actual limit is rounded off to the nearest power of 2).
![Fig 5](images/cg5.png)
This limit is too low and hence most of the processes attached to mem_group should be OOM killed.
* Create a new shell and attach it to the cgroup. We need sudo permissions for this.
![Fig 6](images/cg6.png)
The process is OOM killed as expected. You can confirm the same with dmesg logs (mm_fault_error).
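Putting the demo together, the commands look roughly like this (cgroup v1 paths assumed; `mem_group` is just a demo name):

```
sudo yum install libcgroup-tools -y                         # install the cgroup tools (RHEL/CentOS)
sudo cgcreate -a root -g memory:mem_group                   # create the memory cgroup
cat /sys/fs/cgroup/memory/mem_group/memory.limit_in_bytes   # limit inherited from the parent cgroup
echo 20K | sudo tee /sys/fs/cgroup/memory/mem_group/memory.limit_in_bytes   # lower the limit for the demo
sudo cgexec -g memory:mem_group bash                        # run a shell inside the cgroup; expect an OOM kill
```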
If you want to try out a more in-depth exercise on cgroups, check out [this tutorial from Geeks for Geeks](https://www.geeksforgeeks.org/linux-virtualization-resource-throttling-using-cgroups/).
Let's come back to containers. Containers share the same kernel as the underlying host operating system and provide an isolated environment for the application within. Cgroups help in managing the resources used by processes within a container, and namespaces help isolate the network stack, pids, users, group ids and mount points of one container from another container running on the same host.
Of course, there are more components to containers which truly make them fully functional, but that discussion is out of the scope of this module.
## Container engine
Container engines ease the process of creating and managing containers in a host machine. How?
* The container creation workflow typically begins with a container image. A container image is a packaged, portable version of the target application bundled with all dependencies for it to run.
* These container images are either available on the host machine (container host) from previous builds or need to be pulled from a remote repository of images. Sometimes the container engine might need to build the container image from a set of instructions.
* Finally once the container image is fetched/built, the container engine unpacks the image and creates an isolated environment for the application as per the image specifications.
* The files in the container image are then mounted to the isolated environment to get the application up and running within the container.
There are several container engines available, like Docker, rkt and LXC (one of the first container engines), which require different image formats (Docker, LXD). OCI (Open Container Initiative) is a collaborative project started by Docker that aims to standardize container runtime specifications and image formats across vendors. The OCI [FAQ section](https://opencontainers.org/faq/) is a good place to start if you're curious about this project.
We will focus on Docker in the [next section](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/containerization_with_docker/).

@ -0,0 +1,231 @@
## Introduction
Now we finally arrive at the most awaited part: running and managing containers at scale. So far, we have seen how Docker facilitates managing the life-cycle of containers and provides improved portability of applications. Docker does provide a solution for easing the deployment of containers on a large scale (you can check out Docker Swarm, if interested) which integrates well with Docker containers. However, Kubernetes has become the de-facto tool for orchestrating the management of microservices (as containers) in large distributed environments.
Let's see the points of interest for us, SREs, in using container orchestration tools, and Kubernetes in particular.
## Motivation to use Kubernetes
- _Ease of usage_
Though there is a steep learning curve associated with Kubernetes, once learnt, it can be used as a one-stop tool to manage your microservices. With a single command it is possible to deploy full-fledged production-ready environments. The desired state of an application needs to be recorded as a YAML manifest and Kubernetes manages the application for you.
- _Ensure optimum usage of resources_
We can specify limits on the resources used by each container in a deployment. We can also specify our choice of nodes where Kubernetes is allowed to schedule pods (e.g. microservices with high CPU consumption can be deployed on high-compute nodes).
- _Fault tolerance_
Self-healing is built into basic resource types of Kubernetes. This removes the headache of designing a fault tolerant application system from scratch. This applies especially to stateless applications.
- _Infrastructure agnostic_
Kubernetes does not have vendor lock-in. It can be set up in multiple cloud environments or in on-prem data centers.
- _Strong community support and documentation_
Kubernetes is open-source and many technologies like operators, service mesh etc. have been built by the community to manage and monitor Kubernetes-orchestrated applications better.
- _Extensible and customisable_
We can build our custom resource definitions which fit our use case for managing applications and use Kubernetes to manage them (with custom controllers).
You can check out [this article](https://hackernoon.com/why-and-when-you-should-use-kubernetes-8b50915d97d8) if you are more interested in this topic.
## Architecture of Kubernetes
Heres a diagram (from [the official Kubernetes documentation](https://kubernetes.io/docs/concepts/overview/components/)) containing different components which make Kubernetes work:
![Kubernetes Architecture](images/kubernetes.png)
Kubernetes components can be divided into two parts: [control plane components](https://kubernetes.io/docs/concepts/overview/components/#control-plane-components) and [data plane components](https://kubernetes.io/docs/concepts/overview/components/#node-components).
A Kubernetes cluster consists of 1 or more host machines (called nodes) where the containers managed by Kubernetes are run. This constitutes the data plane (or node plane).
The brain of Kubernetes, which responds to events from the node plane (e.g. create a pod, replicas mismatch) and does the main orchestration, is called the control plane. All control plane components are typically installed on a master node. This master node does not run any user containers.
The Kubernetes components themselves run as containers wrapped in Pods (the most basic Kubernetes resource object).
- Control plane components:
- kube-apiserver
- etcd
- kube-scheduler
- kube-controller-manager
- Node plane components
- kubelet
- kube-proxy
This workflow might help you understand the working of the components better:
- An SRE installs `kubectl` in their local machine. This is the client which interacts with the Kubernetes control plane (and hence the cluster).
- They create a YAML file, called a manifest, which specifies the desired state of the resource (e.g. a deployment named "frontend" needs 3 pods to always be running).
- When they issue a command to create objects based on the YAML file, the kubectl CLI tool sends a REST API request to the `kube-apiserver`.
- If the manifest is valid, it is stored as key-value pairs in the `etcd` server on the control plane.
- `kube-scheduler` chooses which nodes to put the containers on (basically, it schedules them).
- There are controller processes (managed by `kube-controller-manager`) which make sure the current state of the cluster is equivalent to the desired state (here, 3 pods are indeed running in the cluster -> all is fine).
- On the node plane side, `kubelet` makes sure that pods are locally kept in a running state.
## LAB
### Prerequisites
The best way to start this exercise is to use a [Katacoda kubernetes playground](https://www.katacoda.com/courses/kubernetes/playground). A single node kubernetes cluster is already set up for you here for quick experimentation. You can also use this to play with docker.
The environment gets torn down after 10 minutes, so make sure you save your files if you want to resume. For persistent kubernetes clusters, you can set one up either on your local machine (using [minikube](https://minikube.sigs.k8s.io/docs/start/)) or you can create a [kubernetes cluster in Azure](https://docs.microsoft.com/en-us/azure/aks/kubernetes-walkthrough-portal), GCP or any other cloud provider.
Knowledge of YAML is nice to have for understanding the manifest files.
### Hands-on
#### Lab 1:
We are going to create an object called a Pod, which is the most basic unit for running a container in Kubernetes. Here, we will create a pod called "nginx-pod" which contains an nginx container called "web". We will also expose port 80 in the container so that we can interact with the nginx container.
Save the below manifest in a file called _nginx-pod.yaml_
``` yaml
apiVersion: v1 #[1]
kind: Pod #[2]
metadata: #[3]
name: nginx-pod #[4]
labels: #[5]
app: nginx
spec: #[6]
containers: #[7]
- name: web #[8]
image: nginx #[9]
ports: #[10]
- name: web #[11]
containerPort: 80 #[12]
protocol: TCP #[13]
```
Let's very briefly understand what's in the manifest:
- `#[2]` - kind: The "kind" of object that's being created. Here it is a Pod.
- `#[1]` - apiVersion: The apiVersion of the "Pod" resource. There could be minor changes in the values or keys in the yaml file if the version varies.
- `#[3]` - metadata: The metadata section of the file, where the pod's labels and name are given.
- `#[6]` - spec: This is the main part where the things inside the pod are defined
These are not random key-value pairs! They have to be interpretable by the kube-apiserver. You can check which key-value pairs are optional/mandatory using the `kubectl explain pod` command. Do try it out!
* Apply the manifest using the command `kubectl apply -f nginx-pod.yaml`. This creates the “nginx-pod” pod in the kubernetes cluster.
![](images/kube1.png)
* Verify that the pod is in running state using `kubectl get pod`.
![](images/kube2.png)
It shows that nginx-pod is in Running state. 1/1 indicates that 1 out of 1 container(s) inside the pod is healthy.
* To check that the container running in "nginx-pod" is indeed "web", we run the `kubectl describe pod/nginx-pod` command. This gives a lengthy output with a detailed description of the pod and the events that have happened since the pod was created. This command is very useful for debugging. The part we are concerned with here is this:
![](images/kube3.png)
You can see “web” under the Containers section with Image as nginx. This is what we are looking for.
* How do we access the welcome page of nginx “web” container? In the describe command you can see the IP address of the pod. Each pod is assigned an IP address on creation.
![](images/kube4.png)
Here, this is 10.244.1.3
* Issue a curl request from the host `curl 10.244.1.3:80`. You will get the welcome page!
* Let's say we want to use a specific tag of the nginx image (say 1.20.1) in the same pod, i.e. we want to modify some property of the pod. You can try editing nginx-pod.yaml (image: nginx:1.20.1 in `#[9]`) and reapplying it (step 2). This will create a new container in the same pod with the new image.
A new container is created within the pod, but the pod itself is the same. You can verify this by checking the pod start time in the describe command. It would show a much older time.
You can actually see the nginx container by doing `docker ps` on the node01 terminal (if you're using Katacoda).
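As an aside, the image field of a running pod can also be patched without editing the file; this is an alternative to reapplying the manifest, not what the lab prescribes:

```
kubectl set image pod/nginx-pod web=nginx:1.20.1   # swap the "web" container's image in place
kubectl describe pod/nginx-pod | grep -i image     # confirm that the tag changed
```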
What if we want to change the image to 1.20.1 for 1000 nginx pods? Stepping back a little, what if we want to create 1000 nginx pods in the first place? Of course, we could write a script, but Kubernetes already offers a resource type called "Deployment" to manage large-scale deployments better.
---
#### Lab 2:
Well go a step further to see how we can create more than a single instance of the nginx pod at the same time.
* We will first save the below manifest in a file called _nginx-deploy.yaml_
``` yaml
apiVersion: apps/v1
kind: Deployment #[1]
metadata:
name: nginx-deployment
labels:
app: nginx
spec:
replicas: 3 #[2]
selector:
matchLabels:
app: nginx #[3]
template: #[4]
metadata:
labels:
app: nginx #[5]
spec:
containers:
- name: web
image: nginx
ports:
- name: web
containerPort: 80
protocol: "TCP"
```
You can see that it is similar to the pod definition up to the spec section (`#[1]` has Deployment as the kind, and the apiVersion is also different).
Another interesting observation is that the metadata and spec parts under `#[4]` are almost the same as the _metadata_ and _spec_ sections of the Pod definition in Lab 1 (do go up and cross-check this). What this implies is that we are deploying 3 nginx pods similar to the one in Lab 1.
Also, the labels in matchLabels should be the same as the labels under `#[4]`.
* Now apply the manifest using `kubectl apply -f nginx-deploy.yaml`
![](images/kube5.png)
Verify that 3 pods are indeed created.
![](images/kube6.png)
If youre curious, check the output of `kubectl get deploy` and `kubectl describe deploy nginx-deployment`.
* Delete one of the 3 pods using `kubectl delete pod <pod name>`. After a few seconds again do `kubectl get pod`.
![](images/kube7.png)
You can see that a new pod is spawned to keep the total number of pods as 3 (see AGE 15s compared to others created 27 minutes ago)! This is a demonstration of how Kubernetes does fault tolerance.
This is a property of the Kubernetes Deployment object (kill the pod from Lab 1; it won't be respawned :))
* Let's say we want to increase the number of pods to 10. Try out `kubectl scale deploy --replicas=10 nginx-deployment`.
![](images/kube8.png)
You can see that 3/10 pods are older than the rest. This means Kubernetes has added 7 extra pods to scale the deployment to 10. This shows how simple it is to scale up and scale down containers using Kubernetes.
* Let's put all these pods behind a ClusterIP service. Execute `kubectl expose deployment nginx-deployment --name=nginx-service`.
![](images/kube9.png)
Curl the cluster IP of the service (here, 10.96.114.184). This curl request reaches one of the 10 pods in the deployment "nginx-deployment" in a round-robin fashion. What happens when we execute the `expose` command is that a Kubernetes `Service` of type ClusterIP is created, so that all the pods behind this service are accessible through a single cluster-local IP (10.96.114.184, here).
It is possible to have a public IP instead (i.e. an actual external load balancer) by creating a Service of type [LoadBalancer](https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/). A minimal sketch follows; do feel free to play around with it!
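A minimal sketch of such a service, reusing the pod labels from this lab (the service name `nginx-lb` is made up, and an external IP is assigned only if the environment provides a load balancer):

``` yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-lb          # hypothetical service name
spec:
  type: LoadBalancer      # request an external load balancer from the provider
  selector:
    app: nginx            # must match the pod labels of the deployment
  ports:
  - port: 80              # port exposed by the service
    targetPort: 80        # containerPort on the pods
```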
The above exercises give pretty good exposure to using Kubernetes to manage large-scale deployments. Trust me, the process is very similar to the above for operating 1000 deployments and containers too! While a Deployment object is good enough for managing stateless applications, Kubernetes provides other resources like Job, DaemonSet, CronJob, StatefulSet etc. to manage special use cases.
**Additional labs:**
- https://www.katacoda.com/lizrice/scenarios/kube-web
- https://www.katacoda.com/courses/kubernetes (a huge number of free follow-along exercises to play with Kubernetes)
## Advanced topics
More often than not, microservices orchestrated with Kubernetes contain dozens of instances of resources like deployments, services and configs. The manifests for these applications can be auto-generated with Helm templates and passed on as Helm charts. Similar to how we have PyPI for python packages, there are remote repositories like Bitnami where Helm charts (e.g. for setting up a production-ready Prometheus or Kafka with a single click) can be downloaded and used. [This is a good place to begin](https://www.digitalocean.com/community/tutorials/an-introduction-to-helm-the-package-manager-for-kubernetes).
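As a quick taste of the workflow (the release name `my-kafka` is an example; the repository URL is the one Bitnami documents):

```
helm repo add bitnami https://charts.bitnami.com/bitnami   # register the Bitnami chart repository
helm repo update                                           # refresh the local chart index
helm install my-kafka bitnami/kafka                        # deploy Kafka from the chart in one command
```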
Kubernetes provides the flexibility to create our own custom resources (similar to the Deployment or the Pod which we saw). For instance, if you want to create 5 instances of a resource with kind SchoolOfSre, you can! The only thing is that you have to write a custom resource definition for it. You can also build a custom operator for your custom resource to take certain actions on resource instances. You can check [here](https://www.redhat.com/en/topics/containers/what-is-a-kubernetes-operator) for more information.
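A hypothetical instance of such a custom resource might look like the manifest below; note that the cluster would accept it only after a matching CustomResourceDefinition for `SchoolOfSre` (with the made-up API group `sre.example.com`) has been registered:

``` yaml
apiVersion: sre.example.com/v1   # hypothetical API group and version
kind: SchoolOfSre                # custom kind, valid only once its CRD exists
metadata:
  name: batch-one
spec:
  students: 5                    # custom field defined by the (hypothetical) CRD schema
```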

@ -77,7 +77,14 @@ nav:
- Threat, Attacks & Defences: level101/security/threats_attacks_defences.md
- Writing Secure code: level101/security/writing_secure_code.md
- Conclusion: level101/security/conclusion.md
#- Level 102:
- Level 102:
    - Linux Advanced:
      - Containerization And Orchestration:
        - Introduction: level102/containerization_and_orchestration/intro.md
        - Introduction To Containers: level102/containerization_and_orchestration/intro_to_containers.md
        - Containerization With Docker: level102/containerization_and_orchestration/containerization_with_docker.md
        - Orchestration With Kubernetes: level102/containerization_and_orchestration/orchestration_with_kubernetes.md
        - Conclusion: level102/containerization_and_orchestration/conclusion.md
- Contribute: CONTRIBUTING.md
- Code of Conduct: CODE_OF_CONDUCT.md
