This course is for those who want to understand Kubernetes monitoring components, including cluster and pod monitoring, and to deepen their knowledge of monitoring and management tools such as the Kubernetes Dashboard, Prometheus, and Grafana.
What is the structure of the lab?
The lab consists of two sections. The first covers Kubernetes monitoring theory, and the second provides hands-on command-line experience. Each section is approximately 30 minutes long; however, your time may vary, depending on how quickly you complete each section. There are a total of 7 challenges to complete during this lab. Important: On multiple-choice questions, note that more than one answer may be correct. The lab is timed, so it’s best to complete it in one sitting.
This section will cover topics and terminology for Kubernetes monitoring. Each topic will review material on-screen, then pose a challenge question. You must answer the question correctly to proceed to the next section. The theory section includes the following topics:
- Observability Fundamentals
- Prometheus Metrics
- Alerting and Recording Rules
- Grafana Dashboard Theory
Topic 1: Observability Fundamentals
Kasten is built following cloud native best practices, and observability is one of them. An architecture is observable when each component publishes metrics about its state and its actions. You can then use tools like Prometheus and Grafana to get a view of your backup infrastructure and receive alerts when something goes wrong. When adapting the concept of observability to software, we must also take into account considerations that are specific to the software engineering domain. For a software application to have observability, you must be able to do the following:
- Understand the inner workings of your application
- Understand any system state your application may have gotten itself into, even new ones you have never seen before and couldn’t have predicted
- Understand the inner workings and system state solely by observing and interrogating with external tools
- Understand the internal state without shipping any new custom code to handle it (because that implies you needed prior knowledge to explain it)
Monitoring is one way of understanding the internal state of a system. Kasten K10 enables centralized monitoring of all its activity by integrating with Prometheus. It exposes a Prometheus endpoint from which a central system can extract data. K10 can be installed with Grafana in the same namespace. This instance of Grafana is set up to automatically query metrics from K10’s Prometheus instance. It also comes with a pre-created dashboard that helps visualize the status of K10’s operations, such as backup, restore, export, and import of applications.
Topic 2: Prometheus Metrics
Prometheus is an open source, metrics-based monitoring system. It does one thing very well: It has a simple yet powerful data model and a query language that lets you analyze how your applications and infrastructure are performing. Prometheus’s main features are:
- A multi-dimensional data model with time series data identified by metric name and key/value pairs
- PromQL, a flexible query language to leverage this dimensionality
- No reliance on distributed storage; single server nodes are autonomous
- Time series collection happens via a pull model over HTTP
- Pushing time series is supported via an intermediary gateway
- Targets are discovered via service discovery or static configuration
- Multiple modes of graphing and dashboarding support
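The first feature, a time series identified by a metric name plus key/value pairs, can be illustrated with a minimal Python sketch. It parses one sample line written in the Prometheus text exposition format into its name, labels, and value. The metric name `http_requests_total` and its labels are hypothetical examples, and the parser is deliberately simplified: it ignores comment lines, optional timestamps, and escape sequences inside label values.

```python
import re

# Hypothetical sample line in the Prometheus text exposition format.
SAMPLE = 'http_requests_total{method="GET",status="200"} 1027'

# Metric name, optional {label="value",...} block, then the numeric value.
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_sample(line):
    """Split one sample line into (metric name, labels dict, float value)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


name, labels, value = parse_sample(SAMPLE)
print(name, labels, value)
```

Each distinct combination of metric name and label values is a separate time series, which is what makes the data model multi-dimensional.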
In layperson terms, metrics are numeric measurements. Time series means that changes are recorded over time. What users want to measure differs from application to application. For a web server it might be request times; for a database it might be the number of active connections or the number of active queries.
Metrics play an important role in understanding why your application is behaving in a certain way. Let’s assume you are running a web application and find that it is slow. You will need some information to find out what is happening. For example, the application can become slow when the number of requests is high. If you have a request count metric, you can spot the cause and increase the number of servers to handle the load.
Prometheus provides a functional query language called PromQL (Prometheus Query Language) that lets the user select and aggregate time series data in real time. The result of an expression can be shown as a graph, viewed as tabular data in Prometheus’s expression browser, or consumed by external systems via the HTTP API.
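To make the request count example concrete, here is a minimal Python sketch of two ideas. The first function approximates what a PromQL `rate()` expression reports, the per-second increase of a counter over a window, without the extrapolation that Prometheus applies. The second builds a URL for the instant query endpoint of the Prometheus HTTP API (`/api/v1/query`). The sample data, the metric name `http_requests_total`, and the server address `http://localhost:9090` are assumptions for illustration.

```python
from urllib.parse import urlencode


def simple_rate(samples):
    """Per-second increase across (timestamp_seconds, counter_value) samples.

    Simplified: handles counter resets but does no extrapolation.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time range")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A counter that drops has been reset, so count from zero again.
        increase += cur - prev if cur >= prev else cur
    return increase / elapsed


def instant_query_url(base_url, promql):
    """Build an instant query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


# Hypothetical request counter scraped every 15 seconds.
samples = [(0, 100.0), (15, 160.0), (30, 250.0), (45, 400.0), (60, 700.0)]
requests_per_second = simple_rate(samples)
url = instant_query_url("http://localhost:9090", "rate(http_requests_total[1m])")
print(requests_per_second)
print(url)
```

A rising rate in the most recent samples is the kind of signal that would point to load as the cause of the slowdown.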
Topic 3: Alerting and Recording Rules
Knowing when things are going wrong is usually the most important reason to have monitoring. You want the monitoring system to call in a human to take a look. Alerting with Prometheus is separated into two parts:
- Alerting rules in Prometheus servers send alerts to an Alertmanager.
- The Alertmanager then manages those alerts, including silencing, inhibition, aggregation, and sending out notifications via methods such as email, on-call notification systems, and chat platforms.
Recording rules allow PromQL expressions to be evaluated on a regular basis and their results ingested into the storage engine. Alerting rules work in much the same way: they also evaluate PromQL expressions regularly, but any results from those expressions become alerts, which are sent to the Alertmanager.
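The difference between the two rule types can be sketched in Python. This is only a model of the idea, not how Prometheus implements it: real rules are declared in YAML rule files with PromQL expressions. The metric name `backup_failures`, the `policy` label, and the rule names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    metric: str
    labels: tuple  # (key, value) pairs identifying the series
    value: float


def evaluate_recording_rule(record_name, samples, source_metric):
    """Aggregate a metric across its series and return it as a new series to store."""
    total = sum(s.value for s in samples if s.metric == source_metric)
    return Sample(record_name, (), total)


def evaluate_alerting_rule(alert_name, samples, metric, threshold):
    """Every series whose value exceeds the threshold becomes one alert."""
    return [
        {"alertname": alert_name, **dict(s.labels), "value": s.value}
        for s in samples
        if s.metric == metric and s.value > threshold
    ]


# Hypothetical current values, one series per backup policy.
current = [
    Sample("backup_failures", (("policy", "daily"),), 0.0),
    Sample("backup_failures", (("policy", "weekly"),), 2.0),
]

recorded = evaluate_recording_rule("backup_failures:sum", current, "backup_failures")
alerts = evaluate_alerting_rule("BackupFailed", current, "backup_failures", 0)
print(recorded)
print(alerts)
```

The recording rule produces a value that goes back into storage, while the alerting rule produces alerts that would be handed to the Alertmanager for grouping and notification.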
Topic 4: Grafana Dashboard Theory
Grafana is an open source tool commonly used to create dashboards for monitoring.
- It’s a de facto industry standard.
- It allows you to query, visualize, alert on and understand your metrics.
- It allows you to create, explore, and share beautiful dashboards with your team.
A dashboard is a set of one or more panels organized and arranged into one or more rows. Grafana ships with a variety of panels making it easy to construct the right queries and customize the visualization so that you can create the perfect dashboard for your needs. Each panel can interact with data from any Grafana configured data source.
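The panel and row structure can be pictured with a minimal Python sketch that builds a simplified dashboard model and prints it as JSON. The field names (`panels`, `gridPos`, `targets`, `expr`) follow the Grafana dashboard JSON model as commonly exported, but this is a reduced illustration that omits most fields a real dashboard carries. The dashboard title, panel titles, and queries are hypothetical.

```python
import json

GRID_COLUMNS = 24  # Grafana lays panels out on a 24-column grid


def make_panel(title, expr, index, per_row=2, height=8):
    """Place a panel at the next free slot, filling each row left to right."""
    width = GRID_COLUMNS // per_row
    return {
        "title": title,
        "type": "timeseries",
        "gridPos": {
            "x": (index % per_row) * width,
            "y": (index // per_row) * height,
            "w": width,
            "h": height,
        },
        # Each target is one query sent to the panel's data source.
        "targets": [{"refId": "A", "expr": expr}],
    }


def make_dashboard(title, panel_specs):
    """Build a dashboard from (panel title, query expression) pairs."""
    panels = [make_panel(t, e, i) for i, (t, e) in enumerate(panel_specs)]
    return {"title": title, "panels": panels}


dashboard = make_dashboard(
    "Example monitoring dashboard",
    [
        ("Request rate", "rate(http_requests_total[5m])"),
        ("Targets up", "up"),
        ("Backup failures", "backup_failures"),
    ],
)
print(json.dumps(dashboard, indent=2))
```

With two panels per row, the third panel wraps onto a second row, which mirrors how panels are arranged into rows on the dashboard.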
Section 2: Hands-on Commands — Observability Lab
In the hands-on section of this lab, we will first provision a Kubernetes cluster and install Kasten K10, MinIO, and MongoDB. We will then proceed with commands that show us how to:
- Understand observability metrics live
- Create custom PromQL requests
- Explore the existing Grafana dashboard
- Create custom alerts for failed backups and for a catalog storage threshold, delivered both by email and as messages in a Slack channel
Is there pre-work for the lab?
Yes. Be sure to read and study this blog post, watch the video showing the work to be performed during the lab, and review the accompanying slides.