Prometheus Monitoring: The Definitive Guide

Prometheus is an open-source, metrics-based monitoring toolkit covering instrumentation, collection, and storage, founded in 2012 at SoundCloud.

Moreover, it has a multi-dimensional data model and a strong query language to query that data model. Still, a DevOps engineer or a site reliability engineer can find monitoring with Prometheus difficult at first.

So, we have put together this definitive guide to Prometheus monitoring to answer all your questions: what is Prometheus, why do you need it, how effective is it, what are its limits, and many more.

Stick with this definitive guide on Prometheus monitoring and acquire in-depth knowledge of Prometheus.

What Will You Learn?

Like our InfluxDB guide, this tutorial is divided into three parts:

  • First, we will have a complete overview of Prometheus, its ecosystem, and the key aspects of this fast-growing technology.
  • In the second part, you will be presented with illustrated explanations of the technical terms of Prometheus. If you are unfamiliar with metrics, labels, instances, or exporters, this is definitely the chapter to read.
  • Lastly, we will identify multiple existing business cases where Prometheus is used. This chapter can give you some inspiration to emulate what other successful companies are doing.

What is Monitoring?

The process of collecting, analyzing, and displaying useful information about a system is known as monitoring. It primarily covers the following:

  • Alerting: The main aim of monitoring is to identify when the system fails and generate an alert.
  • Debugging: Another goal of monitoring is collecting the data that helps debug a problem or failure.
  • Generating Trends: The data collected by a monitoring system can be used to generate trends that help to adapt to the upcoming changes in the system.

What is Prometheus Monitoring?

Prometheus is a time-series database. For those of you who are unfamiliar with what time series databases are, the first module of my InfluxDB guide explains it in depth.

But Prometheus isn’t only a time-series database.

It spans an entire ecosystem of tools that can bind to it in order to bring some new functionalities.

Prometheus is designed to monitor targets. Servers, databases, standalone virtual machines, pretty much everything can be monitored with Prometheus.

In order to monitor systems, Prometheus will periodically scrape them.

What do we mean by target scraping?

Prometheus expects to retrieve metrics via HTTP calls done to certain endpoints that are defined in the Prometheus configuration.

If we take the example of a web application, located at http://localhost:3000 for example, your app will expose metrics as plain text at a certain URL, http://localhost:3000/metrics for example.

From there, on a given scrape interval, Prometheus will pull data from this target.
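
To make this concrete, here is a minimal sketch of what such a scrape configuration could look like in prometheus.yml (the job name, interval, and target address are hypothetical and should be adapted to your own setup):

scrape_configs:
  - job_name: 'web-app'
    scrape_interval: 15s
    metrics_path: /metrics
    static_configs:
      - targets: ['localhost:3000']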

How does Prometheus work?

As stated before, Prometheus is composed of a wide variety of different components.

First, you want to be able to extract metrics from your systems. You can do it in several different ways:

  • By ‘instrumenting‘ your application, meaning that your application will expose Prometheus-compatible metrics on a given URL. Prometheus will define it as a target and scrape it at a given interval.
  • By using the prebuilt exporters: Prometheus has an entire collection of exporters for existing technologies. You can for example find prebuilt exporters for Linux machine monitoring (node exporter), for very established databases (SQL exporter or MongoDB exporter), and even for HTTP load balancers (such as the HAProxy exporter).
  • By using the Pushgateway: sometimes you have applications or jobs that do not expose metrics directly. Those applications are either not designed for it (such as batch jobs for example), or you may have made the choice not to expose those metrics directly via your app.

As you will have understood, if we ignore the rare cases where you might use the Pushgateway, Prometheus is a pull-based monitoring system.

What does that even mean? Why did they make it that way?

Pull vs Push

There is a noticeable difference between Prometheus monitoring and other time-series databases: Prometheus actively scrapes targets in order to retrieve metrics from them.

This is very different from InfluxDB for example, where you would essentially push data directly to it.

Both approaches have their advantages and drawbacks. From the literature available on the subject, here’s a list of reasons behind this architectural choice:

  • Centralized control: if Prometheus initiates queries to its targets, your whole configuration is done on the Prometheus server-side and not on your individual targets.

Prometheus decides which targets to scrape and how often to scrape them.

With a push-based system, you run the risk of sending too much data towards your server and essentially crashing it. A pull-based system enables rate control with the flexibility of having multiple scrape configurations, and thus multiple rates for different targets.

  • Prometheus is meant to store aggregated metrics.

This point is actually an addendum to the first part that discussed Prometheus’s role.

Prometheus is not an event-based system and this is very different from other time-series databases. Prometheus is not designed to catch individual and punctual events in time (such as a service outage for example) but it is designed to gather pre-aggregated metrics about your services.

Concretely, you won’t send a 404 error message from your web service along with the message that caused the error, but you will send the fact that your service received one 404 error in the last five minutes.

This is the basic difference between a time series database targeted at aggregated metrics and one designed to gather ‘raw metrics’.

The rich ecosystem of Prometheus monitoring

The main functionality of Prometheus, besides monitoring, is being a time series database.

However, when playing with time series databases, you often need to visualize them, analyze them, and set up some custom alerting on them.

Here are the tools that compose the Prometheus ecosystem and enrich its functionality:

  • Alertmanager: Prometheus pushes alerts to the Alertmanager via custom rules defined in configuration files. From there, you can export them to multiple endpoints such as Pagerduty or Slack.
  • Data visualization: Similar to Grafana, you can visualize your time series directly in Prometheus Web UI. You can easily filter and have a concrete overview of what’s happening on your different targets.
  • Service discovery: Prometheus can discover your targets dynamically and automatically scrape new targets on demand. This is particularly handy when playing with containers, whose addresses can change dynamically.

Prometheus Monitoring Concepts Explained

As we did for InfluxDB, we are going to go through a curated list of all the technical terms behind monitoring with Prometheus.

a – Key-Value Data Model

Before starting with Prometheus tools, it is very important to get a complete understanding of the data model.

Prometheus works with key value pairs. The key describes what you are measuring while the value stores the actual measurement value, as a number.

Remember: Prometheus is not meant to store raw information like plain text as it stores metrics aggregated over time.

The key in this case is called a metric. It could be for example a CPU rate or memory usage.

But what if you wanted to give more details about your metric?

What if my CPU has four cores and I want to have four separate metrics for them?

This is where the concept of labels comes into play. Labels are designed to provide more details to your metrics by appending additional fields to it. You would not simply describe the CPU rate but you would describe the CPU rate for core one located at a certain IP for example.

Later on, you would be able to filter metrics based on labels and retrieve exactly what you are looking for.
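
As an illustration (the metric and label names here are hypothetical), a labeled metric exposed on the /metrics endpoint could look like this:

cpu_usage_percent{core="0", instance="192.168.1.10:9100"} 12.5
cpu_usage_percent{core="1", instance="192.168.1.10:9100"} 47.2

You could then filter on the core label to retrieve the measurement of a single core only.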

b – Metric Types

When monitoring a metric, there are essentially four metric types you can use with Prometheus. I encourage you to read until the end, as there is a little bit of a gotcha with those types.

Counter

Probably the simplest form of the metric type you can use. A counter, as its name describes, counts elements over time.

If for example, you want to count the number of HTTP errors on your servers or the number of visits on your website, you probably want to use a counter for that.

As you would physically imagine it, a counter only goes up or resets.

As a consequence, a counter is not naturally adapted for values that can go down or for negative ones.

A counter is particularly suited to count the number of occurrences of a certain event over a period, i.e. the rate at which your metric evolved over time.
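
For example, assuming your application exposes a counter named http_errors_total (a hypothetical name), the following PromQL query would give you the per-second rate of HTTP errors over the last five minutes:

rate(http_errors_total[5m])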

Now, what if you wanted to measure the current memory usage at a given time for example?

The memory usage can go down, how can it be measured with Prometheus?

Gauges

Meet gauges!

Gauges are designed to handle values that may decrease over time. Visually, they are like thermometers: at any given time, if you observe the thermometer, you are able to see the current temperature value.

But, if gauges can go up and down, accept positive or negative values, aren’t they a superset of counters?

Are counters useless?

This is what I thought when I first met gauges. If they can do anything, let’s use gauges everywhere, right?

Wrong.

Gauges are perfect when you want to monitor the current value of a metric that can decrease over time.

However, and this is the big gotcha of this section: gauges can’t be used when you want to see the evolution of your metrics over time. When using gauges, you may essentially lose the sporadic changes of your metrics over time.

Why? Here’s /u/justinDavidow’s answer on this.

If your system is sending metrics every 5 seconds, and your Prometheus is scraping the target every 15 seconds, you may lose some metrics in the process. If you perform additional computation on those metrics, your results are less and less accurate.

With a counter, every value is aggregated in it. When Prometheus scrapes it, it will be aware that a value was sent during a scraping interval.

End of the gotcha!

Histogram

The histogram is a more complex metric type. It provides additional information for your metrics such as the sum of the observations and the count of them.

Values are aggregated in buckets with configurable upper bounds. It means that with histograms you are able to:

  • Compute averages: the sum of your values divided by the number of values recorded.
  • Compute fractional measurements on your values: this is a very powerful tool as it allows you to know for a given bucket how many values follow given criteria. This is especially interesting when you want to monitor proportions or establish quality indicators.

In a real-world context, I want to be alerted when 20% of my servers respond in more than 300ms or when my servers respond in more than 300ms more than 20% of the time.

As soon as proportions are involved, histograms can and should be used.
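
As a sketch, assuming your instrumented service exposes a histogram named http_request_duration_seconds (a hypothetical name), the following PromQL query would estimate the 95th percentile of request durations over the last five minutes:

histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))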

Summaries

Summaries are an extension of histograms. Besides also providing the sum and the count of observations, they provide quantiles metrics on sliding windows.

As a reminder, quantiles are ways to divide your probability density into ranges of equal probability.

So, histograms or summaries?

Essentially, the intent is different.

Histograms aggregate values over time, giving a sum and a count function that makes it easy to see the evolution of a given metric.

On the other hand, summaries expose quantiles over sliding windows (i.e continuously evolving over time).

This is particularly handy to get the value that represents 95% of the values recorded over time.

c – Jobs & Instances

With the recent advancements made in distributed architectures and the popularity rise of cloud solutions, you don’t simply have one server standing there, alone.

Servers are replicated and distributed all over the world.

To illustrate this point, let’s take a very classic architecture made of 2 HAProxy servers that redistribute the load on nine backend web servers (No, no, definitely not the Stackoverflow stack..)

In this real-world example, we want to monitor the number of HTTP errors returned by the web servers.

Using Prometheus language, a single web server unit is called an instance. The job would be the fact that you measure the number of HTTP errors for all your instances.

The best part would probably be the fact that jobs and instances become fields in your labels and that you are able to filter your results for a given instance or a given job.
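
For instance, assuming a metric named http_errors_total (hypothetical), you could narrow it down to a single backend with a selector such as:

http_errors_total{job="web-servers", instance="backend-03:9100"}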

Handy!

d – PromQL

You may already know InfluxQL if you are using InfluxDB-based databases. Or maybe you have stuck to SQL by using TimescaleDB.

Prometheus also has its own embedded language that facilitates querying and retrieving data from your Prometheus servers: PromQL.

As we saw in the previous sections, data are represented using key value pairs. PromQL is not far from this as it keeps the same syntax and returns results as vectors.

What vectors?

With Prometheus and PromQL, you will be dealing with two kinds of vectors:

  • Instant vectors: giving a representation of all the metrics tracked at the most recent timestamp;
  • Time ranged vectors: if you want to see the evolution of a metric over time, you can query Prometheus with custom time ranges. The result will be a vector aggregating all the values recorded for the period selected.
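
As a quick, hypothetical illustration using one of the node exporter's memory metrics, the first query below returns an instant vector and the second one a range vector covering the last five minutes:

node_memory_Active_bytes
node_memory_Active_bytes[5m]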

PromQL APIs expose a set of functions that facilitate data manipulation for your queries.

You can sort your values, apply mathematical functions over it (like derivative or exponential functions) and even have some forecasting functions (like the Holt-Winters function).

e – Instrumentation

Instrumentation is another big part of Prometheus. Before retrieving data from your applications, you want to instrument them.

Instrumentation in Prometheus terms means adding client libraries to your application in order for them to expose metrics to Prometheus.

Instrumentation can be done for most of the existing programming languages like Python, Java, Ruby, Go, and even Node or C# applications.

While instrumenting, you will essentially create memory objects (like gauges or counters) that you will increment or decrement on the fly.

Later on, you will choose where to expose your metrics. From there, Prometheus will pick them up and store them into its embedded time-series database.
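
Here is a minimal sketch of what this looks like with the official Python client library (the metric names and the port are assumptions made for the example):

# pip install prometheus-client
from prometheus_client import Counter, Gauge, start_http_server
import random, time

# Create the in-memory metric objects
REQUESTS = Counter('app_requests_total', 'Total number of requests handled')
IN_PROGRESS = Gauge('app_requests_in_progress', 'Requests currently being processed')

# Expose the metrics on http://localhost:8000/metrics for Prometheus to scrape
start_http_server(8000)

while True:
    IN_PROGRESS.inc()      # a request starts
    REQUESTS.inc()         # count it
    time.sleep(random.random())
    IN_PROGRESS.dec()      # the request is done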

f – Exporters

For custom applications, instrumentation is very handy as it allows you to customize the metrics exposed and how they change over time.

For ‘well-known’ applications, servers, or databases, Prometheus comes with prebuilt exporters that you can use in order to monitor your targets. This is the main way of monitoring targets with Prometheus.

Those exporters, most of the time exposed as Docker images, are easily configurable to monitor your existing targets. They expose a preset of metrics and often preset dashboards to get your monitoring set in minutes.

Examples of exporters include :

  • Database exporters: for MongoDB databases, SQL servers, and MySQL servers.
  • HTTP exporters: for HAProxy, Apache, or NGINX servers.
  • Unix exporters: you can monitor system performance using the prebuilt node exporter, which exposes complete system metrics out of the box.

A word on interoperability

Most time-series databases work hand in hand to promote the interoperability of their respective systems.

Prometheus isn’t the only monitoring system to have a stance on how systems should expose metrics: existing systems such as InfluxDB (via Telegraf), CollectD, StatsD, or Nagios are opinionated on this subject.

However, exporters have been built in order to promote communication between those different systems. Even if Telegraf sends metrics in a format that is different from what Prometheus expects, Telegraf can send metrics to the InfluxDB exporter, and they will be scraped by Prometheus afterward.

g – Alerts

When dealing with a time-series database, you often want to get feedback from your data and this point is essentially covered by alert managers.

Alerts are a very common thing in Grafana but they are also included in Prometheus via the alert manager.

An alert manager is a standalone tool that binds to Prometheus and runs custom alerts.

Alerts are defined via a configuration file and they define a set of rules on your metrics. If those rules are met in your time series, the alerts are triggered and sent to a predefined destination.

Similar to what you would find in Grafana, alert destinations include email alerts, Slack webhooks, PagerDuty, and custom HTTP targets.
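
As a sketch, a Prometheus alerting rule could look like the following (the rule name, expression, and threshold are hypothetical, and routing to Slack or PagerDuty is configured separately in the Alertmanager):

groups:
  - name: example-alerts
    rules:
      - alert: HighErrorRate
        expr: rate(http_errors_total[5m]) > 0.05
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High HTTP error rate on {{ $labels.instance }}"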

Prometheus Monitoring Use Cases

As you already know, every definitive guide ends with a reality check. As I like to say, technology isn’t an end in itself and should always serve a purpose.

This is what we are going to talk about in this chapter.

a – DevOps Industry

With all the exporters built for systems, databases, and servers, Prometheus is clearly targeting the DevOps industry.

As we all know, a lot of vendors are competing for this industry and providing custom solutions for it.

Prometheus is an ideal candidate for DevOps.

The necessary effort to get your instances up and running is very low and every satellite tool can be easily activated and configured on-demand.

Target discovery, via file-based service discovery for example, also makes it an ideal solution for stacks that rely heavily on containers and on distributed architectures.

In a world where instances are created as fast as they are destroyed, service discovery is a must-have for every DevOps stack.

b – Healthcare

Nowadays, monitoring solutions are not made only for IT professionals. They are also made to support large industries, providing resilient and scalable architectures for healthcare.

As the demand grows more and more, the IT architectures deployed have to match that demand. Without a reliable way to monitor your entire infrastructure, you may run the risk of having massive outages on your services. Needless to say that those dangers have to be minimized for healthcare solutions.

c – Financial Services

The last example was selected from a conference held by InfoQ discussing using Prometheus for financial institutions.

The talk was presented by Jamie Christian and Alan Strader, who showcased how exactly they are using Prometheus to monitor their infrastructure at Northern Trust. Definitely instructive and worth watching.

Going Further

As I like to believe, there is a time for theory and a time for practice.

Today, you have been introduced to the fundamentals of Prometheus monitoring, what Prometheus helps you to achieve, its ecosystem, as well as an entire glossary of technical terms, explained.

To begin your Prometheus monitoring journey, start by taking a look at all the exporters that Prometheus offers. Accordingly, install the tools needed, create your first dashboard, and you are ready to go!

If you need some inspiration, I wrote an article on Linux process monitoring with Prometheus and Grafana. It walks you through setting up the tools as well as building your first dashboard.

If you would rather stick to official exporters, here is a complete guide for the node exporter.

I hope you learned something new today.

If there is a subject you want me to cover in a future article, make sure you leave your idea; it always helps.

Until then, have fun, as always.

How To Install InfluxDB Telegraf and Grafana on Docker

Finding a single resource covering the installation of a full monitoring stack on Docker is tricky, but in this tutorial you will learn how to install InfluxDB, Telegraf, and Grafana on Docker. It covers not only installation, configuration, and setup, but also the basics of the three platforms: InfluxDB, Telegraf, and Grafana.

InfluxDB is one of the most used time-series databases nowadays and was created in 2013 by InfluxData. Telegraf is a plugin-driven server agent for collecting and sending metrics and events from databases, systems, and IoT sensors. Grafana is a free and open-source (FOSS/OSS) visualization tool utilized on top of a variety of diverse data stores, but it is most commonly used together with Graphite, InfluxDB, Prometheus, and Elasticsearch.

On the other hand, Docker is a virtualization environment that provides an easy way to create, manage and delete containers on the fly.

All these tools are widely used for monitoring and dashboarding in the DevOps industry by many successful companies around the world. If you are seeking help to install any of these tools on Docker, follow this tutorial to the end and you will install InfluxDB, Telegraf, and Grafana on Docker with ease.

Note: InfluxDB is currently shifting to Influx 2.0, providing a single platform to manage all the components of the TICK stack. As a result, another tutorial will be available for InfluxDB 2.0.

Prerequisites

The requirements you should meet before installing InfluxDB, Telegraf, and Grafana on Docker are as follows:

Sudo privileges

First of all, you need to have sudo rights on your Linux machine, otherwise, you won’t be able to install InfluxDB on your host.

To verify it, run the following command

$ sudo -v

If no error messages are shown on your terminal, you are good to go.

Next, you want to make sure that Docker is correctly installed on your system.

Docker Installation & Configuration

To install Docker on Ubuntu and Debian, you can follow this tutorial. It provides great and extensive details on how to correctly set up Docker on Linux.

Again, to verify that Docker is correctly installed, you can run the following command.

$ docker --version
Docker version 19.03.1, build 74b1e89

Now that Docker is ready, let’s have a quick look at the networking strategy we are going to use for our containers.

Designing the network strategy for InfluxDB

Before starting, it is important to have a few words about networking.

By default, containers are created to run on the bridge network stack.

By default, you will install InfluxDB, which will expose useful ports (like the 8086 one) to your current network stack.

Later on, you will bind Telegraf to it, but Telegraf does not have to expose any ports to your current host stack.

As a consequence, we can run InfluxDB on the default bridge network, and have Telegraf running in the same stack as InfluxDB.

Later on, we will add Grafana to our bridge network in order to visualize metrics gathered by Telegraf

Now that we have seen the network strategy we are going to use, let’s install the InfluxDB container for Docker.

To install InfluxDB on Docker, you have two ways of doing it.

You can either prepare your filesystem manually and run the InfluxDB on a Docker container with no initialization scripts.

This is the simplest way of initializing InfluxDB.

This method should be used if you plan on running InfluxDB on a single instance, if your initial InfluxDB configuration is very simple, or if you prefer to have full control over your containers.

However, there is a way to initialize InfluxDB with scripts (either bash scripts, or InfluxQL scripts).

This is the version that you should use if you are automating a lot of servers with InfluxDB (with Chef or Puppet for example), and you want to have the same initial setup on all your instances.

Installing InfluxDB 1.7.x on Docker

https://www.youtube.com/watch?v=0CpHwszFjUY

The official InfluxDB image for Docker is named: influxdb.

It is part of the Docker Official Images, so you can check that you are running an official version of InfluxDB on your system.

Moreover, the other tools of the TICK stack (Telegraf, InfluxDB, Chronograf, and Kapacitor) belong to the Docker Official Images.

The InfluxDB image is going to install the InfluxDB server responsible for storing time-series metrics on your system.

If you are familiar with Docker, you already know that you can map volumes from your local filesystem to your container in order to manipulate data easier in your container.

This is exactly what we are going to do in this tutorial.

Configuration files, as well as directories storing actual data, will be stored on our local filesystem.

a – Prepare InfluxDB 1.7.x for Docker

If you carefully followed the tutorial on setting up InfluxDB on Ubuntu, you know that you are going to create a specific user for your InfluxDB database.

$ sudo useradd -rs /bin/false influxdb

In your etc directory, create a new folder for your InfluxDB configuration files.

$ sudo mkdir -p /etc/influxdb

Creating a configuration file for InfluxDB and Docker

Luckily, you don’t have to create an InfluxDB configuration file by yourself.

To create an InfluxDB configuration file using Docker, run the following command.

docker run --rm influxdb influxd config | sudo tee /etc/influxdb/influxdb.conf > /dev/null

As a quick explanation, the “influxd config” command will print a full InfluxDB configuration file for you on the standard output (which is by default your shell)

As the --rm option is set, Docker will run a container in order to execute this command, and the container will be deleted as soon as it exits.

Instead of having the configuration file printed on the standard output, it will be redirected to our InfluxDB configuration file.

Next, reassign the folder permissions for your newly created file, otherwise, your container won’t be able to interact with it properly.

$ sudo chown influxdb:influxdb /etc/influxdb/*

Creating a lib folder for InfluxDB and Docker

As stated in the documentation, InfluxDB stores its data, metadata as well as the WAL (for write-ahead log) in the /var/lib/influxdb folder by default.

As a consequence, you have to create this folder if it is not currently existing.

$ sudo mkdir -p /var/lib/influxdb

Again, make sure that the permissions are correctly set for your container to write into this folder.

$ sudo chown influxdb:influxdb /var/lib/influxdb

Now that our folders are ready, let’s see how we can initialize InfluxDB with custom scripts.

Preparing initialization scripts for InfluxDB on Docker (optional)

With the InfluxDB image, there is a way to automate database initialization on your containers.

As an example, we will instruct our Docker container to create an administrator account, a regular user account (for Telegraf), and a database with custom retention via a custom InfluxQL script.

Anatomy of the InfluxDB image

On container boot, the entrypoint.sh script is executed; it is set as the entrypoint of your Docker container.

The entry point can be executed in two ways.

You can execute the entry point script in order to launch a simple InfluxDB instance on your container.

This is for example what we have done in the previous section. We specified the configuration flag and it was used in order to set your InfluxDB server initialization.

However, there is a second way to execute the entrypoint script: by executing the init-influxdb script.

The init-influxdb script is made of two parts :

  • First, it will watch for environment variables passed to your docker command, and it will execute commands accordingly.
  • Next, if you have a docker-entrypoint-initdb.d directory at the root directory of your container, it will execute either bash scripts or IQL scripts in it.

We are going to use that information to create our InfluxDB container.

First, make sure that no folders are already created in your /var/lib/influxdb folder.

$ ls -l /var/lib/influxdb
total 0

Shortly, we will execute a command so that the meta folder (in the influxdb folder) gets updated with the correct information.

As a reminder, we want an admin account and a regular account for Telegraf (named telegraf).

Creating initialization scripts on your host

In order for the initialization scripts to be run on initialization, they have to be mapped to the docker-entrypoint-initdb.d folder in your container.

First, create a scripts folder on your host wherever you want.

In my case, it is going to be created in /etc/influxdb.

$ sudo mkdir -p /etc/influxdb/scripts

Create a new script file in your newly created folder, and make sure to give it a .iql extension.

$ sudo touch influxdb-init.iql

CREATE DATABASE weather;
CREATE RETENTION POLICY one_week ON weather DURATION 168h REPLICATION 1 DEFAULT;

This simple initialization script will create a database for weather data, and it will assign a one-week retention policy for it.

Great!

The last step will be to prepare our meta folder for InfluxDB initialization.

Creating/updating the InfluxDB Meta Database

In order to update your meta database, run the following command

$ docker run --rm -e INFLUXDB_HTTP_AUTH_ENABLED=true \
         -e INFLUXDB_ADMIN_USER=admin \
         -e INFLUXDB_ADMIN_PASSWORD=admin123 \
         -v /var/lib/influxdb:/var/lib/influxdb \
         -v /etc/influxdb/scripts:/docker-entrypoint-initdb.d \
         influxdb /init-influxdb.sh

Note: Setting the INFLUXDB_HTTP_AUTH_ENABLED to true does not mean that authentication is enabled on your InfluxDB server. Authentication is enabled in one of the next sections, this parameter is only used for the initialization script.

A couple of logs should be printed on your terminal.

If this is not the case, make sure that you specified the correct environment variables for your container.

If you chose to create initialization scripts for your container, you should also have a log line for it.

As the last verification step, you can inspect your meta.db file in your meta folder to make sure that the changes were correctly written.

$ cat /var/lib/influxdb/meta/meta.db | grep one_week

Now that your InfluxDB files are prepared, let’s head over to some configuration verifications.

b – Verifying your InfluxDB configuration for Docker

If you used the configuration command detailed in the section above, you should be presented with a simple configuration file in the /etc/influxdb folder.

Open your file and verify that everything is correct.

HTTP Interface

Head over to the [http] section of your configuration and make sure that it is enabled.

Verify the bind-address, which should be :8086 by default.

This is the port that you are going to use to send some commands to your InfluxDB database, like creating a database or adding a user for example.

By default, authentication and encryption are disabled. However, sections of this tutorial explain how you can set up authentication in depth.
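
For reference, the [http] section of the generated file should look similar to this (exact defaults may vary slightly with your InfluxDB version):

[http]
  enabled = true
  bind-address = ":8086"
  auth-enabled = false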

Data, meta, and WAL configurations

By default, your configuration file should have the paths that we created in the first section, so you don’t have to change anything.

However, you should check that your paths are correct.

[meta]
  dir = "/var/lib/influxdb/meta"

[data]
  dir = "/var/lib/influxdb/data"
  wal-dir = "/var/lib/influxdb/wal"

Running the InfluxDB container on Docker

We are going to use the InfluxDB image from the official Docker repositories.

As a quick reminder, you need to use the docker container run command in order to start a Docker container.

First, make sure that nothing is running on port 8086.

$ sudo netstat -tulpn | grep 8086

If you remember correctly, we configured our folders to be accessible by the InfluxDB user (belonging in the InfluxDB group).

As a consequence, we will need the user ID of the InfluxDB user in order to run our container.

To find the InfluxDB user ID, head over to the passwd file on your host and run

$ cat /etc/passwd | grep influxdb
influxdb:x:997:997::/var/lib/influxdb:/bin/false

As you can see, the user ID for my InfluxDB user is 997.

Note: The user ID will surely be different on your system, and you should modify it accordingly when running the docker command.

To start InfluxDB on Docker, run the following command.

docker run -d -p 8086:8086 --user 997:997 --name=influxdb \ 
-v /etc/influxdb/influxdb.conf:/etc/influxdb/influxdb.conf \ 
-v /var/lib/influxdb:/var/lib/influxdb \ 
influxdb \ 
-config /etc/influxdb/influxdb.conf

Testing your InfluxDB container

In order to test if your InfluxDB container is correctly running, you can check that the HTTP API is correctly enabled.

$ curl -G http://localhost:8086/query --data-urlencode "q=SHOW DATABASES"

You can also check that your InfluxDB server is correctly listening on port 8086 on your host.

$ netstat -tulpn | grep 8086
tcp6    0    0 :::8086      :::*       LISTEN       -

Awesome!

Your InfluxDB container is correctly running on Docker.

By default, your InfluxDB server does not contain any databases except for _internal, which stores, as its name describes, internal metrics about InfluxDB itself.

However, if you created initialization scripts for your InfluxDB database, make sure that your databases and retention policies are correctly assigned.

$ influx
Connected to http://localhost:8086 version 1.7.8
InfluxDB shell version: 1.7.7

> SHOW USERS

user    admin
----    -----
admin   true

> SHOW DATABASES
name: databases
name
----
weather

Enabling authentication on InfluxDB for Docker

In order to enable authentication for InfluxDB 1.7.x, you are going to create an administrator account for your InfluxDB database (if you didn’t use initialization scripts)

Create an administrator account with docker exec

You don’t have to create an administrator account if you initialized your InfluxDB image with environment variables in the previous sections.

This is only necessary if you choose a fully customized InfluxDB image that you configure yourself.

To create an administrator account, connect to a bash process in your container and run the influx utility by yourself.

To achieve that, run the following commands

$ docker container ls

Note: Your container is not appearing here? Run this command with the -a (for all) flag to make sure that your container hasn’t crashed.

Identify the container ID of your InfluxDB container, and run the following command to have a bash in your container.

$ docker exec -it <container_id> /bin/bash

As a reminder, the docker exec is used in order to run a command in a running container.

Here are the options specified with it :

  • -i: for interactive, it will keep the standard input open even if not attached
  • -t: to allocate a pseudo-TTY to your current shell environment.

Right now, you should have a shell prompt inside your container.

In your container, run the influx utility to create your administrator account.

$ influx
Connected to http://localhost:8086 version 1.7.8
InfluxDB shell version: 1.7.8
> CREATE USER admin WITH PASSWORD 'admin123' WITH ALL PRIVILEGES
> SHOW USERS
user  admin
----  -----
admin true

Now that you have an administrator account, you can enable HTTP authentication for your database.

Enable HTTP Authentication in your configuration file

To achieve that, exit your container, and head to the configuration folder you created for InfluxDB.

Ctrl + D (to exit your container)

$ sudo nano /etc/influxdb/influxdb.conf

[http]
  enabled = true
  bind-address = ":8086"
  auth-enabled = true

Save your file and restart your container for the changes to be applied.

$ docker container restart <container_id>

To make sure that your changes are effective, try querying the HTTP API again.

You should be unable to execute a query without specifying the correct credentials.

$ curl -G http://localhost:8086/query --data-urlencode "q=SHOW DATABASES"
{"error":"unable to parse authentication credentials"}

Great!

Authentication is correctly enabled.

Let’s try to execute the InfluxQL query again with the correct credentials.

$ curl -G -u admin:admin123 http://localhost:8086/query --data-urlencode "q=SHOW DATABASES"

{"results":[{"statement_id":0,"series":[{"name":"databases","columns":["name"],"values":[["_internal"]]}]}]}

With this curl command, we made sure that our credentials were correctly set up for our InfluxDB server.

Now that your time-series database is up and running, it is time to install our metrics collection agent: Telegraf.

Installing Telegraf on Docker

For those who are not familiar with Telegraf, Telegraf is a plugin-driven agent that periodically collects metrics about a variety of different systems.

The metrics are pushed to InfluxDB and they can be later on analyzed in Chronograf or Grafana.

Luckily, Telegraf also belongs to the official Docker images.

In this section, we are going to configure Telegraf for it to gather system metrics on our Linux host (in this case, a Debian 10, but it works in the same way for Ubuntu-based ones)

Prepare Telegraf for InfluxDB and Docker

As with our InfluxDB setup, we are going to create a Telegraf user for our host. It ensures that correct permissions are set for our future configuration files.

$ sudo useradd -rs /bin/false telegraf

In your etc directory, create a new folder for your Telegraf configuration files.

$ sudo mkdir -p /etc/telegraf

Creating a configuration file for Telegraf and Docker

Again, we don’t have to create a Telegraf configuration file by ourselves.

The Telegraf Docker image is built very close to the InfluxDB one.

As a consequence, it is able to run a simple telegraf config command to generate a configuration on the fly.

By default, the Telegraf configuration file has the following defaults:

  • interval: 10 seconds. Telegraf is going to gather and send metrics to InfluxDB every 10 seconds.
  • round_interval: true. The agent is going to collect metrics on :00, :10, or :(00 + n*interval)
  • The InfluxDB output plugin is enabled by default.
  • The cpu, disk, diskio, kernel, mem, processes, swap, and system input plugins are enabled. As those inputs use the /proc mountpoint to gather metrics, we will have to remap volumes on the container.

To create a Telegraf configuration file using Docker, run the following command.

docker run --rm telegraf telegraf config | sudo tee /etc/telegraf/telegraf.conf > /dev/null

Next, reassign the correct permissions to your Telegraf configuration folder.

This will ensure that only Telegraf itself and the root account are able to write to the configuration file.

$ sudo chown telegraf:telegraf /etc/telegraf/*

Modify your Telegraf configuration file

With Telegraf, most of the time, you will want to send metrics directly to InfluxDB itself.

This is why the InfluxDB output is enabled by default in your Telegraf configuration file.

By default, Telegraf will send metrics to a database named “telegraf” on InfluxDB.

This is a customizable parameter, however in this case we are only going to specify the InfluxDB authentication parameters.

Edit your Telegraf configuration file, and locate the [[outputs.influxdb]] section.

In this configuration file, locate the “HTTP Basic Auth” section and modify the credentials accordingly.

## HTTP Basic Auth
username = "admin"
password = "admin123"

You are of course free to create a dedicated administrator account for Telegraf by using the method we described above (using docker exec)

Save and exit your file, now it is time to run the container.

Running the Telegraf container on Docker

As stated before, Telegraf enables system inputs by default.

As a consequence, we will have to remap the /proc host folder to the /host folder on our Docker image.

This is made to ensure that Telegraf is not gathering metrics from the Docker container itself and that the container filesystem is not altered in any way.

To achieve this, unless you gave a name to your InfluxDB container, run this command to get your InfluxDB container ID.

It will be used to connect Telegraf and InfluxDB to the same virtual network.

$ docker container ls | grep influxdb
1939ba611410   influxdb   "/entrypoint.sh -conf..."   24 minutes ago    Up 30 minutes    0.0.0.0:8086->8086/tcp    ecstatic_moore

Isolate your Telegraf user ID by running the following command.

$ getent passwd | grep telegraf
telegraf:x:998:998::/etc/telegraf:/bin/false

Next, to run the Telegraf Docker image, run the following command.

$ docker run -d --user 998:998 --name=telegraf \
      --net=container:<influx_container_id> \
      -e HOST_PROC=/host/proc \
      -v /proc:/host/proc:ro \
      -v /etc/telegraf/telegraf.conf:/etc/telegraf/telegraf.conf:ro \
      telegraf

Note: The <influx_container_id> in the net option can be replaced by the container name (for example --net=container:influxdb) if you chose to create your InfluxDB container with a name, as shown in the sketch below.
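
For example, assuming you started InfluxDB with --name=influxdb as shown earlier, a possible variant of the command would be:

$ docker run -d --user 998:998 --name=telegraf \
      --net=container:influxdb \
      -e HOST_PROC=/host/proc \
      -v /proc:/host/proc:ro \
      -v /etc/telegraf/telegraf.conf:/etc/telegraf/telegraf.conf:ro \
      telegraf

Sharing the network namespace this way means Telegraf can still reach InfluxDB on localhost:8086, so the Telegraf output configuration does not need to change.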

Great!

To make sure your Telegraf instance is running correctly, run the following command.

$ docker container logs -f --since 10m telegraf

Telegraf does not seem to raise any error messages, but let’s double-check the correctness of our setup by inspecting the InfluxDB database.

$ docker exec -it <container_id> influx -username admin -password admin123

InfluxDB shell version 1.7.8
> SHOW DATABASES

name: databases
name
----
weather
_internal
telegraf

> USE telegraf
> SELECT * FROM cpu WHERE time < now() - 1m

If you are seeing data points, congratulations!

Your Telegraf instance is correctly sending metrics to your InfluxDB server.

Now that all the metrics gathered by Telegraf are stored in InfluxDB, we can install a modern dashboarding tool in order to visualize them: Grafana.

Visualizing Telegraf metrics in Grafana

For those who are unfamiliar with Grafana, Grafana is a dashboarding tool that binds to a wide variety of different datasources in order to create beautiful graphs.

It can bind to traditional SQL databases, but it can also bind to time series databases, which is what we are going to do in this section.

Installing Grafana on Docker

The Grafana docker image is stored under the Grafana repository.

To create a Grafana container, run the following command on your host.

$ docker run -d --name=grafana -p 3000:3000 grafana/grafana

A Grafana server container should now be up and running on your host. To make sure of it, run the following command.

$ docker container ls | grep grafana

You can also make sure that it is correctly listening on port 3000.

$ netstat -tulpn | grep 3000

Configuring Grafana for InfluxDB

With your web browser, head over to http://localhost:3000.

You should be redirected to the Grafana home page.

The default credentials for Grafana are admin/admin.

Immediately, you are asked to change your password. Choose a strong password and click on “Save”.

You should now be redirected to the Grafana default Web UI.

Click on “Add data source” to add an InfluxDB datasource.

Next, select the InfluxDB option and click on “Select“.

For this tutorial, we are not running the InfluxDB image as a Docker service. As a consequence, we have to retrieve the InfluxDB container IP on our bridge network.

To do that, run the following command

$ docker network inspect bridge | grep influxdb -A 5
"Name": "influxdb",
"EndpointID": "7e4eb0574a346687efbb96b6b45",
"MacAddress": "02:42:ac:11:00:04",
"IPv4Address": "172.17.0.2/16",
"IPv6Address": ""

Copy the IPv4 address and paste it into the InfluxDB configuration for Grafana.

Select the Basic Auth option, specify your administrator credentials, and fill in the details about your InfluxDB database.

Here is the final configuration.

Click on “Save and Test” to make sure that your configuration is working properly.

Awesome!

Now that everything is set up in Grafana, let’s import a dashboard designed for Telegraf host metrics.

Importing a Grafana dashboard

To import a Grafana dashboard, select the “Plus” icon in the left menu, and click on “Import“.

In the import text box, put 1443 as a dashboard ID.

In the next window, make sure that the information is correct and bind it to your newly created datasource.

Click on “Import” and your dashboard should be up and running!

Conclusion

Congratulations, you learned how to install InfluxDB 1.7.x, Telegraf and Grafana using Docker.

As you can see, this tutorial concentrates on a fully customizable installation of your images. If you want to automate your container setups, it might be a good idea to use docker-compose, as sketched below.
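
As a hedged starting point (image tags, paths, and service layout are assumptions you will want to adapt, notably the Telegraf output URL, which becomes http://influxdb:8086 on a compose network), a minimal docker-compose.yml could look like this:

version: "3"
services:
  influxdb:
    image: influxdb:1.7
    ports:
      - "8086:8086"
    volumes:
      - /etc/influxdb/influxdb.conf:/etc/influxdb/influxdb.conf:ro
      - /var/lib/influxdb:/var/lib/influxdb
  telegraf:
    image: telegraf
    depends_on:
      - influxdb
    environment:
      - HOST_PROC=/host/proc
    volumes:
      - /etc/telegraf/telegraf.conf:/etc/telegraf/telegraf.conf:ro
      - /proc:/host/proc:ro
  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"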

Furthermore, make sure to read through the entire list of Telegraf input plugins to start getting ideas about what to monitor. Until then, have fun and enjoy discovering our other Docker tutorials, as always.

InfluxDays London 2019 Recap

InfluxDays London is an annual convention held by InfluxData, which took place on the 13th and 14th of June 2019. It offers general insights into the recent developments of the company and the industry.

Beyond development insights, it also highlights various guest speakers who share feedback on real-world use cases and reality checks. In 2019, the convention was hosted at The Brewery, a venue specialized in hosting corporate events in the Finsbury district of London.

InfluxDays London 2019 Recap

Given the very fast pace of changes in the industry with new languages, new APIs, new partnerships, and a complete shift from 1.x to 2.x, expectations were high.

I wanted to have an appropriate section for the recent industry collaborations and a community chapter to end this recap.

So, in this recap, you will get a complete look at all the latest features announced for the future of InfluxDB, along with the key facts that were unveiled and what lies ahead for developers.

Without wasting another minute, let’s check out the key facts of the InfluxDays London 2019 recap below:

I – From InfluxDB 1.x to InfluxDB 2.x

If you have been working with InfluxDB for quite some time, you already know about the TICK stack, composed of Telegraf, InfluxDB, Chronograf, and Kapacitor.

All those components cover a very specific use case, whether it is about visualization with Chronograf, stream processing and analysis with Kapacitor, or metrics gathering using Telegraf.

However, some of those components showed limitations, and InfluxQL, despite being a great on-ramp for the platform at the beginning, was unable to scale up.

That’s the reason why InfluxData invested in InfluxDB 2.0, unveiling the Flux language and revamping the whole platform thus inherently putting an end to the TICK stack as we used to know it.

The announcements made in InfluxDays London confirmed the transition.

a – InfluxDB as a single platform

Influx is much more than just InfluxDB. Even if the product is called InfluxDB 2.0, the single UI platform that InfluxData is building is doing way more than that.

Right now, Chronograf is integrated directly into the InfluxDB 2.0 platform. You have a real coexistence of the two products and you don’t have to rely on two different products to visualize data anymore.

InfluxDB 2.0 is stated to be a time series database, a visualization tool (the Chronograf aspect of it), and a query and task engine (tasks will be explained a little bit later).

Having a single platform for everything is interesting.

Developers are often hesitant about installing many different tools as it means that it will create maintenance costs for every single tool in the stack.

With Chronograf merged into InfluxDB 2.0 and Telegraf remaining an entity of its own, data visualization and manipulation are done in a single place.

Opinion

Having a single platform for pretty much everything is definitely interesting.

It is definitely a great UI/UX challenge as it can be hard to design tools that are functionally that ambitious.

However, from the live demonstrations that I saw on Friday (the workshop day), the process seems smooth enough.

InfluxData definitely delivers on the “Time To Awesome” feature. Besides being a great marketing catchword, it is actually reinforced by the predefined templates (for your system, a Docker or a Kubernetes instance) that can create a complete dashboard in seconds.

Links

InfluxDB 2.0 documentation

InfluxDB 2.0 Announcements and Future Ahead – by Paul Dix

b – Emphasis on Flux

As stated before, Flux is a new data scripting and query language built by InfluxData.

Flux comes as a replacement for InfluxQL and provides an API built on the concept of piping, i.e. sequencing operations in order to manipulate data. You can for example send data to third-party services, pivot it, or perform join operations on it.
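
As a small sketch (the bucket name and measurement are hypothetical), a Flux query piping operations together looks like this:

from(bucket: "telegraf/autogen")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "cpu")
  |> mean()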

For those interested, I wrote an article on the difference between Flux and SQL.

But Flux isn’t only a language. It also has a query planner, a query optimizer, as well as a CLI (or a REPL) that one can use in order to directly test Flux functions.

The major evolution here is that Flux is seen as a first-class citizen, and not a subpart of InfluxDB itself.

In a way, Flux should exist on its own.

InfluxData, via its CEO voice Paul Dix, clearly stated that they want Flux to integrate with other systems and that other languages might be transpiled to Flux in the future.

Maybe we could see an efficient transpilation from SQL to Flux in the future? Opinions diverge.

Opinion

For those following the blog, you know that I advocate technologies that are designed to reach the widest possible audience. Languages should not be created for elite programmers especially when they are aimed at data manipulation and analysis.

Flux is definitely designed that way.

With an expressive API, that one can easily understand, it should not be too hard for non-tech people to manipulate it and bring value easily.

Even if Flux is theoretically impressive, and well documented, I believe that its potential will be confirmed by real use cases and popularized by developers becoming way more productive than with other languages.

c – Tasks

In short, tasks are scheduled jobs that one can run in order to manipulate, transform or transfer data.

They can be seen as cron jobs following a very special syntax close to JSON.

The goal here is to have small units that perform tasks on their own.

The task contains Flux scripts and should be composable to perform multiple operations.
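
As an illustration only (the bucket names, the organization, and the schedule are assumptions), a downsampling task could look like this:

option task = {name: "cpu-downsample", every: 1h}

from(bucket: "telegraf/autogen")
  |> range(start: -task.every)
  |> filter(fn: (r) => r._measurement == "cpu")
  |> aggregateWindow(every: 5m, fn: mean)
  |> to(bucket: "downsampled", org: "my-org")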

Administrators are able to create task templates that can be used by all the users afterward. I believe this is a great point when you want to reuse some of the work you have done previously.

Tasks can be managed via the InfluxDB UI, the CLI, and with the new API that InfluxData is building very specifically for tasks.

This API will provide a set of endpoints (such as /create or /delete) for you to easily perform some administrative operations on your tasks.

Tasks are versioned, so you could theoretically go back to a specific version if your new task isn’t satisfying enough.

Finally, you have tasks runs and tasks logs directly in the interface to see when your task was run and if it was run successfully.

What about alerts and notifications?

Alerts and notifications channels are still a very big part of InfluxDB 2.0. Special APIs were created for alerts and notifications and you should theoretically be able to create, modify or delete alert rules easily.

Opinion

In my opinion, the real challenge with tasks is about defining the best practices around them.

Having tasks that perform small operations is interesting for debugging purposes. Also, with runs and logs, you can have feedback on how your actions perform.

However, having small and independent units can lead to a very large pool of tasks to maintain, as well as code duplication probably leading to bug duplication.

There is a big challenge for developers to create an efficient data flow that is not too convoluted. Tasks are probably easy to create, but they should not be seen as the ultimate option to perform every single operation on your database.

For example, are tasks well adapted for data cleansing? Wouldn’t it be more appropriate to perform it before inserting your data into the platform?

Do’s and don’ts will definitely be needed.

d – Giraffe and Clockface

Giraffe and Clockface are two products that were announced at InfluxDays London 2019.

Right now, InfluxDB 2.0 provides a wide panel of visualizations for your metrics.

Graphs, scatter plots, single stats, histograms, and many more are provided by default in the interface.

However, InfluxData wants developers to be able to build their own visualizations.

Following the recent moves made by Grafana, moving from Angular panels to React panels, InfluxData created two libraries that allow developers to build their own panels and share them.

Giraffe

Giraffe is a library that is leveraging React components to provide a complete API for developers to use.

Pretty much like Grafana plugins, we can imagine that InfluxData is going to build a place for people to share their panels.

For now, the library is still in the pre-alpha stage, but I’ll share more info about it as soon as it becomes available to the public.

Clockface

At first, I had some trouble understanding the actual difference between Giraffe and Clockface.

Giraffe is designed for Chronograf-like panels and Clockface is designed to build visualization applications.

In short, it allows you to tweak the existing interface to add actions that are not existing natively in the platform.

Let’s say that you want to create a button on InfluxDB 2.0 that hides a certain panel that you find not that useful.

You would do it using Clockface, creating a button, and performing some Javascript operations to hide the panel.

e – User Packages

User packages are another big part of the next features of InfluxDB 2.0. Pretty much like NPM packages, Influx packages are meant to allow developers to add any kind of logic to the platform without having to perform a pull request.

Proposals were made for interacting with Influx packages, such as a CLI with custom commands:

> influx package init
> influx package publish

You can define types and rules inside a configuration file that is very similar to a package.json file in Node ecosystems.

Warning: Influx user packages are different from Flux packages.

Flux packages are used to import Flux functions that were already coded and that you would want to use in a new Flux script.

II – Recent Industry Collaborations

In this industry, I believe that collaboration is one of the ways to succeed.

Most of the existing products (at least Grafana, InfluxDB, and Prometheus) are working towards improving the interoperability of their respective products.

What do we mean by interoperability?

Even if those tools do not accept the same metrics format and do not deal with the same APIs, there is work in progress on accepting various different formats in InfluxDB.

a – Grafana

Even if InfluxDB 2.0 definitely intersects some of the features of Grafana, Grafana still provides new ways to monitor your InfluxDB metrics.

Both InfluxDB and Grafana are working on data exploration features.

Grafana wants to become an observability platform. It means that Grafana can be used to monitor real-time data, but it also can be used to have a closer look at certain metrics, and even compare them with metrics from other buckets or databases.

b – Prometheus & Flux Transpilation

We know that Prometheus and Flux work quite differently when it comes to data querying.

As a reminder, I explained how Prometheus querying via PromQL worked in my Prometheus guide.

One of the news that was unveiled at InfluxDays London 2021 was the transpiling work done from Prometheus to Flux.

The goal of this project is to have InfluxDB as long-term storage for Prometheus. Prometheus queries would be run against this transpiler to produce Flux code.

Flux code would then be interpreted by the dedicated Flux interpreter and run on the Flux engine.

Julius Volz revealed his advancements on this project, explaining how he designed the data mapper (with metric names as fields, the measurement being ‘Prometheus’, label names being column names, and so on).

He also explained how he used abstract syntax trees to map PromQL functions to a Flux pipeline.

Remark: there are still some concerns regarding the performance of the transpiling process.

InfluxData and Prometheus are working on a way to make joint operations more efficient, thus inducing faster transpiling operations.

III – Real World Use Cases

During InfluxDays London, many of the talks were given by guest speakers. Guests came from very different industries and showcased how they used InfluxDB for their own needs.

This is in my opinion the most interesting part of the convention, as it featured many different use cases that highlighted how InfluxDB is solving problems in the real world.

a – Playtech

This talk was presented by Alex Tavgen, technical architect at Playtech, a popular gambling software development company.

Alex Tavgen unveiled his views on architecture monitoring solutions and provided very concrete examples of how they should be designed.

His talk was centered around the following points :

  • “Managing a zoo” is hard: when developers are free to choose their stacks, it often leads to having teams with many different stacks which leads to obvious maintenance issues;
  • Abusive use of dashboards: it is tempting to record everything, just in case. More is not better in monitoring, and metrics should not be monitored for the sake of it. We need to bring actual sense in monitoring;
  • Are dashboards useless? No, but humans need to know what they have to look for in a dashboard. Dashboarding needs guidance and documentation;
  • Anomaly detection to preserve client trust: you should tell the client that he has an issue, and not the other way around. Many investments were made to improve actual anomaly detection, not just outlier detection. Alex Tavgen explained how he used machine learning to create a predictive engine that raises alerts when it truly matters. (they also plan to open source their Java engine).

b – Worldsensing

This talk was presented by Albert Zaragoza, CTO and Head of Engineering at Worldsensing. Worldsensing is a company that provides monitoring solutions and intelligent components, leveraging IoT and time series databases to build smart cities. They provide intelligent parking solutions, as well as traffic flow solutions, among other products.

Worldsensing produces client-facing applications, providing real-time feedback with extensive use of monitoring dashboards. As part of their infrastructure, Worldsensing uses Kapacitor and Flux in order to provide fast and reliable information to their end clients.

I will provide a link to the video as soon as it is available. However, if you use Postgres databases or if you are interested in providing Spatio-temporal feedback to your clients, you should definitely check what they are doing at Worldsensing.

IV – Community

InfluxDays London 2021 was also a great moment for us to talk and share our experiences with InfluxDB and the other products of its ecosystem.

a – InfluxAces

InfluxAces share their knowledge in a variety of ways, online and offline, through blogging, podcasting, attending meetups, answering community questions, building courses.. the list goes on!

And I am part of them now! Thank you!

As part of the InfluxAces program, I want to encourage creators and builders to submit their original articles to us, as long as they are related to the subjects detailed in this article and, of course, to InfluxDB. On my side, I will keep on writing about those technologies because I believe that they will shape the world of tomorrow.

b – Community Links

Meetups and conventions are great to connect with people. But they are not the only way to do it.

If you are interested in InfluxDB, you should:

  • Join the Slack community available here.
  • Ask your questions in the community forum (and have your questions answered by MarcV)
  • Tweet to or about InfluxDB on Twitter.

A Closing Word

On a technological level, it was truly inspiring to see how fast those technologies change and evolve in such a short amount of time.

The progress made so far is huge and it looks very promising for the future.

At InfluxDays London, I was lucky to be able to connect with passionate and humble people from all over the world. I was able to share my views on the future and to listen to what the others had to say about it.

Sure, I learned more about InfluxDB, but I learned even more about the people behind the scenes, whether they are working for InfluxData or not.

Engineers, developers, managers, or just curious individuals, all share the same ambition to create, learn and give back to the community.

The open-source community is definitely an amazing community.

See you next year!

How To Install an Elasticsearch Cluster on Ubuntu 18.04

Elasticsearch is a platform for distributed search and data analysis in real-time. It offers a multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents with a simple installation.

With Elasticsearch, you can execute and combine various types of searches. Together with tools like Kibana, Logstash, and X-Pack, Elasticsearch can collect and monitor Big Data at a large scale. The cluster we build here includes three data nodes, which lets us avoid a split-brain and keep a quorum of master-eligible nodes.

In this free and ultimate tutorial, we are going to learn how to install and configure a 3 Node Elasticsearch Cluster on Ubuntu 18.04, and go through some API examples on creating indexes, ingesting documents, searches, and so on.

What is ElasticSearch?

ElasticSearch is a highly scalable, open-source analytics and RESTful search engine built on top of Apache Lucene and issued under an Apache license. It is one of the most popular search engines and is generally used for full-text search, log analytics, security intelligence, and business analytics, analyzing big volumes of data in near real-time. Also, Elasticsearch is Java-based and can search and index document files in different formats.

Features of ElasticSearch

Before we get to the main topic, let’s cover some basics about Elasticsearch from below: 

Basic Concepts of Elasticsearch

  • An Elasticsearch Cluster is made up of a number of nodes;
  • Each Node includes Indexes, where an Index is a Collection of Documents;
  • Master nodes are responsible for cluster-related tasks: creating/deleting indexes, tracking nodes, and allocating shards to nodes;
  • Data nodes are responsible for hosting the actual shards that contain the indexed data, and handle data-related operations like CRUD, search, and aggregations;
  • Indexes are split into Multiple Shards;
  • Shards consist of Primary Shards and Replica Shards;
  • A Replica Shard is a Copy of a Primary Shard that is used for HA/Redundancy;
  • Shards get placed on random nodes throughout the cluster;
  • A Replica Shard will NEVER be on the same node as its associated Primary Shard.

Representation of Nodes, Index and Shards on 2 Nodes (as an example)

Note on Master Elections

The minimum number of master-eligible nodes that need to join a newly elected master in order for an election to complete is configured via the setting discovery.zen.minimum_master_nodes. This setting is very important, as it makes each master-eligible node aware of the minimum number of master-eligible nodes that must be visible in order to form a cluster.

Without this setting, or with an incorrect value, a split-brain can occur: if something goes wrong and nodes rejoin the cluster, they may form two different clusters, which we want to avoid at all costs.

According to the Elasticsearch documentation, to avoid a split-brain this setting should be set to a quorum of master-eligible nodes via the following formula:

(master_eligible_nodes / 2) + 1
# in our case:
(3/2) + 1 = 2

It is advised to avoid having only two master-eligible nodes, since a quorum of two is two: losing either node would leave the cluster unable to elect a master. To read more on the Elasticsearch master election process, take a look at their documentation.

Prerequisites

We have to add the internal IP addresses of our nodes to either our hosts file or a DNS server. To keep it easy and straightforward, I will add them to the hosts file. This needs to be applied to all three nodes:

$ sudo su - 
$ cat > /etc/hosts << EOF
127.0.0.1 localhost
172.31.0.77 es-node-1
172.31.0.45 es-node-2
172.31.0.48 es-node-3
EOF

Now that our host entries are set, we can start with the fun stuff.

Installing Elasticsearch on Ubuntu

The following instructions should be applied to all three nodes.

Get the Elasticsearch repositories and update your system so that your servers are aware of the newly added Elasticsearch repository:

$ apt update && apt upgrade -y
$ apt install software-properties-common python-software-properties apt-transport-https -y
$ wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
$ echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-6.x.list
$ apt update

Elasticsearch relies on Java, so install the java development kit:

$ apt install default-jdk -y

Verify that java is installed:

$ java -version
openjdk version "11.0.3" 2019-04-16
OpenJDK Runtime Environment (build 11.0.3+7-Ubuntu-1ubuntu218.04.1)
OpenJDK 64-Bit Server VM (build 11.0.3+7-Ubuntu-1ubuntu218.04.1, mixed mode, sharing)

Install Elasticsearch:

$ apt install elasticsearch -y

Once Elasticsearch is installed, repeat these steps on the remaining nodes. Once that is done, move on to the configuration section.

Configuring Elasticsearch

For nodes to join the same cluster, they should all share the same cluster name.

We also need to specify the discovery hosts so that the nodes can discover each other. Since we are installing a 3 node cluster, all nodes will act as both master-eligible and data nodes.

Feel free to inspect the Elasticsearch cluster configuration, but I will be overwriting the default configuration with the config that I need.

Make sure to apply the configuration on all three nodes:

$ cat > /etc/elasticsearch/elasticsearch.yml << EOF
cluster.name: es-cluster
node.name: \${HOSTNAME}
node.master: true
node.data: true
path.logs: /var/log/elasticsearch
path.data: /usr/share/elasticsearch/data
bootstrap.memory_lock: true
network.host: 0.0.0.0
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.unicast.hosts: ["es-node-1", "es-node-2"]
EOF

Important settings for your Elasticsearch cluster are described in their docs:

  • Disable swapping
  • Increase file descriptors
  • Ensure sufficient virtual memory
  • Ensure sufficient threads
  • JVM DNS cache settings
  • Temporary directory not mounted with noexec

Increase the file descriptors on the nodes, as instructed by the documentation:

$ cat > /etc/default/elasticsearch << EOF
ES_STARTUP_SLEEP_TIME=5
MAX_OPEN_FILES=65536
MAX_LOCKED_MEMORY=unlimited
EOF

Ensure that memory is not swapped out to disk by requesting the JVM to lock the heap in memory; this is done by setting LimitMEMLOCK=infinity in the systemd unit.

Set the maximum file descriptor number for this process: LimitNOFILE and increase the number of threads using LimitNPROC:

$ vim /usr/lib/systemd/system/elasticsearch.service
[Service]
LimitMEMLOCK=infinity
LimitNOFILE=65535
LimitNPROC=4096
...

Also allow the elasticsearch user to lock an unlimited amount of memory by setting the memlock limits in /etc/security/limits.conf:

$ cat > /etc/security/limits.conf << EOF
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited
EOF

Increase the value of the max map count, as Elasticsearch uses an mmapfs directory to store its indices:

$ sysctl -w vm.max_map_count=262144

For a permanent setting, update vm.max_map_count in /etc/sysctl.conf and run:

$ sysctl -p /etc/sysctl.conf
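
If you prefer a one-liner to append the entry (assuming it is not already present in /etc/sysctl.conf), something like this should do before re-running the command above:

$ echo "vm.max_map_count=262144" >> /etc/sysctl.conf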

Change the permissions of the Elasticsearch data path, so that the elasticsearch user and group have permission to read and write from the configured path:

$ chown -R elasticsearch:elasticsearch /usr/share/elasticsearch

Make sure that you have applied these steps to all the nodes before continuing.

Start Elasticsearch

Enable Elasticsearch on boot time and start the Elasticsearch service:

$ systemctl enable elasticsearch
$ systemctl start elasticsearch

Verify that Elasticsearch is running:

$ netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp6       0      0 :::9200                 :::*                    LISTEN      278/java
tcp6       0      0 :::9300                 :::*                    LISTEN      278/java

Using Elasticsearch Restful API

In this section we will get comfortable with using Elasticsearch API, by covering the following examples:

  • Cluster Overview;
  • How to view Node, Index, and Shard information;
  • How to Ingest Data into Elasticsearch;
  • How to Search data in Elasticsearch;
  • How to delete your Index

View Cluster Health

From any node, use an HTTP client such as curl to investigate the current health of the cluster by looking at the cluster API:

$ curl -XGET http://localhost:9200/_cluster/health?pretty
{
  "cluster_name" : "es-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

As you can see the cluster status is Green, which means everything works as expected.

In Elasticsearch you get Green, Yellow and Red statuses. Yellow would essentially mean that one or more replica shards are in an unassigned state. Red status means that some or all primary shards are unassigned which is really bad.

From this output, we can also see the number of data nodes, primary shards, unassigned shards, etc.

This is a good place to get an overall view of your Elasticsearch cluster’s health.
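
If you script against this API, you can also ask it to block until a given status is reached, which is handy right after a restart. A minimal example, using the standard wait_for_status and timeout parameters:

$ curl -XGET 'http://localhost:9200/_cluster/health?wait_for_status=green&timeout=30s&pretty'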

View the Number of Nodes in your Cluster

By looking at the /_cat/nodes API, we can get information about the nodes that are part of our cluster:

$ curl -XGET http://localhost:9200/_cat/nodes?v
ip             heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
172.31.0.77              10          95   0    0.00    0.00     0.00 mdi       -      es-node-1
172.31.0.45              11          94   0    0.00    0.00     0.00 mdi       -      es-node-2
172.31.0.48              25          96   0    0.07    0.02     0.00 mdi       *      es-node-3

As you can see, we can see information about our nodes such as the JVM Heap, CPU, Load averages, node role of each node, and which node is master.

As we are not running dedicated masters, we can see that node es-node-3 got elected as master.

Create your first Elasticsearch Index

Note that when you create an index, the default primary shard count is set to 5 and the default replica count is set to 1. You can change the replica count after an index has been created, but not the primary shard count, as that has to be set at index creation.
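
If the defaults do not suit you, you can pass explicit settings in the request body at creation time. A minimal sketch, using a hypothetical index name:

$ curl -XPUT -H 'Content-Type: application/json' \
http://localhost:9200/mycustomindex -d '
{"settings": {"number_of_shards": 3, "number_of_replicas": 2}}'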

Let’s create an Elasticsearch index named myfirstindex:

$ curl -XPUT http://localhost:9200/myfirstindex
{"acknowledged":true,"shards_acknowledged":true,"index":"myfirstindex"}

Now that your index has been created, let’s have a look at the /_cat/indices API to get information about our indices:

$ curl -XGET http://localhost:9200/_cat/indices?v
health status index        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   myfirstindex xSX9nOQJQ2qNIq4A6_0bTw   5   1          0            0      1.2kb           650b

From the output you will find that we have 5 primary shards and 1 replica shard, with 0 documents in our index and that our cluster is in a green state, meaning that our primary and replica shards have been assigned to the nodes in our cluster.

Note that a replica shard will NEVER reside on the same node as the primary shard for HA and Redundancy.

Let’s go a bit deeper and have a look at the shards, to see how our shards are distributed through our cluster, using the /_cat/shards API:

$ curl -XGET http://localhost:9200/_cat/shards?v
index        shard prirep state   docs store ip             node
myfirstindex 1     r      STARTED    0  230b 172.31.0.77    es-node-2
myfirstindex 1     p      STARTED    0  230b 172.31.0.48    es-node-3
myfirstindex 4     p      STARTED    0  230b 172.31.0.48    es-node-3
myfirstindex 4     r      STARTED    0  230b 172.31.0.77    es-node-1
myfirstindex 2     r      STARTED    0  230b 172.31.0.45    es-node-2
myfirstindex 2     p      STARTED    0  230b 172.31.0.77    es-node-1
myfirstindex 3     p      STARTED    0  230b 172.31.0.45    es-node-2
myfirstindex 3     r      STARTED    0  230b 172.31.0.48    es-node-3
myfirstindex 0     p      STARTED    0  230b 172.31.0.45    es-node-2
myfirstindex 0     r      STARTED    0  230b 172.31.0.77    es-node-1

As you can see, each replica shard is placed on a different node than its primary.

Replicating a Yellow Cluster Status

As mentioned, a yellow cluster status means that one or more replica shards are in an unassigned state.

So let’s replicate that behavior by scaling our replica count to 3, which would mean that 5 replica shards will be in an unassigned state:

$ curl -XPUT -H 'Content-Type: application/json' \
http://localhost:9200/myfirstindex/_settings -d \
'{"number_of_replicas": 3}'

Now we have scaled the replica count to 3, but since we only have 3 nodes, we will have a yellow state cluster:

$ curl -XGET http://localhost:9200/_cat/indices?v
health status index        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   myfirstindex xSX9nOQJQ2qNIq4A6_0bTw   5   3          0            0      3.3kb           1.1kb

The cluster health API should now report unassigned shards, and we can verify that by looking at the shards API again:

$ curl -XGET http://localhost:9200/_cat/shards?v
index        shard prirep state      docs store ip             node
myfirstindex 1     r      STARTED       0  230b 172.31.0.45    es-node-2
myfirstindex 1     p      STARTED       0  230b 172.31.0.48    es-node-3
myfirstindex 1     r      STARTED       0  230b 172.31.0.77    es-node-1
myfirstindex 1     r      UNASSIGNED
myfirstindex 4     r      STARTED       0  230b 172.31.0.45    es-node-2
myfirstindex 4     p      STARTED       0  230b 172.31.0.48    es-node-3
myfirstindex 4     r      STARTED       0  230b 172.31.0.77    es-node-1
myfirstindex 4     r      UNASSIGNED
myfirstindex 2     r      STARTED       0  230b 172.31.0.45    es-node-2
myfirstindex 2     r      STARTED       0  230b 172.31.0.48    es-node-3
myfirstindex 2     p      STARTED       0  230b 172.31.0.77    es-node-1
myfirstindex 2     r      UNASSIGNED
myfirstindex 3     p      STARTED       0  230b 172.31.0.45    es-node-2
myfirstindex 3     r      STARTED       0  230b 172.31.0.48    es-node-3
myfirstindex 3     r      STARTED       0  230b 172.31.0.77    es-node-1
myfirstindex 3     r      UNASSIGNED
myfirstindex 0     p      STARTED       0  230b 172.31.0.45    es-node-2
myfirstindex 0     r      STARTED       0  230b 172.31.0.48    es-node-3
myfirstindex 0     r      STARTED       0  230b 172.31.0.77    es-node-1
myfirstindex 0     r      UNASSIGNED

At this point in time, we could either add another node to the cluster or scale the replication factor back to 1 to get the cluster health to green again.

I will scale it back down to a replication factor of 1:

$ curl -XPUT -H 'Content-Type: application/json' http://localhost:9200/myfirstindex/_settings -d '{"number_of_replicas": 1}'

Ingest Data into Elasticsearch

We will ingest 3 documents into our index; each will be a simple document consisting of a name, a country, and a gender, for example:

{ 
  "name": "james", 
  "country": "south africa", 
  "gender": "male"
}

First, we will ingest a document using the PUT HTTP method; when using PUT, we need to specify the document ID ourselves.

PUT methods will be used to create or update a document. For creating:

$ curl -XPUT -H 'Content-Type: application/json' \
http://localhost:9200/myfirstindex/_doc/1 -d '
{"name":"james", "country":"south africa", "gender": "male"}'

Now you will find we have one index in our cluster:

$ curl -XGET http://localhost:9200/_cat/indices?v
health status index        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   myfirstindex xSX9nOQJQ2qNIq4A6_0bTw   5   1          1            0     11.3kb          5.6kb

Since we know that the document ID is “1”, we can do a GET on the document ID to read the document from the index. Note that the type in the URL has to match the type used at ingestion; since we indexed under _doc, a GET against another type such as people returns found: false:

$ curl -XGET http://localhost:9200/myfirstindex/people/1?pretty
{
  "_index" : "myfirstindex",
  "_type" : "people",
  "_id" : "1",
  "found" : false
}
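
To actually retrieve the document, issue the GET against the _doc type that was used at ingestion:

$ curl -XGET http://localhost:9200/myfirstindex/_doc/1?pretty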

If we ingest documents with a POST request, Elasticsearch generates the document ID for us automatically. Let’s create 2 documents:

$ curl -XPOST -H 'Content-Type: application/json' \
http://localhost:9200/myfirstindex/_doc/ -d '
{"name": "kevin", "country": "new zealand", "gender": "male"}'

$ curl -XPOST -H 'Content-Type: application/json' \
http://localhost:9200/myfirstindex/_doc/ -d '
{"name": "sarah", "country": "ireland", "gender": "female"}'

When we have a look again at our index, we can see that we now have 3 documents in our index:

$ curl -XGET http://localhost:9200/_cat/indices?v
health status index        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   myfirstindex xSX9nOQJQ2qNIq4A6_0bTw   5   1          3            0       29kb         14.5kb

Search Queries

Now that we have 3 documents in our elasticsearch index, let’s explore the search APIs to get data from our index. First, let’s search for the keyword “sarah” as a source query parameter:

$ curl -XGET 'http://localhost:9200/myfirstindex/_search?q=sarah&pretty'
{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "myfirstindex",
        "_type" : "_doc",
        "_id" : "cvU96GsBP0-G8XdN24s4",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "sarah",
          "country" : "ireland",
          "gender" : "female"
        }
      }
    ]
  }
}

We can also narrow our search query down to a specific field, for example, show me all the documents with the name kevin:

$ curl -XGET 'http://localhost:9200/myfirstindex/_search?q=name:kevin&pretty'
{
...
  "hits" : {
    "total" : 1,
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "myfirstindex",
        "_type" : "_doc",
        "_id" : "gPU96GsBP0-G8XdNHoru",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "kevin",
          "country" : "new zealand",
          "gender" : "male"
        }
      }
    ]
  }
}

With Elasticsearch we can also search with our query in the request body, a similar query as above would look like this:

$ curl -XPOST -H 'Content-Type: application/json' \
'http://localhost:9200/myfirstindex/_search?pretty' -d '
{
  "query": {
    "match": {
      "name": "kevin"
    }
  }
}'

{
...
        "_index" : "myfirstindex",
        "_source" : {
          "name" : "kevin",
          "country" : "new zealand",
          "gender" : "male"
        }
...
}

We can use wildcard queries:

$ curl -XPOST -H 'Content-Type: application/json' \
'http://172.31.0.77:9200/myfirstindex/_search?pretty' -d '
{
  "query": {
    "wildcard": {
      "country": "*land"
    }
  }
}'

{
...
    "hits" : [
      {
        "_index" : "myfirstindex",
        "_type" : "_doc",
        "_id" : "cvU96GsBP0-G8XdN24s4",
        "_score" : 1.0,
        "_source" : {
          "name" : "sarah",
          "country" : "ireland",
          "gender" : "female"
        }
      },
      {
        "_index" : "myfirstindex",
        "_type" : "_doc",
        "_id" : "gPU96GsBP0-G8XdNHoru",
        "_score" : 1.0,
        "_source" : {
          "name" : "kevin",
          "country" : "new zealand",
          "gender" : "male"
        }
      }
    ]
...
}

Have a look at their documentation for more information on the Search API

Delete your Index

To wrap this up, we will go ahead and delete our index:

$ curl -XDELETE http://localhost:9200/myfirstindex
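
You can confirm that the index is gone by listing the indices one more time; myfirstindex should no longer appear:

$ curl -XGET http://localhost:9200/_cat/indices?v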

Going Further

If this got you curious, then definitely have a look at this Elasticsearch Cheatsheet that I’ve put together and if you want to generate lots of data to ingest to your elasticsearch cluster, have a look at this python script.

Monitoring Linux Logs with Kibana and Rsyslog

Monitoring Linux Logs with Kibana and Rsyslog | Using Kibana and Rsyslog to monitor Linux logs

System administrators and enthusiastic application developers can dig deep into this tutorial and acquire the complete details about monitoring Linux logs with Kibana and Rsyslog, along with how to build a monitoring pipeline to analyze Linux logs with ELK 7.2 and Rsyslog.

Maybe you want to monitor SSH intrusions on your VMs, observe which errors were raised by your application server on a specific day at a specific hour, or check who stopped your systemd service on one of your VMs. If you have asked yourself even one of these questions while working with Linux logs, then you are on the right page.

So, concentrate on this entire tutorial where you will get to know how to build a complete log monitoring pipeline using the ELK stack (ElasticSearch, Logstash, and Kibana) and Rsyslog as a powerful Syslog server.

Before diving into the main concepts and technical considerations, let's talk about why we want to monitor Linux logs with Kibana in the first place.

Why should you monitor Linux logs?

Monitoring Linux logs is crucial and every DevOps engineer should know how to do it. Here’s why :

  • You have real-time visual feedback about your logs: probably one of the key aspects of log monitoring, you can build meaningful visualizations (such as data tables, pies, graphs, or aggregated bar charts) to give some meaning to your logs.
  • You are able to aggregate information to build advanced and more complex dashboards: sometimes raw information is not enough, you may want to join it with other logs or to compare it with other logs to identify a trend. A visualization platform with expression handling lets you perform that.
  • You can quickly filter for a certain term, or given a certain time period: if you are only interested in SSH logs, you can build a targeted dashboard for it.
  • Logs are navigable in a quick and elegant way: I know the pain of tailing and grepping your logs files endlessly. I’d rather have a platform for it.

What You Will Learn?

There are many things that you are going to learn if you follow this tutorial:

  • How logs are handled on a Linux system (Ubuntu or Debian) and what rsyslog is.
  • How to install the ELK stack (ElasticSearch 7.2, Logstash, and Kibana) and what those tools will be used for.
  • How to configure rsyslog to forward logs to Logstash
  • How to configure Logstash for log ingestion and ElasticSearch storage.
  • How to play with Kibana to build our final visualization dashboard.

The prerequisites for this tutorial are as follows :

  • You have a Linux system with rsyslog installed. You either have a standalone machine with rsyslog, or a centralized logging system.
  • You have administrator rights or you have enough rights to install new packages on your Linux system.

Without further ado, let's jump into it!

What does a log monitoring architecture look like?

a – Key concepts of Linux logging

Before detailing what our log monitoring architecture looks like, let's go back in time for a second.

Historically, Linux logging starts with syslog.

Syslog is a protocol developed in 1980 which aims at standardizing the way logs are formatted, not only for Linux but for any system exchanging logs.

From there, syslog servers were developed and were embedded with the capability of handling syslog messages.

Do Check: Syslog: The Complete System Administrator Guide

They rapidly evolved to provide functionalities such as filtering, content routing, and probably one of the key features of such servers: storing and rotating logs.

Rsyslog was developed keeping this key functionality in mind: having a modular and customizable way to handle logs.

The modularity would be handled with modules and the customization with log templates.

In a way, rsyslog can ingest logs from many different sources and it can forward them to an even wider set of destinations. This is what we are going to use in our tutorial.

b – Building a log monitoring architecture

Here’s the final architecture that we are going to use for this tutorial.

  • rsyslog: used as an advanced syslog server, rsyslog will forward logs to Logstash as JSON messages built from the syslog fields, using the template shown later in this article.
  • Logstash: part of the ELK stack, Logstash will transform logs from the syslog format to JSON. As a reminder, ElasticSearch takes JSON as an input.
  • ElasticSearch: the famous search engine will store logs in a dedicated log index (logstash-*). ElasticSearch will naturally index the logs and make them available for analysis.
  • Kibana: used as an exploration and visualization platform, Kibana will host our final dashboard.

Now that we know in which direction we are heading, let’s install the different tools needed.

Installing The Different Tools

a – Installing Java on Ubuntu

Before installing the ELK stack, you need to install Java on your computer.

To do so, run the following command:

$ sudo apt-get install default-jre

At the time of this tutorial, this instance runs the OpenJDK version 11.

ubuntu:~$ java -version
openjdk version "11.0.3" 2019-04-16
OpenJDK Runtime Environment (build 11.0.3+7-Ubuntu-1ubuntu218.04.1)
OpenJDK 64-Bit Server VM (build 11.0.3+7-Ubuntu-1ubuntu218.04.1, mixed mode, sharing)

b – Adding Elastic packages to your instance

For this tutorial, I am going to use a Ubuntu machine but details will be given for Debian ones.

First, add the GPG key to your APT repository.

$ wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

Then, you can add Elastic source to your APT source list file.

$ echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list

$ cat /etc/apt/sources.list.d/elastic-7.x.list
deb https://artifacts.elastic.co/packages/7.x/apt stable main

$ sudo apt-get update

From there, you should be ready to install every tool in the ELK stack.

Let’s start with ElasticSearch.

c – Installing ElasticSearch

ElasticSearch is a search engine built by Elastic that stores data in indexes for very fast retrieval.

To install it, run the following command:

$ sudo apt-get install elasticsearch

This command will automatically:

  • Download the deb package for ElasticSearch;
  • Create an elasticsearch user;
  • Create an elasticsearch group;
  • Automatically create a systemd service fully configured (inactive by default)

At first, the service is inactive; start it and check its status to make sure that everything is running smoothly.

$ sudo systemctl start elasticsearch
$ sudo systemctl status elasticsearch
● elasticsearch.service - Elasticsearch
   Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; disabled; vendor preset: enabled)
   Active: active (running) since Mon 2019-07-08 18:19:45 UTC; 2 days ago
     Docs: http://www.elastic.co

In order to make sure that ElasticSearch is actually running, you can execute one of those two commands:

  • Watching which applications listen on a targeted port
$ sudo lsof -i -P -n | grep LISTEN | grep 9200
java      10667   elasticsearch  212u  IPv6 1159208890      0t0  TCP [::1]:9200 (LISTEN)
java      10667   elasticsearch  213u  IPv6 1159208891      0t0  TCP 127.0.0.1:9200 (LISTEN)
  • Executing a simple ElasticSearch query
$ curl -XGET 'http://localhost:9200/_all/_search?q=*&pretty'

Your ElasticSearch instance is all set!

Now, let’s install Logstash as our log collection and filtering tool.

Let’s gain some more knowledge on Elasticsearch by clicking here: How To Install an Elasticsearch Cluster on Ubuntu 18.04

d – Installing Logstash

If you added Elastic packages previously, installing Logstash is as simple as executing:

$ sudo apt-get install logstash

Again, a Logstash service will be created, and you need to activate it.

$ sudo systemctl status logstash
$ sudo systemctl start logstash

By default, Logstash listens for metrics on port 9600. As we did before, list the open ports on your computer looking for that specific port.

$ sudo lsof -i -P -n | grep LISTEN | grep 9600
java      28872        logstash   79u  IPv6 1160098941      0t0  TCP 127.0.0.1:9600 (LISTEN)

Great!

We only need to install Kibana for our entire setup to be complete.

e – Installing Kibana

As a reminder, Kibana is the visualization tool tailored for ElasticSearch and used to monitor our final logs.

Not very surprising, but here’s the command to install Kibana:

$ sudo apt-get install kibana

As usual, start the service and verify that it is working properly.

$ sudo systemctl start kibana
$ sudo lsof -i -P -n | grep LISTEN | grep 5601
node       7253          kibana   18u  IPv4 1159451844      0t0  TCP *:5601 (LISTEN)

Kibana Web UI is available on port 5601.

Head over to http://localhost:5601 with your browser and you should see the following screen.

Nice!

We are now ready to ingest logs from rsyslog and start visualizing them in Kibana.

Routing Linux Logs to ElasticSearch

As a reminder, we are routing logs from rsyslog to Logstash and those logs will be transferred to ElasticSearch pretty much automatically.

a – Routing from Logstash to ElasticSearch

Before routing logs from rsyslog to Logstash, it is very important that we set up log forwarding between Logstash and ElasticSearch.

To do so, we are going to create a configuration file for Logstash and tell it exactly what to do.

To create Logstash configuration files, head over to /etc/logstash/conf.d and create a logstash.conf file.

Inside, append the following content:

input {
  udp {
    host => "127.0.0.1"
    port => 10514
    codec => "json"
    type => "rsyslog"
  }
}

# The filter pipeline stays empty here, no formatting is done.
filter { }

# Every single log will be forwarded to ElasticSearch.
# If you are using another port, you should specify it here.
output {
  if [type] == "rsyslog" {
    elasticsearch {
      hosts => [ "127.0.0.1:9200" ]
    }
  }
}

Note: For this tutorial, we are using the UDP input for Logstash, but if you are looking for a more reliable way to transfer your logs, you should probably use the TCP input. The format is pretty much the same, just change the UDP line to TCP.
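
For reference, a minimal sketch of that TCP variant, keeping the same host, port, and codec as above, could look like this:

input {
  tcp {
    host => "127.0.0.1"
    port => 10514
    codec => "json"
    type => "rsyslog"
  }
}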

Restart your Logstash service.

$ sudo systemctl restart logstash

To verify that everything is running correctly, issue the following command:

$ netstat -na | grep 10514
udp        0      0 127.0.0.1:10514         0.0.0.0:*

Great!

Logstash is now listening on port 10514.

b – Routing from rsyslog to Logstash

As described before, rsyslog has a set of different modules that allow it to transfer incoming logs to a wide set of destinations.

Rsyslog has the capacity to transform logs using templates. This is exactly what we are looking for as ElasticSearch expects JSON as an input, and not syslog RFC 5424 strings.

In order to forward logs in rsyslog, head over to /etc/rsyslog.d and create a new file named 70-output.conf.

Inside your file, write the following content:

# This line sends all lines to defined IP address at port 10514
# using the json-template format.

*.*                         @127.0.0.1:10514;json-template
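
A single @ in front of the address tells rsyslog to forward over UDP. If you switched the Logstash input to TCP as mentioned in the note above, the equivalent rule uses a double @@:

# TCP variant of the same forwarding rule
*.*                         @@127.0.0.1:10514;json-template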

Now that you have log forwarding, create a 01-json-template.conf file in the same folder, and paste the following content:

template(name="json-template"
  type="list") {
    constant(value="{")
      constant(value="\"@timestamp\":\"")     property(name="timereported" dateFormat="rfc3339")
      constant(value="\",\"@version\":\"1")
      constant(value="\",\"message\":\"")     property(name="msg" format="json")
      constant(value="\",\"sysloghost\":\"")  property(name="hostname")
      constant(value="\",\"severity\":\"")    property(name="syslogseverity-text")
      constant(value="\",\"facility\":\"")    property(name="syslogfacility-text")
      constant(value="\",\"programname\":\"") property(name="programname")
      constant(value="\",\"procid\":\"")      property(name="procid")
    constant(value="\"}\n")
}

As you probably guessed, for every incoming message, rsyslog will interpolate log properties into a JSON formatted message, and forward it to Logstash, listening on port 10514.

Restart your rsyslog service, and verify that logs are correctly forwarded to ElasticSearch.

Note: logs will be forwarded in an index called logstash-*.

$ sudo systemctl restart rsyslog
$ curl -XGET 'http://localhost:9200/logstash-*/_search?q=*&pretty'
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10000,
      "relation": "gte"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "logstash-2019.07.08-000001",
        "_type": "_doc",
        "_id": "GEBK1WsBQwXNQFYwP8D_",
        "_score": 1,
        "_source": {
          "host": "127.0.0.1",
          "severity": "info",
          "programname": "memory_usage",
          "facility": "user",
          "@timestamp": "2019-07-09T05:52:21.402Z",
          "sysloghost": "schkn-ubuntu",
          "message": "                                  Dload  Upload   Total   Spent    Left  Speed",
          "@version": "1",
          "procid": "16780",
          "type": "rsyslog"
        }
      }
    ]
  }
}

Awesome! We now have rsyslog logs directly stored in ElasticSearch.

It is time for us to build our final dashboard in Kibana.

Building a Log Dashboard in Kibana

This is where the fun begins.

We are going to build the dashboard shown in the first part and give meaning to the data we collected.

Similar to our article on Linux process monitoring, this part is split according to the different panels of the final dashboard, so feel free to jump to the section you are interested in.

a – A Few Words On Kibana

Head over to Kibana (on http://localhost:5601), and you should see the following screen.

If it is your first time using Kibana, there is one little gotcha that I want to talk about that took me some time to understand.

In order to create a dashboard, you will need to build visualizations. Kibana has two panels for this, one called “Visualize” and another called “Dashboard”.

In order to create your dashboard, you will first create every individual visualization with the Visualize panel and save them.

When all of them are created, you will import them one by one into your final dashboard.

Head over to the “Visualize” panel, and let’s start with one first panel.

b – Aggregated bar chart for processes

To build your first visualization, click on “Create new visualization” at the top right corner of Kibana and choose a vertical bar panel.

The main goal is to build a panel that looks like this:

As you can see, the bar chart provides a total count of logs per process, in an aggregated way.

The bar chart can also be split by host if you are working with multiple hosts.

Without further ado, here’s the cheatsheet for this panel.

c – Pie by program name

Very similarly to what we have done before, the goal is to build a pie panel that divides the log proportions by program name.

Here is the cheatsheet for this panel!

d – Pie by severity

This panel looks exactly like the one we did before, except that it splits logs by severity.

It can be quite useful when you have a major outage on one of your systems, and you want to quickly see that the number of errors is increasing very fast.

It also provides an easy way to see a summary of your log severities over a given period, for instance if you want to understand which severities occurred during the night or around particular events.

Again as you are probably waiting for it, here’s the cheatsheet for this panel!

e – Monitoring SSH entries

This one is a little bit special, as you can directly go to the “Discover” tab in order to build your panel.

When entering the Discover tab, your “logstash-*” index pattern should be automatically selected.

From there, in the filter bar, type the following filter “programname : ssh*”.

As you can see, you now have direct access to every log related to the SSHd service on your machine. You can for example track illegal access attempts or wrong logins.
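
For example, to narrow the results down to failed authentication attempts, you could combine the program filter with a message search. This assumes the usual sshd log wording, so adjust it to what your distribution actually logs:

programname : ssh* and message : "Failed password"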

In order for it to be accessible in the dashboard panel, click on the “Save” option, and give a name to your panel.

Now in the dashboard panel, you can click on “Add”, and choose the panel you just created.

Nice! Now the panel built from the Discover tab is included in your dashboard.

Conclusion

With this tutorial, you now have a better understanding of how you can monitor your entire logging infrastructure easily with Rsyslog and the ELK stack.

With the architecture presented in this article, you can scale the log monitoring of an entire cluster very easily by forwarding logs to your central server.

One piece of advice would be to use a Docker image for your rsyslog and ELK stack in order to be able to scale your centralized part (with Kubernetes for example) if the number of logs increases too much.

It is also important to note that this architecture is ideal if you choose to change the way you monitor logs in the future.

You can still rely on rsyslog for log centralizing, but you are free to change either the gateway (Logstash in this case), or the visualization tool.

You could, for example, use Grafana to monitor your Elasticsearch logs very easily.

With this tutorial, will you start using this architecture in your own infrastructure? Do you think that other panels would be relevant for you to debug major outages on your systems?

If you have ideas, make sure to leave them below, so that they can help other engineers.

Until then, have fun, as always.

How To List Docker Images | What is Docker Image | docker image ls command

In the Docker ecosystem, Docker images are one of the biggest building blocks. They are mainly used to specify the instructions to be executed in your containers.

After creating a container from a specific Docker image, the image is downloaded and stored locally on your Docker host. However, if you build various environments or multiple tools, the list of Docker images will grow fast. As a result, you need commands to list your Docker images effortlessly.

Here, in this tutorial, we will be discussing how to list Docker images using Docker commands along with other essentials.

What is a Docker Image?

Basically, Docker images are the base layers from which containers are created; in order to start a container, you need an image. On the Docker Hub, you will find plenty of pre-built images, and you can also build your own custom image using a Dockerfile and the “docker build” command.

Prerequisites

The main requirement to try out the following examples is a working Docker installation. In case you don't have Docker installed on your computer, have a look at one of our articles on Docker installation.

List Docker Images with Command

The easiest way to list Docker images is to use the “docker images” with no arguments.

When using this command, you will be presented with the complete list of Docker images on your system.

$ docker images

Alternatively, you can use the “docker image” command with the “ls” argument.

$ docker image ls

Note that you will have to make sure that you have written “image” and not “images”.

As an example, let’s say that you want to list Docker images on your current Windows operating system.

To achieve that, you would run the following command

$ docker images

Congratulations, you successfully listed Docker images on your system!

Using those commands, you will be presented with all the results, but what if you want to restrict your results to specific words?

Filtering Docker Image List

In order to filter the Docker image list results, you need to use the “docker images” command with the “--filter” option, followed by the filter in the form of a key-value tuple.

$ docker images --filter "<key>=<value>"

With the “--filter” option, you can specify the following keys:

  • “reference”: to isolate images having a certain name or tag;
  • “before”: to filter images created before a specific point in time;
  • “since”: to filter images created since a specific point in time (usually another image creation);
  • “label”: if you used the LABEL instruction to create metadata for your image, you can filter on it later with this key (see the example right after this list);
  • “dangling”: to isolate images that are not used anymore.
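
For instance, filtering on a label might look like the command below. The label key and value here are hypothetical; use whatever you defined with the LABEL instruction in your Dockerfile:

$ docker images --filter "label=maintainer=john"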

As a short example, let’s say that you want to filter your image list given the name of your image.

As a reminder, we had an “Ubuntu” and a “Debian” image sitting on our Docker machine.

If we want to isolate all the images starting with “Deb”, we would run the following command.

$ docker images --filter "reference=deb*"

Similarly, if you are working with multiple images of the same repository, you can add an additional tag separated by a colon character.

$ docker images

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
debian              8                   00b72214a37e        3 days ago          129MB
debian              latest              971452c94376        3 days ago          114MB

$ docker images --filter "reference=debian:8"

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
debian              8                   00b72214a37e        3 days ago          129MB

Awesome, you successfully filtered your images given the image name specified.

Filtering Unused Docker Images

As a reminder, “dangling” images are not simply images that are no longer used by any container; we would call those images “unused”, but not “dangling”.

“Dangling” images are created whenever you create a new version of a Docker image with the same “image plus tag” pair.

In short, you would have the following output when listing your images

$ docker images -a
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
debian              8                   00cc44e1e1a2        3 days ago          129MB
<none>              <none>              00b72214a37e        5 days ago          110MB

Those images or layers are not actively used anymore: they have been “re-tagged”.

If you want to filter images that are defined as “dangling”, you can append the “dangling=true” option.

$ docker images --filter "dangling=true"

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
<none>              <none>              00b72214a37e        5 days ago          110MB

Bonus: if you are looking to prune your dangling images, you can simply execute the “docker image prune” command.

Great, you now know how you can easily filter your dangling images on your Docker environment.

Filtering Images on Date

In some cases, you want to list Docker images depending on their creation date or when they were downloaded for the first time.

Unfortunately, you can’t specify a Unix epoch or an absolute point in time; you have to specify it relative to another image.

So for example, if you are looking to list Docker images created before another image, you would run the following command.

$ docker images --filter "before=<image_name>"

In our case, the “Debian Jessie” image was the most recent one: to list images created before, we would run the following command.

$ docker images --filter "before=debian:8"

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
debian              latest              971452c94376        3 days ago          114MB
ubuntu              latest              72300a873c2c        7 days ago          64.2MB

Similarly, you can choose to list Docker images created after another image, you would use the “since” keyword instead of the “before” keyword.

$ docker images --filter "since=<image_name>"

$ docker images --filter "since=ubuntu"

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
debian              8                   00b72214a37e        3 days ago          129MB
debian              latest              971452c94376        3 days ago          114MB

Awesome, you can filter your images depending on when they were created!

Listing and Formatting Docker Images

In our previous chapters, we got the same output when running our commands.

Docker will by default display the results in a set of five different columns: repository, tag, image ID, the creation date, and the size of your image.

However, in some cases, you may want to have a different output if you need to process the information in a specific system for example.

In order to list and format Docker images, you have to use the “docker images” command followed by the “--format” option and a Go template.

$ docker images --format <go_template>

In short, a Go template is a set of variables that are enclosed in double curly braces, representing the different columns to be displayed.

For example, if you want to have the output as the repository name followed by the image ID, you would run the following command.

$ docker images --format "{{.Repository}} has the following {{.ID}}"

Here is a list of the different placeholders that you can use in order to format your docker images.

Placeholder      Description
.ID              The ID of your image
.Repository      The image repository
.Tag             The tag of your image
.Size            The size of your image
.CreatedSince    The time elapsed since your image was created
.CreatedAt       The point in time when your image was created
.Digest          The digest of your image (its sha256 content hash)
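
You can also combine several placeholders, and prefixing the template with the table directive keeps the column headers:

$ docker images --format "table {{.Repository}}\t{{.Tag}}\t{{.Size}}"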

Great, now you have all the tools to filter your images following a specific format.

Listing Only Image IDs

In order to list only image IDs on Docker, you have to use the “docker images” command with the “--quiet” option to suppress all other columns.

$ docker images --quiet

$ docker images -q

As you can see, you are only presented with the list of image IDs available in your current Docker environment, awesome!
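
This quiet output is handy for feeding image IDs to other commands. For example, the following would remove every local image, so only run it if that is really what you want:

$ docker rmi $(docker images -q)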

Listing Locally Stored Docker Images

To list all the locally stored Docker images, you can run the following command:

$ docker image list

As you can see, all the locally stored Docker images are listed. You can see the repository name, tag, short image ID, date of creation, and size of the locally stored Docker images.

The short image ID is sufficient to differentiate between the Docker images uniquely. But if you prefer the long image ID, you can use the --no-trunc option with the previous command.

To list the locally stored Docker images along with the long image ID, run the following command:

$ docker image list --no-trunc

As you can see, the long sha256 format of the image ID is displayed.

Conclusion

In this tutorial, you learned how you can easily list your Docker images using the “docker images” command.

You also discovered that it is possible for you to append options in order to have more detailed output.

You can use filters to have specific columns or to isolate dangling images for example.

If you are looking for more tutorials related to Docker, we have multiple articles that may interest you.

Also, if you are interested in Docker or DevOps, we have a complete section dedicated to it on the website, so make sure to check it out!

Complete MySQL dashboard with Grafana & Prometheus

Complete MySQL dashboard with Grafana & Prometheus | MySQL Database Monitoring using Grafana and Prometheus

If you are a system administrator or a database administrator, monitoring your MySQL server is a crucial step towards diagnosing problems. To do so, you need real-time monitoring of active connections, locks, or queries that are running on your database, as well as the active users and average query times.

In order to perform monitoring on such metrics, we are planning to create a complete MySQL dashboard with the help of advanced software tools like Grafana and Prometheus.

If you stick with this Complete MySQL dashboard with Grafana & Prometheus tutorial, you will definitely learn how to build this dashboard and be performed with a collection of 10+ MySQL dashboards designed by Percona. Furthermore, You can bet that there will be a dashboard for your requirements!

What You Will Learn

The concepts that learners can follow from this tutorial are listed below:

  • What a basic Prometheus monitoring architecture looks like;
  • How to install and configure a Prometheus server on your Linux machine;
  • How to set up the MySQL server exporter and how to bind it to Prometheus;
  • How to configure Grafana to import Percona’s MySQL dashboards in minutes.

Enthusiastic to start learning?

MySQL, Grafana & Prometheus Architecture

Before beginning to learn MySQL Database Monitoring using Grafana and Prometheus, you should be aware of what a Prometheus monitoring architecture looks like:

Our definitive guide on Prometheus explains how Prometheus works with exporters. Exporters are meant to bind to existing data sources to retrieve metrics from them. Examples of exporters are the MongoDB exporter, the ElasticSearch exporter, or in our case the MySQL exporter.

Exporters are shipped as Docker images or as standalone binaries that you can run as background tasks or services. Here, we are using the MySQL exporter that is available on Prometheus’s official GitHub page.

The MySQL exporter binds to our MySQL instance and exposes metrics in a format that Prometheus can consume.

As part of its configuration, Prometheus binds to the exporter and scrapes metrics from it. From there, they will be visible in Grafana.

Also Check: How To Install and Configure Debian 10 Buster with GNOME

Complete MySQL dashboard with Grafana Prometheus mysql-grafana-architecture

Quite easy, right?

Now that you better understand how we will build it, let’s install all the tools you need to create your MySQL dashboard with Grafana.

Steps to Install Different Tools for MySQL Database Monitoring with Grafana and Prometheus

  • Install and configure Grafana
  • Install and configure Prometheus
  • Install a MySQL exporter
  • Create a MySQL exporter service

Installing Prometheus

For those who are new to Prometheus, we covered Prometheus installation in our previous tutorial.

From this point, your Prometheus instance should be up and running, with its configuration file at /etc/prometheus/prometheus.yml.

To verify it, head over to http://localhost:9090. You should see the Prometheus web interface running. If not, something is definitely wrong with your installation.

Complete MySQL dashboard with Grafana Prometheus prometheus-interface-1

For now, no metrics are stored in your Prometheus instance because we have not set up the MySQL exporter yet.

That’s what we are going to do next.

Installing the MySQL exporter

As detailed before, the MySQL exporter is available here. It is an official exporter maintained by the Prometheus project.

The MySQL exporter comes as a standalone binary, but we are going to configure it as a service.

First, create a Prometheus user on your instance if it does not already exist.

> sudo useradd -rs /bin/false prometheus

As a quick reminder, this command creates a system user (-r) named prometheus with no shell access (-s /bin/false). This is the user you will use to create your MySQL exporter service.

Next, download the latest MySQL exporter binaries on your instance.

All distributions are available here. Find the one that suits your needs and run:

> wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.11.0/mysqld_exporter-0.11.0.linux-amd64.tar.gz

Now that your binaries are downloaded, extract them in your current folder.

> tar xvzf mysqld_exporter-0.11.0.linux-amd64.tar.gz

Move the binary to the /usr/local/bin folder, as we are going to build a service out of it. You need sudo rights to perform this operation.

> cd mysqld_exporter-0.11.0.linux-amd64/
> sudo mv mysqld_exporter /usr/local/bin/

From there, you should be able to create a user for the exporter on your MySQL database.

Run the MySQL shell, and configure your database as follows (you should have the rights to grant permissions on your database).

> sudo mysql
> CREATE USER 'exporter'@'localhost' IDENTIFIED BY 'password' WITH MAX_USER_CONNECTIONS 3;
> GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost';

Running those commands, you will create a user named ‘exporter’ with the password ‘password’ on MySQL. Now, you are going to set those credentials in a configuration file.

In your /etc folder, create a new configuration file named .exporter.cnf and write the following credentials to it.

> cd /etc
> sudo vi .exporter.cnf

Set the credentials in the configuration file as follows:

[client]
user=exporter
password=password

(If you set different users or passwords in the step before, you need to reflect the changes in the configuration file.)
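
Since this file contains database credentials, you may also want to restrict its permissions and hand ownership to the prometheus user created earlier. This step is optional and not part of the original setup, but it is a sensible precaution:

> sudo chown prometheus /etc/.exporter.cnf
> sudo chmod 600 /etc/.exporter.cnf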

Creating a MySQL exporter service

Now that everything is ready, it is time to create a service for your MySQL exporter. Head over to /lib/systemd/system and create a new service file.
> sudo vi /lib/systemd/system/mysql_exporter.service

Paste the following configuration into it:

[Unit]
Description=MySQL Exporter
After=network.target

[Service]
Type=simple
User=prometheus
Restart=always
ExecStart=/usr/local/bin/mysqld_exporter \
--config.my-cnf /etc/.exporter.cnf \
--collect.auto_increment.columns \
--collect.binlog_size \
--collect.engine_innodb_status \
--collect.engine_tokudb_status \
--collect.global_status \
--web.listen-address=0.0.0.0:9104

[Install]
WantedBy=multi-user.target

Restart your system daemon and start your service.

> sudo systemctl daemon-reload
> sudo systemctl start mysql_exporter.service

Check that your service is running by issuing the following command:

> sudo systemctl status mysql_exporter
● mysql_exporter.service - MySQL Exporter
   Loaded: loaded (/lib/systemd/system/mysql_exporter.service; disabled; vendor preset: enabled)
   Active: active (running) since Sat 2019-06-08 15:11:12 UTC; 5min ago
 Main PID: 3136 (mysqld_exporter)
    Tasks: 8 (limit: 4704)
   CGroup: /system.slice/mysql_exporter.service
           └─3136 /usr/local/bin/mysqld_exporter --config.my-cnf /etc/.exporter.cnf

Congratulations!

Your MySQL exporter is all set. Now it is time to configure Prometheus to scrape it.

Quick note: your MySQL exporter runs on port 9104 and MySQL runs on port 3306 by default.
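
Before moving on, you can quickly check that the exporter is serving metrics over HTTP. A minimal check, assuming the exporter runs locally on its default port:

> curl http://localhost:9104/metrics | grep mysql_up

If the exporter can reach your MySQL instance, the mysql_up metric should be set to 1.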

Configuring Prometheus

Prometheus scrapes targets via its configuration file. As we added a new exporter, let’s add it to the Prometheus configuration.

Head over to the location of your Prometheus configuration file and edit it as follows:

# A scrape configuration containing the endpoints to scrape:
# Here it's Prometheus itself and the MySQL exporter.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090', 'localhost:9104']
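
Alternatively, you may prefer to give the MySQL exporter its own job so that its time series carry a dedicated job label. This is only a sketch, and the ‘mysql’ job name is an arbitrary choice:

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'mysql'
    static_configs:
      - targets: ['localhost:9104']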

Restart Prometheus, and head over to Prometheus Web Interface (http://localhost:9090).

Go to the targets tab, and make sure that Prometheus is correctly scraping the MySQL exporter target.

Back to the ‘graph’ tab, in the expression text field, type ‘mysql_exporter_scrapes_total‘. If you see a result there, it means that your Prometheus instance is correctly configured to extract metrics from the exporter.

Congratulations! The entire configuration is now done!

Installing Grafana

Grafana will be used to display our metrics. The steps to install Grafana were already detailed in one of our articles; make sure to read it before you continue.

Click Here: How To Create a Grafana Dashboard? (UI + API methods)

If you are looking for a tutorial to install it on Ubuntu 18.04, I wrote a detailed Grafana installation guide for Ubuntu users.

How to Create the MySQL dashboard with Grafana

In this case, you have essentially two choices:

  • Create your own customized Grafana dashboard: tailored to your needs, you can choose what metrics you want to display and how you want to display them.
  • Use existing community dashboards and save some time.

In this tutorial, we are going to go for option two. We are going to use awesome Percona MySQL dashboards and import them right into our Grafana.

Configuring Prometheus data source

Before starting, and if you did not do it already, you need to configure Prometheus as a Grafana data source.

Create a new data source, and configure it as follows:

a – Configuring Prometheus data source prometheus-data-source

If your Prometheus instance is not running on port 9090 by default, make sure to write the correct port in the configuration.

Now that Prometheus is configured, we can browse Percona’s existing dashboards and import one of the existing MySQL dashboards in Grafana.

Percona dashboards are available here. You can also play with the existing dashboards on Percona’s own Grafana instance here.

Complete MySQL dashboard with Grafana & Prometheus dashboards-percona

In the dashboards folder of Percona’s GitHub repository, download the raw JSON file that you are interested in (make sure you grab the raw file, not the GitHub HTML page).

> wget https://raw.githubusercontent.com/percona/grafana-dashboards/master/dashboards/MySQL_Overview.json

Now that your dashboard is downloaded, in Grafana, go to Dashboards > Import > Upload .json file.

Complete MySQL dashboard with Grafana & Prometheus dashboard-import

If you press “Import”, your entire MySQL dashboard will be created automatically and it will start displaying data immediately!

Complete MySQL dashboard with Grafana & Prometheus panel-grafana

This is the MySQL Overview dashboard, but there are more than 20 dashboards for you to choose from.

Here is the complete list of dashboards created by Percona for database and system monitoring using the Prometheus data source:

  • Amazon RDS OS metrics (CloudWatch data source)
  • Cross-Server Graphs
  • Disk Performance
  • Disk Space
  • MongoDB Cluster Summary
  • MongoDB Overview
  • MongoDB ReplSet
  • MongoDB RocksDB
  • MongoDB Wired Tiger
  • MongoDB MMAPv1
  • MongoDB InMemory
  • MySQL InnoDB Metrics
  • MySQL InnoDB Metrics Advanced
  • MySQL InnoDB Compression
  • MySQL MYISAM/Aria Metrics
  • MySQL Overview
  • MySQL Performance Schema
  • MySQL Query Response Time
  • MySQL Replication
  • MySQL Table Statistics
  • MySQL TokuDB Graphs
  • MySQL User Statistics
  • MySQL Command Handler Counters Compare
  • PXC/Galera Cluster Overview
  • PXC/Galera Graphs
  • Prometheus
  • ProxySQL Overview
  • Summary Dashboard
  • System Overview
  • Trends Dashboard

Going Further with MySQL and Grafana

If you want to dig a little bit more into the subject, many videos can help you have an in-depth understanding of how companies are building dashboards, especially MySQL dashboards with Grafana.

In February 2019, Peter Zaitsev (CEO at Percona) gave a great talk about it. The entire business model of Percona is built on monitoring MySQL, MongoDB, and MariaDB at scale.

A second use case is Verizon, which demonstrated how Grafana and MySQL can be used to optimize automation and self-service in practice.

I hope that you learned something new today. If you did, make sure to leave us a comment on how you plan on using this knowledge to your own specific needs.

Until then, have fun, as always.

Docker Logs Complete Guide

Docker Logs Complete Guide | Definition of Docker Logs, Logging Strategies & Best Practices

Docker logs are crucial for DevOps engineers: they give insightful information about what happened in your containers in case they crashed, or when you want to monitor them.

Examining Docker logs on a daily basis in your containerized environments is also essential. To explore the topic, have a look at this complete Docker Logs guide and find some of the best practices and Docker logging strategies.

What Are Docker Container Logs?

On Docker, container logs can either be inspected by using the “logs” command or they can be stored on an external system (like Logstash or Syslog) in order to be analyzed later on.

When they are sent to an external system, you will need to have a logging driver installed for Docker to send its container logs.

Some particularities may exist depending on the image used.

One good example is the NGINX image actually redirecting access and error logs to some specific devices on your Linux system (/dev/stdout and /dev/stderr).

Similarly, the Apache Web Server will redirect the logs directly to specific file descriptors (/proc/self/fd/1) on your system.

However, in most cases, using the “logs” command is enough.

Where are Docker logs?

When it comes to Docker logs, you either want to inspect your container logs or the logs for the Docker daemon.

Docker Logging Strategies and Best Practices

One of the most challenging tasks with Docker is logging. To handle it easily, we have compiled a few Docker logging best practices that you should keep in mind while working with containerized applications. These best practices are summarized in a shareable image so you can easily pass them on:

Docker logging best practices

Docker Daemon Logs

On the other hand, the Docker daemon is also producing some logs for you to inspect.

Depending on the system you are running Docker on, you will find the Docker daemon logs in different places :

  • /var/log/daemon.log : for Debian distributions;
  • /var/log/messages : for RHEL and Oracle Linux;
  • journalctl -u docker.service: for Ubuntu 16.04+ and CentOS 7/8
  • /var/log/upstart/docker.log : for Ubuntu distributions still using upstart;
  • AppData/Local: for Windows operating systems;

As a small example, if you installed Docker on Ubuntu or Debian, you would run this command to inspect your Docker daemon logs.

$ tail -f -n 1000 /var/log/daemon.log | grep --color docker

Alternatively, if you are using systemd to manage your services, you can inspect Docker daemon logs with the journalctl command.

$ sudo journalctl -u docker.service | less

View Docker Logs using the logs option

In order to view and inspect logs on Docker, you have to use the “docker logs” command with custom options.

$ docker logs <container_id>

$ docker logs <container_name>

For example, in order to see the Docker logs from a Grafana server started from a Docker image, you would run

$ docker logs grafana

View Docker Logs using the logs option docker-logs-container

View Docker logs since a specific date

When you are inspecting your Docker logs, you often want to limit the output to a given number of lines, not to be flooded with information.

In order to view Docker logs since a specific date or duration, use the “--since” option with a date or a duration.

For example, to see Docker container logs for the last 10 minutes, you would write

$ docker logs --since 10m <container_id>

You can also provide a date, as long as it is in ISO format

$ docker logs --since 2019-10-20T10:00:00 <container_id>

Note that the logs will be shown given the current date of your system and not the date of the Docker logs.

View Docker logs until a specific date

Similar to the since option, you can choose to inspect your container logs until a specific date.

You can also specify absolute dates (like an ISO date for example) or relative like 10 minutes, 30 minutes or 50 minutes.

To view logs until a specific date, use the “--until” option with a date or a duration.

$ docker logs --until 10m <container_id>

You can also provide a date format like you did before for the since option.

$ docker logs --until 2019-10-20T10:00:00 <container_id>

Tail Docker Logs using the tail option

In some cases, you want to restrict the number of lines printed on your screen from the Docker logs.

In order to achieve this result, you will have to use the “--tail” option in the following way.

$ docker logs --tail <number> <container_id|name>

For example, in order to show the last 100 lines of your Grafana container logs, you would write

$ docker logs --tail 100 grafana

$ docker logs --tail 100 4921d714d338
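
The tail option can also be combined with “--follow” (or “-f”) to keep streaming new log lines after the initial output. A small sketch using the same Grafana container:

$ docker logs -f --tail 100 grafana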

Exporting Docker Logs using logging drivers

In some cases, you may want to export your Docker logs in order to analyze them later on.

For simple debugging needs, having the logs printed to stdout or stderr is enough.

However, if you plan on running a more complex analysis, you will need to have them stored in logging systems such as Syslog or Logstash.

This is where the concept of logging drivers comes into play.

What are logging drivers?

In Docker, logging drivers are a set of plugins that you can activate or install in order to export logs to an external tool such as syslog, Logstash, or custom data sources.

For example, you can choose to export your Docker logs to :

  • syslog: Docker will write its logs to the syslog daemon. Later on, those logs can be centralized and analyzed in an external system such as Kibana;
  • journald: on modern distributions, journald is the default logging option. As a consequence, Docker can export logs to it;
  • gelf: for administrators using Graylog and the GELF (Graylog Extended Log Format) format;
  • awslogs: in order to export logs to Amazon CloudWatch Logs;
  • none: if you choose not to have logs for your containers;
  • etwlogs: used to write logs to the Event Tracing for Windows;
  • fluentd: in order to write Docker logs to the fluentd daemon;
  • local: in order to store logs in a minimal format.

Now that you have an idea about logging drivers, you might be wondering what logging driver you are currently using on your system.

To find the default logging driver used in Docker, use the “docker info” command.

$ docker info | grep Logging
  Logging Driver: json-file

In most instances, the default logging driver is “json-file” unless you already modified this parameter before.

Defining a default logging driver

In order to define a default logging driver on Linux, you will need to set the log-driver option in your daemon.json configuration file.

The daemon.json file is located in the /etc/docker configuration folder on your system.

If daemon.json does not exist yet, make sure to create it and paste the following content into it.

{
  "log-driver": "syslog"
}

Those default settings will be applied whenever you create a new container.
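
If you keep the default json-file driver instead, the same file can also be used to enable log rotation. This is only a sketch, and the max-size and max-file values below are arbitrary examples:

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}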

However, you can also create containers with “docker run” and specify the logging driver to be used instead of the default one.

Save your file and restart your Docker service for the changes to be applied.

$ sudo systemctl restart docker
$ sudo systemctl status docker

Finally, run the “docker info” command again in order to check if your logging preferences were changed.

$ docker info | grep Logging
  Logging Driver: syslog

In order to inspect the logs, you will have to navigate to the default log location on Linux: /var/log

By default, your Docker daemon will send its logs to the daemon.log in /var/log.

In order to see the logs associated with Docker, you can run the following command

$ sudo tail -f daemon.log | grep docker

However, note that defining the default logging driver does not imply that your containers will automatically send their logs to the source you specified.

Redirecting container logs to syslog

In order to make sure that logs are correctly redirected, you have to inspect your containers and the logging driver associated with them.

$ docker inspect <container_id|container_name> | grep -A 5 LogConfig

"LogConfig": {
  "Type": "syslog"
  "Config": {}
}

As you can see, the logging driver is set to syslog, but what about an existing container that was created with another logging driver?

To redirect container logs to syslog, you have two options :

  • Create another container, making sure that you define the default logging driver as shown in the previous section;
  • Run “docker run” and specify the logging driver that you want to use.

To define the logging driver for a container, you have to use “docker run” with the “--log-driver” option.

$ docker container run -it -d --log-driver syslog <image>
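
The syslog driver also accepts options through “--log-opt”, for example to forward logs to a remote syslog server. A sketch, where the address is purely hypothetical:

$ docker container run -it -d --log-driver syslog --log-opt syslog-address=udp://192.168.1.10:514 <image>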

Then, make sure that your logging driver was correctly set by running the inspect command again.

$ docker inspect <container_id|container_name> | grep -A 5 LogConfig

In order to view Docker logs, you have to inspect the “daemon.log” file and search for your container id.

$ sudo tail -f -n1000 daemon.log | grep <container_id>

Redirecting container logs to syslog daemon-log

Redirecting container logs to journald

On modern distributions, there is also a way to inspect system logs: by using the journald utility from the systemd environment.

Some administrators may prefer to have Syslog in order to implement a centralized logging system for example.

Others may choose to have journald because of the handy commands it exposes.

In order to redirect container logs to journald, you can either define it in the daemon.json configuration or you can specify it at runtime with the “--log-driver” option.

$ docker container run -it -d --log-driver journald <image>

Next, verify that your logging driver was correctly set for the container.

$ docker inspect <container_id|container_name> | grep -A 5 LogConfig

"LogConfig": {
  "Type": "journald"
  "Config": {}
}

In order to inspect logs sent to journald, you have to run the following command

$ sudo journalctl CONTAINER_NAME=<container_name>

For example, for a container named “mycontainer”, you would run

$ sudo journalctl CONTAINER_NAME=mycontainer

Redirecting container logs to journald blob

As you can see, this is not very handy as the data shown in the journal is not fully displayed.

To avoid having “blob data” presented when running journalctl, append the “--all” option to your command.

$ sudo journalctl --all CONTAINER_NAME=mycontainer

Redirecting container logs to journald grafana

Great!

You successfully exported your Docker container logs to journald.

View Docker Compose Logs

In order to inspect Docker Compose logs, you can also run a simple command to have them shown on the standard output.

$ docker-compose logs

You can also choose to have the logs displayed for one service and not for your entire Docker Compose stack.

$ docker-compose logs <app>

Note that the options detailed before are also available using docker-compose.

For example, if you want to have only 100 lines of logs written for your service in docker-compose, you would write

$ docker-compose logs --tail 100 <app>
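
Just like with “docker logs”, you can add the “--follow” (or “-f”) flag to keep streaming new lines, for example:

$ docker-compose logs -f --tail 100 <app>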

Conclusion

In this tutorial, you learned more about Docker logs, how they can be inspected, and how you can use options to filter and monitor them.

You also learned about logging drivers and how they can be used in order to export Docker logs to an external logging system such as Syslog or journald.

If you are curious about Linux logging, we have a complete section on the website dedicated to Logging.

How To Install Docker on Windows 7 8 10 Home and Pro

How To Install Docker on Windows 7/8/10 Home and Pro | Docker for Windows 10 Home Tutorial for Beginners

Have you ever wondered what Docker is? Docker is a program that performs operating-system-level virtualization, also called “containerization”. Containers made with Docker allow you to create isolated development environments on your system. You can find the various benefits of Docker on the official Docker website or in our Docker tutorials.

To perform any Docker-related task, you first need to install Docker on your Windows version. If you are running Windows 7, Windows 8, or Windows 10 Home, you should download and install the Docker Toolbox.

Do you want to know what the Docker Toolbox is, how to install Docker Desktop on Windows, what you need for the installation, the installation instructions, and more? Then check out this entire How To Install Docker on Windows 7/8/10 Home and Pro tutorial.

What is meant by Docker?

Docker is by far the most popular containerization platform available; the company behind it was founded in 2010 by Solomon Hykes. Software engineers use Docker widely for DevOps operations like continuous development and deployment. Docker has become a reference for managing, deploying, and running applications all over the world.

What is Docker Toolbox?

Docker Toolbox is for older Mac and Windows systems that do not meet the requirements of Docker for Mac and Docker for Windows. Docker Toolbox automatically performs a Linux VM on VirtualBox that hosts Docker on your Windows system.

Prerequisites

First of all, you want to determine the current version of Windows running on your computer.

Get your current Windows version

Installation differs between the Windows Pro Edition and the Windows Home Edition, as Windows Home does not ship the Hyper-V feature (necessary to run “Docker for Windows”).

To determine your current Windows version, open your control panel.

a – Get your current Windows version control-panel

Click on System and Security, and you should see this panel.

a – Get your current Windows version system

Click on the System option, and you should be able to see your system version.

As you can see, I am running Windows 10 Pro (or Enterprise), so I will be able to install Docker Desktop for Windows.

If you are running Windows 7/8 or 10 Home, you should install the Docker Toolbox.

windows-10-pro-2

Verify virtualization is enabled on Windows

In order to verify that virtualization is enabled on Windows, press Ctrl + Alt + Del and head over to the “Performance” panel.

On this panel, you should see the Virtualization option.

Make sure virtualization is enabled.

virtu-enabled

Docker heavily relies on virtualization, making it impossible to run Docker if this option is not enabled on your computer.

If you need to enable Windows virtualization, you can follow this tutorial.

Steps to Install Docker on Windows 7/8/10 Home

First, we are going to see how you can install Docker for computers that do not have the Hyper-V feature available.

For those computers, you are going to need Docker Toolbox.

a – Download the Docker Toolbox executable

First of all, head over to the official Docker Toolbox repository on Github and download the latest executable available.

At the time of this tutorial, version v19.03.1 is available.

Click on the executable available.

As you can see, the Docker Toolbox comes with multiple executables:

  • Docker: the docker binaries available to create and run containers on your computer. When installing Docker, you get a Docker CLI for you to communicate with a Docker server. This way, you can launch instructions and they will be executed on your containers;
  • Docker-machine: a tool that handles provisioning Docker hosts (creating virtual machines, installing Docker on them, managing them, etc.);
  • Docker-compose: a tool for environments containing multiple containers with different environments. This way, you can start multiple containers together, or stop them together;
  • Kitematic: a sweet graphical interface for you to control your containers on Windows and Mac;
  • Boot2Docker ISO: a tiny Linux distribution for you to run Docker on Windows;
  • VirtualBox: open-source hypervisor for Windows and used to emulate operating systems on your Windows system.

a – Download the Docker Toolbox executable docker-toolbox-exe

Save your executable somewhere appropriate, and run it to install Docker on Windows.

b – Run the Docker Toolbox installer for Windows.

First, this is what you should see when running the executable on Windows.

step-1-toolbox

Click on Next.

For the second screen, choose a location where to store your Docker files, and click on Next.

step-2-toolbox

Select the full installation on Windows, and check the Git for Windows option if not already installed on your computer.

If VirtualBox is already installed on your computer, you can uncheck this option (it won’t be reinstalled if you leave the checkbox checked).
step-3-toolbox

Click on Next.

For the additional tasks, you can let the default options and click on Next.

step-4-toolbox

Finally, click on Install to start the Docker Toolbox installation.

step-5-toolbox

After a short while, your installation should be complete. Leave the checkbox checked in order to see the shortcuts created by Docker.

step-6-toolbox

c – Run the Docker Quickstart for Windows

Now that Docker is installed, you are going to run the Docker quickstart process.

shortcuts

Double-click on the icon to start Docker Quickstart.

A terminal shell opens and starts running some commands on your computer to set up Docker properly. This setup step will create a VirtualBox VM using the downloaded Boot2Docker ISO.

docker-quickstart-step-1

Wait a few moments for the setup to complete.

After some moments, you should see Moby (the whale of Docker!), meaning that your installation was successful.

docker-toolbox-exe-2

To verify your current Docker version, run the following command

$ docker version

docker-version-1

Awesome!

You have successfully installed Docker on Windows 7/8 and 10 Home.

Steps to Install Docker on Windows 10 Pro and Enterprise

Windows 10 has the Hyper-V feature available, so we are going to install the Docker Desktop version that includes the Docker Community Edition for Windows.

a – Check your virtualization settings

First, we are going to check if Hyper-V is enabled on your machine. If not, you will have to enable it.

In the Windows search menu, type “Turn Windows Features On and Off“, and select it.

a – Check your virtualization settings turn-windows-features

A window similar to this one should appear. Make sure that the Hyper-V checkbox is enabled.

hyper-v-2

Restart your computer for the changes to be applied.

Nice!

Now let’s download Docker Desktop for Windows.

b – Create an account on Docker Hub

In order to download Docker Desktop, you are going to need a Docker Hub account.

Head over to the Docker Hub sign-up page, and create a new account.

docker-sign-up

Click on “Continue“.

On the next screen, fill up the information needed, and click on “Continue”.

sign-up-2

On the next screen, you are asked to verify your email. Make sure to check your inbox and to click on the verify link from Docker Hub.

check-inbox

As an example, this is the email that I received from Docker.

confirm-email

Click on “Confirm your email with Docker”, and this is what you should see.

verif-success

Click on “Go to Sign In” and sign in using the credentials you specified before.

sign-in-docker

Great! Now let’s download Docker Desktop for Windows.

c – Install Docker Desktop for Windows

In order to download Docker Desktop, head over to this page, and click on “Get Docker“.

c – Install Docker Desktop for Windows get-docker

Your browser should start downloading the executable file. When the download is done, click on it to start the Docker Desktop installation for Windows.

exec-1

Make sure that the executable is coming from “Docker Inc” and click on “Run”.

Initially, the Docker Desktop installer should start downloading some packages. This is completely normal, just wait a couple of seconds until it is done.

exec-2

When it is done, just leave the default options on the next screen, unless you want to use Windows containers in the future.

exec-3

Right after, Docker will start unpacking some files. Just wait a couple of seconds until it is done.

Finally, click on “Close and Log Out”. Make sure that you don’t have any unsaved work or important programs running in the background.
exec5
When signing back in, this is what you should see at the bottom right corner of your desktop.
exec6
Now that Docker Desktop is running, open a Powershell console and check your Docker version.
docker-version-powershell

Great, as you can see I am now running the latest stable version of Docker on my Windows 10 Pro.
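
If you want an additional sanity check, you can run the classic hello-world image from the same Powershell console. This is optional; it simply pulls a tiny test image and prints a confirmation message:

$ docker run hello-world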

d – Configuring Docker Desktop

Before using Docker Desktop for Windows 10 Pro, you may need to do some quick configuration steps for it to run as smoothly as possible.

Sharing Drives

At the bottom right corner of your screen, click on the Docker icon and select “Settings”.

Settings-docker-desktop

On the “Shared drives” section, make sure to enable some of your hard drives if you plan on writing code on them and running it inside Docker.

This can be useful in case you plan on mapping volumes for example to Docker containers. Without checking those hard drives, you might not be able to do it.
shared-drives
In my case, I will share the C: drive on my computer and click on “Apply”.
docker-share-filesystem

Docker will need to access your computer’s filesystem in order to share drives. Simply enter your account password and click on “Ok”.

Customize Docker resources

In the “Advanced” tab, make sure that you are allocating the right amount of resources to Docker for it to run smoothly.

Those are the default parameters, but you may need to allocate more resources if Docker is running slowly.

advanced-tab

Great! You learned how to install Docker Desktop on Windows 10 Pro.

Conclusion

In this tutorial, you learned how to install Docker Toolbox and Docker Desktop for the latest versions of Windows.

Now that you have a complete Docker setup, it is time for you to run your first container.

Have a look at this video in order to create your first Docker container.

How To Install Docker on Ubuntu 18.04 & Debian 10

Nowadays, Docker is one of the most widely used and fastest-growing DevOps technologies in the industry. Docker has two versions: Docker CE (Community Edition) and Docker EE (Enterprise Edition). If you have a small-scale project, or you’re simply learning, use Docker CE.

Docker is an engine that allows you to create, deploy, and manage lightweight, stand-alone packages called containers. This tutorial teaches you How To Install Docker on Ubuntu 18.04 & Debian 10 with detailed steps, along with the post-installation instructions.

Prerequisites

To learn & understand this tutorial, you will need the following requirements:

  • One Ubuntu 18.04 server set up by following the Ubuntu 18.04 initial server setup guide, including a sudo non-root user and a firewall.
  • Ubuntu 18.04 64-bit operating system
  • A user account with sudo privileges
  • Command-line/terminal (CTRL-ALT-T or Applications menu > Accessories > Terminal)
  • Docker software repositories (optional)

Also Check: How To Install Docker on Windows 7/8/10 Home and Pro

Ensure you have sudo rights

First of all, you want to make sure that you have sudo (administrative) rights on your Linux instance.

Without sudo rights, you won’t be able to install the Docker packages.

To check sudo rights, run the following command :

$ sudo -l
User devconnected may run the following commands on debian-10:
   (ALL : ALL) ALL

Now that you have sudo rights, let’s install Docker.

Steps to Install Docker using get-docker.sh (fastest)

This has to be the quickest way to install Docker on Ubuntu and Debian, yet not many tutorials describe this step.

Docker created an entire script that detects your Linux distribution, the package management system you are using (APT, YUM) in order to install Docker properly.

a – Install cURL

You will need cURL in order to download the installation script.

To install cURL on Linux, run the following command :

$ sudo apt-get update
$ sudo apt-get install curl
$ curl --version
curl 7.64.0 (x86_64-pc-linux-gnu)

b – Download the get-docker.sh script

Here is the script available. As you can see, this is a plain text script, running many commands on your system to install Docker.

By default, the “stable” version of Docker is installed.

If you want another version (nightly or test), make sure to modify the parameter in the script.

get-docker-sh-2

To download the get-docker.sh script, run the following commands.

$ curl -fsSL https://get.docker.com -o get-docker.sh
$ sh get-docker.sh

The Docker installation process should start.

Docker will automatically grab the packages it needs to install (like the apt-transport-https package or ca-certificates).

When it is done, this is what you should see on your screen.

get-docker-sh-3

Awesome, Docker is now installed on your Linux system.

c – Add the user to the docker group

In order to execute docker commands, you will need sudo rights.

However, you can add users to the docker group to avoid prefixing commands with the sudo command.

To add a user to the docker group, run the following command.

$ sudo groupadd docker
$ sudo usermod -aG docker devconnected
$ sudo reboot
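
If you would rather not reboot, you can usually pick up the new group membership in your current shell and verify that Docker works without sudo. A quick sketch:

$ newgrp docker
$ docker run hello-world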

d – Get the current Docker version

To verify that everything was installed correctly, you can check your current Docker version.

$ docker -v
Docker version 19.03.1, build 74b1e89

Great!

You successfully installed Docker on Ubuntu and Debian.

Make sure to read the post-installation steps in order to customize your environment for Docker.

Steps to Install Docker from Official Repository

Here are the detailed steps that should be followed by the developer while installing docker from the official repository. Just follow them carefully:

a – Update Local Database

Firstly, you should update the local database with the following command:

sudo apt-get update

b – Download Dependencies

In the next step, you’ll need to run these commands to enable your operating system to access the Docker repositories over HTTPS.

In the terminal window, type:

sudo apt-get install apt-transport-https ca-certificates curl software-properties-common

For more explanation, here’s a brief breakdown of each command:

  • apt-transport-https: Allows the package manager to transfer files and data over https
  • ca-certificates: Allows the system (and web browser) to check security certificates
  • curl: This is a tool for transferring data
  • software-properties-common: Adds scripts for managing software

c – Add Docker’s GPG Key

The GPG key is a security feature.

To ensure that the software you’re installing is authentic, enter:

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

add docker gpg key

d – Install the Docker Repository

Just enter the following command for installation of Docker repository:

sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"

The command “$(lsb_release -cs)” returns the codename of your Ubuntu installation (in this case, Bionic). The final word of the command, stable, is the type of Docker release.

install docker repository

A stable release is tested and confirmed to work, but updates are released less frequently. You may substitute edge if you’d like more frequent updates, at the cost of potential instability. There are other repositories, but they are riskier.

e – Update Repositories

Update the repositories you just added:

sudo apt-get update

f – Install the Latest Version of Docker

Make use of the given command to install the latest version of docker:

sudo apt-get install docker-ce

g – Install Specific Version of Docker(Optional)

List the available versions of Docker by entering the following in a terminal window:

apt-cache madison docker-ce

list available docker version image for docker installation

The system needs to return a list of available versions as in the image above.

At this point, type the command:

sudo apt-get install docker-ce=[version]

However, substitute [version] for the version you want to install (pulled from the list you just generated).

For instance:

how to install docker using official repository example
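
As an illustration only (the version string below is just an example pulled from such a list; yours will likely differ):

sudo apt-get install docker-ce=5:19.03.1~3-0~ubuntu-bionic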

Process to Install Docker manually on your Linux system

If you are reluctant to use the get-docker script to install Docker automatically, you can still install the packages by yourself.

Here are the steps to install Docker manually.

a – Remove old installed Docker versions

First, you need to make sure that you are not running any old versions of Docker locally.

$ sudo apt remove -y docker docker-engine docker.io containerd runc

b – Set up the Docker repositories

Next, you are going to need to setup the Docker repositories, and make sure that you are downloading packages from secure and official Docker repos.

To do that, install the following packages.

$ sudo apt-get install apt-transport-https ca-certificates curl gnupg2 software-properties-common

c – Add the official Docker GPG keys

To add the official Docker GPG keys, run the following command.

$ curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add -

If the command is successful, the terminal should return OK.
docker-key

d – Verify the key fingerprint

In order to make sure that you grabbed the official and secure Docker key, you have to search for the fingerprint in your key.

Run the following command:

$ sudo apt-key fingerprint 0EBFCD88

pub   4096R/0EBFCD88 2017-02-22
      Key fingerprint = 9DC8 5822 9FC7 DD38 854A  E2D8 8D81 803C 0EBF CD88
uid                  Docker Release (CE deb) <docker@docker.com>
sub   4096R/F273FCD8 2017-02-22

Great! As you can see you got the key from the official Docker repositories.

e – Install Docker CE on your instance

In order to get the stable repository from Docker, you will need to run the following command.

$ sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/debian $(lsb_release -cs) stable"

The command $(lsb_release -cs) will return the name of the distribution.

lsb-release

Now that you are done, simply install docker-ce on your computer.

$ sudo apt-get update
$ sudo apt-get install docker-ce

This should install Docker as a service. To verify it, run the following command:

$ sudo systemctl status docker

docker-service

Again, add the user to the docker group in order for the user to execute docker commands without sudo.

$ sudo groupadd docker
$ sudo usermod -aG docker devconnected
$ sudo reboot

And finally, check your Docker version.

$ docker -v
Docker version 19.03.1, build 74b1e89

You now have Docker installed on your instance.

Post Installation Docker instructions

In order to have a complete and functional Docker installation on your Linux system, you will need to complete a few more steps.

On Linux, docker-machine and docker-compose don’t come automatically with your docker-ce installation.

This is a problem that you won’t have for Windows and macOS as they come bundled with the Docker Community Edition binary.

a – Install docker-machine on Linux

Docker Machine is a tool that gives you the ability to provision and manage Docker hosts.

It is the utility that handles creating virtual machines, installing Docker on them, and configuring the Docker client to talk to them.

Docker CE, on the other hand, is a client-server architecture that allows clients to communicate with Docker servers via the Docker CLI.

To install docker-machine, run the following command.

$ sudo -i

$ curl -L https://github.com/docker/machine/releases/download/v0.16.1/docker-machine-`uname -s`-`uname -m` >/tmp/docker-machine &&
    chmod +x /tmp/docker-machine &&
    sudo cp /tmp/docker-machine /usr/local/bin/docker-machine 

$ sudo chmod +x /usr/local/bin/docker-machine
$ exit

Want another version? All the docker-machine versions are available here.

Make sure that the docker-machine utility is correctly installed on your computer.

docker-machine
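
A simple way to check, assuming the binary landed in /usr/local/bin as above, is to print its version:

$ docker-machine version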

b – Install docker-compose on Linux

Again, docker-compose is not shipped by default with Docker Community Edition.

Docker-compose is a tool that lets you “compose” Docker containers, i.e. run multiple containers in the same isolated environment.

To install docker-compose, run the following commands:

$ sudo -i

$ curl -L https://github.com/docker/compose/releases/download/1.24.1/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose

$ sudo chmod +x /usr/local/bin/docker-compose
$ exit

Make sure that the docker-compose tool is correctly installed on your computer.

docker-compose
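
Again, a quick version check is enough to confirm the installation:

$ docker-compose --version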

Awesome! Now that everything is ready, let’s start our first container.

c – Create your first container on Docker

For this tutorial, I am going to create a Debian 8 Jessie container running on my Debian 10 Buster instance.

Head over to https://hub.docker.com/ which is the place where most Docker container images are stored.

docker-hub

Search for Debian 8 in the search text field and click on the verified Debian option in the suggestions dropdown.

debian-official-docker-hub

By scrolling a bit, you should see all the Debian distributions available (Buster, Jessie, Wheezy etc..). For this tutorial, we are going to take a look at the Jessie distribution.

To grab the Jessie image, run the following command :

$ docker container run debian:jessie

docker-run

The docker image was successfully downloaded from the Docker Hub repositories.

You can check it by running a simple docker images command.

docker-images

The container was also successfully created by the run command, but it is inactive by default.

To see all your containers, run the following command:

$ docker container ls -a

docker-container-ls

We did not choose a name for our container when downloading it, so Docker assigned a default name to the container (vigorous_kirch).

Time to go into our Jessie container. This is how to do it:

$ docker container start 61f66b78e140
61f66b78e140

$ docker exec -it 61f66b78e140 /bin/bash
root@61f66b78e140:/#
root@61f66b78e140:/# cat /etc/issue
Debian GNU/Linux 8

Awesome! We have a Jessie distribution running on a Debian 10 Buster one.

Conclusion

In this tutorial, you learned how to install and configure Docker on Ubuntu and Debian distributions, as well as the post-installation steps you must complete for a full Docker installation.

If you need more information, make sure to read the official Docker documentation. They provide great information about how to run commands and how to maintain your containers.