Rancher 2.6 New Monitoring Quick Start

2022-06-23 11:14:00 【InfoQ】

About the author
Wan Shaoyuan is a CNCF-certified Kubernetes engineer (CKA and CKS) and a cloud native solution architect. He is experienced with Ceph, OpenStack, Kubernetes, Prometheus, and other cloud native technologies. He has taken part in the design and implementation of IaaS and PaaS platforms in finance, insurance, manufacturing, and other industries, and has provided guidance on cloud native application transformation.

Overview

Monitoring in Rancher 2.6 is enabled quite differently from earlier versions. It is built on the native Prometheus Operator, which abstracts monitoring and alerting into Kubernetes CRD resources. This integrates the monitoring and alerting functions more tightly and makes them easier to use. Prometheus Operator includes the following CRD resource objects:

PrometheusRules: define alerting rules

Alertmanagers: the CRD that launches Alertmanager, used to start Alertmanager replicas

Receivers: the CRD that configures alert notification channels

Routes: match alerting rules to notification channels

ServiceMonitor: defines the endpoints from which Prometheus scrapes metrics

PodMonitor: monitors Pods at a finer granularity
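
As a rough illustration of the last item, a PodMonitor selects Pods directly by label instead of going through a Service. The sketch below is not part of the setup described in this article; the name, namespace, label, and port name are hypothetical placeholders.

```yaml
# Hypothetical PodMonitor: scrape Pods labeled app=example on the
# container port named "metrics". All names here are placeholders.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: example-pods
  namespace: default
spec:
  selector:
    matchLabels:
      app: example
  podMetricsEndpoints:
  - port: metrics        # must match a named containerPort in the Pod spec
```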

Configuration and use

Enable monitoring

Switch to the target cluster, choose Cluster Tools in the lower-left corner, and enable Prometheus:

Deploy it into the System project and check the option to customize the Helm parameters:

Adjust the deployment settings to your actual requirements:

If you need to connect to remote storage (such as InfluxDB), modify the YAML configuration so that it points to InfluxDB:

remoteRead:
- url: http://192.168.0.7:8086/api/v1/prom/read?db=prometheus
remoteWrite:
- url: http://192.168.0.7:8086/api/v1/prom/write?db=prometheus

The default node-exporter resource limits are low, so the container is easily OOM-killed. Raise the default memory limit to 150Mi:

  podLabels:
    jobLabel: node-exporter
  resources:
    limits:
      cpu: 200m
      memory: 150Mi
    requests:
      cpu: 100m
      memory: 30Mi

From the monitoring page you can open the page for each component, for example:

Alertmanager: view alert information

Grafana: view charts of the monitoring data

Prometheus Graph: run Prometheus expressions

Prometheus Rules: view the alerting expressions configured in Prometheus

Prometheus Targets: view the scrape targets for monitoring data

The cluster overview page shows the corresponding metrics:

Cluster level:

CPU usage

Cluster node load

Memory usage

Disk usage

Disk I/O

Network traffic

Network I/O

Kubernetes components:

API server request rate

Controller manager queue depth

Scheduler Pod scheduling status

Ingress controller connection count

etcd monitoring:

Leader election status

Number of leader elections

gRPC client traffic

etcd data size

Active streams

RPC rate

Disk sync duration

Each deployed Pod also has its own monitoring items:

Configure custom monitoring indicators

Enabling monitoring automatically adds a set of ServiceMonitor scrape rules and PrometheusRule alerting rules. These mainly cover platform components and node status in the cluster.

If these metrics do not meet your needs, you can add your own. Take JMX monitoring of a Java application as an example: there is an official Prometheus JMX exporter, so you only need to download the JAR and have the Java application load it together with its configuration file.

Taking one application as an example, the overall process is as follows:

Use the JMX exporter to start a small HTTP server inside the Java process

Configure Prometheus to scrape the metrics exposed by that HTTP server

Configure Grafana to connect to Prometheus, then set up a dashboard

First, create a directory:

mkdir -p /Dockerfile/jmx-exporter/

Then download the JMX exporter JAR into this directory:

https://github.com/prometheus/jmx_exporter
https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.12.0/jmx_prometheus_javaagent-0.12.0.jar

Next, write the JMX exporter configuration file. Create simple-config.yml in the jmx-exporter directory, next to the JAR, so that both files are in the Docker build context. Its contents are as follows:

---
rules:
- pattern: ".*"

This captures all available metrics. To integrate the JMX exporter into Tomcat, write the Dockerfile as follows:

FROM tomcat
# Copy the agent JAR and its config to the paths referenced by -javaagent below
COPY ./jmx_prometheus_javaagent-0.12.0.jar /jmx-exporter/jmx_prometheus_javaagent-0.12.0.jar
COPY ./simple-config.yml /jmx-exporter/simple-config.yml
ENV CATALINA_OPTS="-Xms64m -Xmx128m -javaagent:/jmx-exporter/jmx_prometheus_javaagent-0.12.0.jar=6060:/jmx-exporter/simple-config.yml"

Build the image with docker build, then start it with the docker run command below to view the collected metrics. Port 6060 is the JMX exporter port:

docker build -t tomcat:v1.0 .
docker run -itd -p 8080:8080 -p 6060:6060 tomcat:v1.0

Open http://host_ip:6060 to view the metrics:

Deploy to the Rancher platform:

Add a label to the Service so that the ServiceMonitor can select it:

kubectl label svc tomcat app=tomcat

Create the ServiceMonitor:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: tomcat-app
  namespace: default
spec:
  endpoints:
  - port: exporter
  selector:
    matchLabels:
      app: tomcat
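
The endpoint entry "port: exporter" in the ServiceMonitor refers to a port name on the Service, not a port number, so the Service must expose a port with that name. A minimal sketch of such a Service follows. It assumes the Tomcat Pods carry the label app=tomcat; the selector and port layout are assumptions and are not taken from the original deployment.

```yaml
# Assumed Service for the Tomcat workload. Only the label app=tomcat and
# the port name "exporter" are implied by the ServiceMonitor definition.
apiVersion: v1
kind: Service
metadata:
  name: tomcat
  namespace: default
  labels:
    app: tomcat          # matched by the ServiceMonitor's selector
spec:
  selector:
    app: tomcat          # assumption: the Pods are labeled app=tomcat
  ports:
  - name: http
    port: 8080
    targetPort: 8080
  - name: exporter       # referenced by "port: exporter" in the ServiceMonitor
    port: 6060
    targetPort: 6060
```

If the port is unnamed or named differently, Prometheus finds no matching endpoint and the target does not appear.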

Once the ServiceMonitor is created, the corresponding target is visible in Prometheus:

The corresponding metrics have also been scraped:

Open the Grafana page to add a dashboard. The default username and password are admin/prom-operator:

Add a dashboard by entering the dashboard ID 8878. In an offline environment, download the dashboard first and import it as JSON:

Configure alerts

PrometheusRule defines alerting rules. By default, alerting policies for platform components and nodes are already included. Routes and Receivers configure the notification channels that deliver each alert to the right people. The routing tree structure makes it possible to classify alerts quickly and send them to the designated people for handling.

Receivers configure the notification channel. For email, for example, fill in the SMTP address, the account and password, and the default recipient mailbox:
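
In Rancher these fields are entered in the UI. For reference, the equivalent settings in upstream Alertmanager configuration look roughly like the sketch below. Every value is a placeholder, and the receiver name ops-email is hypothetical.

```yaml
# Sketch of an email receiver in Alertmanager's configuration format.
# All values are placeholders, not settings from this article.
receivers:
- name: ops-email                        # hypothetical receiver name
  email_configs:
  - to: ops@example.com                  # default recipient mailbox
    from: alertmanager@example.com
    smarthost: smtp.example.com:587      # SMTP server address and port
    auth_username: alertmanager@example.com
    auth_password: change-me
    require_tls: true
```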

Routes match notification channels to alerting rules. A root route is created by default and matches all alerts; set it to the receiver you created:

At this point every alert is sent to the configured receiver. To split alerts up, create new Routes that use labels to match the alert names defined in Prometheus Rules.

For example, to match the alerting rule alert: etcdNoLeader:

Regular expressions can also be used to match multiple rules, for example:
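
Expressed in upstream Alertmanager configuration, the two kinds of match might look like the sketch below. Alertmanager exposes a rule's alert name as the alertname label, so that is the label the routes match on. The receiver names are hypothetical, and the regular expression is only an illustration.

```yaml
# Sketch of a routing tree with an exact match and a regex match.
# Receiver names are hypothetical placeholders.
route:
  receiver: ops-email              # root route: handles anything not matched below
  routes:
  - receiver: etcd-team
    match:
      alertname: etcdNoLeader      # exact match on a single alert
  - receiver: etcd-team
    match_re:
      alertname: etcd.*            # regex: every alert whose name starts with "etcd"
```

Newer Alertmanager versions also accept a matchers list in place of match and match_re.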

The Grouping settings are mainly used to classify and suppress alerts, so that a flood of useless notifications does not get in the way:

group_by: the labels by which alerts are grouped. Alerts in the same group are aggregated and sent as one notification, which has a suppressing effect. For example, host01 runs a database, so its alerts include host down and mysql down. If host01 goes down, the MySQL instance on it is necessarily down as well. Because both alerts fall into the same group, they are aggregated and sent together.

group_wait: how long to wait after a new alert group is created before sending its first notification.

group_interval: how long to wait before sending a notification about new alerts added to an existing alert group.

repeat_interval: if the same alert keeps firing, how long Alertmanager waits before sending it again, so that a long-running alert does not cause constant noise.
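
A sketch of how the four settings fit together on a route in upstream Alertmanager configuration. It assumes that the host down and mysql down alerts share a label named host, which is a hypothetical label; the timing values are the upstream Alertmanager defaults, and the receiver name is a placeholder.

```yaml
# Sketch of grouping settings on a route. The "host" label is an
# assumption: both alerts must carry it to land in the same group.
route:
  receiver: ops-email        # hypothetical receiver name
  group_by: ['host']         # alerts with the same host value form one group
  group_wait: 30s            # wait before the first notification of a new group
  group_interval: 5m         # wait before notifying about new alerts in the group
  repeat_interval: 4h        # wait before resending a notification that is still firing
```

With this grouping, host down and mysql down for host01 arrive in one notification instead of two.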

Once a rule matches and the alert fires, you receive the corresponding alert email:

Custom alerts

When the default alerting rules do not meet your needs, you can add custom alerts, which in practice means adding a PrometheusRule. For example, add an alert for Pods that are not in the Running state.

UI configuration:

Corresponding YAML configuration:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: podmonitor
  namespace: cattle-monitoring-system
spec:
  groups:
  - name: pod_node_ready
    rules:
    - alert: pod_not_ready
      annotations:
        message: '{{ $labels.namespace }}/{{ $labels.pod }} is not ready.'
      expr: 'sum by (namespace, pod) (kube_pod_status_phase{phase!~"Running|Succeeded"}) > 0'
      for: 180s
      labels:
        severity: serious