
Rancher 2.6 New Monitoring Quick Start

2022-06-23 11:14:00 InfoQ

About the author
Wan Shaoyuan is a CNCF-certified Kubernetes engineer (CKA & CKS) and a cloud-native solution architect. He works on Ceph, OpenStack, Kubernetes, Prometheus and other cloud-native technologies. He has helped design and implement IaaS and PaaS platforms for projects in finance, insurance, manufacturing and other industries, and has guided cloud-native application transformation in those projects.

Overview

Rancher 2.6 enables monitoring quite differently from previous versions. Monitoring is now based on the native Prometheus Operator, which abstracts monitoring and alerting configuration into Kubernetes CRD resources. This integrates the monitoring and alerting functions more tightly and makes them easier to use. The Prometheus Operator setup includes the following CRD resource objects:

  • PrometheusRule: defines alerting rules
  • Alertmanager: the CRD that deploys Alertmanager and sets its number of replicas
  • Receivers: configure the notification channels that receive alerts
  • Routes: match alerts to receivers
  • ServiceMonitor: defines the targets from which Prometheus collects metrics
  • PodMonitor: monitors Pods at a finer granularity


Configuration and use

Enable monitoring
Switch to the target cluster, choose Cluster Tools in the lower-left corner, and enable Prometheus monitoring.



Install it into the System project, and select the option to customize the Helm values before installing.

Adjust the deployment settings to your actual requirements.

If you need to connect to remote storage (such as InfluxDB), modify the YAML configuration so that it points to InfluxDB:

remoteRead:
  - url: http://192.168.0.7:8086/api/v1/prom/read?db=prometheus
remoteWrite:
  - url: http://192.168.0.7:8086/api/v1/prom/write?db=prometheus
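If you manage several Prometheus instances or InfluxDB databases, you can generate the remoteRead/remoteWrite block above instead of typing it by hand. The Python sketch below is a hypothetical helper: `influxdb_remote_block` is not part of Rancher or Helm. It assumes InfluxDB 1.x, whose built-in Prometheus endpoints are `/api/v1/prom/read` and `/api/v1/prom/write` with a `db` query parameter. In charts based on kube-prometheus-stack, such as rancher-monitoring, these keys usually sit under `prometheus.prometheusSpec`.

```python
from urllib.parse import urlencode


def influxdb_remote_block(host: str, port: int = 8086, db: str = "prometheus") -> str:
    """Render remoteRead/remoteWrite entries pointing at InfluxDB 1.x (hypothetical helper).

    host, port and db are placeholders for your own InfluxDB instance.
    """
    query = urlencode({"db": db})  # e.g. "db=prometheus"
    base = f"http://{host}:{port}/api/v1/prom"
    return (
        "remoteRead:\n"
        f"  - url: {base}/read?{query}\n"
        "remoteWrite:\n"
        f"  - url: {base}/write?{query}\n"
    )


# Reproduces the block shown above for the InfluxDB at 192.168.0.7.
print(influxdb_remote_block("192.168.0.7"), end="")
```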

The default resource limits for node-exporter are low, so it is easily OOM-killed. Raise the default memory limit to 150Mi:

podLabels:
  jobLabel: node-exporter
resources:
  limits:
    cpu: 200m
    memory: 150Mi
  requests:
    cpu: 100m
    memory: 30Mi
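A container that exceeds its memory limit is OOM-killed. Kubernetes also rejects a spec whose request is larger than its limit, so it is worth sanity-checking these values before installing. The following Python sketch parses only the common quantity suffixes: `m` for CPU, and `Ki`/`Mi`/`Gi` and `k`/`M`/`G` for memory. It is an illustrative check, not Kubernetes' own quantity parser.

```python
# Common binary and decimal memory suffixes only; not a full Kubernetes quantity parser.
_MEMORY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "k": 10**3, "M": 10**6, "G": 10**9}


def memory_bytes(quantity: str) -> int:
    """Convert a memory quantity such as "150Mi" to bytes."""
    for suffix, factor in _MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)


def cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "200m" or "1" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def check_resources(resources: dict) -> list:
    """Return problems where a request exceeds its limit (Kubernetes rejects those specs)."""
    problems = []
    parsers = {"cpu": cpu_millicores, "memory": memory_bytes}
    limits = resources.get("limits", {})
    requests = resources.get("requests", {})
    for name, parse in parsers.items():
        if name in limits and name in requests and parse(requests[name]) > parse(limits[name]):
            problems.append(f"{name}: request {requests[name]} exceeds limit {limits[name]}")
    return problems


# The node-exporter values from the YAML above.
node_exporter = {
    "limits": {"cpu": "200m", "memory": "150Mi"},
    "requests": {"cpu": "100m", "memory": "30Mi"},
}
print(check_resources(node_exporter))  # []
print(memory_bytes("150Mi"))  # 157286400
```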

The following links open the corresponding component pages:

  • Alertmanager: view active alerts
  • Grafana: view monitoring dashboards and charts
  • Prometheus Graph: run Prometheus (PromQL) expressions
  • Prometheus Rules: view the alerting expressions configured in Prometheus
  • Prometheus Targets: view the status of metric collection targets

The cluster overview page shows the corresponding metrics.

At the cluster level :

  • CPU usage
  • Node load
  • Memory usage
  • Disk usage
  • Disk I/O
  • Network traffic
  • Network I/O

Kubernetes components:

  • API Server request rate
  • Controller Manager queue depth
  • Scheduler scheduling status
  • Ingress Controller connection count

etcd:

  • Leader election status
  • Number of leader elections
  • gRPC client traffic
  • etcd database size
  • Active streams
  • RPC rate
  • Disk sync duration

The Pods of each Deployment also have their own monitoring metrics.

Configure custom monitoring metrics
Enabling monitoring automatically adds a set of default ServiceMonitor scrape rules and PrometheusRule alerting rules. These mainly cover monitoring and alerting for the platform components and node status in the cluster.

If these metrics do not meet your needs, you can add your own. Take JMX monitoring of a Java application as an example. There is an official Prometheus JMX exporter, so we only need to download its jar and have the Java application load it as a Java agent, together with its configuration file.

Using one application as an example, the overall process is as follows:

  • Use the JMX exporter to start a small HTTP server inside the Java process
  • Configure Prometheus to scrape the metrics exposed by that HTTP server
  • Configure Grafana to connect to Prometheus, and set up a dashboard

First, create a directory:

mkdir -p /Dockerfile/jmx-exporter/
Then download the JMX exporter Java agent jar into this directory:

https://github.com/prometheus/jmx_exporter
https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.12.0/jmx_prometheus_javaagent-0.12.0.jar

Next, write the JMX exporter configuration file in the same /Dockerfile/jmx-exporter/ directory (the Docker build context). Create simple-config.yml with the following content:

---
rules:
- pattern: ".*"

The pattern ".*" matches every MBean attribute, so all available JMX metrics are exported. Next, integrate the JMX exporter into Tomcat by writing a Dockerfile:

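FROM tomcat
# Copy the agent jar and its config to the paths referenced by -javaagent below
COPY ./jmx_prometheus_javaagent-0.12.0.jar /jmx-exporter/jmx_prometheus_javaagent-0.12.0.jar
COPY ./simple-config.yml /jmx-exporter/simple-config.yml
ENV CATALINA_OPTS="-Xms64m -Xmx128m -javaagent:/jmx-exporter/jmx_prometheus_javaagent-0.12.0.jar=6060:/jmx-exporter/simple-config.yml"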

Then build the image. After the build, run the following docker run command to view the collected metrics. Port 6060 is the JMX exporter port:

docker build -t tomcat:v1.0 .
docker run -itd -p 8080:8080 -p 6060:6060 tomcat:v1.0

Visit http://host_ip:6060 to view the metrics.
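To check the exporter output from a script rather than a browser, a small parser for the Prometheus text format is enough. This is a simplified sketch with these assumptions:

  • There is one sample per line.
  • `# HELP`/`# TYPE` comments are skipped and timestamps are ignored.
  • `host_ip` is the same placeholder used above.
  • `parse_metrics` and `fetch_metrics` are hypothetical helper names.

```python
import urllib.request


def parse_metrics(text: str) -> dict:
    """Parse Prometheus text-format samples into {series: value} (simplified).

    Skips comment lines and ignores optional timestamps. Label values may contain
    spaces (e.g. vendor="Oracle Corporation"), so the series ends at the last '}'.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            end = line.rindex("}") + 1
            series, rest = line[:end], line[end:]
        else:
            series, _, rest = line.partition(" ")
        samples[series] = float(rest.split()[0])  # first field after the series is the value
    return samples


def fetch_metrics(url: str = "http://host_ip:6060/metrics") -> dict:
    """Download and parse the exporter output; host_ip is a placeholder."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


sample = (
    "# TYPE jvm_memory_bytes_used gauge\n"
    'jvm_memory_bytes_used{area="heap",} 2.1233E7\n'
    "process_cpu_seconds_total 12.5\n"
)
print(parse_metrics(sample))
```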

Then deploy the application to the Rancher platform.

Add a label to the Service so that the ServiceMonitor can select it:

kubectl label svc tomcat app=tomcat
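Create the ServiceMonitor. The endpoint's port: exporter refers to a Service port by name, so the tomcat Service must expose port 6060 under the name exporter: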

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: tomcat-app
  namespace: default
spec:
  endpoints:
  - port: exporter
  selector:
    matchLabels:
      app: tomcat
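How the ServiceMonitor above finds its target can be modeled in a few lines. A Service in the ServiceMonitor's namespace is selected when it carries every `matchLabels` label, and each endpoint's `port` is looked up by port name. This Python sketch is a simplified model, not the Operator's code. In reality, Prometheus discovers the Endpoints behind the Service and scrapes each Pod, and a `namespaceSelector` can widen the namespace scope. The Service data here is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    namespace: str
    labels: dict
    ports: dict = field(default_factory=dict)  # port name -> port number


def matches(selector: dict, labels: dict) -> bool:
    """matchLabels semantics: every selector key/value must be present on the object."""
    return all(labels.get(k) == v for k, v in selector.items())


def scrape_targets(service_monitor: dict, services: list) -> list:
    """Return (service, port) pairs the ServiceMonitor would select (simplified model)."""
    spec = service_monitor["spec"]
    selector = spec["selector"]["matchLabels"]
    namespace = service_monitor["metadata"]["namespace"]  # default scope: own namespace
    targets = []
    for svc in services:
        if svc.namespace != namespace or not matches(selector, svc.labels):
            continue
        for endpoint in spec["endpoints"]:
            port = svc.ports.get(endpoint["port"])  # resolved by port *name*
            if port is not None:
                targets.append((svc.name, port))
    return targets


tomcat_sm = {
    "metadata": {"name": "tomcat-app", "namespace": "default"},
    "spec": {"endpoints": [{"port": "exporter"}], "selector": {"matchLabels": {"app": "tomcat"}}},
}
services = [
    Service("tomcat", "default", {"app": "tomcat"}, {"http": 8080, "exporter": 6060}),
    Service("nginx", "default", {"app": "nginx"}, {"http": 80}),
]
print(scrape_targets(tomcat_sm, services))  # [('tomcat', 6060)]
```

If the Service names its 6060 port anything other than `exporter`, the model (like the real Operator) finds no target.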

Once the ServiceMonitor is created, the corresponding target appears in Prometheus.

The corresponding metrics have also been scraped.

Open Grafana and add a dashboard. The default username and password are admin/prom-operator.

Add the dashboard by entering dashboard ID 8878. In an offline environment, download the dashboard first and import it as JSON.
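For offline environments, you can also push the downloaded dashboard JSON through Grafana's HTTP API (`POST /api/dashboards/db`) instead of the UI. The sketch below only builds the request. The Grafana URL and file name are hypothetical, and the default admin/prom-operator credentials are taken from above. Dashboards downloaded from grafana.com often contain data source placeholders such as `${DS_PROMETHEUS}`, which this endpoint does not resolve. Replace them in the JSON first, or use the UI import.

```python
import base64
import json
import urllib.request


def build_import_request(grafana_url: str, dashboard: dict, user: str = "admin",
                         password: str = "prom-operator") -> urllib.request.Request:
    """Build a POST /api/dashboards/db request using HTTP basic auth (not sent here)."""
    dashboard = dict(dashboard, id=None)  # id=None lets Grafana create a new dashboard
    body = json.dumps({"dashboard": dashboard, "overwrite": True}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{grafana_url.rstrip('/')}/api/dashboards/db",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
    )


# Hypothetical usage with a locally downloaded copy of dashboard 8878:
# with open("8878.json") as f:
#     dashboard = json.load(f)
# urllib.request.urlopen(build_import_request("http://grafana.example:3000", dashboard))
```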

Configure alerts
PrometheusRule defines alerting rules, and by default it already includes alerting policies for platform components and nodes. Routes and Receivers configure the notification channels, so that each alert reaches the right people. The routing tree structure lets you classify alerts quickly and then send each class to the designated people for handling.

Receivers configure the notification channel. For email, for example, fill in the SMTP server address, the account and password, and the default recipient mailbox.

Routes match alerts to receivers. A root route is created by default and matches all alerts; set its receiver to the one you created.

At this point, all alerts are sent to the configured receiver. To route alerts more selectively, create additional Routes whose label matchers correspond to the alert names of the relevant Prometheus rules.
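Route matching works roughly like this. Each child route carries label matchers, the first child route that matches wins (when `continue` is false), and unmatched alerts fall back to the root route. This Python sketch supports only `=` and `=~` matchers. Like Alertmanager, it anchors regular expressions, so `etcd.*` matches `etcdNoLeader` but `etcd` alone does not. Python's `re` is used here instead of Go's RE2, which behaves the same for simple patterns like these. The receiver names are hypothetical.

```python
import re


def route_matches(matchers: list, labels: dict) -> bool:
    """Each matcher is (label, op, value) with op "=" or "=~"; all must hold.

    A missing label is treated as the empty string, and regexes must match the
    whole value (anchored), as in Alertmanager.
    """
    for name, op, value in matchers:
        actual = labels.get(name, "")
        if op == "=" and actual != value:
            return False
        if op == "=~" and re.fullmatch(value, actual) is None:
            return False
    return True


def pick_receiver(routes: list, default: str, labels: dict) -> str:
    """First matching child route wins; otherwise the root route's receiver is used."""
    for matchers, receiver in routes:
        if route_matches(matchers, labels):
            return receiver
    return default


routes = [
    ([("alertname", "=", "etcdNoLeader")], "etcd-oncall"),  # exact match
    ([("alertname", "=~", "etcd.*")], "etcd-team"),         # regex: all etcd* alerts
]
print(pick_receiver(routes, "default-email", {"alertname": "etcdNoLeader"}))         # etcd-oncall
print(pick_receiver(routes, "default-email", {"alertname": "KubePodCrashLooping"}))  # default-email
```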

For example, a route that matches the alert rule alert: etcdNoLeader.

Regular expressions can also be used to match multiple rules.

The Grouping settings are mainly used to categorize alerts, suppress duplicates, and avoid being flooded by large numbers of useless alerts:

  • group_by: configures how alerts are grouped. Alerts in the same group are aggregated and sent in a single notification, which reduces noise. For example, if a database runs on host01, the related alerts include host down and mysql down. When host01 goes down, MySQL is necessarily down too. Because both alerts are in the same group, host down and mysql down are aggregated and sent together.
  • group_wait: how long to wait after a new alert group is created before sending its first notification.
  • group_interval: how long to wait before sending a notification about new alerts added to a group that has already been notified.
  • repeat_interval: if the alerts in a group stay the same, how long Alertmanager waits before re-sending the notification, to avoid constant repeated notifications.
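The grouping and timing settings can be illustrated with a simplified model:

  • `group_alerts` buckets alerts by their `group_by` label values, so the host01 example becomes a single notification.
  • `should_send` approximates when a group is notified. It assumes Alertmanager's default values (30s, 5m, 4h).

Real Alertmanager flushes groups on a `group_interval` tick, so treat this as a sketch rather than its exact algorithm. The alert and label names are hypothetical.

```python
from collections import defaultdict


def group_alerts(alerts: list, group_by: list) -> dict:
    """Bucket alerts by the values of the group_by labels; one notification per bucket."""
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(alert.get(label, "") for label in group_by)
        groups[key].append(alert["alertname"])
    return dict(groups)


def should_send(now: float, group_start: float, last_sent, changed: bool,
                group_wait: float = 30, group_interval: float = 300,
                repeat_interval: float = 14400) -> bool:
    """Simplified timing model in seconds (not Alertmanager's exact scheduler)."""
    if last_sent is None:
        return now - group_start >= group_wait  # first notification waits group_wait
    if now - last_sent < group_interval:
        return False  # never notify the same group more often than group_interval
    # New or changed alerts go out right away; unchanged ones only after repeat_interval.
    return changed or now - last_sent >= repeat_interval


alerts = [
    {"alertname": "HostDown", "instance": "host01"},
    {"alertname": "MySQLDown", "instance": "host01"},
    {"alertname": "HostDown", "instance": "host02"},
]
for key, names in group_alerts(alerts, ["instance"]).items():
    print(key, names)
# ('host01',) ['HostDown', 'MySQLDown']
# ('host02',) ['HostDown']
```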

Once an alert matches a route and fires, the corresponding alert email arrives.

Custom alerts
When the default alerting rules do not meet your requirements, you can add custom alerts for your situation. This amounts to adding a PrometheusRule. For example, add an alert for Pods that are not in the Running state.

This can be configured through the UI.

The corresponding YAML:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: podmonitor
  namespace: cattle-monitoring-system
spec:
  groups:
  - name: pod_node_ready
    rules:
    - alert: pod_not_ready
      annotations:
        message: '{{ $labels.namespace }}/{{ $labels.pod }} is not ready.'
      expr: 'sum by (namespace, pod) (kube_pod_status_phase{phase!~"Running|Succeeded"}) > 0'
      for: 180s
      labels:
        severity: serious

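  • for: how long the expression must stay true before the alert fires (until then the alert is pending)
  • message: the text included in the alert notification
  • labels.severity: the alert severity, which routes can match on
  • expr: the PromQL expression that is evaluated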
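The Python sketch below shows what the rule does in two steps:

  • It emulates the expression on a hypothetical snapshot of `kube_pod_status_phase`. kube-state-metrics exports one series per pod and phase, with value 1 for the current phase and 0 otherwise.
  • It models the `for: 180s` hold. The alert stays pending until the condition has been true for 180 seconds, and any false evaluation resets it.

The 60-second evaluation step and the pod names are assumptions for illustration.

```python
def pods_not_ready(phase_series: list) -> dict:
    """Emulate: sum by (namespace, pod) (kube_pod_status_phase{phase!~"Running|Succeeded"}) > 0"""
    sums = {}
    for labels, value in phase_series:
        if labels["phase"] in ("Running", "Succeeded"):
            continue  # the anchored regex phase!~"Running|Succeeded" drops these series
        key = (labels["namespace"], labels["pod"])
        sums[key] = sums.get(key, 0) + value
    return {key: total for key, total in sums.items() if total > 0}


class AlertState:
    """inactive -> pending -> firing, following the rule's `for` duration (simplified)."""

    def __init__(self, for_seconds: float):
        self.for_seconds = for_seconds
        self.active_since = None

    def evaluate(self, now: float, condition: bool) -> str:
        if not condition:
            self.active_since = None  # any false evaluation resets the timer
            return "inactive"
        if self.active_since is None:
            self.active_since = now
        return "firing" if now - self.active_since >= self.for_seconds else "pending"


series = [
    ({"namespace": "default", "pod": "tomcat-0", "phase": "Running"}, 1),
    ({"namespace": "default", "pod": "tomcat-0", "phase": "Pending"}, 0),
    ({"namespace": "default", "pod": "web-1", "phase": "Running"}, 0),
    ({"namespace": "default", "pod": "web-1", "phase": "Pending"}, 1),
]
print(pods_not_ready(series))  # {('default', 'web-1'): 1}

state = AlertState(for_seconds=180)
print([state.evaluate(t, True) for t in range(0, 240, 60)])
# ['pending', 'pending', 'pending', 'firing']
```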

Then configure the alert receiver.

Finally, create a route that matches this PrometheusRule by its labels.


Copyright notice

This article was created by [InfoQ]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/174/202206231100460841.html