Cloud native monitoring: configuring a self-built Alertmanager to send alarms
2022-06-24 17:15:00 【Nieweixing】
Prometheus is currently the main monitoring software for Kubernetes. To make TKE clusters easier to monitor, Tencent Cloud offers a Prometheus service called cloud native monitoring. It can monitor TKE clusters and also supports alarms, which are delivered through Alertmanager. Both a self-built and a default Alertmanager are supported: if you do not deploy your own, cloud native monitoring deploys one in the background to configure and send alarms. The default Alertmanager is adapted to Tencent Cloud, however, and for now it only supports Tencent Cloud's own message channels and webhooks.
Sometimes, though, we need to send alarms to our own tools, such as Slack, Enterprise WeChat, or email. That requires a self-built Alertmanager. This article shows how to configure a self-built Alertmanager in cloud native monitoring so that alarms are sent to Enterprise WeChat.
1. Deploy Alertmanager
First, we deploy an Alertmanager in our cluster, then expose it through an intranet LoadBalancer-type Service so that the cloud native monitoring instance can call it.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alertmanager
  namespace: monitor
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: alertmanager
      qcloud-app: alertmanager
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        k8s-app: alertmanager
        qcloud-app: alertmanager
    spec:
      containers:
      - args:
        - --config.file=/etc/alertmanager/config.yml
        - --storage.path=/alertmanager/data
        image: prom/alertmanager:v0.15.3
        imagePullPolicy: Always
        name: alertmanager
        resources:
          limits:
            cpu: 500m
            memory: 1Gi
          requests:
            cpu: 250m
            memory: 256Mi
        securityContext:
          privileged: false
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/alertmanager
          name: alertcfg
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: qcloudregistrykey
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 511
          name: alertmanager
        name: alertcfg
You also need to deploy the ConfigMap for Alertmanager. It configures the Enterprise WeChat channel that receives the alarm messages. Instructions for registering an Enterprise WeChat organization are easy to find online, and the comments in the configuration below indicate where to obtain the corp ID, agent ID, and application secret. Here I registered a personal Enterprise WeChat organization to test alarm reception.
apiVersion: v1
data:
  config.yml: |
    global:
      resolve_timeout: 5m
    route:
      group_by: ['alertname']
      group_interval: 1m
      group_wait: 10s
      receiver: default-receiver
      repeat_interval: 1m
    receivers:
    - name: default-receiver
      wechat_configs:
      - corp_id: 'ww0c31105f29c8'  # Enterprise information ("My Company" -> "CorpID", at the bottom of the page)
        to_user: '@all'  # @all for everyone, or a designated user
        agent_id: '100002'  # Enterprise WeChat ("Enterprise Applications" -> "Custom Application" [Prometheus] -> "AgentId")
        api_secret: 'BXllYvWYXBy4HH9itlPzd9T-e2JfWP9E'  # Enterprise WeChat ("Enterprise Applications" -> "Custom Application" [Prometheus] -> "Secret")
        send_resolved: true  # Also send a message when the problem is resolved
kind: ConfigMap
metadata:
  labels:
    addonmanager.kubernetes.io/mode: EnsureExists
    kubernetes.io/cluster-service: "true"
  name: alertmanager
  namespace: monitor
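Before relying on the ConfigMap, it can help to confirm that the corp_id and api_secret pair is accepted by Enterprise WeChat. The sketch below is only an illustration and not part of the original setup: it requests an access token from the gettoken endpoint under https://qyapi.weixin.qq.com/cgi-bin/ (the default api_url for wechat_configs). The function names build_token_url and check_credentials are hypothetical helpers, and the cluster is not required to run it.

```python
import json
import urllib.parse
import urllib.request

# Default Enterprise WeChat API base used by Alertmanager's wechat_configs.
WECHAT_API_URL = "https://qyapi.weixin.qq.com/cgi-bin/"


def build_token_url(corp_id: str, api_secret: str,
                    api_url: str = WECHAT_API_URL) -> str:
    """Return the gettoken URL for a corp_id / api_secret pair."""
    query = urllib.parse.urlencode({"corpid": corp_id, "corpsecret": api_secret})
    return urllib.parse.urljoin(api_url, "gettoken") + "?" + query


def check_credentials(corp_id: str, api_secret: str) -> bool:
    """Request an access token; errcode 0 means the pair was accepted."""
    url = build_token_url(corp_id, api_secret)
    with urllib.request.urlopen(url, timeout=10) as resp:
        body = json.load(resp)
    return body.get("errcode") == 0
```

Calling check_credentials with the values from config.yml should return True if the application secret belongs to the given corp ID; a False result usually points to a copy-and-paste mistake in one of the two fields.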
The configuration for sending alarms to a 163 mailbox is attached below. If you want to receive alarms by email, use this ConfigMap instead.
apiVersion: v1
data:
  config.yml: |
    global:
      resolve_timeout: 5m
      smtp_smarthost: 'smtp.163.com:25'
      smtp_from: '[email protected]'
      smtp_auth_username: '[email protected]'
      smtp_auth_password: 'HYLVOJCTU'  # The password here is the mailbox's authorization code, which you can get in the mailbox settings
      smtp_require_tls: false
    route:
      group_by: ['alertname']
      group_interval: 1m
      group_wait: 10s
      receiver: default-receiver
      repeat_interval: 1m
    receivers:
    - name: default-receiver
      email_configs:
      - to: "[email protected]"
kind: ConfigMap
metadata:
  labels:
    addonmanager.kubernetes.io/mode: EnsureExists
    kubernetes.io/cluster-service: "true"
  name: alertmanager
  namespace: monitor
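If mail does not arrive later, the SMTP settings are the first thing to rule out. The following sketch, which is an assumption-laden illustration rather than part of the original article, sends a test message with Python's standard smtplib using the same kind of values as the ConfigMap (host, port, user name, authorization code). The helper names build_test_message and send_test_mail are hypothetical, and no TLS is negotiated, mirroring smtp_require_tls: false.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender: str, recipient: str) -> EmailMessage:
    """Build a small plain-text message for checking the SMTP settings."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Alertmanager SMTP test"
    msg.set_content("If you can read this, the SMTP settings work.")
    return msg


def send_test_mail(host: str, port: int, username: str,
                   auth_code: str, recipient: str) -> None:
    """Log in with the mailbox authorization code and send the test message."""
    msg = build_test_message(username, recipient)
    with smtplib.SMTP(host, port, timeout=10) as server:
        server.login(username, auth_code)  # auth_code is not the web login password
        server.send_message(msg)
```

For example, send_test_mail("smtp.163.com", 25, sender_address, authorization_code, recipient_address) would exercise the same smarthost as the configuration above; an authentication error here means Alertmanager would fail in the same way.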
Next, deploy a Service for Alertmanager so that the cloud native monitoring instance can reach it. After the Service is deployed, the Alertmanager endpoint is 10.0.0.143:9093.
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.cloud.tencent.com/direct-access: "true"
    service.kubernetes.io/loadbalance-id: lb-n1jjuq
    service.kubernetes.io/qcloud-loadbalancer-clusterid: cls-b3mg1p92
    service.kubernetes.io/qcloud-loadbalancer-internal-subnetid: subnet-ktam6hp8
  name: alertmanager
  namespace: monitor
spec:
  clusterIP: 172.16.56.208
  externalTrafficPolicy: Cluster
  ports:
  - name: 9093-9093-tcp
    nodePort: 32552
    port: 9093
    protocol: TCP
    targetPort: 9093
  selector:
    k8s-app: alertmanager
    qcloud-app: alertmanager
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - ip: 10.0.0.143
That completes the deployment of the self-built Alertmanager. Next, we create the corresponding cloud native monitoring instance.
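To check the receiver end to end before any Prometheus rule exists, one option is to push a hand-made alert straight to Alertmanager. The sketch below is an illustration under assumptions: it posts to the /api/v1/alerts endpoint, which the v0.15.3 image used above serves (newer Alertmanager releases use /api/v2/alerts instead), and the alert name, labels, and helper names are made up for the test.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_test_alert(alertname: str = "SelfBuiltAlertmanagerTest",
                     severity: str = "info") -> list:
    """Return a one-element alert list in the shape Alertmanager accepts."""
    now = datetime.now(timezone.utc).isoformat()  # RFC 3339 timestamp
    return [{
        "labels": {"alertname": alertname, "severity": severity},
        "annotations": {"summary": "Manual test alert"},
        "startsAt": now,
    }]


def post_alert(address: str, alerts: list) -> int:
    """POST the alerts to Alertmanager at host:port and return the HTTP status."""
    req = urllib.request.Request(
        f"http://{address}/api/v1/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Run from a machine that can reach the intranet load balancer, post_alert("10.0.0.143:9093", build_test_alert()) should return 200, and a message should reach the configured receiver after the group_wait of 10s.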
2. Create a cloud native monitoring instance
In the container service console, click cloud native monitoring to create an instance. Click Advanced settings, then click Add alertmanager, and enter the Service endpoint of the Alertmanager you deployed: 10.0.0.143:9093.
Note that if you choose the default Alertmanager when creating the cloud native monitoring instance, the console does not yet support switching to a self-built Alertmanager afterwards. To switch, you have to submit a ticket and have an engineer do it. It is therefore recommended to select the self-built Alertmanager when creating the instance.
After the instance is created, its basic information shows the self-built Alertmanager, the Prometheus configuration, and so on.
3. Associate the TKE cluster
After the cloud native monitoring instance is created, its Prometheus service is not yet monitoring any Kubernetes cluster. We need to add our TKE cluster to cloud native monitoring for data collection, which is done by associating the cluster under "Associate cluster".
Once the cluster is associated, its information appears in the console, and you can click the targets to check whether the collection status is healthy.
We can also use the Prometheus query interface to check whether the TKE cluster's monitoring data has been collected into Prometheus.
Click Data query. If a result is returned, Prometheus is collecting the TKE cluster's monitoring data successfully.
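The same check can be scripted against the standard Prometheus HTTP API. This is a minimal sketch under assumptions: the base URL of the instance's Prometheus endpoint is a placeholder you would have to take from the instance information, authentication (if the endpoint requires it) is not handled, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def has_samples(response: dict) -> bool:
    """True if the query succeeded and returned at least one series."""
    result = response.get("data", {}).get("result", [])
    return response.get("status") == "success" and len(result) > 0


def run_query(base_url: str, promql: str) -> dict:
    """Execute the instant query and return the decoded JSON response."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=10) as resp:
        return json.load(resp)
```

For instance, has_samples(run_query(base_url, "node_memory_MemTotal_bytes")) returning True would indicate that node-exporter data from the associated cluster is being scraped.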
4. Configure alarms
Now let's write and configure an alarm rule. We test with an alarm on node memory utilization. To make the alarm easy to trigger, it fires when a node's memory utilization exceeds 10%. First, we write the PromQL for the rule in the Prometheus UI.
100 - (node_memory_MemFree_bytes{endpoint !="target"}+node_memory_Cached_bytes+node_memory_Buffers_bytes) / node_memory_MemTotal_bytes * 100 > 10
The PromQL query above returns the nodes whose memory utilization is greater than 10%. Next, we go to the alarm configuration page of the cloud native monitoring console to configure the alarm:
- Rule name: the name of the alarm rule, no more than 40 characters.
- PromQL: the alarm rule expression.
- Duration: how long the condition described by the expression must hold; once this duration is reached, the alarm is triggered.
- Label: Prometheus labels to add to each rule.
- Alarm content: the text of the notification sent by email or SMS after the alarm is triggered. The content configured here is as follows:
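To make the arithmetic in the expression explicit, here is a small sketch that mirrors it for a single node. It is only an illustration: the function name is hypothetical and the byte counts in the usage note are invented, not taken from the cluster.

```python
def memory_utilization_percent(mem_free: float, cached: float,
                               buffers: float, mem_total: float) -> float:
    """Mirror the PromQL rule: 100 - (free + cached + buffers) / total * 100."""
    if mem_total <= 0:
        raise ValueError("mem_total must be positive")
    # Free, cached, and buffer memory all count as available, so only the
    # remainder is treated as used.
    return 100 - (mem_free + cached + buffers) / mem_total * 100
```

With 8 GiB total, 4 GiB free, 1 GiB cached, and 1 GiB of buffers, the function gives 25.0, which is above the 10% test threshold, so such a node would match the rule. Division and multiplication bind tighter than subtraction, and the comparison is applied last, in PromQL just as in Python.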
{{$labels.cluster}} Of {{$labels.instance}} The memory of the node exceeds the alarm threshold 10% , The current memory usage is {{$value}} , Please pay attention to and deal with it in time !!!
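The placeholders in the alarm content are filled in per alert from its labels and its current value. The sketch below imitates that substitution with plain string replacement so you can preview a message; it is not the Go template engine that Prometheus actually uses, and the render_content helper is hypothetical.

```python
TEMPLATE = (
    "{{$labels.cluster}} Of {{$labels.instance}} The memory of the node "
    "exceeds the alarm threshold 10% , The current memory usage is "
    "{{$value}} , Please pay attention to and deal with it in time !!!"
)


def render_content(template: str, labels: dict, value: float) -> str:
    """Substitute {{$labels.<name>}} and {{$value}} placeholders in the text."""
    out = template.replace("{{$value}}", str(value))
    for name, label_value in labels.items():
        out = out.replace("{{$labels." + name + "}}", label_value)
    return out
```

Calling render_content(TEMPLATE, {"cluster": "cls-b3mg1p92", "instance": "10.0.0.10"}, 52.5) yields a sentence that starts with the cluster ID and node IP, in the same shape as the content annotation of the delivered alert.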
5. View the alarm in Enterprise WeChat
[FIRING:3] NodeMemoryUsage (node.rules cls-b3mg1p92 tke node-exporter kube-system mem alert-7pjasfmm tke-node-exporter)
node.rules test tke cluster node Node memory alarm alert-7pjasfmm
Alerts Firing:
Labels:
 - alertname = NodeMemoryUsage
 - alertName = node.rules
 - cluster = cls-b3mg1p92
 - cluster_type = tke
 - instance = 10.0.0.10
 - job = node-exporter
 - namespace = kube-system
 - node = mem
 - notification = alert-7pjasfmm
 - pod = tke-node-exporter-xnfvb
 - service = tke-node-exporter
Annotations:
 - alertName = node.rules
 - content = cls-b3mg1p92 Of 10.0.0.10 The memory of the node exceeds the alarm threshold 10% , The current memory usage is 52.578219305872885 , Please pay attention to and deal with it in time !!!
 - describe = test tke cluster node Node memory alarm
 - notification = alert-7pjasfmm
Source: /graph?g0.expr=100+-+%28node_memory_MemFree_bytes%7Bendpoint%21%3D%22target%22%7D+%2B+node_memory_Cached_bytes+%2B+node_memory_Buffers_bytes%29+%2F+node_memory_MemTotal_bytes+%2A+100+%3E+10&g0.tab=1
Labels:
 - alertname = NodeMemoryUsage
 - alertName = node.rules
 - cluster = cls-b3mg1p92
 - cluster_type = tke
 - instance = 10.0.0.157
 - job = node-exporter
 - namespace = kube-system
 - node = mem
 - notification = alert-7pjasfmm
 - pod = tke-node-exporter-vcnjl
 - service = tke-node-exporter
Annotations:
 - alertName = node.rules
 - content = cls-b3mg1p92 Of 10.0.0.157 The memory of the node exceeds the alarm threshold 10% , The current memory usage is 34.298334259939 , Please pay attention to and deal with it in time !!!
 - describe = test tke cluster node Node memory alarm
 - notification = alert-7pjasfmm
Source: /graph?g0.expr=100+-+%28node_memory_MemFree_bytes%7Bendpoint%21%3D%22target%22%7D+%2B+node_memory_Cached_bytes+%2B+node_memory_Buffers_bytes%29+%2F+node_memory_MemTotal_bytes+%2A+100+%3E+10&g0.tab=1
Labels:
 - alertname = NodeMemoryUsage
 - alertName = node.rules
 - cluster = cls-b3mg1p92
 - cluster_type = tke
 - instance = 10.0.0.3
 - job = node-exporter
 - namespace = kube-system
 - node = mem
 - notification = alert-7pjasfmm
 - pod = tke-node-exporter-vpcmf
 - service = tke-node-exporter
Annotations:
 - alertName = node.rules
 - content = cls-b3mg1p92 Of 10.0.0.3 The memory of the node exceeds the alarm threshold 10% , The current memory usage is 31.307402547932455 , Please pay attention to and deal with it in time !!!
We open the Prometheus application in Enterprise WeChat and check whether the alarm has arrived. Receiving the alarm message shows that the self-built Alertmanager is sending alarms to Enterprise WeChat successfully.
6. Receive the alarm by email
Next, we raise the alarm threshold to memory utilization above 50% and change the Alertmanager ConfigMap in the cluster to the mailbox configuration, so that alarms are received by email. With this threshold only one node should trigger an alarm. Let's test it.
According to the PromQL query results, only 10.0.0.10 has memory usage above 50%, so our mailbox received an alarm email for that node only. This shows that sending alarms to our mailbox through Alertmanager also works.
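The effect of raising the threshold can be reproduced with the usage values reported in the Enterprise WeChat alarm earlier. This is only a sketch: the values are a snapshot from that one notification and will have drifted by the time the email test runs, and firing_instances is a hypothetical helper.

```python
# Memory usage (percent) per node, copied from the earlier alarm message.
usage_from_alert = {
    "10.0.0.10": 52.578219305872885,
    "10.0.0.157": 34.298334259939,
    "10.0.0.3": 31.307402547932455,
}


def firing_instances(usage: dict, threshold: float) -> list:
    """Return the instances whose usage is strictly above the threshold."""
    return sorted(inst for inst, value in usage.items() if value > threshold)
```

With a threshold of 10 all three nodes match, while firing_instances(usage_from_alert, 50) leaves only 10.0.0.10, which agrees with the single alarm email received.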