Cloud native monitoring: configuring a self-built Alertmanager to send alarms
2022-06-24 17:15:00 【Nieweixing】
Prometheus is currently the main monitoring software for Kubernetes. To make monitoring TKE clusters easier, Tencent Cloud also offers a managed Prometheus service called cloud native monitoring. Cloud native monitoring can monitor our TKE clusters and also supports alarm configuration. Its alarms are handled by Alertmanager, and both a self-built Alertmanager and the default one are supported. If you don't deploy your own Alertmanager, cloud native monitoring deploys one in the background to process and send alarms. However, that default Alertmanager is adapted to Tencent Cloud and, for now, only supports Tencent Cloud's own notification channels and webhooks.
Sometimes, though, we need alarms delivered to our own chat or messaging tools, such as Slack, WeCom (Enterprise WeChat), or email. For that we need a self-built Alertmanager. Today let's look at how to configure a self-built Alertmanager in cloud native monitoring and have it send alarms to WeCom.
1. Deploy alertmanager
First, we deploy an Alertmanager in our cluster, and then expose it through an internal (intranet) LoadBalancer-type Service so that the cloud native monitoring instance can call it.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alertmanager
  namespace: monitor
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: alertmanager
      qcloud-app: alertmanager
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        k8s-app: alertmanager
        qcloud-app: alertmanager
    spec:
      containers:
      - args:
        - --config.file=/etc/alertmanager/config.yml
        - --storage.path=/alertmanager/data
        image: prom/alertmanager:v0.15.3
        imagePullPolicy: Always
        name: alertmanager
        resources:
          limits:
            cpu: 500m
            memory: 1Gi
          requests:
            cpu: 250m
            memory: 256Mi
        securityContext:
          privileged: false
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/alertmanager
          name: alertcfg
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: qcloudregistrykey
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 511
          name: alertmanager
        name: alertcfg
```

You also need to deploy the corresponding Alertmanager ConfigMap. This is where the WeCom channel that receives alarm messages is configured. How to register a WeCom organization and create an application is easy to look up online; the comments in the config below show where to find each value (CorpID, AgentId, Secret) of the application. Here I registered a personal WeCom organization to test receiving alarms.
```yaml
apiVersion: v1
data:
  config.yml: |
    global:
      resolve_timeout: 5m
    route:
      group_by: ['alertname']
      group_interval: 1m
      group_wait: 10s
      receiver: default-receiver
      repeat_interval: 1m
    receivers:
    - name: default-receiver
      wechat_configs:
      - corp_id: 'ww0c31105f29c8'  # Company info ("My Company" ---> "CorpID", at the bottom)
        to_user: '@all'            # '@all' for everyone, or a specific user
        agent_id: '100002'         # WeCom ("Apps" --> custom app [Prometheus] --> "AgentId")
        api_secret: 'BXllYvWYXBy4HH9itlPzd9T-e2JfWP9E'  # WeCom ("Apps" --> custom app [Prometheus] --> "Secret")
        send_resolved: true        # also send a message when the alarm is resolved
kind: ConfigMap
metadata:
  labels:
    addonmanager.kubernetes.io/mode: EnsureExists
    kubernetes.io/cluster-service: "true"
  name: alertmanager
  namespace: monitor
```

Here is also a configuration for alarms via a 163 mailbox. If you want to receive alarms by email, you can use this ConfigMap instead.
```yaml
apiVersion: v1
data:
  config.yml: |
    global:
      resolve_timeout: 5m
      smtp_smarthost: 'smtp.163.com:25'
      smtp_from: '[email protected]'
      smtp_auth_username: '[email protected]'
      smtp_auth_password: 'HYLVOJCTU'  # this is the mailbox's authorization code, obtained from the mailbox settings
      smtp_require_tls: false
    route:
      group_by: ['alertname']
      group_interval: 1m
      group_wait: 10s
      receiver: default-receiver
      repeat_interval: 1m
    receivers:
    - name: default-receiver
      email_configs:
      - to: "[email protected]"
kind: ConfigMap
metadata:
  labels:
    addonmanager.kubernetes.io/mode: EnsureExists
    kubernetes.io/cluster-service: "true"
  name: alertmanager
  namespace: monitor
```

Next, deploy a Service for Alertmanager so the cloud native monitoring instance can access it. After the Service is deployed, the Alertmanager access address is 10.0.0.143:9093.
```yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.cloud.tencent.com/direct-access: "true"
    service.kubernetes.io/loadbalance-id: lb-n1jjuq
    service.kubernetes.io/qcloud-loadbalancer-clusterid: cls-b3mg1p92
    service.kubernetes.io/qcloud-loadbalancer-internal-subnetid: subnet-ktam6hp8
  name: alertmanager
  namespace: monitor
spec:
  clusterIP: 172.16.56.208
  externalTrafficPolicy: Cluster
  ports:
  - name: 9093-9093-tcp
    nodePort: 32552
    port: 9093
    protocol: TCP
    targetPort: 9093
  selector:
    k8s-app: alertmanager
    qcloud-app: alertmanager
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - ip: 10.0.0.143
```

The self-built Alertmanager is now deployed. Next, let's create the corresponding cloud native monitoring instance.
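Before wiring it into cloud native monitoring, it can help to confirm that the Alertmanager is reachable and that the WeCom route works by pushing a hand-made alert to it. Below is a minimal Python sketch, assuming it runs on a host in the same VPC that can reach 10.0.0.143:9093. It uses the v1 alerts endpoint (POST /api/v1/alerts), which matches the prom/alertmanager:v0.15.3 image in the Deployment; recent Alertmanager releases serve /api/v2/alerts instead. The alert name ManualTest and its labels are hypothetical.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Assumption: the internal LoadBalancer address of the Service above.
ALERTMANAGER_URL = "http://10.0.0.143:9093"


def build_test_alert(alertname: str, instance: str, summary: str) -> list:
    """Build a one-alert payload in the JSON shape accepted by POST /api/v1/alerts."""
    return [
        {
            # Hypothetical labels; alertname matters because the route groups by it.
            "labels": {"alertname": alertname, "instance": instance, "severity": "test"},
            "annotations": {"summary": summary},
            # RFC 3339 timestamp; endsAt is omitted so Alertmanager applies resolve_timeout.
            "startsAt": datetime.now(timezone.utc).isoformat(),
        }
    ]


def send_alerts(alerts: list, base_url: str = ALERTMANAGER_URL) -> int:
    """POST the alerts as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        base_url + "/api/v1/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


if __name__ == "__main__":
    payload = build_test_alert("ManualTest", "10.0.0.10", "manual test alert")
    print(json.dumps(payload, indent=2))
    # Run from a host in the same VPC to actually deliver it:
    # print(send_alerts(payload))
```

Because the route groups by alertname with group_wait: 10s, the test alert should reach WeCom roughly ten seconds after it is accepted. Since no endsAt is sent, Alertmanager treats the alert as resolved once resolve_timeout (5m here) passes without it being re-sent, and with send_resolved: true a resolved message should follow.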
2. Create a cloud native monitoring instance
In the container service console, click Cloud Native Monitoring and create an instance. Expand Advanced settings, click Add Alertmanager, and enter the access address of the Service for the Alertmanager you deployed: 10.0.0.143:9093.
Note that if you choose the default Alertmanager deployment when creating the cloud native monitoring instance, the console does not yet support switching to a self-built Alertmanager afterwards; switching requires submitting a ticket so that an engineer can do it. It is therefore recommended to choose a self-built Alertmanager when creating the instance.
After the instance is created, its basic information shows the configured self-built Alertmanager, the Prometheus details, and so on.
3. Associate the TKE cluster
Right after the cloud native monitoring instance is created, its Prometheus service is not monitoring any Kubernetes cluster yet. We need to add our TKE cluster to cloud native monitoring for data collection, which is done by associating the TKE cluster under Associated Clusters.
Once the cluster is associated, its information appears in the console, and you can click Targets to check whether the collection status is healthy.
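The same health information the Targets page shows is also available from Prometheus's standard HTTP API (GET /api/v1/targets), which is handy for scripting the check. A minimal Python sketch, assuming the Prometheus address from the instance's basic information is reachable from where it runs; the URL below is a hypothetical placeholder, any authentication the managed service may require is not covered here, and the sample body is made up and trimmed to the fields the code reads.

```python
import json
import urllib.request
from collections import Counter

# Hypothetical placeholder: the Prometheus address from the instance's basic information.
PROMETHEUS_URL = "http://prometheus.example.internal:9090"


def fetch_targets(base: str = PROMETHEUS_URL) -> str:
    """Return the raw JSON body of GET /api/v1/targets."""
    with urllib.request.urlopen(base + "/api/v1/targets", timeout=10) as resp:
        return resp.read().decode("utf-8")


def summarize_targets(body: str) -> dict:
    """Count active targets per (job, health) and list the ones that are not up."""
    data = json.loads(body)["data"]
    counts = Counter()
    unhealthy = []
    for target in data["activeTargets"]:
        job = target["labels"].get("job", "?")
        counts[(job, target["health"])] += 1
        if target["health"] != "up":
            unhealthy.append((target["scrapeUrl"], target["lastError"]))
    return {"counts": dict(counts), "unhealthy": unhealthy}


# Made-up response body for illustration only.
SAMPLE = json.dumps({
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "node-exporter", "instance": "192.0.2.11"},
             "scrapeUrl": "http://192.0.2.11:9100/metrics", "health": "up", "lastError": ""},
            {"labels": {"job": "node-exporter", "instance": "192.0.2.12"},
             "scrapeUrl": "http://192.0.2.12:9100/metrics", "health": "down",
             "lastError": "context deadline exceeded"},
        ],
        "droppedTargets": [],
    },
})
print(summarize_targets(SAMPLE))
```

Against a live instance, `summarize_targets(fetch_targets())` would give the same summary for the real targets.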
We can also use the Prometheus query interface to check whether the TKE cluster's monitoring data has been collected into Prometheus.
Click Data Query; if the query returns results, Prometheus is successfully collecting monitoring data from the TKE cluster.
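The Data Query page runs instant queries, which Prometheus also exposes as GET /api/v1/query. A minimal Python sketch of the same check using the standard `up` metric, assuming a reachable Prometheus address (the URL below is a hypothetical placeholder) and using a made-up response body in the documented instant-vector format:

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder: the Prometheus address from the instance's basic information.
PROMETHEUS_URL = "http://prometheus.example.internal:9090"


def query_url(base: str, promql: str) -> str:
    """Build an instant-query URL for GET /api/v1/query."""
    return base + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(body: str) -> dict:
    """Map each series' instance label to its float value from an instant-vector response."""
    payload = json.loads(body)
    if payload["status"] != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    return {
        series["metric"].get("instance", "?"): float(series["value"][1])
        for series in payload["data"]["result"]
    }


def run_query(promql: str, base: str = PROMETHEUS_URL) -> dict:
    """Execute the query against a live Prometheus and parse the result."""
    with urllib.request.urlopen(query_url(base, promql), timeout=10) as resp:
        return parse_vector(resp.read().decode("utf-8"))


# Made-up response body: one node-exporter target that is up (value "1").
SAMPLE_BODY = (
    '{"status":"success","data":{"resultType":"vector","result":['
    '{"metric":{"instance":"192.0.2.11","job":"node-exporter"},"value":[1656062100.0,"1"]}]}}'
)
print(query_url(PROMETHEUS_URL, 'up{job="node-exporter"}'))
print(parse_vector(SAMPLE_BODY))
```

A non-empty result with values of 1 means the targets are being scraped; an empty result means nothing matching the query has been collected.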
4. Configure alarms
Next, let's write and configure an alarm rule. We'll test an alarm on node memory utilization. To make sure the alarm actually fires, we alert when a node's memory utilization exceeds 10%. First, we write the alarm rule in PromQL on the Prometheus UI page.

```promql
100 - (node_memory_MemFree_bytes{endpoint!="target"} + node_memory_Cached_bytes + node_memory_Buffers_bytes) / node_memory_MemTotal_bytes * 100 > 10
```

The query above returns the nodes whose memory utilization is greater than 10%. Next, we go to the alarm configuration console of cloud native monitoring and configure the alarm:
- Rule name: the name of the alarm rule, no more than 40 characters.
- PromQL: the alarm rule statement.
- Duration: how long the condition described by the statement must keep holding; once this duration is reached, the alarm is triggered.
- Labels: Prometheus labels added to each rule.
- Alarm content: the specific content of the notification sent by email or SMS after the alarm is triggered. The content configured here is:
```text
Memory on node {{$labels.instance}} of {{$labels.cluster}} exceeds the 10% alarm threshold; current memory usage is {{$value}}. Please check and handle it promptly!!!
```

5. View the alarm in WeCom
```text
[FIRING:3] NodeMemoryUsage (node.rules cls-b3mg1p92 tke node-exporter kube-system mem alert-7pjasfmm tke-node-exporter)
node.rules test TKE cluster node memory alarm alert-7pjasfmm
Alerts Firing:
Labels:
 - alertname = NodeMemoryUsage
 - alertName = node.rules
 - cluster = cls-b3mg1p92
 - cluster_type = tke
 - instance = 10.0.0.10
 - job = node-exporter
 - namespace = kube-system
 - node = mem
 - notification = alert-7pjasfmm
 - pod = tke-node-exporter-xnfvb
 - service = tke-node-exporter
Annotations:
 - alertName = node.rules
 - content = Memory on node 10.0.0.10 of cls-b3mg1p92 exceeds the 10% alarm threshold; current memory usage is 52.578219305872885. Please check and handle it promptly!!!
 - describe = test TKE cluster node memory alarm
 - notification = alert-7pjasfmm
Source: /graph?g0.expr=100+-+%28node_memory_MemFree_bytes%7Bendpoint%21%3D%22target%22%7D+%2B+node_memory_Cached_bytes+%2B+node_memory_Buffers_bytes%29+%2F+node_memory_MemTotal_bytes+%2A+100+%3E+10&g0.tab=1
Labels:
 - alertname = NodeMemoryUsage
 - alertName = node.rules
 - cluster = cls-b3mg1p92
 - cluster_type = tke
 - instance = 10.0.0.157
 - job = node-exporter
 - namespace = kube-system
 - node = mem
 - notification = alert-7pjasfmm
 - pod = tke-node-exporter-vcnjl
 - service = tke-node-exporter
Annotations:
 - alertName = node.rules
 - content = Memory on node 10.0.0.157 of cls-b3mg1p92 exceeds the 10% alarm threshold; current memory usage is 34.298334259939. Please check and handle it promptly!!!
 - describe = test TKE cluster node memory alarm
 - notification = alert-7pjasfmm
Source: /graph?g0.expr=100+-+%28node_memory_MemFree_bytes%7Bendpoint%21%3D%22target%22%7D+%2B+node_memory_Cached_bytes+%2B+node_memory_Buffers_bytes%29+%2F+node_memory_MemTotal_bytes+%2A+100+%3E+10&g0.tab=1
Labels:
 - alertname = NodeMemoryUsage
 - alertName = node.rules
 - cluster = cls-b3mg1p92
 - cluster_type = tke
 - instance = 10.0.0.3
 - job = node-exporter
 - namespace = kube-system
 - node = mem
 - notification = alert-7pjasfmm
 - pod = tke-node-exporter-vpcmf
 - service = tke-node-exporter
Annotations:
 - alertName = node.rules
 - content = Memory on node 10.0.0.3 of cls-b3mg1p92 exceeds the 10% alarm threshold; current memory usage is 31.307402547932455. Please check and handle it promptly!!!
 - describe = test TKE cluster node memory alarm
 - notification = alert-7pjasfmm
Source: /graph?g0.expr=100+-+%28node_memory_MemFree_bytes%7Bendpoint%21%3D%22target%22%7D+%2B+node_memory_Cached_bytes+%2B+node_memory_Buffers_bytes%29+%2F+node_memory_MemTotal_bytes+%2A+100+%3E+10&g0.tab=1
```
Open the Prometheus application in our WeCom and check whether the alarm arrived. Receiving the alarm message shows that we have successfully sent alarms to WeCom through the self-built Alertmanager.
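Each `content` line in the WeCom message is the alarm-content template with `{{$labels.cluster}}`, `{{$labels.instance}}` and `{{$value}}` filled in from one firing series. Prometheus expands these with Go's text/template; the Python sketch below is only a simplified stand-in that handles exactly those two placeholder forms, to show how a message line is produced from a series' labels and value. The label values in the usage line are copied from the first alert in the message.

```python
import re

# Simplified stand-in for Prometheus annotation templating: it only understands
# {{$labels.<name>}} and {{$value}}, not the rest of Go's template language.
TEMPLATE = (
    "Memory on node {{$labels.instance}} of {{$labels.cluster}} exceeds the 10% alarm "
    "threshold; current memory usage is {{$value}}. Please check and handle it promptly!!!"
)

_PLACEHOLDER = re.compile(r"\{\{\s*\$(labels\.(\w+)|value)\s*\}\}")


def render(template: str, labels: dict, value: float) -> str:
    """Substitute label and value placeholders; missing labels become "" (a simplification)."""
    def substitute(match: "re.Match") -> str:
        if match.group(1) == "value":
            return str(value)
        return labels.get(match.group(2), "")

    return _PLACEHOLDER.sub(substitute, template)


first_alert = {"cluster": "cls-b3mg1p92", "instance": "10.0.0.10", "job": "node-exporter"}
print(render(TEMPLATE, first_alert, 52.578219305872885))
```

Because every firing series is rendered separately, one rule produced three `content` lines, one per node.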
6. Receive alarms by email
Next, we raise the alarm threshold to memory utilization above 50% and replace the cluster's alertmanager ConfigMap with the mailbox configuration, to see what an alarm received by email looks like. This time only one node should trigger the alarm; let's test it.
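One step that is easy to miss when switching the ConfigMap: Alertmanager does not watch config.yml for changes; it re-reads it on SIGHUP or on an HTTP POST to its /-/reload endpoint. Once the kubelet has synced the updated ConfigMap into the pod (this is not instantaneous), a reload can be triggered as in this minimal Python sketch, assuming the same 10.0.0.143:9093 address; simply restarting the Alertmanager pod works too.

```python
import urllib.request

# Assumption: the internal LoadBalancer address of the alertmanager Service.
ALERTMANAGER_URL = "http://10.0.0.143:9093"


def reload_request(base_url: str = ALERTMANAGER_URL) -> urllib.request.Request:
    """Build the POST /-/reload request that makes Alertmanager re-read its config file."""
    return urllib.request.Request(base_url + "/-/reload", data=b"", method="POST")


def reload_config(base_url: str = ALERTMANAGER_URL) -> int:
    """Send the reload request and return the HTTP status code."""
    with urllib.request.urlopen(reload_request(base_url), timeout=5) as response:
        return response.status


# From a host in the same VPC, after the ConfigMap change has propagated:
# print(reload_config())
```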
Judging from the PromQL query results, only 10.0.0.10 has memory usage above 50%, so the mailbox received only the alarm email for node 10.0.0.10. This shows that sending alarms to our mailbox through Alertmanager works.
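The alarm rule boils down to simple per-node arithmetic, which makes it easy to see why raising the threshold to 50% leaves a single node. A small Python sketch of the same expression and its trailing comparison, using the usage values reported in the WeCom alarm message (those were sampled under the 10% rule, so the 50% result assumes usage stayed at roughly those levels); the byte counts in the last line are made up purely to exercise the formula.

```python
GiB = 1024 ** 3


def memory_utilization(free: float, cached: float, buffers: float, total: float) -> float:
    """Percent of memory in use, mirroring the PromQL expression
    100 - (MemFree + Cached + Buffers) / MemTotal * 100
    """
    return 100 - (free + cached + buffers) / total * 100


def over_threshold(usage_by_node: dict, threshold: float) -> list:
    """Like PromQL's `> threshold` without `bool`: keep only the series above the threshold."""
    return sorted(node for node, usage in usage_by_node.items() if usage > threshold)


# Usage values taken from the WeCom alarm message.
reported = {
    "10.0.0.10": 52.578219305872885,
    "10.0.0.157": 34.298334259939,
    "10.0.0.3": 31.307402547932455,
}
print(over_threshold(reported, 10))  # all three nodes
print(over_threshold(reported, 50))  # ['10.0.0.10']

# Hypothetical byte counts for one node: 4 GiB of 8 GiB free/cached/buffered -> 50.0
print(memory_utilization(free=2 * GiB, cached=1.5 * GiB, buffers=0.5 * GiB, total=8 * GiB))
```

Because a PromQL comparison without the `bool` modifier filters series and keeps the left-hand value, `{{$value}}` in the alarm content is the utilization itself rather than 0 or 1.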