Log collection and analysis platform
2022-07-24 06:28:00 【Laughter addiction】
One. Overall architecture

Two. Load balancing
1. The role of load balancing:
Distributing traffic across multiple servers.
2. How to achieve high availability:
2.1 Hardware level:
Network card: bonding, where two physical network cards are bound into one logical interface
Disk: disk arrays (RAID)
2.2 Architecture level:
Clustering
Geo-redundant active-active deployment
3. DNS load balancing:
A domain name can be resolved to multiple IP addresses, and requests are generally spread across them. But if one of the servers goes down, DNS will not remove that IP immediately; it keeps resolving to the dead IP, which can cause access failures. Even though the client retries, the user experience still suffers.
4. DNS resolution steps:
1. Check the browser's DNS cache
2. Check the local hosts file (on Linux: /etc/hosts)
3. Query the local DNS server (configured on Linux in /etc/resolv.conf)
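The lookup order above can be sketched as a small Python helper (the function and dictionary names are illustrative; a real resolver also handles TTLs, negative caching, and search domains):

```python
def resolve(name, browser_cache, hosts_file, dns_server):
    """Return an IP by checking, in order: browser cache, hosts file, DNS."""
    if name in browser_cache:          # step 1: browser cache
        return browser_cache[name]
    if name in hosts_file:             # step 2: /etc/hosts
        return hosts_file[name]
    return dns_server.get(name)        # step 3: server from /etc/resolv.conf

hosts = {"db.local": "192.168.1.10"}
dns = {"example.com": "93.184.216.34"}
ip = resolve("db.local", {}, hosts, dns)      # found in the hosts file
ip2 = resolve("example.com", {}, hosts, dns)  # falls through to DNS
```

An entry found earlier in the chain shadows the later sources, which is why a stale hosts entry can override DNS.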
5. Load balancing with a proxy
5.1 Two kinds of proxy:
1. Forward proxy: proxies the client (e.g. a VPN)
2. Reverse proxy: proxies the server
5.2 Using an nginx reverse proxy for load balancing
1. Putting a reverse proxy in front of the application web servers improves security and makes load balancing much easier to control
2. For the reverse proxy itself, use keepalived with dual VIPs in a mutual master-backup setup to achieve high availability and improve resource utilization
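This is also why a proxy improves on DNS round-robin (point 3 above): the proxy can health-check its backends and skip dead ones, while DNS keeps handing out a dead IP. A minimal round-robin sketch with hypothetical class and field names (nginx implements this internally; this only models the idea):

```python
import itertools

class RoundRobinProxy:
    """Round-robin over backends, skipping any marked unhealthy."""
    def __init__(self, backends):
        self.backends = backends                  # backend ip -> healthy flag
        self.cycle = itertools.cycle(list(backends))

    def pick(self):
        for _ in range(len(self.backends)):
            b = next(self.cycle)
            if self.backends[b]:                  # forward only to healthy backends
                return b
        raise RuntimeError("no healthy backend")

proxy = RoundRobinProxy({"10.0.0.1": True, "10.0.0.2": False, "10.0.0.3": True})
picks = [proxy.pick() for _ in range(4)]          # 10.0.0.2 is never chosen
```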
6. Command to list services enabled at boot:
ls /etc/systemd/system/multi-user.target.wants
Three. kafka 2.12
1. Message middleware:
1.1 Message middleware, also called a message queue, is an efficient and reliable messaging mechanism for platform-independent data exchange, used to integrate distributed systems through data communication. By providing messaging and message-queue models, it extends inter-process communication to distributed environments.
1.2 Two communication modes of message middleware:
Point-to-point
Publish-subscribe
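A toy contrast of the two modes (not a real message-queue client; all names are illustrative): in point-to-point a message is consumed exactly once, while in publish-subscribe every subscriber gets its own copy.

```python
from collections import defaultdict

class Queue:
    """Point-to-point: each message is delivered to a single consumer."""
    def __init__(self):
        self.items = []
    def send(self, msg):
        self.items.append(msg)
    def receive(self):
        return self.items.pop(0)        # consumed once, then gone

class Broker:
    """Publish-subscribe: every subscriber of a topic receives a copy."""
    def __init__(self):
        self.subs = defaultdict(list)
    def subscribe(self, topic, inbox):
        self.subs[topic].append(inbox)
    def publish(self, topic, msg):
        for inbox in self.subs[topic]:
            inbox.append(msg)

q = Queue()
q.send("job-1")

broker = Broker()
inbox_a, inbox_b = [], []
broker.subscribe("nginx-log", inbox_a)
broker.subscribe("nginx-log", inbox_b)
broker.publish("nginx-log", "GET /")    # both subscribers receive it
```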
1.3 Common open-source message middleware: ActiveMQ, RabbitMQ, RocketMQ, Kafka, ZeroMQ, etc. The most widely used are RabbitMQ, RocketMQ, and Kafka.
1.4 kafka uses the publish-subscribe mode
2. What kafka is used for:
2.1 Log collection
2.2 Decoupling services
2.3 Traffic peak shaving
3. Benefits of unified log collection with kafka:
1. Easier to locate problems when failures occur
2. Centralized log management: any downstream program that needs the logs simply reads them from kafka, minimizing the impact of log processing on nginx
4. Common kafka terms
4.1 broker: a kafka node
4.2 topic: a topic is a category of messages. For example, nginx and mysql logs go to different topics because they are different types; a consumer subscribed to a topic can consume the messages in that topic
4.3 partition: partitions increase throughput and improve concurrency
4.4 replica: a complete backup of a partition
4.5 If the number of brokers equals the number of replicas, the cluster can survive the failure of n-1 machines
4.6 leader and follower
A producer can connect to any broker; the broker returns the leader information for the requested partition's replicas, and the producer then interacts with that leader
5. How kafka achieves high availability:
Multiple brokers + multiple partitions + multiple replicas
6. ISR: the list of followers that must stay in sync with the leader
1. If a follower in the ISR falls behind, it is removed from the list
2. A follower that is stuck or synchronizing too slowly is also removed from the ISR
3. If a machine goes down and wants to rejoin the ISR after restarting, it must first catch up to the HW (high watermark)
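The ISR rules above can be sketched as follows; the lag threshold name `max_lag` is illustrative, not Kafka's actual configuration key:

```python
def update_isr(isr, leader_offset, follower_offsets, max_lag):
    """Keep only followers within max_lag of the leader's log-end offset."""
    return {f for f in isr
            if leader_offset - follower_offsets.get(f, 0) <= max_lag}

def can_rejoin(follower_offset, high_watermark):
    """A recovered follower rejoins only after syncing up to the HW."""
    return follower_offset >= high_watermark

isr = {"b1", "b2", "b3"}
offsets = {"b1": 100, "b2": 99, "b3": 40}     # b3 is lagging badly
isr = update_isr(isr, leader_offset=100, follower_offsets=offsets, max_lag=10)
# b3 is dropped; to rejoin later it must first catch up to the HW
```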
7. How is data consistency guaranteed?
1. The producer can set request.required.acks, which can be 0, 1, or -1:
acks=0: the producer does not wait for a response and sends the next message immediately
acks=1: the leader responds as soon as it receives the message, and the producer then sends the next one
acks=-1: every replica in the ISR must have received the message before the producer gets a response and sends the next one
2. For consumers, the High Water Mark mechanism is introduced: consumers can only read up to the smallest offset among the ISR follower replicas
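A toy model of the three acks levels and the high watermark (real clients set this via the producer's acks configuration; the function names here are illustrative):

```python
def producer_send_ok(acks, leader_received, isr_acked, isr_size):
    """Return True if the producer treats the send as successful."""
    if acks == 0:
        return True                     # fire and forget, no response awaited
    if acks == 1:
        return leader_received          # leader's write alone is enough
    if acks == -1:
        return isr_acked == isr_size    # every ISR replica must have the record
    raise ValueError("acks must be 0, 1 or -1")

def high_watermark(isr_offsets):
    """Consumers may only read up to the smallest offset across the ISR."""
    return min(isr_offsets.values())
```

Higher acks values trade latency for durability: acks=-1 survives a leader crash, acks=0 can silently lose messages.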
8. filebeat (a lightweight log collector):
Sends the logs collected from nginx to kafka, acting as the producer
9. Consumers:
1. Within the same consumer group, a partition can only be consumed by one consumer at a time
2. How does a consumer know where it has consumed up to and where to start next time?
While consuming, a consumer records its consumption offset. The offset can be saved locally or stored in kafka, which creates an internal topic __consumer_offsets to record consumption offsets
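Offset tracking can be modeled with a plain dict keyed by (group, topic, partition), standing in for the __consumer_offsets topic (class and method names are illustrative):

```python
class OffsetStore:
    """Toy stand-in for kafka's __consumer_offsets topic."""
    def __init__(self):
        self.committed = {}

    def commit(self, group, topic, partition, offset):
        self.committed[(group, topic, partition)] = offset

    def position(self, group, topic, partition):
        # resume right after the last committed offset; start at 0 if none
        return self.committed.get((group, topic, partition), -1) + 1

store = OffsetStore()
store.commit("cleaners", "nginx-log", 0, 41)
next_offset = store.position("cleaners", "nginx-log", 0)   # resumes at 42
```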
10. How kafka stores data:
1. Folder: <topic_name>-<partition_number>
2. Each partition's data is stored as many segments; each segment consists of an index file and a log file
3. Splitting the data into multiple segments makes it easier to process
kafka can clean up data along two dimensions:
1. By size
2. By time
Cleanup is triggered as soon as either condition is met
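The two cleanup dimensions reduce to a single predicate: either condition alone triggers deletion of old segments. (Kafka's real knobs are the log retention settings; the parameter names here are simplified.)

```python
def should_clean(segment_bytes, segment_age_hours, max_bytes, max_hours):
    """Trigger cleanup when either the size or the age limit is exceeded."""
    return segment_bytes > max_bytes or segment_age_hours > max_hours
```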
Four. zookeeper 3.6.3
1. zookeeper:
A coordination and management service for distributed applications: configuration management, domain name management, distributed data storage, and cluster management
Here we use it to manage kafka
2. Election:
The consistency algorithm is zab; the minority follows the majority, and a node that wins more than half of the votes is elected leader
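The majority rule can be sketched as: a candidate becomes leader only when its vote count exceeds half the ensemble size (an illustrative helper, not the actual zab protocol):

```python
from collections import Counter

def elect(votes, ensemble_size):
    """votes: list of node ids voted for; return the leader id or None."""
    candidate, count = Counter(votes).most_common(1)[0]
    # "more than half": strictly greater than ensemble_size / 2
    return candidate if count > ensemble_size // 2 else None
```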
3. Connections:
1. A client can connect to any zk node, but transactional operations such as adding or modifying data must be performed on the leader. If a client connected to a follower issues a transactional operation, the follower returns the leader's IP, so the client ends up operating on the leader anyway
2. Queries can be served directly by the follower the client is connected to
4. Roles of leader and follower
follower: queries and elections
leader: data writes, modifications, and synchronization
5. Data synchronization:
1. As soon as more than half of the nodes have synchronized, the write is considered committed
2. zookeeper is not strongly consistent; it is eventually consistent
6. zookeeper cluster
1. In a zk cluster, more than half of the nodes must be alive for the cluster to remain usable
2. The number of nodes in a zk cluster is generally set to an odd number
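The quorum arithmetic behind both points: the cluster works while alive > n/2, so n nodes tolerate (n-1)//2 failures, and an even-sized ensemble tolerates no more failures than the odd size below it, which is why odd counts are preferred:

```python
def is_usable(alive, total):
    """Cluster is usable while more than half of the nodes are alive."""
    return alive > total // 2

def tolerable_failures(total):
    """Number of node failures an ensemble of `total` nodes survives."""
    return (total - 1) // 2
```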
7. The role of zk in kafka
1. Storing kafka metadata: topics, partitions, and replica information
2. Electing the kafka controller
The controller is chosen by preemption; the elected controller manages the leaders and followers of kafka's replicas, including synchronization and elections
Five. Project part
1. Project name: log collection and analysis platform
2. Project environment:
centos7, kafka2.12, nginx, filebeat, zookeeper, python3.6, mysql
3. Project description:
Build a web service on an nginx cluster, collect the access logs the cluster generates, ship them to the kafka platform via filebeat, manage kafka with zookeeper, and finally clean the collected nginx logs and store the results in a database to persist the data.
4. Project steps:
1. Design and plan the overall cluster architecture, using nginx for load balancing and keepalived for high availability
2. Build the nginx cluster to provide the web service and collect user access logs
3. Set up filebeat, which acts as the producer and sends the collected logs to kafka
4. Build zookeeper and kafka, using zookeeper to manage kafka
5. Write the consumer in python to clean the logs
6. Finally, store valuable information such as IPs and bandwidth in the database to persist the data
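The cleaning stage of step 5 might look like the following sketch, which extracts the client IP and response size from an nginx combined-format log line (reading from kafka and writing to MySQL are stubbed out; the regex assumes the default combined format):

```python
import re

# nginx combined format: ip - user [time] "request" status bytes ...
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+)'
)

def clean(line):
    """Return the fields worth persisting, or None for unparseable lines."""
    m = LOG_RE.match(line)
    if not m:
        return None
    return {"ip": m.group("ip"), "bytes": int(m.group("bytes"))}

line = ('192.168.0.5 - - [24/Jul/2022:06:28:00 +0800] '
        '"GET /index.html HTTP/1.1" 200 612')
row = clean(line)   # in the real consumer, row would be inserted into mysql
```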
5. Project experience:
Through this project I became more familiar with the architecture of open-source components such as keepalived, zookeeper, and kafka. Using nginx as a reverse proxy for load balancing is more secure and easier to control than traditional DNS-based resolution, and it also improves the user experience. Building filebeat as the producer and writing the consumer myself gave me a deeper understanding of kafka and zookeeper; my skills improved, and I gained a better understanding of linux architecture.