On the difficulty of developing a large IM (instant messaging) system
2022-06-24 23:56:00 【wecloud1314】
This article starts from a simple example system and moves from the shallow to the deep: single-machine architecture, master-slave replication, same-city disaster recovery, same-city active-active, then remote active-active and remote multi-active. Step by step, it explains the technical principles and basic implementation ideas behind the remote multi-active disaster recovery architecture of large-scale distributed systems, which makes it well suited to beginners.
In software development, "remote multi-active" is one of the peaks of distributed system architecture design. Many people have heard of it, but few understand how it works.

What is remote multi-active? Why is it needed? What problem does it solve, and how?
Every programmer who sees the term "remote multi-active" probably wants answers to these questions.
I was fortunate to take a deep part in the design and implementation of a remote multi-active system at a medium-sized Internet company, so today I will talk about the principles behind it.
Read the article carefully, and I believe you will come away with a deeper understanding of remote multi-active.
What is system availability
To understand remote multi-active, we need to start from the principles of architecture design.
Today the demands placed on the software systems we build keep rising. If you know something about "architecture design", you know that a good software architecture should follow three principles.
They are:
1) High performance ;
2) High availability ;
3) Easy to expand .
Among them:
1) High performance: the system can handle more traffic with lower response latency (for example, processing 100,000 concurrent requests per second, with an interface response time of 5 ms);
2) Easy to expand: when iterating on new features, the system can be extended at minimal cost; when it comes under traffic pressure, it can be scaled out without changing the code.
The concept of "high availability" looks more abstract. How should we understand it?
It is usually measured by two metrics:
1) Mean time between failures, MTBF (Mean Time Between Failure): the average interval between two failures, that is, the average time the system runs normally. The longer it is, the more stable the system;
2) Mean time to repair, MTTR (Mean Time To Repair): the average time it takes to recover after a failure. The smaller the value, the smaller the impact of the failure on users.
Availability is related to the two as follows:
Availability = MTBF / (MTBF + MTTR) × 100%
The result of this formula is a percentage, and we usually describe a system's availability as "N nines".
To reach four nines (99.99%) or better, the average downtime per day must stay within about 10 seconds (more precisely, 0.01% of the 86,400 seconds in a day is 8.64 seconds).
In other words, the shorter the failure time, the higher the availability of the whole system, and every additional nine places much higher demands on the system.
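The formula and the "N nines" rule can be checked with a few lines of Python. This is only a sketch: the function names and the sample MTBF/MTTR figures are made up for illustration and do not come from any real system.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def availability(mtbf: float, mttr: float) -> float:
    """Availability as a fraction between 0 and 1.

    mtbf and mttr must use the same time unit (hours, seconds, ...).
    """
    return mtbf / (mtbf + mttr)


def daily_downtime_budget(nines: int) -> float:
    """Average downtime per day, in seconds, allowed by N nines."""
    return SECONDS_PER_DAY / 10 ** nines


if __name__ == "__main__":
    # Hypothetical system: one failure every 1,000 hours, 1 hour to repair.
    print(f"availability: {availability(1000, 1):.4%}")
    for n in range(1, 6):
        print(f"{n} nine(s): {daily_downtime_budget(n):.3f} s of downtime per day")
```

For four nines the budget works out to 8.64 seconds per day, which is where the "about 10 seconds" figure comes from.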
We all know that system failures are unavoidable, and the larger the system, the more likely something goes wrong.
These failures generally fall into three categories:
1) Hardware failures: CPU, memory, disk, network card, switch, router;
2) Software problems: code bugs, version iterations;
3) Force majeure: earthquake, flood, fire, war.
These risks can materialize at any time. So when a failure does occur, whether the system can recover as fast as possible becomes the key to availability.

But how do we recover quickly?
The "remote multi-active" architecture discussed in this article is an efficient solution proposed for exactly this problem.
In the rest of this article, I will start from the simplest system and evolve it step by step into an architecture that supports "remote multi-active".
Along the way, you will see which availability problems a system runs into and why the architecture evolves the way it does, and so come to understand the point of a remote multi-active architecture.
Single-machine architecture
Let's start with the simplest case.
Suppose your business is just getting started and the volume is small. Then your architecture looks like this:
The model is very simple: client requests come in, the business application reads from and writes to the database, and the result is returned. It is easy to understand.
Note, however, that the database here is deployed on a single machine, which gives it a fatal weakness: if an accident happens (a damaged disk, an operating system fault, data deleted by mistake), all the data is lost, and the damage is huge.
How can we avoid this? One solution comes to mind easily: backup.
You can back up the data by periodically copying (cp) the database files to another machine. That way, even if the original machine loses its data, you can still restore it from the backup and keep the data safe.
This scheme is simple to implement, but it has two problems:
1) Recovery takes time: the business has to be stopped first and the data restored afterwards. Downtime depends on how fast the restore runs, and the service is unavailable throughout;
2) The data is incomplete: because backups are periodic, the data is never fully up to date, and how much is missing depends on the backup interval.
Clearly, the larger your database, the longer recovery takes. Measured against the "high availability" standard above, this scheme may not reach even one nine, which is far from our availability requirements.
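The data-loss problem of periodic backups can be reproduced in miniature. The sketch below is illustrative only: the file name data.db, the backup directory, and the helper names are hypothetical, a text file stands in for the database, and the copy assumes nothing is writing to the file while it runs.

```python
import shutil
import tempfile
from pathlib import Path


def backup(db_file: Path, backup_dir: Path) -> Path:
    """Copy the database file into backup_dir, like a scheduled `cp`."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / db_file.name
    shutil.copy2(db_file, target)
    return target


def restore(backup_file: Path, db_file: Path) -> None:
    """Replace the lost or damaged database file with the last backup."""
    shutil.copy2(backup_file, db_file)


def demo() -> list:
    """Back up, write more data, lose the disk, restore; return what survives."""
    with tempfile.TemporaryDirectory() as tmp:
        db = Path(tmp) / "data.db"          # hypothetical database file
        db.write_text("order-1\norder-2\n")
        saved = backup(db, Path(tmp) / "backup")  # the scheduled backup runs

        with db.open("a") as f:             # written after the backup
            f.write("order-3\n")

        db.unlink()                         # simulated disk failure
        restore(saved, db)                  # service is down while this runs
        return db.read_text().splitlines()


if __name__ == "__main__":
    print(demo())
```

Everything written after the last backup (order-3 here) is gone after the restore, and the restore itself is the window during which the service is unavailable.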
Is there a better scheme that restores the business quickly and also keeps the data as complete as possible?
There is: master-slave replication.
Master-slave replication architecture
To address the single-machine problem from the previous section, you can deploy another database instance on a second machine, make it a replica of the original instance, and keep the two synchronized in real time.
We usually call the original instance the master and the new instance the slave.
The advantages of this scheme are:
1) High data integrity: the master replicates to the slave in real time, so the difference between the two is very small;
2) Better fault tolerance: if anything goes wrong with the master, the slave can be promoted to master at any time and keep serving;
3) Better read performance: business applications can read directly from the slave, taking read pressure off the master.
This is a good scheme: it greatly improves the availability of the database and also improves the read performance of the system.
By the same reasoning, you can deploy another copy of your business application on a different machine to avoid a single point of failure. Because business applications are usually stateless (they do not store data the way a database does), deploying another copy is straightforward.
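The three advantages listed above can be seen in a toy in-memory model. This is a sketch under heavy assumptions, not how any real database implements replication: Node and Cluster are hypothetical names, replication is applied synchronously inside write(), and failure is simulated by flipping a flag.

```python
class Node:
    """A toy database node holding key-value data in memory."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True


class Cluster:
    """One master and one slave: writes go to the master, reads to the slave."""

    def __init__(self):
        self.master = Node("master")
        self.slave = Node("slave")

    def write(self, key, value):
        if not self.master.alive:
            self.failover()
        self.master.data[key] = value
        # Replication is applied immediately here. Real systems ship a log
        # over the network, so the slave can lag slightly behind the master.
        if self.slave.alive:
            self.slave.data[key] = value

    def read(self, key):
        # Prefer the slave to take read pressure off the master.
        node = self.slave if self.slave.alive else self.master
        return node.data.get(key)

    def failover(self):
        """Promote the slave to master after the master has failed."""
        self.master, self.slave = self.slave, self.master


if __name__ == "__main__":
    cluster = Cluster()
    cluster.write("user:1", "alice")
    cluster.master.alive = False       # simulate a master crash
    cluster.write("user:1", "bob")     # triggers failover, then writes
    print(cluster.master.name, cluster.read("user:1"))
```

Because the slave already holds a copy of every write, promoting it costs no restore time and loses no data in this model, which is exactly what the periodic-backup scheme could not offer.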