
On the Difficulty of Developing a Large-Scale IM (Instant Messaging) System

2022-06-24 23:56:00 【wecloud1314】

This article starts from a simple example system and moves step by step from a single-machine architecture through master-slave replication, intra-city disaster recovery and intra-city active-active, and then on to cross-region dual-active and multi-site active-active. Going from the shallow to the deep, it explains the technical principles and basic implementation ideas behind the multi-site active-active disaster-recovery architecture of large distributed systems, which makes it well suited to beginners.

In software development, "multi-site active-active" is one of the peaks of distributed system architecture design. Many people have heard the term, but few understand the principles behind it.

What is multi-site active-active? Why do we need it? What problem does it solve, and how does it solve it?

These are questions every programmer wants answered the first time they come across the term.

I was once fortunate enough to take a deep part in designing and implementing the multi-site active-active system of a medium-sized Internet company. So today, I'll walk you through the principles behind it.

If you read this article carefully, I believe you will come away with a much deeper understanding of multi-site active-active.

What is system availability

To understand multi-site active-active, we need to start from the principles of architecture design.

The demands we place on the software systems we build keep rising. If you know something about architecture design, you know that a good software architecture should follow these 3 principles:

1) High performance;

2) High availability;

3) Easy to extend.

Of these:

1) High performance: the system can handle more traffic with lower response latency (for example, processing 100,000 concurrent requests per second with an interface response time of 5 ms);

2) Easy to extend: when new features are added, the system can be extended at minimal cost, and when traffic pressure rises, it can be scaled out without changing the code.

"High availability", by contrast, sounds abstract. How should we understand it?

It is usually measured with 2 metrics:

1) Mean time between failures, MTBF (Mean Time Between Failures): the interval between two failures, i.e. the average time the system runs normally. The longer it is, the more stable the system;

2) Mean time to repair, MTTR (Mean Time To Repair): the time it takes the system to recover after a failure. The smaller it is, the less the failure affects users.

Availability is related to both as follows:

Availability = MTBF / (MTBF + MTTR) × 100%
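To make the formula concrete, here is a minimal Python sketch that derives MTBF and MTTR from a list of outages within an observation window and plugs them into the formula. The incident format (start and end hour of each outage), the function names and the sample numbers are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    """One outage, in hours from the start of the observation window (hypothetical format)."""
    start: float  # hour at which the system failed
    end: float    # hour at which service was restored


def mtbf_mttr(incidents: list[Incident], window_hours: float) -> tuple[float, float]:
    """Return (MTBF, MTTR) in hours for the incidents observed in the window."""
    if not incidents:
        raise ValueError("need at least one incident to compute MTBF/MTTR")
    downtime = sum(i.end - i.start for i in incidents)
    uptime = window_hours - downtime
    mtbf = uptime / len(incidents)    # average time in normal operation per failure
    mttr = downtime / len(incidents)  # average time to recover per failure
    return mtbf, mttr


def availability(mtbf: float, mttr: float) -> float:
    """Availability = MTBF / (MTBF + MTTR) * 100%."""
    return mtbf / (mtbf + mttr) * 100


# Hypothetical 30-day window (720 h) with two outages of 1 h and 0.5 h.
incidents = [Incident(100.0, 101.0), Incident(400.0, 400.5)]
mtbf, mttr = mtbf_mttr(incidents, window_hours=720.0)
print(f"MTBF={mtbf:.2f} h, MTTR={mttr:.2f} h, availability={availability(mtbf, mttr):.3f}%")
```

With these numbers MTBF is 359.25 h and MTTR is 0.75 h, so availability is 718.5 / 720, about 99.79%: two nines, despite only 1.5 hours of downtime in a month.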

The result of this formula is a ratio, and we usually describe a system's availability as "N nines".

To reach 4 nines (99.99%) or more, the average downtime per day must be kept within roughly 10 seconds.
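The "4 nines" figure can be checked directly: the allowed downtime is (1 − availability) multiplied by the period. For 99.99% that is 86,400 s × 0.0001 = 8.64 s per day, or about 52.6 minutes per year. A small Python sketch (the function name is illustrative):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def allowed_downtime_seconds(nines: int, period_seconds: float = SECONDS_PER_DAY) -> float:
    """Maximum downtime in a period for an availability of N nines (e.g. 4 -> 99.99%)."""
    unavailability = 10 ** -nines  # 4 nines -> 0.0001
    return period_seconds * unavailability


for n in range(1, 6):
    per_day = allowed_downtime_seconds(n)
    per_year_min = allowed_downtime_seconds(n, 365 * SECONDS_PER_DAY) / 60
    print(f"{n} nine(s): {per_day:.2f} s/day, {per_year_min:.1f} min/year")
```

Each extra nine divides the downtime budget by 10, which is why every additional 9 is so much harder to achieve than the last.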

In other words, the shorter the downtime, the higher the availability of the whole system, and every additional 9 places stricter demands on the system.

We all know that failures are inevitable, and the larger the system, the more likely problems become.

These failures generally fall into 3 categories:

1) Hardware failures: CPU, memory, disk, network card, switch, router;

2) Software problems: code bugs, new version releases;

3) Force majeure: earthquakes, floods, fires, wars.

Any of these can happen at any time. So when a failure does occur, whether our system can recover as fast as possible becomes the key to availability.

But how can we recover quickly?

The multi-site active-active architecture discussed in this article is an efficient solution proposed precisely to address this problem.

In the rest of this article, I will start from the simplest possible system and walk you step by step through evolving it into an architecture that supports multi-site active-active.

Along the way, you will see what availability problems a system runs into and why the architecture evolves the way it does, and so understand the significance of the multi-site active-active architecture.

Single-machine architecture

Let's start with the simplest case.

Suppose your business is just getting started and traffic is very light. Your architecture can then be very simple.

The model is straightforward: client requests come in, the business application reads from and writes to the database, and the result is returned. It is easy to understand.

Note, however, that the database here is deployed on a single machine, which has a fatal weakness: if something goes wrong (for example, a damaged disk, an operating system failure, or data deleted by mistake), all the data may be lost, and the loss is huge.

How can we avoid this? An obvious solution is backup.

You can back up the data by regularly copying (cp) the database files to another machine. That way, even if the original machine loses its data, you can restore it from the backup and keep the data safe.
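A minimal sketch of the "regularly cp the database file" idea in Python. It assumes a database that can be safely copied as a single file (for example, one that has been cleanly stopped or snapshotted first) and a `backup_dir` that is mounted from another machine, e.g. over NFS. The function name, the file naming scheme and the retention count are hypothetical; real databases usually need their own dump or snapshot tooling to get a consistent copy.

```python
import shutil
import time
from pathlib import Path


def backup_db_file(db_file: Path, backup_dir: Path, keep: int = 7) -> Path:
    """Copy db_file into backup_dir under a timestamped name and keep only the newest `keep` copies."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")  # second resolution: one backup per second at most
    target = backup_dir / f"{db_file.stem}-{stamp}{db_file.suffix}"
    shutil.copy2(db_file, target)  # copy contents and metadata, similar to `cp -p`

    # Rotate: timestamped names sort chronologically, so drop everything but the newest `keep`.
    backups = sorted(backup_dir.glob(f"{db_file.stem}-*{db_file.suffix}"))
    for old in backups[:-keep]:
        old.unlink()
    return target
```

In practice the function would be triggered by a scheduler such as cron; the interval chosen there is exactly the backup period that bounds how much data can be lost.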

Although this scheme is simple to implement, it has 2 problems:

1) Recovery takes time: the business must first be stopped and the data restored. The downtime depends on how fast the restore runs, and the service is unavailable throughout;

2) Incomplete data: because backups are periodic, the backed-up data is never the latest, and how much data survives depends on the backup interval.

Clearly, the larger your database, the longer recovery from a failure takes. Measured against the high-availability standard described earlier, this scheme may not even reach 1 nine, far from meeting our availability requirements.
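The two weaknesses above can be put into rough numbers. With periodic backups, the worst case is losing one full backup interval of writes (a failure just before the next backup runs), and for a given restore throughput the restore time grows linearly with database size. The sketch below uses made-up example figures, not measurements.

```python
def backup_recovery_estimate(backup_interval_h: float, db_size_gb: float,
                             restore_gb_per_h: float) -> tuple[float, float]:
    """Return (worst-case data loss in hours, restore time in hours) for a periodic-backup scheme."""
    worst_case_data_loss_h = backup_interval_h      # writes since the last backup are lost
    restore_time_h = db_size_gb / restore_gb_per_h  # the service is down while restoring
    return worst_case_data_loss_h, restore_time_h


# Hypothetical figures: daily backups and a restore throughput of 100 GB/h.
for size_gb in (50, 500, 5000):
    loss_h, restore_h = backup_recovery_estimate(24, size_gb, 100)
    print(f"{size_gb:>5} GB: lose up to {loss_h:.0f} h of writes, {restore_h:.1f} h to restore")
```

In this scheme the restore time is pure downtime, so it feeds straight into MTTR in the availability formula.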

Is there a better solution that can restore service quickly while also keeping the data as complete as possible?

There is: master-slave replication.

Master-slave replication architecture

To solve the problem of the single-machine architecture from the previous section, you can deploy another database instance on a different machine, make it a replica of the original instance, and keep the two synchronized in real time.

We usually call the original instance the master and the new instance the slave.
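Real-time synchronization between master and slave can be illustrated with a toy in-memory model: the master appends every write to an ordered log, and the slave replays whatever entries it has not yet applied. This follows the general log-shipping idea (MySQL's binary-log replication works along these lines), but the classes below are purely illustrative and ignore networking, durability and conflicts.

```python
class Master:
    """Accepts writes and records each one in an append-only replication log."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}
        self.log: list[tuple[str, str]] = []

    def write(self, key: str, value: str) -> None:
        self.data[key] = value
        self.log.append((key, value))


class Slave:
    """Replays the master's log in order; `applied` records how far it has caught up."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}
        self.applied = 0

    def sync_from(self, master: Master) -> int:
        """Apply every log entry not yet applied and return how many were applied."""
        pending = master.log[self.applied:]
        for key, value in pending:
            self.data[key] = value
        self.applied += len(pending)
        return len(pending)


master, slave = Master(), Slave()
master.write("user:1", "alice")
master.write("user:2", "bob")
print("replication lag (entries):", len(master.log) - slave.applied)  # prints 2
slave.sync_from(master)
print("in sync:", slave.data == master.data)  # prints True
```

The gap between `len(master.log)` and `slave.applied` is the replication lag; keeping it small is what keeps the data difference between the two copies small.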

The advantages of this scheme are:

1) High data integrity: master and slave are synchronized in real time, so the difference between their data is very small;

2) Better fault tolerance: if anything goes wrong with the master, the slave can be switched over to become the master at any time and keep serving;

3) Better read performance: business applications can read directly from the slave, taking read load off the master.
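The read/write split and failover in the list above could be sketched as a small router in front of the two databases: writes go to the master, reads go to the slave, and when the master is found to be down the slave is promoted and takes over writes. The `Node` class, the health flag and the promotion logic are hypothetical simplifications; a real system would also have to fence off the old master and make sure the slave has caught up before promoting it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A database instance reduced to a name, a health flag and a key-value store."""
    name: str
    healthy: bool = True
    data: dict = field(default_factory=dict)


class Router:
    """Routes writes to the master and reads to the slave, promoting the slave if the master fails."""

    def __init__(self, master: Node, slave: Node) -> None:
        self.master = master
        self.slave = slave

    def _failover_if_needed(self) -> None:
        if not self.master.healthy and self.slave.healthy:
            # Promote the slave; the failed master becomes the (currently unhealthy) standby.
            self.master, self.slave = self.slave, self.master

    def write(self, key, value) -> str:
        self._failover_if_needed()
        if not self.master.healthy:
            raise RuntimeError("no healthy database available for writes")
        self.master.data[key] = value
        if self.slave.healthy:
            self.slave.data[key] = value  # stands in for replication to the slave
        return self.master.name

    def read(self, key):
        self._failover_if_needed()
        # Prefer the slave to offload the master; fall back to the master if the slave is down.
        node = self.slave if self.slave.healthy else self.master
        return node.data.get(key)


db1, db2 = Node("db1"), Node("db2")
router = Router(db1, db2)
router.write("k", "v")            # written to db1, replicated to db2
db1.healthy = False               # simulate a master crash
print(router.write("k2", "v2"))   # prints "db2": the slave was promoted
print(router.read("k"))           # prints "v": data replicated before the crash survives
```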

This is a good solution: it greatly improves the availability of the database and also improves the system's read performance.

By the same idea, your business application can also be deployed on additional machines to avoid a single point of failure. Because business applications are usually stateless (they do not store data the way a database does), you can simply deploy more copies, which is easy.
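Because stateless application instances are interchangeable, a simple round-robin dispatcher that skips unhealthy instances is enough to remove the single point of failure at this layer. The class and instance names below are hypothetical; in practice this job is usually done by a load balancer such as Nginx or a cloud load balancer.

```python
import itertools


class RoundRobin:
    """Hands out stateless app instances in turn, skipping those marked unhealthy."""

    def __init__(self, instances: list[str]) -> None:
        self.instances = instances
        self.healthy = {name: True for name in instances}
        self._cycle = itertools.cycle(instances)

    def mark(self, name: str, healthy: bool) -> None:
        self.healthy[name] = healthy

    def pick(self) -> str:
        # Try each instance at most once per call so we never loop forever.
        for _ in range(len(self.instances)):
            name = next(self._cycle)
            if self.healthy[name]:
                return name
        raise RuntimeError("no healthy application instance")


lb = RoundRobin(["app-1", "app-2"])
print([lb.pick() for _ in range(4)])  # ['app-1', 'app-2', 'app-1', 'app-2']
lb.mark("app-1", False)               # simulate a crashed machine
print([lb.pick() for _ in range(3)])  # ['app-2', 'app-2', 'app-2']
```

This only works because no request depends on which instance served the previous one; a stateful service would also need its state replicated, as with the database above.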

Copyright notice
This article was written by [wecloud1314]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/175/202206241905414378.html
