Supplementary course on basic knowledge of IM development (II): how to design a server-side storage architecture for a large number of image files?

2022-06-26 05:16:00 JackJiang-

Friendly note: this text is compiled from a technical talk by architect Ding Lang. Some of the views are offered only as a starting point for discussion and may not be best practice. Comments are welcome.

1、 Preface

A full-featured IM system usually carries a lot of image content, including avatars, picture messages, photo albums, and picture emoticons. So how should these images be stored when designing the server architecture?

This article describes how the server-side storage of a large number of images evolved in a typical Web application. The basic technical principles and architectural ideas apply equally to IM systems, so read it with your own IM architecture in mind and take only what fits. Some of the views here are offered as a starting point for discussion and may not be best practice; do not follow them blindly.

In fact, in old-fashioned PC IM clients, features such as picture messages might be pushed directly over a long-lived connection (so-called real-time picture transfer), which in theory requires no storage on the server. On today's mainstream mobile IM, however, mobile networks are jittery and unstable and users expect to share anytime, anywhere, so real-time transfer is rarely used. Mainstream IM systems now work the way this article describes: the client "pulls" the image from the cloud (that is, the server) over a short-lived HTTP connection. The advantages are sharing anytime, anywhere, and low demands on network stability: the uploader uploads once, the server stores the file for the long term, later readers fetch it by URL on demand, and sharing it again only means sharing the URL rather than transferring the whole image again.

Similarly, an IM system has other small-file storage needs much like images, for example the short AMR audio files in voice messages (some IMs, such as Yixin, may use AAC for better sound quality) and the small video files of a short-video feature. These files are stored and used in essentially the same way as image files. If they can share the image storage architecture, the overall system architecture (especially the storage part) becomes more general. So although this article takes image storage as its starting point, you can apply it to other small files as well.

Learning and discussion:

- Instant messaging development discussion group: 320837163 [recommended]

- Introduction to mobile IM development: 《One article is enough for beginners: developing mobile IM from scratch》

(This article is also published at: http://www.52im.net/thread-1356-1-1.html)

2、 Related articles

▼ On IM data storage architecture, the following articles may be useful to you:

Tencent original sharing (1): How to improve the picture transfer speed and success rate of mobile QQ on mobile networks

The back-end storage architecture behind WeChat's massive user base (video + PPT) [attachment download]

Design and practice of WeChat's back-end hot/cold tiered architecture for massive time-series data

A discussion of chat message synchronization and storage in modern IM systems

▼ The IM development series is suitable as reference material on hot IM development topics (this article is the 11th in the series):

Implementing IM message delivery guarantees (1): ensuring reliable delivery of online real-time messages

Implementing IM message delivery guarantees (2): ensuring reliable delivery of offline messages

How to guarantee the "ordering" and "consistency" of IM real-time messages?

Should online status synchronization in IM one-to-one and group chat use "push" or "pull"?

IM group chat messages are so complicated; how do you ensure they are neither lost nor duplicated?

An intelligent heartbeat algorithm for Android IM: design and implementation (with sample code)

How to save traffic when pulling data at mobile IM login?

Easy to understand: sharing a cluster-based load balancing scheme for the mobile IM access layer

On the principles of multi-device login and message roaming in mobile IM

Supplementary course on basic knowledge of IM development (I): correctly understanding the principle of front-end HTTP SSO single sign-on interfaces

Supplementary course on basic knowledge of IM development (II): how to design a server-side storage architecture for a large number of image files? (this article)

If you are a beginner in IM development, it is strongly recommended to read 《One article is enough for beginners: developing mobile IM from scratch》.

3、 Image server architecture in the stand-alone era (centralized)

In the start-up period, time is tight and the developers' experience is limited.

So the usual approach is to create an upload subdirectory directly under the website's root directory to hold the image files users upload:

[1] To separate files by business, create different subdirectories under upload, for example upload\QA and upload\Face;

[2] The database tables store relative paths such as "upload/qa/test.jpg";

[3] Users access the files like this: http://www.yourdomain.com/upload/qa/test.jpg

How the program uploads and writes files:

Programmer A configures the physical directory D:\Web\yourdomain\upload in web.config and then writes the file through a stream;

Programmer B uses methods such as Server.MapPath to derive the physical directory from the relative path and then writes the file through a stream.
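The stand-alone scheme above can be sketched roughly as follows. This is only an illustration in Python rather than the ASP.NET code the article alludes to; the function names save_upload and public_url are hypothetical, and the temporary directory stands in for the website root.

```python
import tempfile
from pathlib import Path


def save_upload(site_root: Path, business: str, filename: str, data: bytes) -> str:
    """Write an uploaded file under <site_root>/upload/<business>/ and return
    the relative path that would be stored in the database table."""
    # Keep only the final component of the client-supplied file name.
    safe_name = Path(filename).name
    if safe_name in ("", ".", ".."):
        raise ValueError("invalid file name")
    folder = business.lower()
    target = site_root.joinpath("upload", folder, safe_name)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    # The database keeps a relative path, never the physical directory.
    return f"upload/{folder}/{safe_name}"


def public_url(relative_path: str, base: str = "http://www.yourdomain.com") -> str:
    """Build the URL a user would request for a stored relative path."""
    return f"{base}/{relative_path}"


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())  # stands in for the website root
    rel = save_upload(root, "QA", "test.jpg", b"fake image bytes")
    print(rel, public_url(rel))
```

Because only the relative path is stored, the physical location (and later the domain name) can change without touching the database records.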

The result:

Advantages: this is the simplest approach. It needs no complicated technology, and the files users upload are written to the specified directory. Saving the database records and accessing the files is also convenient;

Disadvantages: the upload methods are inconsistent, which hinders scaling the website.

This original architecture mainly faces the following problems:

As the upload directory accumulates files, the partition may run out of capacity, and it is hard to expand. The only option is to shut down, replace the storage device with a larger one, and then import the old data;

When deploying a new version (which requires a backup beforehand) and during the daily backup of the website files, the files in the upload directory must be handled at the same time. And if traffic grows and the site is later deployed as a load-balanced cluster of several Web servers, synchronizing files between the cluster nodes in real time becomes a problem.

4、 Image server architecture in the cluster era (real-time synchronization)

Under a traditional Web server site, create a virtual directory named upload. Because a virtual directory is flexible, it can stand in for the physical directory to some extent and remains compatible with the original image upload and access methods.

Users still access files in the same way:

http://www.yourdomain.com/upload/qa/test.jpg

Advantages: configuration is more flexible and compatible with the old upload and access methods. A virtual directory can point to any directory on any local drive, so external storage can also be attached to expand the capacity of a single machine.

Disadvantages: when the site is deployed as a cluster of several Web servers, the virtual directories on each Web server (cluster node) must be synchronized in real time. Because synchronization efficiency and timeliness are limited, it is hard to guarantee that the files on every node are fully consistent at any given moment.

The basic architecture is shown in the figure below:

As the figure above shows, the overall Web server architecture is now "scalable and highly available"; the main problems and bottlenecks are concentrated in file synchronization between the servers.

The architecture above can only perform "incremental synchronization" between these Web servers, so "delete" and "update" operations on files are not synchronized.

The early idea was to handle this at the application level: when a user's request uploads and writes a file on the web1 server, the application would also synchronously call the upload interface on the other web servers. This is clearly not worth the cost. So we chose rsync-like software for scheduled file synchronization, which spares us the cost of "reinventing the wheel" and also reduces risk.

Synchronization generally follows one of two classic models, push or pull. "Pull" means polling for updates; "push" means actively pushing changes to the other machines after they occur. Of course, a more advanced event notification mechanism can also be used for this.

Under highly concurrent writes, synchronization suffers from efficiency and timeliness problems, and synchronizing a large number of files also consumes system and bandwidth resources (more noticeably across network segments).
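To make the limitation of purely incremental, pull-style synchronization concrete, here is a toy sketch. It is not rsync and not the setup the author ran; pull_missing_files is a hypothetical helper that would be called on a schedule by each replica node, and it copies only files the replica does not yet have.

```python
import shutil
import tempfile
from pathlib import Path


def pull_missing_files(source: Path, replica: Path) -> list[str]:
    """Copy files that exist on the source but not on the replica.

    Returns the relative paths that were copied. Files updated or deleted
    on the source are deliberately left alone, which is exactly the gap
    that purely incremental synchronization leaves open.
    """
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = replica / rel
        if not dst.exists():
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
            copied.append(rel.as_posix())
    return copied


if __name__ == "__main__":
    web1 = Path(tempfile.mkdtemp())  # node that received the upload
    web2 = Path(tempfile.mkdtemp())  # node that polls for new files
    (web1 / "qa").mkdir()
    (web1 / "qa" / "test.jpg").write_bytes(b"v1")
    print(pull_missing_files(web1, web2))  # the new file is copied
    (web1 / "qa" / "test.jpg").unlink()
    print(pull_missing_files(web1, web2))  # nothing happens; web2 keeps the file
```

After the second call the two nodes disagree about whether the file exists, which is the inconsistency described above.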

5、 Improved image server architecture in the cluster era (shared storage)

Keeping the virtual directory approach, shared storage is implemented through a UNC (network path): the upload virtual directory points to the UNC path.

User access method 1:

http://www.yourdomain.com/upload/qa/test.jpg

User access method 2 (an independent domain name can be configured):

http://img.yourdomain.com/upload/qa/test.jpg

Configure an independent domain name pointing to the server that hosts the UNC share, and set up a lightweight web server on it, to obtain an independent image server.

Advantages: reading and writing through a UNC (network path) avoids the synchronization problems between multiple servers. It is relatively flexible and supports capacity expansion. It supports an independent image server with its own domain name, and is fully compatible with the old access rules.

Disadvantages: UNC configuration is somewhat cumbersome and incurs some loss in read/write performance and security. There may be a "single point of failure". If the storage layer has no RAID or more advanced disaster recovery measures, data can also be lost.

The basic architecture is shown in the figure below:

Many early websites built on open-source Linux stacks used NFS when they did not want to synchronize images. In practice, NFS proved to have efficiency problems with highly concurrent reads and writes and with massive storage, and is not the best choice, so most Internet companies do not use NFS for this kind of application. It can also be done with DFS, which ships with Windows; the drawbacks are that "configuration is complex, efficiency is unknown, and there is little material on real-world cases". Some companies also use FTP or Samba.

In all the architectures above, uploads and downloads pass through the Web server (with shared storage, a separate domain name and site can serve image access, but uploads and writes still have to be handled by the application on the Web server), which undoubtedly puts great pressure on the Web server. It is therefore recommended to use an independent image server and an independent domain name for both uploading and accessing user images.

6、 Benefits of an independent image server / independent domain name

Image access consumes server resources (it involves operating system context switches and disk I/O). Once images are separated out, the Web/App servers can concentrate on dynamic processing;

Independent storage is easier to expand, protect with disaster recovery, and migrate;

Browsers limit the number of concurrent connections per domain name, so serving images from the same domain as the site costs performance;

When images are requested from the site's own domain, every request carries cookie information, which also costs performance;

Image requests are easier to load balance, and various caching strategies (HTTP headers, proxy cache, and so on) are easier to apply; moving to a CDN is also easier;

......

We can use a web server such as Lighttpd or Nginx to build an independent image server.

7、 Our current image server architecture

The current image server architecture uses a distributed file system plus a CDN.

Before building the current image server architecture, we could first set the web server aside and directly configure a separate image server / domain name.

But that raises the following questions:

What about the old image data? Can the old image path access rules still be supported?

An independent image server needs to provide its own interface for uploading and writing (a service API exposed externally); how is security ensured?

Likewise, if there are several independent image servers, should they use a scalable shared storage scheme or a real-time synchronization mechanism?

It was only when application-level (not system-level) DFSs (for example FastDFS, HDFS, MogileFS, MooseFS, TFS) became popular that the problem became simpler: they perform redundant backups, support automatic synchronization, support linear scaling, and offer client APIs in mainstream languages for upload / download / delete operations. Some support file indexing, and some provide Web access.

Considering the characteristics of each DFS, client API language support (C# was required), documentation and case studies, and community support, we finally chose to deploy FastDFS.

The only problem: it may not be compatible with the old access rules. Even if the old images were imported into FastDFS in one go, the old image paths are scattered across many tables in different business databases, and updating them all is very difficult, so the old access rules must remain supported. Upgrading an architecture is often harder than building a new one precisely because it must stay compatible with previous versions. (Changing an engine in mid-air is much harder than building a plane.)

The solution is as follows:

First, close the old upload entry point (to avoid data inconsistency from continued use). Migrate the old image data in one pass with rsync to an independent image server (the Old Image Server in the figure below). At the front end (a layer-7 proxy such as HAProxy or Nginx), use ACLs (access control rules) to match requests whose URLs follow the old image rules (by regular expression) and forward them directly to a designated list of web servers. On the servers in that list, configure a site that serves the images (over the Web) and add a caching policy. This separates and caches the old image server, stays compatible with the old image access rules, improves access efficiency for old images, and avoids the problems caused by real-time synchronization.

The overall structure is shown in the figure:
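The routing decision made by the layer-7 proxy can be illustrated with a small sketch. In practice this lives in the HAProxy or Nginx configuration, not in application code; the regular expression, the list of image extensions, and the server names below are assumptions made up for the example.

```python
import re

# Assumed shape of an old image URL: /upload/<business>/<file>.<image extension>
OLD_IMAGE_RULE = re.compile(
    r"/upload/[\w-]+/[\w.-]+\.(?:jpg|jpeg|png|gif)", re.IGNORECASE
)

# Hypothetical backend lists; real names would come from the proxy configuration.
OLD_IMAGE_POOL = ["old-img-1:80", "old-img-2:80"]
DEFAULT_POOL = ["web-1:80", "web-2:80"]


def choose_pool(path: str) -> list[str]:
    """Return the backend list that should serve this request path."""
    if OLD_IMAGE_RULE.fullmatch(path):
        return OLD_IMAGE_POOL
    return DEFAULT_POOL


if __name__ == "__main__":
    for path in ("/upload/qa/test.jpg", "/question/123"):
        print(path, "->", choose_pool(path))
```

Requests that match the old rule never reach the application servers, so the old paths stored in the business databases keep working without being rewritten.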

8、 Using a third-party CDN

The FastDFS-based independent image server cluster is already quite mature, but because of the "north-south interconnection" problem in China and the cost of IDC bandwidth (images consume a lot of traffic), we finally chose a commercial CDN. It is easy to adopt and the principle is simple, so only a brief introduction follows.

CNAME the img domain name to the domain name specified by the CDN vendor. When a user requests an image, the CDN vendor's intelligent DNS resolution returns the address of the nearest service node (other, more complex strategies such as load and health status may also apply). The user's request arrives at that node, which runs a proxy cache service similar to Squid/Varnish. If the path is requested for the first time, the node fetches the image from the origin site and returns it to the client browser; if it is already in the cache, the node returns it directly from the cache. This completes the request / response cycle.

Because we use a commercial CDN service, we did not consider building a front-end proxy cache ourselves with Squid/Varnish.

The whole cluster architecture above scales out easily and can meet the image service needs of large websites in typical vertical fields (sites on the scale of Taobao are another matter). In our tests, a single Nginx server providing image access (Xeon E5 quad-core CPU, 16 GB memory, SSD) handled thousands of concurrent requests for small static pages (about 10 KB after compression) without strain. Because images are much larger than plain-text static pages, the concurrency an image server can sustain is usually limited by disk I/O capacity and by the bandwidth the IDC provides. Nginx itself handles concurrency well and uses few resources, especially for static content, so there seems little need to worry on that score. Depending on actual traffic, further optimization is possible by adjusting Nginx parameters, tuning the Linux kernel, and adding layered caching. Capacity can also be expanded by adding servers or upgrading server configurations, and the most direct way is to buy more advanced storage devices and more bandwidth to meet greater traffic.

It is worth mentioning that, now that "cloud computing" is popular, websites in a period of rapid growth are advised to use a "cloud storage" solution. It can take care of storage, scaling, and disaster recovery, and can also handle CDN acceleration. Most importantly, it is not expensive.
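The cache-miss and cache-hit flow on a CDN node can be sketched as below. This is a deliberately simplified model, assuming an in-memory cache with no expiry, eviction, or HTTP header handling; EdgeNode and the ORIGIN dictionary are hypothetical stand-ins for a real proxy cache and the origin image server.

```python
from typing import Callable


class EdgeNode:
    """Toy model of a CDN node that caches responses from the origin site."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._cache: dict[str, bytes] = {}
        self.origin_requests = 0  # how many times the origin had to be contacted

    def get(self, path: str) -> bytes:
        if path not in self._cache:
            # First request for this path: go back to the origin and remember the result.
            self.origin_requests += 1
            self._cache[path] = self._fetch(path)
        # Cached (or just-fetched) content is returned to the client.
        return self._cache[path]


# Stand-in for the origin image server.
ORIGIN = {"/upload/qa/test.jpg": b"image-bytes"}

if __name__ == "__main__":
    edge = EdgeNode(lambda path: ORIGIN[path])
    edge.get("/upload/qa/test.jpg")  # miss: fetched from the origin
    edge.get("/upload/qa/test.jpg")  # hit: served from the cache
    print(edge.origin_requests)
```

Only the first request for a path costs origin bandwidth, which is why image traffic is a good fit for this kind of caching.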
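The point that bandwidth, rather than the web server, tends to cap image throughput can be checked with simple arithmetic. The figures below (a 1000 Mbps link and a 200 KB average image) are assumptions chosen for illustration, not measurements from the article, and the function ignores protocol overhead.

```python
def max_images_per_second(bandwidth_mbps: float, avg_image_kb: float) -> float:
    """Upper bound on images served per second when bandwidth is the only limit.

    Uses decimal units: 1 Mbps = 1,000,000 bits per second, 1 KB = 1,000 bytes.
    """
    bytes_per_second = bandwidth_mbps * 1_000_000 / 8
    return bytes_per_second / (avg_image_kb * 1_000)


if __name__ == "__main__":
    # Assumed numbers: 1000 Mbps of IDC bandwidth, 200 KB average image.
    print(max_images_per_second(1000, 200))
    # The same link serving 10 KB static pages, for comparison.
    print(max_images_per_second(1000, 10))
```

Under these assumptions the same link carries twenty times fewer image responses than 10 KB page responses, which is consistent with the observation that image servers hit I/O and bandwidth limits long before Nginx itself becomes the bottleneck.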

In summary, extending an image server architecture generally involves these issues:

Capacity planning and scaling;

Data synchronization, redundancy, and disaster tolerance;

Cost and reliability of hardware (ordinary mechanical hard disks, SSDs, or higher-end storage devices and solutions);

File system selection. Choose among ext3/4 or open-source (distributed) file systems such as NFS/GFS/TFS according to the file characteristics (for example file size and read/write ratio);

Accelerating image access. Use a commercial CDN, or a self-built proxy cache and web static cache architecture;

Compatibility with old image paths and access rules, application-level scalability, and the performance and security of uploads and access.

Copyright notice
This article was created by [JackJiang-]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/02/202202180507138523.html