The following article is from the iQIYI Technical Product Team. Author: iQIYI Recommendation Middle Platform.
With the rise of deep learning, embedding technology has developed rapidly. As the expressive power of embeddings has grown, using them directly to generate a recommendation list has become a feasible choice. As a result, schemes that exploit the similarity of embedding vectors and use embeddings as the recall layer of a recommendation system have gradually become popular.
Beyond understanding the algorithm models commonly used to generate embeddings, it is just as important to understand the engineering practice behind them when implementing a recommendation system. This article introduces the engineering practice of the online vector recall service at iQIYI.
Background: the rise of deep learning
The architecture of recommendation systems has been covered in many books and articles; one of the more classic process descriptions is shown in Figure 1. As the figure shows, a recommendation service consists of several modules: recommendation pool, user profile, feature engineering, recall, ranking, strategy, and so on. Recall is the first stage of the whole recommendation process: it delineates a subset of the overall content pool, from which the recommendation system then selects the best items to present to the user. Seen this way, the quality of the recall candidates largely determines the overall quality of recommendation, so the importance of recall is self-evident.
Figure 1. Technical architecture of the recommendation system
Many familiar methods exist for recall. Common models compare as follows.
Model | Advantages | Limitations |
---|---|---|
Collaborative filtering | Simple, direct, and widely used | Generalizes poorly and handles sparse matrices badly, so recommendation results tend to show a strong head effect |
Matrix factorization | Better generalization and sparse-matrix handling than collaborative filtering | Hard to use anything beyond user behavior history, such as other user features, item features, and context features |
LR | Turns recommendation into a binary classification problem similar to CTR prediction and can combine many types of features | No ability to combine features; limited expressive power |
FM | Adds second-order feature crossing on top of LR | Combinatorial explosion makes it hard to extend to third-order feature crossing |
FFM | Further strengthens feature crossing compared with FM | High training cost |
GBDT-based combined models | Turn feature engineering into a model and can combine higher-order features | Model updates require a long training time |
LS-PLM | Partitions the samples and builds an LR model within each partition, giving a structure similar to a three-layer neural network | Structure is too simple compared with deep learning models |
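To make the "second-order feature crossing" entry for FM concrete, here is a minimal sketch of how a factorization machine scores one sample. It is an illustration only, not code from the iQIYI system; the function name `fm_predict` and its parameter names are hypothetical. It uses the standard identity that rewrites the sum over all feature pairs as a sum over latent dimensions, which avoids enumerating the pairs explicitly.

```python
from typing import List


def fm_predict(x: List[float], w0: float, w: List[float], v: List[List[float]]) -> float:
    """Score one sample with a second-order factorization machine.

    x  : feature values
    w0 : global bias
    w  : one linear weight per feature
    v  : one latent vector per feature (len(x) rows, k columns)
    """
    # First-order (LR-like) part.
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))

    # Second-order part: sum over i<j of <v_i, v_j> * x_i * x_j, computed as
    # 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ].
    k = len(v[0])
    pairwise = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise
```

Extending the same idea to third-order crossing has no equally cheap rewrite, which is the limitation the table refers to.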
As we can see, each successive model aims to strengthen feature combination and selection and to improve overall generalization. The rise of deep learning has pushed these goals further: deep learning lets a model have both memorization ability and generalization ability.
Note: Memorization ability can be understood as the model's ability to directly learn and exploit the co-occurrence frequency of items or features in historical data. Generalization ability can be understood as the model's ability to propagate feature correlations and to discover correlations between the final label and features that are sparse, rare, or have never appeared before.
The popularity of deep learning and the development of embedding technology have given embeddings three strengths: a strong ability to synthesize information, the ability to convert high-dimensional sparse feature vectors into low-dimensional dense ones, and the ability to reveal similarity between content and users through vector operations. An embedding is also an extremely important feature vector in its own right. As the expressive power of embeddings has grown, using them directly to generate a recommendation list has become a viable option, and schemes that use the similarity of embedding vectors as the recall layer of a recommendation system have gradually become popular. Figure 2 shows how YouTube recommendation uses embeddings to recall candidate items.
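As a minimal sketch of what "recall by embedding similarity" means, the following brute-force example scores every item against a query vector with cosine similarity and keeps the top N. The function names and the toy item IDs are hypothetical, and cosine similarity is an assumption (the article does not say which metric is used). A production service would replace the full scan with an approximate nearest neighbor index.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n(query: Sequence[float],
          items: Dict[str, Sequence[float]],
          n: int) -> List[Tuple[str, float]]:
    """Return the n items most similar to the query, best first (exact, full scan)."""
    scored = ((item_id, cosine(query, vec)) for item_id, vec in items.items())
    return heapq.nlargest(n, scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy item embeddings; real ones would come from a trained model.
    catalog = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
    print(top_n([1.0, 0.0], catalog, 2))
```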
Figure 2. Structure of the YouTube vector recall model
There are many algorithm models for embeddings: Word2Vec, which is of foundational significance for embedding; Item2Vec, which extends Word2Vec to the recommendation field; Item2Vec in the broad sense (such as the DSSM two-tower model); and graph embedding techniques that bring in more structural information (such as DeepWalk, Node2vec, and EGES). Many papers and articles cover their implementation, so this article does not dwell on the algorithm models. Instead it focuses on engineering: how to quickly deploy an online vector recall service that the recommendation engine can access efficiently, that is, the engineering implementation of Figure 2.
Online vector recall service in practice
1. Efficiency is the primary productive force
As engineers know from business development, serving a single business and serving multiple businesses at once call for very different considerations. The latter requires more abstraction and generality, and this is exactly the problem the recommendation middle platform sets out to solve. Only in this way can we improve the efficiency of service construction and free up people for more business thinking instead of repetitive work.
Developing a recommendation system usually requires close cooperation between algorithm engineers and system R&D engineers. The former are responsible for delivering algorithm models that the business can put into production. The latter have two responsibilities: first, to work with the algorithm engineers to package the model as a service; second, to develop the recommendation engine and the other sub-services it depends on, connect the whole recommendation process end to end, and provide a recommendation interface for downstream callers, as shown in Figure 3.
Figure 3. General responsibilities of recommendation system engineers
Looking at the access process of the vector recall service on the basis of Figure 3 gives the result in Figure 4. Algorithm engineers write the algorithm and output embeddings according to the characteristics of the business. Meanwhile, system R&D engineers and algorithm engineers work together to build an online vector recall service for the recommendation engine to call. The recommendation engine serves recommendations to all online users and has very high performance requirements, so the performance of the vector recall service is also critical. If the vector recall service is not generalized and offered as a service, the algorithm and system R&D engineers of each business have to build it again and also operate and maintain it themselves. This undoubtedly adds to the burden on recommendation development engineers and costs development efficiency.
Figure 4. Access process of the vector recall service
To support a single business, we only need to consider how to build a vector recall service that can run top-N retrieval over a large volume of embedding data while keeping the service highly performant. For the recommendation middle platform, which serves multiple businesses, we additionally need to consider how to make building, operating, and maintaining this service simple, self-service, automated, and platform-based.
2. Selection of a high-dimensional vector retrieval engine
Approximate nearest neighbor (ANN) search is currently the mainstream approach to retrieving the top-N most similar items from massive sets of high-dimensional vectors. faiss, the C++ library implemented by Facebook, is a commonly used library, and implementing the vector recall service on top of faiss is one option. In addition, vearch and the Milvus database, both commonly used in the industry, are also candidate services. Figure 5 compares the options.
The Milvus database is designed for approximate nearest neighbor search over massive feature vectors. Compared with an operator library such as faiss, it provides a complete framework for updating, indexing, and querying vector data. Milvus also supports GPU acceleration for indexing and querying, which can greatly improve single-machine performance. Milvus has also been recognized by leading machine vision companies, and its technical community is active.
vearch is a distributed vector search system that can be used to compute vector similarity and can be applied in machine learning fields such as image recognition, video recognition, and natural language processing. vearch is implemented on top of faiss and provides fast vector retrieval. It offers an Elasticsearch-like RESTful API, which makes it convenient to manage and query data and table structures.
Figure 5. Selection comparison
A self-implemented service would offer essentially the same functions as the open-source frameworks and would likewise be built mostly on C++. For building a service from 0 to 1, and to take full advantage of the community's momentum, developing on an existing service framework is a good choice. After weighing the richness of each service's SDKs, its performance figures, whether it is open source, and how active its open-source community is, and given that a growing share of business recommendation engines are implemented in Java, we finally selected the Milvus framework to implement the online vector recall service.
3. Implementation of the vector recall service
When building the online vector recall service, we took into account that algorithm engineers often adjust their algorithm implementations for different scenarios. Offering only a few encapsulated, built-in embedding algorithms would make them feel constrained. We therefore defined a schema for the embedding model. When an algorithm engineer creates a service on the recommendation middle platform, the platform opens a designated path where the embedding model is stored. After a few necessary parameters are set, the online vector recall service can be started on the middle platform, exposing a standard interface for the recommendation engine to call. The design is shown in Figure 6. For the vector retrieval engine, the recommendation middle platform cooperates with the iQIYI deep learning platform. On top of the Milvus framework we built the upper-layer application to reduce the construction cost for business recommendation developers, added data version management to improve stability for the business side, and added multi-datacenter deployment and a service health check mechanism to improve disaster recovery at the bottom layer.
Figure 6. Overall architecture of the online vector recall service
We also ran into some problems in use. For example, CPU utilization becomes very high while the Milvus database builds an index, which leaves the query service basically unavailable. We solved this as follows: when a new data version arrives, we start a separate service and update it on its own; once the update is complete, it replaces the online service. The two most recent data versions are kept online, as shown in Figure 7.
Figure 7. Solving the high CPU utilization during Milvus index construction
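The update strategy just described (build the new data version off to the side, switch traffic only when it is ready, keep the two most recent versions) can be sketched as follows. This is a single-process simplification with hypothetical names (`VersionedIndexStore`, `publish`, `get_vector`). In the article the new version is built in a separate service instance, and the index is a real ANN index, not the plain dictionary used here as a stand-in.

```python
import threading
from collections import OrderedDict
from typing import Callable, Dict, List, Optional, Sequence

# Stand-in for a real ANN index: item ID -> embedding vector.
Index = Dict[str, Sequence[float]]


class VersionedIndexStore:
    """Serves one active index version and retains the most recent `keep` versions."""

    def __init__(self, keep: int = 2) -> None:
        if keep < 1:
            raise ValueError("keep must be at least 1")
        self._keep = keep
        self._versions: "OrderedDict[str, Index]" = OrderedDict()
        self._active: Optional[str] = None
        self._lock = threading.Lock()

    def publish(self, version: str, build: Callable[[], Index]) -> None:
        # The slow, CPU-heavy build runs outside the lock, so queries against
        # the currently active version are not blocked while it runs.
        new_index = build()
        with self._lock:
            self._versions[version] = new_index
            self._versions.move_to_end(version)
            self._active = version  # switch traffic only after the build is done
            while len(self._versions) > self._keep:
                self._versions.popitem(last=False)  # drop the oldest version

    def active_version(self) -> Optional[str]:
        with self._lock:
            return self._active

    def retained_versions(self) -> List[str]:
        with self._lock:
            return list(self._versions)

    def get_vector(self, item_id: str) -> Sequence[float]:
        with self._lock:
            if self._active is None:
                raise RuntimeError("no index version has been published yet")
            index = self._versions[self._active]
        return index[item_id]
```

Keeping the previous version around is what makes a quick rollback possible if a freshly published version turns out to be bad.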
Because a growing share of business recommendation engines are implemented in Java, while the query interface of the Milvus database is a gRPC interface, we wrapped it in a Dubbo interface so that Java service frameworks can access it easily, which greatly reduces the difficulty of service access. Dubbo service discovery also makes the multi-datacenter deployment transparent to the business side.
Figure 8. Dubbo encapsulation of the vector recall service
With this encapsulation and implementation, business colleagues can create an online vector recall service within minutes with no entry barrier, and online debugging is supported. Taking Milvus (2 cores, 6 GB) and the Dubbo query service (4 cores, 12 GB) as an example, when querying 6 million 64-dimensional vectors at a query QPS of 3k, the p99 latency is about 20 ms. The access process for business colleagues is shown in Figure 9; they no longer need to package and build a vector recall service themselves.
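As a rough sanity check on the data size in this example, the raw storage for the vectors can be estimated directly. The calculation below assumes 4-byte (float32) components, which the article does not state, and it ignores index overhead, which depends on the index type.

```python
def raw_vector_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw storage for the vectors alone, excluding any index structures."""
    return num_vectors * dim * bytes_per_value


if __name__ == "__main__":
    total = raw_vector_bytes(6_000_000, 64)  # 6 million 64-dimensional vectors
    print(total)                 # 1536000000 bytes
    print(total / 2 ** 30)       # size in GiB
```

Under that assumption the raw vectors take about 1.43 GiB, so they fit within the 6 GB instance mentioned above before index overhead is counted.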
Figure 9. Access process with the vector recall service
Integrating online inference and recall
The implementation above generalizes and abstracts the service. When recommendation system developers in a business want to access an online vector recall service, a few general self-service operations on the platform give them, within minutes, a service that exposes a common interface for the recommendation engine to call. A careful reader will notice, however (see Figure 9), that generation of the query vector is not part of the service during vector recall; it still sits in the recommendation engine module. There are two ways to generate the query vector. The relatively simple way is to use rules or a simple algorithm to produce it as a weighted combination of existing related embeddings, chosen according to business characteristics. The other is to use an embedding generation model that produces the vector in real time from the input features. Clearly, bringing query vector generation into the service matters a great deal, both for freeing up business colleagues and for improving the efficiency of building the overall recommendation service.
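The rule-based way of generating a query vector can be sketched as a weighted average of existing embeddings, for example the embeddings of items a user interacted with recently. The function name and the weighting scheme are hypothetical; the article only says the vector is produced by weighting related embeddings according to business characteristics.

```python
from typing import List, Sequence, Tuple


def weighted_query_vector(history: Sequence[Tuple[Sequence[float], float]]) -> List[float]:
    """Combine (embedding, weight) pairs into one query vector by weighted average."""
    if not history:
        raise ValueError("history must contain at least one embedding")
    dim = len(history[0][0])
    total_weight = sum(weight for _, weight in history)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    acc = [0.0] * dim
    for vec, weight in history:
        if len(vec) != dim:
            raise ValueError("all embeddings must have the same dimension")
        for i, value in enumerate(vec):
            acc[i] += weight * value
    return [value / total_weight for value in acc]


if __name__ == "__main__":
    # A recent item weighted 3x as much as an older one.
    print(weighted_query_vector([([1.0, 0.0], 3.0), ([0.0, 1.0], 1.0)]))
```

The weights might encode recency or interaction strength; that choice is a business decision and is not specified in the article.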
We then re-evaluated the overall scheme and technology selection. At first, for a fast 0-to-1 implementation, choosing a mature packaged ANN service was very cost-effective, which is one of the reasons we originally chose the Milvus framework. As service construction progressed, we gave more weight to how quickly performance and functionality could be optimized and how well they fit our needs. For online vector recall of the YouTube DNN kind, we want to encapsulate inference and recall together. Doing so on the Milvus database would mean depending on new features and aligning with the community's iteration plan, which does not necessarily match our internal needs at this stage. In addition, earlier practice showed that Milvus still had some problems; of course, the Milvus community is quite active, and those problems are being solved step by step. After a comprehensive assessment, we decided to work with the iQIYI deep learning platform and rebuild on hnswlib. The main reasons for switching quickly are:
(1) The iQIYI deep learning platform has wrapped hnswlib and built a set of services similar to the Milvus database, which is being upgraded through rapid iteration;
(2) The service wrapped around hnswlib remains available while an index is being created, which solves the problem of queries being unavailable due to high CPU load during Milvus index creation and also simplifies the overall design;
(3) Implementing YouTube DNN-style integrated online inference and recall is more convenient on hnswlib than on Milvus (we should be users of wheels, not people who reinvent them over and over, unless no wheel is available);
(4) The HNSW ANN algorithm performs best in the benchmark. Figure 10 shows the test results on the fashion-mnist-784-euclidean dataset as of 2020-07-12; the further to the upper right, the better;
(5) In the design shown in Figure 6, we built the upper-layer application with the Milvus database as the underlying image, and from the start the design allowed for switching to another service framework later. Switching away from Milvus is therefore feasible.
Figure 10. Test results on the fashion-mnist-784-euclidean dataset, 2020-07-12
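Benchmarks of this kind usually plot recall against query throughput, which is why points toward the upper right are better; the axes of Figure 10 are assumed here, since the figure itself is not reproduced. As a small sketch of the recall side, the function below (hypothetical name `recall_at_k`) measures what fraction of the exact top-k neighbors an approximate search returned.

```python
from typing import Sequence


def recall_at_k(approx_ids: Sequence[str], exact_ids: Sequence[str], k: int) -> float:
    """Fraction of the exact top-k neighbors that appear in the approximate top-k."""
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    found = set(approx_ids[:k]) & truth
    return len(found) / len(truth)


if __name__ == "__main__":
    exact = ["a", "b", "c", "d"]    # from an exhaustive (brute-force) search
    approx = ["a", "b", "x", "d"]   # from an ANN index
    print(recall_at_k(approx, exact, 4))
```

An ANN index trades some recall for speed, so comparing engines only makes sense at a fixed recall level or along the whole curve.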
To bring the online inference of vectors into the service, we need to take over the model training part. In practice, however, algorithm engineers need to keep the right to modify the algorithm on their own. At the model service level we therefore support either selecting a model built into the middle platform or implementing a custom algorithm that follows the specifications set by the middle platform. After some adjustments to the original overall structure, the design for integrated online inference and recall is shown in Figure 11.
Figure 11. Overall design of the integrated online vector inference and recall service
After the overall scheme is adjusted, the access process of the vector recall service is shown in Figure 12. The recommendation engine no longer needs to care how the query embedding vector is generated; it only needs to provide the required features.
Figure 12. Access process after integrating online embedding inference and recall
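The integrated flow can be sketched as a single service call that takes raw features, runs a pluggable encoder to produce the query embedding, and then retrieves the top N items. Everything here is illustrative: the class name `InferenceRecallService`, the toy encoder, and the brute-force dot-product search are assumptions standing in for the real model and the hnswlib-based index.

```python
import heapq
from typing import Callable, Dict, List, Mapping, Sequence, Tuple

Vector = Sequence[float]
Encoder = Callable[[Mapping[str, float]], Vector]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


class InferenceRecallService:
    """Callers pass features; the service handles both inference and recall."""

    def __init__(self, encoder: Encoder, item_vectors: Dict[str, Vector]) -> None:
        self._encoder = encoder      # built-in or user-supplied model
        self._items = item_vectors   # stand-in for an ANN index

    def recall(self, features: Mapping[str, float], n: int) -> List[Tuple[str, float]]:
        query = self._encoder(features)  # step 1: online inference
        scored = ((item_id, dot(query, vec)) for item_id, vec in self._items.items())
        return heapq.nlargest(n, scored, key=lambda pair: pair[1])  # step 2: top-N recall


def toy_encoder(features: Mapping[str, float]) -> Vector:
    """Hypothetical encoder: maps two named features straight to a 2-d vector."""
    return [features.get("likes_drama", 0.0), features.get("likes_comedy", 0.0)]


if __name__ == "__main__":
    service = InferenceRecallService(
        toy_encoder,
        {"drama_1": [1.0, 0.0], "comedy_1": [0.0, 1.0], "mixed_1": [0.6, 0.6]},
    )
    print(service.recall({"likes_drama": 1.0, "likes_comedy": 0.2}, 2))
```

Because the encoder is passed in as a callable, swapping a built-in model for a custom one does not change the interface the recommendation engine sees.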
Summary and outlook
This article has outlined the engineering practice of the iQIYI recommendation middle platform on the online vector recall service. For a recommendation middle platform, generalizing services and abstracting them into a platform is key to improving service construction efficiency for the business side, speeding up business iteration, and helping the business side improve recommendation quality.
Several points deserve attention when engineering an online vector recall service:
(1) Selection of the underlying retrieval engine. We think the Milvus database and vearch each have their own advantages. Milvus offers richer SDKs, which is an advantage when connecting to back-end services written in different languages, and its active community means its problems get solved faster. vearch is more suitable when the volume of embedding data is huge (such as 6 billion 128-dimensional vectors), though it also takes more time.
(2) Service performance and stability are top priorities. They are among the keys to running the online recommendation engine well and improving recommendation quality.
(3) Unifying the inference of the query embedding with the top-N recall process maximizes the efficiency of business access.
(4) Models are updated, and the corresponding embedding data is updated with them, so the service needs to manage data versions. This is also one of the requirements from the business side.
(5) The storage of real-time embedding content also needs to be considered.
Of course, the vector recall service can be optimized further. For example:
(1) There are many embedding generation algorithms, and we can keep adding the generation models that businesses need as built-in options;
(2) At different recall rates, response performance under high QPS can be optimized further.