Federated Learning in Practice at Tencent Weishi Advertising
2022-06-26 16:00:00 [Tencent Big Data official account]
Speaker: Dr. Song Kai
Editor: Lin Yizhen
Reading guide:

This talk shares the experience and lessons of putting federated learning into practice, from an advertiser's perspective.

First, the business background and technology selection: the team works on user growth and cost control, the main lever is advertising channels, and the delivery targets fall into two categories, new-user acquisition ("pull-new") and user re-activation ("pull-active").

- For acquisition, the Weishi side's own user features are sparse, while the advertising platforms have accumulated rich information but pass back only limited, standardized oCPX data.
- For re-activation, the Weishi side holds valuable profile data such as user behavior sequences, which complement the advertising platform's features, but the raw data cannot simply be handed over to the platform.

We therefore want the Weishi side and the advertising platform side to use both parties' data and achieve a win-win, while ensuring that data never leaves its own domain. Against this background the team chose federated learning, which provides a solution for secure multi-party cooperation.
The article covers the following five topics:

- Federated learning
- Tencent's federated learning platform PowerFL
- The overall Weishi advertising business
- The advertising federated learning architecture
- Modeling practice and details
I. Federated Learning

First, some background on federated learning (FL).

1. Background of federated learning

Machine learning models are data-driven, but in reality data sits in isolated silos: it cannot be shared between companies, or even between departments, and sharing it directly would violate user privacy and damage the company's interests. In 2016 Google published a paper, set against the background of NLP for its input method, proposing a scheme in which Android devices update the model locally; that paper is generally regarded as the starting point of federated learning. Soon afterwards, Chinese companies such as WeBank and Tencent also did a great deal of pioneering work.

The basic definition of federated learning: during machine learning, every participant can build a model jointly with the help of the other parties' data, without any party needing direct access to the others' data resources. In other words, while the data stays local, the parties train on it together securely and build a shared machine learning model.
2. Two architectures of federated learning

- Centralized federated architecture: the early work, including Google's and WeBank's, used this architecture. A trusted third party (a central server) is responsible for the encryption policy, model distribution, gradient aggregation, and so on.
- Decentralized federated architecture: sometimes the cooperating parties cannot find a trusted third party, so all parties take part in peer-to-peer computation. This architecture requires more encryption/decryption and parameter-transfer operations: with n parties, 2n(n-1) transmissions are needed (e.g., 12 transmissions for n = 3). One can think of the encryption and decryption algorithms as playing the role of the third party.
3. Three categories of federated learning

- Horizontal federated learning: a union over samples, suitable when the parties' features overlap heavily but their users overlap little. For example, two companies with similar businesses may have largely disjoint users but similar user profiles; horizontal federated learning then looks more like distributed machine learning with the data rearranged.
- Vertical federated learning: a union over features, suitable when the users overlap heavily but the features overlap little. For example, an advertiser and an advertising platform that want to combine both sides' features for training.
- Federated transfer learning: considered when both the features and the samples overlap little between participants; it is also the most difficult.

The three types exchange different information and face different difficulties. For example, in horizontal federated learning the participants' data are heterogeneous, i.e., not independent and identically distributed (non-IID), which is itself a research hotspot of federated learning.

At present we have put vertical federated learning into production in our business, and we are also exploring federated transfer learning and combinations of the horizontal and vertical settings.
4. Federated learning vs. distributed machine learning

Upper bound on accuracy: federated learning is not about optimizing one specific ranking or recall model; it is more like driving the whole modeling process under data-security constraints. In theory, therefore, pooling the data and running distributed machine learning (DML) can be regarded as the upper bound.

Comparing federated learning (FL) with distributed machine learning (DML): although some people regard federated learning as a special case of distributed machine learning, compared with general DML it still differs in the following ways:

- data must not be shared;
- the server node has much weaker control over the worker nodes;
- communication is more frequent and more costly.
II. Tencent's Federated Learning Platform Angel PowerFL

Tencent has been deeply involved in federated learning from its early days. This includes publications such as the Federated Learning White Paper 2.0 and the Tencent Secure Federated Learning Application Service White Paper; on the infrastructure side, PowerFL was built on Tencent's open-source machine learning platform Angel (https://github.com/Angel-ML/angel) and is currently open-sourced internally; and in practice there have been many attempts and deployments in finance, advertising, and recommendation scenarios.
1. Engineering features
Besides meeting the basic requirements of a machine learning platform, such as easy deployment and good compatibility, the Tencent federated learning platform PowerFL has the following five engineering features:

- Learning architecture: a decentralized federated architecture that does not rely on a third party;
- Encryption algorithms: implementations and improvements of common homomorphic encryption schemes as well as symmetric and asymmetric encryption algorithms (a sketch of the additive-homomorphism property follows this list);
- Distributed computing: built on the Spark on Angel distributed machine learning framework;
- Cross-network communication: Pulsar is used to optimize cross-network communication, improve stability, and provide a multi-party cross-network transmission interface;
- Trusted execution environment: exploration of and support for TEEs (SGX, etc.).
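To make the homomorphic-encryption idea concrete, below is a generic illustration of the additive-homomorphism property using the open-source python-paillier (`phe`) package. This is only a sketch of the property the protocols rely on, not PowerFL's own GMP/C++ implementation.

```python
# Generic demo of additive homomorphic encryption with python-paillier (phe).
# Not PowerFL's implementation; it only illustrates the property used below.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# A party encrypts its intermediate values and ships only ciphertexts.
enc_a = public_key.encrypt(0.37)
enc_b = public_key.encrypt(1.25)

# Anyone holding only the public key can add ciphertexts and scale them by
# plaintext constants -- enough to evaluate linear layers on encrypted data.
enc_sum = enc_a + enc_b        # Enc(0.37 + 1.25)
enc_scaled = enc_a * 3.0       # Enc(0.37 * 3.0)

# Only the private-key holder can recover the plaintext results.
print(private_key.decrypt(enc_sum))     # ~1.62
print(private_key.decrypt(enc_scaled))  # ~1.11
```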
2. Algorithm optimizations

In addition, many optimizations have been made on the algorithm side:

- Ciphertext arithmetic rewritten: the ciphertext arithmetic library was rewritten on top of C++ GMP;
- Data intersection optimized: optimized separately for the two-party and multi-party cases; the multi-party case in particular was reworked at the protocol level (an improved FNP protocol);
- GPU support: the ciphertext arithmetic can be parallelized on GPUs;
- Model extensibility: flexible model extension is supported; DNN models developed in TensorFlow or PyTorch can be plugged in.
It is worth mentioning that, besides homomorphic encryption, PowerFL also supports other privacy-protection schemes for federated neural networks, such as secret sharing and differential privacy (noise perturbation). A minimal secret-sharing sketch follows.
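For comparison with the homomorphic approach, here is a minimal sketch of additive secret sharing over a prime field; the modulus and the two-party setup are illustrative assumptions, not PowerFL's actual parameters.

```python
# Minimal sketch of additive secret sharing: a value is split into random shares
# that individually reveal nothing, yet sums of shares reconstruct sums of values.
import secrets

Q = 2**61 - 1   # arithmetic is done modulo a large prime (toy choice)

def share(x, n=2):
    parts = [secrets.randbelow(Q) for _ in range(n - 1)]
    parts.append((x - sum(parts)) % Q)   # shares sum to x modulo Q
    return parts

def reconstruct(parts):
    return sum(parts) % Q

a_shares, b_shares = share(123), share(456)
# Each party adds the shares it holds locally; no one ever sees 123 or 456.
sum_shares = [(a + b) % Q for a, b in zip(a_shares, b_shares)]
print(reconstruct(sum_shares))   # 579
```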
III. The Overall Weishi Advertising Business

The team's overall goal is to iteratively optimize the intelligent delivery system, and we work on the following three fronts:

1. More customer-acquisition channels

These include external paid channels, internal soft diversion, and organic growth. External paid acquisition can be further divided into batch ad creation via the Marketing API, RTA audience targeting, sDPA/mDPA product catalogs, RTB real-time bidding, and so on.
2. More creative formats

To support the Marketing API and RTA, we keep optimizing ad creatives; to support RTB and sDPA/mDPA, we optimize native ad content; and to echo growth and new-user acquisition, we have optimized strategies and models for subsidies, red packets, coupons, and the like.
3. Growth technology

Whether for RTA or RTB, the core is accurate matching between users and creatives. We keep exploring the creative side, the user side, and the interaction between the two:

- Creative side: production, mining, understanding, and quality control, e.g., screening out content prone to negative feedback, recognizing and enhancing clarity, and automatic upload, takedown, and bidding of creatives.
- User side: on the profiling side we keep building user profiles such as lookalike audiences and user tags; on the operations side we push uplift and LTV models forward; on the experience side we pursue integrating acquisition with retention.
- Traffic side: the core of ad decision-making is managing traffic and cost, for which we develop a series of strategies; we have also tried reinforcement learning to resolve the trade-off between traffic and cost.
IV. Advertising Federated Learning Architecture

The following describes where federated learning sits in the Weishi advertising framework: it is used to select the audience packages for RTA.
1. Overview of the advertising system

First, the figure below shows a simple, generic advertising system: an ad request carrying the user's device ID reaches the advertising system, passes through ad recall, RTA targeting filters, coarse ranking, fine ranking, and ad delivery, and the ad is finally exposed.
2. RTA advertising architecture

Next, zoom in on the RTA part of the framework. The purpose of RTA is to pre-judge user value, perform audience targeting, and assist quality bidding.

- An RTA ad request is initiated, and the user device ID arrives at the experiment platform;
- the channel's distribution strategy and ID mapping distinguish users: historical users are handled by the re-activation strategy, non-historical users by the acquisition strategy;
- what federated learning determines is the RTA-DMP side: the results are imported into the DMP as audience packages for audience targeting and stratification.
3. A coarse-grained view of the federated learning pipeline

Here is the coarse-grained federated learning pipeline:

- the Weishi side provides user IDs, profiles, and labels, and the advertising platform side provides user IDs and profiles;
- private set intersection (PSI) securely aligns the samples to obtain the user intersection, and collaborative federated training starts;
- after model evaluation, the two sides jointly extract and export features for the full user set and score all users;
- finally, the results are returned to the RTA-DMP.

Part V breaks this down in detail.
V. Modeling Practice and Details

1. Pilot work

Compared with re-activation, acquisition needs federated learning more urgently, because in-app features are much sparser and many users have nothing but a device ID; so we cut in with acquisition first. The pilot work included:

1.1 Fitting targets: a four-task model

- Main task: next-day retention rate after first launch, i.e., among users acquired on day T, the proportion who actively open the Weishi app on day T+1.
- Auxiliary tasks: next-day retention cost after first launch, effective-new-user cost, and effective-new-user proportion. New-user effectiveness has been modeled separately: it assigns a probability score based on behavior such as dwell time.
1.2 Single-side data exploration and feature engineering on Weishi

- Samples and sampling: figure out the sample size and decide the sampling strategy.
- Features and models: ID-type features and behavior-sequence features; a DNN model is used.
- An offline metric consistent with online performance: after some exploration, Group-AUC (GAUC) turned out to be a good offline indicator, where each group is a user stratum. Group-AUC correlates positively with online performance and is more sensitive than plain AUC (a minimal sketch of the metric follows this list).
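As a rough illustration, Group-AUC is commonly computed as the AUC inside each user group, averaged with weights proportional to group size; the grouping and weighting below are assumptions, since the exact production definition is not spelled out here.

```python
# Minimal Group-AUC (GAUC) sketch: per-group AUC, weighted by group sample count.
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

def group_auc(df, group_col="group", label_col="label", score_col="score"):
    aucs, weights = [], []
    for _, g in df.groupby(group_col):
        # AUC is undefined for groups containing only one class; skip them.
        if g[label_col].nunique() < 2:
            continue
        aucs.append(roc_auc_score(g[label_col], g[score_col]))
        weights.append(len(g))
    return float(np.average(aucs, weights=weights))

# Toy usage
df = pd.DataFrame({
    "group": [1, 1, 1, 2, 2, 2, 2],
    "label": [1, 0, 1, 0, 0, 1, 1],
    "score": [0.9, 0.2, 0.7, 0.1, 0.4, 0.8, 0.3],
})
print(group_auc(df))   # ~0.857
```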
2. Model training

With the preparation done, the Weishi side began joint federated modeling with the advertising platform side.

2.1 Iterative process of federated model training

(1) Data alignment: determine the common sample set {id} for collaborative training. There are two approaches:
- Plaintext: fast; exchanging hundreds of millions to a billion IDs takes only a few minutes to around ten minutes. It is not safe on its own, because each party only wants to confirm the common part without revealing its complement; a trusted execution environment (TEE) can be used to keep plaintext processing secure.
- Ciphertext: slow, more than 10x the time, because it involves a large amount of encryption, decryption, and matching. This is the strategy we currently use, implemented on the in-house PowerFL platform (a sketch of the underlying PSI idea follows this list).
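For intuition, the sketch below shows the core idea behind one family of encrypted-intersection protocols (Diffie-Hellman-style blinding with a commutative cipher): each side blinds hashed IDs with its own secret exponent, and because exponentiation commutes, doubly blinded values can be matched without revealing raw IDs. PowerFL's actual PSI (e.g., the improved FNP protocol mentioned above) is different and far more robust; treat this purely as an illustration with toy parameters.

```python
# Toy sketch of DH-style PSI: blind, exchange, blind again, intersect.
import hashlib
import secrets

P = 2**127 - 1   # a toy Mersenne prime; production uses proper large groups/curves

def h(x: str) -> int:
    return int.from_bytes(hashlib.sha256(x.encode()).digest(), "big") % P

def blind(values, secret):
    return {pow(v, secret, P) for v in values}

ids_a = ["device_001", "device_002", "device_003"]   # advertising platform side
ids_b = ["device_002", "device_003", "device_004"]   # Weishi side

ka = secrets.randbelow(P - 2) + 1
kb = secrets.randbelow(P - 2) + 1

# Each side blinds its own hashed IDs, exchanges them, then blinds the other
# side's set again; the doubly blinded sets can be compared safely.
a_once = blind({h(x) for x in ids_a}, ka)
b_once = blind({h(x) for x in ids_b}, kb)
a_twice = blind(b_once, ka)   # computed by A on B's blinded set
b_twice = blind(a_once, kb)   # computed by B on A's blinded set

print(len(a_twice & b_twice))  # size of the intersection (2 here)
```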
(2) Multi-party feature engineering
- Vertical federated learning: the two sides' features are independent and can be handled separately, e.g., feature standardization and imputation.
- Horizontal federated learning: even partial statistics require the global distribution of each feature, so federated communication is still needed to synchronize them (see the sketch after this list).
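As a toy illustration of what such synchronization can look like, each party can share only sufficient statistics (count, sum, sum of squares) rather than raw data; in the real federated setting these aggregates would travel over the secure (encrypted or noise-protected) channel.

```python
# Toy sketch: compute a *global* mean/std across parties from local sufficient
# statistics, then let each party normalize its own features with the global stats.
import numpy as np

def local_stats(x: np.ndarray):
    return {"n": len(x), "sum": x.sum(axis=0), "sq": (x ** 2).sum(axis=0)}

def merge_stats(parts):
    n = sum(p["n"] for p in parts)
    s = sum(p["sum"] for p in parts)
    sq = sum(p["sq"] for p in parts)
    mean = s / n
    std = np.sqrt(sq / n - mean ** 2)
    return mean, std

party_1 = np.random.rand(1000, 4)   # each party's local feature matrix
party_2 = np.random.rand(500, 4)

mean, std = merge_stats([local_stats(party_1), local_stats(party_2)])
normalized_1 = (party_1 - mean) / std   # normalization uses the global statistics
```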
(3) Joint training
- Determine the computing environment and storage resources.
- Decide what is communicated (which quantities are carried, e.g., gradients or embeddings).
(4) Offline evaluation
(5) Online evaluation
2.2 A DNN-based federated model (FL-DNN)

The Weishi side and the advertising platform (AMS) side jointly train a multi-task DNN model. The multi-task structure evolved from simple implementations such as sample strategies and modified loss functions to MMoE; on the engineering side, training is parallelized with Horovod. A minimal MMoE sketch follows.
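For readers unfamiliar with MMoE, here is a minimal PyTorch sketch of the structure (shared experts, one softmax gate per task, per-task towers). The layer sizes, expert count, and task heads are illustrative assumptions, not the project's actual configuration.

```python
# Minimal MMoE sketch: shared experts, per-task gates, per-task towers.
import torch
import torch.nn as nn

class MMoE(nn.Module):
    def __init__(self, input_dim=128, expert_dim=64, n_experts=4, n_tasks=2):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(input_dim, expert_dim), nn.ReLU()) for _ in range(n_experts)]
        )
        self.gates = nn.ModuleList(
            [nn.Sequential(nn.Linear(input_dim, n_experts), nn.Softmax(dim=-1)) for _ in range(n_tasks)]
        )
        self.towers = nn.ModuleList(
            [nn.Sequential(nn.Linear(expert_dim, 32), nn.ReLU(), nn.Linear(32, 1)) for _ in range(n_tasks)]
        )

    def forward(self, x):
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)   # [B, E, D]
        outputs = []
        for gate, tower in zip(self.gates, self.towers):
            w = gate(x).unsqueeze(-1)                  # [B, E, 1] mixture weights
            mixed = (expert_out * w).sum(dim=1)        # [B, D] task-specific mix
            outputs.append(torch.sigmoid(tower(mixed)))
        return outputs

# e.g. task 0 = next-day retention probability, task 1 = effective-new probability
scores = MMoE()(torch.randn(8, 128))
```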
2.3 Iterative process of the FL-DNN model parameters

(1) Initialization: A (host, the AMS side) and B (guest, the Weishi side) each initialize their own bottom network (denoted Net_A and Net_B) with parameters θ_A and θ_B, together with the interaction-layer parameters W_A and W_B, the learning rate η, and the random noise terms (ε_A, ε_B, etc.) used to mask intermediate values.

(2) Forward propagation ([[·]] denotes homomorphic encryption):
- A side: compute the bottom-model output α_A = Net_A(x_A), encrypt it to obtain [[α_A]] (A's output embedding), and send it to B.
- B side: compute its own embedding α_B = Net_B(x_B) in the same way; for notational symmetry write [[α_B]] = α_B. On receiving [[α_A]], compute the encrypted interaction-layer input [[α_A]]W_A, add the mask to obtain [[α_A W_A + ε_B]], and send it to A.
- A side: receive and decrypt it to the masked plaintext α_A W_A + ε_B, then send this (still masked) value back to B.
- B side: receive it and subtract ε_B to obtain z_A = α_A W_A. Propagate z_A together with z_B = α_B W_B forward through the interaction network to obtain the prediction, and compute the loss L.

(3) Back propagation:
- B side: differentiate the loss with respect to the interaction-layer parameter W_B and its own bottom network θ_B, obtaining ∂L/∂W_B and ∂L/∂θ_B; compute the encrypted, noise-masked gradient term for W_A and send it to A.
- A side: receive it and decrypt it; compute the corresponding (still masked) plaintext quantity, encrypt the term B needs next, and send the two quantities to B.
- B side: receive the two quantities, compute the gradient of the loss with respect to W_A, and send the encrypted back-propagation error for A's bottom network to A.
- A side: receive and decrypt it, obtaining ∂L/∂θ_A for its own bottom network.

(4) Gradient update: A, B, and the interaction layer I each update their parameters with the learning rate η, completing one round of iteration.
This structure looks similar to the two-tower models commonly used for recall and coarse ranking, but the design rationale is actually different. The two-tower structure is often criticized because the two embeddings interact too late, and there are many improved versions, such as Tencent's MVKE model, that move the embedding interaction earlier. In vertical federated learning, A's Net_A can be a single layer, or even the identity (i.e., only feature encryption) before being handed to B, so in principle there is no interaction-timing problem.
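To make the data flow above easier to follow, here is a plaintext NumPy sketch of one forward step of the interaction layer. The homomorphic encryption and masking exchanges are replaced by identity placeholders, so this provides no privacy at all; it only shows which quantities each side computes and exchanges.

```python
# Plaintext sketch of one FL-DNN forward step; encrypt/decrypt are placeholders.
import numpy as np

rng = np.random.default_rng(0)

def encrypt(x):  # placeholder for homomorphic encryption [[.]]
    return x

def decrypt(x):  # placeholder for decryption on the key holder's side
    return x

# Bottom models: each party embeds only its own features.
def net_a(x_a, theta_a):            # host (AMS) bottom model
    return np.tanh(x_a @ theta_a)

def net_b(x_b, theta_b):            # guest (Weishi) bottom model
    return np.tanh(x_b @ theta_b)

batch, d_a, d_b, d_emb = 4, 16, 8, 4
x_a, x_b = rng.normal(size=(batch, d_a)), rng.normal(size=(batch, d_b))
theta_a, theta_b = rng.normal(size=(d_a, d_emb)), rng.normal(size=(d_b, d_emb))
w_a, w_b = rng.normal(size=(d_emb, 1)), rng.normal(size=(d_emb, 1))  # interaction layer
y = rng.integers(0, 2, size=(batch, 1))

# Forward: A ships only its (encrypted) embedding, never its raw features.
alpha_a = net_a(x_a, theta_a)
alpha_a_enc = encrypt(alpha_a)                    # sent from A to B

alpha_b = net_b(x_b, theta_b)
z = decrypt(alpha_a_enc) @ w_a + alpha_b @ w_b    # interaction layer, on B's side
p = 1.0 / (1.0 + np.exp(-z))                      # prediction
loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))   # only B holds the labels
```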
2.4 A special case of FL-DNN parameter iteration: single-side features

When B (the guest side) has no features, or only very weak ones, and can provide nothing but the user device ID and label, the parameter-iteration process above degenerates to the case without Net_B; readers can try writing out the parameter updates for that case.

In practice, because of issues such as data volume, feature coverage, and loss from sample crossing, the following two setups are combined to guarantee that the DNN trains well:

- B side has no features: <id, label> + <id, features>;
- B side has features: <id, label, features> + <id, features>.
3. Online serving

Each participant holds only its own model parameters, so prediction requires the cooperation of all parties:

(1) Request: the user device ID reaches A and B respectively;
(2) Embedding computation:
- the A side computes its embedding α_A = Net_A(x_A) and encrypts it to [[α_A]];
- the B side computes its embedding α_B = Net_B(x_B);
(3) Score computation:
- the A side sends [[α_A]] to the B side;
- the B side computes the (encrypted) prediction score from the two embeddings;
- after decryption, the B side obtains the final score y.
4. Results

In the cooperation with Tencent Ads (AMS), relative to training on Weishi data alone, federated learning lifted Group-AUC by +0.025, and all three auxiliary objectives (positively correlated with the main one) improved as well. The main objective, next-day retention rate after first launch (after coverage conversion), improved by +4.7pp. All metrics improved significantly after the first version went online, and it has been rolled out to full traffic. The second iteration again achieved a significant GAUC gain and is being tested on a small slice of traffic.

The figure below shows the effective reduction of the next-day retention cost after first launch (orange curve).
5. Ongoing iterations

5.1 Acquisition model

We are promoting federated collaboration with other channels, but the team cannot maintain a separate federated model for every delivery platform. A preliminary attempt took the model trained jointly with the AMS platform and applied it to acquisition on other platforms, but because the data are heterogeneous (the sample distributions differ) and for other reasons, that model performed no better than the base model trained on Weishi data alone. The delivery platforms also have conflicting interests, since each hopes advertisers will concentrate spend on its own traffic. We are therefore trying a combination of horizontal and vertical federation: vertical between Weishi and each advertising platform, horizontal across the advertising platforms, with the expectation of starting from three-party cooperation; a federated-transfer approach is currently being iterated.
5.2 Re-activation model

After the federated pipeline with the AMS platform was established, we wanted to reuse it for the re-activation model. Because re-activation involves multiple objectives, varying degrees of interest, and different behavior sequences, we focus on timeliness and model innovation, exploring models based on MMoE, MIND, and Transformer.
5.3 Difficulties in iteration

(1) Efficiency and stability
- Speeding up data alignment: to accelerate ciphertext intersection, IDs are partitioned into hash buckets for simple parallelism (see the sketch after this list).
- Compressing training time: incremental training is used for fine-tuning; it reaches results similar to full retraining in half the time.
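A minimal sketch of the hash-bucketing idea: both sides bucket device IDs with the same hash function, so each bucket pair can be intersected independently and in parallel. The bucket count is arbitrary here, and a plain set intersection stands in for the real (expensive) encrypted PSI.

```python
# Toy sketch of hash-bucketed, parallel ID intersection.
import hashlib
from collections import defaultdict
from multiprocessing import Pool

N_BUCKETS = 16

def bucket_of(device_id: str) -> int:
    return int(hashlib.md5(device_id.encode()).hexdigest(), 16) % N_BUCKETS

def bucketize(ids):
    buckets = defaultdict(set)
    for x in ids:
        buckets[bucket_of(x)].add(x)
    return buckets

def intersect_bucket(pair):
    a, b = pair
    return a & b          # in the real system this is the encrypted PSI step

def parallel_intersection(ids_a, ids_b):
    ba, bb = bucketize(ids_a), bucketize(ids_b)
    pairs = [(ba[k], bb[k]) for k in range(N_BUCKETS)]
    with Pool() as pool:
        parts = pool.map(intersect_bucket, pairs)
    return set().union(*parts)

if __name__ == "__main__":
    a = {f"device_{i}" for i in range(0, 10000)}
    b = {f"device_{i}" for i in range(5000, 15000)}
    print(len(parallel_intersection(a, b)))   # 5000
```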
(2) Interpretability and debugging are hard: neither side of the federation can see the other's raw data, and sometimes both sides even hide their network structures. This guarantees data security, but it makes problems much harder to localize during iteration.
(3) Difficulties of multi-party federated modeling
- Joint modeling with multiple cooperating advertising platforms involves conflicts of interest, which differs from Google's FedAvg setting.
- Joint modeling with other business units: WeChat and Search, for example, hold strong features, but the other side has little incentive.
- There are also issues of technology, network stability, and communication cost.
VI. Q&A

Q1. Is a TEE (trusted execution environment) necessary for federated learning tasks? In what scenarios would tasks be run on a TEE? Is the project introduced here computed on a TEE?

A1. We do not currently use a TEE. With a TEE you can operate directly on plaintext and avoid the large amount of encryption, because inside a TEE the data remain secure and mutually invisible even as plaintext. At present both the data intersection and the model training (gradients, embeddings) are done entirely on ciphertext.
Q2. The first step of federated learning is data alignment. Do you need to maintain a mapping table?

A2. No. With on the order of a billion users plus their features, a mapping table would reach hundreds of gigabytes, which is a waste of resources. In practice the aligned samples are produced in an agreed order: the IDs from the advertising platform side are consumed top to bottom in that order, so no key-value mapping needs to be maintained.
Q3. At serving time you need the other side's (the advertising platform's) features; how is the latency?

A3. The serving latency comes from communication. The advertising platform trains and serves its own sub-model on its own machines, the Weishi side does the same for its sub-model, and what is exchanged in the end is only the embedding.
Q4. In all cases, must the B side (guest side) provide the labels?

A4. The B side (guest, i.e., Weishi) never hands the labels to the other party, because the data must not leave its domain. As the formulas in the section "Iterative process of the FL-DNN model parameters" show, the gradients are computed on the B side, so the other party cannot learn the labels.
Q5. Group-AUC improved by +0.025 with federated learning; what was the Group-AUC before federated learning?

A5. The absolute value is not very meaningful, since it changes with the sample definition and fitting target across scenarios; roughly, it went from around the 0.70 level to the 0.72-0.73 level.
Q6. What is the full title of the MVKE paper Tencent published a while ago?

A6. "Mixture of Virtual-Kernel Experts for Multi-Objective User Profile Modeling" (Tencent, 2021).
Q7. FL-DNN modeling requires a third party; how do you trust the third party?

A7. With the decentralized architecture there is in fact no third party; its role is taken over by a series of encryption and decryption algorithms.
Q8. If both sides run inside a TEE, is the data exchanged over the network all in plaintext?

A8. Yes, in plaintext.
Q9. When the federated framework is combined with RTA, is it an offline audience package or online real-time prediction?

A9. After some exploration we found that real-time prediction is not very important on the acquisition side, so the offline audience packages are imported into the DMP and handed to RTA. The re-activation side needs to capture short-term changes in user interest, so it does have real-time requirements; we are working on that now.