【Title】3D Equivariant Molecular Graph Pretraining

【Authors】Rui Jiao, Jiaqi Han, Wenbing Huang, Yu Rong, Yang Liu

【Publication date】2022/07/20

【Institutions】Tsinghua University, AIR, Tencent

【Paper link】https://arxiv.org/pdf/2207.08824v2.pdf

Pretraining molecular representation models on unlabeled data underpins a wide range of applications. Conventional methods mainly handle 2D molecular graphs and focus only on 2D tasks, so their pretrained models cannot describe 3D geometry and fall short on downstream 3D tasks. This paper tackles 3D molecular pretraining in a complete and novel way. The authors first propose an equivariant energy-based model as the pretraining backbone, which has the advantage of respecting the symmetries of 3D space. They then develop a node-level pretraining loss for force prediction and use a Riemann-Gaussian distribution to make this loss E(3)-invariant, which makes it more robust. In addition, a graph-level noise prediction task further improves final performance. The model is pretrained on the large-scale 3D dataset GEOM-QM9 and evaluated on two challenging 3D benchmarks. Experiments show that the method outperforms state-of-the-art pretraining methods and confirm the effectiveness of each proposed component.

The figure above gives an overview of 3D-MGP (3D Molecular Graph Pretraining). It consists of a node-level equivariant force prediction task and a graph-level invariant noise prediction task.

Equivariant force prediction task: noise is added to the coordinates of every node, and the model then estimates the virtual force field that pulls the noisy coordinates back to the clean ones. Concretely, a noisy sample X~ is drawn from a conditional distribution around the clean conformation X and fed into the equivariant GNN φEGN. The force field, obtained from the gradient of the energy model, is then plugged into the force prediction loss.
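
The loss described above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: a hypothetical pairwise energy `toy_energy_and_force` stands in for φEGN followed by the energy head, the noise is a plain Gaussian with covariance σ²I, and the target is its score -(X~ - X)/σ² (the Gaussian relaxation, not the Riemann-Gaussian distribution the paper actually uses). What it does show is the structure of the task: the predicted force is the negative gradient of an E(3)-invariant scalar energy, and it is regressed onto a field that points from X~ back toward X.

```python
import numpy as np


def toy_energy_and_force(X, w=1.0):
    """Hypothetical stand-in for the energy model: E(X) = (w/2) * sum_{i != j} exp(-|x_i - x_j|^2).

    E depends only on pairwise distances, so it is E(3)-invariant and the
    force F = -dE/dX is E(3)-equivariant. X has shape (N, 3).
    """
    diff = X[:, None, :] - X[None, :, :]        # diff[i, j] = x_i - x_j, shape (N, N, 3)
    k = np.exp(-np.sum(diff ** 2, axis=-1))     # pairwise kernel, shape (N, N)
    np.fill_diagonal(k, 0.0)                    # drop self-pairs
    energy = 0.5 * w * k.sum()                  # each unordered pair is counted twice
    force = 2.0 * w * np.einsum("ij,ijd->id", k, diff)  # analytic -dE/dx_i
    return energy, force


def efp_loss_gaussian(X, sigma, rng, w=1.0):
    """Denoising force loss with an isotropic Gaussian perturbation (assumed, not Riemann-Gaussian)."""
    X_noisy = X + sigma * rng.standard_normal(X.shape)   # X~ ~ N(X, sigma^2 I)
    _, F_pred = toy_energy_and_force(X_noisy, w)
    F_target = -(X_noisy - X) / sigma ** 2               # grad_{X~} log N(X~; X, sigma^2 I)
    return np.mean(np.sum((F_pred - F_target) ** 2, axis=-1))


rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))                          # a random 5-atom "conformation"
print(efp_loss_gaussian(X, sigma=0.1, rng=rng))
```

Because the toy energy depends only on pairwise distances, rotating or translating X~ rotates the predicted forces accordingly, and the forces on all atoms sum to zero.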

Invariant noise prediction task: the goal is to identify how much noise was added to the input. This task shares the same EGN backbone as the node-level task and produces exactly the same invariant node-level and graph-level embeddings. For the inputs X and X~, φEGN first produces their graph-level embeddings u and u~. Instead of the scalar projection head φProj used to compute the energy, a classification head φScale takes the concatenation of the graph-level embeddings of the original conformation (u) and the perturbed conformation (u~) as input. Finally, a cross-entropy loss is computed between the logits and the label, which is the noise level sampled for the current input.
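
A minimal sketch of this graph-level task, with every specific choice hypothetical: mean pooling stands in for the graph readout, a single linear layer stands in for φScale, and the five-value schedule `NOISE_LEVELS` is made up for illustration. Noisy node embeddings are simulated directly instead of running φEGN on X~. Each molecule's label is the index of the noise level it was perturbed with, and the loss is the standard cross-entropy over those indices.

```python
import numpy as np

# Hypothetical noise schedule; a sample's label is the index of the sigma it used.
NOISE_LEVELS = np.array([0.01, 0.05, 0.1, 0.5, 1.0])


def graph_embedding(h):
    """Assumed readout: mean-pool invariant node embeddings h (N, D) into u (D,)."""
    return h.mean(axis=0)


def scale_logits(u, u_noisy, W, b):
    """Hypothetical linear phi_Scale on the concatenation [u, u~] -> one logit per noise level."""
    z = np.concatenate([u, u_noisy], axis=-1)   # (B, 2D)
    return z @ W + b                            # (B, K)


def cross_entropy(logits, labels):
    """Mean cross-entropy between logits (B, K) and integer labels (B,)."""
    shifted = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -np.mean(log_probs[np.arange(len(labels)), labels])


rng = np.random.default_rng(0)
B, N, D, K = 4, 6, 8, len(NOISE_LEVELS)
labels = rng.integers(0, K, size=B)             # sampled noise level per molecule
h_clean = rng.standard_normal((B, N, D))        # stand-in for phi_EGN embeddings of X
# Stand-in for phi_EGN embeddings of X~: perturb embeddings by the sampled sigma.
h_noisy = h_clean + NOISE_LEVELS[labels][:, None, None] * rng.standard_normal((B, N, D))
u = np.stack([graph_embedding(h) for h in h_clean])
u_noisy = np.stack([graph_embedding(h) for h in h_noisy])
W, b = 0.1 * rng.standard_normal((2 * D, K)), np.zeros(K)
print(cross_entropy(scale_logits(u, u_noisy, W, b), labels))
```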

The figure above shows the energy landscapes of different pretrained models. The landscape of this method converges smoothly to the original conformation, which means that the observed conformation corresponds to a stable state with locally minimal energy on the projected conformation plane. In contrast, 2D-based pretraining models such as EdgePred, AttrPred and GraphCL produce rugged landscapes, possibly because the knowledge they acquire during pretraining does not conform to the underlying energy distribution. The Base method, which has no pretraining, outputs a flat surface because it learns little from the small dataset.

The table above shows the results of all pretraining methods on the downstream MD17 tasks. Underlined numbers denote the previous SOTA and bold numbers denote the current best result. The authors summarize three points:

Q1. How does 3D-MGP perform overall?

A1: 3D-MGP achieves the best performance in most cases, and the average MAE in the last column of the table further demonstrates its general effectiveness. Its advantage over other methods is especially pronounced in force prediction, possibly because the node-level force prediction design transfers to the real force field distribution after fine-tuning.

Q2. Are 3D-aware pretraining tasks always helpful?

A2: Compared with Base, 3D-MGP brings meaningful improvements on MD17. Interestingly, PosPred usually performs well on MD17, even though its 3D prediction target is very simple.

Q3. How do conventional 2D methods perform on 3D tasks?

A3: Most 2D methods struggle with force prediction on MD17 and property prediction on QM9. For example, InfoGraph's average force-prediction MAE is 0.3742, much worse than that of the Base model trained from scratch (0.2086). GraphCL, the current state of the art, improves on Base in most cases, but by less than 3D-MGP does. Although GraphMVP takes 3D information into account, it performs worse than 3D-MGP in almost all cases, because GraphMVP uses 3D geometry only as auxiliary information and focuses on 2D tasks.

The figure above shows the results of the ablation studies on MD17.

First, the authors examine the contributions of the node-level task (EFP) and the graph-level task (INP). Each of EFP and INP improves performance on its own, and combining them yields more accurate predictions.

Second, to assess the importance of the proposed Riemann-Gaussian distribution, the authors relax it to a Gaussian distribution p(X~ | X) = N(X, σI), which violates E(3) invariance. This relaxation causes some loss of performance.
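
The invariance argument can be checked numerically. This sketch assumes the Riemann-Gaussian has the form p(X~ | X) ∝ exp(-d²(X~, X) / (4σ²)), with d² the squared Frobenius distance between the Gram matrices of the centered coordinates; treat that definition as an assumption and check the paper for the exact one. Under it, the gradient of d² with respect to X~ is 4AY~, where Y and Y~ are the centered coordinates and A = Y~Y~ᵀ - YYᵀ, so the score is -AY~/σ². Rigidly moving X~ changes the plain Gaussian distance but leaves d² unchanged.

```python
import numpy as np


def center(X):
    """Remove the centroid so the quantities below ignore translations."""
    return X - X.mean(axis=0, keepdims=True)


def gaussian_sq_dist(X_noisy, X):
    """Squared Euclidean distance used by N(X, sigma^2 I): not E(3)-invariant."""
    return np.sum((X_noisy - X) ** 2)


def gram_sq_dist(X_noisy, X):
    """Assumed Riemann-Gaussian distance: ||Y~ Y~^T - Y Y^T||_F^2 on centered (N, 3) coordinates."""
    Yn, Y = center(X_noisy), center(X)
    return np.sum((Yn @ Yn.T - Y @ Y.T) ** 2)


def riemann_gaussian_score(X_noisy, X, sigma):
    """grad_{X~} log p(X~ | X) for p proportional to exp(-d^2 / (4 sigma^2)), i.e. -(A Y~) / sigma^2."""
    Yn, Y = center(X_noisy), center(X)
    A = Yn @ Yn.T - Y @ Y.T          # symmetric (N, N); rows sum to zero
    return -(A @ Yn) / sigma ** 2    # no extra centering projection needed since 1^T A = 0


rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
X_noisy = X + 0.1 * rng.standard_normal((5, 3))
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta), np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])                          # 90-degree rotation about z
X_moved = X_noisy @ R.T + np.array([1.0, -2.0, 0.5])     # rigidly move only X~

print(gaussian_sq_dist(X_noisy, X), gaussian_sq_dist(X_moved, X))   # differ
print(gram_sq_dist(X_noisy, X), gram_sq_dist(X_moved, X))           # equal up to rounding
```

The Gram-matrix form also makes the score E(3)-equivariant in X~: rigidly moving X~ rotates the score by the same rotation, which is the property the Gaussian relaxation gives up.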

In addition, the authors analyze whether energy-based modeling is necessary. Instead of deriving the force field as the gradient of the energy model, one can directly use the equivariant output of EGNN as the predicted force signal in the EFP loss; this variant is called Direct Force. As the last column of the table above shows, it yields a higher MAE. From an algorithmic point of view, the energy-based strategy first aggregates the embeddings of all nodes into a single energy and then takes the gradient of that energy as the force field, so it better captures global patterns, which leads to better performance.
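
The difference between the two variants can be illustrated with a toy model. This sketch assumes hypothetical invariant node features h_i = Σ_j exp(-|x_i - x_j|²) in place of EGNN embeddings and a hypothetical quadratic head φProj(h) = a·h². The energy-based variant sums the per-node scalars into one energy and differentiates it, while the "direct" head, also hypothetical here, lets each node scale its own equivariant vector by its own feature. Both outputs are E(3)-equivariant, but only the energy-based one is the gradient of a single global scalar. One consequence is that its forces always sum to zero (a translation-invariant energy exerts no net force), which the direct head does not guarantee.

```python
import numpy as np


def node_features(X):
    """Hypothetical invariant node features h_i = sum_{j != i} exp(-|x_i - x_j|^2)."""
    diff = X[:, None, :] - X[None, :, :]        # diff[i, j] = x_i - x_j
    k = np.exp(-np.sum(diff ** 2, axis=-1))
    np.fill_diagonal(k, 0.0)
    return k.sum(axis=1), k, diff


def energy_based_force(X, a=1.0):
    """Aggregate first, E = sum_i a * h_i^2, then take F = -dE/dX (analytic)."""
    h, k, diff = node_features(X)
    energy = a * np.sum(h ** 2)
    # -dE/dx_i = 4a * sum_j k_ij (h_i + h_j) (x_i - x_j)
    force = 4.0 * a * np.einsum("ij,ijd->id", k * (h[:, None] + h[None, :]), diff)
    return energy, force


def direct_force(X, c=1.0):
    """Hypothetical 'Direct Force' head: each node scales its own equivariant vector by h_i."""
    h, k, diff = node_features(X)
    return c * h[:, None] * np.einsum("ij,ijd->id", k, diff)


rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
_, F_energy = energy_based_force(X)
F_direct = direct_force(X)
print(F_energy.sum(axis=0))   # ~0: gradient of a translation-invariant energy
print(F_direct.sum(axis=0))   # generally nonzero
```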

 

Innovations

  • This paper presents a general self-supervised pretraining framework for 3D molecular tasks. It comprises a node-level equivariant force prediction (EFP) task and a graph-level invariant noise prediction (INP) task, which jointly extract geometric information from large-scale 3D molecular datasets.
  • Experiments on MD17 and QM9 show that the proposed method outperforms its conventional 2D counterparts. The paper also provides the necessary ablations, visualizations and analyses.