The figure above shows an overview of 3D-MGP (3D Molecular Graph Pretraining). It consists of a node-level equivariant force field prediction task and a graph-level invariant noise prediction task.
Equivariant force field prediction task: add noise to the coordinates of each node, then estimate the pseudo force field that pulls the noisy coordinates back to the clean coordinates. Concretely, a noisy sample X~ is drawn from X according to a certain conditional distribution; the noisy sample is then fed into the equivariant GNN backbone φEGN, and substituted into the force field prediction loss, which is derived from the gradient of the energy model.
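The denoising objective above can be sketched in a few lines. This is a minimal toy illustration, not the paper's implementation: `toy_equivariant_force` is a hypothetical stand-in for the EGNN output, and the noise here is plain Gaussian rather than the paper's Riemann-Gaussian sampling.

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb(X, sigma):
    """Add isotropic Gaussian noise to node coordinates (a plain-Gaussian
    simplification of the paper's Riemann-Gaussian sampling)."""
    return X + sigma * rng.normal(size=X.shape)

def toy_equivariant_force(X_tilde, w=0.5):
    """Hypothetical stand-in for the EGNN force output: a weighted sum of
    relative position vectors, which is rotation-equivariant by construction."""
    diff = X_tilde[:, None, :] - X_tilde[None, :, :]   # (N, N, 3)
    return w * diff.sum(axis=1)

def efp_loss(X, X_tilde, sigma):
    """Force-field prediction loss: match the predicted force against the
    denoising direction (X - X_tilde) / sigma^2 that pulls the noisy
    conformation back toward the clean one."""
    target = (X - X_tilde) / sigma**2
    pred = toy_equivariant_force(X_tilde)
    return np.mean((pred - target) ** 2)

X = rng.normal(size=(5, 3))            # 5 atoms in 3D
X_tilde = perturb(X, sigma=0.1)
loss = efp_loss(X, X_tilde, sigma=0.1)
print(float(loss))
```

The key property is that both the target (a difference of coordinates) and the prediction (a sum of relative vectors) transform consistently under rotation, so the loss is well defined for an equivariant model.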
Invariant noise prediction task: identify how much noise has been added to the input. This task shares the same EGNN backbone as the node-level task and produces exactly the same invariant node-level and graph-level embeddings. For the inputs X and X~, their graph-level embeddings u and u~ are first obtained via φEGN. Instead of using the scalar projection head φProj to compute the energy, a classification head φScale is used here: it takes the concatenation of the graph-level embeddings of the original conformation u and the perturbed conformation u~ as input. Finally, a cross-entropy loss is computed between the logits and the labels, where the label is the noise level sampled for the current input.
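The classification step can be sketched as follows. This is a toy sketch under stated assumptions: the embedding dimension, the number of discrete noise levels, and the single linear layer standing in for φScale are all hypothetical choices, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(1)
NUM_LEVELS = 4                      # assumed number of discrete noise scales
D = 8                               # assumed graph-embedding dimension

# Hypothetical classification head phi_Scale: a single linear layer over
# the concatenated graph embeddings [u ; u_tilde].
W = rng.normal(scale=0.1, size=(2 * D, NUM_LEVELS))

def inp_loss(u, u_tilde, level):
    """Cross-entropy between the noise-level classifier's logits and the
    label, i.e. the index of the sampled noise level."""
    logits = np.concatenate([u, u_tilde]) @ W
    logits -= logits.max()                          # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[level]

u = rng.normal(size=D)              # graph embedding of the clean conformation
u_tilde = rng.normal(size=D)        # graph embedding of the noisy conformation
label = 2                           # index of the sampled noise level
print(float(inp_loss(u, u_tilde, label)))
```

Because u and u~ are invariant embeddings, the predicted noise level is unaffected by rotations or translations of either conformation, which is exactly what the task requires.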

The figure above shows the energy landscapes of different pre-trained models. The energy landscape of this method converges smoothly to the original conformation, which means the observed conformation corresponds to a transferable state with locally minimal energy on the projected conformation plane. In contrast, 2D pre-training models such as EdgePred, AttrPred, and GraphCL produce rugged landscapes, possibly because the knowledge they gained during pre-training does not conform to the underlying energy distribution. The Base method, with no pre-training, outputs a flat surface, since it learns little from the small dataset.

The table above shows the results of all pre-training methods on the downstream task MD17. Underlined numbers indicate the previous SOTA, and bold numbers indicate the current best results. The article summarizes three points:
Q1: How is the overall performance of 3D-MGP?
A1: 3D-MGP achieves the best performance in most cases; the average MAE in the last column of the table further demonstrates its general effectiveness. Its advantage over other methods is especially significant in force field prediction, possibly because the node-level force prediction design in this paper can be extended to the real force field distribution after fine-tuning.
Q2: Are 3D-aware pre-training tasks always helpful?
A2: Compared with Base, 3D-MGP achieves meaningful improvements on MD17. Interestingly, PosPred usually performs well on MD17, even though its 3D prediction objective is very simple.
Q3: How do traditional 2D methods perform on 3D tasks?
A3: Most 2D methods struggle with force field prediction on MD17 and property prediction on QM9. For example, InfoGraph's average MAE for force prediction is 0.3742, considerably worse than the from-scratch Base method (0.2086). As the current state of the art among them, GraphCL improves over Base in most cases, but the improvement is smaller than that of 3D-MGP in this paper. Although GraphMVP takes 3D information into account, it performs almost uniformly worse than 3D-MGP, because GraphMVP only uses 3D geometry as supplementary information and focuses on 2D tasks.

The figure above shows the results of ablation studies on MD17.
First, the contributions of the node-level task (i.e. EFP) and the graph-level task (i.e. INP) are examined. The results show that EFP and INP each improve performance on their own, and their combination leads to more accurate predictions.
Second, to evaluate the importance of the proposed Riemann-Gaussian distribution, the distribution is relaxed to a plain Gaussian p(X~ | X) = N(X, σI), which violates E(3) invariance. The results show that this relaxation causes some performance degradation.
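The invariance gap can be illustrated numerically. As an assumption for illustration, the Gram-matrix statistic below stands in for the kind of E(3)-invariant quantity the Riemann-Gaussian is built on; the exact density in the paper may differ.

```python
import numpy as np

rng = np.random.default_rng(2)

def random_rotation():
    """Random 3x3 rotation matrix via QR decomposition."""
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] *= -1
    return Q

X = rng.normal(size=(5, 3))
X_tilde = X + 0.1 * rng.normal(size=(5, 3))
R = random_rotation()

# Plain-Gaussian statistic ||X_tilde - X||: changes when only one
# conformation is rotated, so the density is not E(3)-invariant.
gauss = np.linalg.norm(X_tilde - X)
gauss_rot = np.linalg.norm(X_tilde @ R - X)

# A Gram-matrix statistic ||X_tilde X_tilde^T - X X^T|| (an invariant
# surrogate in the spirit of the Riemann-Gaussian) is unchanged, since
# (X_tilde R)(X_tilde R)^T = X_tilde X_tilde^T for any rotation R.
gram = np.linalg.norm(X_tilde @ X_tilde.T - X @ X.T)
gram_rot = np.linalg.norm((X_tilde @ R) @ (X_tilde @ R).T - X @ X.T)

print(abs(gauss - gauss_rot), abs(gram - gram_rot))
```

Rotating the noisy conformation changes the plain-Gaussian statistic but leaves the Gram-based one exactly fixed, which is why relaxing to N(X, σI) breaks the invariance the pre-training task relies on.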
In addition, this paper analyzes the necessity of energy-based modeling. Instead of deriving the force field as the gradient of the energy model, one could directly use the equivariant output of the EGNN as the predicted force field signal in the EFP loss; this variant is called Direct Force. The last column of the table above reports that this approach yields a higher MAE. From an algorithmic point of view, the energy-based strategy first aggregates the embeddings of all nodes into a scalar energy, then computes the gradient of the energy as the force field; this better captures global patterns and leads to better performance.
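The energy-gradient construction can be sketched with a toy energy. This is a minimal illustration, not the paper's model: the sum of squared pairwise distances stands in for the aggregated node embeddings, and the analytic gradient plays the role of the autograd pass through the energy head.

```python
import numpy as np

def energy(X):
    """Toy invariant energy: half the sum of squared pairwise distances
    (a stand-in for aggregating node embeddings into a scalar energy)."""
    diff = X[:, None, :] - X[None, :, :]
    return 0.5 * np.sum(diff ** 2)

def force(X):
    """Force field as the negative gradient of the energy. For this toy
    energy, dE/dx_i = 2*N*x_i - 2*sum_j x_j."""
    N = X.shape[0]
    return -(2 * N * X - 2 * X.sum(axis=0, keepdims=True))

# Check the analytic gradient against central finite differences.
rng = np.random.default_rng(3)
X = rng.normal(size=(4, 3))
eps = 1e-5
num = np.zeros_like(X)
for i in range(X.shape[0]):
    for d in range(3):
        Xp, Xm = X.copy(), X.copy()
        Xp[i, d] += eps
        Xm[i, d] -= eps
        num[i, d] = -(energy(Xp) - energy(Xm)) / (2 * eps)
print(np.allclose(force(X), num, atol=1e-4))
```

Because the energy aggregates over all nodes before differentiation, the force on each atom depends on the whole conformation, which is the "global pattern" argument the paper makes against Direct Force.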
Innovation points
- This paper presents a general self-supervised pre-training framework for molecular 3D tasks. It includes a node-level equivariant force field prediction (EFP) task and a graph-level invariant noise prediction (INP) task, which jointly extract geometric information from large-scale 3D molecular datasets.
- Experiments on MD17 and QM9 show that the proposed method outperforms its traditional 2D counterparts. The necessary ablations, visualizations, and analyses are also provided.
