
The First Publicly Available PyTorch Reproduction of AlphaFold2: Columbia University Open-Sources OpenFold, With More Than 1,000 Stars

2022-06-24 20:17:00 | OpenCV School

Source: reposted with authorization from the official account Almost Human.

AlphaFold2 was the most dazzling star of the AI-for-Science field in 2021. Now it has been reproduced in PyTorch and open-sourced on GitHub. The reproduction matches the performance of the original AlphaFold2, while its compute and storage requirements are far friendlier to the general public.

Recently, Mohammed AlQuraishi, assistant professor of systems biology at Columbia University, announced on Twitter that his team has trained a new model called OpenFold, a trainable PyTorch reproduction of AlphaFold2. AlQuraishi also said it is the first publicly available reproduction of AlphaFold2.

AlphaFold2 can regularly predict protein structures with atomic-level accuracy. Technically, it is built around multiple sequence alignments and deep learning, combined with physical and biological knowledge of protein structure to improve prediction quality. Its outstanding result of accurately predicting roughly two-thirds of protein structures landed it in the journal Nature. Even more remarkably, the DeepMind team not only released the model, but also turned AlphaFold2's predictions into a free, open dataset.

However, being open source does not mean the system is easy to set up or use. In practice, deploying the AlphaFold2 software stack is difficult: the hardware requirements are high, the datasets take a long time to download, and they occupy a great deal of disk space, any one of which is enough to make ordinary developers flinch. The open source community has therefore been working hard to produce a usable version of AlphaFold2.

The OpenFold built this time by Mohammed AlQuraishi and colleagues at Columbia University took roughly 100,000 A100 GPU hours to train in total, but reached about 90% of its final accuracy within around 3,000 hours.

OpenFold's accuracy is on par with the original AlphaFold2, and even slightly better, possibly because OpenFold's training set is somewhat larger.

OpenFold's main advantage is significantly faster inference: for shorter protein sequences, it can run inference up to twice as fast as AlphaFold2. In addition, thanks to custom CUDA kernels, OpenFold uses less memory and can therefore run inference on longer protein sequences.
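For intuition on how attention memory can be reduced for long chains, here is a minimal query-chunking sketch in plain PyTorch. It only illustrates the general idea; OpenFold's actual savings come from its fused custom CUDA attention kernels, and the function name, sizes, and chunk size below are illustrative assumptions.

import torch

# Minimal sketch of query-chunked attention: instead of materializing the
# full (L, L) score matrix at once, process queries in chunks so peak
# memory scales with chunk_size * L rather than L * L.
def chunked_attention(q, k, v, chunk_size=256):
    scale = q.shape[-1] ** -0.5
    out = []
    for start in range(0, q.shape[0], chunk_size):
        q_chunk = q[start:start + chunk_size]             # (chunk, d)
        scores = (q_chunk @ k.transpose(-1, -2)) * scale  # (chunk, L)
        out.append(torch.softmax(scores, dim=-1) @ v)     # (chunk, d)
    return torch.cat(out, dim=0)

# Sanity check against ordinary full attention on a toy 4096-token input.
q = k = v = torch.randn(4096, 64)
full = torch.softmax((q @ k.transpose(-1, -2)) * q.shape[-1] ** -0.5, dim=-1) @ v
assert torch.allclose(chunked_attention(q, k, v), full, atol=1e-4)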

About OpenFold

OpenFold reproduces almost all of the features of the original open-source inference code (v2.0.1), except for the model ensembling feature, which is being phased out and did not perform well in DeepMind's own ablation tests.

With or without DeepSpeed, OpenFold can be trained at full precision or in bfloat16. To match AlphaFold2's original performance, the team trained OpenFold from scratch and has publicly released both the model weights and the training data, which include roughly 400,000 MSAs and PDB70 template files. OpenFold also supports protein inference using the official AlphaFold parameters.
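As a rough illustration of what bfloat16 training looks like in plain PyTorch, the sketch below uses torch.autocast. It is a generic example, not OpenFold's actual training loop: it assumes a CUDA GPU with bfloat16 support (e.g. an A100), and the tiny model and random data are placeholders.

import torch
from torch import nn

# Generic bfloat16 mixed-precision training sketch (illustrative only).
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1)).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):
    x = torch.randn(32, 64, device="cuda")
    y = torch.randn(32, 1, device="cuda")
    optimizer.zero_grad()
    # The forward pass runs in bfloat16; parameters stay in float32, and no
    # gradient scaler is needed because bfloat16 keeps float32's exponent range.
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()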

Compared with other implementations, OpenFold has the following advantages:

  • Short-sequence inference: accelerated inference on GPU for chains of fewer than 1,500 amino acid residues;
  • Long-sequence inference: with the low-memory attention implemented in this work, OpenFold can run inference on very long chains, predicting the structure of sequences of more than 4,000 residues on a single A100; with CPU offloading, even longer sequences can be predicted (a minimal offloading sketch follows this list);
  • Memory-efficient training and inference: custom CUDA attention kernels, adapted from FastFold's kernels, use 4x and 5x less GPU memory than the equivalent FastFold and stock PyTorch implementations, respectively;
  • Efficient alignment scripts: using either the original AlphaFold HHblits/JackHMMER pipeline or ColabFold with MMseqs2, the team has generated millions of alignments.
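To show the CPU-offloading idea mentioned above in its simplest form, here is a short PyTorch sketch. It assumes a CUDA-capable GPU, the tensor shapes are arbitrary, and it is only a conceptual illustration, not OpenFold's actual offloading code.

import torch

# Minimal sketch of CPU offloading: a large intermediate tensor is parked
# in host RAM to free GPU memory for other work, then copied back when it
# is needed again.
x = torch.randn(4000, 256, device="cuda")
pair = torch.relu(x @ x.transpose(-1, -2))  # large (4000, 4000) activation

pair_cpu = pair.to("cpu")       # offload to host memory
del pair
torch.cuda.empty_cache()        # release the GPU copy

# ... other GPU-heavy work would run here ...

pair = pair_cpu.to("cuda")      # restore the activation when needed again
print(pair.shape)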

Installation and Use on Linux

The development team provides scripts that install Miniconda locally, create a conda virtual environment, install all Python dependencies, and download useful resources, including both sets of model parameters.

Run the following command:

scripts/install_third_party_dependencies.sh

Use the following command to activate the environment:

source scripts/activate_conda_env.sh

To deactivate the environment:

source scripts/deactivate_conda_env.sh

With the environment active, compile OpenFold's CUDA kernels:

python3 setup.py install

Install HH-suite under /usr/bin:

# scripts/install_hh_suite.sh

Use the following command to download the databases used to train OpenFold and AlphaFold:

bash scripts/download_data.sh data/

To run inference on one or more sequences using a set of DeepMind's pretrained parameters, run:

python3 run_pretrained_openfold.py \
    fasta_dir \
    data/pdb_mmcif/mmcif_files/ \
    --uniref90_database_path data/uniref90/uniref90.fasta \
    --mgnify_database_path data/mgnify/mgy_clusters_2018_12.fa \
    --pdb70_database_path data/pdb70/pdb70 \
    --uniclust30_database_path data/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
    --output_dir ./ \
    --bfd_database_path data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --model_device "cuda:0" \
    --jackhmmer_binary_path lib/conda/envs/openfold_venv/bin/jackhmmer \
    --hhblits_binary_path lib/conda/envs/openfold_venv/bin/hhblits \
    --hhsearch_binary_path lib/conda/envs/openfold_venv/bin/hhsearch \
    --kalign_binary_path lib/conda/envs/openfold_venv/bin/kalign \
    --config_preset "model_1_ptm" \
    --openfold_checkpoint_path openfold/resources/openfold_params/finetuning_2_ptm.pt

For more details, see GitHub: https://github.com/aqlaboratory/openfold

Reference links:

https://cloud.tencent.com/developer/article/1861192

https://twitter.com/MoAlQuraishi
