The first publicly available PyTorch reproduction of AlphaFold2: Columbia University open-sources OpenFold, which already has more than 1,000 stars
2022-06-24 20:17:00 【Opencv school】
Source: official account Almost Human (reprinted with authorization)
AlphaFold2 was the brightest star in the AI for Science field in 2021. Now it has been reproduced in PyTorch and open-sourced on GitHub. The reproduction is comparable in performance to the original AlphaFold2, and its compute and storage requirements are friendlier to the general public.
Recently, Mohammed AlQuraishi, assistant professor of systems biology at Columbia University, announced on Twitter that his team had trained a new model called OpenFold, a trainable PyTorch reproduction of AlphaFold2. AlQuraishi also said that this is the first publicly available reproduction of AlphaFold2.
AlphaFold2 can routinely predict protein structures with atomic accuracy. Technically, it is built on multiple sequence alignments and deep learning, and it incorporates physical and biological knowledge about protein structure to improve its predictions. Its outstanding achievement of predicting 2/3 of protein structures was published in Nature. More surprisingly, the DeepMind team not only open-sourced the model but also released AlphaFold2's predictions as a free, open dataset.
However, open source does not mean the software is usable, let alone easy to use. In practice, deploying the AlphaFold2 software system is difficult: the hardware requirements are high, and the datasets take a long time to download and occupy a lot of disk space. Each of these obstacles makes ordinary developers flinch. The open-source community has therefore been working hard on a usable version of AlphaFold2.
OpenFold, implemented by Professor Mohammed AlQuraishi of Columbia University and colleagues, took about 100,000 A100 hours to train in total, but it reaches 90% of the accuracy within about 3,000 hours.
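To put the two training figures side by side, the short calculation below works out what share of the total compute is spent before the 90% mark. It is only arithmetic on the numbers quoted above; the cluster size `N_GPUS` is a hypothetical value chosen for illustration, and perfect scaling across GPUs is assumed.

```python
# Approximate figures quoted in the article.
TOTAL_A100_HOURS = 100_000
HOURS_TO_90_PERCENT = 3_000


def fraction_of_budget(hours: float, total: float) -> float:
    """Return the share of the total training budget consumed after `hours`."""
    return hours / total


share = fraction_of_budget(HOURS_TO_90_PERCENT, TOTAL_A100_HOURS)
print(f"{share:.0%} of the total A100 hours are spent before the 90% mark")

# Hypothetical cluster size, used only to give a feel for wall-clock time.
N_GPUS = 128
wall_clock_hours = HOURS_TO_90_PERCENT / N_GPUS
print(f"about {wall_clock_hours:.1f} h on {N_GPUS} A100s (assuming perfect scaling)")
```

In other words, most of the training budget goes into the last stretch of accuracy, which is why a much cheaper partial run can already be useful.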
OpenFold's accuracy is comparable to that of the original AlphaFold2, and even slightly better, possibly because OpenFold's training set is slightly larger.
OpenFold's main advantage is significantly faster inference: for shorter protein sequences, OpenFold can run inference up to twice as fast as AlphaFold2. In addition, thanks to custom CUDA kernels, OpenFold can run inference on longer protein sequences with less memory.
Introduction to OpenFold
OpenFold reproduces almost all features of the original open-source inference code (v2.0.1). The exception is the model ensembling feature, which is being phased out and performed poorly in DeepMind's ablation tests.
With or without DeepSpeed, OpenFold can be trained in full precision or in bfloat16. To match AlphaFold2's original performance, the team trained OpenFold from scratch and has publicly released the model weights and training data. The training data contains approximately 400,000 MSAs and PDB70 template files. OpenFold also supports protein inference with AlphaFold's official parameters.
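For readers unfamiliar with bfloat16, the sketch below illustrates the number format itself: bfloat16 keeps the sign bit and the 8 exponent bits of a 32-bit float but only 7 mantissa bits, so it covers the same range with much coarser precision. This is a standalone illustration in plain Python, not code from OpenFold; the helper name `to_bfloat16` is made up for this example, and NaN and infinity handling is omitted.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Round `x` to the nearest bfloat16 value and return it as a Python float.

    bfloat16 is the upper 16 bits of an IEEE-754 float32
    (1 sign bit, 8 exponent bits, 7 mantissa bits).
    Illustrative only: NaN and infinity are not handled specially.
    """
    # Reinterpret the float32 encoding of x as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, on the 16 low bits that will be dropped.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding_bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


for value in (1.0, 1.001, 1.01, -2.5):
    print(value, "->", to_bfloat16(value))
```

Near 1.0 the spacing between neighbouring bfloat16 values is 2^-7 = 0.0078125, so 1.001 collapses to 1.0 while 1.01 rounds to 1.0078125. That loss of precision is the price paid for halving the memory per value.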
Compared with other implementations, OpenFold has the following advantages:
- Short-sequence inference: faster GPU inference for chains of fewer than 1,500 amino acid residues;
- Long-sequence inference: the low-memory attention implemented in this work enables inference on very long chains. OpenFold can predict the structure of sequences with more than 4,000 residues on a single A100, and even longer sequences with CPU offloading;
- Memory efficiency during training and inference: custom CUDA attention kernels, modified from FastFold's kernels, use 4x and 5x less GPU memory than the equivalent FastFold and stock PyTorch implementations, respectively;
- Efficient alignment scripts: using either the original AlphaFold HHblits/JackHMMER pipeline or ColabFold with MMseqs2, the team has generated millions of alignments.
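The article does not describe how OpenFold's low-memory attention is implemented. One general way to cut attention memory is to visit keys and values in chunks while maintaining a running softmax, so that the full row of attention scores is never held at once. The plain-Python sketch below shows that general idea and checks it against a naive reference. It is an assumption-laden illustration, not OpenFold's kernel; the function names and the `chunk_size` parameter are invented for this example.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def naive_attention(q, k, v):
    """Reference: softmax(q k^T / sqrt(d)) v, holding a full score row per query."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [_dot(qi, kj) * scale for kj in k]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / total
                    for c in range(len(v[0]))])
    return out


def chunked_attention(q, k, v, chunk_size=2):
    """Same result, but keys/values are processed chunk by chunk."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        running_max = -math.inf      # largest score seen so far
        running_sum = 0.0            # softmax denominator, relative to running_max
        acc = [0.0] * len(v[0])      # weighted sum of values, relative to running_max
        for start in range(0, len(k), chunk_size):
            k_chunk = k[start:start + chunk_size]
            v_chunk = v[start:start + chunk_size]
            scores = [_dot(qi, kj) * scale for kj in k_chunk]
            new_max = max(running_max, max(scores))
            # Rescale what was accumulated under the old maximum
            # (exp(-inf) is 0.0, which handles the first chunk).
            rescale = math.exp(running_max - new_max)
            running_sum *= rescale
            acc = [a * rescale for a in acc]
            for s, vj in zip(scores, v_chunk):
                w = math.exp(s - new_max)
                running_sum += w
                acc = [a + w * x for a, x in zip(acc, vj)]
            running_max = new_max
        out.append([a / running_sum for a in acc])
    return out


q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5], [-1.0, 2.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]

reference = naive_attention(q, k, v)
chunked = chunked_attention(q, k, v, chunk_size=2)
max_diff = max(abs(a - b) for r, c in zip(reference, chunked) for a, b in zip(r, c))
print("largest difference between the two versions:", max_diff)
```

The chunked version only ever keeps `chunk_size` scores per query in memory, which is the property that matters once sequences reach thousands of residues.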
Installation and usage on Linux
The development team provides a script that installs Miniconda locally, creates a conda virtual environment, installs all Python dependencies, and downloads useful resources, including two sets of model parameters.
Run the following command:
scripts/install_third_party_dependencies.sh
Activate the environment with:
source scripts/activate_conda_env.sh
Deactivate it with:
source scripts/deactivate_conda_env.sh
With the environment active, compile OpenFold's CUDA kernels:
python3 setup.py install
To install HH-suite under /usr/bin (the leading # denotes a root prompt):
# scripts/install_hh_suite.sh
Download the databases used to train OpenFold and AlphaFold with:
bash scripts/download_data.sh data/
To run inference on one or more sequences using a set of DeepMind's pretrained parameters, run:
python3 run_pretrained_openfold.py \
fasta_dir \
data/pdb_mmcif/mmcif_files/ \
--uniref90_database_path data/uniref90/uniref90.fasta \
--mgnify_database_path data/mgnify/mgy_clusters_2018_12.fa \
--pdb70_database_path data/pdb70/pdb70 \
--uniclust30_database_path data/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
--output_dir ./ \
--bfd_database_path data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--model_device "cuda:0" \
--jackhmmer_binary_path lib/conda/envs/openfold_venv/bin/jackhmmer \
--hhblits_binary_path lib/conda/envs/openfold_venv/bin/hhblits \
--hhsearch_binary_path lib/conda/envs/openfold_venv/bin/hhsearch \
--kalign_binary_path lib/conda/envs/openfold_venv/bin/kalign \
--config_preset "model_1_ptm" \
--openfold_checkpoint_path openfold/resources/openfold_params/finetuning_2_ptm.pt
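Because the command above is long, it can be convenient to assemble it in a small script and check that key files exist before launching a job. The sketch below builds the same argument list in Python, using only the flags and paths shown above. The helper names `build_inference_command` and `missing_files` are hypothetical, and the directory layout is assumed to match the download and install scripts.

```python
import shlex
from pathlib import Path


def build_inference_command(fasta_dir, template_dir, data_dir="data",
                            env_bin="lib/conda/envs/openfold_venv/bin",
                            output_dir="./", device="cuda:0",
                            config_preset="model_1_ptm",
                            checkpoint="openfold/resources/openfold_params/finetuning_2_ptm.pt"):
    """Return the inference command from the article as a list of arguments."""
    data = Path(data_dir)
    tools = Path(env_bin)
    return [
        "python3", "run_pretrained_openfold.py",
        str(fasta_dir), str(template_dir),
        "--uniref90_database_path", str(data / "uniref90" / "uniref90.fasta"),
        "--mgnify_database_path", str(data / "mgnify" / "mgy_clusters_2018_12.fa"),
        "--pdb70_database_path", str(data / "pdb70" / "pdb70"),
        "--uniclust30_database_path",
        str(data / "uniclust30" / "uniclust30_2018_08" / "uniclust30_2018_08"),
        "--output_dir", str(output_dir),
        "--bfd_database_path",
        str(data / "bfd" / "bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt"),
        "--model_device", device,
        "--jackhmmer_binary_path", str(tools / "jackhmmer"),
        "--hhblits_binary_path", str(tools / "hhblits"),
        "--hhsearch_binary_path", str(tools / "hhsearch"),
        "--kalign_binary_path", str(tools / "kalign"),
        "--config_preset", config_preset,
        "--openfold_checkpoint_path", str(checkpoint),
    ]


def missing_files(paths):
    """Return the subset of `paths` that do not exist as regular files."""
    return [p for p in paths if not Path(p).is_file()]


cmd = build_inference_command("fasta_dir", "data/pdb_mmcif/mmcif_files/")
print(shlex.join(cmd))

# Only arguments that name real files are checked; several database
# arguments are path prefixes rather than files, so they are skipped here.
to_check = [cmd[cmd.index(flag) + 1] for flag in (
    "--uniref90_database_path", "--mgnify_database_path",
    "--jackhmmer_binary_path", "--hhblits_binary_path",
    "--hhsearch_binary_path", "--kalign_binary_path",
)]
print("missing:", missing_files(to_check))
```

Once nothing is reported missing, the list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Passing a list rather than a shell string avoids the line-continuation mistakes that are easy to make when editing the command by hand.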
For more details, see GitHub: https://github.com/aqlaboratory/openfold
Reference links:
https://cloud.tencent.com/developer/article/1861192
https://twitter.com/MoAlQuraishi