First publicly available PyTorch reproduction of AlphaFold2: Columbia University open-sources OpenFold, which has passed 1,000 GitHub stars
2022-06-24 20:17:00 【Opencv school】
Reprinted with authorization from the official account Almost Human.
AlphaFold2 was the brightest star in the AI for Science field in 2021. Now it has been reproduced in PyTorch and open-sourced on GitHub. The reproduction is currently comparable in performance to the original AlphaFold2, and its compute and storage requirements are friendlier to the general public.
Mohammed AlQuraishi, assistant professor of systems biology at Columbia University, recently announced on Twitter that his team has trained a new model called OpenFold, a trainable PyTorch reproduction of AlphaFold2. AlQuraishi also said that it is the first publicly available reproduction of AlphaFold2.
AlphaFold2 can routinely predict protein structures with atomic accuracy. Technically, it combines multiple sequence alignment with a deep learning algorithm, and it incorporates physical and biological knowledge of protein structure to improve its predictions. Its achievement of predicting 2/3 of protein structures was published in Nature. More surprisingly, the DeepMind team not only open-sourced the model but also released AlphaFold2's predictions as a free, open dataset.
However, open source does not mean the software is usable, let alone easy to use. In practice, AlphaFold2 is very difficult to deploy: the hardware requirements are high, and the datasets take a long time to download and occupy a lot of disk space. Any one of these is enough to put off ordinary developers. The open-source community has therefore been working hard toward a usable version of AlphaFold2.
Training the OpenFold model built by Professor Mohammed AlQuraishi and colleagues at Columbia University took about 100,000 A100 hours in total, but the model reaches about 90% of the final accuracy within roughly 3,000 hours.
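To put the two training figures in proportion, the short sketch below works out what share of the total compute the 3,000-hour mark represents and how long that would take in wall-clock time. The 128-GPU cluster size and the assumption of perfect scaling are illustrative only and do not come from the OpenFold team.

```python
# Approximate figures quoted in the article.
TOTAL_A100_HOURS = 100_000   # full training run
HOURS_TO_90_PERCENT = 3_000  # point at which about 90% of the accuracy is reached


def compute_fraction(hours_used: float, total_hours: float) -> float:
    """Return the share of the total compute budget spent after hours_used."""
    return hours_used / total_hours


def wall_clock_days(gpu_hours: float, num_gpus: int) -> float:
    """Convert GPU-hours to wall-clock days, assuming perfect scaling across GPUs."""
    return gpu_hours / num_gpus / 24


fraction = compute_fraction(HOURS_TO_90_PERCENT, TOTAL_A100_HOURS)
print(f"Share of total compute at the 90% mark: {fraction:.0%}")

# 128 GPUs is a hypothetical cluster size chosen only for illustration.
days = wall_clock_days(HOURS_TO_90_PERCENT, num_gpus=128)
print(f"Wall-clock time on 128 A100s: {days:.2f} days")
```

In other words, most of the accuracy arrives in a small fraction of the run, and the remaining compute goes into the last part of the improvement.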
OpenFold's accuracy is on par with the original AlphaFold2, and even slightly better, possibly because OpenFold's training set is a little larger.
OpenFold's main advantage is significantly faster inference. For shorter protein sequences, OpenFold's inference can be up to twice as fast as AlphaFold2's. In addition, thanks to custom CUDA kernels, OpenFold can run inference on longer protein sequences with less memory.
Introduction to OpenFold
OpenFold reproduces almost all of the functionality of the original open-source inference code (v2.0.1). The exception is the "model ensembling" feature, which is being phased out and performed poorly in DeepMind's own ablation tests.
With or without DeepSpeed, OpenFold can be trained in full precision or in bfloat16. To match AlphaFold2's original performance, the team trained OpenFold from scratch, and both the model weights and the training data have been released publicly. The training data contains approximately 400,000 MSAs and PDB70 template files. OpenFold also supports protein inference using AlphaFold's official parameters.
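For readers unfamiliar with bfloat16: it keeps the sign bit, all 8 exponent bits and the top 7 mantissa bits of a 32-bit float, so it preserves range while giving up precision. The sketch below illustrates the format using only the Python standard library. It is not OpenFold code, and it truncates the low bits for simplicity, whereas hardware typically rounds to nearest.

```python
import struct


def to_bfloat16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern of x by truncating its float32 encoding."""
    (bits32,) = struct.unpack(">I", struct.pack(">f", x))
    return bits32 >> 16  # keep sign, 8 exponent bits, top 7 mantissa bits


def from_bfloat16_bits(bits16: int) -> float:
    """Expand a bfloat16 bit pattern back to a Python float."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits16 << 16))
    return value


def round_trip(x: float) -> float:
    """Show what value x becomes after being stored as (truncated) bfloat16."""
    return from_bfloat16_bits(to_bfloat16_bits(x))


for sample in (1.0, 3.14159, 1e30):
    stored = round_trip(sample)
    rel_error = abs(stored - sample) / abs(sample)
    print(f"{sample!r} -> {stored!r} (relative error {rel_error:.2e})")
```

The large value survives with a small relative error, which is the property that makes bfloat16 attractive for training compared with formats that have a narrower exponent range.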
Compared with other implementations, OpenFold has the following advantages:
- Short-sequence inference: faster inference on GPU for chains of fewer than 1,500 amino acid residues;
- Long-sequence inference: the low-memory attention implemented in this work makes inference on very long chains possible. OpenFold can predict structures for sequences of more than 4,000 residues on a single A100, and even longer sequences with CPU offloading;
- Memory efficiency during training and inference: custom CUDA attention kernels, modified from FastFold's kernels, use 4 times and 5 times less GPU memory than the equivalent FastFold and stock PyTorch implementations, respectively;
- Efficient alignment scripts: using either the original AlphaFold HHblits/JackHMMER pipeline or ColabFold with MMseqs2, the team has generated millions of alignments.
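The idea behind low-memory attention can be shown with a generic query-chunking sketch: because softmax is applied row by row, the queries can be processed a few at a time, so only a small slice of the score matrix exists at any moment. This is a pure-Python illustration of the general technique, not OpenFold's implementation or its CUDA kernels; the function names and the chunk size are assumptions made for the example.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(s - peak) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def full_attention(queries, keys, values):
    """Scaled dot-product attention holding a len(queries) x len(keys) score matrix."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    scores = [[scale * sum(a * b for a, b in zip(q, k)) for k in keys] for q in queries]
    weights = [softmax(row) for row in scores]
    dim = len(values[0])
    return [[sum(w * v[d] for w, v in zip(row, values)) for d in range(dim)]
            for row in weights]


def chunked_attention(queries, keys, values, chunk_size):
    """Same result, but only chunk_size x len(keys) scores exist at any one time."""
    output = []
    for start in range(0, len(queries), chunk_size):
        chunk = queries[start:start + chunk_size]
        output.extend(full_attention(chunk, keys, values))
    return output


rng = random.Random(0)  # fixed seed so the example is reproducible
seq_len, dim = 12, 4
q = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(seq_len)]
k = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(seq_len)]
v = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(seq_len)]

reference = full_attention(q, k, v)
chunked = chunked_attention(q, k, v, chunk_size=5)
max_diff = max(abs(a - b) for ra, rb in zip(reference, chunked) for a, b in zip(ra, rb))
print(f"max difference between full and chunked attention: {max_diff:.2e}")
```

Peak score storage drops from N x N to chunk_size x N, at the cost of looping over chunks, which is why this kind of trade-off helps with very long chains.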
Installation and usage on Linux
The development team provides a script that installs Miniconda locally, creates a conda virtual environment, installs all Python dependencies, and downloads useful resources, including two sets of model parameters.
Run the following command:
scripts/install_third_party_dependencies.sh
Use the following command to activate the environment:
source scripts/activate_conda_env.sh
To deactivate it:
source scripts/deactivate_conda_env.sh
With the environment active, compile OpenFold's CUDA kernels:
python3 setup.py install
To install HH-suite under /usr/bin (the leading # indicates a root shell):
# scripts/install_hh_suite.sh
Use the following command to download the databases used to train OpenFold and AlphaFold:
bash scripts/download_data.sh data/
To run inference on one or more sequences using a set of DeepMind's pretrained parameters, you can run the following:
python3 run_pretrained_openfold.py \
    fasta_dir \
    data/pdb_mmcif/mmcif_files/ \
    --uniref90_database_path data/uniref90/uniref90.fasta \
    --mgnify_database_path data/mgnify/mgy_clusters_2018_12.fa \
    --pdb70_database_path data/pdb70/pdb70 \
    --uniclust30_database_path data/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
    --output_dir ./ \
    --bfd_database_path data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --model_device "cuda:0" \
    --jackhmmer_binary_path lib/conda/envs/openfold_venv/bin/jackhmmer \
    --hhblits_binary_path lib/conda/envs/openfold_venv/bin/hhblits \
    --hhsearch_binary_path lib/conda/envs/openfold_venv/bin/hhsearch \
    --kalign_binary_path lib/conda/envs/openfold_venv/bin/kalign \
    --config_preset "model_1_ptm" \
    --openfold_checkpoint_path openfold/resources/openfold_params/finetuning_2_ptm.pt
For more details, see GitHub: https://github.com/aqlaboratory/openfold
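When the inference command is launched from a script, it can be easier to assemble the argument list programmatically than to maintain a long shell line. The sketch below builds the same kind of invocation using only the script name, flags and paths quoted in this article. The helper name build_inference_command and its parameters are hypothetical, the tool-binary and checkpoint flags are left out for brevity, and nothing is executed here; the command is only printed.

```python
import shlex


def build_inference_command(fasta_dir, template_dir, data_dir, output_dir,
                            device="cuda:0"):
    """Assemble an argument list for run_pretrained_openfold.py.

    Hypothetical helper: it only reuses flags and database paths quoted
    in the article and assumes the databases live under data_dir.
    """
    databases = {
        "--uniref90_database_path": f"{data_dir}/uniref90/uniref90.fasta",
        "--mgnify_database_path": f"{data_dir}/mgnify/mgy_clusters_2018_12.fa",
        "--pdb70_database_path": f"{data_dir}/pdb70/pdb70",
        "--uniclust30_database_path":
            f"{data_dir}/uniclust30/uniclust30_2018_08/uniclust30_2018_08",
        "--bfd_database_path":
            f"{data_dir}/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt",
    }
    cmd = ["python3", "run_pretrained_openfold.py", fasta_dir, template_dir]
    for flag, path in databases.items():
        cmd += [flag, path]
    cmd += ["--output_dir", output_dir, "--model_device", device]
    return cmd


cmd = build_inference_command(
    fasta_dir="fasta_dir",
    template_dir="data/pdb_mmcif/mmcif_files/",
    data_dir="data",
    output_dir="./",
)
# Print a shell-safe version; pass `cmd` to subprocess.run to actually launch it.
print(shlex.join(cmd))
```

Keeping the arguments in a list avoids quoting mistakes and missing line-continuation backslashes, which are easy to introduce in a long multi-line shell command.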
Reference links:
https://cloud.tencent.com/developer/article/1861192
https://twitter.com/MoAlQuraishi