Why are life science enterprises moving to the cloud one after another?
2022-06-25 18:51:00 【Alibaba cloud developers】
Introduction: Starting from the current state of the life science industry and its huge demand for computing power, this article describes the needs and pain points the industry faces at the infrastructure layer, and explains why high-performance computing on the cloud can greatly accelerate the development of life science enterprises.
By | Alibaba Cloud Elastic High Performance Computing team
The life science industry is entering a golden age. Advances in medicine and people's pursuit of health are rapidly becoming a new driving force for the entire life science industry chain, and high-performance computing (HPC) plays a very important role in life science research. At the same time, as the industry grows rapidly, moving to the cloud has become an irresistible trend.
An industry's urgent demand for cloud computing is usually tied to its rapid growth. The cloud is elastic and convenient, whereas the long stocking, delivery, and deployment cycles of traditional IT mean it cannot keep up with the soaring IT demand of a fast-growing industry.
This article starts from the current state of the life science industry and its huge demand for computing power, shows what needs and pain points the industry faces at the infrastructure level, and explains why high-performance computing on the cloud can greatly accelerate the development of life science enterprises.
One 、 The computing power demands of life science: large scale, high performance, and diverse resource types
At present, the life science industry has two main computing scenarios: computer-aided drug design and gene sequencing.
1、 Computer-aided drug development
Since the start of the 21st century, diseases have grown more complex and the number of available drug targets has gradually decreased. The difficulty and cost of new drug research and development have risen significantly, while the global success rate of new drug R&D shows a clear downward trend. Developing innovative drugs is the key for pharmaceutical enterprises to build core competitiveness and grow sustainably, yet drug R&D is a high-investment, high-technology, high-risk, long-cycle systems engineering effort. Pharmaceutical companies have therefore begun to turn to AI, big data, and other computing technologies to assist drug R&D.
Figure: The whole process of drug research and development
Before a new drug can be approved for market, it usually has to pass through drug discovery, preclinical study, clinical trials, and regulatory review. The drug discovery stages, such as target discovery and compound synthesis, and the preclinical research stages, such as compound screening, often need the computing power of HPC to assist drug design and accelerate the R&D process.
For predicting protein structure during target discovery, there are established prediction schemes based on molecular dynamics and plane-wave methods, as well as AI for Science solutions.
The former is a typical HPC application scenario, with mature software such as VASP and GROMACS that obtain simulation results through computation. In this scenario, the amount of computing resources required is proportional to the scale of the simulation problem.
Meanwhile, solutions such as AlphaFold2 are emerging in the industry. They use AI to model the relationship between protein sequence and structure, learning continuously from known sequences and structures and then predicting new protein structures. Backed by powerful algorithms and computing power, DeepMind has reduced computing time from months to hours. As the number of model parameters grows, so does the demand for computing power.
Figure: AI prediction of the three-dimensional structure of a protein
Similarly, in virtual compound screening, pharmaceutical companies usually need to dock millions of molecules against protein structures. Each ligand molecule consumes computing resources to obtain a docking score, which is used to screen out the molecules worth verifying experimentally for activity. With a large ligand library, docking molecules against protein structures takes an enormous amount of computing power. A single machine can hardly handle virtual screening at this scale, so running large-scale virtual screening on an HPC cluster is essential.
Figure: Lead compound discovery process
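Because each ligand is scored independently, a screening campaign of this kind splits naturally into shards that separate cluster nodes can process in parallel. The following minimal Python sketch illustrates the idea under stated assumptions: `dock_score` is a hypothetical placeholder rather than a real docking engine, and the ligand names and shard count are invented for illustration.

```python
import heapq


def shard(ligands, n_shards):
    """Split the library into n_shards contiguous, nearly equal pieces."""
    size, extra = divmod(len(ligands), n_shards)
    shards, start = [], 0
    for i in range(n_shards):
        end = start + size + (1 if i < extra else 0)
        shards.append(ligands[start:end])
        start = end
    return shards


def dock_score(ligand):
    """Hypothetical stand-in for a docking run; lower scores are better."""
    return -(sum(ord(c) for c in ligand) % 100) / 10.0


def screen_shard(ligands, top_n):
    """Work done by one node: score its shard and keep the best top_n hits."""
    return heapq.nsmallest(top_n, ((dock_score(l), l) for l in ligands))


def merge(shard_results, top_n):
    """Combine the per-shard hit lists into the overall top_n."""
    return heapq.nsmallest(top_n, (hit for r in shard_results for hit in r))


if __name__ == "__main__":
    library = [f"LIG{i:05d}" for i in range(1000)]  # invented ligand IDs
    results = [screen_shard(s, 5) for s in shard(library, 8)]
    for score, name in merge(results, 5):
        print(name, score)
```

In a real cluster each shard would typically be one job or one element of a job array, so the number of shards that can run at once is bounded by the number of nodes available.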
Target discovery, compound screening, and compound synthesis use different computing modes, parameters, and software, so their computing resource requirements often differ. The introduction of AI in particular places higher demands on provisioning a diverse mix of resources.
2、 Gene sequencing
The gene sequencing business process mainly consists of loading samples onto the sequencer, generating sequencing files, aligning gene sequences and analyzing the results on computers, and delivering the result data and reports to research and medical institutions. Sequence alignment and analysis are extremely time-consuming and involve a large amount of specialized bioinformatics software, so the performance of the computing resources and the optimization of the overall solution are vital to bioinformatics R&D efficiency.
Figure: Gene sequencing business process
Take the typical gene sequencing workflow, WGS (human whole genome sequencing), as an example. It involves building the reference index, aligning reads, sorting, removing duplicates, BQSR correction, and variant calling (Caller). The methods are varied and the process is complex; different steps use different software and parameters, such as BWA and GATK; and different bioinformatics software may differ in concurrency and performance. Different tasks therefore place different demands on the variety and scale of computing resources, which calls for both elastic computing resources and a range of instance configurations.
Figure: Second-generation sequencing WGS workflow
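To make the point about per-step resource diversity concrete, the sketch below picks the cheapest instance shape that satisfies each pipeline step. Everything numeric here is an assumption for illustration: the instance names, sizes, relative costs, and per-step requirements are hypothetical and are not taken from any real catalog or benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    name: str
    vcpus: int
    mem_gib: int
    cost: float  # hypothetical relative hourly cost


# Hypothetical instance catalog.
CATALOG = [
    Shape("general-8", 8, 32, 1.0),
    Shape("compute-32", 32, 64, 3.5),
    Shape("memory-16", 16, 256, 3.0),
]

# Hypothetical (vcpus, mem_gib) needs of each WGS step.
STEPS = {
    "align": (32, 48),
    "sort": (8, 128),
    "dedup": (8, 64),
    "bqsr": (8, 24),
    "call": (16, 32),
}


def pick(vcpus, mem_gib, catalog=CATALOG):
    """Return the cheapest shape with at least the requested CPU and memory."""
    fits = [s for s in catalog if s.vcpus >= vcpus and s.mem_gib >= mem_gib]
    if not fits:
        raise ValueError("no instance shape satisfies the request")
    return min(fits, key=lambda s: s.cost)


if __name__ == "__main__":
    for step, (vcpus, mem) in STEPS.items():
        print(step, "->", pick(vcpus, mem).name)
```

With a single fixed configuration, every step would have to run on a shape large enough for the most demanding one, which is the waste the article describes.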
Two 、 Pain points and challenges life science faces at the infrastructure level
In the past, most life science enterprises built their own offline IDC machine rooms. Overall, their IT infrastructure faces three major problems: fixed resource scale, long construction cycles, and high hardware operation and maintenance costs. In detail:
1、 Fixed resources cannot meet business growth or the need for resource diversity
1.1 Fixed computing scale limits business growth
When an enterprise builds a traditional IDC, the resource scale is usually fixed in the initial plan, so the task throughput of the whole cluster is fixed as well. New drug R&D and sequencing are cyclical businesses in which different R&D phases and tasks need different amounts of resources. The usual result is that tasks queue for resources during peaks while resources sit idle during troughs. Handling such workloads requires elastic computing resources.
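The peak-and-trough effect can be shown with a few lines of arithmetic. The sketch below is a toy model, not a measurement: the demand series and the capacity of 60 node-hours per period are invented, and unfinished work is simply carried into the next period.

```python
def fixed_cluster(demand, capacity):
    """Simulate a fixed-size cluster over successive periods.

    demand:   node-hours of new work arriving in each period
    capacity: node-hours the cluster can deliver per period
    """
    backlog = idle = peak_backlog = 0
    for arriving in demand:
        work = backlog + arriving
        done = min(work, capacity)
        idle += capacity - done      # capacity paid for but unused
        backlog = work - done        # work still waiting in the queue
        peak_backlog = max(peak_backlog, backlog)
    return {"peak_backlog": peak_backlog, "idle": idle, "final_backlog": backlog}


if __name__ == "__main__":
    demand = [20, 20, 120, 150, 30, 20]  # invented cyclical workload
    print(fixed_cluster(demand, capacity=60))
```

In this invented series the fixed cluster provisions exactly as many node-hours as the workload needs in total (360), yet it still wastes 80 of them in the quiet periods and ends with 80 node-hours of work unfinished. An elastic cluster sized to each period's demand would have neither.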
1.2 Fixed resource configurations cannot meet the need for resource diversity
The computing resources in a local IDC machine room are planned up front and come in a limited set of configurations. A traditional sequencing pipeline therefore runs its different steps on the same resources and cannot change configuration flexibly, which wastes a large amount of computing resources. As mentioned earlier, however, the computing resources these workloads need are flexible and varied.
1.3 Fixed storage capacity cannot meet users' growing storage needs
As storage grows, bioinformatics enterprises face heavy pressure from the operation and maintenance of offline storage equipment and from storage procurement costs. Achieving an efficient, secure, stable, cost-effective, and sustainable storage solution is another major problem for life science enterprises.
Take protein structure research as an example. Protein structures are generally determined by X-ray crystallography, nuclear magnetic resonance, or cryo-electron microscopy. With cryo-electron microscopy, the data for a single sample is typically on the order of 10 TB, and an enterprise's local data volume reaches the PB level. Bioinformatics research data likewise includes large amounts of reference library data, sample data, and intermediate files. The full-process data for a single human whole genome sequencing run can reach 1 TB, and because of the periodicity and particular nature of bioinformatics data, the local storage of bioinformatics enterprises usually reaches the PB level as well.
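A quick back-of-the-envelope calculation shows how fast these per-sample sizes add up to petabytes. The sketch uses the article's figures of roughly 1 TB per whole genome run and 10 TB per cryo-EM sample; the decimal conversion (1 PB = 1000 TB) and the monthly sample rate are assumptions made for illustration.

```python
TB_PER_PB = 1000  # decimal units; an assumption for this estimate


def samples_until_full(capacity_pb, tb_per_sample):
    """Number of whole samples that fit into capacity_pb petabytes."""
    return int(capacity_pb * TB_PER_PB // tb_per_sample)


def months_until_full(capacity_pb, tb_per_sample, samples_per_month):
    """Whole months of intake that fit before the store is full."""
    return samples_until_full(capacity_pb, tb_per_sample) // samples_per_month


if __name__ == "__main__":
    print("WGS runs per PB:", samples_until_full(1, 1))
    print("cryo-EM samples per PB:", samples_until_full(1, 10))
    # Hypothetical intake of 200 WGS runs per month.
    print("months to fill 1 PB:", months_until_full(1, 1, 200))
```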
2、 Long construction cycles hold back business growth
2.1 Long lead times cannot meet users' need for resources on demand
Building a traditional IDC generally involves project approval, bidding, procurement, and delivery, and often takes months or even a year. The project initiation stage also has to estimate the scale of future business and define the resource construction plan. For a fast-growing business, such a long construction cycle becomes a bottleneck.
2.2 Slow hardware iteration cannot meet users' escalating resource needs
With a traditional IDC, it is often difficult for enterprises to obtain hardware of the latest architecture quickly, even though such hardware can often bring considerable acceleration to the business.
For example, compared with the Volta architecture, the NVIDIA A100 (Ampere architecture) can provide up to 20x acceleration for single-precision training, which is a great help for using AI to accelerate protein structure prediction.
For WGS, developing heterogeneous acceleration solutions based on GPUs or FPGAs also involves extensive hardware selection and validation. When building an offline IDC, an enterprise has to choose suitable hardware specifications among CPU, GPU, FPGA, and other products, and also has to anticipate how its business architecture will evolve. This is a huge challenge for life science enterprises of all kinds when they build out resources.
3、 High operation and maintenance costs
Operating an offline IDC machine room also requires a large investment in staff. Beyond cluster resource management, job scheduling, and user permission management, there is the stability of the computing resources themselves; hardware failures in particular can seriously affect business progress. If a task is terminated by downtime in the middle of a computation and has no checkpoint, it can only be recomputed from scratch. Offline storage also needs disaster recovery to avoid data loss from hardware failure. Resource management, resource stability, data disaster recovery, and similar work therefore need a dedicated operation and maintenance team, which further increases cost.
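The checkpoint point is worth illustrating, since it decides whether a node failure costs minutes or the whole run. The sketch below is a generic checkpoint-and-resume pattern, not the mechanism of any particular HPC application: the JSON state file, the `fail_at` parameter used to simulate a crash, and the trivial "computation" (summing step numbers) are all assumptions for the example.

```python
import json
import os
import tempfile


def run(total_steps, ckpt_path, fail_at=None):
    """Sum the step indices 0..total_steps-1, checkpointing after every step."""
    state = {"step": 0, "acc": 0}
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)  # resume from the last saved step
    for step in range(state["step"], total_steps):
        if fail_at is not None and step == fail_at:
            raise RuntimeError("simulated node failure")
        state["acc"] += step
        state["step"] = step + 1
        tmp = ckpt_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
        os.replace(tmp, ckpt_path)  # atomic rename: never a half-written file
    return state["acc"]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
    try:
        run(10, path, fail_at=6)
    except RuntimeError as err:
        print("first attempt:", err)
    print("after restart:", run(10, path))
```

The second call picks up at step 6 instead of step 0. Real workloads would checkpoint far less often than every step, trading a little recomputation for lower I/O overhead.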
Because the infrastructure provided by a traditional IDC suffers from limited resources, long lead times, inelastic capacity, slow hardware upgrades, and high operation and maintenance costs, more and more life science enterprises are turning to cloud-based HPC solutions that are more flexible, stable, and cost-effective in order to accelerate business innovation.
Three 、 Alibaba Cloud E-HPC life science solutions
Alibaba Cloud believes that HPC on the cloud is currently the best way to build and use HPC. To address the needs of the life science industry, Alibaba Cloud draws on its worldwide computing capacity and its industry-leading DPCA architecture to provide HPC public cloud solutions, hybrid cloud solutions, performance optimization solutions for large-memory instances, containerized solutions, pharmaceutical AI solutions, and more. These cover the needs of different scenarios in the industry and offer the following advantages:
(1) Rich computing power, purchased on demand: Alibaba Cloud operates 27 public cloud regions and 84 availability zones across four continents. Auto scaling on the cloud supports scheduling across data centers to meet the needs of large-scale parallel jobs, and the types of computing resources can be configured flexibly per scheduler queue, supporting heterogeneous computing power in multiple specifications as well as CPU instances with large memory, high clock frequency, and other specifications;
(2) Elastic scaling, lower cost and higher efficiency: the Alibaba Cloud Elastic High Performance Computing (E-HPC) platform can create and delete compute nodes dynamically, supports flexibly configured scaling policies, and bills elastically based on actual load. Preemptible instances are priced as low as 10% of the regular price, which reduces customers' costs and improves the quality and speed of operations;
(3) Simple operation and maintenance, so enterprises can focus on their core business: E-HPC is fully compatible with HPC workloads, sets up clusters automatically, and provides job performance analysis that locates hot spots at the cluster, instance, and process levels. It supports visual job reports and breaks down consumption by user, job, queue, and other dimensions;
(4) New technology empowerment, with fast access to its benefits: at the IaaS layer, Alibaba Cloud keeps rolling out the latest computing power; at the SaaS and PaaS layers, hundreds of third-party partners integrate with Alibaba Cloud so that life science enterprises can quickly obtain the technical services they need. Alibaba Cloud's rich ecosystem and continuously iterated technical capabilities help enterprises enjoy end-to-end technical services and the latest technology dividends.
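As a rough illustration of what a load-based scaling policy does, the sketch below derives a target node count from the number of queued and running jobs and clamps it to configured bounds. It is a generic toy policy, not E-HPC's actual algorithm or API; the function names, parameters, and default limits are all hypothetical.

```python
import math


def desired_nodes(pending_jobs, running_jobs, jobs_per_node,
                  min_nodes=0, max_nodes=100):
    """Target node count for the current load, clamped to [min_nodes, max_nodes]."""
    if jobs_per_node <= 0:
        raise ValueError("jobs_per_node must be positive")
    needed = math.ceil((pending_jobs + running_jobs) / jobs_per_node)
    return max(min_nodes, min(max_nodes, needed))


def scaling_action(current_nodes, target_nodes):
    """Translate the gap between current and target size into an action."""
    delta = target_nodes - current_nodes
    if delta > 0:
        return ("add", delta)
    if delta < 0:
        return ("remove", -delta)
    return ("hold", 0)


if __name__ == "__main__":
    target = desired_nodes(pending_jobs=95, running_jobs=5,
                           jobs_per_node=4, max_nodes=20)
    print(target, scaling_action(current_nodes=8, target_nodes=target))
```

Because nodes are only added while there is work for them and removed when the queue drains, billing follows the actual load rather than a fixed peak-sized cluster.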
Alibaba Cloud's high-performance computing is already widely used in industrial simulation (CAD/CAE), chip design (EDA), biomedical materials, energy exploration, and public services.
Shenshi Technology combines a flexible-supply cost optimization strategy with preemptible instance pricing to deliver massive resources at 30% of the cost. The automated operation and maintenance features of E-HPC also reduce Shenshi Technology's operation and maintenance costs and improve its cluster management efficiency.
By moving to the cloud, the life science and medical enterprise Shengting Medical has improved on the data reliability, operation and maintenance cost, and efficiency of its traditional IDC cluster, and has raised the efficiency of gene alignment and analysis by 70%. Alibaba Cloud's HPC team also combined Slurm workflow dependencies with auto scaling to cut wasted computing resources and effectively lower usage costs.
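The article does not show how the Slurm dependencies were set up, so the following is only a sketch of the general technique. It chains jobs with Slurm's `sbatch --dependency=afterok:<jobid>` option, using `--parsable` to read back each job ID. The script names are hypothetical, and the `submit` parameter exists so the chaining logic can be exercised without a Slurm installation.

```python
import subprocess


def sbatch(argv):
    """Submit through the Slurm CLI; --parsable prints 'jobid[;cluster]'."""
    out = subprocess.run(argv, check=True, capture_output=True, text=True).stdout
    return out.strip().split(";")[0]


def submit_chain(scripts, submit=sbatch):
    """Submit scripts so that each starts only after the previous one succeeds."""
    job_ids, prev = [], None
    for script in scripts:
        argv = ["sbatch", "--parsable"]
        if prev is not None:
            argv.append(f"--dependency=afterok:{prev}")
        argv.append(script)
        prev = submit(argv)
        job_ids.append(prev)
    return job_ids


if __name__ == "__main__":
    # Dry run with a fake submitter that prints the command and invents job IDs.
    counter = iter(range(101, 1000))

    def fake_submit(argv):
        print(" ".join(argv))
        return str(next(counter))

    submit_chain(["align.sh", "sort.sh", "call.sh"], submit=fake_submit)
```

Downstream jobs stay pending until their dependency is satisfied, so a scaler that counts only runnable jobs has no reason to start nodes for them early, which is presumably where the saving on idle resources comes from.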
For more solution and case details, click the following link to visit the "Alibaba Cloud life science best practices" topic page: https://developer.aliyun.com/topic/life_science_best_practice
Link to the original text :https://developer.aliyun.com/article/972632?