Paper interpretation: "i4mC-Deep: An Intelligent Predictor of N4-Methylcytosine Sites Using a Deep Learning Approach with Chemical Properties"
2022-07-23 12:22:00 【Windy Street】
i4mC-Deep: An Intelligent Predictor of N4-Methylcytosine Sites Using a Deep Learning Approach with Chemical Properties
Article link: https://www.mdpi.com/2073-4425/12/8/1117
DOI: https://doi.org/10.3390/genes12081117
Journal: Genes (Zone 3)
Impact factor: 4.096
Publication date: July 23, 2021
Web server: http://nsclbio.jbnu.ac.kr/tools/i4mC-Deep/
Supplementary materials: https://www.mdpi.com/article/10.3390/genes12081117/s1
Code and data: https://github.com/waleed551/i4mC-Deep
1. Overview
DNA is subject to epigenetic modification by N4-methylcytosine (4mC). 4mC plays an important role in DNA repair and replication, protects host DNA from degradation, and regulates DNA expression. Current experimental techniques for identifying 4mC sites are expensive and laborious. Traditional machine learning methods rely on hand-crafted features, whereas the new method saves time and computational cost by using learned features. In this study, the authors propose i4mC-Deep, an intelligent predictor based on a convolutional neural network (CNN) that predicts 4mC modification sites in DNA samples. Nucleotide chemical properties and nucleotide density are extracted from the DNA sequence and used as input to the CNN. The proposed method outperforms several state-of-the-art predictors. When i4mC-Deep was used to analyze Geoalkalibacter subterraneus DNA, accuracy (ACC) improved by 3.9% and the Matthews correlation coefficient (MCC) improved by 10.5% compared with a conventional predictor.
2. Background
Recently, several computational tools have been developed to identify 4mC sites, including iDNA4mC, 4mCPred, 4mCPred-SVM, and SOMM4mC. All of these tools are based on machine learning with hand-crafted features. iDNA4mC uses nucleotide chemical properties and nucleotide frequency as feature vectors, combined with a support vector machine (SVM), to detect 4mC sites. 4mCPred and 4mCPred-SVM also use an SVM but with different features: 4mCPred uses two feature encoding techniques, position-specific trinucleotide propensity (PSTNP) and the electron-ion interaction potential of trinucleotides, while 4mCPred-SVM combines four features to predict 4mC sites, namely k-mer dinucleotide frequency, mononucleotide binary encoding, dinucleotide binary encoding, and local position-specific dinucleotide frequency. SOMM4mC uses classical first-order and second-order Markov models to predict 4mC sites and performs better than the other tools mentioned above. In addition, 4mCCNN and DeepTorrent are based on deep learning. 4mCCNN uses a one-hot encoded data representation with a convolutional neural network. DeepTorrent combines four feature extraction techniques with convolutional and LSTM layers. These earlier deep learning models use complex architectures, which increases the number of parameters and the computational cost.
In this study, the authors use a convolutional neural network (CNN) to develop an accurate and efficient computational tool. The CNN consists of convolution, batch normalization, flatten, dropout, and fully connected (dense) layers; the convolution layers automatically extract important features from the encoded DNA sequences. The authors encode the input DNA sequences using nucleotide chemical properties (NCP) and nucleotide density (ND), use batch normalization and dropout to control overfitting, and finally use the fully connected layer to classify each DNA sequence as a 4mC site or a non-4mC site. i4mC-Deep is evaluated with 10-fold cross-validation, and its results are better than those of previous tools. The structure of i4mC-Deep is shown in Figure 1. The authors also developed a free online web server.
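To make the 10-fold cross-validation protocol concrete, here is a minimal sketch in plain Python. It is not the authors' code: the function name `k_fold_splits`, the fixed seed, and the sample count of 3108 are illustrative assumptions, and a real pipeline would typically also stratify the folds by class label.

```python
import random


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        # Training set = every fold except the held-out one.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Illustrative usage: 3108 samples is an assumed dataset size, not a reported one.
splits = list(k_fold_splits(3108, k=10))
for fold_id, (train, test) in enumerate(splits):
    print(fold_id, len(train), len(test))
```

Each sample is held out exactly once, so the ten test-fold scores can be averaged into a single estimate of performance.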
2. Data
Datasets play a very important role in developing efficient and reliable computational tools. The authors use data from six species, covering both prokaryotes and eukaryotes: Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana, Escherichia coli, Geoalkalibacter subterraneus, and Geobacter pickeringii. These datasets were built from the MethSMRT database. The benchmark datasets contain 1554, 1769, 1978, 388, 906, and 569 positive samples, respectively, and the same numbers of negative samples. Every sequence in the six datasets is 41 bases long with a cytosine at the center.
3. Method
3.1 Feature encoding
- Nucleotide chemical properties (NCP)

- Nucleotide density (ND)
The frequency information of each nucleotide in the DNA sequence.
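The two encodings can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the NCP table uses the commonly cited three-bit scheme (ring structure, functional group, hydrogen bond strength), and ND is assumed to be the running density of the current nucleotide, that is, its count in the first i positions divided by i. The function name `encode_ncp_nd` is hypothetical.

```python
# Assumed NCP table: (ring structure, functional group, hydrogen bond strength).
# Purines A/G have ring = 1; amino bases A/C have group = 1; weak-bond A/T have bond = 1.
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "T": (0, 0, 1),
}


def encode_ncp_nd(seq):
    """Encode a DNA sequence as rows of [ring, group, bond, density]."""
    counts = {}
    rows = []
    for i, base in enumerate(seq.upper(), start=1):
        if base not in NCP:
            raise ValueError(f"unsupported nucleotide {base!r} at position {i}")
        counts[base] = counts.get(base, 0) + 1
        density = counts[base] / i  # occurrences of this base in the first i positions
        rows.append([*NCP[base], density])
    return rows


encoded = encode_ncp_nd("ACCA")
for row in encoded:
    print(row)
```

Under these assumptions a 41-base sequence becomes a 41 x 4 matrix, which is the shape of input a 1D convolutional network would consume.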
3.2 Model
Parameter search ranges:
Best parameters: 2 convolution layers, each with 8 filters, “same” padding, and a kernel size of 3; the dropout probability is 0.3.
L2 regularization and dropout are applied to avoid overfitting the network. Training uses the Adam optimizer with a learning rate of 0.001, a best batch size of 32, and up to 200 epochs with early stopping.
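As a rough check on how small this network is, the following sketch traces tensor shapes and counts convolution and dense weights for the reported settings. Several things are assumptions, not facts from the paper: the input is taken to be 41 x 4 (three NCP bits plus one ND value), the stride is 1, the layers are ordered conv, conv, flatten, dense, the output is a single unit, and batch normalization parameters are left out of the count.

```python
def conv1d_same(length, in_channels, filters, kernel_size):
    """Shape and weight count of a stride-1 Conv1D layer with 'same' padding."""
    params = kernel_size * in_channels * filters + filters  # kernels + biases
    return length, filters, params  # 'same' padding keeps the sequence length


def trace_cnn(seq_len=41, in_channels=4, n_conv=2, filters=8, kernel_size=3):
    """Return (layer name, output shape, weight count) for each layer."""
    length, channels = seq_len, in_channels
    layers = []
    for i in range(n_conv):
        length, channels, params = conv1d_same(length, channels, filters, kernel_size)
        layers.append((f"conv{i + 1}", (length, channels), params))
    flat = length * channels
    layers.append(("flatten", (flat,), 0))
    layers.append(("dense", (1,), flat + 1))  # one output unit: weights + bias
    return layers


layers = trace_cnn()
total = sum(params for _, _, params in layers)
for name, shape, params in layers:
    print(name, shape, params)
print("total", total)
```

With these assumptions the model has only a few hundred trainable weights, which is consistent with the post's point that earlier deep models were heavier.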
4. Results
4.1 Comparison with other state-of-the-art methods
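The comparison relies on metrics such as accuracy (ACC) and the Matthews correlation coefficient (MCC), both mentioned in the overview. A minimal sketch of the standard definitions from confusion-matrix counts is given below; the counts in the example are made up for illustration and are not results from the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Made-up confusion-matrix counts, for illustration only.
acc_value = accuracy(tp=45, tn=40, fp=10, fn=5)
mcc_value = mcc(tp=45, tn=40, fp=10, fn=5)
print(acc_value, mcc_value)
```

MCC uses all four cells of the confusion matrix, so it can shift noticeably even when ACC changes only slightly, which is why the two are usually reported together.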



4.2 Sequence analysis
t-SNE visualization:
Heat map from the in silico mutagenesis analysis:

Effect of mutations on the prediction probability:
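One common way to run this kind of mutation analysis is to substitute every position with each alternative base and record the change in predicted probability. The sketch below assumes that procedure; the post does not spell out the authors' exact method. `in_silico_mutagenesis` and `toy_predict` are hypothetical names, and `toy_predict` is a stand-in scoring function, not the i4mC-Deep model.

```python
def in_silico_mutagenesis(seq, predict, alphabet="ACGT"):
    """Return {alt_base: [probability change per position]} for single substitutions."""
    base_prob = predict(seq)
    effects = {alt: [0.0] * len(seq) for alt in alphabet}
    for pos, ref in enumerate(seq):
        for alt in alphabet:
            if alt == ref:
                continue  # no mutation, so the effect stays 0.0
            mutated = seq[:pos] + alt + seq[pos + 1:]
            effects[alt][pos] = predict(mutated) - base_prob
    return effects


def toy_predict(seq):
    """Hypothetical stand-in model: scores a sequence by its fraction of C bases."""
    return seq.count("C") / len(seq)


effects = in_silico_mutagenesis("ACGT", toy_predict)
for alt, row in effects.items():
    print(alt, row)
```

The resulting 4 x L table is what a mutagenesis heat map displays: rows are substituted bases, columns are sequence positions, and cell values are the shifts in predicted probability.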





5. Web server
Link: http://nsclbio.jbnu.ac.kr/tools/i4mC-Deep/
