Paper reading (56): Mutli-features prediction of protein translational modification sites (task)
2022-06-23 18:04:00 【Inge】
1 Introduction
1.1 Title
1.2 Abstract
Post-translational modification (PTM) plays an important role in biological processes. A potential post-translational modification consists of a central site and its adjacent amino acid residues. These are basic protein sequence residues that help the protein exert its biological function, and they also help in understanding the molecular mechanisms that underlie protein design and drug design. Existing modification site prediction algorithms often suffer from low stability and low accuracy, among other problems.
This paper combines the physical, chemical, statistical and biological properties of proteins and proposes a new framework for predicting protein post-translational modification sites. Multilayer neural networks and support vector machines are used to predict potential modification sites from selected features. These features include the composition of amino acid residues, the E-H description of protein fragments, and several properties from the AAindex database. To account for possibly redundant information, feature selection is applied in the processing step. Experimental results show that the proposed method improves classification accuracy.
1.3 Bib
@article{Bao:2017:14531460,
  author = {Wen Zheng Bao and Chang-An Yuan and You Hua Zhang and Kyungsook Han and Asoke K Nandi and Barry Honig and De-Shuang Huang},
  title = {Mutli-features prediction of protein translational modification sites},
  journal = {{IEEE}/{ACM} Transactions on Computational Biology and Bioinformatics},
  volume = {15},
  number = {5},
  pages = {1453--1460},
  year = {2017},
  doi = {10.1109/TCBB.2017.2752703}
}
2 Method
2.1 Data sets
The function of a protein depends on its spatial conformation. Therefore, the spatial structure of protein fragments may help to analyze and identify the characteristics of potential modification sites.
The experimental data sets are benchmark data sets in the field of PTM prediction:
1) CPLM, a well-known database in the field of protein post-translational modification. The data set contains more than 2,500 lysine succinylation sites as positive samples and 24,000 non-succinylation sites as negative samples, taken from 896 protein sequences. All of these protein fragments and polypeptide sequences come from UniProt, a well-known protein database in bioinformatics. The data set has been used in studies of enzyme specificity (ES) and protein-protein binding sites (PPB).
2) A framework for predicting multiple K-PTM types of modification sites in protein sequences. It contains 6,394 potential modification sites, which are treated as 27-tuple peptide samples. Of these, 1,750 samples belong to none of the four K-PTM types, 3,895 samples belong to one K-PTM type, 740 samples belong to two PTM types, 9 samples belong to three PTM types, and none belong to all four.
3) Post-translational modification fragment data sets. These are lysine acetylation site data sets for three species, Homo sapiens, Mus musculus and Saccharomyces cerevisiae, collected from multiple sources, including PhosphoSite, UniProtKB/Swiss-Prot, UbiProt and SCUD, which are well-known databases in proteomics. Because ubiquitin appears to attach to lysine residues of proteins, only lysine ubiquitination in these three species is considered in this work. The original data set includes 11,547 protein sequences covering the different species: more than 8,000 from H. sapiens, about 3,300 from M. musculus, and more than 4,500 from S. cerevisiae. After removing redundant protein fragments for the three species, the extracted samples comprise 6,323 H. sapiens samples, 2,342 M. musculus samples and 7,863 S. cerevisiae samples. Then 20 proteins are randomly selected from the data set of each species to form an independent test set, and the remaining 6,303, 2,322 and 7,843 proteins are used to build the training set.
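The data preparation described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names `lysine_windows` and `split_holdout`, the padding character `X`, and the fixed random seed are all assumptions. It cuts 27-residue windows centred on each lysine (13 residues on each side) and holds out 20 randomly chosen proteins as a test set.

```python
import random


def lysine_windows(sequence, flank=13, pad="X"):
    """Return (position, window) pairs for every lysine (K) in the sequence.

    Each window has 2 * flank + 1 residues with the lysine in the centre.
    Positions beyond either end of the sequence are filled with `pad`
    (the padding character is an assumption, not taken from the paper).
    """
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # Index i in `sequence` sits at index i + flank in `padded`.
            windows.append((i, padded[i:i + 2 * flank + 1]))
    return windows


def split_holdout(protein_ids, n_test=20, seed=0):
    """Randomly hold out n_test proteins; the rest form the training set."""
    rng = random.Random(seed)
    test = set(rng.sample(sorted(protein_ids), n_test))
    train = [p for p in protein_ids if p not in test]
    return train, sorted(test)


if __name__ == "__main__":
    for position, window in lysine_windows("MKTAYIAKQR"):
        print(position, window)
    # Hypothetical protein identifiers, sized like the H. sapiens set.
    train, test = split_holdout([f"P{i:05d}" for i in range(6323)])
    print(len(train), len(test))
```

With 6,323 identifiers, the split leaves 6,303 for training and 20 for testing, matching the counts quoted for H. sapiens.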
2.2 Feature selection
Generally speaking, there can be more than 40,000 types of protein features. These include the amino acid composition model (AAC), the pseudo amino acid composition model (PseAAC), and other information related to protein characteristics [26]. However, these features struggle to describe effectively and accurately the interaction between a predicted modification site and its adjacent amino acid residues. Therefore, this paper introduces a typical yet special set of features that can describe protein peptides.
First, when it comes to the composition of amino acid residues, researchers in bioinformatics and computational biology usually rely on statistical information from protein sequences. Such features describe only the statistical aspects of a potential modification. In such feature sets, selecting the key features is itself a difficult task.
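As an illustration of the statistical features mentioned above, the amino acid composition (AAC) of a fragment is simply the relative frequency of each of the 20 standard residues. The sketch below is a generic AAC computation; the function name and the choice to ignore non-standard characters are assumptions, not details from the paper.

```python
from collections import Counter

# The 20 standard amino acids in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(fragment):
    """Return a 20-dimensional vector of residue frequencies.

    Characters outside the 20 standard residues (for example padding)
    are ignored, so the vector sums to 1 whenever at least one standard
    residue is present.
    """
    counts = Counter(fragment)
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


if __name__ == "__main__":
    vector = amino_acid_composition("MKTAYIAKQR")
    print(dict(zip(AMINO_ACIDS, vector)))
```

Because the vector discards residue order, it cannot by itself capture the interaction between a site and its neighbours, which is the limitation the text points out.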
The 20 amino acid residues have been found to show propensities for three classes of special structural elements: helix, strand and coil. These features are obtained from PSIPRED. The developers of PSIPRED, e.g. ...
To capture the distribution of α-helices and β-strands effectively, an E-H sequence description is used to represent the predicted protein fragment. The first table below lists several features of the E-H description. Both the basic features and the new features describe statistics of the E and H types. Because all of these features contain some redundant information and noise, only a subset is retained; the selected features are shown in the second table below.


The most popular and best-known amino acid index is AAindex, a database of numerical indices that represent various biological, physical and chemical properties of amino acid residues and other properties of protein sequences. AAindex is organized into three sections: AAindex1, AAindex2 and AAindex3 [27-29]. Several amino acid properties from this database are used in this study.
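An AAindex1 entry assigns one number to each of the 20 amino acids, so turning a fragment into features amounts to a table lookup. The sketch below uses the Kyte-Doolittle hydropathy scale (AAindex1 accession KYTJ820101) purely as an example; which AAindex properties the paper actually selected is not stated here, and the function names and the default value for unknown residues are assumptions.

```python
# Kyte-Doolittle hydropathy index (AAindex1 accession KYTJ820101),
# used here only as an example property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_fragment(fragment, index=KYTE_DOOLITTLE, default=0.0):
    """Map each residue to its index value; unknown residues get `default`."""
    return [index.get(residue, default) for residue in fragment]


def mean_property(fragment, index=KYTE_DOOLITTLE):
    """Average the index over the residues of the fragment that it covers."""
    values = [index[r] for r in fragment if r in index]
    return sum(values) / len(values) if values else 0.0


if __name__ == "__main__":
    fragment = "MKTAYIAKQR"
    print(encode_fragment(fragment))
    print(mean_property(fragment))
```

The per-residue encoding keeps positional information around the central site, while the mean gives a single summary value per property; either form can be concatenated with the other feature groups before feature selection.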