K-nucleotide frequencies (KNF) or k-mer frequencies
2022-07-23 12:21:00 【Windy Street】
K-nucleotide frequency (KNF), or k-mer frequency
KNF describes the frequencies of all possible polynucleotides of length k in a sequence. If k = 2, the frequencies computed are those of the dinucleotides (AA, AT, AG, AC, ..., TT), of which there are 4^2 = 16. If k = 3, they are those of the trinucleotides (AAA, AAT, AAG, AAC, ..., TTT), of which there are 4^3 = 64. And so on.
K-mer frequency is computed the same way.
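As a small worked example of the definition above, the sketch below enumerates the 4^k possible k-mers and computes one frequency by hand. The helper names `all_kmers` and `windows` are illustrative only and are not part of the methods that follow.

```python
from itertools import product

def all_kmers(k, alphabet="ACGT"):
    """Return every possible k-mer over the alphabet, in product order."""
    return ["".join(p) for p in product(alphabet, repeat=k)]

def windows(sequence, k):
    """Return the overlapping length-k windows of a sequence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

print(len(all_kmers(2)), len(all_kmers(3)))  # 16 64

w = windows("ACGTAC", 2)
print(w)                       # ['AC', 'CG', 'GT', 'TA', 'AC']
print(w.count("AC") / len(w))  # 0.4
```

A sequence of length 6 has 6 - 2 + 1 = 5 dinucleotide windows, and AC fills two of them, so its frequency is 2/5.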
Method 1:
# Enumerate all possible k-mers (Cartesian product of the alphabet)
from itertools import product

def nucleotide_type(k):
    z = []
    for i in product('ACGT', repeat=k):  # Cartesian product (permutations with repetition)
        z.append(''.join(i))  # turn ('A', 'A', 'A') into 'AAA'
    return z

# Frequency of the num-th k-mer in one sequence
def char_count(sequence, num, k):
    n = 0
    char = nucleotide_type(k)  # all possible k-mers
    for i in range(len(sequence) - k + 1):  # slide a window of width k along the sequence
        if sequence[i:i+k] == char[num]:
            n += 1
    # frequency = occurrences / total windows, where total windows = sequence length - k + 1
    return n / (len(sequence) - k + 1)

# Frequency vector of one sequence
def feature(seq, k):
    freqs = []  # renamed from `list` to avoid shadowing the built-in
    for i in range(4**k):  # one entry per k-mer: 16, 64, 256 for k = 2, 3, 4
        freqs.append(char_count(seq, i, k))
    return freqs

# Compute the feature vector for each sequence in turn
def Sequence_replacement(sequ, k):
    sequen = [None] * len(sequ)
    for i in range(len(sequ)):
        s = sequ[i]
        sequen[i] = feature(s, k)
    return sequen

# Call with actual data
feature_knf = Sequence_replacement(data, k)  # data: a list of sequences; set k as needed
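Method 1 rescans the whole sequence once for each of the 4^k k-mers. As a possible alternative, the sketch below counts every window in a single pass with `collections.Counter` and then reads the counts out in the same k-mer order. The function name `knf_vector` and the choice to return zeros for sequences shorter than k are assumptions made for this example, not part of the original code.

```python
from collections import Counter
from itertools import product

def knf_vector(sequence, k, alphabet="ACGT"):
    """Return the k-mer frequency vector of one sequence, in product order."""
    total = len(sequence) - k + 1  # number of overlapping windows
    if total <= 0:
        # Assumption: a sequence shorter than k yields an all-zero vector.
        return [0.0] * (len(alphabet) ** k)
    # Count every window once.
    counts = Counter(sequence[i:i + k] for i in range(total))
    # A Counter returns 0 for k-mers that never occur.
    return [counts["".join(p)] / total for p in product(alphabet, repeat=k)]

vec = knf_vector("ACGTAC", 2)
print(len(vec))  # 16
print(vec[1])    # 0.4 (frequency of AC, the second dinucleotide after AA)
```

As in Method 1, windows containing characters outside the alphabet (for example N) count toward the total but appear in no entry, so the vector can sum to less than 1 on such input.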
Method 2:
# First, split each sequence into its k-mers
def Kmers_funct(seq, x):
    X = [None] * len(seq)  # not needed if the data is a single sequence
    for i in range(len(seq)):  # this loop is not needed for a single sequence
        a = seq[i]
        l = []
        for index in range(len(a) - x + 1):
            t = a[index:index+x]
            if len(t) == x:
                l.append(t)
        X[i] = l
    return X  # adjust the return value as needed
# Enumerate all possible k-mers (Cartesian product of the alphabet)
from itertools import product

def nucleotide_type(k):
    z = []
    # 'ACGU' is the RNA alphabet; use 'ACGT' for DNA sequences
    for i in product('ACGU', repeat=k):  # Cartesian product (permutations with repetition)
        z.append(''.join(i))  # turn ('A', 'A', 'A') into 'AAA'
    return z
# K-mer frequency module
def Kmers_frequency(seq, x):
    X = []
    char = nucleotide_type(x)  # all possible k-mers
    for i in range(len(seq)):
        s = seq[i]  # list of k-mers of one sequence, as returned by Kmers_funct
        frequence = []
        for a in char:
            number = s.count(a)  # occurrences of this k-mer in the list
            # len(s) is the number of windows, i.e. sequence length - x + 1
            char_frequence = number / len(s)
            frequence.append(char_frequence)
        X.append(frequence)
    return X

# Split the data into k-mers
feature_kmer = Kmers_funct(data, k)
# Pass the k-mer lists to the frequency module
feature_kmer_frequency = Kmers_frequency(feature_kmer, k)
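Counting on the k-mer lists produced by Kmers_funct, rather than on the raw sequence string, matters because Python's `str.count` only counts non-overlapping occurrences, whereas k-mer windows overlap. The sketch below shows the difference. `count_overlapping` is a hypothetical helper written for this illustration.

```python
def count_overlapping(sequence, kmer):
    """Count occurrences of kmer in sequence, allowing overlaps."""
    k = len(kmer)
    return sum(1 for i in range(len(sequence) - k + 1)
               if sequence[i:i + k] == kmer)

seq = "AAAA"
print(seq.count("AA"))               # 2 (non-overlapping matches only)
print(count_overlapping(seq, "AA"))  # 3 (windows at positions 0, 1 and 2)
```

Calling `list.count` on the pre-split k-mers gives the same result as `count_overlapping`, since each window is already a separate list element.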