当前位置:网站首页>2021-10-26 metagenome analysis (personal note 2)
2021-10-26 metagenome analysis (personal note 2)
2022-06-21 18:40:00 【Illegal account 247188】
Code involves software 、 All databases are absolute paths
source /home/dengqr/miniconda3/bin/activate
conda config --set auto_activate_base true
find *log |wc -l
# The backup data 【 The original data does not move original】
cp -r 00data 00data2
# Data last confirmed
ls -l | grep ".gz$" > 1.txt
View the list of virtual environments
conda env list
Creating a virtual environment , Anti pollution environmental variables , If there is software in Solving environment Steps cannot be installed for hours , You can create a new environment
conda create -n meta
Loading environment
conda activate meta
### Quality assessment fastqc
# = For the specified version ,-c Specify the installation source , Can speed up installation
# -y To agree to install
conda install fastqc=0.11.9 -c bioconda -y
fastqc -v
### Summary of evaluation report multiqc
# notes 1.7 by Python2 Environmental Science ,1.8/9 The new version needs Python3 Environment
conda install multiqc=1.9 -c bioconda -y
multiqc --version
### Quality control process kneaddata
conda install kneaddata=0.7.4 -c bioconda -y
kneaddata --version
trimmomatic -version # 0.39
bowtie2 --version # 2.4.2
db= /home/dengqr/dataset/metagenome/ # It's direct here cd Go to the directory you want to put
# View available databases
kneaddata_database
# Including the human genome bowtie2/bmtagger、 Human transcriptome 、 Ribosome RNA And the mouse genome
# Download the human genome bowtie2 Indexes 3.44 GB
mkdir -p $/home/dengqr/dataset/metagenome/kneaddata/human_genome
kneaddata_database --download human_genome bowtie2 $/home/dengqr/dataset/metagenome/kneaddata/human_genome
# Database download is slow or failed , The appendix contains links to Baidu cloud and domestic backup
Download to $/home/dengqr/home/dengqr/dataset/metagenome/kneaddata/
#mv /home/dengqr/$/home/dengqr/dataset/metagenome/kneaddata/ /home/dengqr/dataset/metagenome/ 【 Completed 】
## 1.2 ( Optional )FastQC Quality assessment
# The software version shall be recorded when using the software for the first time , The method of writing must be clear
fastqc --version # 0.11.8
# time Statistics of running time ,fastqc Quality assessment
# *.gz For raw data ,-t Specify multithreading
time fastqc seq/*.gz -t 2 ▼ #32 Threads time= 27 minute tip: Add... To the top time fastqc -o 01fastqc 00data/*.gz -t 32 multiqc take fastqc Multiple reports of generate a single consolidated report , Methods batch view and compare # Record the software version multiqc --version # 1.5 # Arrangement seq Under the table of contents fastqc The report , Output multiqc_report.html to result/qc Catalog multiqc -d seq/ -o result/qc ▼ multiqc -d 01fastqc/ -o 02fastqc_result/ Look at the right side result/qc Directory multiqc_report.html # Remove the host # index catalogue "/home/dengqr/dataset/metagenome/kneaddata/human_genome/hg37dec_v0.1.1.bt2" kneaddata -h # After de hosting, the two ends do not match —— Sequence rename Check first zcat 00data/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R2_001.fastq.gz |head -n 6 zcat 00data/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R1_001.fastq.gz |head -n 6 cp 00data/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R2_001.fastq.gz 00datatrain\ cp 00data/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R1_001.fastq.gz 00datatrain\ zcat 00datatrain/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R2_001.fastq.gz |head -n 6 zcat 00datatrain/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R1_001.fastq.gz |head -n 6 ( Optional ) Sequence rename , solve NCBI SRA Data double ended ID The problem of duplicate names , See [《MPB: Random macrogenome sequencing data quality control and analysis process and common problems of de hosting 》](https://mp.weixin.qq.com/s/ovL4TwalqZvwx5qWb5fsYA). gunzip 00datatrain/*.gz sed -i '1~4 s/$/\\1/g' 00datatrain/*R2_001.fastq sed -i '1~4 s/$/\\2/g' 00datatrain/*R1_001.fastq # Check again whether the label of the sample is duplicated zcat 00datatrain/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R2_001.fastq |head -n 6 zcat 00datatrain/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R1_001.fastq |head -n 6 # As a result, compression saves space gzip seq/*.fq # pigz It's a parallel version gzip, Not installed for use gzip pigz seq/*.fq time kneaddata -i 00data/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R2_001.fastq.gz \ -i 00data/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R1_001.fastq.gz \ -o 03qc -v -t 32 --remove-intermediate-output \ --reorder --bowtie2-options "--very-sensitive --dovetail" \ -db kneaddata/human_genome ### Java Mismatch —— reinstall Java Running environment If an error occurs Unrecognized option: -d64, Install java solve : conda install -c cyclus java-jdk /home/dengqr/miniconda2/bin/trimmomatic type trimmomatic type fastqc "/home/dengqr/miniconda3/share/trimmomatic-0.39-2/" time kneaddata -i 00data/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R2_001.fastq.gz \ -i 00data/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R1_001.fastq.gz \ -o 03qc -v -t 32 --remove-intermediate-output \ -db kneaddata/human_genome time kneaddata -i 00data/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R2_001.fastq.gz \ -i 00data/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R1_001.fastq.gz \ -o 03qc -v -t 32 --remove-intermediate-output \ --trimmomatic home/dengqr/miniconda2/bin/trimmomatic \ --reorder --bowtie2-options "--very-sensitive --dovetail" \ -db kneaddata/human_genome time kneaddata -t 40 -v \ -i 00data/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R2_001.fastq.gz \ -i 00data/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R1_001.fastq.gz \ -o 03qc/ \ --trimmomatic /home/dengqr/miniconda3/share/trimmomatic-0.39-2/ \ --max-memory 80g \ --trimmomatic-options "SLIDINGWINDOW:4:20 MINLEN:50" \ -db kneaddata/human_genome/ \ --bowtie2-options "--very-sensitive --dovetail --reoeder" \ --remove-intermediate-output "/home/dengqr/miniconda2/bin/trimmomatic-0.33.jar" "/home/dengqr/miniconda3/bin/trimmomatic" ▼ time kneaddata \ -i 00data/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R2_001.fastq.gz \ -i 00data/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R1_001.fastq.gz \ -o temp/qc -v -t 40 --remove-intermediate-output \ --trimmomatic /home/dengqr/miniconda3/share/trimmomatic-0.39-2/ \ --trimmomatic-options "SLIDINGWINDOW:4:20 MINLEN:50" \ --reorder --bowtie2-options "--very-sensitive --dovetail" \ -db kneaddata/human_genome ▼ time kneaddata \ -i 00data/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R2_001.fastq.gz \ -i 00data/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R1_001.fastq.gz \ -o temp1/qc -v -t 40 --remove-intermediate-output \ --trimmomatic /home/dengqr/miniconda3/share/trimmomatic-0.39-2/ \ --trimmomatic-options "SLIDINGWINDOW:4:20 MINLEN:50" \ --reorder --bowtie2-options "--very-sensitive --dovetail" \ -db kneaddata/human_genome ▼ # use kneaddata Accessory tools kneaddata_read_count_table kneaddata_read_count_table --input temp1/qc \ --output temp1/kneaddata.txt # Filter key result Columns cut -f 1,2,4,12,13 temp1/kneaddata.txt | sed 's/_1_kneaddata//' > temp1/qc/sum.txt cat temp1/qc/sum.txt # Quality control results fastqc temp1/qc/*_1_kneaddata_paired_*.fastq -t 2 -o temp1 multiqc -d temp1/ -o temp1/ fastqc temp1/qc/*R2_001_kneaddata_paired_*.fastq -t 2 -o temp1 multiqc -d temp1/ -o temp1/ OSCC35A_20211015NA_AGGCAGAA_S156_L002_ "/home/dengqr/dataset/metagenome/00data/Control105A_R1_001.fastq.gz" # Multitasking runs in parallel → Remember informed consent # hit will cite Promise to reference parallel software parallel parallel --citation parallel -j 3 --xapply "echo 00data/{1}_R1_001.fastq.gz 00data/{1}_R2_001.fastq.gz" ::: `tail -n+2 metadata.txt|cut -f1` time parallel -j 2 --xapply \ "kneaddata -i 00data/{1}_R1_001.fastq.gz \ -i 00data/{1}_R2_001.fastq.gz \ -o temp/qc -v -t 40 --remove-intermediate-output \ --trimmomatic /home/dengqr/miniconda3/share/trimmomatic-0.39-2/ \ --trimmomatic-options 'SLIDINGWINDOW:4:20 MINLEN:50' \ --reorder --bowtie2-options '--very-sensitive --dovetail' \ -db kneaddata/human_genome" ::: `tail -n+2 metadata.txt|cut -f1` ## # install conda-pack conda install -c conda-forge conda-pack #conda The installation was unsuccessful and available pip install # pip conda-pack # Package and export in the installed environment conda pack -n humann2 -o humann2.tar.gz # download wget -c http://210.75.224.110/db/humann2/humann2.tar.gz # Create a new folder to store humann2 Environmental Science "/home/dengqr/miniconda3/envs/" mkdir -p /home/dengqr/miniconda3/envs/humann2 tar -xzf humann2.tar.gz -C /home/dengqr/miniconda3/envs/humann2 # Activate the environment source /home/dengqr/miniconda3/envs/humann2/bin/activate Test whether the process is available time=20 minute humann2_test humann2 --version Switch to cd mi 3/en /hu humann2_databases --download utility_mapping full /home/dengqr/miniconda3/envs/humann2 # Microbial pan genome 5.37 GB 【 To complete the path 】【 Create a folder manually 】 humann2_databases --download chocophlan full /home/dengqr/miniconda3/envs/humann2/chocophlan # Functional genes diamond Indexes 10.3 GB 【 Pay attention to the folder location 】 humann2_databases --download uniref uniref90_diamond /home/dengqr/miniconda3/envs/humann2 # Alternate Download 【 Pay attention to switching 】 → Select Baidu cloud https://github.com/YongxinLiu/MicrobiomeStatPlot/blob/master/Data/BigDataDownlaodList.md Pull it in /home/dengqr/miniconda3/envs/humann2 tar xvzf uniref90_annotated_1_1.tar.gz # Check whether the data frame is configured humann2_config mkdir temp/concat # The two ends are merged into a single file for i in `tail -n+2 metadata.txt|cut -f1`;do cat temp/qc/${i}_1_kneaddata_paired_?.fastq \ > temp/concat/${i}.fq; done # Check the number and size of samples ls -sh temp/concat/*.fq 边栏推荐
- Day20Qt多个窗体的切换思路2021-10-31
- AWS device shadow usage
- Node output mode
- Module import method of node
- 剑指Offer 56. 删除链表的重复节点
- Day16QtQLabel2021-10-22
- 最佳网络数据抓包工具mitmproxy
- Disclose the design idea of MVVM framework supporting Baidu search, feed and applet, and San core personnel have made great efforts to build it
- With mitmdump, don't throw it away, Charles
- 力扣160. 相交链表
猜你喜欢
随机推荐
基于mitmproxy的录制回放接口测试工具
LeetCode 1108 IP地址无效化[暴力] HERODING的LeetCode之路
雷军的千亿失误?
Node modular management
2022 high altitude installation, maintenance and removal work license question bank and simulated examination
协同过滤(Collaborative Filtering)
Character processing of node
力扣239. 滑动窗口最大值
epoll+threadpool高并发网络IO模型的实现
最佳网络数据抓包工具mitmproxy
力扣461.汉明距离
Typescript的通用类型检查
【微服务|Nacos】快速实现nacos的配置中心功能,并完成配置更新和版本迭代
Day15Qt中字符串的常用操作2021-10-20
字節跳動提出輕量級高效新型網絡MoCoViT,在分類、檢測等CV任務上性能優於GhostNet、MobileNetV3!
Byte Jump propose un nouveau type de réseau léger et efficace, mobovit, qui surpasse GhostNet et mobilenetv3 dans la classification, la détection et d'autres tâches CV!
使用tidevice启动WDA
In the new season, China Super League and Guoan are moving forward amid thorns
Node的模块导入方式
左右两侧垂直带序号的时间轴








