9 modules, 40+ tools, a veteran's frank review | A roundup of 16S analysis pipelines, software, and databases

Posted by: guest, 2024-12-02

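16S sequencing, i.e. amplicon sequencing, is quick, simple, and inexpensive, which has made it arguably the most popular type of high-throughput sequencing among researchers today.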

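Because the data volumes are small, more and more people without access to an HPC cluster can now run the analysis themselves on a modest server or even a good laptop.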

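As a result, amplicon software keeps multiplying: from integrated, push-button analysis suites to small tools that each solve one specific problem, there are well over a hundred of them.
Here is a rundown of the mainstream software and databases with brief commentary. Additions and corrections are welcome.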

01

Integrated pipelines

1. QIIME

QIIME is an open-source bioinformatics pipeline for performing microbiome analysis from raw DNA sequencing data. QIIME includes demultiplexing and quality filtering, OTU picking, taxonomic assignment, and phylogenetic reconstruction, and diversity analyses and visualizations.

Latest version: QIIME2 (QIIME1 will no longer be supported or updated after January 1, 2018)

Reference: PMID:20383131

Download: https://docs.qiime2.org/2017.8/install/

Official sites:

QIIME2: https://docs.qiime2.org/2017.8/

QIIME1: http://qiime.org/

Example workflow: https://docs.qiime2.org/2017.8/tutorials/moving-pictures/

2. Mothur

Mothur is currently the most cited bioinformatics tool for analyzing 16S rRNA gene sequences. Step inside the wiki and user forum and learn how you can use mothur to process data generated by Sanger, PacBio, IonTorrent, 454, and Illumina (MiSeq/HiSeq).

Latest version: Version 1.39.5

Reference: PMID:19801464

Download: https://github.com/mothur/mothur/releases/tag/v1.39.5

Official site: https://www.mothur.org/

Example workflow: https://www.mothur.org/wiki/MiSeq_SOP

3. Usearch

USEARCH is a unique sequence analysis tool with thousands of users world-wide, which combines many different algorithms into a single package with outstanding documentation and support.

Latest version: Version 10

Reference: PMID:20709691

Download: http://drive5.com/usearch/download.html

Official site: http://drive5.com/usearch/

4. FunGene

Functional Gene Pipeline Scripts contains a set of python scripts that allow running one or more of the individual tools offered by RDP FunGene Pipeline. These tools are offered in a modular fashion, allowing researchers to choose the appropriate subset based on their needs.

Latest version: Version 9.3

Reference: PMID:24101916

Official site: http://fungene.cme.msu.edu/

Example workflow: http://fungene.cme.msu.edu/FunGenePipeline/

5. SILVAngs

SILVAngs is a data analysis service for ribosomal RNA gene (rDNA) amplicon reads from high-throughput sequencing approaches based on an automatic software pipeline. It uses the SILVA rDNA databases, taxonomies, and alignments as a reference. It facilitates the classification of rDNA reads and provides a wealth of results (tables, graphs and sequence files) for download.

Latest version: Version 9.3

Reference: PMID:23193283

Official site: https://www.arb-silva.de/ngs/

Example workflow: https://www.arb-silva.de/ngs/#demo

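Veteran's take: Amplicon data analysis is a relatively mature field with a crowded software landscape; a full count would run past a hundred tools.
Installing them one by one wastes both resources and time, so pipeline-style packages that bundle many tools are very popular.
The best known are QIIME and Mothur, which bundle most of the analyses you are likely to need.
The veteran clustering tool usearch has kept pace and now also covers data preprocessing, OTU clustering, taxonomic annotation, and diversity analysis. It offers fewer options than QIIME, but enough for basic analysis; the one drawback is that the 64-bit version is paid.
Some database projects such as RDP and SILVA have followed suit, for example the SILVAngs online analysis platform, the FunGene functional gene pipeline, and RDP's own rdpipeline (http://pyro.cme.msu.edu/); they are not all listed here.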

02

Data quality control

1. FastQC

A quality control tool for high throughput sequence data.

Latest version: Version 0.11.5

Reference: PMID:22312429

Download: https://www.bioinformatics.babraham.ac.uk/projects/download.html#fastqc

Official site: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

2. Trimmomatic

A flexible trimmer for Illumina Sequence Data

Latest version: Version 0.36

Reference: PMID:24695404

Download: http://www.usadellab.org/cms/?page=trimmomatic

3. QIIME split_libraries_fastq.py

Software: http://qiime.org/

Command usage: http://qiime.org/scripts/split_libraries_fastq.html

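Veteran's take: Quality control comes up at several points in an amplicon analysis. Raw data off the sequencer goes through QC first: adapters, barcodes, and primers are trimmed, the reads are assessed and filtered by quality, and PE reads are merged based on their overlap. The merged sequences then go through QC again to remove low-quality sequences, sequences containing N, and sequences that are too short, and only then are they used for clustering and annotation.
All of the QC steps are covered together here.
FastQC was introduced in "A Brief History of the Evolution of NGS Data Formats" and is essentially standard for raw-data QC.
Trimmomatic is a sliding-window filtering and trimming tool, very useful for Illumina data, where quality drops markedly toward the end of each read.
For filtering merged sequences, QIIME has its own script that can be called.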
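The sliding-window idea and the post-merge filter described above can be sketched in a few lines of Python. This is a simplified illustration, not Trimmomatic's or QIIME's actual implementation: the function names, the window size of 4, the mean-quality threshold of 15, and the minimum length of 200 are all assumptions chosen for the example.

```python
def phred33(qual_string):
    """Decode a FASTQ quality string (Phred+33 encoding) into integer scores."""
    return [ord(c) - 33 for c in qual_string]


def sliding_window_cut(quals, window=4, min_mean=15.0):
    """Return how many leading bases to keep.

    Scans from the 5' end and cuts at the start of the first window whose
    mean quality drops below min_mean (a simplification of sliding-window
    trimming; parameter values here are illustrative assumptions).
    """
    if len(quals) < window:
        window = max(len(quals), 1)
    for start in range(0, len(quals) - window + 1):
        mean = sum(quals[start:start + window]) / window
        if mean < min_mean:
            return start
    return len(quals)


def passes_length_and_n_filter(seq, min_len=200):
    """Keep a merged sequence only if it is long enough and contains no N."""
    return len(seq) >= min_len and "N" not in seq.upper()


if __name__ == "__main__":
    quals = phred33("I" * 10 + "#" * 6)   # ten good bases, six poor ones
    keep = sliding_window_cut(quals)
    print("bases kept:", keep)
```

In practice the trimmed read would be `seq[:keep]`, and reads that end up too short after trimming would be dropped.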

03

Read merging

1. FLASH

A very fast and accurate software tool to merge paired-end reads from NGS experiments.

Latest version: Version 1.2.11

Reference: PMID:21903629

Download: https://sourceforge.net/projects/flashpage/files/

Official site: https://ccb.jhu.edu/software/FLASH/

2. PEAR

An ultrafast, memory-efficient and highly accurate pair-end read merger. It is fully parallelized and can run with as low as just a few kilobytes of memory.

Latest version: Version 0.9.8

Reference: PMID:24142950

Download: https://sco.hits.org/exelixis/web/software/pear/downloads.html

Official site: https://sco.hits.org/exelixis/web/software/pear/

3. PANDAseq

PANDAseq assembles paired-end reads rapidly and with the correction of most errors. Uncertain error corrections come from reads with many low-quality bases identified by upstream processing.

Latest version: Version 2.11

Reference: PMID:22333067

Download: https://github.com/neufeld/pandaseq/releases/tag/v2.11

Official site: http://neufeldserver.uwaterloo.ca/~apmasell/pandaseq_man1.html

4. fastq-join

Command-line tools for processing biological sequencing data

Reference: Command-line tools for processing biological sequencing data

Official site: https://github.com/ExpressionAnalysis/ea-utils/blob/wiki/FastqJoin.md

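Veteran's take: FLASH remains the most mainstream merging tool, but when the amplicon is unusually long or short its results can be disappointing; in those cases PEAR or PANDAseq may do noticeably better.

fastq-join is the merger bundled with QIIME: running join_paired_ends.py in QIIME calls fastq-join by default, with other tools such as SeqPrep (https://github.com/jstjohn/SeqPrep) available as options. SeqPrep still appears rarely in the literature. Both tools are reasonably fast.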
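What all of these mergers have in common is the search for an overlap between the forward read and the reverse complement of the reverse read. The sketch below shows that core step only. It is a toy, assuming a fixed minimum overlap of 10 bases and a 10% mismatch tolerance (both made-up values), and it ignores base qualities, which real tools such as FLASH, PEAR, and PANDAseq use to score overlaps and resolve mismatches.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(fwd, rev, min_overlap=10, max_mismatch_frac=0.1):
    """Merge a read pair on the longest acceptable overlap.

    Tries overlaps from longest to shortest between the 3' end of fwd and
    the 5' end of the reverse-complemented rev. Returns the merged sequence,
    or None if no overlap meets the (assumed) thresholds.
    """
    rc = reverse_complement(rev)
    longest = min(len(fwd), len(rc))
    for olen in range(longest, min_overlap - 1, -1):
        mismatches = sum(1 for x, y in zip(fwd[-olen:], rc[:olen]) if x != y)
        if mismatches <= max_mismatch_frac * olen:
            # Overlap region is taken from the forward read (no quality voting).
            return fwd + rc[olen:]
    return None


if __name__ == "__main__":
    fragment = "ACGTTGCAAGCTTAGGCTCAGATCCGTAATGCCGATTGAC"
    fwd = fragment[:30]
    rev = reverse_complement(fragment[10:])
    print(merge_pair(fwd, rev) == fragment)
```

A pair with too short an overlap returns `None`, which mirrors the situation described above where overly long amplicons fail to merge.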

04

Chimera removal

1. DECIPHER

DECIPHER is a software toolset that can be used for deciphering and managing biological sequences efficiently using the R programming language. DECIPHER's Find Chimeras web tool can be used to uncover chimeras hidden in 16S rRNA sequences.

Latest version: Version 2.2.0

Reference: PMID:22101057

Download: http://decipher.cee.wisc.edu/Download.html

Official site: http://decipher.cee.wisc.edu/index.html

2. ChimeraSlayer

ChimeraSlayer uses BLAST to identify potential chimera parents and computes the optimal branching alignment of the query against two parents. An input with the pynast aligned representative sequences is suggested.

Latest version: Version 2.2.0

Reference: PMID:21212162

Download: https://sourceforge.net/projects/microbiomeutil/files/

Official site: http://microbiomeutil.sourceforge.net/#A_CS

3. VSEARCH

VSEARCH supports de novo and reference based chimera detection, clustering, full-length and prefix dereplication, rereplication, reverse complementation, masking, all-vs-all pairwise global alignment, exact and global alignment searching, shuffling, subsampling and sorting. It also supports FASTQ file analysis, filtering, conversion and merging of paired-end reads.

Latest version: Version 2.4.4

Reference: PMID:27781170

Download: https://github.com/torognes/vsearch/releases

Official site: https://github.com/torognes/vsearch

4. UCHIME2

UCHIME2 and UCHIME are algorithms for detecting chimeric sequences.

Latest version: Version 4.2

Reference: doi: https://doi.org/10.1101/074252

Download: http://drive5.com/uchime/uchime_download.html

Official site: http://drive5.com/usearch/manual/uchime_algo.html

5. usearch61

usearch61 performs both de novo (abundance based) chimera and reference based detection. With usearch61, unclustered sequences should be used as input rather than a representative sequence set, as these sequences need to be clustered to get abundance data.

Reference: PMID:20709691

Download: http://drive5.com/usearch/download.html

Official site: http://drive5.com/usearch/usearch_docs.html

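Veteran's take: Chimera removal mainly uses two approaches, de novo and reference-based. usearch61, which combines both, is bundled in QIIME (identify_chimeric_seqs.py) and is one of the mainstream methods today.
Note, however, that as mentioned above, the 64-bit version of usearch is paid!
A few years ago uchime on its own was also widely used for chimera removal, but the official site now advises against installing uchime separately and recommends downloading usearch directly.
VSEARCH was released as an open-source replacement for usearch with comparable speed, and it is the method mothur recommends for chimera removal and clustering; it is worth trying.
ChimeraSlayer is slow, and DECIPHER is thoroughly outperformed in the comparisons on the uchime site, so neither is recommended here.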

05

OTU clustering

1. UCLUST

UCLUST creates “seeds” of sequences which generate clusters based on percent identity. Uclust_ref, as uclust, but takes a reference database to use as seeds. New clusters can be toggled on or off.

Reference: PMID:20709691

Download: http://www.drive5.com/uclust/downloads1_2_22q.html

Official site: https://www.drive5.com/usearch/manual/uclust_algo.html

2. Uparse

UPARSE is a method for generating clusters (OTUs) from next-generation sequencing reads of marker genes such as 16S rRNA, the fungal ITS region and the COI gene.

Reference: PMID:23955772

Download: http://www.drive5.com/usearch/manual/cmd_cluster_otus.html

Official site: https://www.drive5.com/uparse/

3. CD-HIT

CD-HIT is a very widely used clustering program, which applies a “longest-sequence-first list removal algorithm” to cluster sequences.

Latest version: Version 4.6.8

Reference: PMID:23060610

Download: https://github.com/weizhongli/cdhit/releases

Official site: http://weizhongli-lab.org/cd-hit/

4. Mothur

For the Mothur method, the clustering algorithm may be specified as nearest-neighbor, furthest-neighbor, or average-neighbor. The default algorithm is furthest-neighbor.

See the introduction in Part 1.

5. Oclust

A pipeline for clustering long 16S rRNA sequencing reads, or any sequences, into OTUs.

Reference: PMID:26434730

Download: https://github.com/oscar-franzen/oclust/

Official site: https://omictools.com/oclust-tool

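Veteran's take: There are many OTU clustering methods, which fall mainly into two groups: heuristic algorithms and hierarchical clustering algorithms. The former include uparse, uclust, and CD-HIT; the latter include mothur and oclust.
In practice, the mainstream clustering tools are still uparse, uclust, and mothur.
Most of the tools above are bundled in QIIME, where the default clustering tool is uclust (pick_otus.py).
Oclust, listed last, targets clustering of long third-generation PacBio reads; with second-generation sequencing still dominant, it is not yet widely used.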
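To make the heuristic, seed-based family concrete, here is a minimal greedy clustering sketch in the spirit of the "seeds" description given for UCLUST above. It is not the UCLUST or UPARSE algorithm: identity is computed position by position without alignment, the 0.97 threshold is the conventional OTU cutoff used here as an assumption, and the function names are invented for the example.

```python
def identity(a, b):
    """Fraction of matching positions over the shorter sequence (ungapped)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def greedy_cluster(seqs, threshold=0.97):
    """Assign each sequence to the first seed it matches, else make it a seed.

    Sequences are processed in input order, so the order matters; sorting by
    abundance or length beforehand is a common choice.
    Returns a dict mapping each seed to the list of its members.
    """
    seeds = []
    clusters = {}
    for seq in seqs:
        for seed in seeds:
            if identity(seq, seed) >= threshold:
                clusters[seed].append(seq)
                break
        else:
            # No existing seed is close enough: this sequence founds a new OTU.
            seeds.append(seq)
            clusters[seq] = [seq]
    return clusters


if __name__ == "__main__":
    s1 = "ACGT" * 25
    s2 = "T" + s1[1:50] + "A" + s1[51:]   # two substitutions, 98% identical
    s3 = "TTGCA" * 20
    for seed, members in greedy_cluster([s1, s2, s3]).items():
        print(seed[:12] + "...", len(members))
```

Hierarchical methods, by contrast, start from a full pairwise distance matrix, which is why they scale less well to large read sets.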

06

Taxonomic annotation

1. Greengenes

A 16S rRNA gene database addresses limitations of public repositories by providing chimera screening, standard alignment, and taxonomic classification using multiple published taxonomies.

Latest version: Version 13.5

Reference: PMID:16820507

Download: http://greengenes.secondgenome.com/downloads/database/13_5

Official site: http://greengenes.secondgenome.com/

2. Silva

SILVA provides comprehensive, quality checked and regularly updated datasets of aligned small (16S/18S, SSU) and large subunit (23S/28S, LSU) ribosomal RNA (rRNA) sequences for all three domains of life (Bacteria, Archaea and Eukarya).

Latest version: SILVA 128

Reference: PMID:23193283

Download: https://www.arb-silva.de/documentation/release-128/

Official site: https://www.arb-silva.de/

3. RDP

RDP provides quality-controlled, aligned and annotated Bacterial and Archaeal 16S rRNA sequences, and Fungal 28S rRNA sequences, and a suite of analysis tools to the scientific community.

Latest version: Version 11.5

Reference: PMID:24288368

Download: http://rdp.cme.msu.edu/misc/rel10info.jsp

Official site: http://rdp.cme.msu.edu/index.jsp

4. Unite

UNITE is a user-friendly Nordic ITS Ectomycorrhiza Database designed to provide a stable and reliable platform for sequence-borne identification of ectomycorrhizal asco- and basidiomycetes, including only high-quality sequences of well identified fungi.

Latest version: Version 7.2

Reference: PMID:15869663

Download: https://unite.ut.ee/repository.php

Official site: https://unite.ut.ee/

5. FunGene

Functional Gene Pipeline Scripts contains a set of python scripts that allow running one or more of the individual tools offered by RDP FunGene Pipeline. These tools are offered in a modular fashion, allowing researchers to choose the appropriate subset based on their needs.

Latest version: Version 9.3

Reference: PMID:24101916

Official site: http://fungene.cme.msu.edu/

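Veteran's take: In amplicon analysis, 16S annotation relies mainly on Greengenes, Silva, and RDP. Greengenes was the most used early on, which of course has a lot to do with its being bundled in QIIME, but it has not been updated since May 2013, and many analysts have switched to Silva, which still gets an update roughly every year. Interestingly, of the two well-known functional prediction tools covered later, PICRUSt has to be used with Greengenes, while Tax4Fun is recommended for use with Silva.
Fungal ITS annotation still mainly uses the Unite database.
For functional genes, early annotation against the NT database gave dismal results; FunGene has improved steadily in recent years and is now essentially the default choice for taxonomic annotation of functional gene amplicons.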

07

Sequence alignment

1. PyNAST

PyNAST is a reimplementation of the NAST sequence aligner, which has become a popular tool for adding new 16s rRNA sequences to existing 16s rRNA alignments.

Latest version: PyNAST 1.0

Reference: PMID:19914921

Download: http://biocore.github.io/pynast/install.html

Official site: http://biocore.github.io/pynast/

2. Muscle

MUSCLE is an alignment method which stands for MUltiple Sequence Comparison by Log-Expectation. On average, MUSCLE is cited by ten new papers every day.

Latest version: Version 3.8.31

Reference: PMID:15034147

Download: http://www.drive5.com/muscle/downloads.htm

Official site: http://www.drive5.com/muscle/

3. Mafft

MAFFT is a multiple sequence alignment program for unix-like operating systems. It offers a range of multiple alignment methods, L-INS-i (accurate; for alignment of <∼200 sequences), FFT-NS-2 (fast; for alignment of <∼30,000 sequences), etc.

Latest version: Version 7.310

Reference: PMID:12136088

Download: http://mafft.cbrc.jp/alignment/software/#Download%20and%20Installation

Official site: http://mafft.cbrc.jp/alignment/software/

4. Infernal

Infernal ("INFERence of RNA ALignment") is for searching DNA sequence databases for RNA structure and sequence similarities.

Latest version: Version 1.1.2

Reference: PMID:24008419

Download: http://eddylab.org/infernal/#Downloads

Official site: http://eddylab.org/infernal/

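Veteran's take: All of these aligners are bundled in QIIME and can be called from there.
Among them, PyNAST and Infernal are similar in that both align against a reference, but Infernal runs much more slowly and is used far less.
Muscle and Mafft are both global aligners that do not depend on a reference. Muscle claims to be cited by ten new papers a day; that figure is not limited to microbiome work, but its use is certainly broad. Mafft is similar: some benchmarks show Mafft aligning more accurately, though with no speed advantage. Both are used when no good reference is available for alignment (for example, with functional genes).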

08

Functional prediction

1. PICRUSt

PICRUSt (pronounced “pie crust”) is a bioinformatics software package designed to predict metagenome functional content from marker gene (e.g., 16S rRNA) surveys and full genomes.

Latest version: PICRUSt 1.1.2

Reference: PMID:23975157

Download: https://github.com/picrust/picrust

Official site: http://picrust.github.io/picrust/

2. Tax4Fun

Tax4Fun is an open-source R package that predicts the functional capabilities of microbial communities based on 16S datasets. Tax4Fun is applicable to output as obtained from the SILVAngs web server or the application of QIIME against the SILVA database.

Reference: PMID:25957349

Download: http://tax4fun.gobics.de/#Download

Official site: http://tax4fun.gobics.de/

3. FAPROTAX

FAPROTAX is a database that maps prokaryotic clades (e.g. genera or species) to established metabolic or other ecologically relevant functions, using the current literature on cultured strains.

Latest version: FAPROTAX 1.1

Reference: PMID:28812567

Download: http://www.zoology.ubc.ca/louca/FAPROTAX/lib/php/index.php?section=Download

Official site: http://www.zoology.ubc.ca/louca/FAPROTAX/lib/php/index.php

4. FUNGuild

An open annotation tool for parsing fungal community datasets by ecological guild.

Reference: https://doi.org/10.1016/j.funeco.2015.06.006

Download: https://github.com/UMNFuN/FUNGuild.git

Official site: http://www.stbates.org/guilds/app.php

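Veteran's take: Amplicon sequencing is inherently a taxon-level analysis, so being able to predict function from it widens the range of scientific questions it can address.
PICRUSt is still the most widely used functional prediction tool, but as attention to archaea, fungi, and other non-bacterial groups grows and annotation databases turn over, other tools are being used more.
For example, as noted above, use of Tax4Fun has grown with the shift in annotation databases; FAPROTAX focuses on biogeochemical cycling processes in environmental samples; and FUNGuild predicts fungal function.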
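Database-driven predictors such as FAPROTAX and FUNGuild work, at their core, by looking up each annotated taxon in a curated taxon-to-function table and summing abundances per function. The sketch below illustrates that lookup only. The three-genus table and the function labels are a toy stand-in written for this example, not the real FAPROTAX database or its file format, and `function_table` is a hypothetical helper name.

```python
from collections import defaultdict

# Toy genus -> functions table for illustration only (not the real database).
TOY_FUNCTIONS = {
    "Nitrosomonas": ["aerobic_ammonia_oxidation", "nitrification"],
    "Nitrobacter": ["aerobic_nitrite_oxidation", "nitrification"],
    "Methanobacterium": ["methanogenesis"],
}


def function_table(otu_counts, otu_genus, functions=TOY_FUNCTIONS):
    """Sum OTU counts per function for one sample.

    otu_counts: dict of OTU id -> read count
    otu_genus:  dict of OTU id -> genus assigned during annotation
    OTUs whose genus is missing from the table contribute to no function.
    """
    totals = defaultdict(int)
    for otu, count in otu_counts.items():
        genus = otu_genus.get(otu)
        for func in functions.get(genus, []):
            totals[func] += count
    return dict(totals)


if __name__ == "__main__":
    counts = {"OTU1": 10, "OTU2": 5, "OTU3": 7, "OTU4": 3}
    genus = {
        "OTU1": "Nitrosomonas",
        "OTU2": "Nitrobacter",
        "OTU3": "Methanobacterium",
        "OTU4": "Escherichia",
    }
    print(function_table(counts, genus))
```

PICRUSt and Tax4Fun take a different route, inferring gene content from reference genomes related to each OTU, which is why they are tied to a specific reference taxonomy.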

09

Common plotting and statistics software

1. Basic plotting

R ggplot2: https://cran.r-project.org/web/packages/ggplot2/

Perl SVG: https://metacpan.org/pod/SVG

Python matplotlib: https://matplotlib.org/

QIIME: http://qiime.org/

2. Taxon statistics and visualization

STAMP: kiwi.cs.dal.ca/Software/STAMP

LEfSe: http://huttenhower.sph.harvard.edu/galaxy/

Metastats: http://clovr.org/docs/metastats/

QIIME: http://qiime.org/

3. Diversity analysis

QIIME: http://qiime.org/

Mothur: https://www.mothur.org/

Usearch: http://drive5.com/usearch/

4. Phylogenetic tree visualization

GraPhlAn: http://huttenhower.org/galaxy/

iTOL: https://itol.embl.de/

5. Environmental factor analysis

R vegan: https://cran.r-project.org/web/packages/vegan/

Canoco5: http://www.canoco5.com/

6. Network interaction analysis

Cytoscape: http://www.cytoscape.org/

Gephi: https://gephi.org/

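Veteran's take: This section lists some commonly used software. Generally speaking, once you have the taxonomically annotated otu_table and the phylogenetic tree rep_phylo.tre built from the aligned sequences, the basic analysis is done. Downstream analysis is mainly taxon statistics and display, between-group comparison (diversity: alpha_div; community structure: beta_div; and so on), and association analysis (network interactions, environmental factors, and so on), possibly with functional prediction depending on need, all combined with other validation experiments to address scientific questions tied to changes in microbial diversity.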
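As a small illustration of what the alpha and beta diversity steps compute from an otu_table, the sketch below implements one common metric of each kind: the Shannon index for within-sample diversity and Bray-Curtis dissimilarity for between-sample comparison. It is a plain-Python sketch on made-up count vectors; QIIME, Mothur, and R vegan offer many more metrics, and their defaults (for example the logarithm base for Shannon) may differ from the natural log assumed here.

```python
import math


def shannon(counts):
    """Shannon diversity index H' (natural log) for one sample's OTU counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples with OTUs in the same order.

    0 means identical composition, 1 means no shared OTUs.
    """
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(a) + sum(b)
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Two hypothetical samples over the same four OTUs.
    sample_a = [30, 10, 0, 5]
    sample_b = [12, 12, 8, 0]
    print("Shannon A:", round(shannon(sample_a), 3))
    print("Shannon B:", round(shannon(sample_b), 3))
    print("Bray-Curtis A vs B:", round(bray_curtis(sample_a, sample_b), 3))
```

Computing Bray-Curtis for every pair of samples yields the distance matrix that ordination methods such as PCoA take as input.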

/End.
