Introduction

The two most commonly used methods for microbial identification and genotyping are targeted marker gene (amplicon) sequencing (e.g., the prokaryotic 16S rRNA gene, the eukaryotic 18S rRNA gene, and the fungal ITS region) and shotgun metagenomic sequencing. The most widely used target for bacterial identification is the highly conserved 16S rRNA gene (16S rDNA), which is considered the gold standard in microbial genotyping [1; 2] and has been used successfully to characterize microbial communities from a variety of environments, including soil, water sources and the human gut [3].

The Bioinformatics Core Research Facility provides a 16S rRNA sequencing data analysis pipeline managed by the Snakemake workflow management system [4]. The pipeline, which is centered on the DADA2 [5] and phyloseq [6] R packages in conjunction with the FIGARO [7], MAFFT [8] and FastTree [9; 10] packages, can deliver:

  • taxonomical analysis and species identification,
  • phylogenetic tree construction,
  • alpha and beta diversity analyses,
  • differential abundance analysis, etc.

In combination with various custom R scripts, this pipeline is highly versatile. Custom graphs and statistical analyses can also be provided, such as univariate and multivariate analyses for single or selected taxa, canonical correspondence analysis, and tables and heatmaps of differentially abundant taxa for contrasts of interest.

In addition, we assist in writing the methods section and in addressing follow-up questions about the analysis. If our work contributes sufficiently to warrant co-authorship, we also help prepare publication-quality figures for manuscripts.

Example Figures

The example figures below come from an analysis of publicly available 16S sequencing data previously published by Martínez et al. [11] from Dr. Amanda Ramer-Tait's lab at UNL’s Department of Food Science and Technology.

Relative abundance plot

The relative abundance of the 500 most abundant taxa in the samples is shown below. Relative abundance is better suited than absolute abundance for comparisons between samples, conditions, or even time points. More detailed ranks, such as order, family or species, can also be plotted if needed.
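As a minimal illustration of why relative abundance makes samples comparable, the sketch below converts raw read counts into proportions. The function name and the read counts are hypothetical and are not taken from the pipeline or from the dataset shown here.

```python
def relative_abundance(counts):
    """Convert one sample's raw taxon counts into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical read counts for two samples sequenced at different depths.
sample_a = {"Firmicutes": 600, "Bacteroidetes": 300, "Proteobacteria": 100}
sample_b = {"Firmicutes": 6000, "Bacteroidetes": 3000, "Proteobacteria": 1000}

rel_a = relative_abundance(sample_a)
rel_b = relative_abundance(sample_b)

# The absolute counts differ tenfold, but the community composition is the same.
for taxon in sample_a:
    print(taxon, round(rel_a[taxon], 3), round(rel_b[taxon], 3))
```

The two hypothetical samples have the same composition once sequencing depth is divided out, which is the property that makes relative abundance the more convenient scale for side-by-side plots.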

Alpha diversity plot

Alpha diversity refers to the diversity of species within a particular area or ecosystem, usually expressed as the number of species present there (species richness). Many metrics are available for measuring alpha diversity, including Observed, Chao1, ACE, Shannon, Simpson, InvSimpson and Fisher. Search for “alpha diversity metrics” for more information.
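To make a few of these metrics concrete, here is a small sketch that computes Observed, Shannon, Simpson, InvSimpson and Chao1 for a single sample. It assumes the common textbook definitions (natural-log Shannon, Simpson reported as 1 - sum(p^2), and the bias-corrected form of Chao1); individual packages may use slightly different variants, and the counts are hypothetical.

```python
import math


def alpha_diversity(counts):
    """Compute several alpha diversity metrics from one sample's taxon counts."""
    counts = [c for c in counts if c > 0]      # taxa absent from the sample are ignored
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    props = [c / total for c in counts]

    observed = len(counts)                     # species richness
    shannon = -sum(p * math.log(p) for p in props)
    dominance = sum(p * p for p in props)      # probability two random reads share a taxon

    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    # Bias-corrected Chao1 (assumed variant): stays defined when there are no doubletons.
    chao1 = observed + singletons * (singletons - 1) / (2 * (doubletons + 1))

    return {
        "Observed": observed,
        "Shannon": shannon,
        "Simpson": 1 - dominance,
        "InvSimpson": 1 / dominance,
        "Chao1": chao1,
    }


# Hypothetical counts for one sample: 7 observed taxa, 3 singletons, 2 doubletons.
metrics = alpha_diversity([10, 5, 2, 2, 1, 1, 1, 0])
print(metrics["Observed"], metrics["Chao1"])   # 7 8.0
```

Richness-type metrics (Observed, Chao1) only count taxa, while Shannon and Simpson also weigh how evenly the reads are spread across them, so the choice of metric depends on which aspect of diversity matters for the study.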

The plot below shows the alpha diversity of the samples using the Chao1 metric.

Beta diversity plot

Beta diversity refers to the difference in species diversity between two or more ecosystems in an area, expressed as the total number of species that are unique to each of the ecosystems being compared. Bray-Curtis distance and weighted UniFrac distance are available for comparison; choose the one that best suits your study.
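For illustration, the Bray-Curtis distance between two samples can be sketched in a few lines, assuming the standard definition sum(|a_i - b_i|) / sum(a_i + b_i) over taxa listed in the same order. The function name and counts are hypothetical. Weighted UniFrac additionally needs the phylogenetic tree and is not shown.

```python
def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity: 0 for identical samples, 1 for no shared taxa."""
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must list the same taxa in the same order")
    total = sum(sample_a) + sum(sample_b)
    if total == 0:
        raise ValueError("both samples are empty")
    difference = sum(abs(a - b) for a, b in zip(sample_a, sample_b))
    return difference / total


# Hypothetical counts for three taxa in two samples.
gut_1 = [10, 0, 5]
gut_2 = [5, 5, 5]
distance = bray_curtis(gut_1, gut_2)   # (5 + 5 + 0) / 30
print(round(distance, 3))
```

Computing this value for every pair of samples yields the distance matrix that an ordination plot summarizes in two dimensions.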

PCA

The PCA plot shows the first two principal components of the transformed (vsd) count data for all samples. PCA is a quick and easy way to check whether any samples are outliers and, when there are multiple sources of variance, which variable contributes most to separating the samples. A 3D PCA figure can also be generated for your own exploration, but high-resolution 2D PCA figures are better suited for publication.
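The idea behind the plot can be sketched without any external library: center the data, build the covariance matrix of the taxa, and extract its two leading eigenvectors. The version below uses power iteration with deflation purely for illustration. It is an assumption-laden toy and not how the pipeline computes PCA, and the helper names and input values are hypothetical.

```python
import math


def _matvec(matrix, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def _top_eigenpair(matrix, iterations=1000):
    """Dominant eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    vec = [float(i + 1) for i in range(len(matrix))]   # arbitrary non-zero start vector
    norm = math.sqrt(sum(v * v for v in vec))
    vec = [v / norm for v in vec]
    for _ in range(iterations):
        nxt = _matvec(matrix, vec)
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm < 1e-12:                               # no variance left in any direction
            return 0.0, vec
        vec = [v / norm for v in nxt]
    value = sum(v * w for v, w in zip(vec, _matvec(matrix, vec)))
    return value, vec


def pca_2d(samples):
    """Return (scores, variances) for the first two principal components.

    `samples` is a list of rows, one per sample, with one transformed value per taxon.
    """
    n, m = len(samples), len(samples[0])
    if n < 2:
        raise ValueError("need at least two samples")
    means = [sum(row[j] for row in samples) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in samples]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(m)]
           for i in range(m)]

    components, variances = [], []
    for _ in range(2):
        value, vec = _top_eigenpair(cov)
        components.append(vec)
        variances.append(value)
        # Deflation: remove the component just found so the next one can surface.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(m)]
               for i in range(m)]

    scores = [[sum(r[j] * comp[j] for j in range(m)) for comp in components]
              for r in centered]
    return scores, variances


# Hypothetical transformed values: two groups of two samples, three taxa.
samples = [[5.1, 2.0, 0.3], [4.9, 2.2, 0.1], [1.2, 6.8, 0.2], [1.0, 7.1, 0.4]]
scores, variances = pca_2d(samples)
for pc1, pc2 in scores:
    print(round(pc1, 2), round(pc2, 2))
```

In this toy input the two groups end up on opposite sides of the first component, which is the kind of separation (or outlier) the PCA plot is meant to reveal at a glance.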

Top differentiated taxa

If particular contrasts (comparisons) are of interest, differentially abundant taxa can be identified. The following table shows the taxa that differ significantly in abundance between the AB-Parent and BA-Parent groups.

Significantly differentiated families

The families and phyla of the above-mentioned taxa are plotted below for an overview. Other ranks can also be plotted if needed.

Heatmap of top differentiated taxa

The heatmap of the above-mentioned taxa is a great way to visualize the clustering of samples and taxa using normalized count data.

Bibliography

This page was created using rmarkdown [12] and knitr [13; 14]. DT [15] was used to create the interactive data table. Heatmaps were made using pheatmap [16]. Data wrangling and formatting were done mostly with tidyverse [17], with the support of data.table [18]. Figures were made with ggplot2 [19], gridExtra [20], RColorBrewer [21] and rgl [22].

Citations were made with knitcitations [23].

This page was generated on 2021-08-18 15:58:30.

[1] G. C. Baker, J. J. Smith, and D. A. Cowan. “Review and re-analysis of domain-specific 16S primers”. In: J Microbiol Methods 55.3 (2003), pp. 541-55. ISSN: 0167-7012 (Print) 0167-7012 (Linking). DOI: 10.1016/j.mimet.2003.08.009. URL: https://www.ncbi.nlm.nih.gov/pubmed/14607398.

[2] J. Pel, A. Leung, W. W. Y. Choi, M. Despotovic, et al. “Rapid and highly-specific generation of targeted DNA sequencing libraries enabled by linking capture probes with universal primers”. In: PLoS One 13.12 (2018), p. e0208283. ISSN: 1932-6203 (Electronic) 1932-6203 (Linking). DOI: 10.1371/journal.pone.0208283. URL: https://www.ncbi.nlm.nih.gov/pubmed/30517195.

[3] R. Bharti and D. G. Grimm. “Current challenges and best-practice protocols for microbiome analysis”. In: Brief Bioinform 22.1 (2021), pp. 178-193. ISSN: 1477-4054 (Electronic) 1467-5463 (Linking). DOI: 10.1093/bib/bbz155. URL: https://www.ncbi.nlm.nih.gov/pubmed/31848574.

[4] F. Mölder, K. Jablonski, B. Letcher, M. Hall, et al. “Sustainable data analysis with Snakemake [version 1; peer review: 1 approved, 1 approved with reservations]”. In: F1000Research 10.33 (2021). DOI: 10.12688/f1000research.29032.1. URL: http://openr.es/or2.

[5] B. J. Callahan, P. J. McMurdie, M. J. Rosen, A. W. Han, et al. “DADA2: High-resolution sample inference from Illumina amplicon data”. In: Nature Methods 13 (2016), pp. 581-583. DOI: 10.1038/nmeth.3869.

[6] P. J. McMurdie and S. Holmes. “phyloseq: An R package for reproducible interactive analysis and graphics of microbiome census data”. In: PLoS ONE 8.4 (2013), p. e61217. URL: http://dx.plos.org/10.1371/journal.pone.0061217.

[7] M. M. Weinstein, A. Prem, M. Jin, S. Tang, et al. “FIGARO: An efficient and objective tool for optimizing microbiome rRNA gene trimming parameters”. In: bioRxiv (2019), p. 610394. DOI: 10.1101/610394. URL: https://www.biorxiv.org/content/biorxiv/early/2019/04/16/610394.full.pdf.

[8] K. Katoh and D. M. Standley. “MAFFT multiple sequence alignment software version 7: improvements in performance and usability”. In: Mol Biol Evol 30.4 (2013), pp. 772-80. ISSN: 1537-1719 (Electronic) 0737-4038 (Linking). DOI: 10.1093/molbev/mst010. URL: https://www.ncbi.nlm.nih.gov/pubmed/23329690.

[9] M. N. Price, P. S. Dehal, and A. P. Arkin. “FastTree: computing large minimum evolution trees with profiles instead of a distance matrix”. In: Mol Biol Evol 26.7 (2009), pp. 1641-50. ISSN: 1537-1719 (Electronic) 0737-4038 (Linking). DOI: 10.1093/molbev/msp077. URL: https://www.ncbi.nlm.nih.gov/pubmed/19377059.

[10] M. N. Price, P. S. Dehal, and A. P. Arkin. “FastTree 2 – approximately maximum-likelihood trees for large alignments”. In: PLoS One 5.3 (2010), p. e9490. ISSN: 1932-6203 (Electronic) 1932-6203 (Linking). DOI: 10.1371/journal.pone.0009490. URL: https://www.ncbi.nlm.nih.gov/pubmed/20224823.

[11] I. Martinez, M. X. Maldonado-Gomez, J. C. Gomes-Neto, H. Kittana, et al. “Experimental evaluation of the importance of colonization history in early-life gut microbiota assembly”. In: Elife 7 (2018). ISSN: 2050-084X (Electronic) 2050-084X (Linking). DOI: 10.7554/eLife.36521. URL: https://www.ncbi.nlm.nih.gov/pubmed/30226190.

[12] J. Allaire, Y. Xie, J. McPherson, J. Luraschi, et al. rmarkdown: Dynamic Documents for R. R package version 2.9. 2021. URL: https://github.com/rstudio/rmarkdown.

[13] Y. Xie. knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.33. 2021. URL: https://yihui.org/knitr/.

[14] Y. Xie. “knitr: A Comprehensive Tool for Reproducible Research in R”. In: Implementing Reproducible Computational Research. Ed. by V. Stodden, F. Leisch and R. D. Peng. ISBN 978-1466561595. Chapman and Hall/CRC, 2014. URL: http://www.crcpress.com/product/isbn/9781466561595.

[15] Y. Xie, J. Cheng, and X. Tan. DT: A Wrapper of the JavaScript Library ‘DataTables’. R package version 0.18. 2021. URL: https://CRAN.R-project.org/package=DT.

[16] R. Kolde. pheatmap: Pretty Heatmaps. R package version 1.0.12. 2019. URL: https://CRAN.R-project.org/package=pheatmap.

[17] H. Wickham, M. Averick, J. Bryan, W. Chang, et al. “Welcome to the tidyverse”. In: Journal of Open Source Software 4.43 (2019), p. 1686. DOI: 10.21105/joss.01686.

[18] M. Dowle and A. Srinivasan. data.table: Extension of ‘data.frame’. R package version 1.14.0. 2021. URL: https://CRAN.R-project.org/package=data.table.

[19] H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016. ISBN: 978-3-319-24277-4. URL: https://ggplot2.tidyverse.org.

[20] B. Auguie. gridExtra: Miscellaneous Functions for “Grid” Graphics. R package version 2.3. 2017. URL: https://CRAN.R-project.org/package=gridExtra.

[21] E. Neuwirth. RColorBrewer: ColorBrewer Palettes. R package version 1.1-2. 2014. URL: https://CRAN.R-project.org/package=RColorBrewer.

[22] D. Murdoch and D. Adler. rgl: 3D Visualization Using OpenGL. R package version 0.107.10. 2021. URL: https://CRAN.R-project.org/package=rgl.

[23] C. Boettiger. knitcitations: Citations for ‘Knitr’ Markdown Files. R package version 1.0.12. 2021. URL: https://CRAN.R-project.org/package=knitcitations.