RNA-Seq: Differential Gene Expression

Introduction

Since its development over a decade ago, RNA sequencing (RNA-Seq) has become ubiquitous in the study and profiling of the transcriptome. In biomedical research, it has been applied to diagnosing and profiling disease, quantifying changes in gene expression, identifying novel virulence factors and studying host-pathogen immune interactions. In agricultural research, it has been applied to identifying genes and pathways that respond to environmental stresses and to identifying gene functions and the genes responsible for particular phenotypes [1; 2; 3].

At the Bioinformatics Core Research Facility at the University of Nebraska-Lincoln (UNL), we offer a Differential Gene Expression (DGE) analysis service, which includes:

  • QC and pre-processing of data.
  • Mapping to reference genome and/or transcriptome.
  • Tables for reads before and after pre-processing and alignment.
  • A MultiQC report of the QC, pre-processing and alignment steps.
  • Analysis in Bioconductor/R.
  • Raw and normalized read counts for genes/transcripts.
  • DGE analysis.
  • PCA plot of samples.
  • Heatmap of differentially expressed genes.
  • Annotated DGE results including GO terms and gene descriptions (if available).

In addition to the above, we assist with writing the methods section and with addressing follow-up questions regarding the analysis. If our work contributes sufficiently to warrant co-authorship, we also help prepare publication-quality figures for manuscripts.

Example figures

Figures in this section were generated from our recent work with Dr. Hiep Vu from UNL’s Department of Animal Science.

The study examined the mechanisms by which live-attenuated porcine reproductive and respiratory syndrome virus (PRRSV) persists in the inguinal lymphoid tissue of pigs. We detected 6404 differentially expressed genes between control and infected pigs.

Genes involved in innate immune responses, along with chemokines and receptors associated with T-cell homing to lymphoid tissues, were downregulated, while genes associated with T-cell exhaustion and anti-apoptotic pathways were upregulated. Collectively, the data suggested that the live-attenuated PRRSV strain establishes a pro-survival microenvironment in lymphoid tissue by suppressing innate immune responses and T-cell homing and by preventing cell apoptosis.

Sample-to-sample distances

This plot shows the Euclidean distances between samples, computed from the regularized log transformed count data generated with DESeq2 [4], and how the samples cluster hierarchically.
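
As a rough illustration of what such a distance matrix contains, the sketch below computes pairwise Euclidean distances between sample vectors in plain Python. The function name, sample names and values are hypothetical and are not taken from the study; the actual analysis used DESeq2 in R.

```python
import math


def euclidean_distance_matrix(samples):
    """Return pairwise Euclidean distances between samples.

    `samples` maps a sample name to its vector of transformed
    expression values (one value per gene, same gene order everywhere).
    """
    names = list(samples)
    return {
        a: {b: math.dist(samples[a], samples[b]) for b in names}
        for a in names
    }


# Hypothetical transformed expression values for three genes.
toy_samples = {
    "control_1": [5.0, 8.0, 2.0],
    "control_2": [5.0, 8.0, 3.0],
    "infected_1": [9.0, 4.0, 2.0],
}

distances = euclidean_distance_matrix(toy_samples)
```

In this toy input the two control samples differ in only one gene, so they end up much closer to each other than either is to the infected sample. That is the pattern a hierarchical clustering of the distances would pick up.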

PCA

The PCA plot shows the first two principal components of the regularized log transformed count data for the control and infected pigs.

MA plot

The MA plot shows the log fold change against the mean of the normalized counts. Each point represents a feature. Points in blue have an adjusted p-value smaller than the significance threshold alpha.
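
To make the two axes concrete, here is a minimal sketch that derives one MA-plot point for a single feature from normalized counts. It is a simplified illustration, not DESeq2's model-based estimate: the function name, the pseudocount of 1 and the toy counts are assumptions introduced here.

```python
import math


def ma_point(control_counts, infected_counts, pseudocount=1.0):
    """Return (mean normalized count, log2 fold change) for one feature.

    The pseudocount (an assumption of this sketch) avoids taking the
    logarithm of zero when a group has no reads for the feature.
    """
    control_mean = sum(control_counts) / len(control_counts)
    infected_mean = sum(infected_counts) / len(infected_counts)
    all_counts = list(control_counts) + list(infected_counts)
    mean_expression = sum(all_counts) / len(all_counts)
    log2_fold_change = math.log2(
        (infected_mean + pseudocount) / (control_mean + pseudocount)
    )
    return mean_expression, log2_fold_change


# Hypothetical normalized counts for one gene in two samples per group.
mean_expression, log2_fold_change = ma_point([14.0, 16.0], [60.0, 66.0])
```

The first value is the x-coordinate (mean of normalized counts across all samples) and the second is the y-coordinate (log2 fold change, infected relative to control).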

Volcano plot

The volcano plot is another way to visualize the DGE results. Each point represents a feature. The colored points are differentially expressed genes at thresholds of alpha = 0.05 and |log2FC| = 1 (blue: downregulated, red: upregulated). The top 10 upregulated and downregulated genes are labeled.
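
The coloring rule can be sketched as a small classification function. This is an illustration under stated assumptions: the function name is hypothetical, the comparison is assumed to use the adjusted p-value, and whether the cutoffs are inclusive is a choice made here rather than something the report specifies.

```python
def classify_feature(log2fc, padj, alpha=0.05, lfc_cutoff=1.0):
    """Label a feature as 'up', 'down' or 'not significant'.

    Assumes `padj` is an adjusted p-value; a missing value (None)
    is treated as not significant.
    """
    if padj is None or padj >= alpha:
        return "not significant"
    if log2fc >= lfc_cutoff:
        return "up"
    if log2fc <= -lfc_cutoff:
        return "down"
    return "not significant"


# Hypothetical results: (log2 fold change, adjusted p-value).
toy_results = {
    "gene_a": (2.3, 0.001),
    "gene_b": (-1.7, 0.020),
    "gene_c": (0.4, 0.001),
    "gene_d": (3.0, 0.300),
}

labels = {gene: classify_feature(*values) for gene, values in toy_results.items()}
```

A feature must pass both thresholds to be colored: `gene_c` is statistically significant but changes too little, and `gene_d` changes a lot but is not significant.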

Heatmap - Z-scores

Heatmaps are a great way to visualize the clustering of samples using normalized count data.

This heatmap shows the row-wise Z-scores of regularized log transformed count data for all differentially expressed genes.

This heatmap shows the row-wise Z-scores of regularized log transformed count data for the top 20 differentially expressed genes.
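
Row-wise Z-scores rescale each gene so that its values across samples have mean 0 and standard deviation 1, which lets genes with very different expression levels share one color scale. The sketch below shows the calculation in plain Python; the function names are hypothetical, the sample standard deviation (n - 1 denominator) is assumed, and returning zeros for a constant row is a choice made here for illustration.

```python
import statistics


def row_zscores(row):
    """Return Z-scores for one gene across samples (needs >= 2 values)."""
    mean = statistics.mean(row)
    sd = statistics.stdev(row)  # sample standard deviation (n - 1)
    if sd == 0:
        # A constant row has no variation; report zeros instead of dividing by 0.
        return [0.0 for _ in row]
    return [(value - mean) / sd for value in row]


def zscore_matrix(matrix):
    """Apply row-wise Z-scoring to a genes-by-samples matrix."""
    return [row_zscores(row) for row in matrix]


# Hypothetical transformed counts: two genes across three samples.
scaled = zscore_matrix([[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]])
```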

Top features

The following table shows the top 200 features sorted by Benjamini-Hochberg (BH) adjusted p-values.
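
For readers unfamiliar with the BH adjustment, the sketch below implements the standard Benjamini-Hochberg step-up procedure in plain Python. The function name and the example p-values are hypothetical; in the actual analysis the adjusted p-values come from the R/Bioconductor workflow.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    # Visit p-values from largest to smallest.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, index in enumerate(order):
        rank = n - position  # 1-based rank in ascending order
        # Scale by n / rank, then enforce monotonicity with a running minimum.
        running_min = min(running_min, pvalues[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values for four features.
adjusted = bh_adjust([0.01, 0.04, 0.03, 0.005])
```

Sorting features by these adjusted values, smallest first, gives the ordering used for a "top features" table.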

Enriched Pathways

This bar plot shows, for each enriched pathway, the number of features in the pathway and the pathway’s -log10 p-value, based on pathway enrichment analysis results from KOBAS3.

Bibliography

This page was created using rmarkdown [5] and knitr [6; 7]. DT [8] was used to create the interactive data table. Heatmaps were made using pheatmap [9]. Data wrangling and formatting were done mostly with tidyverse [10], with the support of readxl [11] and data.table [12]. Figures were made with ggplot2 [13] and the ancillary packages ggrepel [14], extrafont [15], gridExtra [16] and RColorBrewer [17].

Citations were made with knitcitations [18].

This page was generated on 2021-08-18 15:51:38.

[1] R. Lowe, N. Shirley, M. Bleackley, S. Dolan, et al. “Transcriptomics technologies”. In: PLoS Computational Biology 13.5 (May. 2017), p. e1005457. ISSN: 15537358. DOI: 10.1371/journal.pcbi.1005457. URL: https://doi.org/10.1371/journal.pcbi.1005457.t001.

[2] R. Stark, M. Grzelak, and J. Hadfield. RNA sequencing: the teenage years. Nov. 2019. DOI: 10.1038/s41576-019-0150-2. URL: www.nature.com/nrg.

[3] Z. Wang, M. Gerstein, and M. Snyder. RNA-Seq: A revolutionary tool for transcriptomics. Jan. 2009. DOI: 10.1038/nrg2484. URL: www.nature.com/reviews/genetics.

[4] M. I. Love, W. Huber, and S. Anders. “Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2”. In: Genome Biology 15.12 (2014), p. 550. DOI: 10.1186/s13059-014-0550-8.

[5] J. Allaire, Y. Xie, J. McPherson, J. Luraschi, et al. rmarkdown: Dynamic Documents for R. R package version 2.9. 2021. URL: https://github.com/rstudio/rmarkdown.

[6] Y. Xie. knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.33. 2021. URL: https://yihui.org/knitr/.

[7] Y. Xie. “knitr: A Comprehensive Tool for Reproducible Research in R”. In: Implementing Reproducible Computational Research. Ed. by V. Stodden, F. Leisch and R. D. Peng. ISBN 978-1466561595. Chapman and Hall/CRC, 2014. URL: http://www.crcpress.com/product/isbn/9781466561595.

[8] Y. Xie, J. Cheng, and X. Tan. DT: A Wrapper of the JavaScript Library ‘DataTables’. R package version 0.18. 2021. URL: https://CRAN.R-project.org/package=DT.

[9] R. Kolde. pheatmap: Pretty Heatmaps. R package version 1.0.12. 2019. URL: https://CRAN.R-project.org/package=pheatmap.

[10] H. Wickham, M. Averick, J. Bryan, W. Chang, et al. “Welcome to the tidyverse”. In: Journal of Open Source Software 4.43 (2019), p. 1686. DOI: 10.21105/joss.01686.

[11] H. Wickham and J. Bryan. readxl: Read Excel Files. R package version 1.3.1. 2019. URL: https://CRAN.R-project.org/package=readxl.

[12] M. Dowle and A. Srinivasan. data.table: Extension of ‘data.frame’. R package version 1.14.0. 2021. URL: https://CRAN.R-project.org/package=data.table.

[13] H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016. ISBN: 978-3-319-24277-4. URL: https://ggplot2.tidyverse.org.

[14] K. Slowikowski. ggrepel: Automatically Position Non-Overlapping Text Labels with ‘ggplot2’. R package version 0.9.1. 2021. URL: https://CRAN.R-project.org/package=ggrepel.

[15] W. Chang. extrafont: Tools for using fonts. R package version 0.17. 2014. URL: https://CRAN.R-project.org/package=extrafont.

[16] B. Auguie. gridExtra: Miscellaneous Functions for “Grid” Graphics. R package version 2.3. 2017. URL: https://CRAN.R-project.org/package=gridExtra.

[17] E. Neuwirth. RColorBrewer: ColorBrewer Palettes. R package version 1.1-2. 2014. URL: https://CRAN.R-project.org/package=RColorBrewer.

[18] C. Boettiger. knitcitations: Citations for ‘Knitr’ Markdown Files. R package version 1.0.12. 2021. URL: https://CRAN.R-project.org/package=knitcitations.