This is the first part of the Bioc2016 workshop “Analysis of single-cell RNA-seq data with R and Bioconductor.”
In this part we will cover single-cell RNA-Seq quality control (QC) and normalization with the scone package. The package is available on GitHub and will be submitted to Bioconductor in the near future.
Single-cell RNA sequencing (scRNA-Seq) technologies are opening the way for transcriptome-wide profiling across diverse and complex mammalian tissues, facilitating unbiased identification of novel cell sub-populations and their functional roles. As in other high-throughput assays, a fraction of the heterogeneity observed in scRNA-Seq data results from batch effects and other technical artifacts. In particular, these protocols’ reliance on minuscule amounts of starting mRNA can lead to widespread “drop-out effects,” in which expressed transcripts are missed. Due to the biases inherent to these assays, data normalization is an essential step prior to any downstream analyses. Furthermore, due to the wide range of scRNA-Seq study designs used in the field, we cannot expect to find a one-size-fits-all solution to these problems.
scone supports a rational, data-driven framework for assessing the efficacy of various normalization workflows, encouraging users to explore trade-offs inherent to their data set prior to finalizing a data normalization strategy. We provide an interface for running multiple normalization workflows in parallel. We also offer tools for ranking workflows and visualizing trade-offs. We import some common normalization modules used in traditional bulk sequencing, and provide support for integrating user-specified normalization modules.
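For example, assuming scone’s convention that a scaling module is simply an R function taking a genes-by-cells count matrix and returning a rescaled matrix of the same dimensions (as with the built-in DESEQ_FN and TMM_FN used later in this workshop), a minimal user-defined module might look like the following sketch; the name MEANSCALE_FN is hypothetical and not part of scone:
# Hypothetical user-defined scaling module (illustrative sketch, not part of scone):
# rescale each cell so that its total count matches the average total count
MEANSCALE_FN = function(ei) {
  sf = colSums(ei) / mean(colSums(ei))  # per-cell size factors
  t(t(ei) / sf)                         # rescale each cell's counts
}
Such a function could then be added to the list of scaling functions passed to scone alongside the built-in modules.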
The first two parts of this workshop will analyze a small set of cells lying along a developmental trajectory. We will start from raw data objects obtained from a standard transcriptome alignment pipeline. Raw data and important reference data can be loaded directly from the workshop package.
library(bioc2016singlecell)
## Load Example Data
data(ws_input)
ls()
## [1] "batch" "bio" "breaks" "cl"
## [5] "cl3" "cl7" "cm" "cont_levels"
## [9] "contrMat" "counts" "de" "defaultMar"
## [13] "fit" "genesDendro" "genesF" "genesOneAll"
## [17] "genesPairs" "hk" "manual" "mast_res"
## [21] "norm" "norm_logcounts" "pca" "plotCMar"
## [25] "qc" "res1" "res2" "res3"
## [29] "rs" "sca" "se" "wh_rm"
The counts data frame contains feature-level read counts from TopHat alignments of 96 single-cell libraries to the mm10 reference genome (Trapnell, Pachter, and Salzberg 2009). qc contains library alignment metrics obtained from Picard.
colnames(qc)
## [1] "NREADS" "NALIGNED" "RALIGN"
## [4] "TOTAL_DUP" "PRIMER" "PCT_RIBOSOMAL_BASES"
## [7] "PCT_CODING_BASES" "PCT_UTR_BASES" "PCT_INTRONIC_BASES"
## [10] "PCT_INTERGENIC_BASES" "PCT_MRNA_BASES" "MEDIAN_CV_COVERAGE"
## [13] "MEDIAN_5PRIME_BIAS" "MEDIAN_3PRIME_BIAS"
de and hk are positive and negative control gene sets derived from population studies. batch and bio are factors labeling batch and time point, respectively. Consider the joint distribution of these factors:
## Joint distribution of batches and biological conditions (time after induction)
table(batch,bio)
## bio
## batch time0 time1 time2 time3 time4 time5
## Y01 10 0 0 0 0 0
## Y02 0 0 0 6 0 0
## Y03 0 0 0 8 0 0
## Y04 8 0 0 0 0 0
## Y05A 0 10 0 0 0 0
## Y05B 0 9 0 0 0 0
## Y06 0 0 5 0 0 0
## Y07A 0 0 9 0 0 0
## Y07B 0 0 10 0 0 0
## Y08 0 6 0 0 0 0
## Y10 0 0 0 0 2 0
## Y11 0 0 0 0 4 0
## Y12A 0 0 0 0 0 2
## Y12B 0 0 0 0 0 2
## Y13 0 0 0 0 0 5
Notice that each time-point is composed of multiple batches. This nested design is common among scRNA-Seq studies due to limitations on the number of cells that can be harvested/sequenced concurrently.
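We can verify the nesting programmatically - each batch should map to exactly one biological condition (a quick check; output not shown):
# Check that the design is nested: every batch is associated with a single time point
all(rowSums(table(batch, bio) > 0) == 1)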
We will first visualize cell level quality readouts, starting with the ratio of reads aligned to the genome:
# Color scheme
cc <- c(brewer.pal(9, "Set1"), brewer.pal(8, "Set2"),brewer.pal(9,"Set3"))
# Barplot of read proportion mapping to the genome
barplot(qc$RALIGN[order(batch)], col=cc[batch][order(batch)], border=cc[batch][order(batch)], main="Percentage of mapped reads, colored by batch")
legend("bottomleft", legend=levels(batch), fill=cc,cex=0.4)
We see that there are no major differences between batches, but there are a few cells with particularly low alignment efficiency. These cells can be removed via filtering procedures. We can alternatively consider the number of reads in each library:
# Barplot of total read number
barplot(qc$NREADS[order(batch)], col=cc[batch][order(batch)], border=cc[batch][order(batch)], main="Total number of reads, colored by batch")
legend("topright", legend=levels(batch), fill=cc, cex=0.4)
We see that coverage varies substantially between batches, and though some of this technical variability may be addressed with filtering, we will still need to carefully normalize the data to make batches comparable.
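The same pattern can be summarized numerically (a quick illustrative check, not part of the core pipeline):
# Median number of reads per batch
tapply(qc$NREADS, batch, median)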
It can be very helpful to visualize distributions of single metrics, but it’s important to note that these metrics are often correlated. Sometimes it may be more useful to consider principal components of the quality matrix, identifying principal factors of library variation:
qpc = prcomp(qc,center = TRUE,scale. = TRUE)
barplot(cumsum(qpc$sdev^2)/sum(qpc$sdev^2), border="gray", xlab="PC", ylab="proportion of variance", main="Quality PCA")
Even though 14 quality metrics have been quantified, PCA shows us that only a small number of PCs are needed to describe a majority of the variance (e.g. 3 to explain 74%). We will now visualize the distribution of the first PC in the context of batch:
barplot(qpc$x[,1][order(batch)], col=cc[batch][order(batch)], border=cc[batch][order(batch)], main="Quality PC1, colored by batch")
legend("bottomleft", legend=levels(batch), fill=cc, cex=0.8)
This PC appears to correlate with batch (particularly Y04), but it also highlights 5 outlier libraries that are significantly different from the rest. While we won’t discuss the application of these PCs to filtering in this workshop, we will show how they can be applied to normalization.
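One simple way to quantify the apparent association between quality PC1 and batch is a one-way ANOVA (an illustrative check, not part of the core workflow):
# One-way ANOVA of quality PC1 against batch
summary(aov(qpc$x[, 1] ~ batch))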
Next we consider a uniquely single-cell problem: drop-outs. hk contains a list of genes that are believed to be ubiquitously and uniformly expressed across the target tissue. Because we assume these genes are expressed in all cells, we can label all zeroes as drop-out events. Below we model detection failures as a logistic function of mean expression, in line with the standard logistic model for drop-outs employed by the field.
# Mean log10(x+1) expression
mu_obs = rowMeans(log10(counts[hk,]+1))
# Drop-outs
drop_outs = counts[hk,] == 0
# Logistic Regression Model of Failure
ref.glms = list()
for (si in 1:dim(drop_outs)[2]){
  fit = glm(cbind(drop_outs[,si], 1 - drop_outs[,si]) ~ mu_obs, family = binomial(logit))
  ref.glms[[si]] = fit$coefficients
}
The list ref.glms contains the intercept and slope of each fit. We can now visualize the fit curves and the corresponding Areas Under the Curve (AUCs):
par(mfrow=c(1,2))
# Plot Failure Curves and Calculate AUC
plot(NULL, main = "False Negative Rate Curves",ylim = c(0,1),xlim = c(0,6), ylab = "Failure Probability", xlab = "Mean log10 Expression")
x = (0:60)/10
AUC = NULL
for(si in 1:ncol(counts)){
  y = 1/(exp(-ref.glms[[si]][1] - ref.glms[[si]][2] * x) + 1)
  AUC[si] = sum(y)/10
  lines(x, y, type = 'l', lwd = 2, col = cc[batch][si])
}
# Barplot of FNR AUC
barplot(AUC[order(batch)], col=cc[batch][order(batch)], border=cc[batch][order(batch)], main="FNR AUC, colored by batch")
legend("topleft", legend=levels(batch), fill=cc, cex=0.4)
You may have noticed that the 5 outlier cells seen here are the same as the 5 outliers highlighted in the PCA of quality metrics above. Drop-out summaries such as the AUC can be very useful for assessing single-cell library quality.
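Those outliers can also be flagged programmatically from the AUC values (an illustrative check; the 3-MAD cutoff is an arbitrary choice, not part of the workshop pipeline):
# Flag cells with unusually high FNR AUC
auc_outliers = which(AUC > median(AUC) + 3 * mad(AUC))
length(auc_outliers)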
scone Workflow
So far we have only described potential problems with our data set. Now we will take steps to address them! The basic QC and normalization pipeline we will use today will allow us to:
1. filter out poor-quality samples using the metric_sample_filter function,
2. run and score many different normalization workflows using the scone function, and
3. browse the trade-offs between top-performing workflows using the biplot_colored function.
SCONE’s normalization workflow template is composed of 3 modules: 1) data imputation, 2) scaling normalization, and 3) adjustment for factors of unwanted variation (RUV factors or quality PCs) and, optionally, batch and biological condition; these modules correspond to the columns of the params data frame generated below.
metric_sample_filter
The most basic sample filtering function in scone is metric_sample_filter. It takes as input an expression matrix. The output depends on the arguments provided, but generally consists of a list of 4 logical vectors designating each sample as having failed (TRUE) or passed (FALSE) a threshold-based filter on each of 4 metrics:
1. the number of reads, passed via the nreads argument.
2. the ratio of reads aligned to the genome, passed via the ralign argument.
3. “transcriptome breadth,” the fraction of a reference gene set detected, passed via the gene_filter argument.
4. the false-negative-rate (FNR) AUC, computed from positive control genes passed via the pos_controls argument.
If the required arguments are missing for any of the 4 metrics, the function will simply return NA instead of the corresponding logical.
mfilt_report <- metric_sample_filter(expr = counts,
                                     nreads = qc$NREADS, ralign = qc$RALIGN,
                                     suff_nreads = 10^5,
                                     suff_ralign = 90,
                                     pos_controls = hk,
                                     zcut = 3, mixture = FALSE, plot = TRUE)
In the call above, we have set the following parameters:
- expr = counts: the raw count matrix.
- nreads and ralign: per-sample alignment metrics taken from the qc table.
- suff_nreads = 10^5 and suff_ralign = 90: “sufficient” thresholds for the read-count and alignment-ratio filters.
- pos_controls = hk: the housekeeping gene set used for the FNR filter.
- zcut = 3: the number of standard deviations (or MADs) used to define the adaptive thresholds.
- mixture = FALSE: do not use the mixture-model criterion when defining thresholds (see below).
- plot = TRUE: produce diagnostic plots of the filter.
Let’s take a closer look at the computation behind the ralign filter. In selecting the threshold value 90, metric_sample_filter takes 4 values into account:
1. hard_ralign, the default “hard” threshold, set at 15 - rather forgiving…
2. 3 (zcut) times the standard deviation below the mean ralign value.
3. 3 (zcut) times the MAD below the median ralign value.
4. suff_ralign, the sufficient threshold, set to 90.
hist(qc$RALIGN, breaks = 0:100)
# Hard threshold
abline(v = 15, col = "yellow", lwd = 2)
# 3 (zcut) standard deviations below the mean ralign value
abline(v = mean(qc$RALIGN) - 3*sd(qc$RALIGN), col = "green", lwd = 2)
# 3 (zcut) MADs below the median ralign value
abline(v = median(qc$RALIGN) - 3*mad(qc$RALIGN), col = "red", lwd = 2)
# Sufficient threshold
abline(v = 90, col = "grey", lwd = 2)
# Final threshold is the minimum of 1) the sufficient threshold and 2) the max of all others
thresh = min(90,max(c(15,mean(qc$RALIGN) - 3*sd(qc$RALIGN),median(qc$RALIGN) - 3*mad(qc$RALIGN))))
abline(v = thresh, col = "blue", lwd = 2, lty = 2)
legend("topleft",legend = c("Hard","SD","MAD","Sufficient","Final"),lwd = 2, col = c("yellow","green","red","grey","blue"),lty = c(1,1,1,1,2), cex = .5)
We see here that the 3rd threshold exceeds the sufficient threshold, and so metric_sample_filter settles for the sufficient threshold. Note that if mixture=TRUE, an additional criterion is included: bi-modal metric distributions are fit to a two-component mixture model, and a threshold is defined with respect to the mean and standard deviation of the “better” component. As metric_sample_filter relies on the maximum of the candidate thresholds, we recommend users treat this function as a liberal filter.
With the metric_sample_filter output in hand, filtering out the poor samples is fairly straightforward:
# Which thresholds are missing? (breadth)
is_na_filt = unlist(lapply(is.na(mfilt_report),any))
# Identify samples failing any threshold
m_sampfilter = !apply(simplify2array(mfilt_report[!is_na_filt]),1,any)
# Filter Samples
fcounts = counts[,m_sampfilter]
fqc = qc[m_sampfilter,]
fbatch = batch[m_sampfilter]
fbio = bio[m_sampfilter]
# Simple gene filter
filterCount <- function(counts, nRead=5, nCell=5){
  filter <- apply(counts, 1, function(x) length(x[x>=nRead])>=nCell)
  return(filter)
}
genefilter <- filterCount(fcounts)
# Filter genes
fcounts = fcounts[genefilter,]
fhk = hk[hk %in% rownames(fcounts)]
fde = de[de %in% rownames(fcounts)]
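Before moving on, it is worth a quick sanity check on how much data survives the filters (a simple sketch; output not shown):
# Dimensions before and after filtering
dim(counts)    # genes x cells, unfiltered
dim(fcounts)   # genes x cells, after sample and gene filters
# Remaining cells per time point
table(fbio)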
scone
As described earlier, not only does scone normalize expression data, but it also provides a framework for evaluating the performance of various normalization workflows. In order to run the scone function, we will need to decide which workflows we want to compare.
run=FALSE mode
The arguments to scone allow for a lot of flexibility, but sometimes you will need to run very specific combinations of modules. For this purpose, scone can be run in run=FALSE mode, returning only a data frame of the workflows to be performed.
params <- scone(expr = as.matrix(fcounts),
                scaling = c(none = identity, deseq = DESEQ_FN, tmm = TMM_FN,
                            uqp = UQ_FN_POS, fq = FQT_FN),
                ruv_negcon = fhk, k_ruv = 3,
                qc = as.matrix(fqc), k_qc = 3,
                bio = fbio, adjust_bio = "yes",
                batch = fbatch, adjust_batch = "yes",
                run = FALSE)
head(params)
## imputation_method scaling_method
## none,none,no_uv,no_bio,no_batch none none
## none,deseq,no_uv,no_bio,no_batch none deseq
## none,tmm,no_uv,no_bio,no_batch none tmm
## none,uqp,no_uv,no_bio,no_batch none uqp
## none,fq,no_uv,no_bio,no_batch none fq
## none,none,ruv_k=1,no_bio,no_batch none none
## uv_factors adjust_biology adjust_batch
## none,none,no_uv,no_bio,no_batch no_uv no_bio no_batch
## none,deseq,no_uv,no_bio,no_batch no_uv no_bio no_batch
## none,tmm,no_uv,no_bio,no_batch no_uv no_bio no_batch
## none,uqp,no_uv,no_bio,no_batch no_uv no_bio no_batch
## none,fq,no_uv,no_bio,no_batch no_uv no_bio no_batch
## none,none,ruv_k=1,no_bio,no_batch ruv_k=1 no_bio no_batch
In the call above, we have set the following parameters:
- scaling: five candidate scaling functions - no scaling (identity), DESeq-style scaling (DESEQ_FN), TMM (TMM_FN), upper-quartile of positive counts (UQ_FN_POS), and full-quantile normalization (FQT_FN).
- ruv_negcon = fhk and k_ruv = 3: the housekeeping negative controls, used to compute up to 3 RUV factors of unwanted variation.
- qc = as.matrix(fqc) and k_qc = 3: the quality metric matrix, used to compute up to 3 quality PCs as alternative factors of unwanted variation.
- bio = fbio and adjust_bio = "yes": consider workflows that adjust for time point, as well as workflows that do not.
- batch = fbatch and adjust_batch = "yes": consider workflows that adjust for batch, as well as workflows that do not.
These arguments translate to the following set of options:
apply(params,2,unique)
## $imputation_method
## [1] "none"
##
## $scaling_method
## [1] "none" "deseq" "tmm" "uqp" "fq"
##
## $uv_factors
## [1] "no_uv" "ruv_k=1" "ruv_k=2" "ruv_k=3" "qc_k=1" "qc_k=2" "qc_k=3"
##
## $adjust_biology
## [1] "no_bio" "bio"
##
## $adjust_batch
## [1] "no_batch" "batch"
While adjusting for biology prior to downstream analysis is generally problematic, it can be particularly dangerous in the single-cell context: there is significantly more sample-intrinsic biological variability in scRNA-Seq data, variability that cannot be captured by the known exposure (here, time after induction). In our nested design scenario we would only consider adjustments for biology when batch effects are also included. We can produce an updated params data frame reflecting this choice:
is_screened = (params$adjust_biology == "bio") & (params$adjust_batch != "batch")
params = params[!is_screened,]
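We can quickly confirm how many workflows remain, and which adjustment combinations they use (a simple check on the screened params data frame):
# Number of remaining workflows and their adjustment settings
nrow(params)
table(params$adjust_biology, params$adjust_batch)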
scone with run=TRUE
Now that we have selected our workflows, we can run scone in run=TRUE mode. This mode offers a few additional arguments, including an optional params argument to pass any results from the run=FALSE mode. In order to understand these arguments, we must first understand the 8 metrics used to evaluate each normalization. The first 6 metrics rely on a reduction of the normalized data down to 3 dimensions via PCA (default). Each metric is taken to have a positive (higher is better) or negative (lower is better) signature.
1. BIO_SIL: average silhouette width of clusters defined by bio, defined with respect to a Euclidean distance metric over the first 3 expression PCs. Positive signature.
2. BATCH_SIL: average silhouette width of clusters defined by batch, defined with respect to a Euclidean distance metric over the first 3 expression PCs. Negative signature.
3. PAM_SIL: maximum average silhouette width over PAM clusterings of the first 3 expression PCs (numbers of clusters set by eval_kclust). Positive signature.
4. EXP_QC_COR: correlation between the first 3 expression PCs and the first k_qc QPCs. Negative signature.
5. EXP_UV_COR: correlation between the first 3 expression PCs and factors of unwanted variation derived from the negative control (eval_negcon, or ruv_negcon by default) sub-matrix of the original (raw) data. Negative signature.
6. EXP_WV_COR: correlation between the first 3 expression PCs and factors of wanted variation derived from the positive control (eval_poscon) sub-matrix of the original (raw) data. Positive signature.
7. RLE_MED: a summary of the medians of the Relative Log Expression (RLE) distributions across cells. Negative signature.
8. RLE_IQR: a summary of the inter-quartile ranges (IQRs) of the RLE distributions across cells. Negative signature.
res <- scone(expr = as.matrix(fcounts),
             scaling = c(none = identity, deseq = DESEQ_FN, tmm = TMM_FN,
                         uqp = UQ_FN_POS, fq = FQT_FN),
             ruv_negcon = fhk, k_ruv = 3,
             qc = as.matrix(fqc), k_qc = 3,
             bio = fbio, adjust_bio = "yes",
             batch = fbatch, adjust_batch = "yes",
             run = TRUE, params = params,
             eval_poscon = fde, eval_kclust = 2:3, conditional_pam = TRUE)
In the call above, we have set the following additional parameters:
- run = TRUE: actually compute the normalized matrices and the evaluation metrics.
- params: the screened workflow data frame produced in run=FALSE mode.
- eval_poscon = fde: positive control genes, used for the EXP_WV_COR metric.
- eval_kclust = 2:3: the numbers of clusters (2 and 3) considered for PAM in the PAM_SIL metric.
- conditional_pam = TRUE: perform the PAM clustering for PAM_SIL within each biological condition, so that it does not simply recapitulate bio.
Running the command will take a couple of minutes. The output will contain a list of four elements:
names(res)
## [1] "normalized_data" "metrics" "scores" "params"
normalized_data contains a list of normalized expression data (log-scale); each expression matrix is named according to the same convention as seen in the params argument. metrics contains the 8 raw metrics for each normalization. scores contains the metrics multiplied by their signature - or “scores” - as well as a 9th column that contains the mean score for that normalization. Normalization workflows in normalized_data, metrics, and scores are sorted in decreasing order by mean score.
head(res$scores)
## BIO_SIL BATCH_SIL PAM_SIL
## none,fq,ruv_k=1,no_bio,no_batch 0.08653280 0.1508225 0.3741884
## none,fq,ruv_k=2,no_bio,no_batch 0.08126123 0.1558709 0.3716837
## none,fq,ruv_k=3,no_bio,no_batch 0.09865276 0.1415136 0.3629964
## none,deseq,ruv_k=2,no_bio,no_batch 0.09721160 0.1722529 0.3983736
## none,deseq,ruv_k=1,no_bio,no_batch 0.08828919 0.1640219 0.4215769
## none,fq,qc_k=1,no_bio,no_batch 0.08318459 0.1581687 0.4326760
## EXP_QC_COR EXP_UV_COR EXP_WV_COR
## none,fq,ruv_k=1,no_bio,no_batch -0.2132825 -0.09864541 0.5405376
## none,fq,ruv_k=2,no_bio,no_batch -0.2012192 -0.08994268 0.5313734
## none,fq,ruv_k=3,no_bio,no_batch -0.2300302 -0.06228916 0.5779535
## none,deseq,ruv_k=2,no_bio,no_batch -0.2984754 -0.16153256 0.5042606
## none,deseq,ruv_k=1,no_bio,no_batch -0.3437892 -0.16409015 0.4916737
## none,fq,qc_k=1,no_bio,no_batch -0.2336496 -0.28368051 0.5126642
## RLE_MED RLE_IQR mean_score
## none,fq,ruv_k=1,no_bio,no_batch -0.002169450 -0.9977360 -0.01996901
## none,fq,ruv_k=2,no_bio,no_batch -0.002856720 -1.0625194 -0.02704359
## none,fq,ruv_k=3,no_bio,no_batch -0.002686065 -1.1112346 -0.02814046
## none,deseq,ruv_k=2,no_bio,no_batch -0.007600576 -0.9843497 -0.03498245
## none,deseq,ruv_k=1,no_bio,no_batch -0.007537978 -0.9375725 -0.03592851
## none,fq,qc_k=1,no_bio,no_batch -0.003546027 -0.9857528 -0.03999193
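As a quick sanity check on this convention, the mean_score column should simply be the average of the 8 signed scores (a sketch, assuming the columns are ordered as printed above; consistent with the values in the first row):
# Recompute the mean score from the 8 signed metric scores
all.equal(unname(rowMeans(res$scores[, -ncol(res$scores)])),
          unname(res$scores[, ncol(res$scores)]))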
Based on our sorting criteria, it would appear that none,fq,ruv_k=1,no_bio,no_batch performs well compared to other normalization workflows. A useful way to visualize this method with respect to others is the biplot_colored function:
pc_obj = prcomp(res$scores[,-ncol(res$scores)],center = TRUE,scale = FALSE)
bp_obj = biplot_colored(pc_obj,y = -res$scores[,ncol(res$scores)],expand = .6)
We have colored each point above according to the corresponding method’s mean score. We can see that workflows cluster largely by three scores: BIO_SIL, EXP_WV_COR, and EXP_UV_COR. Since clustering by biology is playing a strong role, we can highlight the points corresponding to the biology adjustment:
bp_obj = biplot_colored(pc_obj,y = -res$scores[,ncol(res$scores)],expand = .6)
points(bp_obj[grepl(",bio,batch",rownames(bp_obj)),], pch = 1, col = "red", cex = 1)
points(bp_obj[grepl(",bio,batch",rownames(bp_obj)),], pch = 1, col = "red", cex = 1.5)
We see that the intermediate-performance cluster to the bottom-left is made up only of these methods. This example highlights the danger of adjusting for biology, even in the nested design scenario. Although biology-adjusted data may reflect prior differences more strongly, other metrics would suggest that the data is poorly behaved. We can also consider cases where only batch is accounted for:
bp_obj = biplot_colored(pc_obj,y = -res$scores[,ncol(res$scores)],expand = .6)
points(bp_obj[grepl(",no_bio,batch",rownames(bp_obj)),], pch = 1, col = "red", cex = 1)
points(bp_obj[grepl(",no_bio,batch",rownames(bp_obj)),], pch = 1, col = "red", cex = 1.5)
Here we see that the poorly performing cluster on the right is made up solely of these methods. This follows from the simple fact that batch is nested within biology, and adjustments for the former can remove critical biological variation. Finally, we will visualize the top-performing method and its relation to the un-normalized data:
bp_obj = biplot_colored(pc_obj,y = -res$scores[,ncol(res$scores)],expand = .6)
points(t(bp_obj[1,]), pch = 1, col = "red", cex = 1)
points(t(bp_obj[1,]), pch = 1, col = "red", cex = 1.5)
points(t(bp_obj[rownames(bp_obj) == rownames(params)[1],]), pch = 1, col = "blue", cex = 1)
points(t(bp_obj[rownames(bp_obj) == rownames(params)[1],]), pch = 1, col = "blue", cex = 1.5)
arrows(bp_obj[rownames(bp_obj) == rownames(params)[1],][1],
bp_obj[rownames(bp_obj) == rownames(params)[1],][2],
bp_obj[1,][1],
bp_obj[1,][2],
lty = 2, lwd = 2)
The arrow traces a line from the “no-op” normalization to the top-ranked normalization in SCONE. We see that SCONE has selected a method in between the two extremes, reducing the signal of unwanted variation (as defined by fhk) while preserving biological signal (as defined by fbio and fde).
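For downstream analyses (such as the clustering in the next part of this workshop), we can extract the normalized matrix of the top-ranked workflow; a minimal sketch, assuming the elements of normalized_data are named and sorted as described above:
# Name of the top-ranked workflow and its log-scale normalized expression matrix
top_method = rownames(res$scores)[1]
top_method
top_norm = res$normalized_data[[top_method]]
dim(top_norm)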
sessionInfo()
## R version 3.3.0 (2016-05-03)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.4 LTS
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] RColorBrewer_1.1-2 scone_0.0.5
## [3] MAST_0.934 reshape_0.8.5
## [5] cluster_2.0.4 clusterExperiment_0.99.1-9003
## [7] SummarizedExperiment_1.3.5 Biobase_2.33.0
## [9] GenomicRanges_1.25.8 GenomeInfoDb_1.9.1
## [11] IRanges_2.7.11 S4Vectors_0.11.7
## [13] BiocGenerics_0.19.1 bioc2016singlecell_0.0.1
## [15] devtools_1.11.1 BiocStyle_2.1.10
##
## loaded via a namespace (and not attached):
## [1] colorspace_1.2-6 hwriter_1.3.2
## [3] class_7.3-14 modeltools_0.2-21
## [5] mclust_5.2 XVector_0.13.2
## [7] flexmix_2.3-13 AnnotationDbi_1.35.3
## [9] mvtnorm_1.0-5 xml2_0.1.2
## [11] codetools_0.2-14 splines_3.3.0
## [13] R.methodsS3_1.7.1 doParallel_1.0.10
## [15] bold_0.3.5 DESeq_1.25.0
## [17] robustbase_0.92-6 geneplotter_1.51.0
## [19] knitr_1.13 ade4_1.7-4
## [21] jsonlite_0.9.22 locfdr_1.1-8
## [23] Rsamtools_1.25.0 phylobase_0.8.2
## [25] gridBase_0.4-7 annotate_1.51.0
## [27] kernlab_0.9-24 R.oo_1.20.0
## [29] rentrez_1.0.2 httr_1.2.0
## [31] assertthat_0.1 Matrix_1.2-6
## [33] lazyeval_0.2.0 rotl_3.0.0
## [35] limma_3.29.10 formatR_1.4
## [37] htmltools_0.3.5 tools_3.3.0
## [39] gtable_0.2.0 taxize_0.7.8
## [41] reshape2_1.4.1 dplyr_0.4.3
## [43] ShortRead_1.31.0 Rcpp_0.12.5
## [45] NMF_0.20.6 trimcluster_0.1-2
## [47] Biostrings_2.41.4 gdata_2.17.0
## [49] ape_3.5 nlme_3.1-128
## [51] rtracklayer_1.33.7 iterators_1.0.8
## [53] fpc_2.1-10 stringr_1.0.0
## [55] rngtools_1.2.4 gtools_3.5.0
## [57] XML_3.98-1.4 dendextend_1.2.0
## [59] edgeR_3.15.0 DEoptimR_1.0-4
## [61] zlibbioc_1.19.0 MASS_7.3-45
## [63] scales_0.4.0 aroma.light_3.3.0
## [65] yaml_2.1.13 RUVSeq_1.7.2
## [67] memoise_1.0.0 ggplot2_2.1.0
## [69] pkgmaker_0.22 segmented_0.5-1.4
## [71] biomaRt_2.29.2 latticeExtra_0.6-28
## [73] stringi_1.1.1 RSQLite_1.0.0
## [75] genefilter_1.55.2 foreach_1.4.3
## [77] GenomicFeatures_1.25.14 caTools_1.17.1
## [79] boot_1.3-18 BiocParallel_1.7.4
## [81] chron_2.3-47 prabclus_2.2-6
## [83] matrixStats_0.50.2 bitops_1.0-6
## [85] rncl_0.6.0 evaluate_0.9
## [87] lattice_0.20-33 GenomicAlignments_1.9.4
## [89] rredlist_0.1.0 plyr_1.8.4
## [91] magrittr_1.5 R6_2.1.2
## [93] gplots_3.0.1 DBI_0.4-1
## [95] whisker_0.3-2 withr_1.0.2
## [97] mixtools_1.0.4 RCurl_1.95-4.8
## [99] survival_2.39-4 abind_1.4-3
## [101] nnet_7.3-12 tibble_1.0
## [103] EDASeq_2.7.2 uuid_0.1-2
## [105] KernSmooth_2.23-15 howmany_0.3-1
## [107] rmarkdown_0.9.6 RNeXML_2.0.6
## [109] grid_3.3.0 data.table_1.9.6
## [111] digest_0.6.9 diptest_0.75-7
## [113] xtable_1.8-2 tidyr_0.5.1
## [115] R.utils_2.3.0 munsell_0.4.3
## [117] registry_0.3
Risso, Davide, John Ngai, Terence P. Speed, and Sandrine Dudoit. 2014. “Normalization of RNA-Seq Data Using Factor Analysis of Control Genes or Samples.” Nature Biotechnology 32: 896–902.
Trapnell, Cole, Lior Pachter, and Steven L. Salzberg. 2009. “TopHat: Discovering Splice Junctions with RNA-Seq.” Bioinformatics 25 (9): 1105–11.