spatialDmelxsim 1.12.0
This package contains allelic expression counts from spatial
slices of fly embryos from a D. melanogaster × D. simulans
cross. The experiment is a reciprocal cross (see strain), with three
replicates of one parental arrangement and two of the other, which was
sufficient to ensure at least one embryo of each sex for each parental
arrangement.
Data was downloaded from GSE102233
as described in the publication:
Combs PA, Fraser HB (2018) Spatially varying cis-regulatory divergence in Drosophila embryos elucidates cis-regulatory logic. PLOS Genetics 14(11): e1007631. https://doi.org/10.1371/journal.pgen.1007631
The scripts for creating the SummarizedExperiment object can be
found in inst/scripts/make-data.R.
We can find the resource via ExperimentHub:
library(ExperimentHub)
## Loading required package: BiocGenerics
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
## as.data.frame, basename, cbind, colnames, dirname, do.call,
## duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
## lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
## pmin.int, rank, rbind, rownames, sapply, saveRDS, setdiff, table,
## tapply, union, unique, unsplit, which.max, which.min
## Loading required package: AnnotationHub
## Loading required package: BiocFileCache
## Loading required package: dbplyr
eh <- ExperimentHub()
query(eh, "spatialDmelxsim")
## ExperimentHub with 1 record
## # snapshotDate(): 2024-10-24
## # names(): EH6129
## # package(): spatialDmelxsim
## # $dataprovider: Fraser Lab, Stanford
## # $species: Drosophila melanogaster
## # $rdataclass: RangedSummarizedExperiment
## # $rdatadateadded: 2021-06-16
## # $title: spatialDmelxsim
## # $description: Allelic expression counts of spatial slices of a fly embryo ...
## # $taxonomyid: 7227
## # $genome: dm6
## # $sourcetype: TXT
## # $sourceurl: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE102233
## # $sourcesize: NA
## # $tags: c("allelic", "ASE", "Drosophila_melanogaster_Data", "embryo",
## # "ExpressionData", "GEO", "patterning", "RNASeqData",
## # "SequencingData", "spatial")
## # retrieve record with 'object[["EH6129"]]'
Or load directly with a function defined within this package:
suppressPackageStartupMessages(library(SummarizedExperiment))
library(spatialDmelxsim)
se <- spatialDmelxsim()
## see ?spatialDmelxsim and browseVignettes('spatialDmelxsim') for documentation
## loading from cache
The rownames of the SummarizedExperiment are Ensembl IDs. To simplify the code for plotting individual genes, we change the rownames to the gene symbols used in the paper. We first check that every gene has a symbol, because rownames cannot contain NA.
table(is.na(mcols(se)$paper_symbol))
##
## FALSE
## 13498
rownames(se) <- mcols(se)$paper_symbol
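Beyond missing values, duplicated symbols can also cause surprises, because indexing a gene by name then silently returns the first match. The following is a minimal base-R sketch of a check for both problems; the helper name checkRowNames and the toy symbol vectors are hypothetical, and the duplicate check goes beyond the NA check performed above.

```r
# Hypothetical helper: report whether a candidate vector of rownames
# has no missing values and no duplicated entries.
checkRowNames <- function(symbols) {
  list(
    n_missing   = sum(is.na(symbols)),
    n_duplicate = sum(duplicated(symbols[!is.na(symbols)])),
    ok          = !anyNA(symbols) && !anyDuplicated(symbols)
  )
}

# Toy symbols (hypothetical, not taken from the dataset)
good <- c("hb", "bmm", "uif", "DOR")
bad  <- c("hb", NA, "hb", "DOR")

checkRowNames(good)$ok
checkRowNames(bad)
```

On the real object, the same check could be applied to mcols(se)$paper_symbol before assigning it as rownames.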
Note we use the following annotation of alleles:
a1: D. simulans
a2: D. melanogaster
Then we calculate the allelic ratio for the D. simulans allele:
assay(se, "total") <- assay(se, "a1") + assay(se, "a2")
assay(se, "ratio") <- assay(se, "a1") / assay(se, "total")
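Where a gene has no reads in a slice, total is 0 and the ratio a1 / total is NaN (0/0). The sketch below reproduces the same arithmetic on small hypothetical count matrices and shows one way to mask low-coverage entries. The minimum-count threshold minCount is an assumption for illustration only, not a value used by the package or the paper.

```r
# Hypothetical allele-specific counts: rows = genes, columns = slices
a1 <- matrix(c(10, 0, 3,
                5, 0, 8), nrow = 2, byrow = TRUE,
             dimnames = list(c("geneA", "geneB"), paste0("slice", 1:3)))
a2 <- matrix(c(10, 0, 1,
               15, 2, 0), nrow = 2, byrow = TRUE,
             dimnames = dimnames(a1))

total <- a1 + a2        # total allelic counts per gene and slice
ratio <- a1 / total     # D. simulans (a1) fraction; NaN where total == 0

# Assumed filter: treat ratios based on fewer than minCount reads as missing
minCount <- 5
ratioFiltered <- ratio
ratioFiltered[total < minCount] <- NA
```

Ratios from only a handful of reads are very noisy, which is why some coverage filter of this kind is often applied before plotting; the right threshold depends on the data.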
We plot the ratio along the slices, using the normSlice column of the
sample metadata (colData). This is the original slice number, scaled
up to 27 (rep2 had 26 slices and rep4 had 25 slices).
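The exact formula behind normSlice is not given here. Assuming a simple linear rescaling, in which slice i of a replicate with n slices maps to i * 27 / n, the idea can be sketched as follows. The helper rescaleSlice and the formula itself are assumptions; inst/scripts/make-data.R is the place to check how normSlice was actually computed.

```r
# Assumed linear rescaling of slice indices onto a common 27-slice axis.
# rescaleSlice is a hypothetical helper; the package may compute normSlice differently.
rescaleSlice <- function(slice, nSlices, target = 27) {
  slice * target / nSlices
}

rescaleSlice(1:27, 27)  # a 27-slice replicate is unchanged
rescaleSlice(1:26, 26)  # e.g. rep2: the last slice maps to 27
rescaleSlice(1:25, 25)  # e.g. rep4: the last slice maps to 27
```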
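plotGene <- function(gene) {
  x <- se$normSlice                  # normalized slice position
  y <- assay(se, "ratio")[gene,]     # simulans / total ratio per sample
  col <- as.integer(factor(se$rep))  # color points by replicate
  plot(x, y, xlab="normSlice", ylab="sim / total ratio",
       ylim=c(0,1), main=gene, col=col)
  # loess smoother of the ratio along the slice axis, drawn in x order
  lw <- loess(y ~ x, data=data.frame(x, y=unname(y)))
  lines(sort(lw$x), lw$fitted[order(lw$x)], col="red", lwd=2)
  abline(h=0.5, col="grey")          # reference line: equal allelic expression
}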
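The red curve in plotGene is a loess fit of the ratio on normSlice, drawn in order of x. The sketch below isolates that smoothing step on simulated data so that it runs without the dataset; the simulated ratios and the NaN entry are hypothetical. In a standard R session, loess drops rows with missing values (the default na.action is na.omit), which is why the plotting code draws the line from lw$x rather than from the original x.

```r
set.seed(1)
# Hypothetical data: two replicates of 27 slices, with a ratio rising along the axis
x <- rep(1:27, times = 2)
y <- 0.2 + 0.5 * x / 27 + rnorm(length(x), sd = 0.03)
y[5] <- NaN   # e.g. a slice with zero total counts

# Same smoothing call as in plotGene(); the NaN row is dropped by na.omit
lw <- loess(y ~ x, data = data.frame(x, y))

# Fitted curve in increasing x order, as used by lines() in plotGene()
ord <- order(lw$x)
curve <- data.frame(x = lw$x[ord], fitted = lw$fitted[ord])
```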
An example of a gene with global bias toward the simulans allele.
plotGene("DOR")
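One simple way to summarise a global allelic bias like this is to pool counts across all slices and compute a single D. simulans fraction per gene. This is a descriptive sketch on hypothetical counts, not the statistical test used in the paper.

```r
# Hypothetical a1 (D. simulans) and a2 (D. melanogaster) counts across 4 slices
a1 <- rbind(biased = c(30, 28, 35, 32), balanced = c(10, 12, 9, 11))
a2 <- rbind(biased = c(10, 12,  9, 11), balanced = c(10, 12, 9, 11))

# Pooled simulans fraction per gene: sum of a1 over sum of (a1 + a2)
pooledRatio <- rowSums(a1) / rowSums(a1 + a2)
round(pooledRatio, 3)
```

On the real object, the analogous quantity would be rowSums(assay(se, "a1")) / rowSums(assay(se, "total")), with values well above 0.5 indicating a global bias toward the simulans allele.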
Examples of genes with spatial patterning of allelic expression:
plotGene("uif")
plotGene("bmm")
plotGene("hb")
plotGene("CG4500")
Other interesting spatial genes can be found by consulting the Combs
and Fraser (2018) paper, in Supplementary Figure 6 “Complete heatmap
of ASE for genes with svASE.” Other species-specific genes are found
in Supplementary Figure 7 “Genes with species-specific expression,
regardless of parent of origin.” Note that the SF6 spatially varying
ASE genes are labelled in mcols(se)$scASE.
As noted above, the file inst/scripts/make-data.R provides the script
that was used to construct the SummarizedExperiment object from the
data available on GEO. Here are some additional details:
(mcols(se)$matchDm557). When compiling the data, I
noticed that other genes, where the genomic locations of dm5.57 and
dm6 did not match, had missing allelic counts. The current
dataset is provided with respect to dm6.
mcols(se)$predicted.
Two gene symbol columns are provided in mcols(se). One is the SYMBOL that
matches the Ensembl gene_id according to org.Dm.eg.db. The other
is the symbol that I obtained from the per-sample FPKM matrices,
which have both Ensembl IDs and gene symbols. The paper_symbol column
is therefore better for matching genes according to their ID in the
paper figures.
sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] spatialDmelxsim_1.12.0 SummarizedExperiment_1.36.0
## [3] Biobase_2.66.0 GenomicRanges_1.58.0
## [5] GenomeInfoDb_1.42.0 IRanges_2.40.0
## [7] S4Vectors_0.44.0 MatrixGenerics_1.18.0
## [9] matrixStats_1.4.1 ExperimentHub_2.14.0
## [11] AnnotationHub_3.14.0 BiocFileCache_2.14.0
## [13] dbplyr_2.5.0 BiocGenerics_0.52.0
## [15] BiocStyle_2.34.0
##
## loaded via a namespace (and not attached):
## [1] KEGGREST_1.46.0 xfun_0.48 bslib_0.8.0
## [4] lattice_0.22-6 vctrs_0.6.5 tools_4.4.1
## [7] generics_0.1.3 curl_5.2.3 tibble_3.2.1
## [10] fansi_1.0.6 AnnotationDbi_1.68.0 RSQLite_2.3.7
## [13] highr_0.11 blob_1.2.4 pkgconfig_2.0.3
## [16] Matrix_1.7-1 lifecycle_1.0.4 GenomeInfoDbData_1.2.13
## [19] compiler_4.4.1 Biostrings_2.74.0 tinytex_0.53
## [22] htmltools_0.5.8.1 sass_0.4.9 yaml_2.3.10
## [25] pillar_1.9.0 crayon_1.5.3 jquerylib_0.1.4
## [28] DelayedArray_0.32.0 cachem_1.1.0 magick_2.8.5
## [31] abind_1.4-8 mime_0.12 tidyselect_1.2.1
## [34] digest_0.6.37 dplyr_1.1.4 purrr_1.0.2
## [37] bookdown_0.41 BiocVersion_3.20.0 grid_4.4.1
## [40] fastmap_1.2.0 SparseArray_1.6.0 cli_3.6.3
## [43] magrittr_2.0.3 S4Arrays_1.6.0 utf8_1.2.4
## [46] withr_3.0.2 filelock_1.0.3 UCSC.utils_1.2.0
## [49] rappdirs_0.3.3 bit64_4.5.2 rmarkdown_2.28
## [52] XVector_0.46.0 httr_1.4.7 bit_4.5.0
## [55] png_0.1-8 memoise_2.0.1 evaluate_1.0.1
## [58] knitr_1.48 rlang_1.1.4 Rcpp_1.0.13
## [61] glue_1.8.0 DBI_1.2.3 BiocManager_1.30.25
## [64] jsonlite_1.8.9 R6_2.5.1 zlibbioc_1.52.0