txdbmaker 1.2.1
The txdbmaker package provides functions to make TxDb
objects from genomic annotation provided by the UCSC Genome Browser
(https://genome.ucsc.edu/), Ensembl (https://ensembl.org/),
BioMart (http://www.biomart.org/), or directly from a GFF or GTF file.
In this document we will quickly demonstrate the use of these functions.
Note that the package also provides a lower-level utility, makeTxDb(),
for creating TxDb objects from data directly supplied by the user.
Please refer to its man page (?makeTxDb) for more information.
See the vignette in the GenomicFeatures package for an introduction to
TxDb objects.
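The lower-level makeTxDb() route mentioned above can be sketched as follows. This is a minimal illustration, not the only way to call the function: the coordinates are made-up toy values, and the column names follow the layout described in ?makeTxDb (a 'transcripts' data frame plus a 'splicings' data frame keyed by tx_id).

```r
suppressPackageStartupMessages(library(txdbmaker))

# Toy data (assumption: coordinates are invented for illustration only).
# One row per transcript.
transcripts <- data.frame(
    tx_id=1:3,
    tx_chrom="chr1",
    tx_strand=c("-", "+", "+"),
    tx_start=c(1, 2001, 2001),
    tx_end=c(999, 2199, 2199))

# One row per exon, linked to its transcript through tx_id.
# CDS columns are NA for exons without a coding part.
splicings <- data.frame(
    tx_id=c(1L, 2L, 2L, 2L, 3L, 3L),
    exon_rank=c(1, 1, 2, 3, 1, 2),
    exon_start=c(1, 2001, 2101, 2131, 2001, 2131),
    exon_end=c(999, 2085, 2144, 2199, 2085, 2199),
    cds_start=c(1, 2022, 2101, 2131, NA, NA),
    cds_end=c(999, 2085, 2144, 2193, NA, NA),
    cds_phase=c(0, 0, 2, 0, NA, NA))

toy_txdb <- makeTxDb(transcripts, splicings)

# transcripts() comes from GenomicFeatures, which txdbmaker attaches.
transcripts(toy_txdb)
```

Because no download is involved, this is also a convenient way to build a small TxDb for testing code that will later run on a full annotation.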
Installing the txdbmaker package
Install the package with:
if (!require("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("txdbmaker")
Then load it with:
suppressPackageStartupMessages(library(txdbmaker))
makeTxDbFromUCSC
The function makeTxDbFromUCSC downloads UCSC Genome Bioinformatics
transcript tables (e.g. knownGene, refGene, ensGene) for a genome build
(e.g. mm9, hg19). Use the supportedUCSCtables utility function to get the
list of tables known to work with makeTxDbFromUCSC.
supportedUCSCtables(genome="mm9")
##         tablename              track composite_track
## 1         acembly      AceView Genes            <NA>
## 2    augustusGene           AUGUSTUS            <NA>
## 3        ccdsGene               CCDS            <NA>
## 4         ensGene      Ensembl Genes            <NA>
## 5        exoniphy           Exoniphy            <NA>
## 6          geneid       Geneid Genes            <NA>
## 7         genscan      Genscan Genes            <NA>
## 8       knownGene         UCSC Genes            <NA>
## 9   knownGeneOld4     Old UCSC Genes            <NA>
## 10      nscanGene             N-SCAN            <NA>
## 11   pseudoYale60      Yale Pseudo60            <NA>
## 12        refGene       RefSeq Genes            <NA>
## 13        sgpGene          SGP Genes            <NA>
## 14  transcriptome      Transcriptome            <NA>
## 15 vegaPseudoGene   Vega Pseudogenes      Vega Genes
## 16       vegaGene Vega Protein Genes      Vega Genes
## 17    xenoRefGene       Other RefSeq            <NA>
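Since the result is an ordinary data frame with a tablename column, a table name can be checked before starting a long download. The sketch below uses a hypothetical helper, is_supported_table(), which is not part of txdbmaker, and an offline stand-in holding three rows copied from the output above so that it runs without network access.

```r
# Hypothetical helper (not part of txdbmaker): TRUE if `tablename` appears in
# the 'tablename' column of a supportedUCSCtables() result.
is_supported_table <- function(tables, tablename) {
    tablename %in% tables$tablename
}

# With network access one would use:
#   tables <- supportedUCSCtables(genome="mm9")
# Offline stand-in: three rows copied from the output shown above.
tables <- data.frame(
    tablename=c("ensGene", "knownGene", "refGene"),
    track=c("Ensembl Genes", "UCSC Genes", "RefSeq Genes"),
    composite_track=NA_character_)

is_supported_table(tables, "refGene")    # present in the stand-in table
is_supported_table(tables, "notATable")  # absent from the stand-in table
```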
mm9KG_txdb <- makeTxDbFromUCSC(genome="mm9", tablename="knownGene")
## Download the knownGene table ... OK
## Download the knownToLocusLink table ... OK
## Extract the 'transcripts' data frame ... OK
## Extract the 'splicings' data frame ... OK
## Download and preprocess the 'chrominfo' data frame ... OK
## Prepare the 'metadata' data frame ... OK
## Make the TxDb object ... OK
mm9KG_txdb
## TxDb object:
## # Db type: TxDb
## # Supporting package: GenomicFeatures
## # Data source: UCSC
## # Genome: mm9
## # Organism: Mus musculus
## # Taxonomy ID: 10090
## # UCSC Table: knownGene
## # UCSC Track: UCSC Genes
## # Resource URL: https://genome.ucsc.edu/
## # Type of Gene ID: Entrez Gene ID
## # Full dataset: yes
## # miRBase build ID: NA
## # Nb of transcripts: 55419
## # Db created by: txdbmaker package from Bioconductor
## # Creation time: 2024-11-25 20:10:13 -0500 (Mon, 25 Nov 2024)
## # txdbmaker version at creation time: 1.2.1
## # RSQLite version at creation time: 2.3.8
## # DBSCHEMAVERSION: 1.2
See ?makeTxDbFromUCSC for more information.
makeTxDbFromBiomart
Retrieve data from BioMart by specifying the mart and the data set to
the makeTxDbFromBiomart function (not all BioMart data sets are
currently supported):
mmusculusEnsembl <- makeTxDbFromBiomart(dataset="mmusculus_gene_ensembl")
As with the makeTxDbFromUCSC function, the makeTxDbFromBiomart function
also has a circ_seqs argument that will default to using the contents
of the DEFAULT_CIRC_SEQS vector. And just like those UCSC sources, there
is also a helper function called getChromInfoFromBiomart that can show
what the different chromosomes are called for a given source.
Using the makeTxDbFromBiomart and makeTxDbFromUCSC functions can take a
while and may also require some bandwidth, as these functions have to
download and then assemble a database from their respective sources.
Most users will not want to repeat this step every time. Instead, we
suggest that you save your annotation objects and label them with an
appropriate time stamp to facilitate reproducible research.
See ?makeTxDbFromBiomart for more information.
makeTxDbFromEnsembl
The makeTxDbFromEnsembl function creates a TxDb object for a given
organism by importing the genomic locations of its transcripts,
exons, CDS, and genes from an Ensembl database.
See ?makeTxDbFromEnsembl for more information.
makeTxDbFromGFF
You can also extract transcript information from either GFF3 or GTF
files by using the makeTxDbFromGFF function.
Usage is similar to makeTxDbFromBiomart and makeTxDbFromUCSC.
See ?makeTxDbFromGFF for more information.
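As a rough sketch of the GFF route, the example below writes a tiny GFF3 file to a temporary location and builds a TxDb from it. The file content is invented for illustration (one gene, one mRNA, two exons on a made-up chr1), and the format is passed explicitly rather than relying on detection from the file extension.

```r
suppressPackageStartupMessages(library(txdbmaker))

# Helper to build one tab-separated GFF3 record (illustrative only).
gff_record <- function(type, start, end, attrs) {
    paste("chr1", "example", type, start, end, ".", "+", ".", attrs,
          sep="\t")
}

# Made-up annotation: gene1 -> tx1 -> two exons.
gff_lines <- c(
    "##gff-version 3",
    gff_record("gene", "1000", "2000", "ID=gene1"),
    gff_record("mRNA", "1000", "2000", "ID=tx1;Parent=gene1"),
    gff_record("exon", "1000", "1200", "ID=exon1;Parent=tx1"),
    gff_record("exon", "1500", "2000", "ID=exon2;Parent=tx1"))

gff_file <- tempfile(fileext=".gff3")
writeLines(gff_lines, gff_file)

gff_txdb <- makeTxDbFromGFF(gff_file, format="gff3")

# transcripts() and exons() come from GenomicFeatures.
transcripts(gff_txdb)
exons(gff_txdb)
```

For a real annotation file, only the path (and possibly the format) would change; arguments such as organism or dataSource are described in ?makeTxDbFromGFF.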
Saving and loading a TxDb object
Once a TxDb object has been created, it can be saved
to avoid the time and bandwidth costs of recreating it and to make it
possible to reproduce results with identical genomic feature data at a
later date. Since TxDb objects are backed by a
SQLite database, the save format is a SQLite database file (which
could be accessed from programs other than R if desired). Note that
it is not possible to serialize a TxDb object using R’s save function.
saveDb(mm9KG_txdb, file="mm9KG_txdb.sqlite")
A saved TxDb object can then be restored from its .sqlite file by
simply using loadDb.
mm9KG_txdb <- loadDb("mm9KG_txdb.sqlite")
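The save and load steps can be tried without any download by using a toy TxDb. The sketch below builds a one-transcript object with makeTxDb() (made-up coordinates), saves it under a date-stamped file name in a temporary directory, and reloads it. The naming scheme is only a suggestion for keeping track of when an annotation was built.

```r
suppressPackageStartupMessages(library(txdbmaker))

# Toy TxDb with a single one-exon transcript (invented coordinates).
toy_txdb <- makeTxDb(
    transcripts=data.frame(tx_id=1L, tx_chrom="chr1", tx_strand="+",
                           tx_start=1L, tx_end=100L),
    splicings=data.frame(tx_id=1L, exon_rank=1L,
                         exon_start=1L, exon_end=100L))

# Date-stamped, unique file name (the pattern is an assumption, not a
# convention required by the package).
stamp <- format(Sys.Date(), "%Y%m%d")
db_file <- tempfile(pattern=paste0("toy_txdb_", stamp, "_"),
                    fileext=".sqlite")

saveDb(toy_txdb, file=db_file)   # writes the underlying SQLite database
reloaded <- loadDb(db_file)      # returns a TxDb backed by that file

# The reloaded object exposes the same transcripts as the original.
length(transcripts(reloaded)) == length(transcripts(toy_txdb))
```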
makeTxDbPackageFromUCSC and makeTxDbPackageFromBiomart
It is often much more convenient to make an annotation package
out of your annotations. If you find that this is the case,
consider the convenience functions makeTxDbPackageFromUCSC and
makeTxDbPackageFromBiomart. These functions are similar to
makeTxDbFromUCSC and makeTxDbFromBiomart, except that they take the
extra step of wrapping the database up into an annotation
package for you. This package can then be installed and used like one
of the standard TxDb packages found in the Bioconductor repository.
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] txdbmaker_1.2.1 GenomicFeatures_1.58.0 AnnotationDbi_1.68.0
## [4] Biobase_2.66.0 GenomicRanges_1.58.0 GenomeInfoDb_1.42.0
## [7] IRanges_2.40.0 S4Vectors_0.44.0 BiocGenerics_0.52.0
## [10] BiocStyle_2.34.0
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.1 dplyr_1.1.4
## [3] blob_1.2.4 filelock_1.0.3
## [5] Biostrings_2.74.0 bitops_1.0-9
## [7] fastmap_1.2.0 RCurl_1.98-1.16
## [9] BiocFileCache_2.14.0 GenomicAlignments_1.42.0
## [11] XML_3.99-0.17 digest_0.6.37
## [13] timechange_0.3.0 lifecycle_1.0.4
## [15] KEGGREST_1.46.0 RSQLite_2.3.8
## [17] magrittr_2.0.3 compiler_4.4.2
## [19] rlang_1.1.4 sass_0.4.9
## [21] progress_1.2.3 tools_4.4.2
## [23] utf8_1.2.4 yaml_2.3.10
## [25] rtracklayer_1.66.0 knitr_1.49
## [27] prettyunits_1.2.0 S4Arrays_1.6.0
## [29] bit_4.5.0 curl_6.0.1
## [31] DelayedArray_0.32.0 xml2_1.3.6
## [33] abind_1.4-8 BiocParallel_1.40.0
## [35] grid_4.4.2 fansi_1.0.6
## [37] biomaRt_2.62.0 SummarizedExperiment_1.36.0
## [39] cli_3.6.3 rmarkdown_2.29
## [41] crayon_1.5.3 generics_0.1.3
## [43] httr_1.4.7 rjson_0.2.23
## [45] DBI_1.2.3 cachem_1.1.0
## [47] stringr_1.5.1 zlibbioc_1.52.0
## [49] parallel_4.4.2 BiocManager_1.30.25
## [51] XVector_0.46.0 restfulr_0.0.15
## [53] matrixStats_1.4.1 vctrs_0.6.5
## [55] Matrix_1.7-1 jsonlite_1.8.9
## [57] bookdown_0.41 hms_1.1.3
## [59] bit64_4.5.2 jquerylib_0.1.4
## [61] glue_1.8.0 codetools_0.2-20
## [63] lubridate_1.9.3 stringi_1.8.4
## [65] BiocIO_1.16.0 UCSC.utils_1.2.0
## [67] tibble_3.2.1 pillar_1.9.0
## [69] rappdirs_0.3.3 htmltools_0.5.8.1
## [71] GenomeInfoDbData_1.2.13 R6_2.5.1
## [73] dbplyr_2.5.0 httr2_1.0.6
## [75] evaluate_1.0.1 lattice_0.22-6
## [77] RMariaDB_1.3.3 png_0.1-8
## [79] Rsamtools_2.22.0 memoise_2.0.1
## [81] bslib_0.8.0 SparseArray_1.6.0
## [83] xfun_0.49 MatrixGenerics_1.18.0
## [85] pkgconfig_2.0.3