Abstract

The basejump package is an infrastructure toolkit that extends the base functionality of Bioconductor (Huber et al. 2015). The package uses the S4 object system for object-oriented programming and defines additional generic functions for use in genomics research. basejump provides simple, user-friendly functions for acquiring genome annotations from multiple online databases, including native support for Ensembl (Hubbard et al. 2002) and for websites that serve standard FASTA and GTF/GFF files. Consistent handling of sample metadata remains a challenge in many bioinformatics analyses, and basejump aims to address this with a suite of sanitization functions that help standardize these variable inputs. In addition, interactive read/write operations in R can be cumbersome when working with multiple data objects, so the package provides functions designed for interactive use that aim to reduce friction and to handle common genomics file formats consistently.

Introduction

library(basejump)
library(SummarizedExperiment)
options(acid.test = TRUE)
data(RangedSummarizedExperiment, package = "acidtest")
rse <- RangedSummarizedExperiment

The S4 object system

The basejump package defines generics and methods for object-oriented programming with the S4 object system, which is used extensively by the Bioconductor project. These resources describe the S4 object system in detail:

In R, you can check whether a function is an S4 generic with isS4(), which returns TRUE for S4 objects, including generics created with setGeneric(). To explore S4 methods, use showMethods() to list the methods defined for a generic and getMethod() to retrieve the definition, including the source code, of a specific method. For functions that use the S3 object system, use methods() instead.
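The following is a minimal sketch of these introspection functions, using only the methods package that ships with R. The describe() generic and its numeric method are hypothetical, defined here purely for illustration; they are not part of basejump. The commented results assume a fresh R session.

```r
library(methods)

# Hypothetical toy generic and method, defined only for this illustration.
setGeneric("describe", function(object, ...) standardGeneric("describe"))
setMethod("describe", "numeric", function(object, ...) {
    paste("numeric vector of length", length(object))
})

# S4 generics are S4 objects; ordinary S3 generics such as print() are not.
isS4(describe)  # TRUE
isS4(print)     # FALSE

# List the methods registered for the S4 generic.
showMethods("describe")

# Retrieve one method definition, including its source code.
getMethod("describe", "numeric")

# Dispatch picks the method that matches the class of the argument.
describe(c(1, 2, 3))  # "numeric vector of length 3"

# List the methods registered for an S3 generic.
methods("print")
```

The same calls can be applied to generics exported by basejump or other Bioconductor packages once those packages are attached.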

R session information

utils::sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-redhat-linux-gnu (64-bit)
## Running under: Red Hat Enterprise Linux Server 7.7 (Maipo)
## 
## Matrix products: default
## BLAS/LAPACK: /usr/lib64/libopenblas-r0.3.3.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] SummarizedExperiment_1.14.1 DelayedArray_0.10.0        
##  [3] BiocParallel_1.18.1         matrixStats_0.55.0         
##  [5] Biobase_2.44.0              GenomicRanges_1.36.1       
##  [7] GenomeInfoDb_1.20.0         IRanges_2.18.2             
##  [9] S4Vectors_0.22.0            BiocGenerics_0.30.0        
## [11] basejump_0.11.14            BiocStyle_2.12.0           
## 
## loaded via a namespace (and not attached):
##  [1] ProtGenerics_1.16.0           bitops_1.0-6                 
##  [3] fs_1.3.1                      bit64_0.9-7                  
##  [5] progress_1.2.2                httr_1.4.1                   
##  [7] rprojroot_1.3-2               syntactic_0.2.5              
##  [9] tools_3.6.1                   backports_1.1.4              
## [11] R6_2.4.0                      colorspace_1.4-1             
## [13] lazyeval_0.2.2                DBI_1.0.0                    
## [15] withr_2.1.2                   tidyselect_0.2.5             
## [17] prettyunits_1.0.2             bit_1.1-14                   
## [19] curl_4.0                      compiler_3.6.1               
## [21] cli_1.1.0                     desc_1.2.0                   
## [23] rtracklayer_1.44.4            bookdown_0.13                
## [25] scales_1.0.0                  rappdirs_0.3.1               
## [27] pkgdown_1.4.0                 stringr_1.4.0                
## [29] digest_0.6.20                 Rsamtools_2.0.0              
## [31] rmarkdown_1.15                R.utils_2.9.0                
## [33] XVector_0.24.0                pkgconfig_2.0.2              
## [35] htmltools_0.3.6               sessioninfo_1.1.1            
## [37] ensembldb_2.8.0               dbplyr_1.4.2                 
## [39] rlang_0.4.0                   RSQLite_2.1.2                
## [41] shiny_1.3.2                   DelayedMatrixStats_1.6.0     
## [43] dplyr_0.8.3                   R.oo_1.22.0                  
## [45] RCurl_1.95-4.12               magrittr_1.5                 
## [47] GenomeInfoDbData_1.2.1        Matrix_1.2-17                
## [49] munsell_0.5.0                 Rcpp_1.0.2                   
## [51] R.methodsS3_1.7.1             stringi_1.4.3                
## [53] yaml_2.2.0                    MASS_7.3-51.4                
## [55] zlibbioc_1.30.0               brio_0.3.7                   
## [57] goalie_0.3.7                  BiocFileCache_1.8.0          
## [59] AnnotationHub_2.16.1          grid_3.6.1                   
## [61] blob_1.2.0                    transformer_0.2.6            
## [63] promises_1.0.1                crayon_1.3.4                 
## [65] lattice_0.20-38               Biostrings_2.52.0            
## [67] bioverbs_0.2.9                GenomicFeatures_1.36.4       
## [69] hms_0.5.1                     zeallot_0.1.0                
## [71] knitr_1.24                    pillar_1.4.2                 
## [73] biomaRt_2.40.4                XML_3.98-1.20                
## [75] glue_1.3.1                    evaluate_0.14                
## [77] freerange_0.2.5               data.table_1.12.2            
## [79] BiocManager_1.30.4            vctrs_0.2.0                  
## [81] httpuv_1.5.1                  grr_0.9.5                    
## [83] purrr_0.3.2                   assertthat_0.2.1             
## [85] xfun_0.9                      mime_0.7                     
## [87] xtable_1.8-4                  AnnotationFilter_1.8.0       
## [89] later_0.8.0                   SingleCellExperiment_1.6.0   
## [91] tibble_2.1.3                  GenomicAlignments_1.20.1     
## [93] Matrix.utils_0.9.7            AnnotationDbi_1.46.1         
## [95] memoise_1.1.0                 interactiveDisplayBase_1.22.0

References

The papers and software cited in our workflows are available as a shared library on Paperpile.

Hubbard, T, D Barker, E Birney, G Cameron, Y Chen, L Clark, T Cox, et al. 2002. “The Ensembl Genome Database Project.” Nucleic Acids Res. 30 (1) (January): 38–41. https://www.ncbi.nlm.nih.gov/pubmed/11752248.

Huber, Wolfgang, Vincent J Carey, Robert Gentleman, Simon Anders, Marc Carlson, Benilton S Carvalho, Hector Corrada Bravo, et al. 2015. “Orchestrating High-Throughput Genomic Analysis with Bioconductor.” Nat. Methods 12 (2) (February): 115–121. doi:10.1038/nmeth.3252. http://dx.doi.org/10.1038/nmeth.3252.