This function is a utility wrapper for SummarizedExperiment that provides automatic subsetting for row and column data, as well as automatic handling of transgenes and spike-ins. Additionally, it improves upon the standard constructor by slotting useful session information into the metadata slot:

makeSummarizedExperiment(assays, rowRanges = NULL, rowData = NULL,
  colData = NULL, metadata = NULL, transgeneNames = NULL,
  spikeNames = NULL)

Arguments

assays

list. Count matrices, which must have matching dimensions. Counts can be passed in as either a dense matrix (matrix) or sparse matrix (sparseMatrix).

rowRanges

GRanges. Genomic ranges (e.g. genome annotations). Metadata describing the assay rows.

rowData

DataFrame. Metadata describing the assay rows, if genomic ranges are not available. Use rowRanges (GRanges) instead, if possible.

colData

DataFrame. Metadata describing the assay columns. For bulk RNA-seq, this data describes the samples. For single-cell RNA-seq, this data describes the cells.

metadata

list. Metadata.

transgeneNames

character. Vector indicating which assay rows denote transgenes (e.g. EGFP, TDTOMATO).

spikeNames

character. Vector indicating which assay rows denote spike-in sequences (e.g. ERCCs).

Value

  • Providing rowRanges: RangedSummarizedExperiment.

  • Providing rowData: SummarizedExperiment.

Details

  • date: Today's date, returned from Sys.Date.

  • wd: Working directory, returned from getwd.

  • sessionInfo: sessioninfo::session_info() return.

Note

Column and rows always return sorted alphabetically.

See also

Examples

library(IRanges) library(GenomicRanges) library(SummarizedExperiment) ## Rows (genes) genes <- c( "EGFP", # transgene paste0("gene", seq_len(3L)) ) print(genes)
#> [1] "EGFP" "gene1" "gene2" "gene3"
## Columns (samples) samples <- paste0("sample", seq_len(4L)) print(samples)
#> [1] "sample1" "sample2" "sample3" "sample4"
## Counts (assay) counts <- matrix( data = seq_len(16L), nrow = 4L, ncol = 4L, dimnames = list(genes, samples) ) ## Primary assay must be named "counts". assays <- list(counts = counts) print(assays)
#> $counts #> sample1 sample2 sample3 sample4 #> EGFP 1 5 9 13 #> gene1 2 6 10 14 #> gene2 3 7 11 15 #> gene3 4 8 12 16 #>
## Row data (genomic ranges) rowRanges <- GRanges( seqnames = rep("1", times = 3L), ranges = IRanges( start = seq(from = 1L, to = 201L, by = 100L), end = seq(from = 100L, to = 300L, by = 100L) ) ) ## Note that we haven't defined the transgene here. ## It will be handled automatically in the function call. names(rowRanges) <- paste0("gene", seq_len(3L)) print(rowRanges)
#> GRanges object with 3 ranges and 0 metadata columns: #> seqnames ranges strand #> <Rle> <IRanges> <Rle> #> gene1 1 1-100 * #> gene2 1 101-200 * #> gene3 1 201-300 * #> ------- #> seqinfo: 1 sequence from an unspecified genome; no seqlengths
## Column data colData <- S4Vectors::DataFrame( genotype = rep(c("wildtype", "knockout"), times = 1L, each = 2L), age = rep(c(3L, 6L), 2L), row.names = samples ) print(colData)
#> DataFrame with 4 rows and 2 columns #> genotype age #> <character> <integer> #> sample1 wildtype 3 #> sample2 wildtype 6 #> sample3 knockout 3 #> sample4 knockout 6
## Note that this returns with the dimnames sorted. x <- makeSummarizedExperiment( assays = assays, rowRanges = rowRanges, colData = colData, transgeneNames = "EGFP" ) print(x)
#> class: RangedSummarizedExperiment #> dim: 4 4 #> metadata(3): date wd sessionInfo #> assays(1): counts #> rownames(4): EGFP gene1 gene2 gene3 #> rowData names(0): #> colnames(4): sample1 sample2 sample3 sample4 #> colData names(2): genotype age