S7 class for single-cell phylotranscriptomic expression data. This class stores expression matrices and metadata, with support for dimensional reductions and pseudobulking functionality.
Usage
ScPhyloExpressionSet(
strata = stop("@strata is required"),
strata_values = stop("@strata_values is required"),
expression = stop("@expression is required"),
groups = stop("@groups is required"),
name = "Phylo Expression Set",
species = character(0),
index_type = "TXI",
identities_label = "Identities",
null_conservation_sample_size = 5000L,
.null_conservation_txis = NULL,
.pseudobulk_cache = list(),
.TXI_sample = numeric(0),
metadata = NULL,
selected_idents = character(0),
idents_colours = list(),
reductions = list()
)
Arguments
- strata
Factor vector of phylostratum assignments for each gene
- strata_values
Numeric vector of phylostratum values used in TXI calculations
- expression
Sparse or dense matrix of expression counts with genes as rows and cells as columns
- groups
Factor vector indicating which identity each cell belongs to (derived from selected_idents column in metadata)
- name
Character string naming the dataset (default: "Phylo Expression Set")
- species
Character string specifying the species (default: character(0))
- index_type
Character string specifying the transcriptomic index type (default: "TXI")
- identities_label
Character string labeling the identities (default: "Identities")
- null_conservation_sample_size
Numeric value for null conservation sample size (default: 5000)
- .null_conservation_txis
Precomputed null conservation TXI values (default: NULL)
- .pseudobulk_cache
Internal cache for pseudobulked expression matrices by different groupings
- .TXI_sample
Internal storage for computed TXI values
- metadata
Data frame with cell metadata, where rownames correspond to cell IDs and columns contain cell attributes
- selected_idents
Character string specifying which metadata column is currently used for grouping cells
- idents_colours
List of named character vectors specifying colors for each identity level, organized by metadata column name
- reductions
List of dimensional reduction matrices (PCA, UMAP, etc.) with cells as rows and dimensions as columns
Details
The ScPhyloExpressionSet class provides a comprehensive framework for single-cell phylotranscriptomic analysis. Key features include:
Identity Management:
The selected_idents property determines which metadata column is used for grouping cells. When changed, it automatically updates the groups property and invalidates cached pseudobulk data to ensure consistency.
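The identity-switching behaviour described above can be sketched in base R. This is only an illustration under stated assumptions: `state` is a plain list standing in for the object, `set_idents()` is a hypothetical helper, and clearing the whole cache is an assumption, not the class's actual S7 setter.

```r
# Toy cell metadata (hypothetical values); rownames are cell IDs.
metadata <- data.frame(
  cell_type = factor(c("neuron", "neuron", "glia", "glia")),
  stage     = factor(c("early", "late", "early", "late")),
  row.names = paste0("cell", 1:4)
)

# Plain-list stand-in for the object state (assumption, not the real class).
state <- list(
  selected_idents  = "cell_type",
  groups           = setNames(metadata$cell_type, rownames(metadata)),
  pseudobulk_cache = list(cell_type = "previously cached matrix")
)

# Hypothetical helper: switch the grouping column, re-derive the groups,
# and drop cached pseudobulk data so it cannot go stale.
set_idents <- function(state, metadata, column) {
  stopifnot(column %in% names(metadata), is.factor(metadata[[column]]))
  state$selected_idents  <- column
  state$groups           <- setNames(metadata[[column]], rownames(metadata))
  state$pseudobulk_cache <- list()
  state
}

state <- set_idents(state, metadata, "stage")
```

After the call, `state$groups` holds the `stage` factor named by cell ID, and the cache is empty.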
Dimensional Reductions:
The reductions property stores pre-computed dimensional reductions (PCA, UMAP, etc.). If not provided during construction from Seurat objects, basic PCA and UMAP are computed automatically.
Color Management:
The idents_colours property allows custom color schemes for different metadata columns, ensuring consistent visualization across plots.
Computed Properties:
Several properties are computed automatically when accessed:
- available_idents: Character vector of factor columns in metadata that can be used for grouping (automatically detected from metadata)
- expression_collapsed: Matrix of pseudobulked expression data (genes x identities), created by summing expression within each identity group
- TXI_sample: Named numeric vector of TXI (transcriptomic index) values for each cell, computed using an efficient C++ implementation
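As a rough illustration of what these computed properties represent, the sketch below reproduces them in base R on a toy dense matrix. It assumes the standard expression-weighted mean definition of a transcriptomic index. The data are invented, and the package itself uses a C++ implementation that may differ in detail.

```r
# Toy inputs (hypothetical): 3 genes x 4 cells, one phylostratum value per gene.
strata_values <- c(1, 2, 3)
expression <- matrix(
  c(2, 0, 1, 1,
    0, 2, 1, 1,
    0, 0, 2, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)
groups <- factor(c("A", "A", "B", "B"))

# Pseudobulk: sum expression within each identity group.
# rowsum() groups rows, so transpose to cells x genes and back.
expression_collapsed <- t(rowsum(t(expression), group = groups))

# Per-cell index: expression-weighted mean of the phylostratum values.
# The vector recycles down each column, so row i is scaled by strata_values[i].
TXI_sample <- colSums(expression * strata_values) / colSums(expression)
# cell1 = 1, cell2 = 2, cell3 = 2.25, cell4 = 1.5
```

The result `expression_collapsed` is a genes x identities matrix with columns `A` and `B`, matching the layout described above.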
Inherited computed properties from PhyloExpressionSetBase include:
- gene_ids: Character vector of gene identifiers
- identities: Character vector of identity labels
- sample_names: Character vector of sample names (cell IDs)
- num_identities: Integer count of unique cell types/identities
- num_samples: Integer count of total cells
- num_genes: Integer count of genes
- num_strata: Integer count of phylostrata
- index_full_name: Full name of the transcriptomic index type
- group_map: List mapping identity names to cell IDs
- TXI: Numeric vector of TXI values for each identity (computed from pseudobulked expression)
- null_conservation_txis: Matrix of null conservation TXI values for statistical testing
These properties use lazy evaluation and caching for optimal performance.
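To make the lazy evaluation and caching concrete, here is a minimal base R sketch of a compute-on-first-access pattern, together with toy versions of group_map and the identity-level TXI. The environment-based cache and the `get_pseudobulk()` helper are assumptions chosen for illustration. They are not the package's internal mechanism.

```r
# Toy inputs (hypothetical): 3 genes x 4 cells, one phylostratum value per gene.
strata_values <- c(1, 2, 3)
expression <- matrix(
  c(2, 0, 1, 1,
    0, 2, 1, 1,
    0, 0, 2, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)
groups <- factor(c("A", "A", "B", "B"))

# group_map analogue: identity name -> cell IDs.
group_map <- split(colnames(expression), groups)

# Lazy, cached pseudobulk: computed on first access, reused afterwards.
cache <- new.env()
n_computed <- 0
get_pseudobulk <- function() {
  if (is.null(cache$pseudobulk)) {
    n_computed <<- n_computed + 1
    cache$pseudobulk <- t(rowsum(t(expression), group = groups))
  }
  cache$pseudobulk
}

# Identity-level index from the pseudobulked matrix (weighted mean of strata).
pb <- get_pseudobulk()
TXI <- colSums(pb * strata_values) / colSums(pb)
invisible(get_pseudobulk())  # second access hits the cache, no recomputation
```

In this toy case the pseudobulk is computed once even though it is requested twice, which is the behaviour the caching is meant to provide.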
Examples
if (FALSE) { # \dontrun{
# Create from Seurat object
sc_set <- ScPhyloExpressionSet_from_seurat(seurat_obj, strata)
# Switch to different cell grouping
sc_set@selected_idents <- "development_stage"
# Access pseudobulked data (computed automatically)
pseudobulk <- sc_set@expression_collapsed
# Access TXI values for each cell
txi_values <- sc_set@TXI_sample
} # }