Pathway Activity Analysis Pipeline
This page documents the gene set activity analysis pipeline. See Methods for the manuscript prose.
Method
- Framework: Univariate Linear Model (ULM) in the decoupleR R package
- Input: Wald z-scores from DESeq2 (log₂FC / SE)
- Pre-filtering: None; all genes with non-missing statistics are included to preserve the full background distribution of gene-level statistics
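The input rule above can be sketched in Python. DESeq2 already reports this ratio in the stat column of its results table, so in practice that column is used directly. This sketch recomputes it from hypothetical (log₂FC, SE) pairs only to make the definition and the missing-value rule explicit. The function name and dictionary input format are illustrative assumptions, not part of the pipeline.

```python
import math


def wald_statistics(results):
    """Return {gene: log2FC / SE} for every gene with usable statistics.

    results: hypothetical mapping gene -> (log2_fold_change, lfc_se),
    standing in for a DESeq2 results table. Genes whose estimate or
    standard error is missing (None or NaN) are dropped; no other
    filtering is applied, mirroring the "no pre-filtering" rule.
    """
    stats = {}
    for gene, (lfc, se) in results.items():
        # Check for None before NaN, since math.isnan(None) raises TypeError.
        if lfc is None or se is None:
            continue
        if math.isnan(lfc) or math.isnan(se):
            continue
        stats[gene] = lfc / se  # Wald statistic, treated as a z-score
    return stats
```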
Gene Sets
- Source: MSigDB v2026.1 via OmniPath (downloaded January 2026)
- Collections: Hallmark, KEGG, Reactome, WikiPathways, BioCarta, Gene Ontology Biological Process
- Size filter: gene sets retained only if 10–400 of their member genes overlap the tested genes
- Total tested: 7,774 gene sets
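A minimal sketch of the size filter follows. It assumes that "overlapping" means membership among the genes that have a non-missing statistic, and that both bounds are inclusive. The function name and input format are hypothetical.

```python
def filter_gene_sets(gene_sets, universe, min_size=10, max_size=400):
    """Keep gene sets whose overlap with the tested genes lies in [min_size, max_size].

    gene_sets: hypothetical dict name -> iterable of gene symbols
    universe:  set of genes with non-missing statistics
    Returns dict name -> set of overlapping genes (the members used in the model).
    """
    kept = {}
    for name, genes in gene_sets.items():
        overlap = set(genes) & universe
        # Assumption: both bounds inclusive.
        if min_size <= len(overlap) <= max_size:
            kept[name] = overlap
    return kept
```

Filtering on the overlap rather than on the nominal gene-set size means that a large set with few measured genes is dropped, since its score would rest on too few observations.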
Model
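ULM fits a separate linear model for each gene set, regressing gene-level statistics on a gene-set membership indicator. The t-statistic of the fitted slope is the pathway activity score: positive scores mean member genes have higher statistics than non-members (net up-regulation), and negative scores mean the reverse.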
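A minimal sketch of the ULM score for one gene set, assuming unweighted (0/1) membership as is natural for MSigDB gene sets. It fits an ordinary least-squares line of each gene's Wald statistic on its membership indicator and returns the slope's t-statistic as the activity score. The p-value uses a normal approximation to the t distribution with n − 2 degrees of freedom; this is an assumption of the sketch, chosen to stay within the standard library, and with thousands of genes the two are practically indistinguishable. The function name and dictionary inputs are hypothetical and are not the decoupleR API.

```python
import math


def ulm_score(stats, members):
    """Fit one univariate linear model for a single gene set.

    stats:   dict gene -> Wald statistic (the response variable)
    members: set of genes in the gene set (0/1 membership covariate)
    Returns (t, p): the slope's t-statistic (activity score) and a two-sided p-value.
    """
    genes = list(stats)
    y = [stats[g] for g in genes]
    x = [1.0 if g in members else 0.0 for g in genes]
    n = len(y)

    # Ordinary least squares for y = b0 + b1 * x.
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x

    # Standard error of the slope from the residual variance (n - 2 df).
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(rss / (n - 2) / sxx)
    t = b1 / se_b1

    # Assumption of this sketch: two-sided p-value from a standard normal,
    # i.e. 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)); adequate for large n.
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p
```

Because the covariate is 0/1, the slope b1 equals the mean statistic of member genes minus that of non-members, and its t-statistic is identical to a pooled-variance two-sample t-test comparing the two groups.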
Significance
- P-values from each gene set's ULM fit adjusted with the Benjamini–Hochberg false discovery rate (FDR) procedure
- Significance threshold: FDR < 0.05
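The adjustment can be sketched as the standard Benjamini–Hochberg step-up procedure, which gives the same values as R's p.adjust(method = "BH"). The source does not state whether p-values are adjusted across all gene sets at once or separately per cell population, so this sketch simply adjusts whatever list it receives. The function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    # Indices of p-values sorted ascending; rank 1 is the smallest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, taking a cumulative minimum of
    # p * m / rank so that adjusted values are monotone in the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Gene sets whose adjusted p-value falls below 0.05 would then be reported as significant.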
Exclusions
- Dendritic cell populations excluded because they had too few cells or pseudobulk profiles