Pathway Activity Analysis Pipeline

This page documents the gene set activity analysis pipeline. See Methods for the manuscript prose.

Method

  • Framework: Univariate Linear Model (ULM) in the decoupler R package
  • Input: Wald z-scores from DESeq2 (log₂FC / SE)
  • Pre-filtering: None; all genes with non-missing statistics are included, so the full null distribution is preserved
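
The input preparation described above can be sketched as follows. This is a minimal illustration in plain Python, not the pipeline's actual code: the `wald_stat` helper, the `results` table, and the gene names are hypothetical, and treating a zero or NaN standard error as "missing" is an assumption.

```python
import math

def wald_stat(log2fc, se):
    """Return log2FC / SE, or None when the statistic is undefined (assumed rule)."""
    if log2fc is None or se is None:
        return None
    if math.isnan(log2fc) or math.isnan(se) or se == 0:
        return None
    return log2fc / se

# Hypothetical per-gene results: gene -> (log2 fold change, standard error)
results = {
    "GENE_A": (1.5, 0.5),
    "GENE_B": (-0.5, 0.25),
    "GENE_C": (float("nan"), float("nan")),  # statistic not estimable
}

# Keep every gene with a defined statistic; no significance or effect-size cutoff.
gene_stats = {}
for gene, (lfc, se) in results.items():
    z = wald_stat(lfc, se)
    if z is not None:
        gene_stats[gene] = z
```

Only genes without a usable statistic are dropped, so genes with small or non-significant effects stay in the background against which each gene set is compared.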

Gene Sets

  • Source: MSigDB v2026.1 via OmniPath (downloaded January 2026)
  • Collections: Hallmark, KEGG, Reactome, WikiPathways, BioCarta, Gene Ontology Biological Process
  • Size filter: 10–400 overlapping genes per gene set
  • Total tested: 7,774 gene sets
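
A size filter of this kind can be sketched as below. The function name `filter_gene_sets` and the toy data are hypothetical; the sketch assumes that both bounds are inclusive and that "overlapping" means the intersection of a gene set with the genes that have a statistic.

```python
def filter_gene_sets(gene_sets, measured_genes, min_size=10, max_size=400):
    """Keep gene sets whose overlap with the measured genes is within the size bounds."""
    measured = set(measured_genes)
    kept = {}
    for name, members in gene_sets.items():
        overlap = set(members) & measured
        # Bounds are assumed to be inclusive on both ends.
        if min_size <= len(overlap) <= max_size:
            kept[name] = overlap
    return kept

# Toy data: 1,000 measured genes and four hypothetical gene sets.
measured_genes = [f"G{i}" for i in range(1000)]
gene_sets = {
    "SET_SMALL": [f"G{i}" for i in range(5)],            # 5 overlapping genes
    "SET_OK": [f"G{i}" for i in range(50)],              # 50 overlapping genes
    "SET_LARGE": [f"G{i}" for i in range(600)],          # 600 overlapping genes
    "SET_MOSTLY_UNMEASURED": [f"G{i}" for i in range(995, 1100)],  # 105 genes, 5 measured
}

kept = filter_gene_sets(gene_sets, measured_genes)
```

The last toy set shows why the filter counts overlapping genes rather than nominal set size: a set with 105 annotated genes is still dropped when only 5 of them were measured.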

Model

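ULM fits one linear model per gene set, regressing the gene-level statistics on gene-set membership. The t-statistic of the fitted slope serves as the pathway activity score: positive values indicate that member genes tend to have higher statistics than the background, negative values the opposite.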
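
To make the regression concrete, here is a minimal sketch of the score for a single gene set, written in plain Python rather than with the decoupler API. It assumes unweighted membership (1 for members, 0 for all other genes); the function name `ulm_score` and the toy statistics are hypothetical.

```python
import math

def ulm_score(gene_stats, members):
    """t-statistic of the slope when regressing gene statistics on 0/1 membership."""
    genes = list(gene_stats)
    n = len(genes)
    y = [gene_stats[g] for g in genes]
    x = [1.0 if g in members else 0.0 for g in genes]  # assumed unweighted membership

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if n < 3 or sxx == 0:
        raise ValueError("need at least 3 genes and both members and non-members")

    slope = sxy / sxx
    rss = syy - slope * sxy  # residual sum of squares of the fitted line
    if rss <= 0:
        raise ValueError("perfect fit; t-statistic undefined")
    slope_se = math.sqrt(rss / (n - 2) / sxx)
    return slope / slope_se

# Toy statistics: three member genes with high values, five background genes.
gene_stats = {"G1": 2.0, "G2": 3.0, "G3": 4.0,
              "G4": 0.0, "G5": 1.0, "G6": -1.0, "G7": 0.0, "G8": 0.0}
score_up = ulm_score(gene_stats, {"G1", "G2", "G3"})
score_down = ulm_score({g: -v for g, v in gene_stats.items()}, {"G1", "G2", "G3"})
```

With a 0/1 predictor, the slope equals the difference between the mean statistic of member genes and that of all other genes, so the score behaves like a pooled-variance two-sample t-statistic. A p-value would follow from the t distribution with n − 2 degrees of freedom; that step is omitted here to keep the sketch dependency-free.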

Significance

  • Model-derived p-values corrected via Benjamini–Hochberg FDR
  • Significance threshold: FDR < 0.05
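
The correction and threshold above can be sketched with a small Benjamini–Hochberg implementation. The p-values and set names are made up for illustration, and the sketch assumes the correction is applied across all gene sets tested in one comparison; the page does not state the exact grouping.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical model-derived p-values for four gene sets.
pvals = {"SET_A": 0.001, "SET_B": 0.02, "SET_C": 0.03, "SET_D": 0.5}
names = list(pvals)
fdr = dict(zip(names, benjamini_hochberg([pvals[n] for n in names])))

# Apply the significance threshold FDR < 0.05.
significant = [n for n in names if fdr[n] < 0.05]
```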

Exclusions

  • Dendritic cell populations were excluded because they had too few cells or pseudobulk profiles