Differential Gene Expression Analysis Pipeline
This page documents the DESeq2-based DEG pipeline used in this study. See Methods for the manuscript prose.
Pipeline Steps
1. Pseudobulk Aggregation
- Raw counts summed per donor–cell-type pair
- Donor–cell-type pairs with fewer than 10 cells discarded
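The aggregation rule above can be sketched as follows. This is a minimal illustration, not the study's code: the input layout (one `(donor, cell_type, counts)` tuple per cell) and the function name `pseudobulk` are assumptions made for the example.

```python
from collections import defaultdict

MIN_CELLS = 10  # minimum group size stated in step 1


def pseudobulk(cells, min_cells=MIN_CELLS):
    """Sum raw counts per (donor, cell_type) and drop small groups.

    `cells` is an iterable of (donor, cell_type, counts) tuples, where
    `counts` maps gene name -> raw integer count for a single cell.
    (Hypothetical input layout, chosen for illustration.)
    """
    sums = defaultdict(lambda: defaultdict(int))
    n_cells = defaultdict(int)
    for donor, cell_type, counts in cells:
        key = (donor, cell_type)
        n_cells[key] += 1
        group = sums[key]  # create the group even if this cell has no counts
        for gene, count in counts.items():
            group[gene] += count
    # Keep only groups that contain at least `min_cells` cells.
    return {
        key: dict(genes)
        for key, genes in sums.items()
        if n_cells[key] >= min_cells
    }
```

A group with exactly 10 cells is retained; one with 9 is dropped.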
2. Gene Filtering
- Genes detected (non-zero) in < 20% of pseudobulk samples removed
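A minimal sketch of the detection filter, assuming pseudobulk counts are held as a nested mapping of sample to gene to count (a hypothetical layout; `filter_genes` is an illustrative name):

```python
def filter_genes(pseudobulk_counts, min_fraction=0.20):
    """Return the genes detected (count > 0) in at least `min_fraction`
    of pseudobulk samples; genes below that fraction are removed.

    `pseudobulk_counts` maps sample id -> {gene: count}.
    """
    n_samples = len(pseudobulk_counts)
    if n_samples == 0:
        return []
    detected = {}
    for counts in pseudobulk_counts.values():
        for gene, count in counts.items():
            detected.setdefault(gene, 0)
            if count > 0:
                detected[gene] += 1
    return sorted(
        gene for gene, n in detected.items() if n / n_samples >= min_fraction
    )
```

Whether the filter is evaluated across all samples or within each cell type is not stated in step 2; the sketch simply applies it to whatever set of samples it is given.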
3. Batch Correction
- ComBat-seq (sva package) applied separately per cell type
- Covariates: sex, age, T2D status, ancestry
- Continuous covariates standardized prior to modeling
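Standardization of a continuous covariate (age is the only obviously continuous one listed) might look like the sketch below. It assumes z-scoring with the sample standard deviation; the page does not say which scaling was used, and ComBat-seq itself (an R function in sva) is not reproduced here.

```python
from statistics import mean, stdev


def standardize(values):
    """Z-score a continuous covariate: subtract the mean, divide by the SD.

    Uses the sample SD (n - 1 denominator); this choice is an assumption.
    """
    mu = mean(values)
    sd = stdev(values)
    return [(v - mu) / sd for v in values]


ages = [40, 50, 60]          # illustrative values, not study data
scaled = standardize(ages)   # centered on 0 with unit sample SD
```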
4. Model Design
Corrected counts modeled as:
~ sex + age + ancestry × T2D
5. Dispersion Estimation
- DESeq with fitType = "local" (local regression)
6. Contrast Strategy
- Primary: ancestry-specific contrasts (e.g., T2D vs healthy within each ancestry group)
- No overall T2D coefficient reported: when effects differ between ancestries, an overall coefficient is a sample-size-weighted average that can be skewed by high-magnitude signals in a subset of ancestries
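Under an interaction design, the T2D effect within one ancestry group is a linear combination of coefficients: the T2D main effect (the effect in the reference ancestry) plus that group's interaction term. The sketch below builds such a contrast vector. The coefficient names and the ancestry labels "A", "B", "C" are hypothetical placeholders; the names DESeq2 actually generates depend on the factor levels used.

```python
def ancestry_contrast(coef_names, ancestry, reference, t2d_term="T2D"):
    """Numeric contrast selecting the T2D effect within one ancestry group.

    Assumes treatment coding with `reference` as the baseline ancestry and
    interaction terms named "ancestry<level>:<t2d_term>" (hypothetical naming).
    """
    weights = {name: 0.0 for name in coef_names}
    if t2d_term not in weights:
        raise KeyError(t2d_term)
    weights[t2d_term] = 1.0  # main effect = T2D effect in the reference group
    if ancestry != reference:
        interaction = f"ancestry{ancestry}:{t2d_term}"
        if interaction not in weights:
            raise KeyError(interaction)
        weights[interaction] = 1.0  # add the group-specific deviation
    return [weights[name] for name in coef_names]


def apply_contrast(contrast, coefficients):
    """Combine fitted coefficients (log2 scale) into one contrast estimate."""
    return sum(w * b for w, b in zip(contrast, coefficients))
```

The standard error of such a combination also depends on the covariance between the coefficients, which this sketch does not compute.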
7. Significance Thresholds
- Adjusted P value < 0.05 (Benjamini–Hochberg)
- Absolute log₂ fold-change ≥ 0.5
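The two thresholds combine as a logical AND. A minimal sketch, with a plain Benjamini–Hochberg adjustment written out for clarity; the `(gene, log2fc, pvalue)` tuple layout is an assumption, and DESeq2's own independent filtering step is not modeled here.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that adjusted
    # values never increase as the rank decreases.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def significant(results, alpha=0.05, min_abs_lfc=0.5):
    """Genes with adjusted P < alpha and |log2FC| >= min_abs_lfc.

    `results` is a list of (gene, log2fc, pvalue) tuples.
    """
    padj = bh_adjust([p for _, _, p in results])
    return [
        gene
        for (gene, lfc, _), q in zip(results, padj)
        if q < alpha and abs(lfc) >= min_abs_lfc
    ]
```

A gene with a very small adjusted P value but |log₂FC| of 0.2 is not reported, and neither is one with a large fold-change but adjusted P of 0.5.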
8. Output for Downstream Analysis
- Wald z-scores (log₂FC / SE) passed to decoupler ULM for pathway activity analysis
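The z-score is the log₂ fold-change divided by its standard error. The sketch below computes it and then shows, in simplified form, the kind of score a univariate linear model yields: the t-statistic of the slope when gene-level statistics are regressed on a gene set's weights. This is an illustration of the idea only, not decoupler's implementation or API, and the function names are hypothetical.

```python
import math


def wald_z(log2fc, se):
    """Wald z-score: effect estimate divided by its standard error."""
    return log2fc / se


def ulm_t(stats, weights):
    """t-statistic of the slope from regressing `stats` on `weights`.

    `stats` holds one gene-level statistic per gene (e.g. Wald z-scores);
    `weights` holds that gene's weight in the pathway (0 if not a member).
    Needs more than two genes, non-constant weights, and an imperfect fit.
    """
    n = len(stats)
    mx = sum(weights) / n
    my = sum(stats) / n
    sxx = sum((x - mx) ** 2 for x in weights)
    sxy = sum((x - mx) * (y - my) for x, y in zip(weights, stats))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(weights, stats))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return slope / se_slope
```

A positive t-statistic indicates that genes with larger weights in the set tend to have larger z-scores.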