Cross-Study Heterogeneity in T2D Blood Transcriptomics

Cross-study heterogeneity in T2D blood transcriptomics is the instability of differential-expression signals across independent blood RNA-seq cohorts.

Key Ideas

  • Tkachenko et al. 2025 found very low concordance in T2D-associated effect sizes across eight blood RNA-seq datasets.
  • Four of the eight analyzed datasets had zero significant differentially expressed genes (DEGs), while three had substantial DEG counts.
  • Only five genes, FBLN2, TPCN1, PC, SHANK1, and PLD4, were differentially expressed in the same direction across all three datasets that yielded substantial DEG counts.
  • Principal component analysis showed strong dataset-level separation, indicating pronounced batch or cohort effects.
  • Batch correction reduced dataset separation but did not create clear case-control separation, suggesting high inter-individual variability.
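
A minimal sketch of how same-direction concordance across datasets could be checked, assuming each dataset's results are available as a mapping from gene to (log2 fold change, adjusted p-value). The cohort names, gene names, numbers, and the 0.05 threshold are hypothetical placeholders, not values from Tkachenko et al. 2025, and the functions are illustrative rather than the paper's method.

```python
from itertools import combinations

# Hypothetical per-dataset results: gene -> (log2 fold change, adjusted p-value).
# All names and numbers below are invented for illustration only.
results = {
    "cohort_A": {"GENE1": (0.8, 0.01), "GENE2": (-0.5, 0.03), "GENE3": (0.4, 0.20)},
    "cohort_B": {"GENE1": (0.6, 0.02), "GENE2": (0.7, 0.04), "GENE3": (0.3, 0.01)},
    "cohort_C": {"GENE1": (0.9, 0.04), "GENE2": (-0.2, 0.01), "GENE3": (-0.1, 0.60)},
}


def significant_direction(entry, alpha=0.05):
    """Return +1 or -1 for a significant gene, or 0 if it is not significant."""
    lfc, padj = entry
    if padj >= alpha or lfc == 0:
        return 0
    return 1 if lfc > 0 else -1


def concordant_genes(results, alpha=0.05):
    """Genes that are significant in every dataset with the same sign of effect."""
    shared = set.intersection(*(set(r) for r in results.values()))
    concordant = []
    for gene in sorted(shared):
        signs = {significant_direction(r[gene], alpha) for r in results.values()}
        if signs == {1} or signs == {-1}:
            concordant.append(gene)
    return concordant


def sign_agreement(a, b):
    """Fraction of shared genes whose fold changes point the same way,
    ignoring significance."""
    shared = [g for g in a if g in b and a[g][0] != 0 and b[g][0] != 0]
    if not shared:
        return float("nan")
    same = sum((a[g][0] > 0) == (b[g][0] > 0) for g in shared)
    return same / len(shared)


print("Concordant in all datasets:", concordant_genes(results))
for x, y in combinations(results, 2):
    print(x, y, round(sign_agreement(results[x], results[y]), 2))
```

In this toy input only GENE1 passes the all-datasets, same-direction filter. The pairwise sign agreement is a looser summary that ignores significance, so the two measures can disagree; reporting both keeps the distinction visible.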

Sources of Heterogeneity To Track

  • Blood cell-type proportions differ across individuals and datasets.
  • Whole blood and PBMCs differ in cellular composition, most notably because whole blood includes granulocytes while PBMC preparations do not.
  • Globin transcript abundance can add noise to whole-blood RNA-seq when globin depletion is not used.
  • Library preparation, sequencing protocol, infection status, tuberculosis status, site, timepoint, sex, BMI, and other covariates can affect observed expression.
  • Population structure and ancestry may contribute to expression differences, but Tkachenko et al. do not directly test ancestry-stratified effects.
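
One simple way to track the sources above is to ask, gene by gene, how much expression variance each recorded factor explains. The sketch below computes eta-squared (between-group sum of squares divided by total sum of squares) for a single gene. The expression values, factor names, and labels are hypothetical, and this is a descriptive screen under those assumptions, not a substitute for a full model with covariates.

```python
from collections import defaultdict


def eta_squared(values, labels):
    """Fraction of total variance in `values` explained by the grouping in
    `labels` (between-group sum of squares / total sum of squares)."""
    n = len(values)
    grand_mean = sum(values) / n
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    return ss_between / ss_total


# Hypothetical normalized expression of one gene in eight samples.
expression = [5.0, 5.2, 5.1, 5.3, 8.0, 8.2, 8.1, 8.3]

# Hypothetical sample metadata. Note that sample_type is fully confounded
# with dataset here: every dataset "A" sample is PBMC, every "B" sample is
# whole blood.
metadata = {
    "dataset": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "status": ["T2D", "control", "T2D", "control",
               "T2D", "control", "T2D", "control"],
    "sample_type": ["PBMC", "PBMC", "PBMC", "PBMC",
                    "whole_blood", "whole_blood", "whole_blood", "whole_blood"],
}

for factor, labels in metadata.items():
    print(factor, round(eta_squared(expression, labels), 3))
```

In this toy example dataset explains nearly all of the variance and case status almost none, mirroring the pattern of strong dataset-level separation. Because sample type is perfectly confounded with dataset in the invented metadata, the two factors receive identical scores and cannot be told apart, which is the situation where a source can only be acknowledged rather than modeled.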

Paper-Relevant Use

  • This page supports a conservative interpretation of any single-cohort T2D PBMC signature.
  • It should be linked when manuscript prose discusses replication, cohort confounding, sample-type differences, or ancestry-aware interpretation.
  • It pairs with the evidence map as a limitation and claim-discipline page.

Open Questions

  • Which heterogeneity sources can this project directly model versus only acknowledge?
  • Are ancestry-associated immune differences robust after accounting for sample type, site, and batch?
  • Do this project’s strongest signals align with the five cross-study concordant genes or the meta-analysis-only pathways?

Sources