Type 2 Diabetes PBMC Ancestry Paper

Focus

This project integrates multi-modal PBMC data — genotyping, scRNA-seq, and scATAC-seq — to identify immune features associated with type 2 diabetes and to trace those features to their genetic origins via ancestry-aware analysis.

Ancestry separation is itself based on genotyping data, so ancestry-associated immune differences can be resolved to specific genetic markers — bridging from cell-state phenotypes down to the variants that may drive them. The project asks which T2D-associated differences in PBMC composition, chromatin accessibility, and gene expression carry a genetic signal that varies with ancestry.

Working Research Question

How do PBMC immune profiles — across chromatin accessibility, gene expression, and genotype — differ in type 2 diabetes, and which of those differences carry a genetic signal that can be resolved through ancestry-aware analysis rather than reflecting cohort composition, environment, disease severity, medication exposure, or technical confounders?

Paper Goals

  • Identify reproducible PBMC immune changes reported in T2D across modalities (chromatin accessibility, transcription, genotype).
  • Distinguish cell-composition shifts from per-cell transcriptional or functional changes.
  • Use genotyping data to anchor ancestry separation and resolve ancestry-associated immune differences to specific genetic markers.
  • Track whether ancestry-associated differences are directly tested, inferred indirectly, or not evaluated.
  • Separate biological ancestry (genetic markers) from social, environmental, and healthcare-access variables in the argument.
  • Build a source-backed outline with citations attached to each claim.
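
The composition-versus-per-cell distinction in the goals above can be made concrete with a small sketch. It assumes each cell is a record with a donor, a cell-type label, and an expression dictionary. The field names, donor IDs, GENE_A, and all values are hypothetical toy data, not project results, and the two summaries are only one simple way to separate the two kinds of change.

```python
from collections import defaultdict


def composition(cells):
    """Fraction of each donor's cells assigned to each cell type."""
    counts = defaultdict(lambda: defaultdict(int))
    for cell in cells:
        counts[cell["donor"]][cell["cell_type"]] += 1
    out = {}
    for donor, by_type in counts.items():
        total = sum(by_type.values())
        out[donor] = {ct: n / total for ct, n in by_type.items()}
    return out


def mean_expression(cells, gene):
    """Mean expression of `gene` per (donor, cell type) pair."""
    sums = defaultdict(float)
    ns = defaultdict(int)
    for cell in cells:
        key = (cell["donor"], cell["cell_type"])
        sums[key] += cell["expr"].get(gene, 0.0)
        ns[key] += 1
    return {key: sums[key] / ns[key] for key in sums}


# Toy, made-up cells (hypothetical donors D1/D2 and gene GENE_A).
cells = [
    {"donor": "D1", "cell_type": "NK", "expr": {"GENE_A": 2.0}},
    {"donor": "D1", "cell_type": "NK", "expr": {"GENE_A": 4.0}},
    {"donor": "D1", "cell_type": "CD4_T", "expr": {"GENE_A": 1.0}},
    {"donor": "D1", "cell_type": "CD4_T", "expr": {"GENE_A": 1.0}},
    {"donor": "D2", "cell_type": "NK", "expr": {"GENE_A": 3.0}},
    {"donor": "D2", "cell_type": "CD4_T", "expr": {"GENE_A": 1.0}},
    {"donor": "D2", "cell_type": "CD4_T", "expr": {"GENE_A": 1.0}},
    {"donor": "D2", "cell_type": "CD4_T", "expr": {"GENE_A": 1.0}},
]

props = composition(cells)
means = mean_expression(cells, "GENE_A")
```

In this toy input the NK fraction differs between donors (0.5 versus 0.25) while mean GENE_A expression within NK cells is identical (3.0), which is the pattern of a composition shift without a per-cell change. A bulk-level summary would not separate the two.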

Core Pages

Ingested Literature

  • Gu et al. 2024 — Korean PBMC scRNA-seq, scTCR-seq, and scBCR-seq study reporting inflammatory monocyte states, cytotoxic T-cell expansion, and B-cell differentiation changes in T2D.
  • Tkachenko et al. 2024 — medRxiv preprint from a Russian cohort generating bulk blood RNA-seq (n=18) and PBMC scRNA-seq (n=4) from T2D and control participants. Reports NK cell depletion in T2D PBMCs (contrasting with Gu et al. 2024), increased CD4+ TCM/naive cells, and 146 bulk blood DEGs with PCA showing disease status does not explain most transcriptomic variance. Data accessions: GSE280401 (scRNA-seq), GSE280402 (bulk RNA-seq).
  • Tkachenko et al. 2025 — cross-study meta-analysis of eight T2D blood RNA-seq datasets reporting low individual-study concordance, 2065 meta-analysis DEGs, and pathway themes involving neutrophil effector biology, ERAD, mTOR, oxidative stress, and RNA splicing.
  • Li et al. 2025 — PBMC scRNA-seq reanalysis using GSE268210 T2D samples and GSE244515 controls, reporting T-cell metabolic subtypes, stronger T-cell-monocyte communication, TF activity patterns, machine-learning subtype classifiers, and drug-enrichment hypotheses.
  • Huang et al. 2022 — islet RNA-seq and pancreatic scRNA-seq study identifying SLC2A2, SERPINF1, RASGRP1, and CHL1 as T2D diagnostic biomarkers, with a pancreatic fibroblast SERPINF1-NR2F2 regulatory axis. Uses pancreatic tissue, not PBMCs.
  • Tang et al. 2026 — pancreatic islet scRNA-seq + LASSO study identifying PNLIP, BUB1, CTSB, and NAMPT as T2DM signature genes, with qRT-PCR validation in peripheral blood from a Chinese cohort. Uses islet tissue as primary discovery and blood for validation; does not test ancestry effects.
  • Markelova et al. 2025 — our prior genotyping study of the same cohort (Chechen, Tatar, Yakut) demonstrating ancestry-specific distributions of T2D genetic clusters using partitioned polygenic scores. Yakuts show beta-cell dysfunction dominance, while Chechens and Tatars show obesity/insulin-resistance patterns. Provides the pre-existing genetic backbone for the current PBMC project, enabling within-person integration of pPGS with PBMC immune features.
  • Yabe et al. 2015 — review establishing T2D in East Asians as β-cell-dysfunction-dominant with lower insulin resistance, contrasting with the insulin-resistance-dominant paradigm in Europeans. Documents East-Asian-specific T2D genetic loci (KCNQ1, UBE2E2, C2CD4A/B, PTPRD, SRR, SPRY2, CDC123) and greater incretin-based therapy efficacy in Asians. Provides the pathophysiological framework for ancestry-specific T2D mechanisms that directly aligns with our same-cohort pPGS findings from Markelova et al. 2025.
  • Zhao and Fang 2025 — Frontiers in Immunology PBMC scRNA-seq study from a small Chinese cohort (3 T2DM, 3 healthy controls), reporting T-cell and monocyte DEG/pathway findings involving TNF/NF-kB, interferon-gamma response, T-cell receptor signaling, chemokine signaling, and TNFRSF1A-centered network interactions. Data accession: GSE255566. Does not test ancestry effects.

Manuscript Draft

The manuscript/ folder contains the actual paper draft text, split into section notes. Use concepts/, references/, and synthesis/ as supporting material, but keep paper-ready prose in manuscript/.

Research Results

The research-results/ folder contains findings generated from this project’s own data analyses. Use it for analysis summaries, QC summaries, model outputs, figure/table result notes, and data-derived observations before they are converted into polished manuscript prose in Results.

Source Intake Workflow

  1. Put PDFs, abstracts, exported citations, or rough notes in _raw/.
  2. Ingest each source into references/ using the literature note template.
  3. Promote cross-source conclusions into synthesis/ only when supported by multiple references.
  4. Keep uncertainty explicit when ancestry is a proxy for unmeasured variables.

Claim Discipline

  • Attach an Evidence: line with wikilinks to source notes to every manuscript claim.
  • Mark claims as established, mixed, single-study, or hypothesis until enough sources are reviewed.
  • Do not collapse ancestry, race, ethnicity, and geography into one construct unless a source does so and the limitation is stated.
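
The claim rules above, together with the intake rule about promoting only multiply supported conclusions, could be checked mechanically. The sketch below is one possible representation, not the project's actual tooling: the Claim class, the problems and ready_for_synthesis helpers, and the reading of "multiple references" as two or more distinct source notes are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Status labels taken from the Claim Discipline list.
STATUSES = ("established", "mixed", "single-study", "hypothesis")


@dataclass
class Claim:
    text: str
    status: str
    evidence: list = field(default_factory=list)  # names of linked source notes

    def __post_init__(self):
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")


def problems(claim):
    """Return discipline violations for one claim (empty list if none)."""
    issues = []
    if not claim.evidence:
        issues.append("no Evidence: link")
    return issues


def ready_for_synthesis(claim):
    """Assumed reading of the intake rule: at least two distinct sources."""
    return len(set(claim.evidence)) >= 2


# Illustrative claim with one linked source note.
nk = Claim(
    text="NK cell depletion in T2D PBMCs",
    status="single-study",
    evidence=["Tkachenko et al. 2024"],
)
unsupported = Claim(text="placeholder claim", status="hypothesis")
```

Here problems(nk) returns an empty list, problems(unsupported) flags the missing Evidence: link, and ready_for_synthesis(nk) is False because only one source note is attached.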

Open Questions

  • Which PBMC cell types show the most consistent T2D-associated differences?
  • Does the NK cell depletion in T2D PBMCs observed by Tkachenko et al. 2024 (Russian cohort, n=2/group scRNA-seq) replicate in this project’s scRNA-seq data, or does it follow the Gu et al. 2024 NK trend? This discrepancy could serve as a useful replication test.
  • Are observed immune differences driven by altered cell proportions, activation states, gene expression, cytokine production, or metabolic rewiring?
  • Which studies include ancestry-aware design or ancestry-stratified analysis?
  • Which results replicate across ancestries, and which appear ancestry-specific?
  • What covariates are consistently controlled: age, sex, BMI, glycemia, medication, infection status, diet, socioeconomic variables, and batch?
  • Which ancestry-associated PBMC signals in this project overlap cross-study T2D blood transcriptomic themes versus appearing project-specific?
  • How should the manuscript distinguish whole-blood neutrophil signals from PBMC-intrinsic immune signatures?
  • Which Li et al. 2025 immunometabolic subtype signals replicate in independent T2D PBMC cohorts rather than reflecting reuse of GSE268210?
  • Do ancestry-associated findings in this project align with T-cell metabolic subtype differences, or are they independent axes of variation?
  • Which T2D-associated chromatin accessibility peaks (scATAC-seq) colocalize with ancestry-informative genetic variants?
  • Do expression quantitative trait loci (eQTL) or chromatin QTLs in PBMCs explain ancestry-stratified T2D immune differences?
  • Are there genetic markers (SNPs, haplotypes) — beyond admixture proportion — that directly associate with T2D PBMC cell-state variation?
  • Do ancestry-specific T2D genetic mechanisms (beta-cell vs. obesity/insulin-resistance dominant) correlate with specific PBMC immune profiles within the same cohort?
  • Are pPGS distributions from our prior analysis consistent when the same genotyping-based ancestry framework is applied alongside PBMC immune data? (Ancestry inference is from genotyping, not PBMC data — the question is whether the genetic backbone remains stable when integrated with the new multi-omic layer.)
  • Could the pPGS approach be extended to partition PBMC immune features by their genetic architecture, analogous to Smith et al.’s T2D cluster framework?
  • Do the TNF/NF-kB, interferon-gamma, T-cell receptor, and chemokine pathway themes from Zhao and Fang 2025 appear in this project’s ancestry-aware PBMC scRNA-seq or scATAC-seq analyses?
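
For the question above about scATAC-seq peaks and ancestry-informative variants, a first-pass positional check might look like the sketch below. It assumes BED-style half-open peak intervals and 0-based variant positions. The function name variants_in_peaks, the coordinates, and the variant IDs are hypothetical. Positional overlap is only a screening step and is not a statistical colocalization test.

```python
from bisect import bisect_left
from collections import defaultdict


def variants_in_peaks(peaks, variants):
    """Map each peak to the IDs of variants located inside it.

    peaks:    iterable of (chrom, start, end), half-open [start, end)
    variants: iterable of (chrom, pos, variant_id), pos in the same coordinates
    Peaks containing no variant are omitted from the result.
    """
    by_chrom = defaultdict(list)
    for chrom, pos, vid in variants:
        by_chrom[chrom].append((pos, vid))

    # Sort once per chromosome so each peak lookup is a binary search.
    positions, ids = {}, {}
    for chrom, items in by_chrom.items():
        items.sort()
        positions[chrom] = [p for p, _ in items]
        ids[chrom] = [v for _, v in items]

    hits = {}
    for chrom, start, end in peaks:
        pos = positions.get(chrom, [])
        lo = bisect_left(pos, start)  # first variant with pos >= start
        hi = bisect_left(pos, end)    # first variant with pos >= end (excluded)
        if lo < hi:
            hits[(chrom, start, end)] = ids[chrom][lo:hi]
    return hits


# Toy, made-up coordinates.
peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
variants = [
    ("chr1", 150, "v1"),
    ("chr1", 200, "v2"),  # sits exactly on an excluded peak end
    ("chr1", 599, "v3"),
    ("chr2", 50, "v4"),
]
hits = variants_in_peaks(peaks, variants)
```

With this input, v1 and v3 fall inside chr1 peaks, v2 is excluded because the interval end is exclusive, and the chr2 peak has no hits.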