Columbia Technology Ventures

Multi-knockoff filter for high-dimensional survival -omics with controlled false discovery rate

This technology is a knockoff-based statistical method that combines existing knockoff procedures with newly proposed test statistics to select variables while keeping the false discovery rate (FDR) controlled and stable across runs.

Unmet Need: Reproducible false discovery rate control in finite-sample ‘omics survival studies

Current workflows for finding mediators in large-scale epigenomic or transcriptomic datasets rely on single-run knockoffs, Benjamini–Hochberg procedures, or ad-hoc permutations. These approaches become unstable when the number of features dwarfs the sample size, which leads to irreproducible hit lists and wasted validation effort. A tool that keeps the false discovery rate (FDR) in check while giving the same selections from run to run would accelerate biomarker discovery and de-risk downstream assay development.

The Technology: Multi-knockoff survival mediation filter with data-driven thresholds

This technology, called CoxMKF, is a statistical method that controls the false discovery rate (FDR) in high-dimensional survival analysis using multiple model-X knockoffs. It generates several knockoff copies of each feature: synthetic variables that preserve the correlation structure of the original features but carry no additional information about the outcome once the originals are accounted for. Each augmented dataset is analyzed with a Cox proportional hazards model, and the resulting test statistics are aggregated using the Aggregation of Knockoffs (AKO) framework to reduce Monte Carlo variability. The method uses two key statistics: (1) the Cox Coefficient Difference (CCD), which compares the Cox regression coefficient of each variable with those of its knockoffs, and (2) the Multiple Cox Statistic (MCS), which aggregates CCDs across knockoff replicates. These feed into a data-driven thresholding rule that controls the FDR even when the number of predictors exceeds the sample size.
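To make the idea of a data-driven threshold concrete, the following is a minimal Python sketch of the standard single-knockoff "knockoff+" selection rule applied to a coefficient-difference statistic. It is not the CoxMKF implementation: it omits the multiple knockoff copies, the MCS, and the AKO aggregation, and it assumes the Cox coefficients have already been fitted elsewhere. The function names and the toy coefficient values are hypothetical and chosen only for illustration.

```python
def coefficient_difference(beta_orig, beta_knock):
    """W_j = |beta_j| - |beta_knockoff_j| (assumed form of a coefficient-difference
    statistic; a large positive W_j favors the original feature)."""
    return [abs(b) - abs(k) for b, k in zip(beta_orig, beta_knock)]


def knockoff_plus_threshold(w, q):
    """Smallest t among the nonzero |W_j| such that
    (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q; infinity if none qualifies."""
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        neg = sum(1 for x in w if x <= -t)  # proxy count of false positives
        pos = sum(1 for x in w if x >= t)   # features that would be selected
        if (1 + neg) / max(1, pos) <= q:
            return t
    return float("inf")


def select(w, q):
    """Indices of features whose statistic reaches the data-driven threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Toy, made-up coefficients for 10 features and their knockoffs (illustration only).
beta_orig = [1.2, -0.9, 0.8, 0.05, 0.0, 0.7, -0.02, 0.6, 1.0, 0.1]
beta_knock = [0.1, 0.0, 0.05, 0.1, 0.03, 0.0, 0.0, 0.1, 0.2, 0.3]

w = coefficient_difference(beta_orig, beta_knock)
selected = select(w, q=0.2)
print(selected)
```

The threshold is chosen from the data: it is the smallest cutoff at which the estimated proportion of false selections, using negative statistics as a stand-in for false positives, falls to or below the target level q. If no cutoff qualifies, nothing is selected.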

CoxMKF has been validated on lung cancer data from The Cancer Genome Atlas (TCGA).

Applications:

  • Epigenetic and methylation-based biomarker discovery
  • Oncology, aging, and toxicogenomics target identification
  • High-throughput companion to transcriptomic, proteomic, or metabolomic screens
  • Statistical research tool for next-generation knockoff design

Advantages:

  • Exact false discovery rate (FDR) control in finite samples, even when the number of predictors far exceeds the sample size
  • Stable, reproducible selections from run to run
  • Native handling of censored survival outcomes
  • Open-source R package; drop-in for existing pipelines

Lead Inventor:

Zhonghua Liu, Sc.D.

Related Publications:

Tech Ventures Reference: