
Bayesian Kernel Machine Regression (BKMR) and Weighted Quantile Sum (WQS) Regression for Correlated and Missing Multivariate Data Structures

Develop BKMR and WQS models for studying exposure-outcome relationships in environmental epidemiology. These methods are becoming increasingly popular in model building for longitudinal clinical data sets that have large block missingness or high collinearity (in time or space), or where an approach that depends less on parametric assumptions is needed.

1 Comment

Thank you @hha0101  for your contribution here! 

 

I include some references to support this request.


Small, R., & Coull, B. (2018), "A Variational Inference Algorithm for BKMR in the Cross-Sectional Setting," arXiv: Computation (https://arxiv.org/pdf/1811.02609.pdf), propose using Bayesian Kernel Machine Regression (BKMR) with a Variational Inference (VI) algorithm for a cross-sectional study design (one that does not track the same individuals over time). They also propose a Generalized Least Squares (GLS) modification that makes the resulting credible intervals more conservative than they would otherwise be under BKMR with VI. BKMR by VI with GLS is also claimed to run much faster than MCMC methods.
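
For readers less familiar with the kernel machine part of BKMR, here is a minimal Python sketch of the underlying idea: the outcome is modeled as y = h(z) + noise, where h is a smooth function of the exposures z with a Gaussian-kernel prior. This is not the VI or GLS algorithm from the paper above, and it is not MCMC either. It simply fixes the hyperparameters (r, tau, sigma2) by hand, omits covariates, and computes the posterior mean of h. The function names and the simulated data are my own assumptions for illustration.

```python
import numpy as np

def gaussian_kernel(Z1, Z2, r):
    """K[i, j] = exp(-sum_m r[m] * (Z1[i, m] - Z2[j, m])**2)."""
    diff = Z1[:, None, :] - Z2[None, :, :]
    return np.exp(-np.einsum("ijm,m->ij", diff**2, r))

def kernel_machine_posterior_mean(Z, y, r, tau, sigma2):
    """Posterior mean of h(Z) when h ~ GP(0, tau * K) and y = h + N(0, sigma2)."""
    K = tau * gaussian_kernel(Z, Z, r)
    # mean = K (K + sigma2 * I)^{-1} y, computed with a linear solve
    return K @ np.linalg.solve(K + sigma2 * np.eye(len(y)), y)

# Simulated data (not real): two highly correlated exposures and one unrelated one.
rng = np.random.default_rng(0)
n = 200
z1 = rng.normal(size=n)
z2 = 0.9 * z1 + np.sqrt(1 - 0.81) * rng.normal(size=n)
z3 = rng.normal(size=n)
Z = np.column_stack([z1, z2, z3])
h_true = np.sin(z1) + 0.5 * z2**2          # nonlinear exposure-response surface
y = h_true + 0.3 * rng.normal(size=n)

# Hand-picked hyperparameters; r[2] = 0 drops the third exposure from the kernel.
r = np.array([0.5, 0.5, 0.0])
h_hat = kernel_machine_posterior_mean(Z, y, r=r, tau=1.0, sigma2=0.09)
print("correlation between fitted and true h:", np.corrcoef(h_hat, h_true)[0, 1])
```

In full BKMR the r values, tau, and sigma2 are given priors and estimated (by MCMC, or by VI as in the paper), and an r value near zero is what flags an exposure as unimportant.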

 

A description of Weighted Quantile Sum (WQS) regression for highly correlated data structures is given in "Characterization of Weighted Quantile Sum Regression for Highly Correlated Data in a Risk Analysis Setting," J Agric Biol Environ Stat. 2015 March; 20(1): 100–120, doi:10.1007/s13253-014-0180-3: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6261506/pdf/nihms-993354.pdf.

"In risk evaluation, the effect of mixtures of environmental chemicals on a common adverse outcome is of interest. However, due to the high dimensionality and inherent correlations among chemicals that occur together, the traditional methods (e.g. ordinary or logistic regression) suffer from collinearity and variance inflation, and shrinkage methods have limitations in selecting among correlated components. We propose a weighted quantile sum (WQS) approach to estimating a body burden index, which identifies “bad actors” in a set of highly correlated environmental chemicals. We evaluate and characterize the accuracy of WQS regression in variable selection through extensive simulation studies through sensitivity and specificity (i.e., ability of the WQS method to select the bad actors correctly and not incorrect ones). We demonstrate the improvement in accuracy this method provides over traditional ordinary regression and shrinkage methods (lasso, adaptive lasso, and elastic net). Results from simulations demonstrate that WQS regression is accurate under some environmentally relevant conditions, but its accuracy decreases for a fixed correlation pattern as the association with a response variable diminishes. Nonzero weights (i.e., weights exceeding a selection threshold parameter) may be used to identify bad actors; however, components within a cluster of highly correlated active components tend to have lower weights, with the sum of their weights representative of the set."

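To make the WQS idea in the quoted passage concrete, here is a minimal Python sketch: each exposure is converted to a quantile score, and the outcome is regressed on a weighted sum of those scores, with weights constrained to be nonnegative and to sum to one. It is a simplified illustration and not the paper's full procedure (there is no bootstrap, no training/validation split, and no selection threshold). The function names, the nonnegative constraint on the index coefficient, and the simulated data are my own assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def quantile_scores(X, n_quantiles=4):
    """Replace each column of X with its quantile score 0..n_quantiles-1."""
    Q = np.empty_like(X, dtype=float)
    probs = np.linspace(0, 1, n_quantiles + 1)[1:-1]
    for j in range(X.shape[1]):
        cuts = np.quantile(X[:, j], probs)
        Q[:, j] = np.searchsorted(cuts, X[:, j], side="right")
    return Q

def fit_wqs(Q, y):
    """Least-squares fit of y = b0 + b1 * (Q @ w) with w >= 0, sum(w) = 1, b1 >= 0."""
    p = Q.shape[1]

    def loss(theta):
        b0, b1, w = theta[0], theta[1], theta[2:]
        return np.mean((y - b0 - b1 * (Q @ w)) ** 2)

    theta0 = np.concatenate([[y.mean(), 0.1], np.full(p, 1.0 / p)])
    bounds = [(None, None), (0.0, None)] + [(0.0, 1.0)] * p
    constraints = [{"type": "eq", "fun": lambda t: np.sum(t[2:]) - 1.0}]
    res = minimize(loss, theta0, method="SLSQP", bounds=bounds, constraints=constraints)
    return res.x[0], res.x[1], res.x[2:]

# Simulated data (not real): five exposures with pairwise correlation about 0.5;
# only the first two ("bad actors") affect the outcome.
rng = np.random.default_rng(1)
n, p = 500, 5
shared = rng.normal(size=(n, 1))
X = 0.7 * shared + 0.7 * rng.normal(size=(n, p))
Q = quantile_scores(X)
true_w = np.array([0.6, 0.4, 0.0, 0.0, 0.0])
y = 1.0 + 0.8 * (Q @ true_w) + rng.normal(scale=0.5, size=n)

b0, b1, w = fit_wqs(Q, y)
print("index coefficient:", b1)
print("estimated weights:", np.round(w, 3))
```

The estimated weights play the role described in the abstract: components with larger weights are the candidate bad actors, and the single coefficient b1 summarizes the effect of the mixture index as a whole.
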
Additional commentary on the emerging use of Bayesian methods in model building for large clinical data sets is included in a recent conversation between Chris Holmes (Professor of Biostatistics at the University of Oxford) and Glen Wright Colopy here:  https://www.youtube.com/watch?v=lvkT-i0ki4Y&t=1683s