
Application of JMP Data Mining and Multivariate Analysis Tools in Coffee/Tea Health (2019-US-30MP-197)

Level: Intermediate


Patrick Giuliano, Senior Quality Engineer, Abbott
Annu Wu, UCLA Neurobiology and Neuroscience Major
Mason Chen, Stanford OHS Gifted Student and CSBB Certified, Stanford Online High School (OHS)


The purpose of this project is to determine which Starbucks coffee and tea drinks are best for cardiovascular disease (CVD) prevention and overall good health. A science-based health index is constructed from the drinks' nutritional constituents, including saturated fat, cholesterol, sodium, carbohydrates, dietary fiber, sugars, protein and caffeine. Antioxidant activity of flavonoids from caffeine can reduce free radical formation and scavenge free radicals. Principal components analysis (PCA) is used to explore all factors in the analysis and to assess how well the health index reflects CVD prevention and net healthiness. Principal component 1 relates mainly to the unhealthy constituents, such as sugars, carbohydrates, saturated fat and total fat. Principal component 2 relates more to the health benefit of caffeine content. Additionally, on the loading plot, dietary fiber and caffeine point most nearly opposite to the unhealthy constituents along both the first and second principal components. PCA eigenanalysis is a powerful computational and visual diagnostic tool for discriminating and classifying coffee product types based on patterns in nutritional constituents. To avoid variance inflation from the many constituents in the analysis, the original data were Z-transformed and the JMP loading plots are standardized. A novel PCA-based health index is derived from the eigenvalues and eigenvectors of the first two principal components. The new PCA-based health index is also compared with, and correlated to, a previously established science-based health index (R-squared of roughly 70%-80% from curve fitting). Because the principal components are mutually orthogonal, the remaining eight principal components are neutral with respect to the health index (~0% R-squared).
The PCA also demonstrates the Pareto principle: the first 20% of the principal components account for ~80% of the total variance.
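
As a rough illustration of the workflow described above (Z-transform, eigenanalysis, share of variance per component), here is a minimal NumPy sketch. It is not the authors' JMP analysis: the data are synthetic random numbers rather than the Starbucks nutrition data, and names such as `X`, `Z`, `explained` and `cumulative` are placeholders chosen for this example.

```python
import numpy as np

# Synthetic stand-in data (an assumption, NOT the Starbucks data):
# 50 hypothetical drinks x 5 hypothetical constituents on very different scales.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 1))
X = np.hstack([
    base * 10 + rng.normal(size=(50, 1)),   # correlated with the next column
    base * 2 + rng.normal(size=(50, 1)),
    rng.normal(size=(50, 3)) * np.array([1.0, 50.0, 0.1]),
])

# Z-transform: center each column and scale it to unit standard deviation,
# so no constituent dominates simply because of its measurement units.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# PCA on standardized data is an eigenanalysis of the correlation matrix.
R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)      # eigh: R is symmetric

# eigh returns eigenvalues in ascending order; sort them descending.
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]               # column j = loadings of component j

# Share of total variance per component, and the running total.
explained = eigvals / eigvals.sum()
cumulative = np.cumsum(explained)

for j, (share, total) in enumerate(zip(explained, cumulative), start=1):
    print(f"PC{j}: {share:6.1%} of variance, {total:6.1%} cumulative")
```

Reading the `cumulative` column is one way to check a Pareto-style statement: it shows how many leading components are needed to reach a given share of the total variance.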


We would like to acknowledge Dr. Charles Chen, PhD, for his thought contributions to various aspects of this project.

Hi @elaine_daniloff: Nice meeting you again at the Summit and bumping into you at the airport! You can access all the materials here that I cited during my presentation. Please check out the hyperlink I included in my journal file from one of @julian's old webinars, which has a wonderful little introductory explanation of PCA, in a very practical way. I've copied it here again for your convenience. He does the exact same demo I showed you at the airport, where we generate the principal components, view their equations in column properties (as independent linear combinations of the input variables), and then demonstrate that they are completely uncorrelated with (orthogonal to) each other using the Fit Y by X platform in JMP.
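
For anyone without JMP at hand, the same check can be sketched in Python with NumPy. This is only an illustration on synthetic data, not the demo from the webinar: the data, the random seed and variable names such as `scores` and `score_corr` are assumptions made for this example. It builds each principal component as a linear combination of the standardized inputs and then confirms that the components are uncorrelated with each other.

```python
import numpy as np

# Synthetic, correlated stand-in data (hypothetical; not from the presentation).
rng = np.random.default_rng(1)
latent = rng.normal(size=(100, 2))
mixing = rng.normal(size=(2, 4))
X = latent @ mixing + 0.3 * rng.normal(size=(100, 4))

# Standardize, then take the eigenvectors of the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Each principal component is a linear combination of the standardized inputs;
# column j of eigvecs holds the coefficients (the "equation") of component j.
scores = Z @ eigvecs

# Pairwise correlations between components: off-diagonal entries should be
# zero up to floating-point rounding.
score_corr = np.corrcoef(scores, rowvar=False)
off_diag = score_corr - np.diag(np.diag(score_corr))
max_abs_off_diag = np.abs(off_diag).max()

print("largest |correlation| between different components:", max_abs_off_diag)
print("component variances match eigenvalues:",
      np.allclose(scores.var(axis=0, ddof=1), eigvals))
```

Plotting any two columns of `scores` against each other gives the scatterplot counterpart of this check: a cloud with no linear trend.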