Application of JMP Data Mining and Multivariate Analysis Tools in Coffee/Tea Health (2019-US-30MP-197)
Aug 27, 2019 12:45 PM | Last Modified: Nov 5, 2019 5:00 PM
Starbucks Coffee 10172019.jmp
Starbucks Coffee Data Analysis.jrn
Patrick Giuliano, Senior Quality Engineer, Abbott; Annu Wu, UCLA Neurobiology and Neuroscience Major; Mason Chen, Stanford OHS Gifted Student and CSBB Certified, Stanford Online High School (OHS)
The purpose of this project is to determine which Starbucks coffee and tea drinks are best for cardiovascular disease (CVD) prevention and overall good health. A science-based health index is constructed from the nutritional constituents of each drink, including saturated fat, cholesterol, sodium, carbohydrates, dietary fiber, sugars, protein and caffeine. Antioxidant activity of flavonoids from caffeine can reduce free radical formation and scavenge free radicals. Principal components analysis (PCA) is used to explore all factors in the analysis and to assess how useful the health index is as an indicator of CVD prevention and net healthiness. Principal component 1 is associated mainly with the unhealthy constituents: sugars, carbohydrates, saturated fat and total fat. Principal component 2 is associated more with health benefits, owing to caffeine content. On the loading plot, dietary fiber and caffeine point most nearly opposite to the unhealthy constituents along both the first and second principal components. PCA eigenanalysis is a powerful computational and visual diagnostic tool for discriminating and classifying coffee product types based on patterns in their nutritional constituents. To keep any single constituent from inflating the variance because of its measurement scale, the original data are Z-transformed and the JMP loading plots are standardized. A novel PCA-based health index is derived from the eigenvalues and eigenvectors of the first two principal components. The new PCA-based health index is also compared and correlated with a previously established science-based health index (R-squared of ~70%-80% from curve fitting). Because principal components are mutually orthogonal, the remaining eight principal components are neutral with respect to the health index (R-squared of ~0%).
The PCA also illustrates the Pareto principle: the first 20% of the principal components (two of ten) account for ~80% of the total variance.
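The workflow described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' JMP analysis. The data are synthetic random numbers standing in for the drink-by-constituent table. The index formula (an eigenvalue-weighted combination of the first two principal component scores, with PC1 subtracted and PC2 added) is a hypothetical choice, because the abstract does not give the exact formula. The function names are also invented for this sketch.

```python
import numpy as np


def zscore(X):
    """Standardize each column to mean 0 and sample standard deviation 1."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pca_from_correlation(X):
    """PCA via eigen-decomposition of the correlation matrix of X."""
    Z = zscore(X)
    corr = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # re-sort to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs                     # principal component scores
    return eigvals, eigvecs, scores


def r_squared(x, y):
    """Squared Pearson correlation between two vectors."""
    return np.corrcoef(x, y)[0, 1] ** 2


# Synthetic stand-in data (NOT the Starbucks table): 50 drinks x 10 constituents,
# generated from two latent factors plus noise so that two components dominate.
rng = np.random.default_rng(0)
n_drinks, n_constituents = 50, 10
latent = rng.normal(size=(n_drinks, 2))
mixing = rng.normal(size=(2, n_constituents))
X = latent @ mixing + 0.5 * rng.normal(size=(n_drinks, n_constituents))

eigvals, eigvecs, scores = pca_from_correlation(X)

# Share of total variance per component, and the running total used for a
# Pareto-style check ("how much do the first two components explain?").
explained = eigvals / eigvals.sum()
cumulative = np.cumsum(explained)

# Hypothetical index: weight the first two score columns by their eigenvalues.
# The signs are an assumption (PC1 treated as unhealthy, PC2 as beneficial).
# Eigenvector signs are arbitrary, so with real data the loadings must be
# inspected before the signs are fixed.
index = -eigvals[0] * scores[:, 0] + eigvals[1] * scores[:, 1]

# R-squared between each component's scores and the index. Components 3..10
# are uncorrelated with components 1 and 2, so their R-squared is ~0.
r2_by_pc = np.array(
    [r_squared(scores[:, k], index) for k in range(n_constituents)]
)

print("cumulative variance, first two PCs:", cumulative[1])
print("R-squared of each PC vs. index:", np.round(r2_by_pc, 3))
```

Because the score columns are mutually uncorrelated, any index built only from the first two components has an R-squared of essentially zero against every remaining component. This is the orthogonality property the abstract relies on. The fraction of variance captured by the first two components depends entirely on the data, so the synthetic example should not be read as reproducing the ~80% figure.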