Predicting Protein Structure from Highly Correlated Factors Using the Partial Least Squares Platform in JMP® 14
Sep 27, 2018 2:31 PM | Last Modified: Oct 31, 2018 6:18 AM
Stan Siranovich, Crucial Connection, LLC
Much has been written in the popular press about “Big Data” and its uses, both good and bad. Less well reported, but just as revolutionary, has been the development of statistical discovery software and analytical techniques that unearth relationships in these data and use them to make predictions. One such technique is Partial Least Squares (PLS).
In this poster session, we will use JMP Statistical Discovery Software and its Partial Least Squares platform to explore protein tertiary structure data drawn from a large public data set of 45,730 rows by 10 columns. In particular, we will use Partial Least Squares analysis to predict the Root Mean Square Deviation (RMSD) between two proteins from nine very highly correlated variables. We will then explain the output and what it means, and look “under the hood” at the calculations and algorithms the software performed to produce the result.
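For readers who want a feel for the kind of calculation involved, the following is a minimal sketch of single-response PLS (often called PLS1) using NIPALS-style deflation, written in plain Python. It is an illustration under stated assumptions, not JMP's implementation and not the analysis from the poster: the data are synthetic (nine predictors driven by one shared latent factor, standing in for "highly correlated variables"), and the function names `make_data`, `pls1_fit`, `pls1_predict`, and `r_squared` are hypothetical.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def make_data(n=200, k=9, seed=0):
    """Synthetic stand-in data: k predictors that all track one latent factor."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)  # shared latent factor
        X.append([0.5 * (j + 1) * z + rng.gauss(0, 0.1) for j in range(k)])
        y.append(3.0 * z + rng.gauss(0, 0.1))
    return X, y


def pls1_fit(X, y, n_components):
    """Fit PLS1: extract components one at a time, deflating X and y."""
    n, k = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(k)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(k)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    comps = []
    for _ in range(n_components):
        # Weight vector: direction in predictor space most covariant with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(k)]
        norm = dot(w, w) ** 0.5
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]  # scores for this component
        tt = dot(t, t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(k)]  # X loadings
        q = dot(f, t) / tt                                                  # y loading
        # Deflate: remove what this component explains from X and y.
        E = [[E[i][j] - t[i] * p[j] for j in range(k)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, p, q))
    return x_mean, y_mean, comps


def pls1_predict(model, x):
    """Predict y for one row by repeating the same deflation steps."""
    x_mean, y_mean, comps = model
    e = [a - m for a, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in comps:
        t = dot(e, w)
        y_hat += q * t
        e = [a - t * b for a, b in zip(e, p)]
    return y_hat


def r_squared(y, y_hat):
    """Proportion of variance in y explained by the predictions."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


X, y = make_data()
model = pls1_fit(X, y, n_components=2)
preds = [pls1_predict(model, row) for row in X]
r2 = r_squared(y, preds)
print(f"components: {len(model[2])}, training R^2: {r2:.4f}")
```

The point of the sketch is that PLS does not invert the predictor covariance matrix, which is unstable when predictors are nearly collinear. Instead it builds a few score vectors that summarize the predictors in the directions most related to the response, then regresses the response on those scores.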