Machine learning is the current buzz-phrase meant to encompass the computer algorithms used to make decisions, predictions, or classifications based on data.
When I joined JMP in 2014, I came fresh from completing a PhD in statistics at a school with a long history of excellence in agricultural experiments and mixed modeling. This was the world I was used to: small experiments designed to maximize the information received through judicious planning, almost always some experiment design feature that required mixed model techniques to analyze (the simplest being a split-plot design), and the goal of determining what treatment(s) result in the best outcomes. This was true whether the scientist I worked with was in animal science, entomology, food science, or education.
Shortly after arriving at JMP, the buzz around Big Data reached its peak. A couple of years after that came the rise in Machine Learning. It’s not surprising that Machine Learning’s rise trailed that of Big Data. What were people supposed to do with all the data they’d collected? Analyze it using the tools in the Machine Learning toolbox, of course.
My formal education hadn’t particularly set me up for either of these things, but as I began my career as a Statistician Developer, these were the topics I needed to get up to speed on, and quickly! So I did what every student does these days when taking first steps into a new topic: I visited Wikipedia.
“Machine learning is an interdisciplinary field that uses statistical techniques to give computer systems the ability to "learn" (e.g., progressively improve performance on a specific task) from data, without being explicitly programmed.”
This was a bit stunning. How is it possible for a computer system to “learn” anything without being explicitly programmed? We haven’t achieved Skynet yet, as far as I’m aware! Time to keep digging. Encyclopaedia Britannica says, “Machine learning, in artificial intelligence (a subject within computer science), discipline concerned with the implementation of computer software that can learn autonomously.”
Now I’ve lost my connection to statistics in the definition, but at least it’s more apparent that we’re interested in implementing software that can learn autonomously. The “without being explicitly programmed” refers to the fact that once the data are passed off to the algorithm, the algorithm does the rest without any further instruction from us.
Based on these definitions and my perception of the common usage of the phrase, I’ve come to think of machine learning as being the current buzz-phrase meant to encompass the computer algorithms used to make decisions, predictions, or classifications based on data.
Supervised vs. Unsupervised Learning
So does JMP do machine learning? Absolutely! And we have, technically, from the very first version! Were we that far ahead of our time back in 1989? I’d like to think so. But let’s look in more detail at the algorithms people have in mind when they say Machine Learning.
Typically, Machine Learning algorithms are classified into two types: supervised and unsupervised learning. With supervised learning, the algorithms work with both the predictors (the X columns) and a response (the Y column). The training data “supervises” the algorithm by telling it what the response should be for the given values of the predictor variables. With unsupervised learning, the algorithms use only the predictors, and any patterns are determined solely by them. There is no “truth” provided, and in fact we may not know the “truth.”
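To make the distinction concrete, here is a minimal Python sketch (not JMP code). The toy data and the helper names `nearest_neighbor_predict` and `two_means` are assumptions made up for illustration. The first function is handed both X and Y; the second sees only X and has to find the groups by itself.

```python
# Toy data (made up): one predictor X and, for the supervised case, a known response Y.
X = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
Y = ["low", "low", "low", "high", "high", "high"]


def nearest_neighbor_predict(x_train, y_train, x_new):
    """Supervised: y_train tells the algorithm what the response should be."""
    distances = [abs(x - x_new) for x in x_train]
    return y_train[distances.index(min(distances))]


def two_means(x, iterations=10):
    """Unsupervised: split x into two groups using only the predictor values."""
    centers = [min(x), max(x)]  # simple starting guess for the two group centers
    for _ in range(iterations):
        groups = [[], []]
        for value in x:
            # Assign each value to the closer of the two centers.
            nearest = 0 if abs(value - centers[0]) <= abs(value - centers[1]) else 1
            groups[nearest].append(value)
        # Move each center to the mean of the values assigned to it.
        centers = [sum(g) / len(g) for g in groups]
    return groups


prediction = nearest_neighbor_predict(X, Y, 4.5)  # uses X and Y
clusters = two_means(X)                           # uses X only
print(prediction)
print(clusters)
```

The clustering step recovers the same two groups that the labels describe, but it never learns that one of them is called “low” and the other “high.” Naming the groups is left to the analyst.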
What machine learning algorithms in JMP fall into which categories?
| Supervised | Unsupervised |
|---|---|
| Decision Trees * | Clustering |
| Neural Networks * | Self-Organizing Maps |
| Bootstrap Forest * | Association Analysis |
| Regression | Singular Value Decomposition |
| K Nearest Neighbors * | |
| Naïve Bayes * | |
* Some features of these platforms are JMP Pro only.
Many people are surprised to see regression on the list, and it is certainly not what most people think of as machine learning, but it is the most straightforward form of supervised learning. And Fit Y by X was in JMP 1! Clustering was added in JMP 3, and the Machine Learning options kept coming after that.
For those of you who are familiar with one or more of these algorithms, you may notice a similarity between them. Many models produce amazing predictive accuracy, but if you tried to explain what the coefficients or cut-offs mean, you would likely be hard pressed to do so. I can follow a decision tree through its cut points to arrive at a prediction, but there’s no inherent meaning behind where those cut points are. I have no idea what the parameter estimates in a neural net model mean in the “real world,” but neural nets are some of the best-performing predictive models in the Machine Learning toolbox.
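As a rough illustration of why a tree’s cut points carry no built-in meaning, here is a minimal Python sketch of a single split. It is an assumption-laden toy, not how JMP’s tree platforms are implemented. The function `best_cut` and the `age` and `spend` columns are made up for the example. The cut is simply whichever midpoint leaves the smallest squared error on either side.

```python
def best_cut(x, y):
    """Return the cut point on x that minimizes squared error for a one-split tree."""

    def sse(values):
        # Sum of squared deviations from the mean (0 for an empty side).
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut fits between two identical x values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbors
        left = [yv for xv, yv in pairs if xv < cut]
        right = [yv for xv, yv in pairs if xv >= cut]
        error = sse(left) + sse(right)
        if best is None or error < best[1]:
            best = (cut, error)
    return best[0]


# Hypothetical columns, invented for illustration.
age = [22, 25, 31, 38, 44, 52]
spend = [10, 12, 11, 30, 32, 31]

cut = best_cut(age, spend)
print(cut)
```

The split lands halfway between 31 and 38 only because that is where the error drops the most in this sample. Nothing about that particular age is special, and a slightly different sample could move the cut.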
By contrast, a regression model provides parameter estimates that have inherent meaning for users. If a coefficient is positive, we know that increasing that predictor by one unit, with the other predictors held fixed, increases the predicted response by the value of the coefficient. This ties in to what Dr. Galit Shmueli refers to as “To Explain or To Predict?” One of the best compromises in this trade-off is the Generalized Regression personality of the Fit Model platform. As my colleague Clay Barker has discussed, using the Lasso or Elastic Net can improve predictive performance while keeping the model explainable.
But does the lack of interpretability matter? Maybe, maybe not. If the use case is strictly focused on achieving the most accurate predictions of future events, then an interpretable model isn’t necessary. On the other hand, if stakeholders want more concrete ways to “improve their score,” an explainable model may be preferred. The key question for any model is whether you can put its output into action.
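The one-unit interpretation can be checked directly. The sketch below fits an ordinary least squares line with a single predictor in plain Python. It is not the Generalized Regression platform and does not use the Lasso or Elastic Net. The `fertilizer` and `crop_yield` values and the helper names are made up for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x_new):
    """Predicted response at x_new."""
    return intercept + slope * x_new


# Hypothetical experiment, invented for illustration.
fertilizer = [0, 1, 2, 3, 4]
crop_yield = [2.0, 4.1, 5.9, 8.0, 10.0]

intercept, slope = fit_line(fertilizer, crop_yield)

# Raising the predictor by one unit changes the prediction by exactly the slope.
one_unit_change = predict(intercept, slope, 3) - predict(intercept, slope, 2)
print(round(slope, 2), round(one_unit_change, 2))
```

Here the slope is about 1.99, so each extra unit of fertilizer raises the predicted yield by about 1.99 units. That is a statement a scientist can act on, which is the appeal of an explainable model.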
Humans Are Still Needed
This brings up my final point. Even with all of the amazing things that the computer algorithms can do, a human analyst is still needed to deal with the higher-level questions.
- What data should be included or excluded from the analysis? Some algorithms have the capability of choosing important variables, but if a variable hasn’t been given to the computer in the beginning, it won’t be picked even if it could be highly predictive!
- Where is the point of diminishing returns? If a model is too complex to run in the time needed, it’s not going to be useful. Perhaps a less complex model gives results that are almost as good and can be run quickly. A well-known example of this is the winner of the Netflix Prize. The winning model performed better than almost anyone expected, but the amount of data and processing it required made it impractical to run in the real-time environment Netflix needed to suggest what users should watch next.
- Are there inherent biases in the model? A computer algorithm doesn’t hold prejudices the way a human does, but it can still produce a model whose predictions correlate highly with characteristics that must not drive the decision. In mortgage lending, certain characteristics are illegal to use in making lending decisions. However, several legal factors can correlate highly with the illegal ones. An algorithm may generate a model that almost perfectly rejects a particular ethnicity even though ethnicity is not directly part of the data. That model would likely have legal consequences.
- When should the model be updated? Because the point of modeling is to produce some sort of output on which we can take action, once that action is taken the model will need to be updated. This update is needed because the changes that are implemented will affect the future inputs into the model. If we continue to use the old model, we are at risk of extrapolating beyond the bounds of the original data. The point at which a model should be updated is a question only humans can answer.
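The bias question above lends itself to a simple check that an analyst can run after the model is built. In the sketch below, everything is hypothetical: the applicants, the `region` and `group` fields, and the `decide` rule standing in for a fitted model. The rule never looks at `group`, yet the rejection rates by group come out very different because `region` acts as a proxy.

```python
# Hypothetical applicants: "region" is a legal input, "group" is a protected
# characteristic that the decision rule never sees.
applicants = [
    {"region": "north", "group": "g1"},
    {"region": "north", "group": "g1"},
    {"region": "north", "group": "g1"},
    {"region": "south", "group": "g1"},
    {"region": "north", "group": "g2"},
    {"region": "south", "group": "g2"},
    {"region": "south", "group": "g2"},
    {"region": "south", "group": "g2"},
]


def decide(applicant):
    """Stand-in for a fitted model: it uses only the legal 'region' field."""
    return "reject" if applicant["region"] == "north" else "approve"


def rejection_rate_by_group(rows):
    """Audit step: compare rejection rates across the protected groups."""
    totals, rejects = {}, {}
    for row in rows:
        group = row["group"]
        totals[group] = totals.get(group, 0) + 1
        if decide(row) == "reject":
            rejects[group] = rejects.get(group, 0) + 1
    return {group: rejects.get(group, 0) / totals[group] for group in totals}


rates = rejection_rate_by_group(applicants)
print(rates)
```

In this made-up data, three of the four g1 applicants are rejected but only one of the four g2 applicants, even though `group` was never an input. Deciding whether a gap like that is acceptable is exactly the kind of call that stays with a human.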
In my next post, I’ll tell you about two of my favorite machine learning algorithms: K Nearest Neighbors and Naïve Bayes. And we’ll take a look at these two methods in action on some fun data I collected myself.