When I joined JMP in 2014, I came fresh from completing a PhD in statistics at a school with a long history of excellence in agricultural experiments and mixed modeling. This was the world I was used to: small experiments designed to maximize the information received through judicious planning, almost always some experiment design feature that required mixed model techniques to analyze (the simplest being a split-plot design), and the goal of determining which treatment(s) result in the best outcomes. This was true whether the scientist I worked with was in animal science, entomology, food science, or education.
Shortly after arriving at JMP, the buzz around Big Data reached its peak. A couple of years after that came the rise in Machine Learning. It’s not surprising that Machine Learning’s rise trailed that of Big Data. What were people supposed to do with all the data they’d collected? Analyze it using the tools in the Machine Learning toolbox, of course.
My formal education hadn’t particularly set me up for either of these things, but as I began my career as a Statistician Developer, these were the topics I needed to get up to speed on, and quickly! So I did what every student does these days when taking first steps into a new topic: I visited Wikipedia.
“Machine learning is an interdisciplinary field that uses statistical techniques to give computer systems the ability to "learn" (e.g., progressively improve performance on a specific task) from data, without being explicitly programmed.”
This was a bit stunning. How is it possible for a computer system to “learn” anything without being explicitly programmed? We haven’t achieved Skynet yet, as far as I’m aware! Time to keep digging. Encyclopaedia Britannica says, “Machine learning, in artificial intelligence (a subject within computer science), discipline concerned with the implementation of computer software that can learn autonomously.”
Now I’ve lost my connection to statistics in the definition, but at least it’s more apparent that we’re interested in implementing software that can learn autonomously. The “without being explicitly programmed” refers to the fact that once the data are passed off to the algorithm, the algorithm does the rest without any further instruction from us.
Based on these definitions and my perception of the common usage of the phrase, I’ve come to think of machine learning as being the current buzzphrase meant to encompass the computer algorithms used to make decisions, predictions, or classifications based on data.
So does JMP do machine learning? Absolutely! And we have, technically, from the very first version! Were we that ahead of our time back in 1989? I’d like to think so. But let’s look into further detail about those algorithms that people are thinking of when they say Machine Learning.
Typically, Machine Learning algorithms are classified into different types – supervised and unsupervised learning. With supervised learning, the algorithms work with both the predictors (the X columns) and a response (Y column). The training data used “supervises” the algorithm by telling it what the response should be for the given values of the predictor variables. With unsupervised learning, only the predictors are used by the algorithms, and any patterns are determined solely by them. There is no “truth” provided – and in fact we may not know the “truth.”
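To make the distinction concrete, here is a minimal sketch in Python using synthetic data (JMP itself works through its interface and JSL, so this is purely illustrative). The supervised step builds a nearest-centroid rule from X and a known Y; the unsupervised step hands the same X to a bare-bones k-means loop with no Y at all:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))              # predictors (the X columns)
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # known response (the Y column)

# Supervised: the labeled response "supervises" the fit.
# Here, each class gets a centroid computed from its labeled points.
centroids = np.array([X[y == k].mean(axis=0) for k in (0, 1)])

def predict(pts):
    # Classify each point by its nearest class centroid.
    d = np.linalg.norm(pts[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

acc = (predict(X) == y).mean()             # high, because y guided the fit

# Unsupervised: y is never consulted. A tiny k-means finds two
# groups purely from patterns in X.
centers = X[:2].copy()
for _ in range(10):
    labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
    centers = np.array([X[labels == k].mean(axis=0) for k in (0, 1)])
```

The supervised rule can be scored against the known truth; the unsupervised labels can only be inspected for useful structure, since no “truth” was provided.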
What machine learning algorithms in JMP fall into which categories?
Supervised: Decision Trees*, Neural Networks*, Bootstrap Forest*, Regression, K Nearest Neighbors*, Naïve Bayes*

Unsupervised: Clustering, Self-Organizing Maps, Association Analysis, Singular Value Decomposition

* Some features of these platforms are JMP Pro only.
Many people are surprised to see regression on the list, and it is certainly not among what’s commonly thought of as machine learning, but it is the most straightforward form of supervised learning. And Fit Y by X was in JMP 1! Clustering was added in JMP 3, and the Machine Learning options kept coming after that.
For those of you who are familiar with one or more of these algorithms, you may notice a similarity among them: many produce amazing predictive accuracy, but if you tried to explain what the coefficients or cutoffs mean, you would likely be hard pressed to do so. I can follow a decision tree through its cut points to arrive at a prediction, but there’s no inherent meaning behind where those cut points fall. I have no idea what the parameter estimates in a neural net model mean in the “real world,” yet neural nets are some of the best-performing predictive Machine Learning models out there.
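A toy illustration of that point, on made-up data: a one-split “decision stump” scans every candidate cut point and keeps whichever one minimizes misclassification. The resulting threshold is just wherever the error happened to be lowest; it carries no real-world meaning on its own.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=200)
y = (x > 6.3).astype(int)          # the true rule, unknown to the algorithm
y[rng.random(200) < 0.05] ^= 1     # sprinkle in a little label noise

# Candidate cut points: midpoints between consecutive sorted x values.
xs = np.sort(x)
cuts = (xs[:-1] + xs[1:]) / 2

# For each cut, take the better of the two split directions.
errors = [min(((x > c) != y).mean(), ((x <= c) != y).mean()) for c in cuts]
best_cut = cuts[int(np.argmin(errors))]
print(f"learned cut point: {best_cut:.2f}")
# The number itself is an artifact of this particular sample --
# rerun with new data and the cut point moves.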
By contrast, a regression model provides parameter estimates that have inherent meaning for users. If a coefficient is positive, we know that increasing that predictor by one unit, holding the other predictors constant, increases the predicted response by the coefficient’s value. This ties into what Dr. Galit Shmueli refers to as “To Explain or To Predict?” One of the best compromises in this trade-off is the Generalized Regression personality of the Fit Model platform. As my colleague Clay Barker has discussed, using the Lasso or Elastic Net improves predictive performance while producing a model that can still be explained.
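That one-unit interpretation can be checked directly on synthetic data (the predictor names and true coefficients below are invented for the sketch; this is ordinary least squares, not JMP’s Generalized Regression):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
temp = rng.normal(20, 5, n)    # hypothetical predictor: temperature
time = rng.normal(30, 10, n)   # hypothetical predictor: bake time
# True relationship: +2.0 per unit of temp, -0.5 per unit of time.
yield_ = 2.0 * temp - 0.5 * time + rng.normal(0, 1, n)

# Ordinary least squares for [intercept, temp slope, time slope].
X = np.column_stack([np.ones(n), temp, time])
beta, *_ = np.linalg.lstsq(X, yield_, rcond=None)
print(beta)  # slopes recovered near 2.0 and -0.5

# Interpretation: holding time fixed, a one-unit increase in temp
# changes the predicted yield by roughly beta[1] units.
```

Each slope answers a concrete “what happens if I change this?” question, which is exactly the kind of explanation a decision tree’s cut points or a neural net’s weights don’t offer.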
But does the lack of interpretability matter? Maybe, maybe not. If the use case is strictly focused on achieving the most accurate predictions of future events, then an interpretable model isn’t necessary. On the other hand, if stakeholders want more concrete ways to “improve their score,” an explainable model may be preferred. The key question for any model is whether you can put its output into action.
This brings up my final point. Even with all of the amazing things that computer algorithms can do, a human analyst is still needed to deal with the higher-level questions.
In my next post, I’ll tell you about two of my favorite machine learning algorithms: K Nearest Neighbors and Naïve Bayes. And we’ll take a look at these two methods in action on some fun data I collected myself.