Using JMP® to Compare Models from Various Environments (2020-US-45MP-593)
Lucas Beverlin, Statistician, Intel, Corp.
The Model Comparison platform is an excellent tool for comparing various models fit within JMP Pro. However, it also has the ability to compare models fit from other software as well. In this presentation, we will use the Model Comparison platform to compare various models fit to the well-known Boston housing data set in JMP Pro15, Python, MATLAB, and R. Although JMP can interact with those environments, the Model Comparison platform can be used to analyze models fit from any software that can output its predictions. |
Auto-generated transcript...
Speaker | Transcript |
Lucas | Okay, thanks everyone for coming and listen to my talk. My name is Lucas Beverlin. I'm a statistician at Intel. |
And today I'm going to talk about using JMP to compare models from various environments. | |
Okay so currently JMP 15 Pro is the latest and greatest that JMP has out and if you want to fit the model and they've got several different tools to do that. | |
There's the fit model platform. There's the neural platform on neural network partition platform. If you want classification and regression trees. The nonlinear platform for non linear modeling and there's several more. | |
And so within JMP 15 I think it came out in 12 or 13 but this model comparison platform is a very nifty tool that you can use to compare | |
model fits from various platforms within JMP. So if you have a tree and a neural network and you're not really sure which one's better. Okay, you could flip back and forth between the two. But now with this, | |
you have everything on the same screen. It's very quick and easy to tell, is this better that that, so on so forth. | |
So that being said, JMP can fit a lot of things, but, alas, it can't fit everything. So just give a few ideas of some things that can't fit. So, for example, | |
those that do a lot of machine learning and AI might fit something like an auto encoder or convolutional neural network that generally requires lots of activation functions or yes, lots of hidden layers nodes, | |
other activation functions than what's offered by JMP so JMP's not going to be able to do a whole lot of that within JMP. | |
Another one is something called projection pursuit regression. Another one is called multivariate adaptive regression splines. So there are a few things unfortunately JMP can't do. | |
R, Python, and MATLAB. | |
There's several more out there, | |
but I'm going to focus on those three. Now that being said, the ideas I'm going to discuss here, you want to go fit them in C++ or Java or Rust or whatever other language | |
comes to mind, you should be able to use a lot of those. | |
So we can use the model comparison platform, as I've said, to compare from other software as well. So what will you need? | |
So the two big things you're going to need are the model predictions from whatever software you use to fit the model. | |
And generally, when we do model fitting, particularly with larger models, you may split the data into training validation and/or test sets. | |
You're going to need something that tells all the software which is training, which is validation, which is test, because you're going to want those to be consistent when you're comparing the fits. | |
OK, so the biggest reason I chose R, Python and MATLAB to focus on for this talk is that turns out JMP and scripting language can actually create their own sessions of R and run code from it. | |
So this picture here just shows very quickly if I wanted to fit a linear regression model to some output | |
To to the Boston housing data set. I'll work a lot more with that data set later. | |
But if you wanted to just very quickly fit a linear regression model in R and spit out the predictive values, you can do that. Of course you can do that JMP. But just to give a very simple idea. | |
So, | |
one thing to note so I'm using R 3.6.3 but JMP can handle anything as long as it's greater than 2.9. | |
And then similarly, Python, you can call your own Python session. So here the picture shows I fit the linear regression with Python. I'm not going to step through all the lines of code here but you get the basic idea. | |
Now, of course, with Python be a little bit careful in that the newest version of Python 3.8.5. But if you use Anaconda to install things, JMP has problems talking to it when it's greater than 3.6 so | |
since I'm using 3.6.5 for this demonstration. | |
And then lastly, we can create our own MATLAB session as well. | |
So here I'm using MATLAB 2019b. But basically, as long as your MATLAB version has come out in the last seven or eight years, it should work just fine. | |
Okay, so how do we tie everything together? So really, there's kind of a four-step process we're going to look at here. So first off, we want to fit each model. So we'll send each software the data and which set each observation resides. | |
Once we have the models fit, we want to output those fits and their predictions and add them to a data table that JMP can look at. | |
So of course my warning is, be sure you name things that you can tell where did you fit the model or how did you fit the model. I've examples of both coming up. | |
So step three, depending on the model and you may want to look at some model diagnostics. Just because a model fits...appears to fit well based on the numbers, one look at your residual plot, for example, and you may find out real quickly the | |
area of biggest interest is not fit very well. | |
Or there's something wrong with residuals so on so forth. So we'll show how to output that sort of stuff as well. | |
And then lastly we'll use the model comparison platform, really, to bring everything into one big table to compare numbers, much more easily as opposed to flipping back and forth and forth and back. | |
Okay, so we'll break down the steps into a little more detail now. So for the first step where we do model fitting, we essentially have two options. So first off, we can tell JMP via JSL to call your software of choice. | |
Send it the data and the code to fit it. And so, in fact, I'm gonna jump out of this for a moment and do exactly that. | |
So you see here, I have some code for actually calling R. | |
And then once it's done, I'll call Python and once it's done, I'll call MATLAB and then I'll tie everything together. Now I'll say more about the code here in a little bit, but it will take probably three or four minutes to run. So I'm going to do that now. | |
And we'll come back to him when we're ready. | |
So our other option is we create a data set with the validation. Well, we create a data set with the validation column and and/or a test column, depending on how many | |
sets were splitting our data into. We're scheduled to run on whatever software, we need to run on, of course output from that whatever it is we need. | |
So of course a few warnings. Make sure you're...whatever software you're using actually has what you need to fit the model. | |
Make sure the model is finished fitting before you try to compare it to things. | |
Make sure the output format is something JMP can actually read. Thankfully JMP can read quite a few things, so that's not the biggest of the four warnings. | |
But as I've warned you earlier, make sure the predictions from each model correspond to the correct observations from the original data set. | |
And so that comes back to the if it's training, if it's a training observation, when you fit it in JMP, | |
it better be a training observation when you fit it in whatever software using. If it's validation in JMP, it is the validation elsewhere. It's test in JMP, it's going to be test elsewhere. | |
So make sure things correspond correctly because the last thing we want to find out is to look at test sets and say, "Oh, this one fit way better." Well, it's because the observations fit in it were weighted different and didn't have any real outliers. So that ends up skewing your thinking. So | |
a word of caution, excuse me, a word of caution there. | |
Okay. So as I've alluded to, I have an example I'm currently running in the background. And so I want to give a little bit of detail as far as what I'm doing. So | |
it turns out I'm going to fit neural networks in R and Python and MATLAB. So if I want to go about doing that, within R, two packages I need to install in R on top of whatever base installing have and that's the Keras package and the Tensorflow package. | |
numpy, pandas and matplotlib. | |
So numpy to do some calculations pretty easily; pandas, pandas to do data...some data manipulation; and matplotlib should be pretty straightforward to create some plots. And then in MATLAB I use the deep learning tool box, whether you have access to that are not. | |
Okay. So step two, we want to add predictions to the JMP data table. So if you use JMP to call the software, you can use JSL code to retrieve those predictions and add them into a data table so then you can compare them later on. | |
So then the other way you can go about doing is that the software ran on its own and save the output, you can quickly tell JMP, hey go pull that output file and then do some manipulation to bring the predictions into whatever data table you have storing your results. | |
So now that we can also read the diagnostic plots. In this case what I generally am going to do is, I'm going to save those diagnostic plots as graphics files. So for me, it's going to be PNG files. But of course, whichever | |
graphics you can use. Now JMP can't hit every single one under the sun, but I believe PNG that maps jpgs and some and they...they have your usual ones covered. | |
So the second note I use this for the model comparison platform, but to help identify what you...what model you fit and where you fit it, | |
I generally recommend adding the following property for each prediction column that you add. And so we see here, we're sending the data table of interest, this property called predicting. | |
And so here we have the whatever it is you're using to predict things (now here in value probably isn't the best choice here) but | |
but with this creator, this tells me, hey, what software did I use to actually create this model. And so here I used R. It shows R so this would actually fit on the screen. Python and MATLAB were a little too long, but we can put whatever string we want here. | |
You'll see those when I go through the code in a little more detail here shortly. | |
So, and this comes in handy because I'm going to fit multiple models within R later as well. So if I choose the column names properly and I have multiple ones where R created it, I still know what model I'm actually looking at. | |
Okay, so this is what the typical model comparison dialog box looks like. | |
So one thing I'm going to note is that, so this is roughly what it would look like if I did a point and click at the end of all the model fitting. So you can see I have several predictors. So I've neural nets for a MATLAB, Python and R. | |
Various prediction forms; I used to JMP to fit a few things. | |
Now, oftentimes what folks will do is, they'll put this validation column as a group, | |
so that it'll group the training validation and test. I actually like the output a bit better when I stick it in the By statement here. So I'll show that here a little later. But you can put it either or | |
but I like the output this way better is the long and short of it. | |
So this is the biggest reason why it is now I can clearly see, these are all the training, | |
these are all the validation (shouldn't see by the headers) and these are all the test. | |
If you use validation as a group variable, you're going to get one big table with 21 entries in it. Now, there'll be an extra column. It says training validation test or in my case, it will be 012 but | |
this way with the words, I don't have to think as hard. I don't have to explain to anyone what 012 means so on so forth. So that was why I made the choice that I did. | |
Okay, so in the example I'm gonna break down here, I'm going to use the classic Boston housing data set. Now this is included within JMP. So that's why I didn't include it as a file in my presentation because if you have JMP, you've already got it. | |
So Harrison and Rubinfeld had several predictors of the median value of the house, such things such as per capita crime rate, the proportion of non retail business acres per town, | |
average number of rooms within whatever it is you're trying to buy, pupil to the teacher ratio by town (so if there's a lot of teachers and not quite as many students that generally means better education is what a lot of people | |
found) and several others. I'm not gonna really try to go through all 13 of them here. | |
Okay, so let me give a little bit of background as far as what models I looked at here. And then I'm going to delve into the JSL code and how I fit everything. | |
So some of the models, I've looked at. So first off, the quintessential linear regression model. So here you just see a simple linear regression. I just fit the median value to, looks like, by tax rate. | |
But of course I'll use a multiple linear regression and use all of them. | |
So, | |
But with 13 different predictors, and knowing some of them might be correlated to one another, I decided that maybe a few other types of regression would be worth looking at. | |
One of them is something called bridge regression. So really all it is, is it's linear regression with essentially an added constraint that | |
the squared values of my parameters can't be larger than some constant. | |
I can...turns out I can actually rewrite this as an optimization problem where some value of lambda corresponds to some value of C. | |
And so then I'm just trying to minimize this with this extra penalty term, as opposed to the typical least squares that you're used to seeing. | |
Now, this is called a shrinkage method because of course as I make this smaller and smaller, it's going to push all these closer and closer to zero. So, of course, some thought needs to be put into how restrictive do I want it to be. | |
Now with shrinkage, it's going to push everybody slowly towards zero. But with another type of penalty term, I can actually eliminate some terms altogether. | |
And I can use something called the lasso. And so the contraint here is, okay, instead of the squared parameter estimates, I'm just going to take the sum of the absolute value of those parameter estimates. | |
And so it turns out from that, what'll actually happen is those that are very weak actually get their parameter estimates set to zero itself, which kind of serves as a | |
elimination, if you will, of unimportant terms. | |
So to give a little bit of a visual as to what lasso and ridge regression are doing. So for ridge regression, the circle here represents the penalty term. | |
And here we're looking at the parameter space. And so the true least squares estimates would be here. So we're not quite getting there, because we have this additional constraint. | |
So in the end, we find where does...where do we get the minimum value that touches the circle, basically. And so this is, this would be our ridge regression parameter estimates. | |
For lasso, similar drawing, but you can see now with the absolute value, this is more of a diamond as opposed to a circle. Now note, this is two dimensions, of course, we're going to get into hyper spheres and all those shapes. | |
But you can see here, notice it touches right at the tip of the diamond. And so in this case beta one is actually zero. So that's how it eliminates terms. | |
Okay, so another thing we're going to look at is what's called a regression tree. Now JMP uses the partition platforms to do these. So just to give a very quick demo of what this shows, in that, ok so I have all of my data. | |
And so my first question I'll ask myself is, how many rooms are in the dwelling, and I know I can't have .943 of a room, so basically, | |
I have six rooms or less. So come down this part of the tree, let's not come down this part of the tree. Now if I have seven rooms or more, this tells me immediately I'm going to predict my median | |
value to be 37. Remember it's in tens of thousands of dollars, so make that $370,000. | |
If it's less than seven, then the next question I asked is, well, how big is lstat? | |
So if it's bigger than or equal to 14.43, I'll look at this node and suddenly my median housing estimates about 150 grand and if I come over here, it's gonna be about 233 grand. | |
So what regression trees really do is they're partitioning your input space into different areas. And we'er giving the same prediction to every value within that area. | |
So you can see here I've partitioned...now on this case, I'm taking a two dimensional one because it's easier to draw... | |
and so you can see this tree here, where I first look at x1. Now look at x2 here and ask another question about x1 and ask another question about x2, and this is how I end up partitioning the input space. | |
Now each of these five is going to have a prediction value. And that's essentially what this looks like. I look at this from up top. I'm going to get exactly this. But you can see here that the | |
prediction is a little bit different depending upon which of the five areas right. | |
Now, I'm not going to get into too much of the details on how exactly to fit one of these, but | |
James, Witten, Tibshirani and Friedman give a little bit; Leo ??? wrote the seminal book on it so you can take a look there. | |
So next off, I'll come to neural networks, which are being used a lot in machine learning and whatnot these days. And so this kind of gives a visual of what a neural network looks like. So here, this visual just uses five and the 13 inputs when passing them | |
to these...this hidden layer. And each of these is transformed via an activation function. | |
And for each of these activation functions, you get an output and we'll use... | |
oftentimes, we'll just use a linear regression of these outputs to predict the median value. | |
Okay, so | |
really, neural network are nonlinear models, at the end of the day, and really, they're called neural networks because the representation generally is how we viewed neurons as working within the human brain. | |
So each input can be passed to nodes in a hidden layer. At the hidden layer your inputs are pushed through an activation function and some output is calculated and each output can be passed to a node in another hidden layer or be an output of the network. | |
Now within JMP, you're only allowed two hidden layers. | |
Truth be told, as far as creating a neural network, there's nothing that says you can't have 20 for all that | |
we're concerned about now. Truth be told, there's some statistical theory that suggests that hey, we can approximate any continuous function, | |
given a few boundary conditions, with two hidden layers. So that's likely why JMP made the decision that they did. | |
linear, hyperbolic tangent and Gaussian radial basis. So in fact, | |
on these nodes here, notice the little curve here. I believe that is for the hyperbolic tangent function; linear will be a straight line going up; and Gaussian radial basis, I believe, will look more like a normal curve. | |
That's the neural network platform. So the last one we'll look at is something called projection pursuit regression. I wanted to pull something that JMP simply can't do just kind of give an example here. | |
Um, so projection pursuit regression was a model originally proposed by Jerome Friedman and Steutzle over at Stanford. Their model takes prediction...makes predictions of the form y equals the summation of beta i, f sub i, and a linear transformation of your inputs. | |
So really this is somewhat analogous to a neural network. You have one hidden layer here with k nodes and each with activation function f i | |
L. Turns out with projection pursue regression, we're actually going to estimate these f sub i as well. | |
Generally they're going to be some sort of smoother or a spline fit. Typically the f sub i are called ridge functions. Now we have alphas, we have betas and we have Fs we need to optimize over. So generally a stagewise fitting is done. I'm not going to get too deep in the details at this point. | |
Okay, so I've kind of gone through all my models. | |
So now I'm going to show some output and hopefully things look good. So one thing I'm going to note before I get into JMP is that | |
it's really hard to set seeds for the neural networks in R, Python for Keras. So do note that if you take my code and run it, you're probably not going to get exactly what I got, but it should be pretty close. So with that said, | |
let's see what we got here. So this was the output that I got. Now, unfortunately, things do not appear to have run perfectly. | |
So, | |
Lucas | what do I have here? So I have my training, my validation, and my test. And so we see very quickly that one of these models didn't fit very well. The neural net within R unfortunately something horrible happened. |
It must have caught a bad spot in the input space to start from and whatnot. And so it just didn't fit a very good model. | |
So unfortunately, starting parameters with nonlinear models matter; in some cases, we get bit by them. | |
But if we take a look at everything else, everything else seems to fit decently well. Now what is decently well, we can argue over that, but I'm seeing R squares, one above .5. I'm seeing root average | |
squared errors here around five or so, and even our average absolute errors are in the three range. Now for training, it looks like projection pursuit regressions did best. | |
If I come down to the validation data set, it still looks like R | |
projection pursuit did best. But if we look at the test data set, all of a sudden, no, projection pursuit regression was second, assuming we're gonna ignore the neural net from R, second worst. | |
Oftentimes in a framework like this, we're going to look at the test data set the closest because it wasn't used in any way, shape, or form to determine the model fit. | |
And we see based on that, It looks like | |
the ridge regression from JMP fit best. | |
We can see up here, it's R squared was .71 here before was about .73, and about .73 here, so we can see it's consistently fitting | |
the same thing through all three data sets. So if I were forced to make a decision, just based on what I see at the moment, I would probably go with | |
the ridge regression. So that being said, we have a whole bunch of diagnostics and whatnot down here. So if I want to look at what happened with that neural network from R, | |
I can see very quickly, something happened just a few steps into there. As you can see, it's doing a very lousy job of fitting because pretty much everything is predicted to be 220 some thousand. | |
So we know something went wrong during the fitting of this. | |
So we saw the ridge regression looked like the best one. So let's take a look at what it spits out. So I'll show in a moment my JSL code real quick that | |
shows how I did all this but, um, we can see here's the parameter estimates from the ridge regression. We can see the ridge diagnostic plots, so things hadn't really shrunk too much from the original estimates. | |
You can see from validation testing with log like didn't whatnot. | |
And over here on the right, we have our essentially residual plots. These are actual by predicted. | |
So you can see from the training, looks like there was a few that were rather expensive that didn't get predicted very well. We see fewer here than in the test set, it doesn't really look like we had too much trouble. We have a couple of points here a little odd, but we can see for generally | |
when we're in lower priced houses, it fits all three data sets fairly well. Again, we may want to ask ourselves what happened on these but, | |
at the moment, this appears to be the best of the bunch. So we can see from others. | |
See here. So we'll look at MATLAB for a moment. | |
So you can see training test validation here as well. So here we're spitting out...MATLAB spits out one thing of diagnostics and you can see it took a few epochs to finish so. | |
But thankfully MATLAB runs pretty quickly as we can tell. | |
And then the actual by predicted here. | |
We can see all this. | |
Okay, so I'm going to take a few minutes now to take a look at the code. | |
So of course, a few notes, make sure things are installed so you can actually run all this because if not, JMP's just going to fail miserably, not spit out predictions and then it's going to fail because it can't find the predictions. | |
So JMP has the ability to create a validation column with code. So I did that I chose 60 20 20. I did choose that random seed here so that you can use the same training validation test sets that I do. | |
So actually, for the moment, what I end up doing is I save what which ones are training, validation and test. I'm actually going to delete that column for a little bit. | |
The reason I do that here is because I'm sending the data set to R, Python and MATLAB and it's easier to code when everything in it is either the output or all the inputs. | |
So I didn't want a validation column that wasn't either and then it becomes a little more difficult do that. So what I ended up doing was I sent it the data set, | |
I sent it which rows of training, validation, and test, and then I call the R code to run it. Now you can actually | |
put the actual R code itself within here. I chose to just write one line here so that I don't have to scroll forever. But there's nothing stopping you. If it's only a few lines of code, like what you saw earlier in the presentation, I would just paste it right in here. | |
So that once it's done, it turns out...this code spits out | |
a picture of the diagnostics. We saw it stopped after six or seven iterations, let's have this say is that out. | |
And also fits the ridge regression in this script so we get two pictures. So I spit that one out as well and save it and outline box. Now, these all put together at the end of all the code. | |
And then I get the output and I'll add those to the data table here in a little while. | |
Okay, so I give a little bit of code here in that. Let's say you have 6 million observations and it's going to take 24 hours to actually fit the model, you're probably not going to want to run it within JMP. | |
So as a little bit of code that you could do from here, you can say, hey, I'm | |
okay, I'm going to just open the data table I care about. I'm going to tell R to go run it somewhere else in the meantime, and once my system, when I gives me the green light that hey, it's done, I can say, okay, well go open the output from that and bring it into | |
my data table. So this would be one way you could go about doing some of that. And of course you want to save those picture file somewhere and use this code as well. But this is gonna be the exact same code. | |
Okay, so for Python, it's going to be very similar. I'm going to pass it these things. Run some Python code, spit out the diagnostic plot and spit out the predictions. | |
And I give some Python code, you can see, it's very, very similar to what we did from JMP. I'm just going to go open some CSV file in this case. Copy the column in and close it, because I don't need it anymore. | |
And then MATLAB again the exact same game. Asset things, run the MATLAB script. I get the PNG file that I spat out of here. Save it where I need to, save the predictions. And if you need to grab it | |
rather than run it from within here, a little bit of sample code will do that. | |
OK, so now that I'm done calling R, Python and Matlab, I bring back my validation columns so that JMP can use it. So since I remember which one's which. | |
So by default, JMP looks at the values within the validation column which we'll use. The smallest value is training, the next largest is validation, the largest is test. Now if you do K fold cross validation, it'll tell it which fold it is. So coursing though 012345678 so on so forth. | |
So then create this. I also then in turn created this here, so that way instead of 012, it'll actually say training, validation, and test in my output, so it's a little clearer to | |
understand. So if I'm going to show someone else that's never run JMP before, they're not going to know what 012 means, but they should know a training, validation and test are. | |
OK, so now I start adding the predictions to my data table. Um, so here's that set property I alluded to earlier in my talk. | |
So my creator's MATLAB, I've given the column name so, hey, I know it's the neural net prediction for MATLAB. So I may not necessarily need the creator, but in case I'm a little sloppy in naming things | |
Sorry about that. | |
So we can get all the projection pursuit regression, neural nets, and whatnot. | |
Then I also noted that, hey, I fit ridge regression, lasso, linear regression in JMP. | |
So I did all that here. So here I do my fit model, my generalize regression. Get all these spat out. | |
Sve my prediction formulas. | |
Plot my actual by predicted for my full output at the end. | |
And I'm going to fit my neural network. Can I say the validation column. I transfer my covariates, generally neural networks tend to fit a little bit better when we scale things around zero as opposed to whatever the output is usually at. | |
So my first hidden layer has three nodes. My second hidden layer has two nodes. Here they're both linear activation functions. Turns out for the three above, I use the restricted linear units activation function so slightly different, | |
but I found they seem to fit about the same regardless. | |
5. What that means is, hey, I'm going to try five different sets of starting values, whichever one does best is what I'm going to keep. | |
As you can tell from my code, I probably should have done that with the R. It's done kind of a four loop, done several of them, spit out the one that does best. So for future work, that would be one spot I would go. | |
So then I save that stuff out and now I'm ready for the model comparison. So now I bring all those new columns in the model comparison. | |
Scroll over a little bit. So here I'm doing the by validation, as I alluded to earlier. | |
And so lastly I'm just doing a bit of coding to essentially make it look the way I want it to look. | |
So I get these outline boxes, just to say training diagnostics, validation diagnostics, test diagnosis, instead of the usual stuff that JMPs says. | |
I'm gonna get these diagnostic plots. | |
Now here I'm just saying I just only want part of the outputs on grabbing a handle to that. | |
I'm going to make some residual plots real quick because not all of them instantly spit those out, so particularly ones from MATLAB, Python and R. | |
Set those titles and then here I just create the big old table or the big old dialogue. | |
And then I journal everything. | |
So that it's nice and clean. Close a bunch of stuff out, so I don't have to worry about things. And then what I did here at the end is what I wanted to happen is when I pop one of these open, | |
everything else below it is immediately open rather than | |
having to click on six or seven different things. You can see, I have to click here and here and | |
Over here, there's three more. I guess one more. Sorry. But this way, I don't have to click on any of these, they're automatically open. | |
So that's what this last bit of code does. | |
Okay. | |
Lucas | So this is just different output it and run it live. But this is where it can also look like. |
So as I mentioned, so in the code, you saw there, you saw something else that's what we saw. | |
Richard | |
Lucas | Nope. So to wrap everything up the model comparison platform is really a very nice tool for comparing the predictive ability of multiple models in one place. You don't have to cut back and forth between various things. You can just look at everything right in front of you. |
The flexibility can even be used to fit models that weren't fit in JMP or compare models that weren't even fit in JMP. And so | |
with this, if we need to fit very large models that take a long time to fit, we can tell them to go fit. Pull everything in JMP and very easily look at all the results to try to determine next steps. | |
And with that, thank you for your time. |