Tuning hyperparameters is crucial for optimizing machine learning models, but the process can be computationally expensive and complex. Traditional grid search, random search, or even Bayesian optimization methods often miss critical areas of the hyperparameter space, leading to suboptimal models.

In this talk, we show a JMP add-in we have developed that uses space-filling DOE to approach the hyperparameter tuning challenge more efficiently. The use of space-filling DOE ensures that hyperparameter combinations are sampled more evenly across the entire parameter space, thus reducing the number of required evaluations while increasing the likelihood of finding optimal settings.

This talk also highlights the improved integration with Python found in JMP 18 and how leveraging capabilities like DOE inside JMP can be beneficial to data scientists. This talk combines advanced statistical techniques with practical, accessible tools to enhance model performance in diverse applications.


Hello. Thanks for coming to our talk. This talk is on improving machine learning using space filling designs to tune hyperparameters. My name is Peter Hirsch, and my co-presenter is Scott Allen. Before we explain our approach to improving machine learning, I wanted to share some results.

Here are two contestants who were given the challenge of building 10 predictive models. Each of these predictive models had a categorical response. If we look at the background of the two contestants, in one corner, we have a world-famous data scientist, Russ Wolfinger, and in the other corner, not a world-famous data scientist. This is equivalent to me playing one-on-one basketball with Michael Jordan. You might expect a very lopsided result.

As you can see, the results ended up being very close, with the non-world-famous data scientist actually having better results. How is this possible? The Torch Companion Add-in was my secret weapon. This Add-in helps all of us improve our modeling results, and it is like having LeBron James as a teammate in that one-on-one basketball game.

I want to start with a very easy quiz here. When we look at this image, do we see a cat or a dog? What about this other image? Do we see a cat or a dog? Pretty easy: we can tell that the dog is on the left and the cat is on the right, but how do we know that an image is of a dog or a cat? What are the key features? If we were going to build a model, how would we train that model to identify cat or dog? Here's a diagram that gives us a visual representation of how a neural network is constructed. In this case, the image of a handwritten four is broken apart into three layers, with layer sizes of four nodes each.

Each node has a different representation of the original input, and these nodes and layers are combined to help make a prediction. Each dataset can have a different level of complexity. If we look at the top example here, just trying to identify whether a point is green or red might be a much simpler model than on the bottom, where we're looking at a dog or cat picture. This might require a much more complex model. The number of layers and the layer sizes, or nodes per layer, are called hyperparameters, and they need to be tuned to optimize each model.
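To make that concrete, here is a minimal PyTorch-style sketch (purely illustrative, not the add-in's own code) in which the number of layers, the nodes per layer, and the activation function are passed in as hyperparameters instead of being hard-coded:

import torch.nn as nn

def build_mlp(n_inputs, n_outputs, n_layers=3, layer_size=4):
    """Build a simple fully connected network; n_layers and layer_size
    are hyperparameters that need to be tuned for each dataset."""
    layers, width_in = [], n_inputs
    for _ in range(n_layers):
        layers.append(nn.Linear(width_in, layer_size))
        layers.append(nn.ReLU())  # the activation function is another hyperparameter
        width_in = layer_size
    layers.append(nn.Linear(width_in, n_outputs))
    return nn.Sequential(*layers)

# A harder problem (dog vs. cat images) typically needs more layers and nodes
# than a simple green-vs-red point classifier.
simple_model = build_mlp(n_inputs=2, n_outputs=2, n_layers=1, layer_size=4)
complex_model = build_mlp(n_inputs=784, n_outputs=2, n_layers=3, layer_size=128)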

How do we adjust the model to address different problems? If we look here, this is the launch window for the Torch tuning Add-in inside of JMP, and there are many different options. We can look at different things like epochs, learning rate, layer size, activation function. With all of these options, we have millions of different combinations we can run. Where do we start?

There are a couple of options we could take when we're exploring the hyperparameter space. The first one is a grid search, and I think this is the most common and is most closely related to a full factorial design. If we looked at this, we would look at every possible combination of all the factors, and this would lead to over two and a half million different models we would have to run. That's too much in most cases. We could also do a random search, where we pick certain combinations of factors to run at random.
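For a sense of the combinatorics, here is a small, hypothetical Python example (the candidate values are made up for illustration and are far fewer than the add-in exposes) showing how a full grid multiplies out quickly, while a random search just draws a fixed budget of combinations:

import math
import random

# Hypothetical candidate values for a few tuning parameters.
grid = {
    "epochs":        [10, 25, 50, 100, 200],
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3, 1e-2],
    "layer_size":    [16, 32, 64, 128, 256, 512, 1024],
    "n_layers":      [1, 2, 3, 4, 5],
    "activation":    ["relu", "tanh", "gelu", "selu", "elu"],
    "batch_size":    [16, 32, 64, 128],
    "dropout":       [0.0, 0.1, 0.2, 0.3, 0.5],
}

# Grid search (full factorial): every combination of every factor.
n_grid = math.prod(len(values) for values in grid.values())
print(f"A full grid would require {n_grid:,} models")  # already tens of thousands

# Random search: sample a fixed number of combinations instead.
random.seed(1)
random_trials = [
    {name: random.choice(values) for name, values in grid.items()}
    for _ in range(100)
]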

There's also a Bayesian Optimization approach we could take, or we could ask Russ Wolfinger. Now, it's great if you have Russ on speed dial, or you have another data scientist that's similar to Russ, but that's not always an option. How would I or Scott or a normal experimentalist be able to solve this problem? We treat this like an experiment. In JMP, how do we do experiments? Do we do it one factor at a time? No, Drake doesn't think that's a good idea. We should probably design an experiment. We're going to talk a little bit about how that approach works, and I'm going to hand it over to Scott.

Thanks, Peter. What we'll do here is just go through what this Torch Companion is comprised of, show some examples, and then work within the Add-in a little bit. Before we go into that, I'm just going to talk a little bit about what's in this Add-in.

This Add-in really is just stitching together a lot of different JMP capabilities: things like Table Functions, the DOE platform, the Torch Deep Learning Add-in that Russ and his team have developed, and then some Graph Builder outputs. You could probably do all of these things on your own individually, but what this Add-in is doing is trying to make a seamless workflow. The way you interact with the Add-in is that the user, just like in many JMP platforms, specifies the columns for the response, the factors, and validation. Then you specify Torch parameters, or in this case a range of parameters or a selection of different options, and then you pass those into the Torch Add-in. We just want to take a quick look at some of these interfaces.

The column dialog is very similar to what you might see in a standard JMP platform, where you're going to load your response, your factors, and your validation columns. The factors can be numeric or character, or they can be, in this case, images for an image model. I think that's one thing that the Torch Add-in brings to JMP that is unique, that ability to look at image models, and so we made sure to build that into this companion Add-in as well. You click OK, and you go to the specification window. We worked with Russ a bit on this to get some guidance on what we should include here, in terms of what parameters you can screen that are going to get you most of the way to an optimal model.

The goal here is to use the Torch Companion in conjunction with the Torch Deep Learning Add-in. We did not include all of the different parameters that you might want to tune, but really, this is going to get you close to an optimal model, and to do it more efficiently. We can tune things like the model, the tabular model type or the image model type, and some of those tuning parameters. We can also look at different model structure parameter options, most specifically the number of DOE trials. This is how, instead of running one model at a time with that one-factor-at-a-time approach, you can just set up a range of parameters and look at 200 or 300 models in one click.

You'll notice some options are grayed out here. We wanted to try to make this Add-in accessible to people who are new to Torch or new to machine learning in general, so we built in some guardrails here, and I know Peter appreciated some of these guardrails as we were going through it.

Yeah, definitely needed to keep me on track. It was helpful.

If you're not familiar with these different types of models, we preselect some of the most common ones and some good starting points. If you want to use this Add-in to screen parameters, you can select the screen parameter checkbox, and now you can start to look at a range of parameters. Once again, we're going to gray out some options that may not be optimal based on what your data table looks like. We also down-select to some of the more common activation functions, and set up some of these parameters to scale by powers of two, which helps you screen a really wide range of, for instance, layer sizes more efficiently.

We also wanted to have some advanced options in here, and if you select the advanced options, now you're going to have access to all the different model types available in the Torch Add-in, as well as the different tabular model types and all the different activation functions that are available. Depending on where you are in your Torch Deep Learning knowledge, you can go the simple route or the advanced route. Let's see an example of how this works. In this case, I've preloaded some different options here: a range of epochs and learning rates, a selection of layer sizes, and a trial count of 100. This could be as many or as few trials as you want.

Then, when you've set all those parameter ranges, you click Run. Once you click Run, we go behind the scenes of the Add-in: it's going to create that space-filling design after doing some table manipulations, and then it's going to pass each row of that space-filling design table into Torch Deep Learning. Actually, before we started the talk, I started a model. You can see I set this up to run 500 Torch models, and it's been chugging along as we've been talking, going through each of those rows in that space-filling design. It gives us a little bit of an indication of how our loss function is trending with respect to the epochs, just so we know that we've selected some appropriate parameters. We're going to continue on with the talk here and check back in on that in a few minutes.
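Conceptually, that behind-the-scenes step looks something like the sketch below, which uses a Latin hypercube, one common kind of space-filling design, to spread trial points evenly across the hyperparameter ranges and then fits one model per design row. This is an illustrative Python sketch, not the add-in's internals, and train_and_score is a hypothetical stand-in for a single Torch Deep Learning run:

from scipy.stats import qmc

# Hypothetical ranges: epochs, learning rate, and log2(layer size).
lower = [10, 1e-4, 4]
upper = [200, 1e-1, 10]

n_trials = 100
sampler = qmc.LatinHypercube(d=3, seed=1)          # space-filling sampler
design = qmc.scale(sampler.random(n_trials), lower, upper)

results = []
for epochs, lr, log2_size in design:
    params = {
        "epochs":     int(round(epochs)),
        "lr":         float(lr),
        "layer_size": 2 ** int(round(log2_size)),  # power-of-two scaling, as described earlier
    }
    r2_val = train_and_score(params)               # hypothetical: fit one model, return validation R-squared
    results.append({**params, "r2_validation": r2_val})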

Once the Torch Deep Learning Add-in completes all those models, the hundreds or dozens or however many you specified, it's going to summarize those results using some additional table functions and Graph Builder, and then give you an output, a report window, to interpret those results. Based on that interpretation, there are two different paths. One path takes you back to the parameter specification. Maybe you didn't get any satisfactory models. Maybe you think that there's room to improve.

You can go and set new parameter ranges, select different functions, and then start the cycle again, and this will just append the new results to your original results. Or maybe you found some models that were really good, that you feel confident in, that you don't need to improve any further. What you can do then is pass those directly into the Torch Deep Learning Add-in so that you can do some further model refinement, compare, save, and deploy those models, or tune parameters that you aren't able to tune in the companion. This is the workflow that we laid out, and it was fun to pull this together, but I will say that Peter is the one who did most of the heavy lifting on the model building. Peter, maybe you can talk about your experience in using this.

I think if there are any software developers here, you can think of Scott as the developer of the Add-in and me as the tester. A couple of things came to mind. This is a DOE, and DOE is often an iterative process: we do a single DOE, we learn from that DOE, and then maybe we build an optimal DOE on top of that. If we look at the results I got from a couple of models, you can see that most of the time I'm running more than one DOE. Oftentimes these models would run really quickly, in a few seconds, so you could do a very large number of models. Sometimes they would take much longer, you would have to run multiple DOEs, and you wouldn't want to run 100 to start with. Looking at the top model there, that was an image model, and each model took about 10 minutes to run.

If you started out with 200 models, you would be sitting there waiting all weekend for it to finish. Then, if you look at some of the others, maybe running 200 when they're all taking 4 seconds apiece is a little bit better. A couple of pointers on this. I like to start with a single holdout validation set because that is a much quicker process than using K-fold. If you prefer K-fold, at the end when you launch it in Torch, you can switch out your validation column, but for tuning, I like using that single validation set.
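In scikit-learn terms (purely as an illustration of the trade-off Peter is describing, and assuming predictor data X and response y already exist), a single holdout split costs one fit per candidate, while K-fold costs K fits per candidate, which is why holdout is the quicker choice for screening:

from sklearn.model_selection import KFold, train_test_split

# Single holdout validation set: one fit per hyperparameter candidate.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=1)

# 5-fold cross-validation: five fits per candidate, so roughly 5x the tuning time.
kfold = KFold(n_splits=5, shuffle=True, random_state=1)
for train_idx, val_idx in kfold.split(X):
    pass  # fit and score one fold here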

Also, I love the time check that Scott built in. If you're worried that a model might take a long time, like an image model, you can have the Tuning Add-in check for you.

Good. Thanks, Peter. I would venture to say that, aside from Russ, you've probably built the most Torch models of anybody at JMP.

Definitely the most poorly performing ones.

Let's switch gears and go into the Add-in and show some of these reports and how you can screen and optimize these models. Let's check back in on our model. We're not quite done. We've got 270 or so out of the 500 that we specified.

One other thing that was built into this is the ability to cancel, so let's just cancel this. You might cancel for a number of reasons: maybe the models are taking longer than you expected, or you're short on time. Whatever it might be, you can cancel. When you accept the Torch error, it's going to give you an option for how to proceed. Maybe you're running an image model, and they're all running pretty quickly, but then you get one that's going to take a really long time.

You can just continue with the DOE and essentially skip that model and go to the next one, or you can stop running the DOE altogether, and that's what we're going to do here. It's going to stop, build the report, and the report window is down here now. We had 500 runs in our DOE, but you can see we've got results for 270 of those.

What we're going to do now is take a look at this results table, and I want to guide us through some of the most important elements to get started. The first thing is the summary statistics. The table is sorted by the validation R-squared, so the higher the validation R-squared, the lower the row number. Then you have access to all the different parameter settings, the different inputs, as well as some outputs like the time it took to run each model and the number of DOEs that have been run so far.

The first diagnostic tool you can look at is the R-squared heat map. This is really just plotting the validation R-squared versus the training R-squared. You would prefer the models that have high R-squared values for both validation and training, but you can certainly see that you get a wide range of R-squared values here. I think this is an important consideration if you're new to Torch or new to machine learning: if you're not sure what parameters to set, you might get models down here, and you might say, I'm really not going to be able to build a good model.

Then you adjust some parameters, and maybe they get worse. What we can see is that when you screen a bunch of parameters, you get a much better view of what's possible with that model. In this case, we do have some models up here that are quite good, with high R-squared for both the validation and the training. There are some built-in data filters, but these are all just in Graph Builder, so you could certainly take this data and graph it however you would like. There are also two tabs that look at individual parameters: the model tuning parameters like learning rate and epochs, and how those trend over the model space, as well as the activation functions and other layer structure parameters.
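If you wanted to recreate that heat map view outside of Graph Builder, a rough equivalent (assuming each record in the results list from the earlier sketch also carries a training R-squared) is a scatter of validation versus training R-squared, where good candidates sit in the upper right and overfit ones fall well below the diagonal:

import matplotlib.pyplot as plt

train_r2 = [r["r2_training"] for r in results]    # assumes both R-squared values were recorded
val_r2 = [r["r2_validation"] for r in results]

plt.scatter(train_r2, val_r2, alpha=0.6)
plt.plot([0, 1], [0, 1], linestyle="--")          # models far below this line are overfit
plt.xlabel("Training R-squared")
plt.ylabel("Validation R-squared")
plt.title("Hyperparameter screening results")
plt.show()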

One thing that you might consider here is scanning many different types of activation functions. I know, Peter, you came across some interesting results here when you were running through these models.

Absolutely. When we started building this Add-in, Russ recommended that we just go ahead and look at a single activation function and use ReLU. ReLU is the most common; it's what everybody uses, and it's a good starting point. In some cases, ReLU was doing well, but in most cases, we found that one of the other activation functions actually performed a little better. Not that Russ' approach isn't a good one, but it was nice to be able to look at all these activation functions and see if there's a slightly better result you can get by tweaking that activation function.

No, I think that is certainly one of the intents here. If you do know about these activation functions, you might know which one is appropriate for a dataset or a type of model based on your experience, but if you don't, then this really helps you scan all of them. In this case, we scanned six of those functions, but there are 20 available that you could also look at.
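In the illustrative PyTorch terms used earlier, screening activation functions just means treating the nonlinearity as one more design factor instead of hard-coding ReLU; a hypothetical sketch:

import torch.nn as nn

ACTIVATIONS = {  # a handful of the many options PyTorch provides
    "relu": nn.ReLU, "gelu": nn.GELU, "selu": nn.SELU,
    "elu": nn.ELU, "tanh": nn.Tanh, "silu": nn.SiLU,
}

def make_activation(name):
    return ACTIVATIONS[name]()  # e.g. make_activation("gelu") instead of always nn.ReLU()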

We also have a parameter importance output. This is really just building a quick Bootstrap Forest model to show which of those hyperparameters, or tuning parameters, are most important in affecting the response, in this case the validation R-squared. Then we also have individual model details. You'll notice in the results table there is a column that gives you a unique model ID, so if you wanted to drill into any of these models, you can just open that up. This is just an export from Torch while it was running, so you still have access to it even though we're summarizing everything based on the R-squared for the validation set. All of the summary statistics for all of the models are available.
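A rough open-source analog of that parameter-importance step (the add-in uses JMP's Bootstrap Forest; here a scikit-learn random forest stands in, applied to the hypothetical results list from the earlier sketches) is to fit a forest with the tuning parameters as predictors and the validation R-squared as the response, then read off the feature importances:

import pandas as pd
from sklearn.ensemble import RandomForestRegressor

df = pd.DataFrame(results)                        # one row per tuned model
X_params = pd.get_dummies(
    df.drop(columns=["r2_validation", "r2_training"], errors="ignore")
)                                                 # one-hot encode activation, etc.
y_score = df["r2_validation"]

forest = RandomForestRegressor(n_estimators=200, random_state=1).fit(X_params, y_score)
importance = pd.Series(forest.feature_importances_, index=X_params.columns)
print(importance.sort_values(ascending=False))    # which hyperparameters matter most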

I think that covers pretty much what I wanted to show for the report window, and just wanted to wrap up what you can do with the Add-in with a couple different workflow options. The first is rerunning. In this case, maybe we're happy with some of these models, but we can look through and see what activation functions were used, and maybe we want to increase the number of layers or the layer size, so we can go back to the specifications, and we can down select to certain functions.

Maybe we only want to look at these three, but we want to look at larger layer sizes. What we could do then is specify, say, only 50 trials, click Run to run just that down-selected set of parameters, and append another 50 runs to the table. That would be one workflow option to consider. Another is some options in the red triangle menu. Maybe you're satisfied with this model or these two models, and you want to take them into the Torch Add-in to run directly.

You don't have to go back and forth, remember what each of these parameter settings is, and type them into the Torch Add-in. You can just go to the red triangle, and we're going to say Run Selected. When we click Run Selected, it takes those parameters, passes them into the Torch Add-in, runs them, and gives us a report in the native Torch Add-in, so now we can see how those models compare. You can see we've passed through all the appropriate parameters, and now we can more directly compare maybe our best-fit models and get a better indication of those summary statistics as well.

Finally, maybe you've got models that take a long time to run. In that case, you might not be able to do all of your tuning in one day. If you want to save this and come back to it later, you can go to the red triangle, and we're just going to save this results data table. When I save it, I can save it to any location I want and then reload it later. We'll just show how that works.

I'm going to close this down, click OK, and now, if I wanted to reload a saved data table, I'm going to go back to my original data table. This was the data table that I was using for modeling. I'm going to run the companion Add-in, and now I'm just going to click Recall; you just have to make sure that you're putting in the right columns, the same columns that were specified in your tuning, and click OK. Now, instead of specifying all of the different parameters here, we're just going to go to the Load button. I didn't have this up ahead of time.

Let's go here, and we will go to our tuning results, and we'll click Open. We get a dialog window that says, yes, we have the same x variables, y variables, and validation column. If any of those don't match with what you originally ran, it will not load it. When we click OK, we'll see our prior tuning runs, and we regenerate all of the graphs. You can go back to the specifications, and now start running these again.

I think that's all I wanted to show. There are a few other options in the Add-in, and we really want people to use this in conjunction with the Torch Deep Learning Add-in that Russ and his team developed. I'm sure you'll find a few other useful features in there once you start working with it. Maybe now I'll pass it back to Peter to talk a little bit about some of the results, or the final results.

Sounds good. Thanks, Scott. In red here are the validation R-squares I got from my models, and in blue are the results Russ got from his models. This is not a knock on Russ. Russ is a great data scientist, but I think it's more an advertisement for the power of using DOE versus one factor at a time.

This shows that if you take a chemist versus a data scientist and let them run models, when you use DOE, the chemist can have similar success or maybe even improve upon what a very good data scientist can do. The Tuning Add-in gives everybody that ability. Like Scott said, it's not meant to replace the Torch Add-in; it works in conjunction with the Torch Add-in, and it's going to be available on the Marketplace. You can get both of those.

I gave myself an Olympic medal for beating Russ because I think that's a pretty good result for anyone, but especially for someone without that data science background. We started by talking about identifying cats and dogs, but if you think about this in the world of what you do, you can imagine all different kinds of image models, or even tabular models, where you might use this deep learning, neural network type of approach, and tuning these hyperparameters will help you get better results.

These images come from some of the different examples in the Torch Add-in storybook. I think everyone has this type of data that they could apply this approach to. We did want to acknowledge, of course, Russ. I know we gave him a bit of a hard time, but he also helped us out greatly with developing this Add-in and putting guardrails on it so I wasn't all over the place. Thanks also to the JMP steering committee for selecting this talk. Like we mentioned, both the Torch Add-in and the Tuning Add-in are available in the Marketplace. We have a QR code here if you'd like to download them, or just go to marketplace.jmp.com. Thank you.

Thank you.
