Hi @ih,
You are both certainly right that I cannot get around my own thoughts and subjective influence on the process and decisions, which makes the entirety of it somewhat subjective and prone to bias. It would be nice to remove as much of that as possible, so as to limit any one person's bias on the matter at hand. After all, we're after the best-performing model, not what someone wants or wishes to be the best-performing model.
It's also true that I don't solely use R2 as a metric for the models I generate. It is, however, one of the more reliable metrics for driving the tuning process of some of the model fits, for example with boosted trees and bootstrap forest. Both of those methods have some randomness involved in the process, which is good, but fine-tuning the parameters of the model can get rather unwieldy -- it's very easy to generate large tuning tables since there are several parameters, making the hyperparameter space very large. To help that process a little, the Gaussian Fit platform can reduce the parameter space over iterative tuning fits, and R2 works as a very effective stand-in metric with that platform.
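For anyone who wants to see the general idea outside of JMP, here is a rough Python sketch of a surrogate-assisted search that uses validation R2 as the objective. It uses scikit-learn on synthetic data, and the hyperparameter names and ranges are just placeholders -- this is the same spirit as driving tuning through a Gaussian process, not the Gaussian Fit platform itself.

```python
# Hedged sketch: surrogate-assisted tuning of a boosted tree model, using
# validation R^2 as the objective. Illustrative only -- not the JMP workflow.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

def val_r2(params):
    """Fit a boosted tree with the given hyperparameters, return validation R^2."""
    lr, depth, n_est = params
    model = GradientBoostingRegressor(
        learning_rate=lr, max_depth=int(depth), n_estimators=int(n_est), random_state=0
    )
    model.fit(X_tr, y_tr)
    return model.score(X_val, y_val)  # .score() is R^2 for regressors

# Sample a modest set of hyperparameter combinations instead of a huge tuning table.
samples = np.column_stack([
    rng.uniform(0.01, 0.3, 30),   # learning_rate (placeholder range)
    rng.integers(2, 8, 30),       # max_depth
    rng.integers(50, 500, 30),    # n_estimators
])
scores = np.array([val_r2(p) for p in samples])

# Fit a Gaussian-process surrogate of R^2 over the sampled points and use it to
# suggest where the next round of fits should concentrate. In practice you would
# scale each hyperparameter to [0, 1] before fitting the GP.
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp.fit(samples, scores)
candidates = np.column_stack([
    rng.uniform(0.01, 0.3, 2000),
    rng.integers(2, 8, 2000),
    rng.integers(50, 500, 2000),
])
best = candidates[np.argmax(gp.predict(candidates))]
print("suggested next region:", best, "best sampled R2:", scores.max())
```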
The bootstrap option for report diagnostics is a very helpful tool that I routinely use. It also pairs well with the null factor from the autovalidation add-in to test which factors (out of a large set) truly should be kept in the model -- a sort of pre-modeling factor reduction technique.
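The null-factor idea can be mimicked in plain Python as well: append a column of pure noise, fit a tree ensemble, and keep only the real factors whose importance beats the noise column. This is just the screening principle, not the autovalidation add-in itself, and all names and thresholds here are illustrative.

```python
# Hedged sketch of null-factor screening: a factor that cannot beat pure noise
# probably does not belong in the model.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X, y = make_regression(n_samples=400, n_features=15, n_informative=5,
                       noise=5.0, random_state=1)

# Add a "null factor": a column that by construction carries no signal.
X_aug = np.column_stack([X, rng.normal(size=X.shape[0])])

forest = RandomForestRegressor(n_estimators=300, random_state=1)
forest.fit(X_aug, y)

importances = forest.feature_importances_
null_importance = importances[-1]          # importance of the noise column
keep = np.where(importances[:-1] > null_importance)[0]
print("factors retained after null-factor screening:", keep)
```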
As mentioned before, the models tend to perform better when the training and validation R2 approach similar values. Models at the other extreme, with high training and low validation R2, routinely perform far worse overall. Setting the desirability functions and levels (or cutoffs) can be a very subjective step and can heavily influence the outcome; what would be nicer is an algorithm that optimizes that process for you. Of course, the end decision still requires the user to make the call -- to assess whether the result has physical meaning, not just mathematical meaning.

An analogy would be spectral data, say Raman or photoelectron spectra, which often have line shapes that are Lorentzian or Gaussian (or a mix). Mathematically, it's possible to have a negative sigma (standard deviation) since sigma is squared in the function, but physically this has zero meaning, so any fit result from an algorithm that suggests a negative sigma can be discarded -- there is no physical basis for a measurement to have a negative standard deviation. Similarly, one can obtain R2 values that are negative, or validation R2 values greater than those of the training set, but those instances should be major red flags and not considered in further analysis.
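A small sketch of those red-flag checks, for what it's worth. R2 = 1 - SS_res/SS_tot, so it goes negative whenever the model predicts worse than simply using the mean; the gap thresholds below are purely illustrative, not recommended cutoffs.

```python
# Hedged sketch: compute training and validation R^2 explicitly and flag the
# cases discussed above (negative R^2, validation well above training, or a
# large train/validation gap). Thresholds are placeholders.
import numpy as np

def r2(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot; negative whenever the model is worse than
    simply predicting the mean of y_true."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def flag_fit(r2_train, r2_valid, gap_tol=0.05):
    """Return warnings for fits that should be discarded or re-examined."""
    flags = []
    if r2_train < 0 or r2_valid < 0:
        flags.append("negative R2: model is worse than the mean -- discard")
    if r2_valid > r2_train + gap_tol:
        flags.append("validation R2 exceeds training R2 -- likely an artifact")
    if r2_train - r2_valid > 0.2:        # illustrative overfit cutoff
        flags.append("large train/validation gap -- probable overfit")
    return flags or ["train and validation R2 are consistent"]

print(flag_fit(r2_train=0.92, r2_valid=0.88))
print(flag_fit(r2_train=0.95, r2_valid=0.55))
```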
In the end, this method is used to help narrow down the hyperparameter space so that finding a more stable global minimum is faster and easier than generating a massive tuning table with hundreds of thousands of rows to test all possible combinations. It still only provides recommendations for the tuning parameters, and one must still assess whether those findings make sense. The model is still iteratively generated under random starting conditions, but each iteration narrows in on a more stable set of parameters. This approach is also helpful with the other modeling platforms that have many tuning parameters.
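To make the "narrowing" concrete, here is a minimal zoom-in loop: sample a range, keep the best points by validation R2, shrink the range around them, and repeat. It is a stand-in for the iterative tuning described above, not the actual JMP workflow, and the ranges and round counts are arbitrary.

```python
# Hedged sketch of iterative narrowing of a hyperparameter box using
# validation R^2. Illustrative only.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=2)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=2)
rng = np.random.default_rng(2)

low, high = np.array([0.01, 2.0]), np.array([0.3, 8.0])  # learning_rate, max_depth
for round_ in range(3):                                   # a few narrowing rounds
    params = rng.uniform(low, high, size=(15, 2))
    scores = []
    for lr, depth in params:
        m = GradientBoostingRegressor(learning_rate=lr, max_depth=int(depth),
                                      n_estimators=200, random_state=0)
        m.fit(X_tr, y_tr)
        scores.append(m.score(X_val, y_val))              # validation R^2
    top = params[np.argsort(scores)[-5:]]                 # keep the 5 best points
    low, high = top.min(axis=0), top.max(axis=0)          # shrink the search box
    print(f"round {round_}: best R2 = {max(scores):.3f}, "
          f"new range = {np.round(low, 3)} .. {np.round(high, 3)}")
```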
Thanks!
DS