Speaker | Transcript |
| Hi, I'm Chris Gotwalt. My |
| co-presenters, Laura Lancaster |
| and Jeremy Ash, and I are |
| presenting a useful new JMP Pro |
| capability called Extrapolation |
| Control. Almost any model that |
| you would ever want to predict |
| with has a range of |
| applicability, a region of the |
| input space, where the |
| predictions are considered to be |
| reliable. Outside that region, |
| we are extrapolating the model |
| to points far from the data used |
| to fit it, and predictions at |
| those points can be completely |
| unreliable. There are two |
| primary sources of |
| extrapolation: statistical |
| extrapolation and domain-based |
| extrapolation. Both types are |
| covered by the new feature. |
| Statistical extrapolation occurs |
| when one is attempting to |
| predict using a model at an x |
| that isn't close to the values |
| used to train that model. |
| Domain-based extrapolation |
| happens when you try to evaluate |
| at an x that is impossible due |
| to scientific or |
| engineering-based constraints. |
| The example here illustrates |
| both kinds of extrapolation at |
| once. |
| Here we see a profiler from a |
| model of a metallurgy production |
| process. The prediction readout |
| says -2.96 with no |
| indication that we're evaluating |
| at a combination of temperature |
| and pressure that is impossible, |
| in a domain sense, for this |
| machine to attain. We also have |
| statistical extrapolation: the |
| point is far from the data used |
| to fit the model, as seen in the |
| scatterplot of the training data |
| on the right. In JMP Pro 16, Jeremy, |
| Laura and I have collaborated to |
| add a new capability that can |
| give a warning when the profiler |
| thinks you might be |
| extrapolating. Or if you turn |
| extrapolation control on, it |
| will restrict the set of points |
| that you see to only those that |
| it doesn't think are |
| extrapolating. We have two types |
| of extrapolation control. One is |
| based on the concept of leverage |
| and applies to least squares |
| models. This first type is only |
| available in Fit Least Squares |
| in JMP Pro. The |
| other type, which we call general |
| machine learning extrapolation |
| control, is available in the |
| Profiler platform and several of |
| the most common machine learning |
| platforms in JMP Pro. Upon |
| request, we could even add it to |
| more. Least squares |
| extrapolation control uses the |
| concept of leverage, which is |
| like a scaled version of the |
| prediction variance. It is model- |
| based, so it uses information |
| about the main effects, |
| interactions, and higher-order |
| terms to determine |
| extrapolation. For the general |
| machine learning extrapolation |
| control case, we had to come up |
| with our own approach. We |
| wanted a method that would be |
| robust to missing values and |
| linear dependencies, fast to |
| compute, and able to handle |
| mixtures of continuous and |
| categorical input variables, |
| and we also |
| explicitly wanted to separate |
| the extrapolation model from the |
| model used to fit the data. So |
| when we have general |
| extrapolation control turned on, |
| there's only one supervised |
| model, the one that fits the |
| input variables to the responses, |
| that we see in the profiler |
| traces. The profiler comes |
| up with a quick and dirty |
| unsupervised model to describe |
| the training set X's, and this |
| unsupervised model is used |
| behind the scenes by the |
| profiler to determine the |
| extrapolation control |
| constraint. So I'm having to |
| switch because PowerPoint and my |
| camera aren't getting along |
| right now for some reason. We |
| know that risky extrapolations |
| are being made every day by |
| people working in data science |
| and are confident that the use |
| of extrapolations leads to poor |
| predictions and ultimately to |
| poor business outcomes. |
| Extrapolation control places |
| guardrails on model predictions |
| and will lead to quantifiably |
| better decisions by JMP Pro |
| users. When users see an |
| extrapolation occurring, they |
| must decide whether the |
| prediction should be used, |
| based on their domain |
| knowledge and familiarity with |
| the problem at hand. If you |
| start seeing extrapolation |
| control warnings happen quite |
| often, it is likely the end of |
| the life cycle for that model |
| and time to refit it to new data, |
| because the distribution of the |
| inputs has shifted away from |
| that of the training data. We |
| are honestly quite surprised and |
| alarmed that the need for |
| identifying extrapolation isn't |
| better appreciated by the data |
| science community and have made |
| controlling extrapolation as |
| easy and automatic as possible. |
| Laura, who developed it in JMP |
| Pro, will be demonstrating the |
| option up next. Then Jeremy, who |
| did a lot of the research on our |
| team, will go into the |
| mathematical details and the |
| statistical motivation for the |
| approach. |
| Hello, my name is Laura |
| Lancaster and I'm here to do a |
| demo of the extrapolation |
| control that was added to JMP |
| Pro 16. I wanted to start off |
| with a fairly simple example |
| using the fit model least |
| squares platform. I'm going to |
| use some data that may be |
| familiar; it's the Fitness data |
| that's in sample data and I'm |
| going to use Oxygen Uptake as |
| my response and Run Time, Run |
| Pulse and Max Pulse as my |
| predictors. And I wanted to |
| reiterate that in fit model, |
| fit least squares, the |
| extrapolation metric that's |
| used is leverage. So let's go |
| ahead and switch over to JMP. |
| So now I have the fitness data |
| open in JMP and I have a script |
| saved to the data table to |
| automatically launch my fit |
| least squares model. So I'm |
| going to go ahead and run that |
| script; it launches the least |
| squares platform. And I have the |
| profiler automatically open. And |
| we can see that the profiler |
| looks like it always has in the |
| past, where the factor boundaries |
| are defined by the range of each |
| factor individually, giving us |
| rectangular bound constraints. |
| And when I change the factor |
| settings, because of these bound |
| constraints, it can be really |
| hard to tell if you're moving |
| far outside the correlation |
| structure of the data. |
| And this is why we wanted to add |
| the extrapolation control. So |
| this has been added to several |
| of the platforms in JMP Pro |
| 16, including fit least squares. |
| And to get to the extrapolation |
| control, you go to the menu under |
| the profiler menu. So if I look |
| here, I see there's a new option |
| called Extrapolation Control. |
| It's set to off by default, |
| but I can turn it to either |
| on or warning on to turn on |
| extrapolation control. If I |
| turn it to on, notice that |
| it restricts my profile |
| traces to only go to values |
| where I'm not extrapolating. |
| If I were to turn it to warning |
| on, I would see the full profile |
| traces, but I would get a |
| warning when I go to a region |
| where it would be considered |
| to be extrapolation. |
| I can also turn on extrapolation |
| details, which I find really |
| helpful, and that gives me a |
| lot more information. First of |
| all, it tells me that the |
| metric being used to define |
| extrapolation is leverage, as |
| is always the case in the fit |
| least squares platform. |
| And the threshold that's being |
| used by default initially is |
| going to be maximum leverage, |
| but this is something I can |
| change and I will show you that |
| in a minute. Also, I can see |
| what my extrapolation metric |
| is for my current settings. |
| It's this number right here, |
| which will change as I change |
| my factor settings. |
| Anytime this number is greater |
| than the threshold, I'm going to |
| get this warning that I might be |
| extrapolating. If it goes below, |
| I will no longer get that |
| warning. This threshold is not |
| going to change unless I change |
| something in the menu to adjust |
| my threshold. So let me go ahead |
| and do that right now. So I'm going |
| to go to the menu |
| and I'm going to go to set |
| threshold criterion. So |
| in fit least squares, you have |
| two options for the threshold. |
| Initially, it's set to maximum |
| leverage, which is going to keep |
| you within the convex hull of |
| the data, or you can switch to a |
| multiplier times the average |
| leverage, which is the number of |
| model terms over the number of |
| observations. And I want to |
| switch to that threshold. So it's |
| set to 3 as the multiplier |
| by default. So this is going to |
| be 3 times the average leverage |
| and I click OK, and notice that |
| my threshold is going to change. |
| It actually got smaller, so this |
| is a more conservative |
| definition of extrapolation. |
| And I'm going to turn it back to |
| on to restrict my profile traces. |
| And now I can only go to |
| regions where I'm within 3 |
| times the average leverage. |
| Now we have also |
| implemented optimization |
| that obeys the |
| extrapolation |
| constraints. So now if I |
| turn on set desirability |
| and I do the optimization, |
| I will get an optimal value that |
| satisfies the extrapolation |
| constraint. Notice that this |
| metric is less than or equal to |
| the threshold. So now let's go |
| to my next slide, which uses a |
| scatterplot matrix to compare |
| the optimal value with |
| extrapolation control turned on |
| and with it turned off. |
| So this is the scatterplot |
| matrix that I created with JMP, |
| and it shows the original |
| predictor variable data, as well |
| as the predictor variable values |
| for the optimal solution using |
| no extrapolation control, in |
| blue, and the optimal solution using |
| extrapolation control in red. |
| And notice how the unconstrained |
| solution here in blue, |
| right here, violates the |
| correlation structure for the |
| original data for Run Pulse and |
| Max Pulse, thus increasing the |
| uncertainty of this prediction. |
| Whereas the optimal solution |
| that did use extrapolation |
| control is much more in line |
| with the original data. |
| Now let's look at an example |
| using the more generalized |
| extrapolation control method, |
| which we refer to as a |
| regularized T squared method. As |
| Chris mentioned earlier, we |
| developed this method for models |
| other than least squares models. |
| So we're going to look at a |
| neural model for the Diabetes |
| data that is also in the sample |
| data. The response is a measure |
| of disease progression, and the |
| predictors are the baseline |
| variables. Once again, the |
| extrapolation metric used for |
| this example is the |
| regularized T squared that |
| Jeremy will be describing in |
| more detail in a few minutes. |
| So I have the Diabetes data open in |
| JMP and I have a script saved |
| of my neural model fits. I'm |
| going to go ahead and run that |
| script. It launches the neural |
| platform, and notice that I am |
| using a validation method, |
| random holdback. I just wanted |
| to note |
| that anytime you use a |
| validation method, the |
| extrapolation control is based |
| only on the training data |
| and not your validation |
| or test data. |
| So I have the profiler open and |
| you can see that it's using the |
| full traces. Extrapolation |
| control is not turned on. Let's |
| go ahead and turn it on. |
| And I'm also going to |
| turn on the details. |
| You can see that the traces have |
| been restricted and the metric |
| is the regularized T squared. The |
| threshold is 3 times the |
| standard deviation of the sample |
| regularized T squared. Jeremy is |
| going to talk more about what |
| all that means exactly in a few |
| minutes. And I just wanted to |
| mention that when we're using |
| the regularized T squared |
| method, there's only one choice |
| for threshold, but you can |
| adjust the multiplier. So if you |
| go to extrapolation control, set |
| threshold, you can adjust this |
| multiplier, but I'm going to |
| leave it at 3. And now I |
| want to run optimization using |
| extrapolation control. So I'm |
| just going to maximize and |
| remember. Now I have an |
| optimal solution with |
| extrapolation control turned |
| on. And so now I want to look |
| at our scatterplot matrix, just |
| like we looked at before, with |
| the original data, as well as |
| with the optimal values with |
| and without extrapolation |
| control. |
| So this is a scatterplot matrix |
| of the Diabetes data that I |
| created in JMP. It's got the |
| original predictor values, as |
| well as the optimal solution |
| using extrapolation control in |
| red, and optimal solution without |
| extrapolation control in blue. |
| And you can see that the red |
| dots appear to be much more |
| within the correlation structure |
| of the original data than the |
| blue, and that's particularly |
| true when you look at this LDL |
| versus total cholesterol. |
| So now let's look at an example |
| using the profiler that's under |
| the graph menu, which I'll call |
| the graph profiler. It also uses |
| the regularized T squared method |
| and it allows us to use |
| extrapolation control on any |
| type of model that can be |
| created and saved as a JSL |
| formula. It also allows us to |
| have extrapolation control on |
| more than one model at a time. |
| So let's look at an example |
| for a company that uses powder |
| metallurgy technology to |
| produce steel drive shafts for |
| the automotive industry. |
| They want to be able to find |
| optimal settings for their |
| production that will minimize |
| shrinkage and also minimize |
| failures due to bad surface |
| conditions. So we have |
| two responses: shrinkage (which is |
| continuous and we're going to |
| fit a least squares model for |
| that) and surface condition (which |
| is pass/fail and we're going to |
| fit a nominal logistic model for |
| that one). And our predictor |
| variables are just some key |
| process variables in production. |
| And once again, the |
| extrapolation metric is the |
| regularized T squared. |
| So I have the powder |
| metallurgy data open in JMP |
| and I've already fit a least |
| squares model for my shrinkage |
| response, and I've already fit a |
| nominal logistic model for the |
| surface condition pass/fail |
| response, and I've saved the |
| prediction formulas to the data |
| table so that they are ready to |
| be used in the graph profiler. |
| So if I go to the graph menu |
| profiler, I can load up the |
| prediction formula for shrinkage |
| and my prediction formula for |
| the surface condition. |
| Click OK. And now I have |
| both of my models launched into |
| the graph profiler. |
| And before I turn on |
| extrapolation control, you |
| can see that I have the full |
| profile traces. Once I turn on |
| extrapolation control |
| you can see that the traces |
| shrink a bit, and I'm also going |
| to turn on the details, |
| just to show that indeed I am |
| using the regularized T squared |
| method here. |
| So what I really want to do is |
| find the optimal conditions that |
| minimize shrinkage and minimize |
| failures, with extrapolation |
| control on to make sure I'm not |
| extrapolating and I find a |
| useful solution. And |
| before I can do the optimization, |
| I actually need to set my |
| desirabilities. So I'm going to |
| set desirabilities. It's already |
| correct for shrinkage, but I |
| need to set them for the surface |
| condition. I'm going to try to |
| maximize passes and minimize |
| failures. OK. |
| And now I should be able to do |
| the optimization with |
| extrapolation controls on. |
| I'll do maximize and remember. |
| And now I have my optimal |
| solution with extrapolation |
| control on. So now let's look |
| once again at the |
| scatterplot matrix of the |
| original data, along with the |
| solution with extrapolation |
| control on and the solution |
| with extrapolation control |
| off. |
| So this is a scatterplot matrix |
| of the powder metallurgy data |
| that I created in JMP. And it |
| also has the optimal solution |
| with extrapolation control as a |
| red dot, and the optimal |
| solution with no extrapolation |
| control as a blue dot. And once |
| again you can see that when we |
| don't use extrapolation |
| control, the optimal solution |
| is pretty far outside of the |
| correlation structure of the |
| data. We can especially see |
| that here with ratio versus |
| compaction pressure. |
| So now I want to hand over |
| the presentation to Jeremy |
| to go into a lot more |
| detail about our methods. |
| Hi, so here are a number of |
| goals for extrapolation control |
| that we laid out at the |
| beginning of the project. We |
| needed an extrapolation metric |
| that could be computed quickly |
| with a large number of |
| observations and variables, and |
| we needed a quick way to assess |
| whether the metric indicated |
| extrapolation or not. This was |
| to maintain the interactivity of |
| the profiler traces and to |
| perform optimization. |
| We wanted to be able to |
| support the various variable |
| types available in the |
| profiler. These are |
| essentially continuous, |
| categorical and ordinal. |
| We wanted to utilize |
| observations with missing cells, |
| because some modeling methods |
| will include these observations |
| when training. |
| We wanted a method that was |
| robust to linear dependencies in |
| the data. These occur when the |
| number of variables is larger |
| than the number of observations, |
| for example. And we wanted |
| something that was easy to |
| automate without the need for a |
| lot of user input. |
| For least squares models, we |
| landed on leverage, which is |
| often used to identify outliers |
| in linear models. The leverage |
| for a new prediction point is |
| computed according to this |
| formula. There are many |
| interpretations for leverage. |
| One interpretation is that it's |
| the multivariate distance of a |
| prediction point from the center |
| of the training data. Another |
| interpretation is that it is a |
| scaled prediction variance. So |
| as a prediction point moves |
| further away from the center |
| of the data, the uncertainty |
| of prediction increases. And we |
| use two common thresholds in |
| the statistical literature for |
| determining if this distance |
| is too large. The first is |
| maximum leverage; prediction |
| points beyond this threshold |
| are outside the convex hull of |
| the training data. |
| And the second is 3 times the |
| average of the leverages. It |
| can be shown that this is |
| equivalent to three times the |
| number of model terms divided |
| by the number of observations. |
| And as Laura described |
| earlier, you can change the |
| multiplier of these |
| thresholds. |
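The leverage formula itself was on a slide and isn't captured in the transcript; the standard form for a new point x0, given the training model matrix X, is h(x0) = x0'(X'X)^-1 x0. Below is a minimal NumPy sketch of the metric and the two thresholds just described. The data and variable names are illustrative, not JMP's internals; note that for the Fitness demo (31 rows and 4 model terms counting the intercept), the second rule gives 3 * 4/31, about 0.39.

```python
import numpy as np

def leverage(X, x0):
    """Leverage of prediction point x0 given the training model matrix X."""
    return x0 @ np.linalg.inv(X.T @ X) @ x0

# Toy model matrix: intercept plus 3 predictors, 31 rows (as in the Fitness demo).
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(31), rng.normal(size=(31, 3))])
n, p = X.shape

# Threshold 1: maximum leverage of the training points (the convex-hull rule).
h_train = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
max_leverage = h_train.max()

# Threshold 2: multiplier times the average leverage. Training leverages
# always sum to p, so this is 3 * p / n (about 0.39 here).
avg_rule = 3 * p / n

x0 = np.array([1.0, 2.5, -1.0, 0.5])
h0 = leverage(X, x0)
print(h0 > max_leverage, h0 > avg_rule)  # True means "possibly extrapolating"
```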
| Finally, when desirabilities |
| are being optimized, the |
| extrapolation constraint is a |
| nonlinear constraint, and |
| previously the profiler allowed |
| constrained optimization with |
| linear constraints. This type of |
| optimization is more |
| challenging, so Laura implemented |
| a genetic algorithm. And if you |
| aren't familiar with these, |
| genetic algorithms use the |
| principles of evolution, such |
| as selection and mutation, to |
| optimize complicated cost |
| functions. |
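JMP's genetic algorithm is internal to the profiler, so as an illustrative stand-in only, here is a constrained optimization of the same shape using SciPy's differential evolution, a closely related evolutionary method. The response surface, extrapolation metric, and threshold below are all hypothetical.

```python
import numpy as np
from scipy.optimize import differential_evolution, NonlinearConstraint

def predicted_response(x):
    """Hypothetical fitted response surface we want to maximize."""
    return -(x[0] - 1.0) ** 2 - (x[1] + 0.5) ** 2

def extrapolation_metric(x):
    """Hypothetical extrapolation metric (think leverage or T^2)."""
    return x[0] ** 2 + 2.0 * x[1] ** 2

threshold = 1.0  # stay at or below this to avoid flagged extrapolation
keep_feasible = NonlinearConstraint(extrapolation_metric, -np.inf, threshold)

result = differential_evolution(
    lambda x: -predicted_response(x),  # SciPy minimizes, so negate
    bounds=[(-3.0, 3.0), (-3.0, 3.0)],
    constraints=(keep_feasible,),
    seed=0,
)
# The optimum lands at or near the constraint boundary rather than at the
# unconstrained maximum, which would violate the threshold.
print(result.x, extrapolation_metric(result.x))
```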
| Next, I'll talk about the |
| approach we used to generalize |
| extrapolation control to models |
| other than linear models. When |
| you're constructing a predictive |
| model in JMP, you start with a |
| set of predictor variables and a |
| set of response variables. Some |
| supervised model is trained, and |
| then a profiler can be used to |
| visualize the model surface. |
| There are numerous variations of |
| the profiler in JMP. You can |
| use the profiler internally in |
| modeling platforms. You can |
| output prediction formulas and |
| build a profiler for multiple |
| models. As Laura demonstrated, |
| you can construct profilers for |
| ensemble models. We wanted an |
| extrapolation control method |
| that would generalize to all |
| these scenarios, so instead of |
| tying our method to a |
| specific model, we're going |
| to use an unsupervised |
| approach. |
| And we're only going to flag a |
| prediction point as |
| extrapolation if it's far |
| outside where the data are |
| concentrated in the predictor |
| space. And this allows us to |
| be consistent across |
| profilers so that our |
| extrapolation control method |
| will plug into any profiler. |
| The multivariate distance |
| interpretation of leverage |
| suggested Hotelling's T squared as |
| a distance for general |
| extrapolation control. In fact, |
| some algebraic manipulation will |
| show that Hotelling's T squared is |
| just leverage shifted and |
| scaled. This figure shows how |
| Hotelling's T squared measures |
| which ellipse an observation |
| lies on, where the ellipses are |
| centered at the mean of the |
| data, and the shape is defined |
| by the covariance matrix. |
| Since we're no longer in |
| linear models, this metric |
| doesn't have the same |
| connection to prediction |
| variance. So instead of |
| relying on the thresholds |
| from linear models, we're |
| going to make some |
| distributional assumptions |
| to determine if T squared |
| for a prediction point should |
| be considered extrapolation. |
| Here I'm showing the formula for |
| Hotelling's T squared. The mean |
| and covariance matrix are |
| estimated using the training |
| data for the model. If P is less |
| than N, where P is the number of |
| predictors and N is the number |
| of observations, and if the |
| predictors are multivariate |
| normal, then T squared for a |
| prediction point has an F |
| distribution. However, we wanted |
| a method that generalizes to |
| data sets with complicated data |
| types: a mix of continuous and |
| categorical variables, data sets |
| where P is larger than N, and |
| data sets with missing values. |
| So instead of working out the |
| distributions analytically in |
| each case, we used a simple, |
| conservative control limit that |
| we found works well in practice. |
| This is a three-sigma control |
| limit using the empirical |
| distribution of T squared from |
| the training data and, as Laura |
| mentioned, you can also tune |
| this multiplier. |
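The slide formula isn't captured in the transcript; the usual form is T^2(x) = (x - xbar)' S^-1 (x - xbar), with xbar and S the mean and covariance of the training predictors. Here is a minimal NumPy sketch pairing that metric with an empirical three-sigma control limit. Reading "three sigma" as the mean plus three standard deviations of the training T^2 values is my interpretation, and the multiplier is the tunable part.

```python
import numpy as np

def t_squared(x, mu, S_inv):
    """Hotelling's T^2 distance of point x from the training distribution."""
    d = x - mu
    return d @ S_inv @ d

rng = np.random.default_rng(0)
X_train = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=200)

mu = X_train.mean(axis=0)
S_inv = np.linalg.inv(np.cov(X_train, rowvar=False))

# Empirical three-sigma control limit from the training T^2 values.
t2_train = np.array([t_squared(row, mu, S_inv) for row in X_train])
limit = t2_train.mean() + 3.0 * t2_train.std()

# A point inside each marginal range but violating the correlation structure.
x_new = np.array([2.0, -2.0])
print(t_squared(x_new, mu, S_inv) > limit)  # True -> flag as extrapolation
```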
| One complication is that when P |
| is larger than N, Hotelling's T |
| squared is undefined. There are |
| too many parameters in the |
| covariance matrix to estimate |
| with the available data, and |
| this often occurs in typical use |
| cases for extrapolation control |
| like in partial least squares. |
| So we decided on a novel |
| approach to computing Hotelling's T |
| squared, which deals with these |
| cases, and we're calling it a |
| regularized T squared. |
| To compute the covariance |
| matrix, we use a regularized |
| estimator originally |
| developed by Schafer and |
| Strimmer for high |
| dimensional genomics data. |
| It's just a weighted |
| combination of the full |
| sample covariance matrix, |
| which is U here, and a |
| target matrix, |
| which is D. |
| For the lambda weight |
| parameter, Schafer and Strimmer |
| derived an analytical |
| expression that minimizes the |
| MSE of the estimator |
| asymptotically. |
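As a sketch of how such an estimator can be computed, assuming the diagonal target matrix the talk describes next, this is S* = lambda * D + (1 - lambda) * U with a plug-in lambda in the spirit of the Schafer-Strimmer formula. Their paper gives the exact expression; this simplified version is for illustration only.

```python
import numpy as np

def regularized_covariance(X):
    """Shrink the sample covariance U toward a diagonal target D."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    U = (Xc.T @ Xc) / (n - 1)   # full sample covariance
    D = np.diag(np.diag(U))     # target: sample variances on the diagonal

    # Plug-in weight: estimated sampling variance of the off-diagonal
    # entries over their squared magnitude (noisier entries -> more shrinkage).
    W = Xc[:, :, None] * Xc[:, None, :]             # per-row outer products
    var_U = W.var(axis=0, ddof=1) * n / (n - 1) ** 2
    off = ~np.eye(p, dtype=bool)
    lam = np.clip(var_U[off].sum() / (U[off] ** 2).sum(), 0.0, 1.0)

    return lam * D + (1.0 - lam) * U

rng = np.random.default_rng(0)
X = rng.normal(size=(15, 40))   # P larger than N
S_star = regularized_covariance(X)
# Unlike the sample covariance, S_star is full rank even though P > N.
print(np.linalg.matrix_rank(S_star) == 40)
```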
| Schafer and Strimmer proposed |
| several possible target |
| matrices. The target matrix we |
| chose was a diagonal matrix with |
| the sample variances of the |
| predictor variables on the |
| diagonal. This target matrix has |
| a number of advantages for |
| extrapolation control. First, we |
| don't assume any correlation |
| structure between the variables |
| before seeing the data, which |
| works well as a general prior. |
| Also, when there's little data |
| to estimate the covariance |
| matrix, either due to small N or |
| a large fraction missing, the |
| elliptical constraint is |
| expanded by a large weight on |
| the diagonal matrix, and this |
| results in a more conservative |
| test for extrapolation control. |
| We found this was necessary to |
| obtain reasonable control of the |
| false positive rate. To put this |
| more simply, when there's |
| limited training data, the |
| regularized T squared is less |
| likely to label predictions as |
| extrapolation, which is what you |
| want, because with little data |
| the apparent correlations are |
| more likely to have arisen by |
| chance. We have some |
| simulation results |
| demonstrating these details, |
| but I don't have time to go |
| into all that. Instead, on |
| the Community webpage, we put a |
| link to a paper on arXiv, which |
| we plan to submit to the |
| Journal of Computational and |
| Graphical Statistics. |
| This next slide shows some other |
| important details we needed to |
| consider. We needed to figure |
| out how to deal with categorical |
| variables. We are just |
| converting them into indicator- |
| coded dummy variables. This is |
| comparable to a multiple |
| correspondence analysis. |
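For instance, here is a minimal pandas illustration of indicator coding; the column names are made up, and whether JMP drops a reference level here isn't stated in the talk.

```python
import pandas as pd

# One indicator column per level, so the T^2 machinery sees only numbers.
df = pd.DataFrame({"furnace": ["A", "B", "A", "C"], "temp": [900, 950, 910, 940]})
coded = pd.get_dummies(df, columns=["furnace"], dtype=float)
print(list(coded.columns))  # ['temp', 'furnace_A', 'furnace_B', 'furnace_C']
```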
| Another complication is how to compute |
| Hotelling's T squared when |
| there's missing data. Several |
| JMP predictive modeling |
| platforms use observations with |
| missing data to train their |
| models. These include Naive |
| Bayes and Bootstrap Forest. And |
| these formulas are showing the |
| pairwise deletion method we |
| used to estimate the covariance |
| matrix. It's more common to use |
| row wise deletion. This means |
| all observations with missing |
| values are deleted before |
| computing the covariance matrix. |
| And this is simplest, but it can |
| result in throwing out useful |
| data if the sample size of the |
| training data is small. With |
| pairwise deletion, observations |
| are deleted only if there are |
| missing values in the pair of |
| variables used to compute the |
| corresponding entry, and that's |
| what these formulas are showing. |
| Seems like a simple thing to do. |
| You're just using all the data |
| that's available, but it |
| actually can lead to a host of |
| problems because there are |
| different observations used to |
| compute each entry. This can |
| cause weird things to happen, |
| like covariance matrices with |
| negative eigenvalues, which is |
| something we had to deal with. |
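The slide formulas aren't reproduced here, but NumPy's masked arrays give an equivalent pairwise-deletion covariance, and the eigenvalue check shows the indefiniteness problem just mentioned. The toy data and missingness rate are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
cov = np.full((3, 3), 0.7) + 0.3 * np.eye(3)
X = rng.multivariate_normal(np.zeros(3), cov, size=30)
X[rng.random(X.shape) < 0.3] = np.nan   # punch ~30% holes in the data

# Pairwise deletion: each covariance entry uses every row where that
# particular pair of variables is observed.
Xm = np.ma.masked_invalid(X)
S_pair = np.ma.cov(Xm, rowvar=False, allow_masked=True).data

# Different rows feed different entries, so the matrix need not be
# positive semidefinite: the smallest eigenvalue can go negative.
print(np.linalg.eigvalsh(S_pair).min())
```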
| Here are a few advantages of |
| the regularized T squared we |
| found when comparing to other |
| methods in our evaluations. One |
| is that the regularization |
| works the way regularization |
| normally works. It strikes a |
| balance between overfitting the |
| training data and over-biasing |
| the estimator. This makes the |
| estimator more robust to noise |
| and model misspecification. |
| Next, Schafer and Strimmer |
| showed in their paper that |
| regularization results in a |
| more accurate estimator in |
| high dimensional settings. |
| This helps with the curse of |
| dimensionality, which plagues |
| most distance-based methods |
| for extrapolation control. |
| Then, the fields that have |
| developed methodology for |
| extrapolation control often |
| have both high-dimensional |
| data and highly |
| correlated predictors. For |
| example in cheminformatics and |
| chemometrics, the chemical |
| features are often highly |
| correlated. Extrapolation control |
| is often used in combination |
| with PCA and PLS models, where |
| T squared and DModX are used to |
| detect violations of correlation |
| structure. This is similar to |
| what we do in the Model Driven |
| Multivariate Control Chart |
| platform. |
| Since this is a common use case, |
| we wanted to have an option that |
| didn't deviate too far from |
| these methods. Our regularized T |
| squared provides the same type |
| of extrapolation control, but it |
| doesn't require a projection |
| step, which has some advantages. |
| We found that this allows us to |
| better generalize to other types |
| of predictive models. Also, in |
| our evaluations we observed that |
| if a linear projection doesn't |
| work well for your data, like |
| you have nonlinear relationships |
| between predictors, the errors |
| can inflate the control limits |
| of projection-based methods, |
| which will lead to poor |
| protection against |
| extrapolation. Our approach |
| is more robust to this. |
| And then another important point |
| is that we found a |
| single extrapolation metric |
| was much simpler to use and |
| interpret. |
| And here is a quick summary of |
| the features of extrapolation |
| control. The method provides better |
| visualization of feasible |
| regions in high dimensional |
| models in the profiler. |
| A new genetic algorithm has |
| been implemented for flexible |
| constrained optimization. |
| Our regularized T squared |
| handles messy observational |
| data, cases like P larger |
| than N, and continuous and |
| categorical variables. |
| The method is available in most |
| of the predictive modeling |
| platforms in JMP Pro 16 and |
| supports many of their |
| idiosyncrasies. It's also |
| available in the profiler in |
| graph, which really opens up its |
| utility because you can operate |
| on any prediction formula. |
| And then as a future direction, |
| we're considering implementing |
| a K-nearest neighbor based |
| constraint that would go beyond |
| the current correlation |
| structure constraint. Often |
| predictors are generated by |
| multiple distributions resulting |
| in clustering in the predictor |
| space. And a K-nearest neighbors |
| based approach would enable |
| us to control extrapolation |
| between clusters. |
| So thanks to everyone who |
| tuned in to watch this and |
| here are our emails if you have |
| any further questions. |