Re: help with model comparison: DOE vs ANN boosted

ivanpicchi · Jan 19, 2024 4:39 AM

Good morning guys!

1) I have a DOE model and some boosted neural network models, and I would like to compare their respective Rsquared (largest), RASE (smallest) and AAE (smallest) to select the best optimized models from my experiments.

I saw that the "Model Comparison" tool, using the prediction formulas saved in the experiment data table, lets me make the selection according to the criteria above, as shown in the attached image. But I would like to include my DOE model with its Rsquared, RASE and AAE in this generated table. Would anyone have any tips on how to do it?

2) Another question: in the model launch, would the "number of tours" field be the number of epochs? How does increasing this field work? Does anyone have any suggestions for values to test?

*** add

I forgot to ask:

3) I only have 27 experiments (triplicates of 8 experiments + 3 central points resulting from a complete factorial DOE), and I cannot carry out any more due to lack of material. I ended up choosing to test ANN models with boosting because I believed that it would create a model with greater predictive capacity through the combination of several small models. Would this be the best option? Would anyone recommend another ANN construction for this scenario? Any ideas for optimizing networks, increasing Rsquared and decreasing RASE and AAE? I'm really starting with ANN, any contribution would be very relevant!

* The attached JMP file follows.

Victor_G · Jan 19, 2024 7:10 AM

Hi @ivanpicchi,

I am confused and may not directly answer your questions, as I'm not sure what your objective(s) are in this plan: are H/W and FN MED the two responses?

Also, what is your objective:

Explaining your system based on your factors?
Screening important factors?
Creating a predictive model?

What is your knowledge about the system? And there are many other questions... Here are some first thoughts, but there is much more to discuss (though this may take too long to write on a forum).

Using a Neural Network for a model-based DoE seems completely off to me, as you only have 3 levels for each of your factors, and you are using a highly flexible model that can approximate any complex function... It seems like using a bazooka to kill a fly: an overly complex and unsuitable approach. So one or several NNs are completely premature at this stage. There are other (simpler) algorithms that you could try: Decision Trees/Random Forests, Support Vector Machines, ...

Also, the validation column is not set up properly for Machine Learning, as the replicates of a given experiment can end up spread across the training, validation and/or test sets. This results in overfitted models and overconfidence in the prediction results. To validate a predictive model on a small dataset, you have to keep all runs of the same treatment in the same set to avoid data leakage and overfitting. A cross-validation approach, like K-folds or Leave-One-Out, might be interesting to consider (with the same precaution regarding replicates).
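As an illustration of keeping replicates together, here is a minimal Python sketch (outside JMP) that assigns whole treatments to folds. The function name `grouped_k_folds` and the treatment labels are hypothetical; the 27-run layout is taken from the description earlier in the thread. In JMP, the same idea would mean building the validation column per treatment rather than per row.

```python
from collections import defaultdict

def grouped_k_folds(treatment_ids, k):
    """Assign rows to k folds so that all replicates of a treatment
    land in the same fold (no treatment is split across folds)."""
    rows_by_treatment = defaultdict(list)
    for row, treatment in enumerate(treatment_ids):
        rows_by_treatment[treatment].append(row)

    folds = [[] for _ in range(k)]
    # Place the largest treatments first, each into the currently smallest fold.
    ordered = sorted(rows_by_treatment, key=lambda t: -len(rows_by_treatment[t]))
    for treatment in ordered:
        smallest_fold = min(folds, key=len)
        smallest_fold.extend(rows_by_treatment[treatment])
    return folds

# Hypothetical labels: 8 factorial treatments in triplicate plus
# 3 centre-point runs (treated as one treatment) = 27 rows.
treatment_ids = [t for t in range(8) for _ in range(3)] + ["centre"] * 3
folds = grouped_k_folds(treatment_ids, k=3)

for i, fold in enumerate(folds):
    print(f"fold {i}: rows {sorted(fold)}")
```

Each fold can then serve once as the validation set while the other folds are used for training.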

Anyway, a general piece of advice is to visualize your data before modeling, and to apply Occam's razor when modeling: start with a simple model, then iterate and add complexity (if needed!). Compare your new model with the previous one, and assess how much improvement you gained (and how much the complexity has increased). I would recommend starting simple, using regression models first, and properly evaluating the pros and cons of your models.

I don't know which kind of regression model you used or which terms you included, but the response H/W can be modeled quite well with a Fit Least Squares model (see script "Fit Group"). For the response FN, it seems you have a lot of noise in the measurements, so further work on the ranges, the experimental protocol, the factors (controlling nuisance factors and/or using blocking, ...), the measurement device, and maybe the a priori model may be needed to improve the results.
Fitting a complicated model to this response won't help: there are no big differences between the results of your best NN (R² = 0.772, RASE = 0.174 (with overfitting...)) and a simple regression model (R² = 0.758, adjusted R² = 0.715, RMSE = 0.586).
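To put a least-squares model and a neural network on the same footing, the same metrics can be computed from each model's saved predictions. The sketch below is a plain Python illustration, not JMP output: `fit_metrics` is a hypothetical helper and the numbers are made up. It uses the usual definitions, where RASE divides the squared error by the number of rows n, and RMSE divides it by n minus the number of estimated parameters, so the two are not directly interchangeable.

```python
import math

def fit_metrics(actual, predicted, n_params=None):
    """R2, RASE and AAE from predictions; RMSE and adjusted R2 are added
    when the number of estimated parameters (incl. intercept) is given."""
    n = len(actual)
    mean_y = sum(actual) / n
    sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    sst = sum((y - mean_y) ** 2 for y in actual)
    metrics = {
        "R2": 1 - sse / sst,
        "RASE": math.sqrt(sse / n),  # root average squared error
        "AAE": sum(abs(y - p) for y, p in zip(actual, predicted)) / n,
    }
    if n_params is not None:
        metrics["RMSE"] = math.sqrt(sse / (n - n_params))
        metrics["R2_adj"] = 1 - (sse / (n - n_params)) / (sst / (n - 1))
    return metrics

# Made-up values, only to show the call.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
metrics = fit_metrics(actual, predicted, n_params=2)
print(metrics)
```

Running the same function on the regression predictions and on the neural network predictions, restricted to the same validation rows, gives one comparable table.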

Also take into consideration that the modeling approach, metric(s) and design should be chosen consistently with your objective. For example, with a predictive objective focused on the inside of the experimental space, a Space-Filling design is suitable for Machine Learning algorithms, with possible metrics like RMSE/RASE/MSE focusing on the accuracy of the predictions.

I hope these first discussion points will help you,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)

ivanpicchi · Jan 19, 2024 05:03 PM

Thank you for such a detailed answer!

The experiment was a DOE carried out to predict, within the defined experimental space, the characteristics of WAAM walls built by robotic GMAW with 10 overlapping welds, where each experiment corresponds to one wall.

I need to predict the following two responses: 1) H/W (height/width ratio) and 2) FN (ferrite content in the deposits, obtained as an average of 10 points measured by a ferritoscope on the wall). With the factorial DOE carried out, the students were able to propose an optimized print maximizing the H/W ratio and with a target for the FN content. We are starting this DOE and ANN journey aiming at welding parameterization.

However, the model is not as satisfactory for predicting some welding experiments outside of the DOE runs, with some of the measurements falling outside the 95% CI predicted by the model. Hence the attempt to maximize the predictive capacity with the experiments that have already been carried out. I need to predict the responses WITHIN the experimental space, because below it the material is not weldable and above it I damage machine components.

Reading some books and articles, I considered applying K-fold to the ANN due to the small amount of data, as you suggested in your answer, but where do I select it in JMP Pro? Thanks for your tip about putting the triplicates in the same validation set; it didn't even cross my mind that this would influence ANN overfitting.

Yes, I applied the Fit Least Squares model to my regression. Could the noise level in the FN be because it is an average of points? If so, any tips with the above?

Again, thanks for the excellent response and questions!

ivanpicchi · Jan 19, 2024 07:04 PM

For example: I would like to carry out an analysis similar to these images taken from the article "A comparative study using response surface methodology and artificial neural network towards optimized production of melanin by Aureobasidium pullulans AKW" (https://doi.org/10.1038/s41598-023-40549-z).

Victor_G · Jan 22, 2024 06:24 AM

I would be cautious with the results, as we don't know (at least I don't) how the partitioning has been done for the neural network validation.

Did they take into consideration that the last 3 rows are replicates, so that for the Neural Network it would be more "fair" to use an average of the response for this treatment, instead of partitioning the 3 different results into possibly different sets?
Also, there is a lack of explanation about the regression modeling, so I don't know which terms were kept in the model and on what basis terms may have been removed: was it done in a "Machine Learning"/predictive way, testing different regression models and keeping the one with the best validation results? Or based on statistical significance, information criteria, likelihood ...?

However, even if there is a lack of information on how the modeling has been done (perhaps present in the article, but can't access with the link provided), it's interesting to see the bias-variance tradeoff in action when comparing the results from the optimum with experimental validation :

The regression model may have a higher bias (larger mean difference between the actual and predicted optimum) than the neural network, but it may have a lower variance (since the optimum found with the regression model has lower uncertainty (SD?) than the one found with the Neural Network), so it may prove a more robust choice.

Victor_G · Jan 22, 2024 2:42 AM

Hi @ivanpicchi,

Again, the complexity of the model you can fit is linked to the number of levels, and to the quality and representativeness of your data points. With only 3 levels for your factors (min, median and max), traditional regression methods will only be able to fit up to a quadratic model (with interaction terms and quadratic terms). If you expect a more complex response surface, I would recommend augmenting your design with Space-Filling design points inside your experimental space.
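To make the size of a full quadratic model concrete, the short Python sketch below lists its terms. The factor names `X1`, `X2`, `X3` are placeholders, and three factors is an assumption (inferred from the 8 factorial runs, 8 = 2^3, mentioned earlier in the thread).

```python
from itertools import combinations

def quadratic_terms(factors):
    """List the terms of a full quadratic model: intercept, main effects,
    two-factor interactions and pure quadratic terms."""
    terms = ["Intercept"]
    terms += list(factors)
    terms += [f"{a}*{b}" for a, b in combinations(factors, 2)]
    terms += [f"{a}^2" for a in factors]
    return terms

# Placeholder factor names; three factors is an assumption.
terms = quadratic_terms(["X1", "X2", "X3"])
print(len(terms), terms)
```

Under that assumption the full quadratic model has 10 terms, while 8 corner treatments plus one centre treatment give only 9 distinct factor settings, which is one more reason augmenting the design may be needed before every term can be estimated separately.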

Some responses to your questions :

In the Neural platform, you can select "K-folds" as the validation method. If you specify K = number of rows, you get Leave-One-Out validation. Take care that you have no replicates in your data table, as they could greatly mislead the results (since a treatment used in validation could also be present in the training set if replicated).

More information on validation methods in the Neural platform: Validation Methods for Neural
You can find an explanation of Number of Tours (one of your initial questions) here: Neural Fitting Options
And I can recommend using this add-in to help you fine-tune your Neural Network: https://community.jmp.com/t5/Discovery-Summit-Americas-2023/JMP-Pro-Neural-Network-Tuning-Add-In-202...
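The K-folds remark above (K = number of rows gives Leave-One-Out, and replicates should not be split) can be sketched in plain Python as follows. This is not JMP's implementation; `mean_by_treatment` and `k_fold_splits` are hypothetical helpers and the data are made up. Replicates are first averaged per treatment, so each treatment appears as a single row.

```python
from collections import defaultdict

def mean_by_treatment(treatments, responses):
    """Average the replicated responses of each treatment."""
    totals = defaultdict(lambda: [0.0, 0])
    for treatment, y in zip(treatments, responses):
        totals[treatment][0] += y
        totals[treatment][1] += 1
    return {t: total / count for t, (total, count) in totals.items()}

def k_fold_splits(n_rows, k):
    """Yield (training_rows, validation_rows) pairs.
    With k == n_rows every validation set holds exactly one row."""
    folds = [list(range(start, n_rows, k)) for start in range(k)]
    for held_out in folds:
        training = [r for r in range(n_rows) if r not in held_out]
        yield training, held_out

# Made-up data: 3 treatments, each run in triplicate.
treatments = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
responses = [1.0, 1.2, 0.8, 2.0, 2.1, 1.9, 3.0, 3.3, 2.7]
means = mean_by_treatment(treatments, responses)

# Leave-One-Out on the averaged table: K equals the number of treatments.
splits = list(k_fold_splits(len(means), k=len(means)))
for training, validation in splits:
    print("train:", training, "validate:", validation)
```

Averaging removes the leakage risk but also discards the information about run-to-run variation, so it suits a purely predictive comparison better than an assessment of repeatability.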

More generally, Machine Learning is a different mindset compared to "traditional" statistical modeling. Machine Learning emphasizes predictive accuracy, with the choice and optimization of the best interpolating model under very few assumptions, whereas statistical modeling emphasizes understanding the data generation process, which requires choosing an appropriate model and checking its assumptions, and may require replicates to distinguish the variation of the response due to the factors from random variation or noise.
Regarding the response FN: repetition means making multiple response measurements on the same experimental run (same sample, without any resetting between measurements), while replication means making multiple independent randomized experimental runs (multiple samples, with resetting between runs) for each treatment combination. Repetitions only reduce the variation from the measurement system (by using the average of the repeated measurements), whereas replications reduce the total experimental variation (process + measurements), providing an estimate of pure error and reducing the prediction error (through more accurate parameter estimates). If you still have some noise in your averaged response, that probably indicates that most of the noise/nuisance comes from the process, not from the measurement alone. Maybe some nuisance factors could be taken into consideration through blocking, or some parts of the experimental protocol could be improved. Are several operators doing the measurements?
For some reason, the link is not working for me. I don't know the conclusion of this article, but looking at the results, I can clearly see that there is no big difference in predictive accuracy between the regression model and the neural network when plotting the residuals:

You can look at the results in the attached data table, as well as the modeling done (regression + neural network).
Using the Neural Network platform, I'm able to find a model that is quite interesting for predictive purposes (hide and exclude the 2 replicate runs before launching it, and use the mean of the Y response by pattern), but it doesn't take repeatability/reproducibility into consideration, so it may not be a robust model to use. Again, depending on your objective, you may have to give something up for predictive accuracy: with Neural Networks, you lose interpretability (as every factor is combined and used in the activation functions of the NN, you can't assess the relative individual influence and importance of each factor) in favor of predictive accuracy.

If you're interested in combining models, I think averaging or ensembling different types of models may be more interesting than combining models of the same type, as those may suffer from the same "theoretical" problems and assumptions (generalization performance, robustness, predictive performance, overfitting, ...). Combining different types of models may help average out the errors, compensate for the drawbacks of certain models with the advantages of others, and help homogenize the location of the optimum found (which can be quite different between regression and neural networks).
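A minimal sketch of averaging predictions from different model types, in plain Python. The function `average_predictions`, the model names and the numbers are illustrative assumptions only; in practice the inputs would be the prediction columns saved from each model for the same rows.

```python
def average_predictions(predictions_by_model, weights=None):
    """Row-wise (optionally weighted) average of several models' predictions.
    All models must predict the same rows in the same order."""
    names = list(predictions_by_model)
    if weights is None:
        weights = {name: 1.0 for name in names}  # equal weights by default
    total_weight = sum(weights[name] for name in names)
    n_rows = len(predictions_by_model[names[0]])
    return [
        sum(weights[name] * predictions_by_model[name][row] for name in names)
        / total_weight
        for row in range(n_rows)
    ]

# Made-up predictions from two different model types on the same 3 rows.
predictions = {
    "least_squares": [1.0, 2.0, 3.0],
    "neural": [1.2, 1.8, 3.4],
}
blended = average_predictions(predictions)
print(blended)
```

The blended predictions should be assessed on held-out treatments like any single model, since averaging does not by itself protect against overfitting.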

I hope this more complete answer will help you
