giaMSU
Level II

Automate the neural network

Hi all, 

I am working on a project that uses a neural network model to predict soil moisture at roughly 1,000 locations, each defined by its unique coordinates. The same neural network structure in JMP can be applied to every site, so I am thinking about automating the work. It could be a JMP script that finds the location coordinates (X, Y), extracts the necessary input data, runs the neural network, and, at the end, saves a report for each model. Do you have any pointers or advice you can share with me?
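
To make the question concrete, here is a rough JSL sketch of the kind of loop I am imagining. All of the column names (:SiteID, :P1-:P3, :SoilMoisture, :Validation), the output folder, and the Neural launch options are placeholders I have not verified against the Scripting Index:

    dt = Current Data Table();
    Summarize( siteList = By( :SiteID ) );              // unique site IDs as a list of strings
    For( i = 1, i <= N Items( siteList ), i++,
        dt << Select Where( Char( :SiteID ) == siteList[i] );
        siteDT = dt << Subset( Selected Rows( 1 ) );    // one table per site
        obj = siteDT << Neural(                         // placeholder roles and options
            Y( :SoilMoisture ),
            X( :P1, :P2, :P3 ),
            Validation( :Validation ),                  // assumes a pre-made validation column
            Fit( NTanH( 3 ) )
        );
        // save this site's report; Save Picture is one option, PDF or a journal are others
        (obj << Report) << Save Picture( "C:/NN Reports/site_" || siteList[i] || ".png", "png" );
        Close( siteDT, NoSave );
    );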

Thank you in advance.

Any suggestion or discussion is appreciated.

 

27 REPLIES
SDF1
Super User

Re: Automate the neural network

Hi @Marco1 ,

 

  Thank you. I am actually in the process of updating the JSL even further so that it generates the tuning table for the user. The issue with doing it manually is that, yes, you need to know the ins and outs of the code pretty well. The problem you're having is that the code expects certain columns to be in the tuning data table, even if they are constant, because each row contains the parameters that are fed to the modeling platform.

 

  My guess is that when you created the tuning table, JMP automatically dropped any constant terms. The resulting tuning table doesn't have the right columns in it and therefore throws an error. I'm not the best programmer, so trying to account for all possible scenarios is taking some time.

 

  In the meantime, before my next version comes out, I suggest you copy back the columns that JMP dropped and fill their values down to the end of the table. That should get you going. If you have any further issues, please let me know.
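
  A minimal JSL sketch of that fix (the table and column names here are just placeholders for whatever was dropped in your case):

    tuneDT = Data Table( "NN Tuning" );   // placeholder: your tuning table's name
    // re-create the dropped constant factor, filled with its value for every row
    tuneDT << New Column( "Robust Fit", Numeric, "Continuous", Set Each Value( 0 ) );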

 

Hope this helps!

DS

Marco1
Level IV

Re: Automate the neural network

Hi Diedrich,

I'm glad to greet you. It would be excellent if the GUI could perform multi-objective optimization (Pareto front) over all the hyperparameters the user has to tune, and could create more than 2 layers (deep learning).
A question: is it possible to make time series predictions with JMP neural networks?
In the example I sent, the tuning table was created with JMP's Custom DOE. I am just learning to use the NN and DOE platforms. Which columns does the DOE eliminate? Which columns should I add, and to which table: the data table or the tuning table?

Could you please send an example?

Thanks for your answer and help!
Greetings,

Marco

SDF1
Super User

Re: Automate the neural network

Hi @Marco1 ,

 

  I am not sure which platform in JMP one would use to generate a Pareto-front DOE. In the GUI I'm updating now, I call the Space Filling DOE platform, which optimally fills the hyperparameter space with points based on the low and high values you give it. The data table it generates saves all the fit statistics, so you can evaluate each fit and even go back and run the models manually in the platforms later.
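
  To give a feel for the mechanics (this is not the GUI's actual code, just a stripped-down sketch of the idea), each row of the tuning table supplies the settings for one Neural fit. The table names, column names, and Fit() arguments below are illustrative; the Scripting Index has the full list of Neural options:

    tuneDT = Data Table( "NN Tuning" );   // the tuning table (placeholder name)
    dt     = Data Table( "My Data" );     // the data to model (placeholder name)
    For( r = 1, r <= N Rows( tuneDT ), r++,
        obj = dt << Neural(
            Y( :SoilMoisture ),
            X( :P1, :P2, :P3 ),
            Validation( :Validation ),
            Fit(
                NTanH( Column( tuneDT, "Nodes TanH 1" )[r] ),   // first-layer TanH nodes
                NTanH2( Column( tuneDT, "Nodes TanH 2" )[r] )   // second-layer TanH nodes
            )
        );
        // read the fit statistics back out of the report and store them with this row,
        // then close the report before moving on to the next row
    );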

 

  To the best of my knowledge, JMP does not support more than two layers. I don't remember the details, but it was mentioned during one of the sessions at this past week's Discovery Summit Americas. I think it's because, when evaluating performance (especially predictive performance), more than two layers did not add much for the additional computation time spent building the model. Also, please note that when boosting with NN, only one layer is allowed. Often, the number of nodes matters more than the number of layers.
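
  For contrast, a boosted fit keeps a single hidden layer. A rough sketch of what that looks like in JSL; the boosting argument names below mirror the launch dialog and are assumptions, so please verify them against a script saved from your own Neural report:

    obj = dt << Neural(
        Y( :SoilMoisture ),
        X( :P1, :P2, :P3 ),
        Validation( :Validation ),
        Fit(
            NTanH( 3 ),              // a single hidden layer, as boosting requires
            Number of Models( 20 ),  // boosting: number of component models (assumed name)
            Learning Rate( 0.1 )     // boosting: learning rate (assumed name)
        )
    );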

 

  As far as I know, you should be able to model time series with the NN platform; I do not see any limitation there. The only challenge, of course, is interpretability: it is much harder to interpret the X -> Y transformations in NNs.

 

  I am attaching a basic NN tuning table (NN Tuning) and a Space Filling DOE to explore the hyperparameter space. Note that I included only two of the four penalty methods in my DOE. I also allowed every variable (except Robust Fit) to vary, but since boosting is only allowed for a single-layer model, the code automatically eliminates the two-layer choices and reverts to a boosting option. As a final note, all 13 columns must be present in the tuning table that you select when running the GUI; if they are not, it will throw an error.

 

  The DOE platforms remove any factor that is constant by default. If a column is missing after generating the DOE, you must copy it back in from your factor table (and fill its value down to the end of the table); hence the example where Robust Fit is held constant at 0. I used a random seed of 1234 with 120 runs to generate the Space Filling DOE.

 

Hope this helps,

DS

Marco1
Level IV

Re: Automate the neural network

Hi Diedrich,

Good explanation. A question about predicting time series with JMP neural networks: how could one forecast the next 2, 3, 4, 5, or more periods into the future, using 1, 2, 3, or more input columns, that is, without output values for those future periods?

 

Regarding multi-objective optimization (Pareto front), I have attached a very good GUI that can be downloaded from the JMP Community; maybe it will help with the update of the GUI you are working on.

 

It would be great if the GUI could optimize the parameters of any platform JMP has, including Time Series (ARIMA, Winters, ...), saving the user from doing work that an optimizer could do.

 

Thanks for your help!

Greetings,

Marco

SDF1
Super User

Re: Automate the neural network

Hi @Marco1 ,

 

  I'm not sure I follow why you would want to forecast a certain number of periods without having an output column. If you have a good set of predictors for your model, whether it's NN, tree-based, or kNN, then I don't think the platform really matters; it's all about whether the model does a good job predicting the next events. You have to feed a Y column to the models, or else they have no outcome to try to predict during training and validation. The platforms let you save the prediction formula to the data table or publish it to the Formula Depot for comparison with other predictions you generate, so you can test the models on new data and see which one is better; which is better depends on the metric you are looking at and on whether you're doing classification or regression.
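
  If the goal is to forecast several periods ahead without future Y values, one common workaround is to lag the inputs by the forecast horizon so the model learns Y(t) from X(t-h). A rough JSL sketch with placeholder column names; Lag( col, n ) simply returns the value n rows back:

    dt = Current Data Table();
    // predictors shifted back 3 periods, so each row pairs Y(t) with X(t-3)
    dt << New Column( "X1 lag3", Numeric, "Continuous", Formula( Lag( :X1, 3 ) ) );
    dt << New Column( "X2 lag3", Numeric, "Continuous", Formula( Lag( :X2, 3 ) ) );
    // launch Neural with Y( :Y ) and X( :X1 lag3, :X2 lag3 ); after fitting, save or
    // publish the prediction formula and score the newest rows to get the forecast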

 

  I will have to look into the Pareto front when I have some extra time. Thank you for the add-in.

 

  Although it would be great to have a one-stop GUI that runs all the different modeling platforms within JMP, that's a bit much for me to undertake right now. Perhaps in future versions I can add more platforms. Right now it can run NN, Boosted Tree, Bootstrap Forest, and XGBoost; I plan to add kNN, Naive Bayes, and SVM to the list, too. It all takes time to work through bugs and whatnot, especially when calling multiple different platforms.

 

Thanks!

DS

giaMSU
Level II

Re: Automate the neural network

Hi, DS, 

Thank you for helping me. Honestly, I gave up and got it done manually. 

I've tried scripting this many times, either by asking people or by teaching myself with the JMP Scripting Index. For example, I used the Get RSquare Validation option in the fit, or added:

    rSquare = (Report( obj )["Validation"]["mvalue"][Number Col Box( 1 )] << Get)[1];
    Show( rSquare );

 

Neither of them worked.

Regarding your suggestion to randomly select the K-fold validation dataset: it is a great idea. However, I want to keep the NN structure consistent so that I can compare performance between NNs. Most of the papers I have read that use artificial NNs do not mention how they select their validation dataset during iterations.

I will give it a try and see whether R2 improves. I will let you know the result soon.

Best  

SDF1
Super User

Re: Automate the neural network

Hi @giaMSU ,

 

  If you really want to compare different NNs with different validation schemes, I would set the random seed to some number, say 12345. That way, every fit starts from the same point; where it goes from there depends on the parameters used in the model and on the validation set, but all fits will at least start from the same place. In my experience, having a good validation scheme is very important. By hand-selecting specific examples for training or validation, you might create a highly imbalanced data set that prefers one outcome over another, which would end up either overfitting the training set and predicting the validation set poorly, or vice versa. You could also consider making K-fold validation columns, which gives you a kind of nested K-fold validation scheme; then you could look at the fit predictions, and even the fit statistics, for each of the nested columns. Just a thought.
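
  As a quick JSL sketch of the random-seed idea (Random Reset seeds JSL's random number generator; the fold column below is just a simple random assignment, and the Make Validation Column utility is the fuller-featured route):

    Random Reset( 12345 );   // seed the random number generator so the split is repeatable
    dt = Current Data Table();
    // quick 5-fold assignment, frozen as values (not a formula) so it won't re-randomize
    dt << New Column( "Fold", Numeric, "Nominal", Set Each Value( Random Integer( 1, 5 ) ) );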

 

Good luck!

DS

lala
Level VII

Re: Automate the neural network

Learn the script.