I find the JMP 13 manual's description of the boosting procedure in neural networks to be vague. Here is a snippet of it:
"The first step is to fit a one-layer, two-node model. The predicted values from that model are scaled by the learning rate, then subtracted from the actual values to form a scaled residual. The next step is to fit a different one-layer, two-node model on the scaled residuals of the previous model."
I don't understand how JMP uses the residuals computed from the predicted values. A mathematical formula or a diagram would be extremely helpful.
Is JMP adding the residuals to the original features and training the network again, or does it train a chain of networks on the residuals alone? For example, suppose I have 10 features, I am trying to predict a single continuous response, and my network has one hidden layer of 3 neurons (aka nodes); the first neural net in a boosted ensemble would then have shape [10, 3, 1]. JMP then calculates the residuals (true values minus the learning-rate-scaled predicted values, per the quoted passage) from the output layer. What happens to those residuals next? What is the shape of the second network, what are its inputs, and what is it predicting?
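To make my question concrete, here is my current understanding written as a Python/NumPy sketch, in case someone can point at the line where I go wrong. Everything here is an assumption on my part, not JMP's actual implementation: the function names `fit_small_net` and `boosted_fit` are mine, the plain gradient-descent training loop is just a stand-in for however JMP fits each small model, and I am guessing that every stage has the same shape [10, 3, 1], takes the original 10 features as input, and that the final prediction is the learning-rate-scaled sum of all the stage predictions.

```python
import numpy as np

def fit_small_net(X, y, hidden=3, lr=0.1, epochs=300, seed=0):
    """Train a [n_features, hidden, 1] tanh network with plain gradient
    descent. A hypothetical stand-in for JMP's small base model."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    W1 = rng.normal(0.0, 0.5, (n_features, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1));          b2 = np.zeros(1)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)              # hidden-layer activations
        pred = (H @ W2 + b2).ravel()
        err = pred - y                        # gradient of 0.5*MSE w.r.t. pred
        gW2 = H.T @ err[:, None] / n_samples
        gb2 = np.array([err.mean()])
        dH = (err[:, None] @ W2.T) * (1.0 - H ** 2)   # backprop through tanh
        gW1 = X.T @ dH / n_samples
        gb1 = dH.mean(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return lambda Xn: (np.tanh(Xn @ W1 + b1) @ W2 + b2).ravel()

def boosted_fit(X, y, n_models=6, learn_rate=0.1):
    """My reading of the quoted passage: every stage sees the SAME inputs X
    but is fit against the running residual, and the ensemble prediction is
    the learning-rate-scaled sum of the stage predictions (my assumption)."""
    models = []
    residual = y.astype(float).copy()
    for _ in range(n_models):
        f = fit_small_net(X, residual)        # same input shape every stage
        models.append(f)
        # the scaled residual becomes the target of the next stage
        residual = residual - learn_rate * f(X)
    return lambda Xn: learn_rate * sum(f(Xn) for f in models)
```

So under this reading, the second network has the same shape [10, 3, 1] as the first, its inputs are the original 10 features, and the only thing that changes is the target it is trained to predict (the running scaled residual). Is that what JMP is doing, or does the second network take something else as input?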
Thank you for your time in advance.