I find the JMP 13 manual's description of the boosting procedure in neural networks to be vague. Here is a snippet of it:
"The first step is to fit a one-layer, two-node model. The predicted values from that model are scaled by the learning rate, then subtracted from the actual values to form a scaled residual. The next step is to fit a different one-layer, two-node model on the scaled residuals of the previous model."
I don't understand how JMP uses the residuals computed from the predicted values. A mathematical formula or a diagram would be extremely helpful.
Is JMP adding the residuals to the original features and training the network again, or does it train a chain of networks on the residuals alone? For example, suppose I have 10 features, I am trying to predict a single continuous response, and my network has one hidden layer of 3 neurons (aka nodes); the first neural net in a boosted ensemble would then have shape [10, 3, 1]. JMP then calculates the residuals (true values minus the learning-rate-scaled predicted values, per the quoted passage) from the output layer. What happens to those residuals next? What is the shape of the second network, what are its inputs, and what is it predicting?
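To make my question concrete, here is my current understanding written as a Python/NumPy sketch, in case someone can point at the line where I go wrong. Everything here is an assumption on my part, not JMP's actual implementation: the function names `fit_small_net` and `boosted_fit` are mine, the plain gradient-descent training loop is just a stand-in for however JMP fits each small model, and I am guessing that every stage has the same shape [10, 3, 1], takes the original 10 features as input, and that the final prediction is the learning-rate-scaled sum of all the stage predictions.

```python
import numpy as np

def fit_small_net(X, y, hidden=3, lr=0.1, epochs=300, seed=0):
    """Train a [n_features, hidden, 1] tanh network with plain gradient
    descent. A hypothetical stand-in for JMP's small base model."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    W1 = rng.normal(0.0, 0.5, (n_features, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1));          b2 = np.zeros(1)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)              # hidden-layer activations
        pred = (H @ W2 + b2).ravel()
        err = pred - y                        # gradient of 0.5*MSE w.r.t. pred
        gW2 = H.T @ err[:, None] / n_samples
        gb2 = np.array([err.mean()])
        dH = (err[:, None] @ W2.T) * (1.0 - H ** 2)   # backprop through tanh
        gW1 = X.T @ dH / n_samples
        gb1 = dH.mean(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return lambda Xn: (np.tanh(Xn @ W1 + b1) @ W2 + b2).ravel()

def boosted_fit(X, y, n_models=6, learn_rate=0.1):
    """My reading of the quoted passage: every stage sees the SAME inputs X
    but is fit against the running residual, and the ensemble prediction is
    the learning-rate-scaled sum of the stage predictions (my assumption)."""
    models = []
    residual = y.astype(float).copy()
    for _ in range(n_models):
        f = fit_small_net(X, residual)        # same input shape every stage
        models.append(f)
        # the scaled residual becomes the target of the next stage
        residual = residual - learn_rate * f(X)
    return lambda Xn: learn_rate * sum(f(Xn) for f in models)
```

So under this reading, the second network has the same shape [10, 3, 1] as the first, its inputs are the original 10 features, and the only thing that changes is the target it is trained to predict (the running scaled residual). Is that what JMP is doing, or does the second network take something else as input?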
Thank you for your time in advance.