I am currently wrapping up the final practice exercises of 7.3 Neural Networks in the Predictive Modeling and Text Mining module of the JMP STIPS course. Fitting a neural network and understanding the basic function of a node in the hidden layer as a regression model seems intuitive enough. Still, I have a few questions left after this very basic training, and I am hoping someone in the community with more insight can answer them. For the example I want to discuss, I used the file GreenPolymerProduction.jmp and the standard version of JMP. A copy of the file, including the model, is attached.
The diagram of the model is shown below. We can see that data for each of the eight predictors is fed into each of the three nodes in the hidden layer. Each node in the layer uses a TanH activation function.
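To show what I think is going on in that diagram, here is a small Python sketch of how I understand a single hidden layer to work. The weights and intercepts here are made-up numbers purely for illustration; JMP estimates its own coefficients, so the saved formulas will contain different values. Please correct me if this picture is wrong.

```python
import math

# Eight predictors feed into each of three hidden nodes (as in the diagram).
n_predictors = 8
n_nodes = 3

# Hypothetical parameters for illustration only: each hidden node has its
# own intercept and one weight per predictor.
node_params = [
    {"intercept": 0.5, "weights": [0.1 * (i + j) for i in range(n_predictors)]}
    for j in range(n_nodes)
]

def hidden_node_output(x, params):
    """One node: TanH of a linear combination of the predictor values."""
    z = params["intercept"] + sum(w * xi for w, xi in zip(params["weights"], x))
    return math.tanh(z)

# One row of predictor values (all 1.0 just as a placeholder).
x = [1.0] * n_predictors

# All three nodes see the same inputs, but each applies its own coefficients,
# so each produces a different output.
hidden = [hidden_node_output(x, p) for p in node_params]
print(hidden)
```

The point of the sketch is that all three nodes use the same TanH function, but each wraps it around its own linear combination of the predictors.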
I've saved the formulas for the model to the table. The screenshots of the formulas used in each of the three nodes are shown below.
Question #1: How should I interpret the fact that each node has a different formula for the same activation function? Intuitively it would be naive to expect the same formula in every node, but I am still wondering how JMP arrives at the results shown below. One interpretation I came up with is that each node receives different input data, which would explain the different formulas, but this assumption may be grossly wrong.
Question #2: Can anyone share good references (preferably with phenomenological explanations) for beginners reading up on neural networks?