I have a large dataset and I have been able to run a neural network on the data successfully after a long wait.
The data has thousands of input variables and about 700 observations.
What is the most efficient way of saving out the formulas? I tried "save formulas", but it had not produced a result after 24 hours of waiting, so I restarted JMP. I am going to try fast formulas or a SAS dataset next, but thought I would ask the question anyway.
I was also wondering how to recreate the neural network I have in JMP in Enterprise Miner. The model has just a single layer with 3 TanH activation nodes, is boosted (the model runs 40 times) with a learning rate of 0.6, and uses no penalty. If someone can get me going on how to repeat this in SAS (Base or EM), that would be very helpful. Alternatively, I guess I could run the JMP code in Enterprise Miner, if that is possible?
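For anyone trying to reproduce the structure outside JMP, here is a minimal Python sketch of what "boosted, 40 models, learning rate 0.6, 3 TanH nodes, no penalty" amounts to as I understand it: each stage fits a small tanh network to the current residuals, and its output is added to the running prediction after scaling by the learning rate, so the final model has 40 x 3 = 120 hidden nodes. Everything here is an assumption for illustration: the function names, the plain gradient-descent fitting, and the synthetic data are mine, and JMP's actual fitting algorithm and validation handling are not reproduced. It is not SAS or Enterprise Miner code.

```python
import math
import random


def predict_net(net, x):
    """Output of one single-hidden-layer tanh network for one input row."""
    W, v = net
    h = [math.tanh(sum(w * xi for w, xi in zip(Wj, x)) + Wj[-1]) for Wj in W]
    return sum(vj * hj for vj, hj in zip(v, h)) + v[-1]


def fit_small_net(X, targets, n_hidden=3, epochs=100, step=0.05, seed=0):
    """Fit one small tanh network by batch gradient descent on squared error (no penalty)."""
    rng = random.Random(seed)
    p, n = len(X[0]), len(X)
    # Each hidden row holds p input weights followed by a bias.
    W = [[rng.uniform(-0.5, 0.5) for _ in range(p + 1)] for _ in range(n_hidden)]
    v = [0.0] * (n_hidden + 1)  # output weights followed by a bias
    for _ in range(epochs):
        gW = [[0.0] * (p + 1) for _ in range(n_hidden)]
        gv = [0.0] * (n_hidden + 1)
        for x, t in zip(X, targets):
            h = [math.tanh(sum(w * xi for w, xi in zip(Wj, x)) + Wj[-1]) for Wj in W]
            err = sum(vj * hj for vj, hj in zip(v, h)) + v[-1] - t
            for j in range(n_hidden):
                gv[j] += err * h[j]
                d = err * v[j] * (1.0 - h[j] ** 2)  # backpropagate through tanh
                for i in range(p):
                    gW[j][i] += d * x[i]
                gW[j][p] += d
            gv[n_hidden] += err
        for j in range(n_hidden):
            for i in range(p + 1):
                W[j][i] -= step * gW[j][i] / n
        for j in range(n_hidden + 1):
            v[j] -= step * gv[j] / n
    return W, v


def fit_boosted(X, y, n_models=40, learning_rate=0.6):
    """Additive model: each stage fits the current residuals, scaled by the learning rate."""
    nets, fitted = [], [0.0] * len(y)
    for m in range(n_models):
        residuals = [t - f for t, f in zip(y, fitted)]
        net = fit_small_net(X, residuals, seed=m)
        nets.append(net)
        fitted = [f + learning_rate * predict_net(net, x) for f, x in zip(fitted, X)]
    return nets


def predict_boosted(nets, x, learning_rate=0.6):
    """Sum of the scaled outputs of all stage networks."""
    return sum(learning_rate * predict_net(net, x) for net in nets)


def mse(y, preds):
    return sum((t - q) ** 2 for t, q in zip(y, preds)) / len(y)


# Tiny synthetic demo (made-up data, not the poster's).
rng = random.Random(1)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(30)]
y = [math.sin(3 * a) + 0.5 * b for a, b in X]
nets = fit_boosted(X, y)
start_mse = mse(y, [0.0] * len(y))
final_mse = mse(y, [predict_boosted(nets, x) for x in X])
```

The point of the sketch is only the additive structure: whatever tool is used, the target to reproduce is a sum of 40 small networks, each trained on what the previous ones left unexplained.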
I did a bit more testing with smaller neural networks, and that approach appears to be the most efficient computationally.
Thanks for the suggestion.
With thousands of input variables and 40 boosting steps, I can imagine you have a massive and massively complicated formula.
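To put a rough number on "massive", here is a small Python sketch that counts the weights in the saved formula. It assumes each of the 40 boosted models is a separate network with 3 hidden nodes, one weight per input plus an intercept at each hidden node, and one output weight per node plus an output intercept. The input count of 5000 is a made-up stand-in for "thousands"; the real number is not given in the thread.

```python
def boosted_nn_parameter_count(n_inputs, n_hidden=3, n_models=40):
    """Count weights in a boosted single-layer network (assumed layout, see above)."""
    hidden_weights = n_hidden * (n_inputs + 1)  # input weights plus intercept per node
    output_weights = n_hidden + 1               # one weight per node plus intercept
    return n_models * (hidden_weights + output_weights)


# Hypothetical input counts for comparison.
print(boosted_nn_parameter_count(5000))  # 600280
print(boosted_nn_parameter_count(50))    # 6280
```

Under those assumptions the formula size grows linearly with the number of inputs, which is why cutting the inputs is the most direct way to shrink it.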
If I may, I would suggest using a variable reduction technique, such as Partition if you are using regular JMP, or Bootstrap Forest, Boosted Tree, or Generalized Regression if you have JMP Pro, to get the number of input variables down to a more manageable set of the most important factors. Once you have those factors, you can run a Neural model as before with the reduced set of input variables. I am not saying the model won't still be large and/or complicated, but I would bet it would be much smaller than your current NN model. Also, if you use those other techniques, you can compare the models to see which one is best via Model Comparison, either in the Formula Depot as Karen suggests or with the standalone Model Comparison platform.
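As a rough illustration of the screening idea, the Python sketch below ranks each input by how much a single best split on it reduces the sum of squared errors of the response, then keeps the top few. This is a deliberately simplified stand-in (one split per variable), not what Partition or Bootstrap Forest actually compute, and the function names and synthetic data are assumptions made for the example.

```python
import random


def best_split_gain(x, y):
    """Largest drop in the sum of squared errors of y from one split on x."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    total = sum(t for _, t in pairs)
    base = total * total / n
    best, left = 0.0, 0.0
    for i in range(1, n):
        left += pairs[i - 1][1]
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between equal x values
        right = total - left
        # SSE reduction = sum_L^2 / n_L + sum_R^2 / n_R - total^2 / n
        gain = left * left / i + right * right / (n - i) - base
        best = max(best, gain)
    return best


def screen_inputs(columns, y, keep):
    """Return the names of the `keep` inputs with the largest single-split gain."""
    gains = {name: best_split_gain(x, y) for name, x in columns.items()}
    return sorted(gains, key=gains.get, reverse=True)[:keep]


# Synthetic demo: the response depends only on the column named "signal".
rng = random.Random(0)
n = 100
columns = {
    "signal": [rng.uniform(-1, 1) for _ in range(n)],
    "noise_a": [rng.uniform(-1, 1) for _ in range(n)],
    "noise_b": [rng.uniform(-1, 1) for _ in range(n)],
}
y = [1.0 if s > 0 else -1.0 for s in columns["signal"]]
top = screen_inputs(columns, y, keep=1)
```

A one-variable-at-a-time screen like this can miss inputs that only matter in combination, so it is a first pass rather than a replacement for the tree-based platforms mentioned above.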
Yeah, I should try the dimension reduction approach you mention. I am a little hesitant to believe it will work, as the input data is similar to a time series with an undetermined lag, so the relationships among the input variables are a little "fuzzy". Input variable one is the first sample in the series for every observation, but I am using the NN to find patterns in the data rather than directly comparing the observations for each variable, if that makes sense. So I am not 100% sure PCA or the suggested Partition method would be suitable.
I think reducing my sample rate for each series and manually removing one series at a time may give me much less data and an equivalent model.
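A minimal Python sketch of that column reduction, assuming the input columns can be grouped by series and are stored in time order within each series. The series names, column names, and the `reduce_columns` helper are hypothetical. Keeping every k-th sample is plain subsampling, which can alias fast-changing content, so averaging neighbouring samples first may be worth trying as well.

```python
def reduce_columns(series_columns, step, drop=()):
    """Keep every `step`-th column of each series, skipping any series in `drop`."""
    kept = []
    for name, cols in series_columns.items():
        if name in drop:
            continue
        kept.extend(cols[::step])
    return kept


# Hypothetical layout: two series with six samples each.
series_columns = {
    "A": ["A_0", "A_1", "A_2", "A_3", "A_4", "A_5"],
    "B": ["B_0", "B_1", "B_2", "B_3", "B_4", "B_5"],
}
kept = reduce_columns(series_columns, step=3, drop={"B"})
print(kept)  # ['A_0', 'A_3']
```

Refitting the model on the kept columns after each change, and comparing validation fit, would show whether a series or a finer sample rate was actually contributing.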