
Re-ordered columns and now neural model is giving different results

Cate

New Contributor

Joined:

Oct 1, 2017

I am working on a group project for class, and after one member put the columns in our file in a different order, their Neural Net model is giving worse test misclassification results, going from 16% to 18%. Why would this be happening?


8 REPLIES
ian_jmp

Staff

Joined:

Jun 23, 2011

Many of the algorithms used in predictive modeling necessarily have a random aspect. JMP provides a number of ways to make results 'reproducible', the details depending on exactly what you have done (look for references to 'Random Seed' in the user interface and check out the documentation). Almost certainly the variation you have seen is for this reason, and has nothing to do with shuffling columns.

Having said that, it's a good supplementary question to ask whether it's desirable to force results to be reproducible, or whether one should embrace this 'algorithmic variability'.
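To illustrate the point about random seeds, here is a minimal sketch in Python/numpy (not JSL, and not what JMP actually runs internally): a toy model initialized with random weights is not repeatable run-to-run, but fixing the seed makes the entire fit repeat exactly.

```python
import numpy as np

def train_once(X, y, seed=None):
    """Toy gradient-descent fit starting from randomly drawn weights."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])        # random starting values
    for _ in range(200):
        w -= 0.1 * X.T @ (X @ w - y) / len(y)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5])

# Fixing the seed fixes the starting weights, so the whole fit
# (and any misclassification/error rate computed from it) repeats.
w1 = train_once(X, y, seed=42)
w2 = train_once(X, y, seed=42)
assert np.array_equal(w1, w2)
```

The names here (`train_once`, the learning rate, the step count) are purely illustrative; the point is only that the random draw at initialization is the source of run-to-run variation.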

Cate

New Contributor

Joined:

Oct 1, 2017

Thank you for your response. We will indeed revisit the documentation that you referenced. And ultimately, as you said, we may just need to embrace this 'algorithmic variability'.

 

By way of explanation: the desire to force reproducible results, while not appropriate in a professional setting, is highly desirable in an academic setting. This is a group project for a graduate school class. Unfortunately, the team members all reside in different states and are trying to work in concert to develop predictive models. For Neural Nets, we are all using the same random seed with the Random Seed Add-In that, per our professor, was designed by Mia Stephens, Academic Ambassador with JMP, for use in the academic arena. It seems strange that the same person running the same model with the same random seed will get such different results from one day to the next (and always worse results). When run on the same day, the model reliably produces the same test misclassification rate. The only change, other than shutting the computer down, was to re-order the columns to prepare the file for an additional dump of data. In addition, other team members can't reproduce the results running that same model.

 

This has forced us to investigate whether there is a problem with the model, or whether it is attributable to randomness, which we thought was eliminated via the Random Seed Add-In.

 

Again, thank you for your response earlier.  

 

ian_jmp

Staff

Joined:

Jun 23, 2011

Understood!

It's good you know about @mia_stephens' add-in. She will be able to speak to what it does and doesn't do (I understand there were some revisions). I'll direct her to this thread.

Cate

New Contributor

Joined:

Oct 1, 2017

You are WONDERFUL!!  Thank you so much!

I am so impressed with the fantastic customer service and responsiveness.

Have a great week.

mia_stephens

Staff

Joined:

May 28, 2014

Hi Cate,

 

Thanks for posing this question. Neural networks have a random component: they use random starting values for the weights, and then iteratively update the weights to improve the performance of the model. The add-in, if run just prior to running a new model from within the Neural platform, simply tells JMP to start at the same starting values for the weights (if the same model is run again). The add-in is no more than a simple line of JMP code (JSL). I would guess that if the columns are shuffled into a different order you might see different results (I'll confirm this). I'm assuming you have the same partitioning into Training, Validation (and Test)?

 

I hope you're enjoying the class!

 

Mia

Dan_Obermiller

Joined:

Apr 3, 2013

One final, perhaps obvious, check: the neural network routines REQUIRE a validation set. If you do not have a validation set specified, JMP will automatically pick your validation rows at RANDOM. To avoid this, create a column specifying your validation set and supply it when running the neural network platform. This, along with Mia's add-in, should allow you to get reproducible results.
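The idea of a saved validation column can be sketched as follows (Python/numpy, purely illustrative of the concept, not JMP's actual mechanism): draw the split once with a fixed seed and store it with the data, so every run and every team member uses the identical rows.

```python
import numpy as np

# Draw the holdback ONCE, with a fixed seed, and save it as a column.
# Every subsequent run then reuses the identical split, instead of
# letting the modeling platform draw validation rows at random.
rng = np.random.default_rng(2017)      # illustrative seed
n_rows = 100
validation = np.where(rng.random(n_rows) < 0.25, "Validation", "Training")

# Both roles are present, and the column is fully determined by the seed.
assert set(validation) == {"Training", "Validation"}
```

Because the column lives in the data table rather than being redrawn each run, shutting the computer down or sharing the file with teammates cannot change which rows are held back.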

Dan Obermiller
Cate

New Contributor

Joined:

Oct 1, 2017

Hi, yes, we do have a validation column.
mia_stephens

Staff

Joined:

May 28, 2014

Solution

Hi Cate,

 

The order in which the factors are listed in the model dialog DOES affect the starting values of the neural network and will produce different results. The difference in results comes from the weights being assigned to the factors in a different order during the first pass of the neural net. The only way to ensure that the same results are obtained is to use the same factors, in the same order, with the same seed.
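A small numpy sketch (Python, purely illustrative, not JMP internals) of why this happens: the seed fixes the *sequence* of random draws, but the draws are handed out to the factors positionally, so reordering the columns starts training from a genuinely different point.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))              # three factors, original order

def initial_weights(n_features, seed):
    # Same seed -> identical sequence of random draws...
    return np.random.default_rng(seed).normal(size=n_features)

w = initial_weights(3, seed=42)
assert np.array_equal(w, initial_weights(3, seed=42))

# ...but the draws are assigned to factors by position, so after
# reordering the columns each factor receives a DIFFERENT starting
# weight, and the first forward pass (hence the whole fit) differs.
X_reordered = X[:, [2, 0, 1]]
assert not np.allclose(X @ w, X_reordered @ w)
```

This is why the same seed only guarantees identical results when the factors are also entered in the same order.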

 

Hope you're enjoying the class!

 

Mia