Fitting the NN by Day means that within each subset, each fold contains only 12-15 observations, so the estimates will be less stable from run to run.
That said, the top predictors are generally the same and they exhibit similar profiles in both runs. You can remove some of these predictors, perhaps as many as half, from the model with little loss in performance, and the resulting parsimonious model will be more stable over NN runs.
Also, you are fitting a very complex model. There are 18 hidden nodes. Is the response really that complicated? Over-fitting the model will also affect the stability of the runs.
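To get a feel for how complex the model is, it can help to count the parameters it has to estimate. The sketch below assumes a fully connected network with a single hidden layer and one output, which may not match your exact setup, and the predictor count of 10 is a placeholder since I don't know how many predictors you have.

```python
def count_parameters(n_predictors: int, n_hidden: int, n_outputs: int = 1) -> int:
    """Weights and biases in a fully connected network with one hidden layer.

    Assumes every predictor feeds every hidden node and every hidden node
    feeds every output (an assumption, not necessarily your architecture).
    """
    hidden = (n_predictors + 1) * n_hidden  # input weights plus one bias per hidden node
    output = (n_hidden + 1) * n_outputs     # hidden weights plus one bias per output
    return hidden + output


# Hypothetical predictor count; substitute your own.
n_predictors = 10
for n_hidden in (3, 6, 18):
    print(f"{n_hidden} hidden nodes -> {count_parameters(n_predictors, n_hidden)} parameters")
```

Compare the count for 18 hidden nodes with the number of observations in one Day subset. If the two are of similar size, the fit has very little to constrain it.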
If the response is time-dependent as you say, then it makes more sense to me to include Day as a predictor and use the larger, combined data set for better stability.
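A rough illustration of what the combined data set buys you in fold size. The numbers here (5 days, 65 observations per day, 5 folds) are made up for the example, chosen only so the per-day folds land in the 12-15 range mentioned above; plug in your own.

```python
from typing import List


def fold_sizes(n_obs: int, k: int) -> List[int]:
    """Sizes of k near-equal folds when n_obs observations are split k ways."""
    base, extra = divmod(n_obs, k)
    # The first `extra` folds get one additional observation.
    return [base + 1 if i < extra else base for i in range(k)]


# Hypothetical numbers, not taken from your data.
n_days, n_per_day, k = 5, 65, 5

per_day = fold_sizes(n_per_day, k)            # folds when fitting one Day at a time
combined = fold_sizes(n_days * n_per_day, k)  # folds when Day is a predictor

print("per-day fold sizes: ", per_day)
print("combined fold sizes:", combined)
```

With Day in the model as a predictor, every fold holds several times as many observations, which is where the extra stability comes from.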