Most of the statistics we are taught were developed 80-100 years ago, when small samples were the rule. The question then was, "What can I convincingly learn from, or decide with, a small sample?" That is where inferential techniques began. Such situations still exist (e.g., no data available, need to experiment...), but a new situation arose about 25 years ago: we now have massive databases to learn from. There, inference is pointless, because everything is statistically significant; even the smallest differences or parameter estimates clear the significance threshold. So predictive modeling does not try to infer significance. Instead, the focus is on predictive performance through feature and model selection. It is a different challenge and requires a different mindset.
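To make the "everything is significant" point concrete, here is a minimal simulation sketch (my own illustrative example, not from the original discussion): with a million observations per group, a mean difference of 0.01 standard deviations is highly "significant" by a t-test, yet it is essentially useless for prediction.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 1_000_000

# Two groups whose means differ by only 0.01 standard deviations.
a = rng.normal(loc=0.00, scale=1.0, size=n)
b = rng.normal(loc=0.01, scale=1.0, size=n)

t_stat, p_value = stats.ttest_ind(a, b)
print(f"p-value: {p_value:.2e}")  # tiny: the difference is "statistically significant"

# Predictive view: how well does group membership predict the outcome?
outcome = np.concatenate([a, b])
group = np.concatenate([np.zeros(n), np.ones(n)])
r = np.corrcoef(group, outcome)[0, 1]
print(f"variance explained (R^2): {r**2:.6f}")  # ~0.000025, i.e. ~0.0025% of the variance
```

The p-value screams "real effect"; the R^2 says the effect is practically worthless for prediction. That gap is exactly why the large-data mindset centers on predictive performance rather than significance.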
Also, in this specific case, why is a normal distribution assumed? What, exactly, is supposed to be normally distributed: the data themselves, or an estimate computed from them? And what are the consequences of non-normality? How large a departure from normality does it take to compromise the result or the decision? In other words, how robust is the method?
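One way to answer "how robust is the method?" empirically is to simulate it. The sketch below (my own hedged example; the exponential distribution, sample sizes, and replication count are illustrative assumptions) checks how often the usual t-based 95% confidence interval for a mean actually covers the true mean when the data are skewed rather than normal.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_mean = 1.0      # mean of an Exponential(scale=1) distribution
n_reps = 20_000      # number of simulated samples per sample size

for n in (10, 30, 100):
    covered = 0
    for _ in range(n_reps):
        x = rng.exponential(scale=1.0, size=n)
        # Standard t-interval: mean +/- t_{0.975, n-1} * s / sqrt(n)
        half_width = stats.t.ppf(0.975, df=n - 1) * x.std(ddof=1) / np.sqrt(n)
        lo, hi = x.mean() - half_width, x.mean() + half_width
        covered += (lo <= true_mean <= hi)
    print(f"n={n:3d}: empirical coverage of nominal 95% CI = {covered / n_reps:.3f}")
```

With small, skewed samples the empirical coverage falls noticeably below the nominal 95%, and it approaches 95% as n grows. A few lines of simulation like this answer the robustness question for your particular method and your particular kind of non-normality far more directly than a blanket normality assumption does.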