In addition to other problems mentioned in previous posts,

1) When testing for a difference in means, the sidedness of the test must be specified a priori. If not otherwise stated, such a test is usually 2-sided, in which case alpha, the 5% type I error probability, is the sum of the areas in the upper and lower tails of the null distribution. In almost all cases this probability is split equally between the two tails, so a single tail area should be compared to alpha/2, which is 0.025, not 0.05.

**EDIT:** JMP takes care of this for you by doubling the tail area beyond the observed test statistic, so its two-sided p-value can be compared directly to 0.05. If you were doing this "by hand", however, you would reject only when the single-tail area is < 0.025.
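
To see that the two conventions lead to the same decision, here is a minimal Python sketch. It assumes a test statistic whose null distribution is standard normal (a simplification; a t-test would use the t distribution), and the function names are hypothetical. Comparing one tail to alpha/2 and comparing the doubled tail to alpha are the same rule written two ways.

```python
from statistics import NormalDist

ALPHA = 0.05
STD_NORMAL = NormalDist()  # assumed null distribution of the test statistic: N(0, 1)


def single_tail_area(z: float) -> float:
    """Null-distribution area beyond |z| in ONE tail."""
    return STD_NORMAL.cdf(-abs(z))


def reject_by_hand(z: float, alpha: float = ALPHA) -> bool:
    # Compare one tail area to alpha/2 (0.025 when alpha = 0.05).
    return single_tail_area(z) < alpha / 2


def reject_like_software(z: float, alpha: float = ALPHA) -> bool:
    # Double the tail area to get a two-sided p-value, then compare to alpha.
    return 2 * single_tail_area(z) < alpha


# Critical value: each tail of the null distribution holds alpha/2.
Z_CRIT = STD_NORMAL.inv_cdf(1 - ALPHA / 2)

if __name__ == "__main__":
    z_obs = 2.1  # hypothetical observed statistic
    print(f"critical |z|  = {Z_CRIT:.3f}")
    print(f"one tail area = {single_tail_area(z_obs):.4f}")
    print(f"two-sided p   = {2 * single_tail_area(z_obs):.4f}")
    print(reject_by_hand(z_obs), reject_like_software(z_obs))
```

With z = 2.1 the single tail is roughly 0.018 and the two-sided p-value roughly 0.036, so both rules reject. The mistake being warned about is comparing a single tail of, say, 0.03 to 0.05: that would reject when the two-sided p-value is actually 0.06.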

2) "95% confidence in the results" is cringeworthy because it misunderstands the main idea behind a hypothesis test. You can be 100% confident that, in running a hypothesis test at the 0.05 level of significance, you will have followed a certain procedure (if you've done things correctly). When the means of the 2 distributions are actually the same, then over the long run (provided all assumptions hold), that procedure will produce a test statistic leading to a conclusion of differing means 5% of the time, purely due to sampling variability.
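
The long-run 5% can be checked by simulation. This Python sketch draws both samples from the same normal distribution, so the null hypothesis is true by construction, and counts how often a two-sided test at alpha = 0.05 concludes that the means differ. To stay within the standard library it uses a two-sample z-test with a known common standard deviation rather than a t-test; the sample size, number of repetitions, seed, and function names are arbitrary, hypothetical choices.

```python
import random
from statistics import NormalDist, fmean


def two_sample_z(x, y, sigma):
    """Two-sample z statistic, assuming a KNOWN common sd `sigma` (a simplification)."""
    se = sigma * (1 / len(x) + 1 / len(y)) ** 0.5
    return (fmean(x) - fmean(y)) / se


def false_rejection_rate(n_reps=10_000, n=30, sigma=1.0, alpha=0.05, seed=1):
    """Fraction of repetitions in which a true null is (wrongly) rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    rejections = 0
    for _ in range(n_reps):
        # Both samples come from the SAME distribution, so the null is true.
        x = [rng.gauss(0.0, sigma) for _ in range(n)]
        y = [rng.gauss(0.0, sigma) for _ in range(n)]
        if abs(two_sample_z(x, y, sigma)) > z_crit:
            rejections += 1
    return rejections / n_reps


if __name__ == "__main__":
    print(false_rejection_rate())
```

The returned rate should land close to 0.05, but not exactly on it, because the simulation has sampling variability of its own.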

3) There are only 2 possible conclusions for a hypothesis test of this nature: (a) "The sample data provides (compelling) evidence that the null hypothesis is false" and (b) "The sample data does not provide compelling evidence that the null hypothesis is false". Notably absent is the statement "Based on the sample data, we conclude that the null hypothesis is true." Generally, when you want to demonstrate something, you frame it as the alternative hypothesis, hoping that the data supports rejecting the null in favor of the alternative. If the researcher is trying to demonstrate equivalence of means, then two one-sided tests (TOST) would be more appropriate: the null hypothesis becomes "the means differ by at least a pre-specified margin", and equivalence is concluded only if both one-sided tests reject.
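
A minimal Python sketch of two one-sided tests follows. It assumes a known common standard deviation (so z-tests stand in for the t-tests usually used in TOST) and an equivalence margin `delta` agreed on in advance; the function name, margin, and data are hypothetical.

```python
from statistics import NormalDist, fmean


def tost_z(x, y, sigma, delta, alpha=0.05):
    """Two one-sided z-tests (TOST) for equivalence of two means.

    Assumes a KNOWN common sd `sigma` and a pre-specified margin `delta`.
    Returns (p_lower, p_upper, equivalent).
    """
    phi = NormalDist().cdf
    diff = fmean(x) - fmean(y)
    se = sigma * (1 / len(x) + 1 / len(y)) ** 0.5
    # H0a: true difference <= -delta. A small p_lower says the difference is above -delta.
    p_lower = phi(-(diff + delta) / se)
    # H0b: true difference >= +delta. A small p_upper says the difference is below +delta.
    p_upper = phi((diff - delta) / se)
    # Equivalence is concluded only if BOTH one-sided nulls are rejected.
    return p_lower, p_upper, max(p_lower, p_upper) < alpha


if __name__ == "__main__":
    # Observed difference 0: both one-sided nulls are rejected.
    print(tost_z([0.1, -0.1] * 50, [0.0] * 100, sigma=1.0, delta=0.5))
    # Observed difference 0.45, inside the margin but too close to it.
    print(tost_z([0.45] * 100, [0.0] * 100, sigma=1.0, delta=0.5))
```

Here a small p-value supports equivalence, because the null hypotheses now claim the means differ by at least `delta`. The second call shows that an observed difference inside the margin is still not enough when the sample cannot rule out true differences beyond it.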

4) Others have mentioned this but it bears repeating: statistical significance and practical significance need not be, and usually are not, the same thing. Practical significance should be discussed and agreed upon from square one, if not square zero.
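
A quick calculation shows how far apart the two can be. This Python sketch applies a two-sided z-test (known common standard deviation, an assumption made for simplicity) to the same tiny difference at two sample sizes; the margin `PRACTICAL_MARGIN` and all numbers are hypothetical.

```python
from statistics import NormalDist


def two_sided_p(diff: float, sigma: float, n_per_group: int) -> float:
    """Two-sided z-test p-value for a difference in means (known common sd)."""
    se = sigma * (2 / n_per_group) ** 0.5
    return 2 * NormalDist().cdf(-abs(diff / se))


# Hypothetical margin: the smallest difference that would matter in practice.
# It has to come from subject-matter knowledge, not from the data.
PRACTICAL_MARGIN = 0.2

# The same tiny difference (1% of a standard deviation) at two sample sizes.
p_huge_n = two_sided_p(diff=0.01, sigma=1.0, n_per_group=1_000_000)
p_small_n = two_sided_p(diff=0.01, sigma=1.0, n_per_group=100)

if __name__ == "__main__":
    print(f"n = 1,000,000 per group: p = {p_huge_n:.2e}")
    print(f"n = 100 per group:       p = {p_small_n:.3f}")
    print(f"practically important? {0.01 >= PRACTICAL_MARGIN}")
```

With a million units per group the difference is overwhelmingly "significant", yet it is far below any margin that would matter; with 100 per group the same difference is nowhere near significant. The p-value tracks sample size as much as it tracks importance.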

@dale_lehman mentions that *"The problem with the correct interpretation is that it doesn't leave us able to say anything about the sample we actually have."* I'll disagree, but I know where he is going with this. Every statistical technique has a particular question it is designed to answer. One of the main issues non-statisticians (and Bayesian statisticians, for that matter) have with frequentist hypothesis tests, particularly when the data has failed to reject the null hypothesis, is that a p-value answers an unnatural question, namely: "Under the null hypothesis, what is the probability that I would observe results as extreme as, or even more extreme than, the results I actually did observe?" Bayesians will say that their methods answer more natural questions, but Bayesian techniques have their own issues; there is no panacea. There is no substitute for understanding which techniques are better equipped to answer which questions, and choosing a technique that is well suited to the question you are trying to answer.

The recent(ish) backlash against the p-value, especially in the medical research arena, strikes me a bit like a backlash against forks when they're used as eye patches. A p-value is a tool, like any other. Use it the right way, for the right job, and you'll be fine... use it the wrong way and "you'll shoot your eye out!".
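
The question a p-value answers can be made concrete by simulation: draw many test statistics from the null distribution and count how many are at least as extreme as the one observed. This Python sketch assumes the null distribution is standard normal (as for a z-test); the observed value, seed, number of draws, and function names are hypothetical.

```python
import random
from statistics import NormalDist


def p_value_by_simulation(z_obs: float, n_draws: int = 100_000, seed: int = 7) -> float:
    """Fraction of null draws at least as extreme (two-sided) as z_obs."""
    rng = random.Random(seed)
    extreme = sum(abs(rng.gauss(0.0, 1.0)) >= abs(z_obs) for _ in range(n_draws))
    return extreme / n_draws


def p_value_exact(z_obs: float) -> float:
    """Same probability from the normal CDF: P(|Z| >= |z_obs|) under H0."""
    return 2 * NormalDist().cdf(-abs(z_obs))


if __name__ == "__main__":
    z_obs = 2.1
    print(p_value_by_simulation(z_obs), p_value_exact(z_obs))
```

Both numbers are probabilities of data given the null hypothesis; neither is the probability that the null hypothesis is true.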