Most statisticians I know don't take the Shapiro-Wilk test too seriously, since it can be ridiculously over-sensitive, especially with large sample sizes: given enough data, it will flag departures from normality that are far too small to matter in practice. A lot of data that fails Shapiro-Wilk is normal enough to work well with parametric methods that assume normality. I always recommend looking at the normal quantile plot and using good judgement.
Additionally, you should check normality on the residuals, not on the raw data. If your data contain any effects, the whole lump of data is probably not going to look normal. The assumption is about the errors, not the raw data values. Imagine your model has two groups with a large group effect. Each group may be normally distributed, but the pooled data set will look bimodal.
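To make the two-group point concrete, here is a rough sketch in plain Python. Everything in it is made up for illustration: the group means (0 and 10), the group size (200), the seed, and the helper name `qq_correlation`, which simply measures how straight the normal quantile plot is (correlation between the sorted data and standard normal quantiles). It is a crude numeric stand-in for eyeballing the plot, not a replacement for it.

```python
import random
from statistics import NormalDist, mean


def qq_correlation(values):
    """Correlation between sorted data and standard normal quantiles.

    This summarizes the straightness of the normal quantile plot:
    values close to 1 mean the points lie near a straight line.
    """
    n = len(values)
    ordered = sorted(values)
    std_normal = NormalDist()
    # Plotting positions (i + 0.5) / n are one common convention (an assumption here).
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    mx, my = mean(theoretical), mean(ordered)
    sxy = sum((x - mx) * (y - my) for x, y in zip(theoretical, ordered))
    sxx = sum((x - mx) ** 2 for x in theoretical)
    syy = sum((y - my) ** 2 for y in ordered)
    return sxy / (sxx * syy) ** 0.5


# Simulated data: two normal groups separated by a large group effect.
rng = random.Random(0)
group_a = [rng.gauss(0.0, 1.0) for _ in range(200)]
group_b = [rng.gauss(10.0, 1.0) for _ in range(200)]

# Raw values pooled together, ignoring the group structure.
raw = group_a + group_b

# Residuals from the simplest possible model: each value minus its group mean.
mean_a, mean_b = mean(group_a), mean(group_b)
residuals = [y - mean_a for y in group_a] + [y - mean_b for y in group_b]

r_raw = qq_correlation(raw)
r_resid = qq_correlation(residuals)
print(f"Q-Q correlation, raw pooled data: {r_raw:.3f}")
print(f"Q-Q correlation, residuals:       {r_resid:.3f}")
```

The pooled raw values should give a noticeably lower correlation than the residuals, because the quantile plot of the bimodal lump bends into an S shape while the residuals fall close to a line. In a real analysis the residuals would come from whatever model you actually fit, not from hand-subtracted group means.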
-- Cameron Willden