Hi Alex,
Like Smoore2, I think your approaches are all reasonable, and with small samples it's nice to try several techniques and hope for a consensus. In this case it doesn't appear you'll have one, so I wanted to add a few thoughts about the choice between parametric and nonparametric tests, as well as your concern about the homogeneity of variances among your groups.
First, regarding the assumption of normality, I want to address your comment "Not to mention that making any assumptions about the normality of the data is meaningless when n = 3." Keep in mind that the assumption of normality is not about your sample being normally distributed, but rather about your sample being drawn from a population that is normally distributed. This is a subtle point but an important one. My sample could have the values {5,5,5,5,5,5,6,6,6,6,6,6}, and the assumption of normality could be perfectly upheld IFF the population from which those values were drawn is normally distributed (this sample would be highly improbable, but that wouldn't make the assumption false, and wouldn't affect the meaningfulness of the parametric test results). If the population is normal, the sampling distribution of the sample mean is necessarily normal, which means our parametric tests will return p-values that are meaningful (other assumptions also being true).

In cases where the population is not normal, we can trust that the sampling distribution of the sample mean will be normally distributed when sample sizes are "large" (thanks to the ever-beautiful central limit theorem, though what counts as "large" for a given situation is somewhat debatable). Your sample sizes are nowhere near large enough to fall back on the CLT, but I bring this up because I'd like to ask: given your knowledge of whatever you are measuring, do you have any reason to believe the population is normal? For instance, if you were collecting data on IQ and you had only three IQ scores for each group, you could trust your parametric tests without concern, since IQ is known to be normally distributed (indeed, the tests are structured and scored to ensure that property). If you can draw on extra-statistical information to justify the assumption that the population from which your sample is drawn is normally distributed, you don't need to justify that assumption in a data-driven way.
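To make that point concrete, here's a quick simulation sketch (in Python with scipy rather than JSL, purely for illustration): when both groups really are drawn from the same normal population, the two-sample t-test holds its nominal 5% false alarm rate even at n = 3.

```python
import numpy as np
from scipy import stats

# Simulation sketch: with samples of n = 3 drawn from a NORMAL population,
# the two-sample t-test keeps its nominal 5% false alarm rate even though
# each sample is tiny. (Illustrative only; population parameters are made up.)
rng = np.random.default_rng(1)

n, reps, alpha = 3, 20_000, 0.05
false_alarms = 0
for _ in range(reps):
    # Both groups come from the SAME normal population, so the null is true.
    a = rng.normal(loc=100, scale=15, size=n)
    b = rng.normal(loc=100, scale=15, size=n)
    _, p = stats.ttest_ind(a, b)
    false_alarms += (p < alpha)

print(f"Empirical false alarm rate: {false_alarms / reps:.3f}")  # close to 0.05
```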
Second, regarding your observed unequal variances, how unequal are we talking about here? T-tests are not entirely robust to violations of homogeneity of variance, but false alarm rates do not increase considerably until the variances in your groups are quite unequal (and your equal sample sizes help with the robustness of the t-test to unequal variances). If you would like to go the route of generating all pairwise t-tests NOT assuming equal variances, there is certainly a way we can make this happen through scripting. Alternatively, you can use the Local Data Filter (available under the red triangle in Fit Y by X >> Script >> Local Data Filter), select two levels, produce the t-test (not assuming equal variances), and click through the different comparisons. Not entirely elegant, but it would eliminate the need to relaunch the platform after global filtering, or to iteratively subset your data.
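In case it's useful to see what those unpooled comparisons amount to outside JMP, here's a minimal sketch in Python with scipy; the group names and values are made up for illustration. Passing equal_var=False gives the Welch version of the t-test, i.e., the "not assuming equal variances" option.

```python
from itertools import combinations
from scipy import stats

# Hypothetical data: three groups of n = 3 (made-up values for illustration).
groups = {
    "A": [4.1, 5.0, 4.6],
    "B": [6.2, 7.9, 6.5],
    "C": [5.1, 5.3, 9.8],
}

# All pairwise Welch t-tests: equal_var=False drops the pooled-variance
# assumption, which is what "t-test not assuming equal variances" means.
for (name1, x1), (name2, x2) in combinations(groups.items(), 2):
    t, p = stats.ttest_ind(x1, x2, equal_var=False)
    print(f"{name1} vs {name2}: t = {t:.2f}, p = {p:.3f}")
```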
Finally, it's worth clarifying (for yourself, not for us) the root of your concerns here. Is your chief concern with these tests (in the context of small samples) that you might false alarm, or that you might miss a real effect? If it's the former, nonparametric tests are probably best, and alpha-controlled nonparametric pairwise tests at that (like the Steel-Dwass); if it's the latter (and you have some reason to think your assumptions are generally reasonable), then parametric tests will probably serve you better (and you might want to relax your criterion for statistical significance if this is exploratory work).
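Steel-Dwass itself isn't available in scipy, so I can't sketch that exact procedure, but as a rough, more conservative stand-in (not the same test), all pairwise Mann-Whitney comparisons with a Bonferroni correction illustrate the trade-off; the data here are again made up.

```python
from itertools import combinations
from scipy import stats

# Hypothetical data (made-up values). Pairwise Mann-Whitney (Wilcoxon
# rank-sum) tests with a Bonferroni correction control the family-wise
# false alarm rate, as a crude stand-in for Steel-Dwass.
groups = {
    "A": [4.1, 5.0, 4.6],
    "B": [6.2, 7.9, 6.5],
    "C": [5.1, 5.3, 9.8],
}

pairs = list(combinations(groups.items(), 2))
for (name1, x1), (name2, x2) in pairs:
    u, p = stats.mannwhitneyu(x1, x2, alternative="two-sided")
    p_adj = min(1.0, p * len(pairs))  # Bonferroni adjustment
    print(f"{name1} vs {name2}: U = {u:.1f}, adjusted p = {p_adj:.3f}")

# Caveat: with n = 3 per group, the smallest two-sided exact p-value a
# Mann-Whitney test can produce is 0.10, so no Bonferroni-corrected
# comparison can reach 0.05. That is a concrete example of how small-n
# nonparametric tests trade fewer false alarms for less power.
```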
I hope some of this helps!
Julian