Hi P_Bartell, thanks very much for your reply. I'll try to clarify my objective. Each record represents a mapped well sample showing the results (concentrations) of many different PFAS compounds (each compound is a different field). Three sources of contamination are possible in my study area: (1) an air deposition source, which presumably is associated predominantly with the compound "PMPA"; (2) a process waste source, which presumably is associated with "PFMOAA"; and (3) a mixture of both air deposition and process waste. Of course, the world is not so neat and tidy, so both of those compounds (PMPA and PFMOAA), plus other compounds, will occur in nearly all of the wells, regardless of their proximity to either source. It's the relative concentrations of the compounds that will help identify whether a well is impacted by one source, the other, or both. It may also turn out that other compounds are better surrogates or indicators; that is still to be determined.
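To make "relative concentrations" concrete, one simple per-well score is the log of the PMPA:PFMOAA ratio. This is only an illustrative sketch: the well IDs and values are made up (not from my spreadsheet), and it assumes both compounds are detected in every well, since a non-detect (zero) would make the ratio undefined.

```python
import math

# Hypothetical well records: concentrations of two indicator compounds.
# Values are illustrative only, not taken from the attached spreadsheet.
wells = {
    "W1": {"PMPA": 120.0, "PFMOAA": 15.0},
    "W2": {"PMPA": 10.0, "PFMOAA": 200.0},
    "W3": {"PMPA": 50.0, "PFMOAA": 45.0},
}

def log_ratio(sample, num="PMPA", den="PFMOAA"):
    """Natural log of num/den; > 0 means relatively more `num` than `den`."""
    return math.log(sample[num] / sample[den])

for well_id, sample in wells.items():
    print(well_id, round(log_ratio(sample), 2))
```

Larger values mean relatively more PMPA; whether any cutoff on a score like this actually separates the sources is exactly the open question.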
My objective is to determine which areas (i.e., wells) are contaminated by which source. (The spreadsheet I attached is just a sample of the data, but it gives you an idea of my data format.) Importantly, wells farther from a source will naturally have lower concentrations of the compounds from that source, but the relative ratios among the compounds associated with that source should still hold. So a well's location is part of the story (i.e., where a well is can affect its compound makeup).
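The dilution argument can be illustrated by "closing" each well to proportions, i.e., dividing every compound by the well's total. This is a minimal sketch that assumes distance dilutes every compound from a source by the same factor, which real transport may not do; OTHER is a hypothetical placeholder compound and all values are made up.

```python
# Hypothetical well near a source, and the same water diluted 10x at a farther well.
# OTHER is a placeholder for any additional compound; values are illustrative only.
near = {"PMPA": 120.0, "PFMOAA": 15.0, "OTHER": 5.0}
far = {compound: conc / 10.0 for compound, conc in near.items()}

def closure(sample):
    """Rescale a well's concentrations so they sum to 1 (its relative composition)."""
    total = sum(sample.values())
    return {compound: conc / total for compound, conc in sample.items()}

# Absolute concentrations differ tenfold, but the relative compositions match.
for compound in near:
    print(compound, round(closure(near)[compound], 3), round(closure(far)[compound], 3))
```

Under that uniform-dilution assumption, the proportions carry the source "fingerprint" while the totals mostly reflect distance.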
So, given the above, if you still think that discriminant analysis (DA), for example, is a good way to go, I have a few questions.
(1) Aren't my samples (well results) supposed to be independent? And based on my description above, would they be? It seems that they are not truly independent, since wells close to one source would be more similar to each other than to wells located close to a different source. Thoughts?
(2) The concentrations of each of my PFAS compounds (independent variables) are supposed to be normally distributed. Given that these are environmental contaminant data, this assumption is often not met, sometimes even after log-transforming. Is this a problem?
(3) It seems that my predictor variables (PFAS compounds) would naturally be collinear. In other words, a given source would be associated with a few compounds that rise and fall together, depending on proximity to that source. Is this a problem?
Thanks in advance!
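Questions (2) and (3) can at least be looked at directly in the data. As a rough sketch with made-up values (not my real wells), log-transforming each compound and then computing pairwise Pearson correlations shows which compounds rise and fall together; OTHER is a hypothetical placeholder compound.

```python
import math
import statistics

# Hypothetical concentrations for four wells (same well order in every list).
# OTHER is a placeholder compound; values are illustrative only.
data = {
    "PMPA": [120.0, 60.0, 30.0, 10.0],
    "PFMOAA": [15.0, 8.0, 4.0, 200.0],
    "OTHER": [5.0, 2.6, 1.2, 40.0],
}

# Log-transform first, since environmental concentrations are usually right-skewed.
logged = {name: [math.log(v) for v in values] for name, values in data.items()}

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Pairs with |r| close to 1 are candidates for the collinearity described in question (3).
names = list(logged)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"{a} vs {b}: r = {pearson(logged[a], logged[b]):.2f}")
```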