learning_JSL
Level IV

Should I use principal component analysis or k-means cluster analysis?

Hi - I am trying to decide on the best method (e.g. principal component analysis, k-means clustering, etc.) to use for the following situation. I have a mapped dataset with 12,928 records, each corresponding to a well with sample results. Each row of data (each location on my map) has a well name, latitude, longitude, and results for compound A, compound B, compound C, compound D, compound E, etc. (8 chemical compounds in all). Each well has been contaminated by one of three sources: 1) air deposition, 2) process waste, or 3) a combination of the two (mixed). Each source is associated with a unique source signature (e.g. the air deposition source tends to have high compound X and low compound Y, while process waste tends to have high compounds Y and Z and low compound X). My goal is to identify which of the three sources is most likely for each record (i.e. well location) in my dataset.

 

Importantly, a subset of ~800 records in my dataset is known to be associated with the air deposition source signature. As such, this subset can be used as a training set for the air deposition signature. I can also come up with a subset of ~50 records that are representative of the process waste source signature.
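
To make the setup concrete, here is a minimal Python sketch (not JMP, and not a method anyone in this thread has endorsed) of how the two labeled subsets could be used to score the remaining wells. It assumes a simple nearest-centroid rule on concentration fractions. The toy numbers, the three-compound layout, the function names, and the `margin` threshold for calling a well "mixed" are all hypothetical.

```python
import math

# Hypothetical toy data: each well is a list of compound concentrations.
# The real dataset has 8 compounds; 3 are used here to keep the sketch short.
air_training = [[9.0, 1.0, 1.2], [8.5, 1.5, 0.9], [9.5, 0.8, 1.1]]
waste_training = [[1.0, 8.0, 7.5], [1.5, 9.0, 8.2]]

def to_profile(concs):
    """Convert raw concentrations to fractions of the total (a crude signature)."""
    total = sum(concs)
    return [c / total for c in concs]

def centroid(rows):
    """Average profile of a labeled training subset."""
    profiles = [to_profile(r) for r in rows]
    n = len(profiles)
    return [sum(p[i] for p in profiles) / n for i in range(len(profiles[0]))]

def classify(concs, air_c, waste_c, margin=0.25):
    """Label a well by its relative distance to the two source centroids.

    margin is an assumed tuning value: wells whose relative position falls
    within 0.5 +/- margin are called "mixed".
    """
    p = to_profile(concs)
    d_air = math.dist(p, air_c)
    d_waste = math.dist(p, waste_c)
    if d_air + d_waste == 0:
        return "mixed"  # degenerate case: both centroids coincide with the well
    share = d_air / (d_air + d_waste)  # 0 = at air centroid, 1 = at waste centroid
    if share < 0.5 - margin:
        return "air deposition"
    if share > 0.5 + margin:
        return "process waste"
    return "mixed"

air_c = centroid(air_training)
waste_c = centroid(waste_training)
print(classify([9.0, 1.0, 1.0], air_c, waste_c))
print(classify([5.0, 4.0, 4.0], air_c, waste_c))
```

With only ~50 process waste records against ~800 air deposition records, any such rule would need checking against held-out labeled wells before being trusted.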

 

Any suggested approaches in JMP or JMP Pro would be greatly appreciated.  Thanks in advance! 

 

10 REPLIES
P_Bartell
Level VIII

Re: Should I use principal component analysis or k-means cluster analysis?

An additional thought, and maybe a wacky idea: if this is a real-world issue you are working on and not some made-up academic exercise, then once you've got some clusters identified, along with the geographic location of each well, use the Graph Builder mapping tools to produce density maps of the wells. Then, with a little snooping, I bet you might be able to take a reasonable guess at the source pathway for the compounds. As luck would have it (and mind you, I'm nowhere near qualified, trained, or educated as an environmental geologic engineer), I just finished reading a book about the Love Canal (Niagara Falls, NY) affair. I would wager that any wells there, or any plumes nearby, would have similar contamination profiles, and the source is pretty obvious: process waste that was buried and left to percolate for years. Ah, to be a JMP Systems Engineer again... I always liked working on these sorts of 'messy' problems with my customers.
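
Outside of Graph Builder, the density-map idea can be sketched in a few lines of Python by binning well coordinates into a grid and counting wells per cluster in each cell. This only illustrates the idea and is not what Graph Builder does internally. The coordinates, cluster labels, function names, and the 0.01-degree cell size are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical records: (latitude, longitude, assigned cluster label).
wells = [
    (43.081, -79.001, "process waste"),
    (43.083, -79.004, "process waste"),
    (43.121, -78.952, "air deposition"),
    (43.084, -79.002, "process waste"),
    (43.127, -78.958, "air deposition"),
]

def grid_cell(lat, lon, cell_deg=0.01):
    """Map a coordinate to the integer index of its grid cell."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

def density_by_cluster(records, cell_deg=0.01):
    """Count wells per (cluster label, grid cell) pair."""
    counts = Counter()
    for lat, lon, label in records:
        counts[(label, grid_cell(lat, lon, cell_deg))] += 1
    return counts

counts = density_by_cluster(wells)
# List the densest (cluster, cell) pairs first.
for (label, cell), n in counts.most_common():
    print(label, cell, n)
```

Cells where one cluster piles up are the places to start snooping for a nearby source.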