Solved: should i use principal component analysis or k-means cluster analysis? - Page 2

Jun 17, 2023 03:39 PM

Hi - I am trying to decide on the best method of analysis (e.g., principal component analysis, k-means clustering, etc.) for the following situation. I have a mapped dataset with 12,928 records, each corresponding to a well with sample results. Each row of data (each location on my map) has a well name, latitude, longitude, and results for compound A, compound B, compound C, compound D, compound E, etc. (8 chemical compounds in all). These wells have been contaminated by one of three sources: 1) air deposition, 2) process waste, or 3) a combination of the two (mixed). Each source is associated with a unique source signature (e.g., the air deposition source tends to have high compound X and low compound Y, while process waste tends to have high compounds Y and Z and low compound X). So, each row (i.e., well location) of my dataset is associated with one of the three sources of contamination. My goal is to identify which source is most likely for each record in my dataset.

Importantly, a subset of ~800 records in my dataset is known to be associated with the air deposition source signature. As such, this subset can be used as a training set for the air deposition signature. I can also come up with a subset of ~50 records that are representative of the process waste source signature.

Any suggested approaches in JMP or JMP Pro would be greatly appreciated. Thanks in advance!
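
For readers who want to see how the two methods named in the title differ on data shaped like this, here is a minimal sketch in Python with NumPy (not JMP or JSL). Everything in it is an assumption made for illustration: the signature values, the noise level, the well counts, and the idea of placing the "mixed" signature halfway between the other two. PCA is used only to get a low-dimensional view of the standardized compounds, while k-means does the grouping. Seeding the k-means centroids from the known air deposition and process waste subsets is one possible way to use the labeled records, not a recommendation from this thread.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented signatures: mean log-concentrations of 8 compounds per source.
air_sig = np.array([3.0, 0.5, 0.5, 2.0, 1.0, 2.5, 1.0, 2.0])
waste_sig = np.array([0.5, 3.0, 3.0, 1.0, 2.5, 1.0, 2.0, 0.5])
mixed_sig = (air_sig + waste_sig) / 2  # assumption: "mixed" sits between the two


def simulate(sig, n):
    """Draw n synthetic wells scattered around one signature."""
    return sig + rng.normal(scale=0.2, size=(n, sig.size))


# 200 air wells, 100 process-waste wells, 100 mixed wells (all synthetic).
X = np.vstack([simulate(air_sig, 200), simulate(waste_sig, 100),
               simulate(mixed_sig, 100)])
known_air = np.arange(0, 80)        # rows treated as the known air subset
known_waste = np.arange(200, 220)   # rows treated as the known waste subset

# Standardize each compound so no single one dominates the distances.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via SVD of the standardized data: a view of the data, not a classifier.
_, S, Vt = np.linalg.svd(Z, full_matrices=False)
scores = Z @ Vt[:2].T                 # coordinates on the first two components
explained = S**2 / np.sum(S**2)       # fraction of variance per component

# K-means (Lloyd's algorithm) with k = 3, seeded from the known subsets.
air_seed = Z[known_air].mean(axis=0)
waste_seed = Z[known_waste].mean(axis=0)
centroids = np.vstack([air_seed, waste_seed, (air_seed + waste_seed) / 2])
names = ["air deposition", "process waste", "mixed"]

for _ in range(100):
    dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    updated = centroids.copy()
    for k in range(3):
        if np.any(labels == k):       # leave an empty cluster's centroid alone
            updated[k] = Z[labels == k].mean(axis=0)
    if np.allclose(updated, centroids):
        break
    centroids = updated

counts = np.bincount(labels, minlength=3)
for k, name in enumerate(names):
    print(f"{name}: {counts[k]} wells")
print("variance explained by PC1, PC2:", explained[:2].round(2))
```

Because the centroids start from the labeled subsets, cluster 0 begins as the air deposition group and cluster 1 as the process waste group, which makes the output easier to read. On real data it would still be worth checking how the known records fall across the final clusters before trusting the cluster names.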

P_Bartell · Jun 20, 2023 02:46 PM

An additional thought, and maybe a wacky idea: if this is a real-world issue you are working on and not some made-up academic exercise, then once you've identified some clusters and have the geographic location of each well, use the Graph Builder mapping tools to produce density maps of the wells. Then, with a little snooping, I bet you could make a reasonable guess as to the source pathway for the compounds. As luck would have it (and mind you, I'm nowhere near qualified, trained, or educated as an environmental or geologic engineer), I just finished reading a book about the Love Canal (Niagara Falls, NY) affair. I would wager that any wells there, or any plumes nearby, would have similar contamination profiles, and the source is pretty obvious: process waste that was buried and left to percolate for years. Ah, to be a JMP Systems Engineer again... I always liked working on these sorts of 'messy' problems with my customers.
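
The density-map idea above can also be roughed out outside Graph Builder. The sketch below, in Python with NumPy, bins wells onto a shared latitude/longitude grid per cluster and reports the densest cell for each. The coordinates, cluster centers, and grid size are all hypothetical and do not correspond to any real site; the cluster labels are assumed to come from an earlier clustering step.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic well coordinates for two hypothetical clusters (degrees).
centers = {0: (40.00, -75.00), 1: (40.12, -74.85)}
lat = np.concatenate([rng.normal(c[0], 0.02, 150) for c in centers.values()])
lon = np.concatenate([rng.normal(c[1], 0.02, 150) for c in centers.values()])
cluster = np.repeat(list(centers.keys()), 150)  # cluster label for each well

# One shared 10 x 10 grid so the per-cluster maps are directly comparable.
lat_edges = np.linspace(lat.min(), lat.max(), 11)
lon_edges = np.linspace(lon.min(), lon.max(), 11)

hotspots = {}
totals = {}
for k in centers:
    mask = cluster == k
    # H[i, j] counts the wells of cluster k in latitude bin i, longitude bin j.
    H, _, _ = np.histogram2d(lat[mask], lon[mask], bins=[lat_edges, lon_edges])
    i, j = np.unravel_index(H.argmax(), H.shape)
    hotspots[k] = ((lat_edges[i] + lat_edges[i + 1]) / 2,
                   (lon_edges[j] + lon_edges[j + 1]) / 2)
    totals[k] = int(H.sum())
    print(f"cluster {k}: {totals[k]} wells, densest cell near "
          f"lat {hotspots[k][0]:.3f}, lon {hotspots[k][1]:.3f}")
```

A cluster whose densest cells sit next to a known facility or disposal area would support a process waste pathway, while a broad, diffuse pattern might fit air deposition better. That interpretation is a judgment call for someone who knows the site.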
