Simulating clusters using K Means - Negative Values
Hi,
When I simulate clusters from the K Means platform I get some negative simulated values for one of my variables which, in practical terms, can only be positive.
Looking at the original distribution of this variable, it is non-normal and bounded at zero (so something like a log-normal distribution fits it well).
Is there a way to ensure the data generated from the cluster simulation remains positive?
Many thanks,
Alicia
Re: Simulating clusters using K Means - Negative Values
Hi @Alicia_500,
Welcome to the Community!
Clustering can be done with different algorithms, depending on your objectives, your data types, and the criterion on which you base the clustering: distributions, point density, hierarchical structures/links between points, ...
You can look at the algorithms available for your data types here: Overview of Platforms for Clustering Observations
If you need more information about how to use the different algorithms, you can watch this video: Clustering | JMP
There is also a very nice blog post by @Chelsea-Parlett explaining the differences between clustering methods: Clustering methods for unsupervised machine learning (jmp.com)
As for your use case, given the limited information provided and without data to test some approaches, I think K-Means may not be the most suitable clustering technique, since you are dealing with distributions that have different "spreads". K-Means tends to create spherical clusters, as it does not account for differences between the distributions.
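To illustrate why a symmetric, normal-based simulation can produce negative values for a skewed variable bounded at zero, here is a minimal Python sketch using only the standard library. It is not how JMP simulates clusters internally; it simply assumes that values are drawn from a normal distribution around a cluster's mean, and compares that with drawing on the log scale and back-transforming. The data and the names `observed`, `simulate_normal`, and `simulate_via_log` are hypothetical.

```python
import math
import random
import statistics

def simulate_normal(values, n, rng):
    """Draw n values from a normal fitted to the raw data (can go below zero)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [rng.gauss(mean, sd) for _ in range(n)]

def simulate_via_log(values, n, rng):
    """Fit a normal on the log scale, draw there, then exponentiate.

    Requires strictly positive input; the back-transformed draws are
    positive by construction.
    """
    logs = [math.log(v) for v in values]
    mu = statistics.mean(logs)
    sigma = statistics.stdev(logs)
    return [math.exp(rng.gauss(mu, sigma)) for _ in range(n)]

rng = random.Random(0)

# Hypothetical right-skewed, strictly positive observations from one cluster.
observed = [rng.lognormvariate(0.0, 1.0) for _ in range(200)]

normal_draws = simulate_normal(observed, 1000, rng)
log_scale_draws = simulate_via_log(observed, 1000, rng)

n_negative_normal = sum(v < 0 for v in normal_draws)
n_negative_log_scale = sum(v < 0 for v in log_scale_draws)

print("negative values, normal on raw scale:", n_negative_normal)
print("negative values, normal on log scale:", n_negative_log_scale)
```

Under these assumptions, one possible workaround is to cluster and simulate on a log-transformed copy of the variable and exponentiate the simulated values afterwards. Whether that suits your data is something only you can check.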
You could try Normal Mixtures, which is influenced by differences in the distributions and variances of your features, or Hierarchical Cluster, which does not assume any distribution for clustering. You could then compare the clustering outcomes to see which one(s) make the most sense, and check the agreement between the methods.
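One way to quantify the agreement between two clustering results is the adjusted Rand index, which is 1 for identical partitions (regardless of how the clusters are numbered) and close to 0 for agreement no better than chance. The sketch below is a plain Python implementation, not a JMP feature; the label vectors `kmeans_labels` and `mixture_labels` are made-up examples standing in for cluster assignments exported from two platforms.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two cluster assignments of the same rows."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both label lists must have the same length")
    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two rows: nothing to disagree on

    # Pairs of rows placed together by both methods, by method A, by method B.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs  # agreement expected by chance
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both methods return a single cluster
    return (together_both - expected) / (maximum - expected)

# Hypothetical cluster assignments for the same eight rows.
kmeans_labels = [1, 1, 1, 2, 2, 2, 3, 3]
mixture_labels = [2, 2, 2, 1, 1, 3, 3, 3]

ari = adjusted_rand_index(kmeans_labels, mixture_labels)
print("adjusted Rand index:", round(ari, 3))
```

A value near 1 suggests the two methods recover essentially the same groups; a low value means it is worth looking at which rows are assigned differently.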
I hope I have understood your situation correctly,
"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)