
Simulating clusters using K Means - Negative Values

Alicia_500
Level I

Hi,

When I simulate clusters from the K Means platform I get some negative simulated values for one of my variables which, in practical terms, can only be positive.

 

Looking at the original distribution of this variable, it is non-normal and bounded at zero (so something like a log-normal distribution fits it well).

 

Is there a way to ensure the data generated from the cluster simulation remains positive?

 

Many thanks,

 

Alicia

Victor_G
Super User

Re: Simulating clusters using K Means - Negative Values

Hi @Alicia_500,

Welcome to the Community!

 

Clustering can be done with different algorithms, depending on your objectives, your data types, and the criterion used to form the clusters: distributions, point density, hierarchical structure or links between points, and so on.

You can see which algorithms are available for your data types here: Overview of Platforms for Clustering Observations

If you need more information about how to use the different algorithms, you can watch this video: Clustering | JMP

There is also a very nice blog post by @Chelsea-Parlett explaining the differences between clustering methods: Clustering methods for unsupervised machine learning (jmp.com)

 

Concerning your use case, given the limited information provided and without data to test some approaches, I think K-Means may not be the most suitable clustering technique, as you're facing different distributions with different "spread". K-Means tends to create spherical clusters, as it doesn't account for differences between the distributions.

You could try Normal Mixtures, which is influenced by differences in the distributions and variances of your features, or Hierarchical Cluster, which doesn't assume any distribution for clustering. You could then compare the clustering outcomes to see which one(s) make more sense, and check the agreement between methods.
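To put a number on the "agreement between methods" mentioned above, one common choice is the adjusted Rand index, which compares two cluster assignments of the same rows regardless of how the clusters are numbered. The snippet below is only a minimal sketch in plain Python, not a JMP feature: the function name `adjusted_rand_index` and the example label lists are hypothetical, and it assumes you have exported the cluster columns saved by each method.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two cluster assignments of the same rows.

    1.0 means the two partitions are identical (up to renaming clusters);
    values near 0 mean the agreement is about what chance would give.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("Both label lists must describe the same rows.")

    n_pairs = comb(len(labels_a), 2)
    if n_pairs == 0:
        # Fewer than two rows: no pairs to compare, treat as full agreement.
        return 1.0

    # Count pairs of rows placed together by both methods, by A only, by B only.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / n_pairs  # agreement expected by chance
    maximum = (together_a + together_b) / 2       # best possible agreement
    if maximum == expected:
        # Degenerate case, e.g. both methods put every row in one cluster.
        return 1.0
    return (together_both - expected) / (maximum - expected)


if __name__ == "__main__":
    # Hypothetical cluster columns exported from two different platforms.
    kmeans_labels = [1, 1, 1, 2, 2, 2, 3, 3]
    mixture_labels = [2, 2, 2, 1, 1, 3, 3, 3]
    print(adjusted_rand_index(kmeans_labels, mixture_labels))
```

A value close to 1 suggests the two methods found essentially the same groups; a low value suggests that the cluster structure depends strongly on the method, which is worth investigating before simulating from any one of them.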

 

I hope I understood your situation correctly,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)