Simulating clusters using K Means - Negative Values

Hi,

When I simulate clusters from the K Means platform I get some negative simulated values for one of my variables which, in practical terms, can only be positive.

Looking at the original distribution of this variable, it is non-normal and bounded at zero (so something like a log-normal distribution fits it well).

Is there a way to ensure the data generated from the cluster simulation remains positive?

Many thanks,

Alicia

Victor_G · Mar 14, 2025 3:14 AM

Hi @Alicia_500,

Welcome to the Community!

Clustering can be done with different algorithms, depending on your objectives, your data types, and the criterion used to form the clusters: distributions, point density, hierarchical structure or links between points, and so on.

You can look at the algorithms available for your data types here: Overview of Platforms for Clustering Observations

If you need more information about how to use the different algorithms, you can watch this video: Clustering | JMP

There is also a very nice blog post by @Chelsea-Parlett explaining the differences between clustering methods: Clustering methods for unsupervised machine learning (jmp.com)

Concerning your use case, given the limited information provided and without data to test some approaches, I think K-Means may not be the most suitable clustering technique, as you're facing different distributions with different "spread". K-Means tends to create spherical clusters of similar size, because it assigns points by distance to the cluster centers and does not model differences in variance between clusters or variables.

You could try Normal Mixtures, which takes differences in the distributions and variances of your features into account, or Hierarchical Cluster, which doesn't assume any distribution when clustering. You could then compare the clustering outcomes to see which ones make more sense, and check how well the methods agree.
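On the question of keeping simulated values positive: one common workaround, since the variable looks roughly log-normal, is to simulate on the log scale and back-transform. The sketch below is plain Python, not JMP's simulation routine, and only illustrates the idea. The function name `simulate_positive`, the data, and the cluster labels are all hypothetical, and it assumes each cluster is approximately normal after taking logs.

```python
import math
import random
import statistics


def simulate_positive(values, labels, n_per_cluster, seed=0):
    """Simulate positive values per cluster via a log-normal fit.

    values: observed positive measurements (hypothetical data)
    labels: cluster label for each value, from any clustering method
    """
    rng = random.Random(seed)
    simulated = {}
    for cluster in sorted(set(labels)):
        # Work on the log scale, where the data are assumed roughly normal.
        logs = [math.log(v) for v, lab in zip(values, labels) if lab == cluster]
        mu = statistics.mean(logs)
        sigma = statistics.stdev(logs) if len(logs) > 1 else 0.0
        # Draw normals on the log scale, then back-transform with exp(),
        # which cannot return a negative number.
        simulated[cluster] = [
            math.exp(rng.gauss(mu, sigma)) for _ in range(n_per_cluster)
        ]
    return simulated


# Hypothetical example: two clusters of a positive, right-skewed variable.
values = [0.2, 0.5, 0.3, 0.9, 4.0, 6.5, 5.2, 9.1]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
sim = simulate_positive(values, labels, n_per_cluster=1000)
```

The same idea applies inside JMP: cluster and simulate a log-transformed copy of the column, then exponentiate the simulated values. Whether that fits your data well is something to check against the original distribution.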

I hope I understood your situation correctly,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)
