Hi @AvgRegression52,
Your idea seems counter-intuitive for an unsupervised algorithm like K-Means, since you apparently already have a priori information about the different clusters/groups. Out of curiosity, why would you want to change the locations of the centroids found by the K-Means algorithm?
There may be a workaround to get the results you expect:
- In your data table, create two new rows ("Centroid1" and "Centroid2") with the X and Y values you would like these two centroids to have (in your example, [5,5] and [100,100]).
- Then create a numeric column "Weight" in which all the original observations get a low value (for example, 1 or less) and the two newly added "centroid" rows get very large values (for example, 1000000 or more).
- Then, when launching the K-Means analysis platform, use X and Y as "Y, Columns" and assign the new "Weight" column to the "Freq" or "Weight" role to bias the centroid locations. Run K-Means with 3 clusters (or any number you prefer).
This way, the two newly added rows carry "artificially" far more frequency (or importance) than the other rows of your table, and two of the cluster centres will be heavily biased toward the coordinates of these two centroid rows (see capture "Biased_K-Means").
Use this workaround with extra caution, as the location of the remaining cluster centre could also be affected; it depends on the distribution of points across the dimensions. In your example, the three clusters are well defined and far apart, so adding a lot of frequency/weight to two new "centroid" rows (which sit at the extremes of the X and Y distributions) won't affect the location of the third centroid ("in the middle").
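If you want to experiment with the same idea outside JMP, here is a minimal sketch in Python using scikit-learn, whose `KMeans` accepts a `sample_weight` argument playing the same role as the "Freq"/"Weight" column above. The data below is made up for illustration (three synthetic clusters roughly matching your example); only the weighting trick itself comes from the workaround described in this post.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Three well-separated illustrative clusters (hypothetical data).
cluster_a = rng.normal(loc=[5, 5], scale=1.0, size=(50, 2))
cluster_b = rng.normal(loc=[50, 50], scale=1.0, size=(50, 2))
cluster_c = rng.normal(loc=[100, 100], scale=1.0, size=(50, 2))
X = np.vstack([cluster_a, cluster_b, cluster_c])

# Append the two "pseudo-centroid" rows at the desired coordinates,
# mirroring the "Centroid1" / "Centroid2" rows of the data table.
anchors = np.array([[5.0, 5.0], [100.0, 100.0]])
X_aug = np.vstack([X, anchors])

# "Weight" column: ordinary rows get 1, the anchor rows a huge value.
weights = np.ones(len(X_aug))
weights[-2:] = 1_000_000

km = KMeans(n_clusters=3, n_init=10, random_state=0)
km.fit(X_aug, sample_weight=weights)

# Two of the fitted centres should land almost exactly on the anchors.
print(np.round(km.cluster_centers_, 2))
```

Because the weighted mean of a cluster containing an anchor row is dominated by that row's huge weight, two of the reported centres end up (almost) at [5,5] and [100,100], while the third centre still summarises the middle cluster.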
The sample data table is also attached, along with the "biased" K-Means script, if you want to take a look.
I hope this workaround will help you,
Victor GUILLER
L'Oréal Data & Analytics
"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)