When I cluster the exact same dataset in JMP and in R using Ward's method, I get results that are not identical. R provides two options (ward.D and ward.D2); the latter squares the values in the distance matrix before clustering them. According to Murtagh & Legendre (2014), that's what JMP does too. But even when I specify method="ward.D2" in R, the output is non-trivially different: roughly 15% of the observations are mismatched relative to what I get from JMP.

So, Question 1: Am I wrong to expect that the same method should produce the same results in HCA? I wouldn't expect that in k-means clustering (without setting a seed, at least), but for HCA the results are totally stable *within* a given package, so I assumed they should also be the same *between* packages.

Question 2: If they *should* be the same, why aren't they?

Possibility A: there's a difference in how the distance matrix is calculated
Possibility B: there's a difference in how Ward's method is implemented
Possibility C: the cutree() function in R does something different from the "Select number of clusters" function in JMP
Possibility X: ???

I've uploaded the dendrogram from R (apologies if that's poor netiquette!); you can see the problem by taking a close look at the labels along the x-axis. Each label ends in a digit from 1-5, which is the number of the cluster that observation was assigned to in JMP. As you can see, the green group ("3") was the only one with a perfect 1:1 match between JMP and R.

Grateful for any guidance,
-Matt
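
For anyone who wants to probe Possibilities A through C on the R side, here is a minimal sketch, not the analysis from the post. It uses the built-in USArrests data as a stand-in for the real dataset, and the choice of k = 5 simply mirrors the five clusters mentioned above. Whether JMP standardizes columns before clustering is an assumption to check in its options; the sketch only shows how much that one choice can matter in R.

```r
# Illustrative data only: USArrests ships with base R and stands in for the real dataset.
dat <- USArrests
k <- 5

# Possibility A: distances on raw columns vs. standardized columns.
d_raw    <- dist(dat)          # Euclidean distance on the original scale
d_scaled <- dist(scale(dat))   # Euclidean distance after centering and scaling each column

h_raw    <- hclust(d_raw,    method = "ward.D2")
h_scaled <- hclust(d_scaled, method = "ward.D2")

# Possibility B: "ward.D" applied to squared distances should reproduce "ward.D2",
# with merge heights related by a square root.
h_sq <- hclust(d_scaled^2, method = "ward.D")
same_heights <- isTRUE(all.equal(sqrt(h_sq$height), h_scaled$height))
same_groups  <- identical(cutree(h_sq, k = k), cutree(h_scaled, k = k))

# Possibility C: cluster numbers are arbitrary labels, so compare partitions
# with a cross-tabulation instead of matching the numbers directly.
cl_raw    <- cutree(h_raw,    k = k)
cl_scaled <- cutree(h_scaled, k = k)
xtab <- table(raw = cl_raw, scaled = cl_scaled)
print(xtab)
```

Swapping `cl_raw` for a vector of the JMP assignments (in the same row order as the data) would show whether the 15% mismatch is a pure relabeling, in which case each row of the table has a single non-zero cell, or a genuinely different partition.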