Level I

Help explaining Hierarchical Clustering - Ward's method

Hi all,


I am trying to understand the JMP example of hierarchical clustering with a distance matrix (picture below), found here:



Applying Ward's formula (picture below) from here: 

Wards method.png

My question: How does JMP calculate the initial distance of 58.689863 between NY and Philadelphia using this formula? I have tried to apply the formula myself, but I cannot reproduce that result. Any help is much appreciated.




Best regards, Thomas

Super User

In this example you are supplying the distances yourself, so JMP does not calculate any distances until clusters have been formed and it needs the distances between centroids. The expected initial distance between NY and Philadelphia is therefore 83, the value in the flight data table, and that is exactly what you get with any of the other methods (Average, Centroid, Single, Complete). All of the distances that the Ward method reports, however, are about 0.707 times what I would expect (for this pair, 58.689863 / 83 ≈ 0.7071). It appears JMP is applying a constant factor. That said, it would be nice for someone on the JMP team to comment.
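
One possible explanation for the ~0.707 factor, offered as a sketch and not confirmed by anyone at JMP: Ward's method scores a merge of clusters K and L by the increase in within-cluster sum of squares, ||x_K - x_L||^2 / (1/N_K + 1/N_L). For two single-point clusters (N_K = N_L = 1) that is d^2 / 2, and its square root is d / sqrt(2) ≈ 0.7071 d. The snippet below assumes JMP reports that square root; the function name `ward_distance` is made up for illustration.

```python
import math

def ward_distance(dist, n_k=1, n_l=1):
    """Square root of Ward's merge cost for two clusters whose
    centroids are `dist` apart and that hold n_k and n_l points.
    Assumption: this is the quantity JMP reports for Ward's method."""
    return math.sqrt(dist ** 2 / (1.0 / n_k + 1.0 / n_l))

# NY to Philadelphia distance from the flight data table
flight_distance = 83.0

# Both cities start as single-point clusters
initial = ward_distance(flight_distance)
print(round(initial, 6))  # 58.689863
```

Under that assumption every initial Ward distance is the supplied distance divided by sqrt(2), which matches the constant ratio seen across the report. Once clusters hold more than one point the factor changes with N_K and N_L, so later joins will no longer be a fixed multiple of the centroid distance.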