See also: https://datatab.net/tutorial/hierarchical-cluster-analysis
Cluster distance
Methods for computing distance
\begin{eqnarray*}
d_{euc}(x, y) & = & \sqrt{ \sum_{i=1}^{n} (x_{i} - y_{i})^2 } \\
d_{man}(x, y) & = & \sum_{i=1}^{n} | x_{i} - y_{i} | \\
d_{cor}(x, y) & = & 1 - \frac { \displaystyle \sum_{i=1}^{n} (x_{i} - \overline{x}) (y_{i} - \overline{y}) } { \sqrt{ \displaystyle \sum_{i=1}^{n} (x_{i} - \overline{x})^2 \displaystyle \sum_{i=1}^{n} (y_{i} - \overline{y})^2 } } \\
d_{eisen}(x, y) & = & 1 - \frac { \left| \displaystyle \sum_{i=1}^{n} x_{i} \; y_{i} \right| } { \sqrt{ \displaystyle \sum_{i=1}^{n} x_{i}^{2} \displaystyle \sum_{i=1}^{n} y_{i}^{2} } } \\
d_{kend}(x, y) & = & 1 - \frac { n_{c} - n_{d} } { \displaystyle \frac{1}{2} n(n-1) }
\end{eqnarray*}
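As a sanity check on the formulas, here is a minimal base-R sketch that evaluates each distance by hand. The vectors `x` and `y` are made-up illustrative values (not from the source) with no ties, so that the Kendall term can be compared with `cor(..., method = "kendall")`; n_c and n_d are taken to be the numbers of concordant and discordant pairs.

```r
# Illustrative vectors (made-up values, no ties)
x <- c(1, 3, 2, 5, 4)
y <- c(2, 1, 4, 3, 5)
n <- length(x)

# Euclidean and Manhattan distances
d_euc <- sqrt(sum((x - y)^2))
d_man <- sum(abs(x - y))

# Pearson correlation distance
d_cor <- 1 - sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# Eisen cosine correlation distance (no centering, absolute value)
d_eisen <- 1 - abs(sum(x * y)) / sqrt(sum(x^2) * sum(y^2))

# Kendall correlation distance: count concordant and discordant pairs
idx <- combn(n, 2)  # all index pairs (i, j) with i < j
s <- sign(x[idx[1, ]] - x[idx[2, ]]) * sign(y[idx[1, ]] - y[idx[2, ]])
n_c <- sum(s > 0)   # concordant pairs
n_d <- sum(s < 0)   # discordant pairs
d_kend <- 1 - (n_c - n_d) / (0.5 * n * (n - 1))

# Cross-check against the built-in functions
xy <- rbind(x, y)
stopifnot(
  isTRUE(all.equal(d_euc, as.numeric(dist(xy, method = "euclidean")))),
  isTRUE(all.equal(d_man, as.numeric(dist(xy, method = "manhattan")))),
  isTRUE(all.equal(d_cor, 1 - cor(x, y, method = "pearson"))),
  isTRUE(all.equal(d_kend, 1 - cor(x, y, method = "kendall")))
)
```

The Kendall cross-check relies on the absence of ties; with tied values `cor()` applies a tie correction and the two numbers can differ.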
There are many R functions for computing distances between pairs of observations:
# Subset of the data
set.seed(123)
ss <- sample(1:50, 15)   # Take 15 random rows
df <- USArrests[ss, ]    # Subset the 15 rows
df.scaled <- scale(df)   # Standardize the variables

# Euclidean distance between the rows
dist.eucl <- dist(df.scaled, method = "euclidean")
plot(dist.eucl)

# Reformat as a matrix
# Subset the first 3 columns and rows and round the values
round(as.matrix(dist.eucl)[1:3, 1:3], 1)

# Compute correlation-based distances
library("factoextra")
dist.cor <- get_dist(df.scaled, method = "pearson")
# Display a subset
round(as.matrix(dist.cor)[1:3, 1:3], 1)

# Distances for mixed-type data with daisy()
library(cluster)
# Load data
data(flower)
head(flower, 3)
# Data structure
str(flower)
# Distance matrix
dd <- daisy(flower)
round(as.matrix(dd)[1:3, 1:3], 2)

# Visualize the Euclidean distance matrix (factoextra is already loaded)
fviz_dist(dist.eucl)
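For reference, a correlation-based distance can also be sketched in base R alone, without factoextra. The idea is to correlate the rows (observations) and take one minus the result; the name `dist.cor.base` is introduced here for illustration, and whether it matches `get_dist()` exactly is an assumption that should be checked against the installed factoextra version.

```r
# Same subset and scaling as in the notes
set.seed(123)
ss <- sample(1:50, 15)
df.scaled <- scale(USArrests[ss, ])

# cor() correlates columns, so transpose to correlate observations (rows)
row.cor <- cor(t(df.scaled), method = "pearson")

# Turn correlations into distances: 1 - r, stored as a "dist" object
dist.cor.base <- as.dist(1 - row.cor)

# Inspect the first 3 rows and columns
m <- as.matrix(dist.cor.base)
round(m[1:3, 1:3], 1)
```

Because the Pearson correlation lies in [-1, 1], these distances lie in [0, 2]: 0 for perfectly correlated profiles and 2 for perfectly anti-correlated ones.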