See also: https://datatab.net/tutorial/hierarchical-cluster-analysis

Cluster distance

Methods for measuring distance

\begin{eqnarray*}
d_{euc}(x, y) & = & \sqrt{ \sum_{i=1}^{n} (x_{i} - y_{i})^2 } \\
d_{man}(x, y) & = & \sum_{i=1}^{n} | x_{i} - y_{i} | \\
d_{cor}(x, y) & = & 1 - \frac { \displaystyle \sum_{i=1}^{n} (x_{i} - \overline{x}) (y_{i} - \overline{y}) } { \sqrt{ \displaystyle \sum_{i=1}^{n} (x_{i} - \overline{x})^2 \displaystyle \sum_{i=1}^{n} (y_{i} - \overline{y})^2 } } \\
d_{eisen}(x, y) & = & 1 - \frac { \left| \displaystyle \sum_{i=1}^{n} x_{i} \; y_{i} \right| } { \sqrt{ \displaystyle \sum_{i=1}^{n} x_{i}^{2} \displaystyle \sum_{i=1}^{n} y_{i}^{2} } } \\
d_{kend}(x, y) & = & 1 - \displaystyle \frac { n_{c} - n_{d} } { \displaystyle \frac{1}{2} n(n-1) }
\end{eqnarray*}

From top to bottom: Euclidean, Manhattan, Pearson correlation, Eisen cosine correlation, and Kendall correlation distance. $\overline{x}$ and $\overline{y}$ are the means of $x$ and $y$; $n_{c}$ and $n_{d}$ are the numbers of concordant and discordant pairs among the $\frac{1}{2} n(n-1)$ pairs of positions.
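To make the formulas above concrete, here is a minimal base R sketch that evaluates each one by hand and cross-checks the result against `dist()` and `cor()` where a built-in equivalent exists. The vectors `x` and `y` are made-up values chosen only for illustration (no ties, so Kendall's tau matches the pair-counting formula exactly).

```r
# Two hypothetical observations measured on the same five variables
x <- c(1, 3, 5, 7, 9)
y <- c(2, 3, 7, 6, 12)
n <- length(x)

# Euclidean and Manhattan distance
d_euc <- sqrt(sum((x - y)^2))
d_man <- sum(abs(x - y))

# Pearson correlation distance
d_cor <- 1 - sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# Eisen cosine correlation distance (uncentered, absolute value)
d_eisen <- 1 - abs(sum(x * y)) / sqrt(sum(x^2) * sum(y^2))

# Kendall correlation distance: count concordant and discordant pairs (i < j)
idx <- combn(n, 2)
s <- sign(x[idx[1, ]] - x[idx[2, ]]) * sign(y[idx[1, ]] - y[idx[2, ]])
n_c <- sum(s > 0)
n_d <- sum(s < 0)
d_kend <- 1 - (n_c - n_d) / (0.5 * n * (n - 1))

# Cross-check against built-in functions
xy <- rbind(x, y)
stopifnot(
  isTRUE(all.equal(d_euc, as.numeric(dist(xy, method = "euclidean")))),
  isTRUE(all.equal(d_man, as.numeric(dist(xy, method = "manhattan")))),
  isTRUE(all.equal(d_cor, 1 - cor(x, y, method = "pearson"))),
  isTRUE(all.equal(d_kend, 1 - cor(x, y, method = "kendall")))
)
```

Base R has no direct counterpart for the Eisen distance, so `d_eisen` is computed from the formula only and not cross-checked.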

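Several R functions compute distances between pairs of observations: dist() in base R, get_dist() in factoextra, and daisy() in cluster. Each treats the rows of the data as observations: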

# Subset of the data
set.seed(123)
ss <- sample(1:50, 15)   # Take 15 random rows
df <- USArrests[ss, ]    # Subset the 15 rows
df.scaled <- scale(df)   # Standardize the variables

# Euclidean distances between the standardized rows
dist.eucl <- dist(df.scaled, method = "euclidean")
plot(dist.eucl)

# Convert to a full matrix, keep the first 3 rows and columns,
# and round the values to 1 decimal place
round(as.matrix(dist.eucl)[1:3, 1:3], 1)


# Compute correlation-based distances (Pearson) between rows
library("factoextra")
dist.cor <- get_dist(df.scaled, method = "pearson")

# Display a subset
round(as.matrix(dist.cor)[1:3, 1:3], 1)
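For comparison, a correlation-based distance can also be sketched in base R alone, assuming it is defined as 1 minus the Pearson correlation between rows, as in the d_cor formula. Because `cor()` correlates columns, the data is transposed first. The five-row subset and the name `dist.cor.base` are illustrative choices, not part of the original example.

```r
# Standardize a small subset of USArrests (first 5 rows, for illustration)
df.small <- scale(USArrests[1:5, ])

# cor() works column-wise, so transpose to correlate observations (rows)
row.cor <- cor(t(df.small), method = "pearson")

# Turn correlations into distances: r = 1 gives 0, r = -1 gives 2
dist.cor.base <- as.dist(1 - row.cor)

round(as.matrix(dist.cor.base)[1:3, 1:3], 1)
```

Observations with highly correlated profiles end up close together under this distance even when their magnitudes differ.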

library(cluster)
# Load data
data(flower)
head(flower, 3)
# Data structure
str(flower)

# Dissimilarity matrix for mixed-type data
# (daisy() uses Gower's coefficient when not all columns are numeric)
dd <- daisy(flower)
round(as.matrix(dd)[1:3, 1:3], 2)
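To show what Gower's coefficient does with mixed columns, the sketch below computes it by hand on a tiny made-up data frame and compares the result with `daisy(metric = "gower")`. The data frame `toy` and the helper `gower_pair()` are hypothetical names introduced for this illustration. Numeric columns contribute the absolute difference divided by the column range, nominal columns contribute 0 for a match and 1 for a mismatch, and the contributions are averaged.

```r
library(cluster)  # provides daisy()

# Hypothetical toy data: one numeric and one nominal (factor) column
toy <- data.frame(
  height = c(10, 20, 40),
  colour = factor(c("red", "red", "blue"))
)

# Gower dissimilarity between rows i and j, computed by hand
gower_pair <- function(i, j) {
  d_num <- abs(toy$height[i] - toy$height[j]) / diff(range(toy$height))
  d_nom <- as.numeric(toy$colour[i] != toy$colour[j])
  mean(c(d_num, d_nom))
}

# Same order as a dist object: pairs (1,2), (1,3), (2,3)
manual <- c(gower_pair(1, 2), gower_pair(1, 3), gower_pair(2, 3))

dd.toy <- daisy(toy, metric = "gower")
all.equal(as.numeric(dd.toy), manual)
```

Ordered factors, such as some columns of `flower`, are handled differently (they are treated as ordinal), so this hand computation only covers the numeric and nominal cases.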

# Visualize the Euclidean distance matrix as a heatmap
library(factoextra)
fviz_dist(dist.eucl)