Videos

Topic Access
Hierarchical Clustering | Stanford University (15 min) https://www.youtube.com/watch?v=rg2cjfMsCk4
Hierarchical Clustering in R (44 min) https://www.youtube.com/watch?v=9U4h6pZw6f8
The k Means Algorithm | Stanford University (13 min) https://www.youtube.com/watch?v=RD0nNK51Fp8
K Means Clustering: Pros and Cons of K Means Clustering (24 min) https://www.youtube.com/watch?v=YIGtalP1mv0
Clustering: K-means and Hierarchical (customer segmentation) (17 min, optional) https://www.youtube.com/watch?v=QXOkPvFM6NU

 

 

Articles

Topic Access
Hierarchical Clustering in R https://www.datacamp.com/community/tutorials/hierarchical-clustering-R
Hierarchical Cluster Analysis https://uc-r.github.io/hc_clustering
K-Means Clustering in R https://www.datanovia.com/en/lessons/k-means-clustering-in-r-algorith-and-practical-examples/
K-Means Clustering in R Tutorial https://www.datacamp.com/community/tutorials/k-means-clustering-r
K-means Cluster Analysis https://uc-r.github.io/kmeans_clustering
   

https://search.r-project.org/CRAN/refmans/factoextra/html/fviz_nbclust.html

library(factoextra)

library(cluster)

# Find the optimal number of clusters

# Partitioning methods, such as k-means clustering, require the user to specify
# the number of clusters to be generated.
# fviz_nbclust() determines and visualizes the optimal number of clusters using
# different methods: within-cluster sums of squares, average silhouette, and
# gap statistic.
# Any one of the following calls is enough to suggest the optimum k.

# df is best kept as a two-column data frame, so the clusters can later be
# visualized in two-dimensional space.

factoextra::fviz_nbclust(x = df, FUNcluster = kmeans, method = "silhouette")
factoextra::fviz_nbclust(x = df, FUNcluster = hcut, method = "silhouette")
factoextra::fviz_nbclust(x = df, FUNcluster = cluster::pam, method = "silhouette")
factoextra::fviz_nbclust(x = df, FUNcluster = cluster::clara, method = "silhouette")
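The comments above also mention within-cluster sums of squares; fviz_nbclust covers that (elbow) method too. A minimal sketch, assuming the same df:

```r
# Elbow method: look for the bend ("elbow") in the total within-cluster
# sum of squares as k grows
factoextra::fviz_nbclust(x = df, FUNcluster = kmeans, method = "wss")
```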

########################################################

 

# To visualize the clusters, first create a distance matrix
DistanceMatrix <- factoextra::get_dist(df, method = "euclidean")
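The distance matrix itself can be inspected before clustering. A small sketch, assuming DistanceMatrix was built as above:

```r
# Heat map of the pairwise distances: similar observations form visible
# blocks, which hints at how many clusters to expect
factoextra::fviz_dist(DistanceMatrix)
```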

 

 

 

####################################

# Create a hierarchical cluster object

# method: one of
# "ward.D", "ward.D2", "single", "complete",
# "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC)

# stats::hclust performs a hierarchical cluster analysis using a set of
# dissimilarities for the n objects being clustered:
# 1) initially, each object is assigned to its own cluster, then the algorithm proceeds iteratively,
# 2) at each stage joining the two most similar clusters,
# 3) continuing until there is just a single cluster.

Hclustermodel <- stats::hclust(DistanceMatrix, method = "ward.D")

# Cut the tree at k = 2

# stats::cutree cuts a tree into groups; this step is necessary for
# hierarchical cluster objects: it is here that a cluster is assigned to each point

CutModel <- stats::cutree(Hclustermodel, k = 2)

#CutModel will contain the cluster each point belongs to

CutModel
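As a quick sanity check on the assignments, the cluster sizes can be tabulated:

```r
# Count how many points fall in each of the k = 2 clusters
table(CutModel)
```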

# Visualize the clusters from the cut tree

# factoextra::fviz_cluster can't handle an object of class hclust directly,
# so we wrap the data and the cluster assignments in a list

factoextra::fviz_cluster(list(data = df, cluster = CutModel))

 

# Alternative: hierarchical clustering via cluster::agnes() or cluster::diana()

hc <- cluster::agnes(df, method = "ward")

cluster::pltree(hc, cex = 0.6, hang = -1, main = "Dendrogram of Hierarchical Cluster made by agnes (Agglomerative Nesting)")
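agnes additionally reports the agglomerative coefficient, a measure of the amount of clustering structure found (values closer to 1 suggest stronger structure). A sketch, assuming hc from above:

```r
# Agglomerative coefficient of the agnes model; closer to 1 = stronger structure
hc$ac
```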

 

#####################################

# K-means clustering

# The k-means model contains the mean (centroid) of each cluster in
# n-dimensional space, plus the cluster assigned to each point.

# Perform k-means clustering on a data matrix; nstart tries several random
# starting configurations and keeps the best result
Kmeans_result <- stats::kmeans(df, centers = 3, nstart = 25)

# Visualize the clusters

# object: an object of class "partition" created by pam(), clara() or fanny()
# [cluster package]; "kmeans" [stats package]; "dbscan" [fpc package];
# "Mclust" [mclust package]; "hkmeans", "eclust" [factoextra package].
# Any list with data and cluster components also works,
# e.g. object = list(data = mydata, cluster = myclust).

factoextra::fviz_cluster(object = Kmeans_result, data = df)

If the data frame has more than two dimensions (variables), fviz_cluster performs principal component analysis (PCA) and plots the data points against the first two principal components, i.e. the two that explain the largest share of the variance.
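A minimal sketch of that PCA fallback, assuming the built-in iris data set (four numeric variables) rather than the two-column df used above:

```r
# k-means on a 4-dimensional data set: the numeric columns of iris, standardized
df4 <- scale(iris[, 1:4])
km4 <- stats::kmeans(df4, centers = 3, nstart = 25)

# With more than two variables, the plot axes become principal components,
# labelled with the share of variance each one explains (roughly "Dim1 (73%)"
# and "Dim2 (22.9%)" for scaled iris)
factoextra::fviz_cluster(object = km4, data = df4)
```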

