Clustering

Make sure your system has the rattle.data package installed (install.packages("rattle.data")), then load it together with the weather data set:

require(rattle.data)
## Loading required package: rattle.data
help(weather)
data(weather)

The dist() function computes a dissimilarity (distance) matrix, which is the input to hierarchical clustering in R:

hc <- hclust(dist(weather), method="average")
## Warning in dist(weather): NAs introduced by coercion
plot(hc)
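
The coercion warning above most likely arises because dist() expects numeric input, while a data frame such as weather also holds dates and categorical columns. A minimal sketch of one way to avoid it is to keep only the numeric columns and standardise them before computing distances. The data frame toy below is made up for illustration and is not part of rattle.data:

```r
# Hypothetical toy data mixing a date, a factor and numeric columns
toy <- data.frame(
  Date     = as.Date("2024-01-01") + 0:5,
  Location = factor(rep("A", 6)),
  MinTemp  = c(8.0, 14.0, 13.7, 13.3, 7.6, 6.2),
  Rainfall = c(0.0, 3.6, 3.6, 39.8, 2.8, 0.0),
  Sunshine = c(6.3, 9.7, 3.3, 9.1, 10.6, 8.2)
)

# Keep only the numeric columns; dates and factors are dropped
num_cols <- sapply(toy, is.numeric)
toy_num  <- toy[, num_cols]

# Standardise each column so no single variable dominates the distances
d <- dist(scale(toy_num))

# Average-linkage hierarchical clustering on the clean distance matrix
hc_toy <- hclust(d, method = "average")
plot(hc_toy)
```

Whether to standardise depends on the data; it is a common choice when the variables are measured in different units.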

You may then decide that seven clusters is appropriate. Mark them on the dendrogram and extract the group memberships:

plot(hc)
rect.hclust(hc, k=7)

groups <- cutree(hc, k=7)
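
cutree() returns an integer vector with one cluster label per observation, so the usual tools for vectors apply to it. As a small self-contained sketch, the example below uses the built-in USArrests data (an assumption made here so that the code runs without rattle.data) and an arbitrary choice of four clusters:

```r
# Hierarchical clustering of the built-in USArrests data (all numeric)
hc_us <- hclust(dist(scale(USArrests)), method = "average")

# Cut the tree into four groups; the result is a named integer vector
groups_us <- cutree(hc_us, k = 4)

# How many observations fall into each cluster
table(groups_us)

# Cluster labels of the first few observations
head(groups_us)
```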

Exercise: use the built-in iris dataset to produce a hierarchical clustering.

Nearest-centroid clustering: K-Means

To apply k-means to the iris dataset, first remove the Species column, which already gives a known grouping of the observations:

iris2 <- iris
iris2$Species <- NULL

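Now produce the clusters, here with three centres. Because k-means starts from randomly chosen centres, the result can differ between runs: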

kmeans.result <- kmeans(iris2, 3)
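
To make a k-means run repeatable, one option is to fix the random seed and let kmeans() try several random starts, keeping the best one. The sketch below does this; the seed value and nstart = 25 are arbitrary choices made for illustration, and km is just a name used here:

```r
# Same preparation as above: iris without the Species column
iris2 <- iris
iris2$Species <- NULL

# Fix the seed so the random initial centres are reproducible
set.seed(42)

# Try 25 random starts and keep the solution with the lowest
# total within-cluster sum of squares
km <- kmeans(iris2, centers = 3, nstart = 25)

km$size          # number of observations per cluster
km$tot.withinss  # total within-cluster sum of squares
```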

Now plot the results, using the cluster number for coloring and marking the cluster centres with stars:

plot(iris2$Sepal.Length, iris2$Sepal.Width, col = kmeans.result$cluster)
points(kmeans.result$centers[,c("Sepal.Length", "Sepal.Width")], col=1:3, pch=8, cex=2)
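
Since the Species column was set aside before clustering, one way to judge the result is to cross-tabulate the known species against the cluster labels. This is a minimal sketch that reruns k-means so it is self-contained; the seed, nstart value and the names km_check and confusion are assumptions made for illustration:

```r
# Cluster the four numeric iris measurements into three groups
iris2 <- iris[, 1:4]
set.seed(1)
km_check <- kmeans(iris2, centers = 3, nstart = 10)

# Rows: known species, columns: cluster labels assigned by k-means
confusion <- table(Species = iris$Species, Cluster = km_check$cluster)
confusion
```

The cluster numbers themselves are arbitrary, so a good result shows each species concentrated in one column rather than on the diagonal.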