How does the Nearest Centroid Classifier work?

The nearest centroid classifier (NCC) assigns a data point to the class whose centroid is closest to it.

The algorithm is as follows:

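Suppose $c_l$ is the set of indices of the training points that belong to class $l$, and $n_l = |c_l|$ is the number of points in that class.

1. Training step:

We compute the centroid ($CT$) of each class as the mean of its points:

$CT_l = \frac{1}{n_l} \sum_{i \in c_l} x_i$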

2. Prediction step:

a. Given a new data point $x_{new}$, compute the distance between $x_{new}$ and each centroid as

distance: $\|x_{new} - CT_l\|_2$ (Euclidean distance)

b. Assign the new point to the class whose centroid has the minimum distance.
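
The two steps above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the function names `fit_centroids` and `predict` and the toy data are made up for this example, and ties between equally distant centroids are simply resolved in favour of the first class encountered.

```python
import math
from collections import defaultdict


def fit_centroids(X, y):
    """Training step: return {label: centroid}, the per-class mean of the points."""
    groups = defaultdict(list)
    for point, label in zip(X, y):
        groups[label].append(point)
    # Average each coordinate over the points of a class.
    return {
        label: tuple(sum(coord) / len(points) for coord in zip(*points))
        for label, points in groups.items()
    }


def predict(centroids, x_new):
    """Prediction step: return the label whose centroid is nearest (Euclidean)."""
    return min(centroids, key=lambda label: math.dist(x_new, centroids[label]))


# Toy data, made up for illustration only.
X = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0), (12.0, 10.0)]
y = ["a", "a", "b", "b"]

centroids = fit_centroids(X, y)
print(centroids)                       # {'a': (1.0, 0.0), 'b': (11.0, 10.0)}
print(predict(centroids, (3.0, 1.0)))  # a
```

Note that training only stores one centroid per class, so prediction costs one distance computation per class regardless of how many training points there were.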

Let us take an example. We want to classify fruits into two classes, Apple and Orange, based on their width and height.

Our inputs ($x$) are:

$x_1 = [5, 6]$, $x_2 = [5, 7]$, $x_3 = [4, 3]$, $x_4 = [5, 7]$, $x_5 = [6, 4]$, and the corresponding labels ($y$) are

$y_1$ = 'AP', $y_2$ = 'AP', $y_3$ = 'AP', $y_4$ = 'ORG', $y_5$ = 'ORG'

Here $x_i$ = [width, height], 'AP' = 'Apple', 'ORG' = 'Orange'.

The centroids of the two classes are:

$CT_{AP} = \frac{1}{3}(5+5+4,\ 6+7+3) = \left(\frac{14}{3}, \frac{16}{3}\right)$

$CT_{ORG} = \frac{1}{2}(5+6,\ 7+4) = \left(\frac{11}{2}, \frac{11}{2}\right)$

Suppose we get a new test data point $x_{new} = (3, 7)$ and want to classify it. The distances between the new point and the two centroids are:

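$\|x_{new} - CT_{AP}\|_2 = \left\|(3, 7) - \left(\frac{14}{3}, \frac{16}{3}\right)\right\|_2 = 2.357$

$\|x_{new} - CT_{ORG}\|_2 = \left\|(3, 7) - \left(\frac{11}{2}, \frac{11}{2}\right)\right\|_2 = 2.915$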

Since 2.357 is smaller than 2.915, the new data point is classified as 'Apple': it is closest to the centroid of the points that belong to class 'Apple'.
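
The arithmetic of this example can be checked with a short script. It is only a sketch that recomputes the centroids and distances from the five fruit points listed earlier, using the standard-library `math.dist` (Python 3.8+); the variable names are illustrative.

```python
import math

# Data from the worked example: each point is (width, height).
X = [(5, 6), (5, 7), (4, 3), (5, 7), (6, 4)]
y = ["AP", "AP", "AP", "ORG", "ORG"]
x_new = (3, 7)

distances = {}
for label in sorted(set(y)):
    # Collect the points of this class and average them coordinate-wise.
    points = [p for p, lab in zip(X, y) if lab == label]
    centroid = tuple(sum(c) / len(points) for c in zip(*points))
    distances[label] = math.dist(x_new, centroid)
    print(label, round(distances[label], 3))  # AP 2.357, then ORG 2.915

# The predicted class is the one with the smallest distance.
prediction = min(distances, key=distances.get)
print(prediction)  # AP
```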
