The Nearest centroid classifier (NCC) says that we should classify a data point to a class whose centroid is closest to this data point.

The algorithms follows:

Suppose $ c_l $ represents the set of indices which belong to class l. And n = |$c_l$|

1.Training step :¶

We compute the centroids(CTs) for each of the classes as:

$CT_l$ = $ \frac{1}{n} \sum_{i \in c_l} x_i $

2.Prediction step :¶

a. Given a new data point $x_{new}$, compute the distance between $x_{new}$ and each centroids as

distance : $|| x_{new} - $CT_l$||_2$ (Euclidean distance)

b. Assign the class to this new point which has minimum distance value.

Let us taken an example. We have to classify fruits into two classes : Apple and Orange, based on their height and width.

Our inputs (x) are :

x1=[5,6], x2=[5,7], x3=[4,3], x4=[5,7], x5=[6,4] and corrresponding labels (y) are

𝑦1='AP' 𝑦2='AP' 𝑦3='AP' 𝑦4='ORG' 𝑦5='ORG'

Here $x_i$ = [width, height] , 'AP' = 'Apple', 'ORG' = 'Orange'.

Now, centroids for two classes are :

$ CT_{AP} $ = $ \frac{1}{3} (5 + 5 + 4, 6 + 7 + 3) $ = ($\frac{14}{3}, \frac{16}{3} $)

$ CT_{ORG} $ = $ \frac{1}{2} (5 + 6, 7 + 4) $ = ($\frac{11}{2}, \frac{11}{2} $)

Suppose, you got a new test data point : (3, 7) i.e $x_{new}$, and you want to classify this point. We can calculate the distance between new point and our centroids as:

$ ||x_{new} - CT_{AP} || $ = || (3,7) - ($\frac{14}{3}, \frac{16}{3} $) || = 2.357

$ ||x_{new} - CT_{ORG} || $ = || (3,7) - ($\frac{11}{2}, \frac{11}{2} $) || = 2.915

Here, the new data point is classified as 'Apple' as the new data point is closest to the centroid of data points that belong to class 'Apple'

In [ ]:

Menu

How Nearest Centroid Classifier works?

1.Training step :¶

2.Prediction step :¶