The Nearest centroid classifier (NCC) says that we should classify a data point to a class whose centroid is closest to this data point.

The algorithms follows:

Suppose $c_l$ represents the set of indices which belong to class l. And n = |$c_l$|

##### 1.Training step :¶

We compute the centroids(CTs) for each of the classes as:

$CT_l$ = $\frac{1}{n} \sum_{i \in c_l} x_i$

##### 2.Prediction step :¶

a. Given a new data point $x_{new}$, compute the distance between $x_{new}$ and each centroids as

distance : $|| x_{new} -$CT_l$||_2$ (Euclidean distance)

b. Assign the class to this new point which has minimum distance value.

Let us taken an example. We have to classify fruits into two classes : Apple and Orange, based on their height and width.

Our inputs (x) are :

x1=[5,6], x2=[5,7], x3=[4,3], x4=[5,7], x5=[6,4] and corrresponding labels (y) are

𝑦1='AP' 𝑦2='AP' 𝑦3='AP' 𝑦4='ORG' 𝑦5='ORG'

Here $x_i$ = [width, height] , 'AP' = 'Apple', 'ORG' = 'Orange'.

Now, centroids for two classes are :

$CT_{AP}$ = $\frac{1}{3} (5 + 5 + 4, 6 + 7 + 3)$ = ($\frac{14}{3}, \frac{16}{3}$)

$CT_{ORG}$ = $\frac{1}{2} (5 + 6, 7 + 4)$ = ($\frac{11}{2}, \frac{11}{2}$)

Suppose, you got a new test data point : (3, 7) i.e $x_{new}$, and you want to classify this point. We can calculate the distance between new point and our centroids as:

$||x_{new} - CT_{AP} ||$ = || (3,7) - ($\frac{14}{3}, \frac{16}{3}$) || = 2.357

$||x_{new} - CT_{ORG} ||$ = || (3,7) - ($\frac{11}{2}, \frac{11}{2}$) || = 2.915

Here, the new data point is classified as 'Apple' as the new data point is closest to the centroid of data points that belong to class 'Apple'

In [ ]: