The Nearest Centroid Classifier (NCC) assigns a data point to the class whose centroid is closest to that point.
The algorithm is as follows (a short code sketch is given after the steps):
Suppose $c_l$ represents the set of indices of the training points that belong to class $l$, and $n_l = |c_l|$.
1. Training step:
We compute the centroid ($CT$) of each class as:
$CT_l = \frac{1}{n_l}\sum_{i \in c_l} x_i$
2. Prediction step:
a. Given a new data point $x_{new}$, compute the distance between $x_{new}$ and each centroid as $\|x_{new} - CT_l\|_2$ (the Euclidean distance).
b. Assign the new point to the class whose centroid gives the minimum distance.
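Putting the two steps together, here is a minimal Python sketch of the classifier. This is an illustrative implementation, not a library API; the class name `NearestCentroidClassifier` and its `fit`/`predict` methods are names chosen for this example (scikit-learn ships a comparable estimator, `sklearn.neighbors.NearestCentroid`).

```python
import numpy as np

class NearestCentroidClassifier:
    """Minimal nearest centroid classifier (illustrative sketch)."""

    def fit(self, X, y):
        # Training step: compute the centroid of each class.
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Prediction step: assign each point to the class whose centroid
        # is closest in Euclidean distance.
        X = np.asarray(X, dtype=float)
        # dists[i, j] = distance from point i to centroid j.
        dists = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[np.argmin(dists, axis=1)]
```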
Let us take an example. We want to classify fruits into two classes, Apple and Orange, based on their width and height.
Our inputs ($x$) are:
$x_1$ = [5, 6], $x_2$ = [5, 7], $x_3$ = [4, 3], $x_4$ = [5, 7], $x_5$ = [6, 4]
and the corresponding labels ($y$) are:
$y_1$ = 'AP', $y_2$ = 'AP', $y_3$ = 'AP', $y_4$ = 'ORG', $y_5$ = 'ORG'
Here $x_i$ = [width, height], 'AP' = 'Apple', 'ORG' = 'Orange'.
Now, the centroids of the two classes are:
$CT_{AP} = \frac{1}{3}(5+5+4,\ 6+7+3) = \left(\frac{14}{3}, \frac{16}{3}\right)$
$CT_{ORG} = \frac{1}{2}(5+6,\ 7+4) = \left(\frac{11}{2}, \frac{11}{2}\right)$
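As a quick check of this arithmetic, the centroids can also be computed with NumPy (a small sketch; the variable names are illustrative):

```python
import numpy as np

X = np.array([[5, 6], [5, 7], [4, 3], [5, 7], [6, 4]], dtype=float)
y = np.array(['AP', 'AP', 'AP', 'ORG', 'ORG'])

ct_ap = X[y == 'AP'].mean(axis=0)    # [4.6667, 5.3333] = (14/3, 16/3)
ct_org = X[y == 'ORG'].mean(axis=0)  # [5.5, 5.5]       = (11/2, 11/2)
print(ct_ap, ct_org)
```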
Suppose you get a new test data point $x_{new} = (3, 7)$ and you want to classify it. We can calculate the distance between the new point and each centroid as:
$\|x_{new} - CT_{AP}\| = \left\|(3,7) - \left(\frac{14}{3}, \frac{16}{3}\right)\right\| \approx 2.357$
$\|x_{new} - CT_{ORG}\| = \left\|(3,7) - \left(\frac{11}{2}, \frac{11}{2}\right)\right\| \approx 2.915$
Hence, the new data point is classified as 'Apple', since it is closest to the centroid of the data points that belong to class 'Apple'.
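The same distances can be reproduced numerically (again a small sketch assuming NumPy, with the centroids written in directly):

```python
import numpy as np

ct_ap = np.array([14/3, 16/3])   # centroid of the 'AP' (Apple) points
ct_org = np.array([11/2, 11/2])  # centroid of the 'ORG' (Orange) points
x_new = np.array([3.0, 7.0])

d_ap = np.linalg.norm(x_new - ct_ap)    # ~2.357
d_org = np.linalg.norm(x_new - ct_org)  # ~2.915
print(round(d_ap, 3), round(d_org, 3))

# The smaller distance wins, so the point is labelled 'AP' (Apple).
print('AP' if d_ap < d_org else 'ORG')
```

The `NearestCentroidClassifier` sketch above gives the same answer via `fit(X, y)` followed by `predict([[3, 7]])`.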