In this post, we will continue from our previous post:
Before starting with the implementation, let's discuss a few important points about cross-validation.
- Cross-validation (CV) splits our dataset into k folds (k is generally set by the developer).
- Once the k folds are created, each fold is used exactly once as the test set, with all remaining k-1 folds combined as the training set.
- Cross-validation can be used to assess average model performance (the focus of this post), to select hyperparameters (for example, the optimal number of neighbors k in kNN), or to choose good feature combinations from the available data features.
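The fold-splitting described above can be sketched as a small helper. This is a minimal illustration, not code from the original post; the function name `kfold_indices` and its signature are hypothetical:

```python
import numpy as np

def kfold_indices(n_samples, k, seed=42):
    """Split indices 0..n_samples-1 into k roughly equal folds.

    Hypothetical helper for illustration: returns a list of
    (train_idx, test_idx) pairs, one per fold. Each fold serves
    once as the test set; the remaining folds form the train set.
    """
    rng = np.random.RandomState(seed)
    # shuffle so folds are not biased by the row order of the dataset
    indices = rng.permutation(n_samples)
    folds = np.array_split(indices, k)
    splits = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        splits.append((train_idx, test_idx))
    return splits

# example: 10 samples, 5 folds -> each split has 8 train / 2 test indices
for train_idx, test_idx in kfold_indices(10, 5):
    print(len(train_idx), len(test_idx))
```

Every sample appears in exactly one test fold, so averaging the per-fold scores gives an estimate of how the model generalizes to unseen data.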
```python
import math
from collections import Counter

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# making results reproducible
np.random.seed(42)
```
```python
df = pd.read_csv(
    'https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data',
    header=None, sep=',')
df.columns = ['CLASS', 'ALCOHOL_LEVEL', 'MALIC_ACID', 'ASH', 'ALCALINITY',
              'MAGNESIUM', 'PHENOLS', 'FLAVANOIDS', 'NON_FLAVANOID_PHENOL',
              'PROANTHOCYANINS', 'COLOR_INTENSITY', 'HUE',
              'OD280/OD315_DILUTED', 'PROLINE']

# Let us use only two features: 'ALCOHOL_LEVEL', 'MALIC_ACID' for this problem
df = df[['CLASS', 'ALCOHOL_LEVEL', 'MALIC_ACID']]
df.head()
```