In the basic nearest neighbor classifier, the only thing that matters is the class label of the nearest neighbor. But the nearest neighbor may sometimes be noisy or otherwise misleading. Therefore, it may be better to also consider the other nearby data points in addition to the nearest neighbor.
This idea leads us to the so-called k-nearest neighbor method, where we consider all k nearest neighbors instead of just one. If k=3, for example, we take the three nearest points and choose the class label based on the majority class among them.
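As a small illustration of the majority vote itself (this is not part of the exercise code, and the labels are made up for the example), one way to find the most common class among three neighbor labels in Python is with collections.Counter:

```python
from collections import Counter

# Hypothetical class labels of the three nearest neighbors of some test point
neighbor_labels = [0, 1, 1]

# most_common(1) returns a list with the most frequent label and its count
majority_class, count = Counter(neighbor_labels).most_common(1)[0]
print(majority_class)  # 1, because two of the three neighbors belong to class 1
```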
The program below uses the library sklearn to create random data. Our input variable X has two features (think of, say, cabin size and cabin price), and our target variable y is binary: it is either 0 or 1 (again think, for example, "is the cabin awesome or not").
Complete the following program so that it finds the three nearest data points (k=3) for each of the test data points and classifies them based on the majority class among those neighbors. Currently the program generates the random data, splits it into training and test sets, and calculates the distances between each test set item and the training set items, but it fails to classify the test set items correctly: it simply assigns them all to class 0. Instead of looking at just the nearest neighbor's class, it should use three neighbors, pick the majority (most common) class among those three, and use that as the class for the test item.
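The exercise skeleton itself is not reproduced here, but the following is a minimal self-contained sketch of the intended logic. The variable names, the way the random data is generated, and the use of train_test_split are assumptions for illustration, not the actual exercise code:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Generate random data: X has two features, y is a binary class label (0 or 1).
# This only mimics the setup described above; the exercise generates its own data.
rng = np.random.default_rng(0)
X = rng.random((40, 2))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)  # arbitrary rule just to get two classes

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

k = 3
y_pred = []
for test_item in X_test:
    # Euclidean distance from this test item to every training item
    distances = np.linalg.norm(X_train - test_item, axis=1)
    # Indices of the k nearest training items
    nearest = np.argsort(distances)[:k]
    # Majority vote: the most common class among the k nearest neighbors
    votes = np.bincount(y_train[nearest])
    y_pred.append(np.argmax(votes))

print(np.array(y_pred))
```

With k=3 and two classes a tie is impossible, so simply taking the most frequent class among the three neighbors is enough.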