How to write a homemade K Nearest Neighbors model

Jessica Kimbril
3 min read · Aug 31, 2020


I am currently in the first half of my computer science program, and for this project I chose to write a data science algorithm. I got to pick from a list and chose a KNN algorithm. Why? Because it is easy to build and implement. Working on this project gave me a much better understanding of how KNN models work.

This article is geared toward readers who already have a basic understanding of Python and predictive models.

To start out, we need a rough draft of where we are going with our algorithm; a sketch of that outline follows.
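Roughly, the plan looks like this (my own sketch; names like `KNN` and `euclidean_distance` are just placeholders):

```python
# Rough plan for the homemade KNN:
# 1. euclidean_distance(a, b) -> straight-line distance between two vectors
# 2. class KNN:
#       __init__(k=3)   -> store the number of neighbors
#       fit(X, y)       -> remember the training data
#       predict(P)      -> for each row, find the k closest training rows
#                          and return their most common label
```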

The first thing we need to do is create a function that finds the Euclidean distance between two vectors. According to Wikipedia, "the Euclidean distance or Euclidean metric is the 'ordinary' straight-line distance between two points in Euclidean space. With this distance, Euclidean space becomes a metric space."
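Here is a minimal sketch of such a helper, assuming the vectors are NumPy-compatible arrays (the name `euclidean_distance` is just what I call it here):

```python
import numpy as np

def euclidean_distance(a, b):
    # Straight-line distance between two vectors:
    # square root of the sum of squared differences.
    return np.sqrt(np.sum((np.array(a) - np.array(b)) ** 2))
```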

Now that we have our distance function, we can start building our class.

In the init function, we initialize k=3, meaning the number of nearest neighbors is set to three. After that, we create our fit function, which simply stores the training data. We can then start on the predict function: for each point i in P, we run a prediction on i. This is where our little helper function comes in. We compute the Euclidean distance from i to every training point, sort the distances by index, and keep the first k indices. We then look up the corresponding labels in y_train and finish with a Counter, which finds the single most common label among those neighbors. A sketch of such a class follows.
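A class along those lines might look like this (a sketch, not the article's exact code; it reuses the `euclidean_distance` helper from above, and the attribute and method names are my own):

```python
from collections import Counter
import numpy as np

class KNN:
    def __init__(self, k=3):
        # Number of nearest neighbors to vote with.
        self.k = k

    def fit(self, X_train, y_train):
        # "Fitting" a KNN just means remembering the training data.
        self.X_train = np.array(X_train)
        self.y_train = np.array(y_train)

    def _predict_one(self, x):
        # Distance from x to every training point (uses the helper defined earlier).
        distances = [euclidean_distance(x, x_train) for x_train in self.X_train]
        # Indices of the k closest training points.
        k_indices = np.argsort(distances)[:self.k]
        # Labels of those neighbors.
        k_labels = self.y_train[k_indices]
        # The most common label among the neighbors wins.
        return Counter(k_labels).most_common(1)[0][0]

    def predict(self, P):
        # Predict a label for every row in P.
        return np.array([self._predict_one(x) for x in P])
```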

So how does this compare to a KNN Classifier from Sklearn?

It is pretty close in its accuracy score, if not the same. I used the iris dataset, as it seems to be a relatively standard benchmark.

Here is the code for the sklearn model (after the data has been split with train_test_split).
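The gist of that code looks something like this (a sketch under the assumption that the iris data is split once with train_test_split and k is 3, matching the homemade model):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the iris data and split it once so both models see the same data.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Sklearn's KNN classifier with the same k=3 as the homemade model.
sk_model = KNeighborsClassifier(n_neighbors=3)
sk_model.fit(X_train, y_train)
sk_preds = sk_model.predict(X_test)
print(accuracy_score(y_test, sk_preds))
```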

Here is the code for running my KNN model on the same processed data.
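With the same split, using the homemade class sketched above would look roughly like this (assuming the X_train, X_test, y_train, y_test names from the previous snippet):

```python
from sklearn.metrics import accuracy_score

# Fit and evaluate the homemade model on the same train/test split.
my_model = KNN(k=3)
my_model.fit(X_train, y_train)
my_preds = my_model.predict(X_test)
print(accuracy_score(y_test, my_preds))
```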

Here are the predictions for both models. I kept running the code and it kept outputting the same predictions, so I am happy with how this turned out.
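One quick way to check that the two prediction arrays really do agree (assuming the sk_preds and my_preds names from the sketches above):

```python
import numpy as np

# True when every prediction from the two models matches.
print(np.array_equal(sk_preds, my_preds))
```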

