
Not A Distant Cousin - K Nearest Neighbor

12/7/2015

Figure 1
I missed the programmatic workflow of Python, so I decided to switch back to Python for this algorithm. Also, please note that K Nearest Neighbor can be found and implemented quickly in scikit-learn, but I wanted to code it from scratch. I discovered K Nearest Neighbor while exploring other simple machine learning algorithms. Initially, I mistook this algorithm for a close relative of the K-means algorithm.

The main difference is that the K Nearest Neighbor (KNN) algorithm is a supervised classifier, whereas K-means is unsupervised and groups data by clustering.

How does it work? KNN takes a number (k) of points closest to an unclassified point and classifies that point by selecting the majority winner among those neighbors.


For example: in Figure 1, assuming we use k = 3, the algorithm would select the 3 closest points to the green (unclassified) circle. The KNN would then classify the green point based on the winning majority of classifications; in this case, the green point would be classified as a red triangle. I programmed a fairly rudimentary version of the algorithm in Python and ran it on a few random sample points with a small data set.
knn.py
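A from-scratch version along these lines is a minimal sketch of what the classifier could look like; the function signature, the Euclidean distance metric, and the data layout here are illustrative assumptions rather than the exact listing.

# knn.py -- a rudimentary K Nearest Neighbor classifier written from scratch
# (names, distance metric, and data layout are illustrative assumptions)
import math
from collections import Counter

def knn_classify(dataset, new_point, k=3):
    """Classify new_point by the majority vote of its k nearest neighbors.

    dataset is a list of ((x, y), label) pairs; new_point is an (x, y) tuple.
    """
    # Sort every labeled point by its distance to the point we want to classify
    by_distance = sorted(
        dataset,
        key=lambda item: math.hypot(item[0][0] - new_point[0],
                                    item[0][1] - new_point[1]),
    )

    # Keep the k closest neighbors and let each one cast a "vote" with its label
    votes = [label for _, label in by_distance[:k]]

    # The most common label among the neighbors is the classification
    return Counter(votes).most_common(1)[0][0]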

    

The knn_classify function works in the way I described earlier: it sorts the x, y coordinates by distance to the point being classified and selects the number of neighbors specified as a parameter. Each of those points then casts its "vote" for what the new point should be classified as.

main
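A main block along these lines would exercise the classifier; the coordinate values, language labels, and module name below are made up for illustration.

# main -- classify one sample point against a small toy dataset
# (these coordinates and labels are invented for illustration)
from knn import knn_classify  # the classifier sketched above; module name assumed

if __name__ == "__main__":
    # Each entry is an (x, y) coordinate labeled with a programming language
    dataset = [
        ((1.0, 1.5), "Python"),
        ((2.0, 1.0), "Python"),
        ((1.5, 2.0), "Python"),
        ((8.0, 8.5), "Ruby"),
        ((9.0, 8.0), "Ruby"),
        ((8.5, 9.0), "Ruby"),
    ]

    sample_point = (2.0, 2.0)
    print(knn_classify(dataset, sample_point, k=3))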

    
The dataset is a list of x, y coordinates, each labeled with a programming language as its classification. Running the main block with this dataset yields the following classification for the sample point.
Output
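With the illustrative dataset above, the three nearest neighbors of the sample point all carry the Python label, so the script would print something like:

Python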

    
The Limitations: While my KNN algorithm was fairly simple to implement from a code standpoint, the sample point used a fairly unexciting two-dimensional x, y coordinate. Unfortunately, the KNN algorithm is susceptible to the Curse of Dimensionality (ooh, spooky), a condition in which the data space grows exponentially with every dimension added; splitting each axis into just 10 bins, for example, gives 100 cells in two dimensions but 10^10 cells in ten. This is sub-optimal because there is inherently less redundancy for the algorithm to learn from, and the memory and computational requirements increase.

Final Thoughts: The KNN, in my opinion, was simple to understand, but I found its practical applications quite fascinating. KNN is often used in visual/image recognition. The Curse of Dimensionality can be alleviated by performing dimensionality reduction on the possible data spaces. An example of this would be handwriting recognition: one could argue that any number or letter can be written in a finite number of ways, and with that assumption you can limit the space in which the algorithm must operate.
