RION ANGELES
  • BLOG
  • ABOUT ME
  • CONTACT ME

Not A Distant Cousin - K Nearest Neighbor

12/7/2015

0 Comments

 
Figure 1
I missed the programmatic workflow of Python, so I decided to switch back to Python for this algorithm. K Nearest Neighbor can be implemented quickly with scikit-learn, but I wanted to code it from scratch. I discovered K Nearest Neighbor while exploring other simple machine learning algorithms. Initially, I mistook it for a close relative of the K-means algorithm.

The main difference is that K Nearest Neighbor (KNN) is a supervised classification algorithm, whereas K-means is an unsupervised algorithm that groups data by clustering.

How does it work? KNN takes the k labeled points closest to an unclassified point and assigns that point the class held by the majority of those neighbors.
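Here is a minimal sketch of that majority-vote idea in plain Python. It is not the code from the full post: the function names, the Euclidean distance metric, and the toy points are all assumptions made up for illustration.

```python
from collections import Counter
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(training_data, query_point, k=3):
    """Classify query_point by majority vote among its k nearest neighbors.

    training_data is a list of (point, label) pairs.
    """
    # Sort the labeled points by distance to the query and keep the k closest.
    neighbors = sorted(
        training_data,
        key=lambda pair: euclidean_distance(pair[0], query_point),
    )[:k]
    # Count the labels among those neighbors; the most common label wins.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy data, invented for this example only.
training_data = [
    ((1.0, 1.0), "red"),
    ((1.5, 2.0), "red"),
    ((2.0, 1.5), "red"),
    ((6.0, 6.0), "blue"),
    ((6.5, 7.0), "blue"),
    ((7.0, 6.5), "blue"),
]

print(knn_classify(training_data, (2.0, 2.0), k=3))
print(knn_classify(training_data, (6.0, 7.0), k=3))
```

With two classes, an odd k avoids tied votes. This sketch does not handle ties in any special way.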



Read More

All The Pretty Clusters: K-Means in RStudio

4/15/2015

1 Comment

 
A colleague of mine introduced me to a workflow tool commonly used by statisticians, analysts, and data scientists: RStudio. I had heard rumors of folks using R, but I had never bothered with it because I was too busy crudely coding in Python. Admittedly, I thought Python had everything that R could offer, and I was even arrogant enough to believe Python was superior. Well, they're both fantastic tools for data science, with their respective strengths and weaknesses. But I digress; I'll get to those differences in a different post. I grabbed one of the most widely used datasets from the UCI Machine Learning Repository and decided to try out a clustering algorithm. Read on for more!

Read More

Decisions, Decisions, Decisions, Tree?

2/15/2015

0 Comments

 
I decided to learn Python and was recently admitted into Northwestern's Predictive Analytics Graduate program. So I figured, meh! Why not learn both at the same time? My first foray into predictive analytics was a supervised classification model called a decision tree. What is a decision tree? In essence, it's a predictive algorithm that just so happens to be (when drawn out / visualized), well... a tree. I first encountered decision trees in the O'Reilly book Data Science for Business. I used most of Joe McCarthy's primer as the guide for my programming exercises and modified it a bit to better suit my programming style. The decision tree was the first predictive model the book described and, in my opinion, one of the more interesting ones because of its relative simplicity. The authors used a data set of mushroom samples, courtesy of UCI. The aim of the tree was to predict, based on its attributes, whether any additional sample was poisonous or edible. Which leads me to the question: how did they manage to find out whether the samples in the original data set were poisonous or edible? The sample data set can be found here: Mushrooms.
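To give a feel for how such a tree decides which attribute to ask about first, here is a small sketch in plain Python. It assumes an entropy-based information gain criterion, which this teaser does not state. The attribute names and the four rows are invented for illustration and are not taken from the UCI mushroom data.

```python
from collections import Counter
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(rows, attribute, target="class"):
    """Reduction in entropy from splitting rows on one attribute."""
    parent = entropy([row[target] for row in rows])
    # Group the target labels by each value the attribute takes.
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    # Average the entropy of each group, weighted by the group's size.
    weighted_child = sum(
        len(labels) / len(rows) * entropy(labels) for labels in groups.values()
    )
    return parent - weighted_child


def best_attribute(rows, attributes, target="class"):
    """Pick the attribute whose split gives the largest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical toy rows, made up for this example.
rows = [
    {"odor": "foul", "cap_color": "brown", "class": "poisonous"},
    {"odor": "foul", "cap_color": "white", "class": "poisonous"},
    {"odor": "none", "cap_color": "brown", "class": "edible"},
    {"odor": "none", "cap_color": "white", "class": "edible"},
]

print(best_attribute(rows, ["odor", "cap_color"]))
```

In these toy rows, splitting on `odor` separates the two classes perfectly while `cap_color` tells us nothing, so `odor` is chosen as the first question. A full tree would repeat the same choice on each resulting subset.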


Read More

    Rion Angeles

    Attention to detail? Nah, attention to the whole picture.
