RION ANGELES
  • BLOG
  • ABOUT ME
  • CONTACT ME

Not A Distant Cousin - K Nearest Neighbor

12/7/2015

Figure 1
I missed the programmatic workflow of Python, so I decided to switch back to Python for this algorithm. Also, please note that K Nearest Neighbor is available in Scikit-learn and can be implemented quickly there, but I wanted to code it from scratch. I discovered K Nearest Neighbor while exploring other simple machine learning algorithms. Initially, I mistook it for a close relative of the K-means algorithm.

The main difference is that K Nearest Neighbor (KNN) is a supervised classification algorithm, whereas K-means is an unsupervised algorithm that groups unlabeled points into clusters.

How does it work? KNN takes the k labeled points closest to an unclassified point and assigns that point the class held by the majority of those k neighbors.
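Here is a minimal from-scratch sketch of that idea in Python. It assumes plain Euclidean distance and a simple majority vote. The names `euclidean` and `knn_classify` and the toy points are hypothetical, made up for illustration. Scikit-learn's `KNeighborsClassifier` covers the same ground with more options.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(query, points, labels, k=3):
    """Return the majority label among the k training points closest to `query`."""
    # Sort training indices by their distance to the unclassified point, keep the first k
    nearest = sorted(range(len(points)), key=lambda i: euclidean(query, points[i]))[:k]
    # Majority vote among the labels of those k neighbors
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labeled training data: two well-separated groups
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["red", "red", "red", "blue", "blue", "blue"]

print(knn_classify((1.8, 1.8), points, labels, k=3))  # red
print(knn_classify((6.2, 6.4), points, labels, k=3))  # blue
print(knn_classify((4.0, 4.0), points, labels, k=3))  # red
print(knn_classify((4.0, 4.0), points, labels, k=5))  # blue
```

The last two calls show why the choice of k matters. The in-between point (4, 4) flips from red to blue when k grows from 3 to 5, because two more blue neighbors join the vote. An odd k also avoids outright ties in a two-class vote.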

A Very OSEMN Data Science / Data Analytic Workflow

3/15/2015

While preparing for my matriculation at Northwestern, I decided to get my hands dirty with some more data analytics. Upon delving deeper into the myriad techniques, I soon stumbled upon a Data Science/Analytics workflow that is not yet an industry standard.
Such workflow, much OSEMN, very structure.
My first encounter with the OSEMN (pronounced "Awe-some") workflow was during a data science research stint. I was bothered by the lack of a standardized, widely accepted workflow for organizing the process of solving problems with Data Science/Analytics. A post by Hilary Mason on the Dataists blog describes the first iteration of what is now called the OSEMN workflow.

What is it? Well, it's a fairly straightforward set of steps that a Data Scientist/Analyst would supposedly follow to solve the ubiquitous problems in their lives.
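OSEMN stands for Obtain, Scrub, Explore, Model, and iNterpret. This toy Python pipeline is a rough sketch of how those steps chain together. Every function name and the ad-spend/sales records are hypothetical. The "model" is just an ordinary least-squares line, chosen for brevity rather than as a recommendation.

```python
import statistics

# O (Obtain): hypothetical hard-coded records standing in for a real data pull
def obtain():
    return [
        {"ad_spend": 1.0, "sales": 3.1},
        {"ad_spend": 2.0, "sales": None},  # a missing value for the Scrub step
        {"ad_spend": 3.0, "sales": 7.2},
        {"ad_spend": 4.0, "sales": 8.9},
    ]

# S (Scrub): drop any record that has a missing value
def scrub(records):
    return [r for r in records if all(v is not None for v in r.values())]

# E (Explore): per-column means as a first look at the data
def explore(records):
    return {col: statistics.mean(r[col] for r in records) for col in records[0]}

# M (Model): ordinary least-squares fit of sales = intercept + slope * ad_spend
def model(records):
    xs = [r["ad_spend"] for r in records]
    ys = [r["sales"] for r in records]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope

# N (iNterpret): turn the fitted numbers into a plain-language statement
def interpret(intercept, slope):
    return (f"Baseline sales of about {intercept:.2f}, plus about {slope:.2f} "
            f"per extra unit of ad spend.")

records = scrub(obtain())
summary = explore(records)
intercept, slope = model(records)
print(summary)
print(interpret(intercept, slope))
```

Each step consumes only the previous step's output. That linear hand-off is what makes the workflow easy to reason about, even though real projects loop back, for example from Explore to Scrub.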

Note: Typing out Scientist/Analyst is proving to be quite tedious, so from here on out I'm coining my own term: "Scanalyst." Ostensibly, this is because the duties of Data Scientists largely overlap with those of Data Analysts; in truth, it's out of laziness. If you wanna read more on the difference between Data Scientists and Data Analysts, check over here.

OBTAIN - Well, what's a scanalyst to do without data? The first step is to retrieve usable data. Typically, the source will already be predetermined. Data can be pulled asynchronously, such as by a Python script that periodically pulls data from an online resource, or synchronously, as in the case of a simple SQL query run against a database.
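Here is a minimal Python sketch of both styles of pull. An in-memory SQLite table stands in for a real database. The helper names `obtain_sync` and `obtain_periodically` and the `sales` table are hypothetical. The periodic loop simply sleeps between pulls. In practice a scheduler such as cron often drives a script like this, and `fetch` would call the online resource rather than the database.

```python
import sqlite3
import time

def obtain_sync(conn, query, params=()):
    """On-demand pull: run a SQL query and return all rows right away."""
    return conn.execute(query, params).fetchall()

def obtain_periodically(fetch, interval_s, n_pulls):
    """Scheduled pull: call `fetch` n_pulls times, sleeping interval_s seconds between calls."""
    batches = []
    for i in range(n_pulls):
        batches.append(fetch())
        if i < n_pulls - 1:  # no need to wait after the final pull
            time.sleep(interval_s)
    return batches

# Hypothetical in-memory table standing in for a real database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("north", 120.0), ("south", 80.5), ("north", 40.0)])

rows = obtain_sync(conn, "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region")
print(rows)  # [('north', 160.0), ('south', 80.5)]

batches = obtain_periodically(lambda: obtain_sync(conn, "SELECT COUNT(*) FROM sales"),
                              interval_s=0.1, n_pulls=2)
print(batches)  # [[(3,)], [(3,)]]
```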

Decisions, Decisions, Decisions, Tree?

2/15/2015

I decided to learn Python and was recently admitted into Northwestern's Predictive Analytics graduate program. So I figured, meh! Why not learn both at the same time? My first foray into predictive analytics involved a supervised classification model called a decision tree. What is a decision tree? In essence, it's a predictive algorithm that just so happens to be (when drawn out or visualized), well... a tree.

I first encountered decision trees in the O'Reilly book Data Science for Business. I used most of Joe McCarthy's primer as the guide for my programming exercises and modified it a bit to suit my own programming style. It was the first predictive model they described, and one of the more interesting ones in my opinion because of its relative simplicity. They cited using a data set of mushroom samples, courtesy of UCI. The aim of the tree was to predict, from its attributes, whether any additional sample was poisonous or edible. Which leads me to the question: how did they manage to find out whether the samples in the original data set were poisonous or edible? The sample data set can be found here: Mushrooms.
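Decision trees like this are commonly grown by repeatedly splitting on the attribute with the highest information gain (the ID3 approach). The sketch below is not Joe McCarthy's code. It is a minimal Python illustration on a handful of made-up, mushroom-flavored rows. The attributes `odor` and `cap_color` and their values are hypothetical and are not records from the UCI data set.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Parent entropy minus the weighted entropy of the children after splitting on attr."""
    n = len(labels)
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [lab for r, lab in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def build_tree(rows, labels, attrs):
    """ID3-style tree as nested dicts {attr: {value: subtree}}; leaves are class labels."""
    if len(set(labels)) == 1:  # pure node: every sample has the same class
        return labels[0]
    if not attrs:              # no attributes left: fall back to the majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    tree = {best: {}}
    for value in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        tree[best][value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx], remaining)
    return tree

def classify(tree, row):
    """Follow the branches for one sample; returns None for a value unseen in training."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr].get(row[attr])
    return tree

# Made-up toy samples (NOT taken from the UCI mushroom data set)
rows = [
    {"odor": "almond", "cap_color": "white"},
    {"odor": "almond", "cap_color": "brown"},
    {"odor": "none", "cap_color": "white"},
    {"odor": "none", "cap_color": "brown"},
    {"odor": "foul", "cap_color": "white"},
    {"odor": "foul", "cap_color": "brown"},
]
labels = ["edible", "edible", "edible", "poisonous", "poisonous", "poisonous"]

tree = build_tree(rows, labels, ["odor", "cap_color"])
print(tree)
print(classify(tree, {"odor": "none", "cap_color": "brown"}))  # poisonous
```

On these toy rows, splitting on `odor` first removes about 0.67 bits of uncertainty, versus roughly 0.08 bits for `cap_color`. That is why `odor` becomes the root, and `cap_color` is only consulted for the ambiguous `none` branch.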

    Rion Angeles

    Attention to detail? Nah, attention to the whole picture.


    Archives

    April 2017
    January 2016
    December 2015
    April 2015
    March 2015
    February 2015
    November 2014


    Categories

    All
    Business Intelligence
    Clustering
    Data Science
    Etsy
    Machine Learning
    Manufacturing
    Marketing
    Optimization
    Predictive Analytics
    Unsupervised
