RION ANGELES

A/B Testing - The Closest You'll Ever Come to Being an Actuary

11/23/2014

I was recently tasked with developing a certain number sense in regard to A/B testing. After Googling around a bit, I noticed that much of the information on A/B Testing was made up of fairly rudimentary introductions that simply skimmed the concepts. There were a myriad of products and services that offered A/B Testing, but these jumped straight to the results of the sample tests. What I wanted was the nitty-gritty, gory, superfluous details of the math. Understanding the Wikipedia definition is one thing; actually being able to wield the math is another. What I thought would be a simple 30-minute Wikipedia read quickly spiraled into a furious mad dash to derive the underlying principles of what is, essentially, everything I forgot from AP Statistics back in high school. Fifteen hours and a bottle of Two Buck Chuck later, I had an elementary grasp of the glorified math behind the aptly named A/B Testing.
What is it? Well, the best way I can describe it is as follows: it's a process used within marketing and business intelligence to measure the effectiveness of certain changes to a product (in most cases, a website). Unfortunately for me, the folks who developed the industry standard around A/B testing managed to spin up their own jargon and parlance around what could have, in my opinion, kept the original statistics-based terminology. Needless to say, much of the time I spent practicing A/B testing was actually spent researching what the Marketing and BI terminology meant. I'll try to elaborate on this later. In its basic statistical form, an A/B Test is a single- or two-tailed hypothesis test, run on (hopefully) randomized samples, that compares the difference between the sampled variants. In hindsight, I can see why they decided to shorten it to A/B Testing.

In layman's terms, the individuals who manage a website may want to optimize the product and increase the occurrence of a certain action that end-users perform, such as "Add to Cart" or "Sign Up." To do this, they may change certain factors surrounding the end-user action, such as changing the color of the "Add to Cart" button or increasing the size of the "Sign Up" button. These changes are referred to as "recipes," "treatments," or "variations," while a completed end-user action is called a "conversion" (and the fraction of users who complete it is the conversion rate). In the world of manufacturing, these measures are referred to as key performance indicators, or KPIs for short. KPIs can vary greatly and are composed of measures and dimensions. For the rest of this post, I'll only use the word conversion.

That's enough for a basic definition of A/B Testing; now it's time for a fake yet somewhat real-world example!

Let's assume a certain company wants to increase the conversion rate for users pressing the "Sign Up" button. They speculate that changing the wording on the button will increase conversion rates, so they decide to change it to an obnoxiously large and capitalized "SIGN UP NOW." Great, now that that's settled, let's set up the data.
  • We have to state a Null and an Alternative Hypothesis to quantify our confidence. I'm going to use the following hypotheses.
[Image: the null and alternative hypotheses]
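(The original image is only a placeholder here, so below is a minimal LaTeX sketch of the hypotheses a one-tailed test like this would typically use; the exact wording in the figure may have differed. Here p_c and p_t denote the true conversion rates of the control and the treatment.)

```latex
% Assumed hypotheses for a one-tailed (right-tail) comparison of two conversion
% rates; p_c is the true conversion rate of the control, p_t of the treatment.
\begin{align*}
  H_0 &: p_t - p_c \le 0 && \text{(the new wording does not improve conversions)} \\
  H_1 &: p_t - p_c > 0   && \text{(the new wording increases conversions)}
\end{align*}
```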


  • Then we use the following sample data, retrieved from online tools, to run the test on the sample populations below. Turns out the treatment shows a 48% increase in conversion rate over the control (a small sketch of the arithmetic follows the table). While some may be tempted to read a causal relationship from this metric alone, it does not by itself prove that the change caused the increase in conversion rate.
[Image: sample data and conversion rates for the control and treatment]
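(The table itself is an image, so here's a small Python sketch of the arithmetic behind it. The visitor and conversion counts are hypothetical stand-ins, chosen so the relative lift lands near the post's 48%; they are not the actual data from the test.)

```python
# A minimal sketch of the arithmetic behind a table like the one above.
# NOTE: the visitor/conversion counts are hypothetical stand-ins (the original
# table was an image); they are not the post's actual data.

control_visitors, control_conversions = 5000, 250
treatment_visitors, treatment_conversions = 5000, 370

p_control = control_conversions / control_visitors        # conversion rate of A
p_treatment = treatment_conversions / treatment_visitors  # conversion rate of B

relative_lift = (p_treatment - p_control) / p_control     # the "X% increase" figure

print(f"control rate:   {p_control:.3f}")
print(f"treatment rate: {p_treatment:.3f}")
print(f"relative lift:  {relative_lift:.0%}")
```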


  • Alright, now it's time to Math! Much math! The data we've gathered so far is pretty useless unless we're able to apply a statistical test. It turns out that we can derive the Standard Error (the standard deviation of the sampling distribution of the conversion rate) using the formula below. We'll have to do this for both the control sample and the treatment sample.
[Image: Standard Error / Standard Deviation of the sample population]
[Image: SE of Control Sample]
[Image: SE of Treatment Sample]
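(The formula images are placeholders here, but the textbook form of the standard error of a sample proportion, which is presumably what they showed, is:)

```latex
% Standard error of a sample proportion (textbook form), applied to the
% control and treatment samples with conversion rates p_c, p_t and sizes n_c, n_t.
SE = \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}, \qquad
SE_c = \sqrt{\frac{\hat{p}_c\,(1-\hat{p}_c)}{n_c}}, \qquad
SE_t = \sqrt{\frac{\hat{p}_t\,(1-\hat{p}_t)}{n_t}}
```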


  • So after calculating the Standard Error for both samples, we now have enough information to apply a statistical distribution model. Please note that the Standard Error often gets conflated with the Margin of Error or the Confidence Interval; strictly speaking, the margin of error is the SE scaled by a critical value, and the confidence interval is the estimate plus or minus that margin. Let's talk about our data a bit. We have 2 samples, one being our control and one being our treatment. Because we have an estimate of our sigma (the SE), we can run a single-tailed test against the normal distribution and use an old-fashioned z table (yes, the kind you find in the back of your old Statistics textbook). We can use the conversion rates we calculated earlier as the means of the normal distributions of both samples. Also, we're particularly interested in the difference in results between these 2 samples. You catch that? Yep, we're interested in the difference, and some long-dead statistician proved that the difference of 2 independent, normally distributed estimates is also, you guessed it, normally distributed, with a variance equal to the sum of the two variances. I'll get straight to the point: you can use a formula. Take a gander at the beauty below.
[Image: formula and calculation for the z-score of the difference between the two conversion rates]
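(Again the image is a placeholder; the usual form of that formula, with the variance of the difference being the sum of the two variances, is:)

```latex
% z-score of the difference in conversion rates; the variance of a difference of
% independent (approximately) normal estimates is the sum of their variances.
z = \frac{\hat{p}_t - \hat{p}_c}{\sqrt{SE_c^{\,2} + SE_t^{\,2}}}
```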
  • Now that we have our z score, we can use a z table for a single-tailed normal distribution to interpret our results. The industry standard is a 95% confidence level, and the z table in an old statistics textbook says that a 95% confidence level for a single-tailed test to the right corresponds to a z score of about 1.645 standard deviations. That being said, our derived z score is 3.02 standard deviations from the mean. This is well above the needed 1.645 and puts us at a confidence level above 95%. According to the z table, a z score of 3.02 corresponds to roughly a 99.87% confidence level.
[Image: z-table lookup for the derived z score]


So what does it all mean? Well, with a z score of 3.02, our p-value comes out to roughly .0013. This is substantially lower than our significance level of .05, which means we can safely reject our null hypothesis and conclude that the change we made to the button did in fact drive a rise in our conversion rate.
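For anyone who wants to reproduce the mechanics end to end, here is a Python sketch of the whole one-tailed two-proportion z-test, reusing the same hypothetical counts as the earlier sketch. Because those counts are stand-ins rather than the post's actual data, the printed z and p values will not match the 3.02 and .0013 quoted above; scipy's norm.sf stands in for the paper z table.

```python
# End-to-end sketch of the one-tailed two-proportion z-test walked through above.
# NOTE: the counts below are hypothetical stand-ins, not the post's data.
from math import sqrt
from scipy.stats import norm

control_visitors, control_conversions = 5000, 250
treatment_visitors, treatment_conversions = 5000, 370

p_c = control_conversions / control_visitors       # control conversion rate
p_t = treatment_conversions / treatment_visitors   # treatment conversion rate

# Standard error of each sample proportion: sqrt(p * (1 - p) / n)
se_c = sqrt(p_c * (1 - p_c) / control_visitors)
se_t = sqrt(p_t * (1 - p_t) / treatment_visitors)

# z-score of the difference; variance of the difference = sum of the variances
z = (p_t - p_c) / sqrt(se_c**2 + se_t**2)

# One-tailed p-value: chance of a difference at least this large under the null
p_value = norm.sf(z)

alpha = 0.05  # significance level (the 95% confidence threshold)
print(f"z = {z:.2f}, p = {p_value:.2g}")
print("Reject the null hypothesis" if p_value < alpha else "Fail to reject the null hypothesis")
```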
Comments
Travis Deyle
11/24/2014 02:51:43 am

You should really check out some of Evan Miller's writings on the subject. E.g., he has a good statistical significance calculator that tells you how many events you need [1], and some good (mathy) descriptions of the pitfalls of repeated significance testing [2].

[1] http://www.evanmiller.org/ab-testing/sample-size.html

[2] http://www.evanmiller.org/how-not-to-run-an-ab-test.html

You might also look into Bayesian significance methods and multi-armed bandit algorithms, which can both be interesting alternatives to 1-1 A/B testing.

<Prepare to lose a few days reading! Heh.>



