Aaron Gallant
Data Science Curriculum Lead

Chukwuemeka Okoli
ML engineer at Ledios
Former Petroleum engineer

The idea behind A/B testing is to present different content to different user groups, collect users' reactions and behavior, and use these results to build an improved product or marketing strategy. It is a method of comparing multiple versions of a page, button, form, landing page, and so on.

While we explain A/B testing in more detail in “A/B Testing: How to Arrive at the Best Product”, we figure you should get to see this method in action. Let’s walk through the A/B test below through the lens of a data analyst.

A situation where A/B testing is required

Meet George, a data analyst. He works for a company that has created a neural network that draws art from a user's description in a mobile app. The product was successfully released, and users are excited! But George noticed that about 30% of potential customers do not complete registration. To fix this, he needs to increase the number of active users by getting more people to finish the registration process.

So, George’s coworkers develop several possible solutions. Now, it’s time to test them out. 

Collect the data

To start A/B testing, George needs to collect user data. Specifically, he wants to know what users click on most often, where they linger, and so on. So he researches the audience and conducts what is called conversion research, which includes a website diagnosis, Google Analytics reports, qualitative research, and mouse tracking.

But George doesn’t do this alone. He collects all the data and conversion research results using a dedicated framework like ResearchXL, a conversion optimization framework that covers each of the steps above. In general, data analysts can use whatever tools are adopted throughout the company.

Specifying a metric

George has collected data, researched the situation, and now he and his colleagues will have to prioritize exactly what the company will test.

To do this, he needs to define a metric. This is an indicator that will show whether the test version of the website is more successful than the original.

There are a lot of metrics to consider, like:

  • Conversion rate
  • Number of orders and clicks
  • Rate of users who returned to the website or who left the website at a certain step

Recall that George found a problem with the registration page. From this, he concluded that the most effective metric is the rate of registered users.  

Therefore, George can monitor the change in this metric to see if the new version of the signup page is better than the previous one. If the conversion rate increases, he’s on the right track.
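The metric itself is simple arithmetic: the share of visitors who finish signing up. As a quick sketch (the visitor and signup counts below are made up for illustration, not taken from George's data), it could be computed like this:

```python
# Registration (conversion) rate: the share of visitors who complete signup.
def registration_rate(signups: int, visitors: int) -> float:
    """Return the fraction of visitors who finished registration."""
    return signups / visitors

# Hypothetical traffic numbers for illustration
before = registration_rate(signups=300, visitors=1000)
print(f"Registration rate: {before:.0%}")  # -> Registration rate: 30%
```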

Hypothesis formulation

A hypothesis is an assumption that’s either confirmed or disproved by an experiment.

The app George is working on has a registration page. He needs to determine what exactly discourages potential users. The fields, perhaps? Or the text size? What if our data analyst put a new, attention-grabbing image on it? A well-formulated hypothesis will help make sense of it.

Imagine you are driving a new route to a new coffee shop when, suddenly, you come to a fork in the road. You have two choices: go straight ahead or turn left. Will going straight get you to the coffee shop, or won't it? This is roughly how data scientists and data analysts formulate hypotheses for A/B testing.

In A/B testing there are two versions with corresponding hypotheses: the “control” or A version and the “treatment” or B version. The A version has the null hypothesis, which says that A and B do not differ. The B version has the alternative hypothesis, which says that the B version is different from the A one.

For example, the null hypothesis is that you will arrive at the coffee shop if you go straight: “If I go straight ahead, I will arrive at the coffee shop I want.” That is, it assumes the status quo holds and the results of A and B do not differ. The task is to try to disprove this hypothesis to solve the original problem.

The alternative hypothesis assumes that B is different from A: “If I go straight ahead, I won’t arrive at the coffee shop I want.”

But what about George’s app? How is he going to raise the number of registered users? The current registration page has a banner and a registration form. He decided to test the banner, because it’s more visible than the fields, their positioning, or the text size.

First, he needs to find out whether changing the image will increase the percentage of registrations. So the alternative hypothesis is “People will be more likely to register in the app when we add a larger image,” and the null hypothesis is that the change won’t make a difference.

Creating variations

After coming up with a hypothesis, George asked the developers to create a new version of the registration page, the B version, that reflects the changes he wants to test. The developers changed the image on the registration page, making it bigger, funnier, and more colorful.

Next, our data analyst identified the control and experimental groups. To do this, he answered the question: “Which users do you want to test: all users or just users from a certain country?”

The answer to this question depends on the initial goal. Usually, data analysts select users by type, the platform they use, geography, and other criteria.

George wants to increase the number of registered users in the United States, so it does not make sense to include users from France in the A/B test. But if George had decided to roll the change out globally, then he would include them.

The next step is to determine the minimum number of users. With the baseline conversion rate at 30%, it would be great if this metric increased by at least 10%. So George plugged these numbers into a special online sample size calculator to see what sample size was needed.

The calculator showed there should be at least 337 users in each of the two groups.
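As a sketch of what such a calculator does under the hood, here is the standard two-proportion sample size formula in Python, assuming a 30% baseline, a 10 percentage point minimum detectable effect, 5% significance, and 80% power. The exact number a given calculator returns will vary slightly with its assumptions, so don't expect it to match 337 exactly:

```python
import math
from statistics import NormalDist

def sample_size_per_group(p1: float, p2: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Minimum users per group for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # ~0.84 for 80% power
    p_bar = (p1 + p2) / 2                           # pooled proportion
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

# Baseline 30%, hoping to detect a lift to 40% (a 10-point increase)
print(sample_size_per_group(0.30, 0.40))  # -> 356 with these assumptions
```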

Time matters as well. Usually, the total sample size is divided by the daily traffic, and the result is the number of days it takes to run the test. As a rule, it is at least one week. The reason is “day-of-week effects”: user behavior varies throughout the week, and Saturdays are different from Tuesdays. Therefore, if you take a sample in less than a week, it won’t be representative.

George allowed two weeks for testing.

Test running

It’s time for the experiment. The company lets the two groups of users sign up for the app on different pages: the control (A) page and the test (B) page. The A/B test was performed on closed test pages.

To conduct the testing effectively, George followed the ground rules. At the beginning of the experiment, he checked that everything really worked.

And he didn’t rush to finish the experiment before it was over, because he knows the A/B test of the different registration pages should run the full two weeks. By the end of the first week, some data analysts would see that people are more likely to sign up for the app and stop the testing right there.

Even if the result seems obvious, finishing the test at this point is a big mistake. The reason is simple — until you have collected the appropriate amount of data, you can’t be sure your results are valid.

Analysis of results

This is the best part! George finally got the data and used Python and statistics to understand what the data said.

It seems George conducted the A/B test well! The data showed that the B version won. The test confirmed that there is a difference between the versions, leading him to reject the null hypothesis and accept that the B version is superior.

So George was right — the B version is more productive than the A version. The big image attracted users' attention, and the number of registrations increased.

But it could have gone the other way: George could have failed to reject the null hypothesis, in which case the A version would stay by default. That would mean there is no detectable difference between the versions, and George would have to figure out why the new version didn't work as expected.
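As a sketch of the kind of analysis George might run in Python (the registration counts below are hypothetical, not from the article), a two-proportion z-test compares the registration rates of the two pages:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(success_a: int, n_a: int,
                         success_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for a difference between two proportions."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical results: 101/337 registered on page A, 135/337 on page B
z, p = two_proportion_ztest(101, 337, 135, 337)
print(f"z = {z:.2f}, p = {p:.4f}")
if p < 0.05:
    print("Reject the null hypothesis: the B version performs differently.")
```

With these made-up counts the p-value falls below 0.05, so the null hypothesis would be rejected, mirroring George's outcome.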

How to master A/B testing

To become an A/B testing expert, you have to practice through trial and error because every experiment is unique. And in each of them, there is something to learn.

Data scientists and data analysts do not conduct A/B testing the way marketers do. Instead, they use Python, which improves the quality and reproducibility of the experiment.

You can get all this at Practicum’s Data Science Bootcamp! In the nine-month online program, you will learn essential IT and data skills, like how to use Python for data processing and A/B testing.

Your path to the programming world will be easier with an experienced tutor, code reviewers, and tech support, who are always there to assist you with any issues and celebrate your triumphs. Additionally, our Career Team will give you the tools you need to find the role of your dreams. But if you don’t get a job within six months of graduating, you will get 100% of your tuition back.

Last but not least

As you can see, A/B testing is an important method in a data-driven world where businesses need to decide by facts and numbers. And whether your A/B test was successful or not, treat each experiment as a learning opportunity. Use what you have learned to formulate a hypothesis in the next test – all your previous testing experience will help you to meet new challenges.


Ready to hustle?

Jumpstart your new tech career by becoming a Practicum student.
Apply now