Closely following the launch of the A/B Testing service for iOS, Android, and Fire OS apps, we have just released an update addressing one of our most popular feature requests. You can now track up to ten goals in a single A/B test, which means you can see how your experiment affects up to ten metrics at once. This is especially powerful when the metrics aren’t entirely independent, making it difficult to design separate A/B tests that isolate them from one another. Let me illustrate with an example.
Say you have a mobile game that generates revenue using a combination of in-app purchasing (IAP) and mobile ads. You know that player engagement is the key to monetization, so you decide to test a hunch that more challenging levels will keep players in the game longer.
You create an A/B test project for your app, adding an experiment that allows you to adjust the overall difficulty of each level. Since you can have up to five variations for each test (see A/B/n testing for more information), you decide to measure player engagement when the game is much harder, slightly harder, slightly easier, and much easier than normal. “Normal” will be a variation of its own, called the Control.
In this case, you create a test variable called difficultyMultiplier, which your code can access and use to modify its behavior for each user. For the control group (60% of players in this example), difficultyMultiplier is 1.00, indicating no change from the default difficulty. Each of the other groups sees a different value for difficultyMultiplier, depending on how hard the game should be for those players.
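In client code, reading the variable might look something like the sketch below. The ExperimentClient interface and its getDouble method are stand-ins for whatever A/B testing client your app actually uses, not the real SDK API; the point is simply that the game asks for difficultyMultiplier per user and falls back to the control value of 1.00 if no variation has been assigned yet.

```java
// Minimal stand-in for the A/B testing client; the interface and method
// names here are illustrative only, not the actual SDK surface.
interface ExperimentClient {
    double getDouble(String variableName, double defaultValue);
}

public class DifficultyConfig {

    // Control value: 1.00 means no change from the default difficulty.
    private static final double DEFAULT_DIFFICULTY_MULTIPLIER = 1.00;

    private final ExperimentClient experiments;

    public DifficultyConfig(ExperimentClient experiments) {
        this.experiments = experiments;
    }

    /** Returns this user's difficulty multiplier for the assigned variation. */
    public double getDifficultyMultiplier() {
        // Fall back to the control value if the variation hasn't been fetched yet,
        // so offline or first-launch players always get a sensible difficulty.
        return experiments.getDouble("difficultyMultiplier", DEFAULT_DIFFICULTY_MULTIPLIER);
    }

    /** Scales a level's base difficulty rating by the per-user multiplier. */
    public double scaledDifficulty(double baseDifficulty) {
        return baseDifficulty * getDifficultyMultiplier();
    }
}
```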
To measure the effect of changing this variable, you define a view event and a conversion event, which your code records as they happen and reports to the A/B Testing service. For the purposes of this test, you consider it a view whenever a player starts a new game session. A conversion is registered if that player goes on to play for five minutes or more. The A/B Testing service tabulates these events by variation and reports on the conversion rate for each group of users.
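Recording those events from the game might look roughly like the following sketch. The EventRecorder interface and the event names are placeholders invented for illustration, not the service’s actual API: the view is reported when a session starts, and the conversion is reported at most once per session when the five-minute mark is reached.

```java
import java.util.concurrent.TimeUnit;

// Illustrative stand-in for whatever event client your app actually uses.
interface EventRecorder {
    void recordEvent(String eventName);
}

public class SessionTracker {

    private static final long CONVERSION_THRESHOLD_MS = TimeUnit.MINUTES.toMillis(5);

    private final EventRecorder recorder;
    private long sessionStartMs;
    private boolean conversionRecorded;

    public SessionTracker(EventRecorder recorder) {
        this.recorder = recorder;
    }

    /** Call when the player starts a new game session: this is the "view". */
    public void onSessionStart() {
        sessionStartMs = System.currentTimeMillis();
        conversionRecorded = false;
        recorder.recordEvent("sessionStarted");           // view event
    }

    /** Call periodically (for example, once per minute) while the session runs. */
    public void onSessionTick() {
        long elapsed = System.currentTimeMillis() - sessionStartMs;
        if (!conversionRecorded && elapsed >= CONVERSION_THRESHOLD_MS) {
            recorder.recordEvent("playedFiveMinutes");     // conversion event
            conversionRecorded = true;                     // report at most once per session
        }
    }
}
```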
Say you run the experiment and discover your hunch was right: harder levels are played longer, leading to an increase in the average amount of time players engage with your game. The logical next step would be to ratchet up game difficulty. But what if improved engagement isn’t the whole story? Changing the difficulty may affect other metrics you care about, but you can’t always tell based on a single type of conversion event. For example, how does this change the way people share their progress on Facebook, a major customer acquisition channel? How does it impact ad click-through rates? Does it impact how users rate the game? Setting multiple goals can help you detect such unintended consequences and choose the variation that delivers balanced results.
Now that the latest version of the A/B Testing service allows a single view event to be associated with up to ten different conversion events (goals), you can measure and compare the impact of each variation along more than one axis. Each goal can be maximized or minimized independently. For example, here you are trying to maximize game sessions, in-app purchases, ad clicks, and Facebook shares while minimizing one-star reviews, all in the same experiment.
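In practice, adding the extra goals is mostly a matter of reporting more named conversion events against the same session view. Continuing with the same hypothetical EventRecorder stand-in (these event names are invented for the example, and whether each goal is maximized or minimized is configured in the experiment, not in code), the game might report them like this:

```java
// Reuses the hypothetical EventRecorder stand-in; names are illustrative only.
interface EventRecorder {
    void recordEvent(String eventName);
}

public class GoalEvents {

    private final EventRecorder recorder;

    public GoalEvents(EventRecorder recorder) {
        this.recorder = recorder;
    }

    // Conversion events for goals you want to maximize.
    public void onInAppPurchase() { recorder.recordEvent("inAppPurchase"); }
    public void onAdClick()       { recorder.recordEvent("adClicked"); }
    public void onFacebookShare() { recorder.recordEvent("facebookShare"); }

    // Conversion event for the goal you want to minimize.
    public void onOneStarReview() { recorder.recordEvent("oneStarReview"); }
}
```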
When generating reports, the A/B Testing service includes the results for all goals associated with an experiment, organized by variation. The service highlights the “best” variation with respect to each goal, so you can tell at a glance which one resulted in the most game sessions, for example (Variation C), or maximized shares on Facebook (Variation A).
When goals overlap or depend on one another, as they do here, there may be no single variation that definitively “wins” every goal. A report like the one above, however, can help you make an educated choice, weighing the trade-offs of each alternative. In this case, Variation B looks like a good candidate since it succeeded in minimizing one-star reviews and came close to winning several other goals as well. When you look at the big picture, Variation B appears to have the best performance overall.
The orange checkmarks indicate which results achieved statistical significance, meaning there are enough measurements to be confident that the observed change is actually due to the test variation rather than to chance. More details are available for each individual goal, so you can drill down on, for example, the ad clicks associated with each variation:
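The service computes this for you, but for intuition, one standard way to judge significance for conversion rates is a two-proportion z-test. The sketch below, with made-up numbers, shows the idea; it is not how the A/B Testing service necessarily performs its analysis.

```java
// Illustrative two-proportion z-test: a common way to judge whether a variation's
// conversion rate differs from the control's by more than chance. The A/B Testing
// service runs its own analysis; this is only a sketch with invented numbers.
public class SignificanceSketch {

    /** Returns the z-score for the difference between two conversion rates. */
    static double zScore(long controlViews, long controlConversions,
                         long variationViews, long variationConversions) {
        double p1 = (double) controlConversions / controlViews;
        double p2 = (double) variationConversions / variationViews;
        // Pooled conversion rate under the null hypothesis that both groups convert equally.
        double pooled = (double) (controlConversions + variationConversions)
                / (controlViews + variationViews);
        double standardError = Math.sqrt(
                pooled * (1 - pooled) * (1.0 / controlViews + 1.0 / variationViews));
        return (p2 - p1) / standardError;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 10,000 control views vs. 4,000 variation views.
        double z = zScore(10_000, 1_200, 4_000, 560);
        // |z| > 1.96 corresponds roughly to 95% confidence for a two-sided test.
        System.out.printf("z = %.2f, significant at 95%%: %b%n", z, Math.abs(z) > 1.96);
    }
}
```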
With the addition of up to ten goals for a single experiment, the A/B Testing service expands its flexibility and becomes an even more powerful tool for refining your app and optimizing it based on customer behavior. For more information on A/B testing, multiple goals, and how you can incorporate them into your mobile app or game, check out the online documentation.