About A/B Testing


A/B testing lets you measure and compare real-time feedback from your users by simultaneously deploying two versions of the same skill, so you can experiment with specific variables and determine which version performs better. This process helps you make data-driven decisions about launches and new feature releases.

For example, you can use preconfigured test metrics to identify whether a new update causes issues with a skill, or you can test new features to try to increase customer engagement.

What can I do with A/B tests?

You decide what types of tests you want to run on your skill, depending on the hypothesis you're evaluating. However, there are some limitations to the types of skill attributes you can test.

  • You can run A/B tests on the following skill attributes: Anything served by your skill's AWS Lambda function or skill code, including APL-A data.
  • You can't run A/B tests on the following skill attributes: New locale launches, invocation name changes, permission changes, account linking changes, ISP product pricing changes (such as free trial length), in-skill purchase prompts, previewed content, interaction model changes, and skill manifest changes.

The following is a sample test that you could run.

Test category: Endpoint
Example test: Split your skill code into two different versions with conditional logic.
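Because anything served by your skill code can be tested, a treatment branch can return different response content than the control, such as an APL-A audio response in place of plain speech. The following is a minimal sketch of that idea; the treatment check mirrors the Node.js example later on this page, and the APL-A document shown here is a bare placeholder rather than a production document.

    // Inside a request handler: serve different response content per branch.
    // The APL-A document below is a minimal placeholder for illustration.
    if (experiment?.treatmentId === 'T1') {
      // Treatment (T1): respond with an APL-A audio document.
      return handlerInput.responseBuilder
        .addDirective({
          type: 'Alexa.Presentation.APLA.RenderDocument',
          token: 'welcomeAudio',
          document: {
            type: 'APLA',
            version: '0.91',
            mainTemplate: {
              parameters: ['payload'],
              items: [{ type: 'Speech', content: 'Welcome back! Try our new feature.' }],
            },
          },
        })
        .getResponse();
    }
    // Control (C): keep the existing plain speech response.
    return handlerInput.responseBuilder
      .speak('Welcome back!')
      .getResponse();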

How A/B tests work

When a customer invokes your skill, they randomly receive one of two versions of your skill, either a control version or a treatment version.

  • Control version (C) – The current experience of your live skill, before you started your test.
  • Treatment version (T1) – The new experience of your skill. This is the version of the skill you're testing, which contains your updated code changes.

To make sure you receive an accurate comparison, A/B tests are conducted blind, meaning users aren't aware of whether they receive the control version or the treatment version. At the end of the test, you can choose whether to publish the treatment version to all users or revert to the control version.

Types of A/B tests you can run

  • Endpoint-based test – You use a single version of a live skill to run your A/B test. You define your control and treatment experiences by adding conditional statements to the skill code of your live skill. These statements branch your skill into your C and T1 versions.

Eligibility criteria

To run an A/B test, your skill must meet the eligibility criteria for the type of test that you want to run.

Endpoint-based A/B test

To run an endpoint-based A/B test, your skill must meet the following eligibility criteria.

  • Your skill must be live.
  • Your skill must use a custom voice interaction model (custom skill).
  • Your skill must have a sufficient number of monthly users.

How to split your test into C and T1

Use the following instructions to separate the T1 and C behavior of your skill.

Endpoint-based test

When you run an endpoint-based test, you must branch your skill code into your C and T1 versions. The following code example illustrates how you create these branches.

For step-by-step details on how to create an endpoint-based test, see Set up an Endpoint-based A/B test.

Node.js example

    const experiment = handlerInput.requestEnvelope.context.Experimentation?.activeExperiments?.[0];

    if (experiment) {
      // The treatment group receives treatmentId 'T1'; the control group receives 'C'.
      if (experiment.treatmentId === 'T1') {
        return handlerInput.responseBuilder
          .speak('treatment response')
          .getResponse();
      }
      return handlerInput.responseBuilder
        .speak('control response')
        .getResponse();
    }

    // The request isn't part of the experiment, for example because the user
    // falls outside the exposure percentage.
    return handlerInput.responseBuilder
      .speak('not exposed to treatment')
      .getResponse();
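In practice, this branching lives inside one of your skill's request handlers. The following is a minimal, self-contained sketch of that wiring using the ask-sdk-core package; the handler and the spoken strings are illustrative placeholders, not part of the example above.

    const Alexa = require('ask-sdk-core');

    const LaunchRequestHandler = {
      canHandle(handlerInput) {
        return Alexa.getRequestType(handlerInput.requestEnvelope) === 'LaunchRequest';
      },
      handle(handlerInput) {
        const experiment = handlerInput.requestEnvelope.context.Experimentation?.activeExperiments?.[0];
        // Users outside the experiment default to the control experience.
        const inTreatment = experiment?.treatmentId === 'T1';
        const speech = inTreatment ? 'treatment response' : 'control response';
        return handlerInput.responseBuilder.speak(speech).getResponse();
      },
    };

    // Wire the handler into the skill's AWS Lambda entry point.
    exports.handler = Alexa.SkillBuilders.custom()
      .addRequestHandlers(LaunchRequestHandler)
      .lambda();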

Lifecycle of an A/B test

As you run your A/B test, it operates in one of the following primary states: created, enabled, running, stopped, or deleted. These states dictate what actions you can perform on your test at any given time.

There are also the following secondary states, which transition your test between the primary states: enabling, stopping, and failed.

A/B testing state diagram

To take an A/B test from start to finish, you must complete the following required actions: create test, start test, and stop test.

The following workflow summarizes this lifecycle.

  • Create A/B test (required)
  • Delete A/B test (optional)
  • Update A/B test (optional)
  • Enable A/B test (optional)
  • Start A/B test (required)
  • Stop A/B test (required)
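To make the lifecycle easier to reason about, the allowed transitions can be read off the state descriptions later on this page. The following sketch encodes them as a plain JavaScript map with a hypothetical helper; it's derived from this documentation, not from the API itself.

    // Allowed state transitions, as described in the state details below.
    const TRANSITIONS = {
      CREATED: ['ENABLING', 'RUNNING', 'DELETED'], // starting from CREATED passes through ENABLING automatically
      ENABLING: ['ENABLED', 'FAILED'],             // deploying test settings can fail
      ENABLED: ['RUNNING', 'STOPPED'],
      RUNNING: ['STOPPING'],
      STOPPING: ['STOPPED'],
      STOPPED: [],                                 // terminal: you can't restart a stopped test
      DELETED: [],                                 // terminal
      FAILED: [],                                  // check your test configuration and try again
    };

    // Hypothetical helper: validate a requested transition before calling SMAPI.
    function canTransition(from, to) {
      return (TRANSITIONS[from] || []).includes(to);
    }

    console.log(canTransition('ENABLED', 'RUNNING')); // true
    console.log(canTransition('STOPPED', 'RUNNING')); // false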

Transitioning to states

You use the ASK CLI or SMAPI APIs to transition your A/B test between the following states.

  • Create A/B test API – Targets the CREATED state.
  • Delete A/B test API – Targets the DELETED state.
  • Manage A/B test API – Targets the ENABLED, RUNNING, and STOPPED states.

For more details about using each individual API with the corresponding states, see A/B Testing SMAPI APIs.
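For a sense of the general shape of these calls, each transition is a single SMAPI request authorized with a Login with Amazon access token. The endpoint path and request body below are illustrative assumptions only, so treat A/B Testing SMAPI APIs as the source of truth for operation names and payloads; the skill and experiment IDs are placeholders.

    // Hypothetical sketch (Node.js 18+ for global fetch): the path and body
    // are placeholders, not the documented SMAPI contract.
    const skillId = 'amzn1.ask.skill.example';
    const experimentId = 'example-experiment-id';

    async function manageExperimentState(accessToken, targetState) {
      // targetState would be a value such as 'ENABLED', 'RUNNING', or 'STOPPED'.
      const response = await fetch(
        `https://api.amazonalexa.com/v1/skills/${skillId}/experiments/${experimentId}/state`,
        {
          method: 'POST',
          headers: {
            Authorization: `Bearer ${accessToken}`,
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({ targetState }),
        }
      );
      if (!response.ok) {
        throw new Error(`State transition failed with HTTP ${response.status}`);
      }
    }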

State details

The following tables provide specific implementation details about each state.

Primary states

Create test (CREATED) – Creates your test with the settings you provide.

Transition to this state when you're designing your test.

  • This state doesn't start or enable any test settings for your users.
  • You can update test settings as needed.

Stay in this state for as long as you want to adjust your test settings.

  • To QA your test settings, transition to the ENABLED state.
  • To delete your test, transition to the DELETED state.
  • To start your test without QA testing, transition to the RUNNING state.

Delete test (DELETED) – Deletes your test.

Transition to this state when you want to delete your test entirely. You can delete a test only if it hasn't been enabled yet.

Wait until ASK deletes your test.

Enable test (ENABLED) – Deploys your test settings, but doesn't start your test.

Transition to this state when you want to QA your settings before you push them live to users.

  • Only skills that you own, with a customer treatment override set to T1, receive your T1 experience.
  • Data collection doesn't start. Your testing doesn't impact the results of your test.

Stay in this state for as long as you want to QA your test.

  • To start your test and push it live to users, transition to the RUNNING state.
  • To abandon your test, transition to the STOPPED state.

Start test (RUNNING) – Starts your test.

Transition to this state when you want your users to begin receiving your T1 experience.

  • Your test is shown to the percentage of skill users you defined in your exposurePercentage value. Customers are randomly assigned the T1 or C experience.
  • Data collection begins.

Stay in this state for as long as you want to run your test.

  • To stop your test and analyze your results, transition to the STOPPED state.

Stop test (STOPPED) – Ends your test.

Transition to this state when you want to stop your test to analyze your results.

  • All users receive your C experience.
  • Metrics are no longer collected.
  • You can't restart your test after you transition to this state.

Stay in this state as long as you want to analyze your test metrics.

Secondary states

Enabling test (ENABLING) – Transitory state between CREATED and ENABLED.

This state is temporary and you can't perform any actions on your test during it.

Your test also transitions to this state automatically if you choose to bypass the ENABLED state, because a test can't start without enabling it first.

Wait until your test transitions to ENABLED.

Stopping test (STOPPING) – Transitory state between RUNNING and STOPPED.

This state is temporary and you can't perform any actions during it.

Wait until your test transitions to STOPPED.

Failed test (FAILED) – Your test didn't enable or start.

You can't transition to this state manually. Your test transitions to this state automatically if your test configuration fails to deploy when you try to enable or start the test.

Check your test configurations and try again.

