A/B Testing Metric and Configuration Attributes
Configuration attributes and metrics are the building blocks of an A/B test. They help you define the purpose of your test and track the outcome of your experiments. Choosing the correct configuration attributes and metrics is critical to running a useful test.
Configuration attributes
Configuration Attributes | Definition |
---|---|
Control | The group of customers in your A/B test that continues to receive your current skill experience. The metrics captured from this group establish the baseline for your test. |
Guardrail Metrics | Metrics that you set up to track and detect unexpected regressions caused by your new treatment experience. For details about the metrics you can use as guardrails, see Metrics. To track your guardrail metrics, view them on the Analytics tab. The A/B test doesn't send alerts based on changes in guardrail metrics. |
Hypothesis | An assumption you make before your A/B test starts that predicts or defines the outcome of your test. You use this field only to document the purpose of your A/B test; it doesn't affect the test outcome. |
Key Metrics | Metrics that you set up to track and detect expected changes caused by your new treatment experience. These metrics should help you determine whether your treatment benefits your skill. For details about the metrics you can use as key metrics, see Metrics. |
Traffic Exposure | The percentage of customers who have enabled your skill and participate in your A/B test. For example, if you have 100 total customers and set Traffic Exposure to 40 percent, you include 40 customers in your test: 20 customers in your C group and 20 customers in your T1 group. The remaining 60 customers aren't included in the test. They receive the default behavior, equivalent to C, but they don't contribute to the test metrics. |
Treatment | The group of customers that receives your new skill experience while your test is running. |
P-Value | The probability of observing a difference from zero at least as extreme as the measured result, assuming that the null hypothesis is true. |
User Count | The number of users included in the skill version you're testing. |
Percent Diff | The relative percent difference between the mean of the T1 group and the mean of the C group. |
Confidence Interval | A range of values that expresses the uncertainty associated with a given measurement of a parameter of interest. |
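To make the Traffic Exposure example and the P-Value, Percent Diff, and Confidence Interval definitions concrete, here is a minimal Python sketch. It assumes an even split between C and T1, treats Percent Diff as relative to the C mean, and uses a two-sided normal-approximation test with an unpooled standard error. These are illustrative assumptions, not the statistical method the A/B testing service uses, and the function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def split_traffic(total_customers: int, exposure_pct: float) -> dict:
    """Split customers as in the Traffic Exposure example.

    Assumption: C and T1 get an even split; any odd customer goes to T1.
    """
    in_test = round(total_customers * exposure_pct / 100)
    control = in_test // 2
    return {
        "C": control,
        "T1": in_test - control,
        "excluded": total_customers - in_test,  # receive C behavior, not measured
    }


def percent_diff(control: list[float], treatment: list[float]) -> float:
    """Relative difference of the T1 mean versus the C mean, in percent."""
    return (mean(treatment) - mean(control)) / mean(control) * 100


def compare_means(control, treatment, alpha=0.05):
    """Two-sided normal-approximation test on the difference of means.

    Returns (difference, p_value, (ci_low, ci_high)).
    """
    diff = mean(treatment) - mean(control)
    # Unpooled (Welch-style) standard error of the difference of means.
    se = math.sqrt(variance(treatment) / len(treatment)
                   + variance(control) / len(control))
    z = diff / se
    # P-value: chance of a difference at least this far from zero if H0 is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return diff, p_value, (diff - z_crit * se, diff + z_crit * se)


print(split_traffic(100, 40))  # {'C': 20, 'T1': 20, 'excluded': 60}
```

A confidence interval that contains zero corresponds, under this approximation, to a p-value above `alpha`, so the two outputs should tell a consistent story.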
Metrics
You can designate any of the following metrics as either a key metric or a guardrail metric.
Select one to three key metrics that track changes in customer behavior related to your hypothesis. For example, if your hypothesis states that changing the location of your ISP upsell messaging might increase customer subscriptions, then you might select ISP : Offer Accept Rate and ISP : OPS as key metrics.
For an example of how to use these metrics in a test, see Set up an Endpoint-based A/B test.
Metric Name | Description |
---|---|
Customer Friction | A metric calculated from various customer interaction patterns and other contextual signals to predict whether a customer perceived friction. |
ISP : OPS | The amount of revenue generated from ISP sales. |
ISP : Sales | The number of sales (quantity) generated from ISP offers. |
ISP : Offer Accept Rate | The total number of offers accepted divided by the total number of offers delivered. Accepted offers are counted before the payment completes successfully. |
ISP : Offer to Purchase Conversion | The total number of ISP purchases completed divided by the total number of offers delivered. |
Skill Next Day Retention | How often a customer uses your skill in a day or a set of consecutive days. |
Skill Utterances | Tracks customer sessions with your skill. Each session counts as one dialog, regardless of the number of interactions (multiple turns) within the session. |
Skill Active Days | The number of days on which a customer uses your skill, calculated from your skill's dialog data. A day counts as active if the customer has at least one dialog on that day. |
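The ratio and count metrics above can be sketched over a hypothetical event log. The `Event` record, its `kind` values, and its `session_id` and `revenue` fields are assumptions for illustration only, not the format of your skill's analytics data. The sketch also assumes the rates should read 0.0 when no offers were delivered.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """Hypothetical analytics record; not an actual Alexa data format."""
    customer_id: str
    day: date
    kind: str  # "offer_delivered", "offer_accepted", "purchase_completed", or "dialog"
    session_id: str = ""  # used only for "dialog" events
    revenue: float = 0.0  # used only for "purchase_completed" events


def count(events, kind):
    return sum(1 for e in events if e.kind == kind)


def isp_metrics(events):
    delivered = count(events, "offer_delivered")
    accepted = count(events, "offer_accepted")  # counted before payment completes
    purchases = [e for e in events if e.kind == "purchase_completed"]
    return {
        "ISP : Sales": len(purchases),
        "ISP : OPS": sum(e.revenue for e in purchases),
        "ISP : Offer Accept Rate": accepted / delivered if delivered else 0.0,
        "ISP : Offer to Purchase Conversion": (
            len(purchases) / delivered if delivered else 0.0),
    }


def skill_utterances(events):
    # One dialog per session, no matter how many turns the session contains.
    return len({(e.customer_id, e.session_id) for e in events if e.kind == "dialog"})


def skill_active_days(events):
    # A day is active for a customer if it has at least one dialog.
    days = defaultdict(set)
    for e in events:
        if e.kind == "dialog":
            days[e.customer_id].add(e.day)
    return {customer: len(d) for customer, d in days.items()}


d1, d2 = date(2023, 10, 1), date(2023, 10, 2)
events = [
    Event("a", d1, "offer_delivered"), Event("b", d1, "offer_delivered"),
    Event("a", d1, "offer_accepted"),
    Event("a", d1, "purchase_completed", revenue=0.99),
    Event("a", d1, "dialog", session_id="s1"),  # turn 1
    Event("a", d1, "dialog", session_id="s1"),  # turn 2, same session
    Event("a", d2, "dialog", session_id="s2"),
]
print(isp_metrics(events))
print(skill_utterances(events))   # 2
print(skill_active_days(events))  # {'a': 2}
```

Note that the two turns in session `s1` count as a single dialog, which is why Skill Utterances is 2 rather than 3.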
Related topics
- About A/B Testing
- Set up an Endpoint-based A/B test
- Configuration Attributes and Metrics
- Troubleshoot A/B Tests
Last updated: Oct 13, 2023