About the Automatic Speech Recognition (ASR) Evaluation tool

The Automatic Speech Recognition (ASR) Evaluation tool allows you to batch test audio files to measure the ASR accuracy of the skills that you've developed. With the ASR Evaluation tool, you can batch test sample audio utterances against ASR models and compare the expected transcriptions with the actual transcriptions. The tool generates an evaluation report with accuracy metrics and a pass/fail result for each test utterance, which you can use to resolve accuracy issues.
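ASR accuracy is commonly summarized with metrics such as word error rate (WER). The exact metrics and pass/fail criteria in the report are defined by the tool, but the following sketch, in plain Python with hypothetical transcriptions, shows the kind of expected-versus-actual comparison involved.

```python
# Sketch: word error rate (WER) between an expected and an actual transcription.
# The ASR Evaluation tool defines its own metrics and pass/fail rules; this only
# illustrates the expected-versus-actual comparison. The sample strings are hypothetical.

def word_error_rate(expected: str, actual: str) -> float:
    ref, hyp = expected.lower().split(), actual.lower().split()
    # Levenshtein distance over words (substitutions, insertions, deletions).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("order a mocha please", "order a milk please"))  # 0.25 (1 of 4 words wrong)
```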

To learn more about ASR, see What Is Automatic Speech Recognition?.

This page introduces the ASR Evaluation tool in the Alexa developer console and the ASR APIs that you can use if you prefer to run ASR evaluations programmatically.

Benefits of ASR Evaluation

If the users of your skill aren't getting the responses they expect from Alexa, ASR Evaluation can help you troubleshoot speech recognition issues and improve skill performance. ASR Evaluation can help pinpoint commonly misrecognized words for your skill. You can then potentially improve recognition accuracy for those words by mapping them back to the skill's interaction model as sample utterances and slot values.

For example, if you have a coffee-related skill where you expect users to ask Alexa to "order a mocha," ASR evaluation results might show you that sometimes Alexa misunderstands the word "mocha" as "milk." To mitigate this issue, you can map an utterance directly to an Alexa intent to help improve Alexa's understanding within your skill.
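For instance, adding the phrase as a sample utterance and slot value gives the model stronger evidence for the word. The fragment below is a hypothetical interaction-model excerpt, written as a Python dict that mirrors the JSON structure; the intent, slot, and type names are invented for illustration.

```python
# Hypothetical interaction-model fragment (written as a Python dict that mirrors
# the JSON structure) reinforcing "mocha" as a sample utterance and slot value.
# The intent, slot, and type names are invented for this example.
language_model_fragment = {
    "intents": [
        {
            "name": "OrderDrinkIntent",
            "slots": [{"name": "drink", "type": "DRINK_TYPE"}],
            "samples": [
                "order a {drink}",
                "order a mocha",   # literal sample for the commonly misrecognized word
            ],
        }
    ],
    "types": [
        {
            "name": "DRINK_TYPE",
            "values": [
                {"name": {"value": "mocha", "synonyms": ["mocha latte"]}},
                {"name": {"value": "milk"}},
            ],
        }
    ],
}
```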

Overview of the ASR Evaluation process

Use the following process to run the ASR Evaluation tool and improve the speech recognition accuracy and interaction model for your skill:

  1. Create an annotation set of recorded utterances to use for testing.
  2. Run the ASR Evaluation tool.
  3. Use the results from an ASR evaluation to improve your skill's accuracy and interaction model.

ASR APIs

If you prefer to create your annotation sets and run ASR evaluations programmatically instead of through the developer console, Amazon also provides a set of APIs for these tasks.
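The examples in this section call these APIs over plain HTTPS with the Python requests library. The base URL, authentication scheme (a Login with Amazon access token), and request paths shown are assumptions for illustration; confirm them against the individual API references before use.

```python
# Minimal helper for calling the ASR APIs over HTTPS with the requests library.
# Assumptions: the APIs are reachable at https://api.amazonalexa.com and accept a
# Login with Amazon (LWA) access token; verify both against the API reference.
import requests

BASE_URL = "https://api.amazonalexa.com"   # assumed base URL
ACCESS_TOKEN = "<LWA access token>"        # obtained through your LWA client

def smapi(method: str, path: str, **kwargs) -> requests.Response:
    """Send an authenticated request to an ASR API path and raise on HTTP errors."""
    response = requests.request(
        method,
        f"{BASE_URL}{path}",
        headers={
            "Authorization": f"Bearer {ACCESS_TOKEN}",
            "Content-Type": "application/json",
        },
        **kwargs,
    )
    response.raise_for_status()
    return response
```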

Available ASR APIs

The following ASR APIs are available:

  1. Catalog APIs: Create a catalog, Associate a catalog to a skill, Catalog content upload, Complete Upload, and Get Upload.
  2. Annotation set APIs: Create annotation set and Update annotation set annotations.
  3. Evaluation APIs: Post ASR evaluation, Get ASR evaluation status, and Get ASR evaluation results.

API call flow

The following process describes the expected order in which to call the ASR APIs to run an evaluation. A minimal end-to-end sketch in Python follows the list.

  1. Create your audio catalog:
    1. Call Create a catalog to create your new catalog.
    2. Call Associate a catalog to a skill to associate your new catalog with the skill that you're evaluating.
    3. Create the upload for your catalog by calling Catalog content upload.
    4. Upload your .zip file of audio files to the S3 URL returned by Catalog content upload. Audio files must be in .mp3, .wav, .aiff, or .ogg format.
    5. After the upload completes, call Complete Upload.
    6. Call Get Upload to get the ingestion status of your upload.
  2. Create your annotation set:
    1. Call the Create annotation set API to create your empty annotation set.
    2. Call the Update annotation set annotations API to add your uploaded audio utterances to the annotation set.
  3. Run the ASR evaluation by calling the Post ASR evaluation API.
  4. Retrieve your ASR evaluation status and results by calling the Get ASR evaluation status API and Get ASR evaluation results API.
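The following sketch strings these steps together, reusing the smapi() helper shown earlier. Every path, payload field, and status value in it is an illustrative placeholder (as are the skill ID and file names); take the real endpoints and schemas from the API references named in each step.

```python
# End-to-end sketch of the ASR API call flow, reusing the smapi() helper above.
# Every path, payload field, and status value here is an illustrative placeholder;
# take the real endpoints and schemas from the API references named in each step.
import time
import requests

SKILL_ID = "amzn1.ask.skill.example"   # hypothetical skill ID

# Step 1: create a catalog, associate it with the skill, and upload the audio .zip.
catalog_id = smapi("POST", "/v0/catalogs", json={"title": "asr-test-audio"}).json()["id"]
smapi("PUT", f"/v0/skills/{SKILL_ID}/catalogs/{catalog_id}")
upload = smapi("POST", f"/v0/catalogs/{catalog_id}/uploads").json()
with open("utterances.zip", "rb") as audio_zip:   # zipped .mp3/.wav/.aiff/.ogg files
    requests.put(upload["presignedUploadUrl"], data=audio_zip)
smapi("POST", f"/v0/catalogs/{catalog_id}/uploads/{upload['id']}/complete")
print(smapi("GET", f"/v0/catalogs/{catalog_id}/uploads/{upload['id']}").json())  # ingestion status

# Step 2: create an empty annotation set, then add the uploaded utterances with
# their expected transcriptions.
annotation_set_id = smapi(
    "POST", f"/v1/skills/{SKILL_ID}/asrAnnotationSets", json={"name": "coffee-utterances"}
).json()["id"]
smapi(
    "PUT",
    f"/v1/skills/{SKILL_ID}/asrAnnotationSets/{annotation_set_id}/annotations",
    json={"annotations": [{
        "uploadId": upload["id"],
        "filePathInUpload": "order_a_mocha.wav",
        "expectedTranscription": "order a mocha",
    }]},
)

# Step 3: run the evaluation. Step 4: poll its status, then fetch the results.
evaluation_id = smapi(
    "POST", f"/v1/skills/{SKILL_ID}/asrEvaluations", json={"annotationSetId": annotation_set_id}
).json()["id"]
while True:
    status = smapi("GET", f"/v1/skills/{SKILL_ID}/asrEvaluations/{evaluation_id}/status").json()
    if status.get("status") != "IN_PROGRESS":
        break
    time.sleep(10)
print(smapi("GET", f"/v1/skills/{SKILL_ID}/asrEvaluations/{evaluation_id}/results").json())
```

Polling the evaluation status before requesting the results avoids reading a report that is still being generated.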

ASR API error codes

For a reference of errors that can apply to all ASR APIs, see Automatic Speech Recognition (ASR) API Error Reference.