Create an Annotation Set for Automatic Speech Recognition (ASR)

The Automatic Speech Recognition (ASR) Evaluation tool allows you to test audio files to measure the ASR accuracy of your skills. Before you run an ASR evaluation, you create a set of sample audio utterances. This set of utterances is called an annotation set.

Prerequisites

To create your annotation set for ASR testing, you must have the following items:

  • An Amazon developer account.
    To create an account, see Create Your Amazon Developer Account.
  • A set of sample utterances for testing. You have two options for these utterances:
    • Compress pre-recorded audio files containing your utterances into a single .zip file.
    • Record your utterances directly from your computer when you create your annotation set.
  • (Optional) A CSV or JSON file of utterance transcriptions for your annotation set. You can upload this file to avoid having to manually add the expected transcription for each utterance in your annotation set.

The .zip file of utterances has the following requirements:

  • The compressed .zip file can't be larger than 10 MB.
  • The audio files must be in one of the following formats:
    • mp3
    • wav
    • aiff
    • ogg
  • The .zip file can't contain more than 1000 files.
  • Individual audio files can't be larger than 3 MB in size.
  • Individual audio filenames can't contain non-ASCII characters.
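
If you script the packaging step, you can check these limits before you upload. The following is a minimal sketch in Python, assuming your audio files sit in a local directory named utterances and that the output name annotation-set.zip is acceptable (both names are hypothetical). It verifies the file count, per-file size, filename, and format rules listed above, then builds the .zip file.

    # Minimal sketch: package audio utterances into a .zip that meets the stated limits.
    # The directory name "utterances" and output name "annotation-set.zip" are hypothetical.
    import os
    import zipfile

    ALLOWED_EXTENSIONS = {".mp3", ".wav", ".aiff", ".ogg"}
    MAX_FILES = 1000                   # the .zip can't contain more than 1000 files
    MAX_FILE_BYTES = 3 * 1024 * 1024   # individual audio files can't be larger than 3 MB
    MAX_ZIP_BYTES = 10 * 1024 * 1024   # the compressed .zip can't be larger than 10 MB

    def package_utterances(source_dir="utterances", zip_path="annotation-set.zip"):
        audio_files = sorted(
            os.path.join(root, name)
            for root, _, names in os.walk(source_dir)
            for name in names
        )
        if len(audio_files) > MAX_FILES:
            raise ValueError(f"Too many files: {len(audio_files)} > {MAX_FILES}")

        with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            for path in audio_files:
                name = os.path.basename(path)
                if os.path.splitext(name)[1].lower() not in ALLOWED_EXTENSIONS:
                    raise ValueError(f"Unsupported audio format: {name}")
                if not name.isascii():
                    raise ValueError(f"Filename contains non-ASCII characters: {name}")
                if os.path.getsize(path) > MAX_FILE_BYTES:
                    raise ValueError(f"Audio file larger than 3 MB: {name}")
                # Store each file with a path relative to the source directory,
                # for example folder/audio.mp3.
                zf.write(path, arcname=os.path.relpath(path, source_dir))

        if os.path.getsize(zip_path) > MAX_ZIP_BYTES:
            raise ValueError("Compressed .zip is larger than 10 MB")
        return zip_path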

Create an automated annotation set of audio files

Before running ASR testing on a set of sample utterances, take the following steps to generate a set of pre-recorded audio utterances for testing.

To create an automated annotation set of audio files

  1. With your Amazon developer credentials, log in to the Alexa developer console.
  2. From the developer console, navigate to the Build tab.
  3. In the left navigation pane, under Custom, click Annotation Sets to display the NLU Evaluation page.
  4. On the NLU Evaluation page, click the ASR Evaluation tab to go to the ASR Annotation Sets page.
  5. Under Automated Test Sets, click the Generate Test Set button.
  6. Select the data source for your test set:

    • Interaction Model – Use sample utterances in your skill's interaction model to create the test set.
    • Frequent Utterances – Use utterances frequently spoken to your skill to create the test set.
    • Utterances Recommendation Engine – Generate grammatical variations of sample utterances to create the test set.
  7. Click Generate Test Set and wait for your test set to generate.

  8. Review the values for Filename and Expected Transcription in your generated test set.

    Click the speaker icon to listen to the audio. The file that plays maps to the uploadId and filePathInUpload values from the Update Annotation Set Annotations for Automatic Speech Recognition (ASR) API. The expectedTranscription value maps to Expected Transcription in the developer console. For a sketch of how these values fit together, see the example after this procedure.

  9. In the upper-right corner, click Evaluate Model to run the evaluation.

  10. Review and troubleshoot issues with your skill models. The following list describes the expected pass rate and recommendations for improvements for each data source:
  • Interaction Model – An Interaction Model test set should have a pass rate greater than 95%. One common cause of errors is conflicting utterances across similar intents.
  • Frequent Utterances – A Frequent Utterances test set should have a pass rate over 80%. Because this test set contains the utterances that your users are saying to your skill, you can use it to review how your development model responds (or will respond when pushed to production) to live customer utterances.
  • Utterances Recommendation Engine – The Utterances Recommendation Engine test set should have a medium pass rate. The utterances in this set are variations of your sample utterances and can anticipate what a user might say to your skill and your skill's expected response. Review the utterances in this test set, and remove utterances that aren't relevant to your skill. After updating your test set, review all utterances that map to AMAZON.FallbackIntent, if enabled, to find possible unsupported use cases.
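
To see how the console columns relate to the API values mentioned in step 8, the following is a minimal sketch of a single annotation entry written as a Python dictionary. The field names (uploadId, filePathInUpload, expectedTranscription, evaluationWeight) come from the Update Annotation Set Annotations for Automatic Speech Recognition (ASR) API; the values shown are hypothetical, and the exact request shape is documented in that API reference.

    # Minimal sketch of one annotation entry; all values are hypothetical.
    # uploadId and filePathInUpload identify the audio file that plays in the console,
    # expectedTranscription maps to the Expected Transcription column, and
    # evaluationWeight maps to the Weight column (a scale of 1-10).
    annotation = {
        "uploadId": "example-upload-id",                  # hypothetical value
        "filePathInUpload": "folder/audio.mp3",           # path inside the uploaded .zip
        "expectedTranscription": "order a large coffee",  # hypothetical utterance text
        "evaluationWeight": 1,
    }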

Create an annotation set of audio files manually

As an alternative to generating an annotation set, you can create the annotation set manually. You can either upload a .zip file of pre-recorded utterances or record your utterances as part of creating your annotation set. If you have already uploaded your audio files to an Amazon S3 bucket, you can also upload a CSV or JSON file of expected utterance transcriptions and weights to create your annotation set.

To create an annotation set of audio files

  1. With your Amazon developer credentials, log in to the Alexa developer console.
  2. From the developer console, navigate to the Build tab.
  3. In the left navigation pane, under Custom, click Annotation Sets to display the NLU Evaluation page.
  4. On the NLU Evaluation page, click the ASR Evaluation tab to go to the ASR Annotation Sets page.
  5. Under User Defined Test Sets, click the + Annotation Set button to create a new annotation set.
  6. At the prompt, name your annotation set. The page refreshes and displays your newly named annotation set.

  7. Add utterances to your annotation set by using either of the following options:

    • Record your utterances directly from your computer. See Record audio utterances for an annotation set.
    • Upload a .zip file of pre-recorded utterances. See Upload a pre-recorded set of audio utterances.

    You can save partially completed annotation sets with audio and file paths or expected transcriptions. However, you can't evaluate partial sets until the sets are complete.

  8. In the upper-left corner, click the Save Annotation Set button to save your annotation set.

When you have finished adding utterances to your annotation set, you can edit your utterance metadata. See Edit metadata for an utterance.

Record audio utterances for an annotation set

To record audio utterances for an annotation set

  1. From the page for your annotation set, press and hold the Press and Hold to record button.
  2. Speak your utterance.
  3. Release the button when you've finished recording.

After recording your utterance, you can edit its metadata. For more details, see Edit metadata for an utterance.

Upload a pre-recorded set of audio utterances

If you have already recorded a set of audio utterances and compressed them into a .zip file, you can do a batch upload of your utterances.

To upload a pre-recorded set of audio utterances

  1. From the page of your annotation set, click the Upload button.

    A file navigator window opens.

  2. Use the file navigator to navigate to and select the .zip file containing your utterances.
  3. Click Open to upload the file.

    Your .zip file is uploaded to an Amazon S3 bucket.

When you have finished uploading your utterances, you can edit the metadata for individual utterances. See Edit metadata for an utterance.

Upload a CSV or JSON file of expected transcriptions for an annotation set

If you've already uploaded an annotation set of audio files to an Amazon S3 bucket, you can bulk edit the metadata for those files. To avoid having to manually add expected transcriptions for each utterance in your annotation set, you can upload a CSV or JSON file to your annotation set to bulk upload all of your transcriptions at one time.

To upload a CSV or JSON file of expected transcriptions for an annotation set

  1. Create your file with three fields:
    • filePathInUpload – Path in the uploaded .zip file for the utterance. For example, consider a .zip file containing a folder named folder with an audio file named audio.mp3 in that folder. The path is folder/audio.mp3. Use a forward slash ('/') to separate directories.
    • expectedTranscription – Expected transcription for the utterance.
    • evaluationWeight – Assigned weight indicating the importance of the utterance in evaluation.

    The following image shows an example CSV file with valid column headings:

    Sample CSV file

    The following image shows an example JSON file with valid field names:

    Sample JSON file

    For a sketch of both formats, see the example after this procedure.
  2. At the right side of the page for your annotation set, click the Bulk Edit button to open an Upload Annotation Set prompt.

  3. Navigate to your CSV or JSON file and click Open.

  4. On the Upload Annotation Set prompt, click Submit and Save.

    The Expected Transcription and Weight fields automatically populate with the values from your CSV or JSON file.

    Uploaded CSV file
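
The following is a minimal sketch of the transcription file described in step 1, written as a short Python script that emits both the CSV and JSON variants. The column and field names come from step 1; the file paths, transcriptions, weights, and output filenames are hypothetical, and the exact top-level JSON layout shown in the console's sample image may differ from the simple list used here.

    # Minimal sketch: write a transcriptions file in CSV and JSON form.
    # Field names come from step 1; the rows and output filenames are hypothetical.
    import csv
    import json

    FIELDS = ["filePathInUpload", "expectedTranscription", "evaluationWeight"]
    rows = [
        {"filePathInUpload": "folder/audio.mp3",
         "expectedTranscription": "order a large coffee",
         "evaluationWeight": 5},
        {"filePathInUpload": "folder/audio2.wav",
         "expectedTranscription": "what is on the menu",
         "evaluationWeight": 1},
    ]

    # CSV variant: the header row holds the three column names.
    with open("transcriptions.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)

    # JSON variant: the same records as a list of objects (layout is an assumption).
    with open("transcriptions.json", "w") as f:
        json.dump(rows, f, indent=2)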

Edit metadata for an utterance

After creating your annotation set, you can edit the metadata for each utterance to help improve the accuracy of your ASR evaluation results.

To edit metadata for an utterance

  1. From the page for your annotation set, you can perform the following tasks:

    • Listen to an utterance.
    • Add the expected transcription.
    • Assign a weight to the utterance for ASR evaluations.
  2. To listen to an utterance, click the speaker icon next to the utterance.

    Listen to an utterance
  3. To add the expected transcription, click the Expected Transcription field for the utterance, and enter the actual text transcription for the utterance.

  4. To assign a weight to the utterance, choose a numeric weight from the Weight drop-down list for the utterance.

    The weight for the utterance indicates the importance of the utterance. For example, if you expect the word "coffee" to be important to users of your skill, assign a higher weight to utterances containing the word "coffee." Weight values are on a scale of 1-10, with 10 being the highest weight.

You can now run your ASR Evaluation. See Run an Automatic Speech Recognition (ASR) Evaluation.