Create an Annotation Set for Automatic Speech Recognition (ASR)

The Automatic Speech Recognition (ASR) Evaluation tool allows you to batch test audio files to measure the ASR accuracy of the skills that you've developed. With the ASR Evaluation tool, you can batch test sample audio utterances against ASR models and compare the expected transcriptions with the actual transcriptions. The tool generates an evaluation report with accuracy metrics and a pass/fail result for each test utterance, which you can use to resolve accuracy issues.

This page describes how to create an annotation set of sample audio utterances for use with an ASR testing run.

Prerequisites

You'll need the following items to create your annotation set for ASR testing:

  • An Amazon developer account. See developer.amazon.com to create your account, if necessary.
  • A set of sample utterances for testing. You have two options for these utterances:
    • Compress pre-recorded audio files containing your utterances into a single .zip file.
    • Record your utterances directly from your computer when you create your annotation set.
  • (Optional) A CSV or JSON file of utterance transcriptions for your annotation set. You can upload this file of transcriptions to avoid manually adding the expected transcription for each utterance in your annotation set.

The .zip file of utterances has the following requirements:

  • The compressed .zip file can't be larger than 10 MB.
  • The audio files must be in one of the following formats:
    • mp3
    • wav
    • aiff
    • ogg
  • The .zip file can't contain more than 1000 files.
  • Individual audio files can't be larger than 3 MB in size.
  • Individual audio filenames can't contain non-ASCII characters.
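Before you upload, you might want to check a .zip file against the requirements above. The following is a minimal sketch, not part of the ASR Evaluation tool. The function name `check_utterance_zip` is hypothetical, and the sketch assumes that 1 MB means 1024 × 1024 bytes, that the 3 MB limit applies to the uncompressed size of each file, and that the non-ASCII rule applies to the full path inside the archive.

```python
import zipfile
from pathlib import Path, PurePosixPath

# Limits taken from the requirements list; the byte conversion is an assumption.
MAX_ZIP_BYTES = 10 * 1024 * 1024
MAX_FILE_BYTES = 3 * 1024 * 1024
MAX_FILES = 1000
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".aiff", ".ogg"}


def check_utterance_zip(zip_path):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if Path(zip_path).stat().st_size > MAX_ZIP_BYTES:
        problems.append("zip file is larger than 10 MB")
    with zipfile.ZipFile(zip_path) as archive:
        # Directory entries are skipped; only files are counted and checked.
        entries = [info for info in archive.infolist() if not info.is_dir()]
    if len(entries) > MAX_FILES:
        problems.append(f"zip file contains {len(entries)} files (limit is {MAX_FILES})")
    for info in entries:
        name = info.filename
        if PurePosixPath(name).suffix.lower() not in ALLOWED_EXTENSIONS:
            problems.append(f"{name}: unsupported audio format")
        if info.file_size > MAX_FILE_BYTES:  # uncompressed size (assumption)
            problems.append(f"{name}: larger than 3 MB")
        if not name.isascii():
            problems.append(f"{name}: filename contains non-ASCII characters")
    return problems
```

For example, `check_utterance_zip("utterances.zip")` returns an empty list when no problems are found, where `utterances.zip` is a placeholder filename.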

Create an annotation set of audio files

Before running ASR testing, you need a set of audio utterances to test. The set of utterances used for a test run is called an "annotation set". You can either upload a .zip file of pre-recorded utterances or record your utterances as part of creating your annotation set. If you have already uploaded your audio files to an AWS S3 bucket, you can also upload a CSV or JSON file of expected utterance transcriptions and weights to create your annotation set.

To create an annotation set of audio files

  1. With your Amazon developer credentials, log in to the Alexa developer console.
  2. From the developer console, navigate to the Build tab.
  3. Under the Custom tab in the left navigation, click Annotation Sets to display the NLU Evaluation page.
  4. On the NLU Evaluation page, click the ASR Evaluation tab to go to the ASR Annotation Sets page:

    Annotation Sets tab
  5. Click the + Annotation Set button to create a new annotation set.
  6. At the prompt, name your annotation set.

    The page refreshes and displays your newly named annotation set:

    Empty annotation set
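  7. Add utterances to your annotation set by using either of the following options:

    • Record your utterances directly. See Record audio utterances for an annotation set.
    • Upload a .zip file of pre-recorded utterances. See Upload a pre-recorded set of audio utterances.

    You can save partially completed annotation sets that have audio and file paths or expected transcriptions. However, you cannot evaluate partial sets until the sets are complete.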

  8. At the upper left of the page, click the Save Annotation Set button to save your annotation set.

When you have finished adding utterances to your annotation set, you can edit your utterance metadata. See Edit metadata for an utterance.

Record audio utterances for an annotation set

To record audio utterances for an annotation set

  1. From the page of your annotation set, press and hold the Press and Hold to record button.
  2. Speak your utterance.
  3. Release the button when you've finished recording.

After recording your utterance, you can edit its metadata. For more details, see Edit metadata for an utterance.

Upload a pre-recorded set of audio utterances

If you have already recorded a set of audio utterances and compressed them into a .zip file, you can do a batch upload of your utterances.
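If your recordings are still in a folder, one way to produce the .zip file is with a short script. The following is a minimal sketch that uses the Python standard library. The function name `zip_utterances` is hypothetical, and the sketch assumes that only files with the supported audio extensions should be included.

```python
import zipfile
from pathlib import Path

# Supported audio formats, matching the .zip file requirements.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".aiff", ".ogg"}


def zip_utterances(source_dir, zip_path):
    """Compress the audio files under source_dir into zip_path.

    Returns the paths as stored in the archive, using forward slashes.
    """
    source = Path(source_dir)
    added = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS:
                # Store each file under its path relative to source_dir.
                arcname = path.relative_to(source).as_posix()
                archive.write(path, arcname)
                added.append(arcname)
    return added
```

The returned paths are relative to the folder you compress, so you can keep them for reference when you later describe each file.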

To upload a pre-recorded set of audio utterances

  1. From the page of your annotation set, click the Upload button.

    A file navigator window opens.

  2. Use the file navigator to navigate to and select the .zip file containing your utterances.
  3. Click Open to upload the file.

    Your .zip file is uploaded to an AWS S3 bucket.

When you have finished uploading your utterances, you can edit the metadata for individual utterances. See Edit metadata for an utterance.

Upload a CSV or JSON file of expected transcriptions for an annotation set

If you've already uploaded an annotation set of audio files to an AWS S3 bucket, you can bulk edit the metadata for those files. Instead of manually adding the expected transcription for each utterance, you can upload a CSV or JSON file that supplies all of your transcriptions at one time.

To upload a CSV or JSON file of expected transcriptions for an annotation set

  1. Create your file with three fields:
    • filePathInUpload – Path in the uploaded zip file for the utterance. For example, if the zip file contains a folder named 'folder' with an audio file named audio.mp3 in it, the path is folder/audio.mp3. Use a forward slash ('/') to separate directories.
    • expectedTranscription – Expected transcription for the utterance.
    • evaluationWeight – Assigned weight indicating the importance of the utterance in evaluation.

    The following image shows an example CSV file with valid column headings:

    Sample CSV file

    The following image shows an example JSON file with valid field names:

    Sample JSON file
  2. At the right side of the page for your annotation set, click the Bulk Edit button to open an Upload Annotation Set prompt.

  3. Navigate to your CSV or JSON file and click Open.

  4. On the Upload prompt, click Submit and Save.

    The Expected Transcription and Weight fields automatically populate with the values from your file.

    Uploaded CSV file
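If you keep your transcriptions in a script or spreadsheet export, you can generate the CSV file instead of typing it by hand. The following is a minimal sketch. The function name `write_transcription_csv` is hypothetical, and the header row and column order are assumptions based on the three field names listed in step 1. A JSON version is omitted because the JSON layout is shown only in the sample image.

```python
import csv

# Field names from step 1; using them as a header row in this order is an assumption.
FIELDS = ["filePathInUpload", "expectedTranscription", "evaluationWeight"]


def write_transcription_csv(rows, csv_path):
    """Write (path_in_zip, transcription, weight) tuples to a CSV file."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(FIELDS)
        for path_in_zip, transcription, weight in rows:
            # Paths must use forward slashes, so normalize any backslashes.
            writer.writerow([path_in_zip.replace("\\", "/"), transcription, weight])
```

For example, `write_transcription_csv([("folder/audio.mp3", "order a coffee", 5)], "transcriptions.csv")` writes a header row followed by one utterance row. The transcription, weight, and output filename here are illustrative values.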

Edit metadata for an utterance

After creating your annotation set, you can edit the metadata for each utterance to help improve the accuracy of your ASR evaluation results.

To edit metadata for an utterance

  1. From the page for your annotation set, you can perform the following tasks:

    • Listen to an utterance.
    • Add the expected transcription.
    • Assign a weight to the utterance for ASR evaluations.
  2. To listen to an utterance, click the speaker icon next to the utterance.

    Listen to an utterance
  3. To add the expected transcription, click the Expected Transcription field for the utterance, and enter the correct text transcription for the utterance.

  4. To assign a weight to the utterance, choose a numeric weight from the Weight drop-down list for the utterance.

    The weight for the utterance indicates the importance of the utterance. For example, if you expect the word "coffee" to be important to users of your skill, assign a higher weight to utterances containing the word "coffee." Weight values are on a scale of 1-10, with 10 being the highest weight.
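If you prepare weights in bulk, for example for a CSV upload, you can apply the same idea programmatically. The following is a minimal sketch, not a feature of the tool. The function name `suggest_weight` and the default and boosted values are hypothetical choices. Only the 1-10 range comes from the scale described above.

```python
import string

MIN_WEIGHT, MAX_WEIGHT = 1, 10  # weight scale described above


def suggest_weight(transcription, important_words, default=1, boosted=10):
    """Return a weight from 1 to 10, higher when an important word appears."""
    # Compare whole words, ignoring case and surrounding punctuation.
    words = {w.strip(string.punctuation).lower() for w in transcription.split()}
    hit = any(word.lower() in words for word in important_words)
    weight = boosted if hit else default
    # Clamp so the result always stays on the 1-10 scale.
    return max(MIN_WEIGHT, min(MAX_WEIGHT, weight))
```

With these defaults, a transcription that contains "coffee" gets a weight of 10 when "coffee" is in `important_words`, and any other transcription gets a weight of 1.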

You can now run your ASR Evaluation. See Run an Automatic Speech Recognition (ASR) Evaluation.