Dialog Evaluation Tool

An artificial intelligence (AI) tool is only as good as the data it learns from. To design a great user experience, you must design the training data appropriately. The dialog evaluation tool for Alexa Conversations is one way of working with the AI to deliver better training data, which in turn improves both the model and the experience for your users.

With the dialog evaluation tool, you can give feedback on the actions that the Alexa Conversations AI takes. This capability helps you design high-quality conversational training data with less effort. You work with the tool interactively, seeing the predicted action and response at each turn of the dialog and correcting the tool as needed. Because you can iterate quickly, the tool also reduces the time you spend designing training data.

[Figure: Dialog evaluation tool feedback loop]

How the tool works

Use the ask dialog --evaluate command to start evaluating your Alexa Conversations skill. Begin a dialog with your invocation phrase; you are now playing the user. Type what you would say to Alexa, using the same words you would use in natural conversation, and then see how your skill responds. If you're satisfied with a response, keep the conversation going. If a response doesn't seem right, tell the tool. Doing so launches a correction flow in which you provide the corrected output for the tool to learn from. After you've corrected the error, continue the conversation as you would otherwise, and let the tool record all your feedback. Your feedback is critical: it reinforces positive behavior and suppresses negative behavior through improved training signals. The following example shows how you interact with the evaluation tool.

> User: Open Cupcake Shop
> Alexa: Hey you. What can I help you with?

> User: I’d like to order a cake
> Alexa: What kind of cake are you thinking?
Do you accept this response [y/n]? y

> User: A yellow half sheet cake
> Alexa: Great. That is a half yellow cake *with* *sheet icing*.
Do you accept this response [y/n]? n
In correction mode:
  Prediction: type U0 {Color color0, Size size0, Icing icing0}
              u0 = received<U0>(Invoke, "A {yellow|color0} 
               {half|size0} {sheet|icing0} cake.")
  Is this correct [y/n]: n
  Correction for type (press enter if no change): 
    type U1 {Color color0, Size size0} 
  Correction for event (press enter if no change):
    u0 = received<U1>(Invoke, "A {yellow|color0} 
     {half sheet|size0} cake.")
  ----------------------------------
  Prediction: response(confirmCakePropsApla, 
     ConfirmArgs {arguments=[color, size]}, surfaceForm?)
  Is this correct [y/n]: y
  ----------------------------------
  End of turn [y/n]: y
> Alexa: Great. That is a yellow half sheet cake.
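
A correction like this would typically be captured in your skill's ACDL training data. The following sketch is illustrative only; the type, event, and slot names are assumptions rather than the tool's output. It shows an utterance set in which "half sheet" fills a single Size slot and no Icing slot is involved, matching the correction above.

// Illustrative ACDL for the corrected annotation: "half sheet" is one
// Size value, and the utterance carries no Icing slot. All names here
// (CakeRequest, cakeRequestEvent, Color, Size) are hypothetical.
type CakeRequest {
    Color color0,
    Size size0
}

cakeRequestEvent = utterances<CakeRequest>([
    "A {color0} {size0} cake"
])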

Do you want to see more detail? Use the verbose option to show the details of the predictions on every turn, not just in correction mode. The following example shows how the verbose option works.

> User: I’d like to order a cake
[acdl]: received(Invoke, "I’d like to order a cake.")
[info]: User request act is Invoke
> Alexa: What kind of cake are you thinking?
[acdl]: response(generalCakeRequestApla, Request {arguments = [color, size, icing]})
[info]: Alexa responds with generalCakeRequestApla to Request the
 arguments color, size and icing for PlaceOrderAPI
Do you accept this response [y/n]? y
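
The [acdl] lines in the verbose output correspond to steps in an ACDL dialog sample. A matching sample might look roughly like the following sketch; the dialog name, event name, and argument references are assumptions for illustration, not taken from the tool's output.

// Illustrative ACDL sample for the exchange above. Names such as
// OrderCake and cakeRequestEvent are hypothetical.
dialog Nothing OrderCake {
    sample {
        // The user's request, matching: received(Invoke, "I'd like to order a cake.")
        cakeRequest = expect(Invoke, cakeRequestEvent)

        // Alexa asks for the remaining PlaceOrderAPI arguments, matching:
        // response(generalCakeRequestApla, Request {arguments = [color, size, icing]})
        response(
            response = generalCakeRequestApla,
            act = Request { arguments = [color, size, icing] }
        )
    }
}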

The dialog evaluation tool helps you iterate on improvements quickly and design a high-quality user experience.
