As the developer community around Alexa has continued to grow (great news), we’ve been getting more feedback. This feedback is important to us, and we’re doing our best to gather as much as possible. Many of us on the Alexa team read our developer forums on a daily basis, and we talk to developers at industry events almost every week. Suggestions are coming in for new Alexa features, and we’re getting questions on our certification and testing process. Specifically, the questions we hear the most are:
- Why even test Alexa skills? They are different from mobile apps, and developers can make changes without resubmitting. So what’s the point?
- What’s the process by which you test skills? Given the differences from mobile apps, what are the steps being taken?
- Once a skill is published, how do you maintain any control over the experience?
- What are the top failures and what help can you offer in those areas?
We’ll address those questions here: we explain how we test and why some skills fail, and we point to resources that help you build higher-quality skills. We know that Alexa will only succeed with the help of the developer community, so as we build out Alexa, your ideas are important inputs. Thank you for your support.
Why Certify Alexa Skills?
The Amazon Skill Certification process is in place to give us the best chance of providing customers with high-quality content. As with any testing process, there may be ways to game it, but the review process helps give customers a better experience when interacting with Alexa. Many developers are also still learning the best practices around Voice User Interface (VUI) design, so we are trying to help them, too. Our goal is to provide actionable feedback and recommendations that help you resolve technical issues with your skills and improve the overall skill experience.
Our philosophy is to let customers decide which skills are valuable, which is why we launched “Ratings and Reviews” in 2015. But in some cases, we make decisions that we think are best for customers. As an example, we have observed many skills that do not do what developers intended. About a quarter of skills fail because, through 1:1 testing, we discover that the skill does not respond to the utterances the developer intended it to handle. In this case, we do not want to allow potentially broken experiences into the store. This is best for both customers and developers. As best practices for voice design continue to evolve, and the quality of skill submissions improves, we will continue to adjust the certification process. The goal is to make the process as nimble, valuable, and transparent as possible.
How Alexa Skills Are Tested
Due to the dynamic nature of Alexa skills, every single submitted Alexa skill is certified through manual testing by our team. The process typically involves the following:
- All skill submissions and resubmissions are first checked for adherence to Amazon Alexa Skills policy guidelines, and the skill endpoint is then verified for compliance with security requirements.
- Then the skill’s metadata, description, and information on the home card are verified to ensure they match the functionality the skill provides.
- The core functionality of the skill is verified by launching the skill to test the validity of the invocation name, testing various skill responses, and using real example phrases. Various combinations of defined intents and slots are also used to test the skill’s functionality. This enables us to test both the skill’s stream management and error handling. We ensure that the skill prompts customers at the right moments and that interactions have a graceful exit path.
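As a rough illustration of the stream management check described above, the sketch below inspects a single skill response and flags cases where the session is left open or closed inappropriately. It is not the certification team's tooling. The function name `stream_management_issues` is hypothetical, the "contains a question mark" test is a crude heuristic, and treating a missing `shouldEndSession` as `True` is an assumption made for this example.

```python
def stream_management_issues(response_envelope):
    """Return likely stream-management problems in one skill response (a dict)."""
    response = response_envelope.get("response", {})
    speech = response.get("outputSpeech", {})
    # PlainText speech carries "text"; SSML speech carries "ssml".
    spoken = speech.get("text") or speech.get("ssml") or ""
    # Assumption for this sketch: a missing flag means the session ends.
    ends_session = response.get("shouldEndSession", True)
    has_reprompt = "reprompt" in response
    # Heuristic only: treat any question mark as a prompt to the customer.
    asks_question = "?" in spoken

    issues = []
    if asks_question and ends_session:
        issues.append("asks a question but closes the session")
    if not asks_question and not ends_session:
        issues.append("leaves the session open without prompting")
    if not ends_session and not has_reprompt:
        issues.append("keeps the session open but has no reprompt")
    return issues


good = {
    "version": "1.0",
    "response": {
        "outputSpeech": {"type": "PlainText", "text": "Which sign would you like?"},
        "reprompt": {"outputSpeech": {"type": "PlainText", "text": "Which sign?"}},
        "shouldEndSession": False,
    },
}
bad = {
    "version": "1.0",
    "response": {
        "outputSpeech": {"type": "PlainText", "text": "Which sign would you like?"},
        "shouldEndSession": True,
    },
}
print(stream_management_issues(good))
print(stream_management_issues(bad))
```

Running a check like this against the responses for each example phrase before submission can catch the most obvious cases; it does not replace testing the conversation by voice.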
The goal of this in-depth approach is to test the Alexa skill just as an actual customer would. This ensures that the skill’s core functionality matches the skill’s description and publishing information. We also want to ensure that skills use the most current feature set of the Alexa Skills Kit where applicable, such as Amazon Account Linking, custom slot types, built-in intents, and short-audio streaming. In the final phase, the Alexa skill is manually evaluated by Alexa platform subject matter experts to test the voice interface of the skill. This testing dives deep into the internals of the skill’s language schema to make certain that the skill’s language model meets standards.
Ultimately, all of this manual testing is done to provide the best customer experience possible. A published Alexa skill will ideally interact with the Alexa platform to improve the skill’s recognition patterns without adversely affecting the language models of the Alexa platform. As part of this, we verify that example interactions are included in the sample utterances. We make sure that all intents and slots have associated sample utterances that can respond to common one-shot utterances, and we also review slots and their associated types.
Ongoing Testing of Published Alexa Skills
Given that the skill interaction is all controlled by the developer’s server-side code (and not by uploading code into Alexa), the certification team regularly retests the entire live skill catalog. So while a developer could change the experience without republishing (and in some cases this would be encouraged), we are trying our best to ensure that the basic experience is maintained. We do this via the following:
- Monitoring customer reviews of skills in the Alexa app
- Manually testing sample utterances of all published skills every 24 hours to verify basic functionality
- Programmatically pinging Alexa skill endpoints at frequent intervals to ensure 24x7 availability of skills
Top Reasons for Alexa Skill Certification Failures
Below are the top reasons that skills fail certification. These reasons fall into two buckets: functional test failures and user experience test failures. We hope that by sharing this data, we will help you prioritize your testing efforts before you submit a skill for certification. We are also using this data to identify parts of the testing that we can automate, or enable you to test on your own through tools that we provide. Our goal is that developers can test and identify some of the more commonly occurring issues on the developer portal before submitting the skill for certification. This will help minimize the (re)submission iterations needed to publish skills.
Functional Test Failures
A checklist for functional tests can be found here. Below are the top functional failure categories:
- Example phrases: check that example phrases include the wake word and the invocation name, and that they produce the right text-to-speech (TTS) response.
- Intent response: the skill should respond properly to example phrases and to both one-shot and modal phrasings of the sample utterances.
- Home card: skill titles that are displayed should be appropriate and should not contain any code references.
- Invocation name: check that the invocation name has a space between each word, that it is no more than three words long, and that the skill provides correct TTS responses for the “ask” and “tell” modals.
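Some of the functional checks above can be approximated locally before submission. The sketch below covers only the parts that are plain string checks: the three-word limit on the invocation name, and the presence of the wake word and invocation name in each example phrase. The function names are hypothetical, the wake word is assumed to be "alexa", and the sketch makes no attempt to detect words that were run together or to judge TTS responses.

```python
import re

WAKE_WORD = "alexa"  # assumption: the default wake word


def normalize(text):
    """Lowercase, drop punctuation other than apostrophes, collapse whitespace."""
    cleaned = re.sub(r"[^\w\s']", " ", text.lower())
    return " ".join(cleaned.split())


def invocation_name_issues(name):
    """Check the length limit listed above (no more than three words)."""
    issues = []
    if len(name.split()) > 3:
        issues.append("more than three words")
    return issues


def example_phrase_issues(phrase, invocation_name, wake_word=WAKE_WORD):
    """Check that an example phrase includes the wake word and invocation name."""
    words = normalize(phrase)
    issues = []
    if wake_word not in words.split():
        issues.append("missing wake word")
    # Pad with spaces so the name only matches on whole-word boundaries.
    if f" {normalize(invocation_name)} " not in f" {words} ":
        issues.append("missing invocation name")
    return issues


print(invocation_name_issues("daily horoscopes"))
print(example_phrase_issues("Alexa, ask Daily Horoscopes for Gemini", "daily horoscopes"))
print(example_phrase_issues("Ask Daily Horoscopes for Gemini", "daily horoscopes"))
```

The invocation name "daily horoscopes" is an invented example. Each function returns an empty list when it finds nothing to flag.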
User Experience Test Failures
A checklist for voice and user experience tests can be found here. Below are top user experience failure categories:
- Skill doesn’t prompt users for the information needed to complete the task (when slot values are missing)
- Not all customer-facing example phrases (as provided in the skill’s description) are included as sample utterances in the skill’s intent schema
- Stream management (leaving the skill open or closed inappropriately)
- Lack of, or insufficient use of, error handling
- Confusing or overbearing prompting
- Lack of help and graceful exit handling
- Intents or slots in the intent schema with no related sample utterances
- Sample utterances do not cover the most common ways to ask the skill in a “one-shot” manner
- Custom slot types are not correctly incorporated into the intent schema and sample utterances
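The item about intents or slots with no related sample utterances lends itself to a simple automated check. The sketch below is one way to do it, under stated assumptions: the intent schema is a dictionary with an `intents` list whose entries have `intent` and optional `slots` keys, the sample utterances are plain text with one `IntentName utterance text` entry per line, and slots appear in utterances as `{SlotName}`. The function names and the example intents are hypothetical. Intents whose names start with `AMAZON.` are skipped on the assumption that built-in intents do not need developer-supplied utterances.

```python
import re


def parse_sample_utterances(text):
    """Map each intent name to its utterance templates ('IntentName utterance' lines)."""
    by_intent = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        intent, _, utterance = line.partition(" ")
        by_intent.setdefault(intent, []).append(utterance.strip())
    return by_intent


def coverage_issues(intent_schema, sample_utterances_text):
    """List custom intents and slots that no sample utterance refers to."""
    by_intent = parse_sample_utterances(sample_utterances_text)
    issues = []
    for entry in intent_schema["intents"]:
        name = entry["intent"]
        if name.startswith("AMAZON."):
            continue  # assumption: built-in intents need no custom utterances
        utterances = by_intent.get(name, [])
        if not utterances:
            issues.append(f"intent {name} has no sample utterances")
            continue
        # Collect every {SlotName} placeholder used by this intent's utterances.
        used_slots = set(re.findall(r"\{(\w+)\}", " ".join(utterances)))
        for slot in entry.get("slots", []):
            if slot["name"] not in used_slots:
                issues.append(f"slot {slot['name']} of {name} is never used")
    return issues


schema = {
    "intents": [
        {"intent": "GetHoroscope", "slots": [
            {"name": "Sign", "type": "LIST_OF_SIGNS"},
            {"name": "Date", "type": "AMAZON.DATE"},
        ]},
        {"intent": "GetLuckyNumber"},
        {"intent": "AMAZON.HelpIntent"},
    ]
}
utterances = """
GetHoroscope what is the horoscope for {Sign}
GetHoroscope tell me about {Sign}
"""
for issue in coverage_issues(schema, utterances):
    print(issue)
```

This only confirms that each intent and slot is referenced somewhere. Whether the utterances cover the most common one-shot phrasings still needs human judgment.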
Guidelines & Checklists to Help Developers
Based on what we have learned so far in skill certification testing, we have published guides to assist you. These resources are in addition to the Developer Portal documentation, forums, and knowledge base that developers are already using.
Improvements to the Skill Certification Process
Voice experiences are a relatively new category of digital content and we are still learning and improving. We are working on several improvements to the certification process:
- Recruiting and training additional content testers to help improve the response time on certification feedback
- Improving the frequency and quality of communication with actionable feedback
- Streamlining testing operations to improve turnaround time on resubmissions
- Updating certification test-cases so that we only test for items that have a material impact on the skill’s customer experience (less subjective feedback)
- Building automated pre-certification tools so developers can test their skills prior to submission
We have also added a new Alexa category on the Developer Portal Contact Us page, so you can now reach a person on the Alexa certification team more directly when you encounter issues.
We are excited about the passion that you have brought to the Alexa platform, and together, we are working to deliver new experiences to our customers each and every week. The certification process will continue to improve over the next few weeks, with the goal of providing benefits to both developers and our customers. If you have any questions and want to talk to someone on the team, visit our weekly Alexa Dev Chats, which will kick off in two weeks. We will rotate different roles through these chats, so you can talk directly to people on the engineering, certification, and business teams. Please check back for exact dates and times. We look forward to working with you.