Alexa.Gadget.SpeechData Interface
This interface sends speechmark data to your gadget. Speechmarks are metadata that enable your gadget to synchronize speech with visual experiences. For example, your gadget can use this data to lip sync to Alexa's text-to-speech (TTS).
Supporting this interface
To support this interface, the gadget must respond to the Echo device's Discover directive with a Discover.Response event that includes the following entry in its array of Capabilities:
```json
{
  "type": "AlexaInterface",
  "interface": "Alexa.Gadget.SpeechData",
  "version": "1.0",
  "configurations": {
    "supportedTypes": [
      {
        "name": "viseme"
      }
    ]
  }
}
```
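If your gadget software assembles its Discover.Response in Python, the capability entry above could be built as a plain dictionary. This is a minimal sketch: the function name speechdata_capability is hypothetical, and how the entry is placed into the rest of the Discover.Response event and encoded for transport to the Echo device is outside its scope.

```python
import json


def speechdata_capability(supported_types=("viseme",)):
    """Build the Alexa.Gadget.SpeechData entry for the Capabilities array.

    Hypothetical helper; the keys and values mirror the JSON entry above.
    """
    return {
        "type": "AlexaInterface",
        "interface": "Alexa.Gadget.SpeechData",
        "version": "1.0",
        "configurations": {
            # One {"name": ...} object per supported speechmark type.
            "supportedTypes": [{"name": name} for name in supported_types]
        },
    }


if __name__ == "__main__":
    capabilities = [speechdata_capability()]
    print(json.dumps(capabilities, indent=2))
```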
Directives
This interface includes one directive, Speechmarks, which is described next.
Speechmarks directive
This directive provides speechmark data to your gadget. The .proto file contents are as follows:
```proto
message SpeechmarksDirectiveProto {
  Directive directive = 1;
  message Directive {
    alexaGadgetSpeechData.SpeechmarksDirectivePayloadProto payload = 2;
    header.DirectiveHeaderProto header = 1;
  }
}

message DirectiveHeaderProto {
  string namespace = 1;
  string name = 2;
  string messageId = 3;
  string dialogRequestId = 4;
}

message SpeechmarksDirectivePayloadProto {
  repeated SpeechmarksData speechmarksData = 2;
  message SpeechmarksData {
    int32 startOffsetInMilliSeconds = 3;
    string type = 2;
    string value = 1;
  }
  int32 playerOffsetInMilliseconds = 1;
}
```
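To make the nesting of these messages concrete, the sketch below mirrors them as Python dataclasses and fills them from a dictionary keyed by the proto field names (for example, one produced from an already-decoded message). It is illustrative only: these are not protobuf-generated classes, and the function name parse_speechmarks_directive is hypothetical. Missing keys fall back to proto3 default values (empty string, 0, or an empty list).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Field names intentionally match the proto so they can be cross-referenced.


@dataclass
class DirectiveHeader:
    # Mirrors DirectiveHeaderProto.
    namespace: str = ""
    name: str = ""
    messageId: str = ""
    dialogRequestId: str = ""


@dataclass
class SpeechmarksData:
    # Mirrors SpeechmarksDirectivePayloadProto.SpeechmarksData.
    type: str = ""
    value: str = ""
    startOffsetInMilliSeconds: int = 0


@dataclass
class SpeechmarksPayload:
    # Mirrors SpeechmarksDirectivePayloadProto; speechmarksData is repeated.
    speechmarksData: List[SpeechmarksData] = field(default_factory=list)
    playerOffsetInMilliseconds: int = 0


@dataclass
class SpeechmarksDirective:
    # Mirrors SpeechmarksDirectiveProto.Directive.
    header: DirectiveHeader
    payload: SpeechmarksPayload


def parse_speechmarks_directive(msg: Dict[str, Any]) -> SpeechmarksDirective:
    """Build the dataclasses from a dict shaped like SpeechmarksDirectiveProto."""
    directive = msg.get("directive", {})
    raw_header = directive.get("header", {})
    header = DirectiveHeader(
        namespace=raw_header.get("namespace", ""),
        name=raw_header.get("name", ""),
        messageId=raw_header.get("messageId", ""),
        dialogRequestId=raw_header.get("dialogRequestId", ""),
    )
    raw_payload = directive.get("payload", {})
    marks = [
        SpeechmarksData(
            type=mark.get("type", ""),
            value=mark.get("value", ""),
            startOffsetInMilliSeconds=int(mark.get("startOffsetInMilliSeconds", 0)),
        )
        for mark in raw_payload.get("speechmarksData", [])
    ]
    payload = SpeechmarksPayload(
        speechmarksData=marks,
        playerOffsetInMilliseconds=int(raw_payload.get("playerOffsetInMilliseconds", 0)),
    )
    return SpeechmarksDirective(header=header, payload=payload)


if __name__ == "__main__":
    # Illustrative input only.
    sample = {
        "directive": {
            "header": {"namespace": "Alexa.Gadget.SpeechData", "name": "Speechmarks"},
            "payload": {
                "playerOffsetInMilliseconds": 7000,
                "speechmarksData": [
                    {"type": "VISEME", "value": "p", "startOffsetInMilliSeconds": 9000},
                ],
            },
        }
    }
    d = parse_speechmarks_directive(sample)
    print(d.header.name, d.payload.playerOffsetInMilliseconds, d.payload.speechmarksData[0].value)
    # Speechmarks 7000 p
```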
SpeechmarksDirectiveProto
The fields in this message are as follows:
Field | Description | Type |
---|---|---|
directive | Contains a complete Speechmarks directive. | Directive |
Directive
The fields of the message are as follows:
Field | Description | Type |
---|---|---|
header | Contains the header for this directive. | DirectiveHeaderProto |
payload | Contains the payload for this directive. | SpeechmarksDirectivePayloadProto |
DirectiveHeaderProto
The fields of the message are as follows:
Field | Description | Type |
---|---|---|
namespace | The namespace of this directive, which is Alexa.Gadget.SpeechData. | string |
name | The name of this directive, which is Speechmarks. | string |
messageId | An ID that uniquely identifies an instance of this directive. This string can be empty. | string |
dialogRequestId | A unique ID that correlates this directive with a specific voice interaction from a user. You can ignore this field. | string |
SpeechmarksDirectivePayloadProto
The fields of the message are as follows:
Field | Description | Type |
---|---|---|
speechmarksData | A list of speechmark data objects. Each object specifies the type, value, and start offset of one speechmark. | repeated SpeechmarksData |
playerOffsetInMilliseconds | The current playback position of the speech in its stream, in milliseconds. | int32 |
SpeechmarksData
The fields of the message are as follows:
Field | Description | Type |
---|---|---|
type | The type of speechmark data that this directive contains. Currently, the only possible value is "VISEME". A viseme is a mouth position that corresponds to a spoken sound. | string |
value | The value of the speechmark. | string |
startOffsetInMilliSeconds | The start offset of the value, in milliseconds. To determine how to sync the speechmark with Alexa's speech, subtract playerOffsetInMilliseconds from startOffsetInMilliSeconds. | int32 |

For example, say your gadget receives the following viseme speechmark data at playerOffsetInMilliseconds = 7000:

value | startOffsetInMilliSeconds |
---|---|
"t" | 3000 |
"a" | 5000 |
"p" | 9000 |
"e" | 11000 |

Because playerOffsetInMilliseconds is 7000, the speech has already passed "t" and "a", so your gadget should ignore those values and start with "p", which begins 9000 minus 7000 = 2000 milliseconds after the current playback position. If playerOffsetInMilliseconds and startOffsetInMilliSeconds are both zero, the gadget should process the data immediately.
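The offset rule described for startOffsetInMilliSeconds can be turned into a render schedule. The sketch below is one possible approach, not a prescribed implementation: schedule_speechmarks and Speechmark are hypothetical names, values whose start offset is already behind the player are skipped as in the example, and the marks are sorted defensively because this page does not state their ordering. It only computes delays; how the gadget waits (timers, an event loop, firmware ticks) is platform-specific.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Speechmark:
    # Mirrors the SpeechmarksData message fields (hypothetical Python type).
    type: str
    value: str
    start_offset_ms: int  # startOffsetInMilliSeconds


def schedule_speechmarks(
    marks: List[Speechmark], player_offset_ms: int
) -> List[Tuple[int, str]]:
    """Return (delay_ms, value) pairs relative to the current playback position.

    delay_ms = startOffsetInMilliSeconds - playerOffsetInMilliseconds.
    Marks with a negative delay have already been spoken and are skipped;
    a delay of 0 means "render immediately".
    """
    schedule = []
    # Sort defensively; ordering of speechmarksData is assumed, not documented.
    for mark in sorted(marks, key=lambda m: m.start_offset_ms):
        delay_ms = mark.start_offset_ms - player_offset_ms
        if delay_ms < 0:
            continue  # speech has already passed this viseme
        schedule.append((delay_ms, mark.value))
    return schedule


if __name__ == "__main__":
    marks = [
        Speechmark("VISEME", "t", 3000),
        Speechmark("VISEME", "a", 5000),
        Speechmark("VISEME", "p", 9000),
        Speechmark("VISEME", "e", 11000),
    ]
    print(schedule_speechmarks(marks, player_offset_ms=7000))
    # [(2000, 'p'), (4000, 'e')]
```

With the values from the example, "t" and "a" are dropped, "p" is due 2000 ms after the current playback position, and "e" 4000 ms after it. When both offsets are zero, the delay is 0 and the value is rendered immediately.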
Last updated: Mar 31, 2022