Communication Protocol
Introduction
Communication with the EasyVR module uses a standard UART interface compatible with 3.3-5V TTL/CMOS logic levels, depending on the supply voltage VCC.
A typical connection to an MCU-based host:
The initial configuration at power-on is 9600 baud, 8 data bits, no parity, 1 stop bit. The baud rate can be changed later to operate in the range 9600 – 115200 baud.
The communication protocol only uses printable ASCII characters, which can be divided into two main groups:
- Command and status characters, respectively on the TX and RX lines, chosen among lower-case letters.
- Command arguments or status details, again on the TX and RX lines, spanning the range of capital letters.
Each command sent on the TX line, with zero or more additional argument bytes, receives an answer on the RX line in the form of a status byte followed by zero or more arguments.
There is a minimum delay before each byte sent out from the EasyVR module to the RX line, which is initially set to 20 ms and can be selected later in the ranges 0 – 9 ms, 10 – 90 ms, and 100 ms – 1 s. This accommodates slower or faster host systems and makes the protocol suitable for software-based serial communication (bit-banging) and for communication proxies/bridges as well.
Since the EasyVR serial interface is also software-based, a very short delay might be needed before transmitting a character to the module, especially if the host is very fast, to allow the EasyVR to get back to listening for a new character.
The communication is host-driven and each byte of the reply to a command has to be acknowledged by the host to receive additional status data, using the space character. The reply is aborted if any other character is received and so there is no need to read all the bytes of a reply if not required.
Invalid combinations of commands or arguments are signaled by a specific status byte, which the host should be prepared to receive if the communication fails. A reasonable timeout should also be used to recover from unexpected failures.
If the host does not send all the required arguments of a command, the command is ignored by the module, without further notification, and the host can start sending another command.
The module automatically enters the lowest-power sleep mode after power-on. To initiate communication, send any character to wake up the module.
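The host-driven exchange described above can be sketched in Python as follows. This is only an illustration under stated assumptions: `transact` and `FakePort` are hypothetical names, and the port is taken to be any object with `write(data)` and `read(n)` methods where an empty read stands for a timeout. The fake port replays a canned reply so the example runs without hardware.

```python
import io

ARG_ACK = b" "   # space (20h): asks the module for the next reply byte


class FakePort:
    """Hypothetical stand-in for a serial port that replays a canned reply."""

    def __init__(self, reply):
        self._rx = io.BytesIO(reply)
        self.sent = bytearray()

    def write(self, data):
        self.sent += data

    def read(self, n=1):
        return self._rx.read(n)   # b"" models a receive timeout


def transact(port, command, n_args):
    """Send a command, then read one status byte and n_args argument bytes."""
    port.write(command)
    status = port.read(1)
    if not status:
        raise TimeoutError("no reply from module")
    args = bytearray()
    for _ in range(n_args):
        port.write(ARG_ACK)       # each extra byte must be requested by the host
        byte = port.read(1)
        if not byte:
            raise TimeoutError("reply ended early")
        args += byte
    return status, bytes(args)


port = FakePort(b"xI")            # canned reply: status 'x' plus one argument
status, args = transact(port, b"x", 1)
```

Because the reply is aborted when any character other than the space is received, a host that does not need the remaining arguments can simply stop acknowledging and send its next command.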
Arguments Mapping
Command or status messages sent over the serial link may have one or more numerical arguments in the range -1 to 31, which are encoded as single characters, mostly uppercase letters. The following constants make arguments easier to handle:
ARG_MIN
‘@’ (40h) | Minimum argument value (-1) |
ARG_MAX
‘`’ (60h) | Maximum argument value (+31) |
ARG_ZERO
‘A’ (41h) | Zero argument value (0) |
ARG_ACK
‘ ’ (20h) | Read more status arguments |
Having those constants defined in your code can simplify the validity checks and the encoding/decoding process. For example (in pseudo-code):
# encode value 5
FIVE = 5 + ARG_ZERO
# decode value 5
FIVE - ARG_ZERO = 5
# validity check
IF ARG < ARG_MIN OR ARG > ARG_MAX THEN ERROR
Just to make things clearer, here is a table showing how the argument mapping works:
ASCII | @ | A | B | C | … | Y | Z | [ | \ | ] | ^ | _ | ` |
HEX | 40 | 41 | 42 | 43 | … | 59 | 5A | 5B | 5C | 5D | 5E | 5F | 60 |
Value | -1 | 0 | 1 | 2 | … | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 |
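As a runnable counterpart to the mapping, here is a minimal Python sketch. The function names `encode_arg` and `decode_arg` are illustrative and not part of the protocol; the constants are the ones listed earlier in this section.

```python
ARG_MIN = ord("@")    # 40h, argument value -1
ARG_MAX = ord("`")    # 60h, argument value +31
ARG_ZERO = ord("A")   # 41h, argument value 0


def encode_arg(value):
    """Map a number in -1..31 to its single-byte protocol character."""
    if not -1 <= value <= 31:
        raise ValueError("argument out of range: %r" % value)
    return bytes([value + ARG_ZERO])


def decode_arg(byte):
    """Map a received protocol character back to a number in -1..31."""
    code = byte[0]
    if code < ARG_MIN or code > ARG_MAX:
        raise ValueError("not an argument character: %r" % byte)
    return code - ARG_ZERO
```

For instance, `encode_arg(5)` gives `b"F"` and `decode_arg(b"@")` gives `-1`.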
Command Details
This section describes the format of all the command strings accepted by the module. Please note that numeric arguments of command requests are mapped to upper-case letters (see the Arguments Mapping section above).
Some commands share the same lower-case letter, because no unused command identifiers were left when the protocol was expanded; in those cases the first argument is used to tell them apart.
CMD_BREAK
‘b’ (62h) | Abort recognition, training or playback in progress, if any; otherwise do nothing
Known issues: In firmware ID 0, any other character received during recognition will prevent this command from stopping recognition, which will continue until timeout or other recognition results. |
Expected replies: STS_SUCCESS, STS_INTERR, STS_AWAKEN (if sleeping) |
CMD_SLEEP
‘s’ (73h) | Go to the specified power-down mode |
[1] | Sleep mode (0-8):
0 = wake on received character only
1 = wake on whistle or received character
2 = wake on loud sound or received character
3-5 = wake on double clap (with varying sensitivity) or received character
6-8 = wake on triple clap (with varying sensitivity) or received character |
Expected replies: STS_SUCCESS |
CMD_ID
‘x’ (78h) | Request firmware identification |
Expected replies: STS_ID |
CMD_DELAY
‘y’ (79h) | Set transmit delay |
[1] | Time (0-10 = 0-10 ms, 11-19 = 20-100 ms, 20-28 = 200-1000 ms) |
Expected replies: STS_SUCCESS |
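The three ranges of the time argument can be expanded with a small helper. The sketch below assumes that values inside each range are evenly spaced (steps of 1 ms, 10 ms and 100 ms), which the table implies by its endpoints but does not spell out. The function names are illustrative.

```python
ARG_ZERO = ord("A")   # argument value 0 (41h)
CMD_DELAY = b"y"      # 79h


def delay_ms_from_arg(arg):
    """Translate a CMD_DELAY time argument (0-28) into milliseconds."""
    if 0 <= arg <= 10:
        return arg                 # 0-10 ms in 1 ms steps
    if 11 <= arg <= 19:
        return (arg - 9) * 10      # 20-100 ms in 10 ms steps (assumed)
    if 20 <= arg <= 28:
        return (arg - 18) * 100    # 200-1000 ms in 100 ms steps (assumed)
    raise ValueError("time argument out of range: %r" % arg)


def delay_command(arg):
    """Build the bytes of a CMD_DELAY request for the given time argument."""
    delay_ms_from_arg(arg)         # reuse the range check
    return CMD_DELAY + bytes([ARG_ZERO + arg])
```

Under this reading, the 20 ms power-on default corresponds to argument 11, so `delay_command(11)` produces `b"yL"`.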
CMD_BAUDRATE
‘a’ (61h) | Set communication baud-rate |
[1] | Speed mode:
1 = 115200
2 = 57600
3 = 38400
6 = 19200
12 = 9600 |
Expected replies: STS_SUCCESS |
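A lookup table is a simple way to build this command. In the sketch below, `SPEED_MODES` and `baudrate_command` are illustrative names; the mode values are copied from the list above.

```python
ARG_ZERO = ord("A")    # argument value 0 (41h)
CMD_BAUDRATE = b"a"    # 61h

# Baud rate -> speed mode, as listed for CMD_BAUDRATE.
SPEED_MODES = {115200: 1, 57600: 2, 38400: 3, 19200: 6, 9600: 12}


def baudrate_command(baud):
    """Build the bytes of a CMD_BAUDRATE request for a supported baud rate."""
    if baud not in SPEED_MODES:
        raise ValueError("unsupported baud rate: %r" % baud)
    return CMD_BAUDRATE + bytes([ARG_ZERO + SPEED_MODES[baud]])
```

Each listed mode happens to equal 115200 divided by the baud rate, which is a handy sanity check when typing the table by hand.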
CMD_LEVEL
‘v’ (76h) | Set SD level |
[1] | Strictness control setting (1-5):
1 = easy
2 = default
5 = hard
A higher setting will result in more recognition errors. |
Expected replies: STS_SUCCESS |
CMD_KNOB
‘k’ (6Bh) | Set SI knob to specified level |
[1] | Confidence threshold level (0-4):
0 = loosest: more valid results
2 = typical value (default)
4 = tightest: fewer valid results
Note: knob is ignored for trigger words |
Expected replies: STS_SUCCESS |
CMD_MIC_DIST
‘k’ (6Bh) | Set the microphone operating distance |
[1] | Fixed to (-1) |
[2] | Distance settings (1-3):
1 = “headset” (around 5cm from speaker’s mouth)
2 = “arm’s length” (default setting, from about 50cm to 1m)
3 = “far mic” (up to around 3m) |
Expected replies: STS_SUCCESS |
CMD_TRAILING
‘t’ (74h) | Set trailing silence for SI recognition |
[1] | Fixed to (-1) |
[2] | Amount of silence at the end of an utterance (0-31):
0 = 100ms
… in steps of 25ms
31 = 875ms |
Expected replies: STS_SUCCESS |
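The silence argument follows a linear rule: duration = 100 ms + 25 ms × value. A short Python sketch of the conversion and of the resulting command bytes, with illustrative function names:

```python
ARG_MIN = ord("@")     # argument value -1 (40h), the fixed first argument
ARG_ZERO = ord("A")    # argument value 0 (41h)
CMD_TRAILING = b"t"    # 74h


def trailing_arg_from_ms(ms):
    """Convert a trailing silence in ms (100-875, multiple of 25) to 0-31."""
    if not 100 <= ms <= 875 or (ms - 100) % 25 != 0:
        raise ValueError("unsupported trailing silence: %r ms" % ms)
    return (ms - 100) // 25


def trailing_command(ms):
    """Build the bytes of a CMD_TRAILING request."""
    return CMD_TRAILING + bytes([ARG_MIN, ARG_ZERO + trailing_arg_from_ms(ms)])
```

For example, 400 ms maps to argument 12, so `trailing_command(400)` yields `b"t@M"`.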
CMD_FAST_SD
‘f’ (66h) | Set fast operating mode for SD/SV recognition |
[1] | Fixed to (-1) |
[2] | Operating mode (0 = normal/default, 1 = fast/low-latency) |
Expected replies: STS_SUCCESS |
CMD_LANGUAGE
‘l’ (6Ch) | Set SI language |
[1] | Language:
0 = English
1 = Italian
2 = Japanese
3 = German
4 = Spanish
5 = French |
Expected replies: STS_SUCCESS |
CMD_TIMEOUT
‘o’ (6Fh) | Set recognition timeout |
[1] | Timeout (-1 = default, 0 = infinite, 1-31 = seconds) |
Expected replies: STS_SUCCESS |
CMD_RECOG_SI
‘i’ (69h) | Activate SI recognition from specified word set |
[1] | Word set index (0-3) |
Expected replies: STS_SIMILAR, STS_TIMEOUT, STS_ERROR |
CMD_TRAIN_SD
‘t’ (74h) | Train specified SD/SV command |
[1] | Group index (0 = trigger, 1-15 = generic, 16 = password) |
[2] | Command position (0-31) |
Expected replies: STS_SUCCESS, STS_RESULT, STS_SIMILAR, STS_TIMEOUT, STS_ERROR |
CMD_GROUP_SD
‘g’ (67h) | Insert new SD/SV command |
[1] | Group index (0 = trigger, 1-15 = generic, 16 = password) |
[2] | Position (0-31) |
Expected replies: STS_SUCCESS, STS_OUT_OF_MEM |
CMD_UNGROUP_SD
‘u’ (75h) | Remove SD/SV command |
[1] | Group index (0 = trigger, 1-15 = generic, 16 = password) |
[2] | Position (0-31) |
Expected replies: STS_SUCCESS |
CMD_RECOG_SD
‘d’ (64h) | Activate SD/SV recognition |
[1] | Group index (0 = trigger, 1-15 = generic, 16 = password) |
Expected replies: STS_RESULT, STS_SIMILAR, STS_TIMEOUT, STS_ERROR |
CMD_ERASE_SD
‘e’ (65h) | Erase training of SD/SV command |
[1] | Group index (0 = trigger, 1-15 = generic, 16 = password) |
[2] | Command position (0-31) |
Expected replies: STS_SUCCESS |
CMD_NAME_SD
‘n’ (6Eh) | Label SD/SV command |
[1] | Group index (0 = trigger, 1-15 = generic, 16 = password) |
[2] | Command position (0-31) |
[3] | Length of label (0-31) |
[4-n] | Text for label (ASCII characters from ‘A’ to ‘`’)
EasyVR Commander encodes digits 0-9 as A-J prefixed by ‘^’ |
Expected replies: STS_SUCCESS |
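The digit convention used by EasyVR Commander can be sketched as below. Two points are assumptions rather than statements from this section: that the length argument counts the encoded bytes (including each ‘^’ prefix), and that restricting labels to letters and digits is acceptable for the example. The function names are illustrative.

```python
ARG_ZERO = ord("A")    # argument value 0 (41h)
CMD_NAME_SD = b"n"     # 6Eh


def encode_label(text):
    """Encode a label, writing digits 0-9 as '^' followed by A-J."""
    out = bytearray()
    for ch in text.upper():
        if "0" <= ch <= "9":
            out += b"^" + bytes([ARG_ZERO + int(ch)])
        elif "A" <= ch <= "Z":
            out.append(ord(ch))
        else:
            raise ValueError("unsupported label character: %r" % ch)
    if len(out) > 31:
        raise ValueError("encoded label longer than 31 bytes")
    return bytes(out)


def name_command(group, position, text):
    """Build the bytes of a CMD_NAME_SD request (length = encoded bytes, assumed)."""
    label = encode_label(text)
    header = bytes([ARG_ZERO + group, ARG_ZERO + position, ARG_ZERO + len(label)])
    return CMD_NAME_SD + header + label
```

With these assumptions, labeling command 0 of group 1 as "TV2" gives `name_command(1, 0, "TV2") == b"nBAETV^C"`.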
CMD_COUNT_SD
‘c’ (63h) | Request count of SD/SV commands in the specified group |
[1] | Group index (0 = trigger, 1-15 = generic, 16 = password) |
Expected replies: STS_COUNT |
CMD_DUMP_SD
‘p’ (70h) | Read SD/SV command data (label and training) |
[1] | Group index (0 = trigger, 1-15 = generic, 16 = password) |
[2] | Command position (0-31) |
Expected replies: STS_DATA |
CMD_MASK_SD
‘m’ (6Dh) | Request bit-mask of non-empty groups |
Expected replies: STS_MASK |
CMD_RESETALL
‘r’ (72h) | Reset all (erase commands/groups and messages) |
‘R’ (52h) | Confirmation character |
Expected replies: STS_SUCCESS |
CMD_RESET_SD
‘r’ (72h) | Reset only commands and groups |
‘D’ (44h) | Confirmation character |
Expected replies: STS_SUCCESS |
CMD_RESET_RP
‘r’ (72h) | Reset only message recordings |
‘M’ (4Dh) | Confirmation character |
Expected replies: STS_SUCCESS |
CMD_QUERY_IO
‘q’ (71h) | Configure, query or modify general purpose I/O pins |
[1] | Pin number (1 = pin IO1, 2 = pin IO2, 3 = pin IO3) |
[2] | Pin mode (0 = output low, 1 = output high, 2 = input*, 3 = input strong**, 4 = input weak***)
* High impedance input (no pull-up)
** Strong means ~10K internal pull-up
*** Weak means ~200K internal pull-up (default after power up) |
Expected replies: STS_SUCCESS (mode 0-1), STS_PIN (mode 2-4) |
CMD_PLAY_SX
‘w’ (77h) | Wave table entry playback |
[1-2] | Two positive values that form a 10-bit index to the sound table (index = [1] * 32 + [2], 0 = built-in “beep”, 1-1023 = sound index) |
[3] | Playback volume (0-31, 0 = min volume, 15 = full scale, 31 = double gain) |
Expected replies: STS_SUCCESS, STS_ERROR |
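Splitting the 10-bit sound index into two 5-bit arguments is a plain division by 32. A minimal Python sketch, where `play_sx_command` is an illustrative name and the default volume of 15 is simply the "full scale" value from the table:

```python
ARG_ZERO = ord("A")    # argument value 0 (41h)
CMD_PLAY_SX = b"w"     # 77h


def play_sx_command(index, volume=15):
    """Build the bytes of a CMD_PLAY_SX request (index = [1] * 32 + [2])."""
    if not 0 <= index <= 1023:
        raise ValueError("sound index out of range: %r" % index)
    if not 0 <= volume <= 31:
        raise ValueError("volume out of range: %r" % volume)
    high, low = divmod(index, 32)
    return CMD_PLAY_SX + bytes([ARG_ZERO + high, ARG_ZERO + low, ARG_ZERO + volume])
```

The built-in beep at full scale is therefore `play_sx_command(0)`, that is `b"wAAP"`.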
CMD_PLAY_DTMF
‘w’ (77h) | Play a DTMF key tone or dial tone |
[1] | Fixed to (-1) |
[2] | Index of phone tone to play (0-9 for digits, 10 for ‘*’ key, 11 for ‘#’ key and 12-15 for extra keys ‘A’ to ‘D’, -1 for the dial tone) |
[3] | Tone duration minus 1 (0-31 in 40ms units for keys, in seconds for the dial tone) |
Expected replies: STS_SUCCESS |
CMD_DUMP_SX
‘h’ (68h) | Read wave table data |
Expected replies: STS_TABLE_SX, STS_OUT_OF_MEM |
CMD_DUMP_SI
‘z’ (7Ah) | Read custom and built-in grammars data |
[1] | Index of SI grammar to read (0-31) or (-1) to get the total count of SI grammars (including the first 4 built-in word sets) |
Expected replies: STS_GRAMMAR, STS_COUNT |
CMD_SEND_SN
‘j’ (6Ah) | Send a SonicNet™ token |
[1] | Length of token (4 or 8 in bits) |
[2-3] | Two positive values that form an 8-bit token index (index = [2] * 32 + [3], 0-15 for 4-bit tokens or 0-255 for 8-bit tokens) |
[4-5] | Two positive values that form a 10-bit delay for token output, counted from the start of the next sound playback (delay = [4] * 32 + [5], 0 = send immediately, 1-1023 = delay in units of 27.46ms) |
Expected replies: STS_SUCCESS |
CMD_RECV_SN
‘f’ (66h) | Receive a SonicNet™ token |
[1] | Length of token (4 or 8 in bits) |
[2] | Rejection level (0-2 = higher values mean fewer results, 1 = default) |
[3-4] | Two positive values that form a 10-bit timeout for token detection (timeout = [3] * 32 + [4], 0 = wait forever, 1-1023 = timeout in units of 27.46ms) |
Expected replies: STS_TOKEN, STS_TIMEOUT |
CMD_LIPSYNC
‘l’ (6Ch) | Start real-time lip-sync |
[1] | Fixed to (-1) |
[2-3] | Activation threshold (10-bit value = [2] * 32 + [3]):
270 = default setting |
[4-5] | Timeout option (8-bit value = [4] * 16 + [5]):
0 = no timeout (can be interrupted)
1-255 = duration in seconds |
Expected replies: STS_LIPSYNC |
CMD_RECORD_RP
‘r’ (72h) | Record a message |
[1] | Fixed to (-1) |
[2] | Message index (0-31) |
[3] | Data format (8) |
[4] | Timeout option (0-31):
0 = no timeout (can be interrupted)
1-31 = duration in seconds |
Expected replies: STS_SUCCESS, STS_ERROR |
CMD_PLAY_RP
‘p’ (70h) | Play a message recording |
[1] | Fixed to (-1) |
[2] | Message index (0-31) |
[3] | Playback options (bit-mask):
Bit 2 (4) = playback speed (0 = normal, 1 = fast)
Bit 1-0 (0-3) = volume attenuation (0 = normal, 1 = -2.2dB, 2 = -4.5dB, 3 = -6.7dB) |
Expected replies: STS_SUCCESS, STS_ERROR |
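The playback options are a small bit-mask that combines the speed flag and the attenuation level. A hedged Python sketch, with `play_rp_command` as an illustrative name:

```python
ARG_MIN = ord("@")     # argument value -1 (40h), the fixed first argument
ARG_ZERO = ord("A")    # argument value 0 (41h)
CMD_PLAY_RP = b"p"     # 70h

SPEED_FAST = 4         # bit 2 of the playback options


def play_rp_command(message, fast=False, attenuation=0):
    """Build the bytes of a CMD_PLAY_RP request."""
    if not 0 <= message <= 31:
        raise ValueError("message index out of range: %r" % message)
    if not 0 <= attenuation <= 3:
        raise ValueError("attenuation out of range: %r" % attenuation)
    options = (SPEED_FAST if fast else 0) | attenuation
    return CMD_PLAY_RP + bytes([ARG_MIN, ARG_ZERO + message, ARG_ZERO + options])
```

Playing message 3 at fast speed with -2.2dB attenuation gives options 4 | 1 = 5, so the request is `b"p@DF"`.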
CMD_ERASE_RP
‘e’ (65h) | Erase a message recording |
[1] | Fixed to (-1) |
[2] | Message index (0-31) |
Expected replies: STS_SUCCESS, STS_ERROR |
CMD_VERIFY_RP
‘v’ (76h) | Verify file-system integrity for message recordings |
[1] | Fixed to (-1) |
[2] | Type of operation:
0 = check only
1 = check and fix errors |
Expected replies: STS_SUCCESS, STS_ERROR |
CMD_SERVICE + SVC_EXPORT_SD
‘~’ (7Eh) | Service protocol expansion |
‘X’ (58h) | Export command raw data |
[1] | Group index (0 = trigger, 1-15 = generic, 16 = password) |
[2] | Command position (0-31) |
Expected replies: STS_SERVICE + SVC_DUMP_SD |
CMD_SERVICE + SVC_IMPORT_SD
‘~’ (7Eh) | Service protocol expansion |
‘I’ (49h) | Import command raw data |
[1] | Group index (0 = trigger, 1-15 = generic, 16 = password) |
[2] | Command position (0-31) |
[3-514] | Raw command data (encoded as hex nibbles – byte = [n] * 16 + [n+1]) |
[515-518] | Data checksum (16-bit sum of all 256 bytes, starting at 1234h) |
Expected replies: STS_SUCCESS, STS_INTERR (if checksum fails) |
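The raw data payload could be prepared along these lines. The sketch rests on several assumptions that this section does not confirm: each nibble is sent as an ordinary argument character (‘A’ + nibble), the checksum is 1234h plus the sum of the 256 data bytes truncated to 16 bits, and the four checksum nibbles are sent most significant first. `encode_raw_command_data` is an illustrative name.

```python
ARG_ZERO = ord("A")    # argument value 0 (41h)
CHECKSUM_SEED = 0x1234


def encode_raw_command_data(data):
    """Encode 256 raw bytes as 512 nibble arguments plus 4 checksum nibbles."""
    if len(data) != 256:
        raise ValueError("expected exactly 256 bytes of raw data")
    out = bytearray()
    for byte in data:
        out.append(ARG_ZERO + (byte >> 4))      # high nibble: byte = [n] * 16 + [n+1]
        out.append(ARG_ZERO + (byte & 0x0F))    # low nibble
    checksum = (CHECKSUM_SEED + sum(data)) & 0xFFFF
    for shift in (12, 8, 4, 0):                 # assumed order: high nibble first
        out.append(ARG_ZERO + ((checksum >> shift) & 0x0F))
    return bytes(out)
```

The result is 516 bytes long, matching arguments [3] to [518]; it would follow the ‘~’, ‘I’, group and position bytes of the request.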
CMD_SERVICE + SVC_VERIFY_SD
‘~’ (7Eh) | Service protocol expansion |
‘V’ (56h) | Verify command raw data |
[1] | Group index (0 = trigger, 1-15 = generic, 16 = password) |
[2] | Command position (0-31) |
Expected replies: STS_SUCCESS, STS_RESULT, STS_SIMILAR, STS_ERROR |
Status Details
Replies to commands follow this format. Please note that numeric arguments of status replies are mapped to upper-case letters (see the Arguments Mapping section).
STS_MASK
‘k’ (6Bh) | Mask of non-empty groups |
[1-8] | Eight 4-bit values that form a 32-bit mask, LSB first |
In reply to: CMD_MASK_SD |
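Reassembling the mask from its eight arguments can be sketched as follows, reading "LSB first" as "least significant 4-bit value first" (an interpretation, not something the table states explicitly). The function names are illustrative.

```python
ARG_ZERO = ord("A")    # argument value 0 (41h)


def decode_mask(args):
    """Combine the eight 4-bit arguments of STS_MASK into a 32-bit integer."""
    if len(args) != 8:
        raise ValueError("STS_MASK carries exactly 8 arguments")
    mask = 0
    for i, code in enumerate(args):
        nibble = code - ARG_ZERO
        if not 0 <= nibble <= 15:
            raise ValueError("not a 4-bit argument: %r" % chr(code))
        mask |= nibble << (4 * i)   # first argument holds bits 0-3
    return mask


def non_empty_groups(mask):
    """List the indices of the groups whose bit is set in the mask."""
    return [group for group in range(32) if (mask >> group) & 1]
```

For the argument bytes `b"DAAABAAA"` this yields the mask 10003h, meaning groups 0, 1 and 16 contain commands.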
STS_COUNT
‘c’ (63h) | Count of commands or total number of SI grammars |
[1] | Integer (0-31 = command/grammar count, -1 = 32 commands/grammars) |
In reply to: CMD_COUNT_SD, CMD_DUMP_SI |
STS_AWAKEN
‘w’ (77h) | Wake-up (back from power-down mode) |
In reply to: Any character after power on or sleep mode |
STS_DATA
‘d’ (64h) | Provide command data |
[1] | Training information (-1 = empty, 1-6 = training count, +8 = SD/SV conflict, +16 = SI conflict)
Known issues: In firmware ID 0, command creation/deletion might cause the training count of other empty commands to change to 7. Treat count values of -1, 0 or 7 as empty training markers. Never train commands more than 2 or 3 times. |
[2] | Conflicting command position (0-31, only meaningful when trained) |
[3] | Length of label (0-31) |
[4-n] | Text of label (ASCII characters from ‘A’ to ‘`’)
EasyVR Commander encodes digits 0-9 as A-J prefixed by ‘^’ |
In reply to: CMD_DUMP_SD |
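The training information argument packs a count and two conflict flags into one value. A possible way to unpack it in Python, treating the low three bits as the count and applying the firmware ID 0 workaround described above (the function name and the tuple layout are illustrative choices):

```python
SD_CONFLICT = 8     # +8 = conflict with another SD/SV command
SI_CONFLICT = 16    # +16 = conflict with an SI word


def decode_training(value):
    """Return (training_count, sd_conflict, si_conflict) for STS_DATA [1]."""
    if value == -1:
        return (0, False, False)          # empty command
    if not 0 <= value <= 31:
        raise ValueError("training information out of range: %r" % value)
    count = value & 0x07
    if count == 7:
        count = 0                         # firmware ID 0: 7 also marks "empty"
    return (count, bool(value & SD_CONFLICT), bool(value & SI_CONFLICT))
```

A value of 10, for example, decodes to two trainings with an SD/SV conflict, in which case argument [2] holds the position of the conflicting command.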
STS_ERROR
‘e’ (65h) | Signal recognition error |
[1-2] | Two positive values that form an 8-bit error code (error = [1] * 16 + [2], see appendix) |
In reply to: CMD_RECOG_SI, CMD_RECOG_SD, CMD_TRAIN_SD, CMD_PLAY_SX |
STS_INVALID
‘v’ (76h) | Invalid command or argument |
In reply to: Any invalid command or argument |
STS_TIMEOUT
‘t’ (74h) | Timeout expired |
In reply to: CMD_RECOG_SI, CMD_RECOG_SD, CMD_TRAIN_SD |
STS_LIPSYNC
‘l’ (6Ch) | Lip-sync streaming data |
[1-N] | Mouth position (0-31):
0 = fully closed
31 = fully open
Note: New values are available on request about every 27ms until the timeout occurs or the command is interrupted. A new status is sent at the end (see STS_TIMEOUT, STS_INTERR). |
In reply to: CMD_LIPSYNC |
STS_INTERR
‘i’ (69h) | Interrupted recognition |
In reply to: CMD_BREAK while in training, recognition or playback |
STS_SUCCESS
‘o’ (6Fh) | OK or no errors status |
In reply to: CMD_BREAK, CMD_DELAY, CMD_BAUDRATE, CMD_TIMEOUT, CMD_KNOB, CMD_LEVEL, CMD_LANGUAGE, CMD_SLEEP, CMD_GROUP_SD, CMD_UNGROUP_SD, CMD_ERASE_SD, CMD_NAME_SD, CMD_RESETALL, CMD_QUERY_IO, CMD_PLAY_SX, etc. |
STS_RESULT
‘r’ (72h) | Recognized SD/SV command or Training similar to SD/SV command |
[1] | Command position (0-31) |
In reply to: CMD_RECOG_SD, CMD_TRAIN_SD |
STS_SIMILAR
‘s’ (73h) | Recognized SI word or Training similar to SI word |
[1] | Word index (0-31) |
In reply to: CMD_RECOG_SI, CMD_RECOG_SD, CMD_TRAIN_SD |
STS_OUT_OF_MEM
‘m’ (6Dh) | Memory error (no more room for commands or sound table not present) |
In reply to: CMD_GROUP_SD, CMD_DUMP_SX |
STS_ID
‘x’ (78h) | Provide firmware identification |
[1] | Version identifier (0 = VRbot, 1-7 = EasyVR 2, 8-15 = EasyVR 3, 16+ = EasyVR 3 Plus) |
In reply to: CMD_ID |
STS_PIN
‘p’ (70h) | Provide pin input status |
[1] | Logic level (0 = input low, 1 = input high) |
In reply to: CMD_QUERY_IO |
STS_TABLE_SX
‘d’ (64h) | Provide sound table data |
[1-2] | Two positive values that form a 10-bit count of entries in the sound table (count = [1] * 32 + [2]) |
[3] | Length of table name (0-31) |
[4-n] | Text of table name (ASCII characters from ‘A’ to ‘`’) |
In reply to: CMD_DUMP_SX |
STS_GRAMMAR
‘z’ (7Ah) | Provide custom grammar data |
[1] | Flags for this grammar (currently 16 is returned for trigger grammars, 0 for commands) |
[2] | Number of commands in this grammar (0-31) |
[3] | Length of first command label (0-31) |
[4-n] | Text of first command label (ASCII characters from ‘A’ to ‘`’) |
… | Repeat last two fields for all the commands in this grammar |
In reply to: CMD_DUMP_SI |
STS_TOKEN
‘f’ (66h) | Detected a SonicNet™ token |
[1-2] | Two positive values that form the index of a received token (index = [1] * 32 + [2], 0-15 for 4-bit tokens or 0-255 for 8-bit tokens) |
In reply to: CMD_RECV_SN |
STS_SERVICE + SVC_DUMP_SD
‘~’ (7Eh) | Service protocol expansion |
‘D’ (44h) | Provide command raw data |
[1-512] | Raw command data (encoded as hex nibbles – byte = [n] * 16 + [n+1]) |
[513-516] | Data checksum (16-bit sum of all 256 bytes, starting at 1234h) |
In reply to: CMD_SERVICE + SVC_EXPORT_SD |