Amazon researchers recently announced a new approach to emotion-classifying AI, which they will present at the International Conference on Acoustics, Speech, and Signal Processing. Their strategy is to train the system on a data set of thousands of utterances from 10 different speakers, with the goal of building a more accurate neural network that picks up emotions in users' voices more reliably.
Listening to someone's voice gives many clues about what the person is actually trying to say. Fluctuations in tone can accentuate ideas and even change the meaning of the spoken words, and much of a speaker's emotional state is carried in the sound of the voice. Detecting emotion has many uses: it could help spot early signs of dementia or an oncoming heart attack, and it can make AI systems more responsive. Voice assistants like Google Assistant, Siri, and Alexa could soon pick up on users' tone of voice and understand their emotions better.
Amazon Works On Better Voice Emotion Detection In AI Systems
This idea is not new; Amazon has been working for a long time to give Alexa the ability to determine people's mood or emotional state. The researchers classify emotions along three measures: valence, activation, and dominance. Training consists of three steps. The first uses unlabelled data to train the encoder and decoder individually. Next, the encoder is tuned by attempting to distinguish real representations from artificial ones.
Lastly, the encoder is tuned so that the latent emotion representation helps it predict emotion labels. After several experiments, the researchers report a 3% improvement in accuracy when assessing valence. Moreover, when the system was fed a sequence of representations covering 20 milliseconds each, accuracy improved by 4%.
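To make the three-step description above more concrete, here is a minimal PyTorch sketch of that kind of training pipeline. The layer sizes, the Gaussian prior used for the "artificial" representations, the synthetic stand-in tensors, and the choice to treat valence, activation, and dominance as three regression targets are all assumptions for illustration; the actual architecture, features, and losses in the Amazon paper may differ.

```python
# Minimal sketch of a three-stage training scheme like the one described above.
# Layer sizes, the Gaussian latent prior, and the synthetic data are illustrative
# assumptions, not details taken from the Amazon paper.
import torch
import torch.nn as nn

FRAME_DIM = 40      # e.g. 40 acoustic features per 20 ms frame (assumed)
LATENT_DIM = 16
NUM_TARGETS = 3     # valence, activation, dominance (assumed as regression targets)

encoder = nn.Sequential(nn.Linear(FRAME_DIM, 64), nn.ReLU(), nn.Linear(64, LATENT_DIM))
decoder = nn.Sequential(nn.Linear(LATENT_DIM, 64), nn.ReLU(), nn.Linear(64, FRAME_DIM))
discriminator = nn.Sequential(nn.Linear(LATENT_DIM, 32), nn.ReLU(), nn.Linear(32, 1))
classifier = nn.Linear(LATENT_DIM, NUM_TARGETS)

mse, bce = nn.MSELoss(), nn.BCEWithLogitsLoss()
unlabeled = torch.randn(256, FRAME_DIM)          # stand-in for unlabeled utterance frames

# Step 1: train encoder and decoder on unlabeled data (plain autoencoding).
opt_ae = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
for _ in range(100):
    recon = decoder(encoder(unlabeled))
    loss = mse(recon, unlabeled)
    opt_ae.zero_grad(); loss.backward(); opt_ae.step()

# Step 2: adversarial tuning -- the discriminator tries to tell encoder outputs
# ("real" representations) from samples of an assumed Gaussian prior ("artificial"
# ones), and the encoder is tuned to make the two indistinguishable.
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
opt_e = torch.optim.Adam(encoder.parameters(), lr=1e-4)
for _ in range(100):
    z_real = encoder(unlabeled)
    z_fake = torch.randn_like(z_real)
    d_loss = bce(discriminator(z_real.detach()), torch.ones(len(z_real), 1)) + \
             bce(discriminator(z_fake), torch.zeros(len(z_fake), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    e_loss = bce(discriminator(encoder(unlabeled)), torch.zeros(len(unlabeled), 1))
    opt_e.zero_grad(); e_loss.backward(); opt_e.step()

# Step 3: tune the encoder together with a small head so the latent representation
# predicts the emotion labels (valence / activation / dominance).
labeled_x = torch.randn(64, FRAME_DIM)           # stand-in for labeled frames
labeled_y = torch.rand(64, NUM_TARGETS)          # stand-in for emotion annotations
opt_cls = torch.optim.Adam(list(encoder.parameters()) + list(classifier.parameters()), lr=1e-4)
for _ in range(100):
    pred = classifier(encoder(labeled_x))
    loss = mse(pred, labeled_y)
    opt_cls.zero_grad(); loss.backward(); opt_cls.step()
```

In this sketch the 20-millisecond figure only shows up as an assumed frame size for the input features; a real system would extract such frames from recorded audio and feed them to the encoder as a sequence rather than as independent vectors.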