General Questions

Our API is for developers, businesses and researchers interested in adding voice-driven Emotions Analytics to their applications, services and research projects.
Voice-driven Emotions Analytics analyzes emotions from a speaker’s voice, as they speak. It is a passive, non-invasive, continuous method of sensing how a speaker feels in real time.
Our unique engine adds an additional layer of emotional data points to your existing data. We have a variety of different outputs which can be used and combined depending on your needs. Learn more from some of our use cases.
We analyze a range of features, including moods and mood groups, valence, arousal, and temper. View the detailed definitions of our features here.
Like body language, vocal intonation is language- and culture-agnostic, and to date we have analyzed more than 40 languages in 170 countries. Try your language now!
We offer a 30-day free trial period for anyone interested in voice-driven Emotions Analytics. Sign up here to get your free API key.
Drop us an email at . We look forward to hearing from you.

Technical Questions

This sample rate is well suited to voice analysis: it excludes frequencies too low or too high to be relevant to human emotion, and it keeps file sizes small, minimizing network usage. For free API keys we support only this rate, for best performance. If you use files with a higher sample rate, the response will be slower, since the files are larger and take longer to transmit over the network.
Wave PCM is an uncompressed audio format. The emotion recognition engine examines very fine elements of the audio signal; these elements are lost when recordings are made with low-quality devices or in noisy environments, or when the signal passes through high-compression codecs such as those used for Voice over IP. In short, signal quality affects recognition performance, so we recommend using high-quality signals.
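As a quick pre-upload sanity check, you can verify that a recording really is uncompressed PCM WAV. A minimal sketch using Python's standard-library `wave` module, which only accepts PCM-encoded WAV files, so a successful open doubles as a format check:

```python
import wave

def check_pcm_wav(path):
    """Return (sample_rate, duration_seconds) for an uncompressed
    PCM WAV file. The wave module raises wave.Error for any
    non-PCM or non-WAV input, so opening the file is itself a
    format check."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        duration = wf.getnframes() / rate
        return rate, duration
```

This catches compressed or mislabeled files before they reach the API, where they would degrade or fail the analysis.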
Beyond Verbal’s recognition engine analyzes the voice signal using a sliding-window mechanism with a 10-second window size and a 5-second overlap. Our research team concluded that emotion is a continuously changing process, and measuring it with consecutive, non-overlapping segments leaves the joints between segments unanalyzed. To provide a more precise analysis that reflects continuous change, we use overlapping segments: 10-second segments with a 5-second overlap (shift), so that the odd segments measure emotions at the joints between the even segments.
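The windowing described above can be sketched as a small helper that computes the start and end time of each analysis segment (the 10-second window and 5-second shift are taken from the description; the function itself is illustrative, not part of the API):

```python
def segment_bounds(total_seconds, window=10, shift=5):
    """Start/end times (in seconds) of sliding-window analysis
    segments: each segment is `window` seconds long and starts
    `shift` seconds after the previous one, so consecutive
    segments overlap by window - shift seconds."""
    bounds = []
    start = 0
    while start + window <= total_seconds:
        bounds.append((start, start + window))
        start += shift
    return bounds
```

For a 25-second recording this yields segments (0, 10), (5, 15), (10, 20), (15, 25): each odd segment straddles the joint between its two even neighbors.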
Our Emotions Analytics engine requires a minimum of 13 seconds of uninterrupted, good-quality speech to produce a single analysis result. We highly recommend reading our Voice Input Guidelines before starting an analysis.
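A client can cheaply reject files that cannot possibly meet the 13-second minimum before uploading them. A rough pre-flight check, again using the standard-library `wave` module:

```python
import wave

MIN_SPEECH_SECONDS = 13  # minimum uninterrupted speech for one result

def long_enough(path):
    """Reject WAV files shorter than the 13-second minimum before
    uploading. Note: this checks only total duration; prolonged
    silence or noise inside the file still counts against the
    speech requirement, so passing this check does not guarantee
    an analysis result."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate() >= MIN_SPEECH_SECONDS
```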
A confidence score is provided in the engine's output for each analyzable segment, corresponding to the emotion parameter and group (e.g. low arousal). It reflects the likelihood that the emotion group was identified correctly. Beyond Verbal also uses this score to automatically exclude segment analyses that do not exceed a predetermined threshold, and API users can choose their own score cutoff depending on their specific application. When the engine determines that a segment is not analyzable (i.e. the score is beneath the predetermined threshold), the response carries Unknown in the Group field. For more info, click here.
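On the client side, this suggests a simple filtering step over the per-segment output. A sketch, assuming each segment is a dict with "Group" and "Confidence" keys (the exact field names and layout of the real response may differ):

```python
def usable_segments(segments, min_confidence=0.0):
    """Filter per-segment analyses. Segments the engine could not
    analyze carry "Unknown" in the Group field and are dropped;
    callers can additionally raise min_confidence to keep only
    high-confidence results. The dict shape used here (keys
    "Group" and "Confidence") is illustrative, not the documented
    response schema."""
    return [
        s for s in segments
        if s.get("Group") != "Unknown"
        and s.get("Confidence", 0.0) >= min_confidence
    ]
```

Raising `min_confidence` trades coverage for precision, which is exactly the per-application tuning the answer above describes.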
Yes, we support both real-time analysis and post-recording bulk analysis. For real-time analysis, use HTTP chunked transfer encoding.
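A minimal sketch of streaming audio with chunked transfer encoding, using Python's standard-library `http.client` (passing `encode_chunked=True` with an iterable body makes it send the data in chunks instead of buffering the whole file). The host, endpoint path, and authorization header below are placeholders, not the documented API endpoint:

```python
import http.client

def read_chunks(path, chunk_size=4096):
    """Yield fixed-size blocks of a file; the final block may be
    shorter. Used as an iterable request body for chunked upload."""
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk_size)
            if not block:
                return
            yield block

def stream_audio(path, host, endpoint, api_key):
    """Stream a recording for real-time analysis over HTTPS with
    chunked transfer encoding. Host, endpoint, and header name are
    placeholder assumptions for illustration."""
    conn = http.client.HTTPSConnection(host)
    conn.request(
        "POST",
        endpoint,
        body=read_chunks(path),
        headers={"Authorization": "Bearer " + api_key},
        encode_chunked=True,
    )
    return conn.getresponse()
```

Streaming this way lets the engine start producing segment results while the speaker is still talking, rather than waiting for the full file.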
Currently our engine analyzes only a single speaker per session. If you have two speakers on separate channels, you may analyze them in separate sessions.
We provide simple RESTful API documentation and sample code for several platforms, including Android, iOS, .NET, and JavaScript.
You are welcome to test our web-based JavaScript demo application. Just upload your own file or try one of our preloaded ones.
There can be a variety of reasons for not getting an analysis. The most common is too little voice: our engine requires a minimum of 13 seconds of continuous voice to produce a single batch analysis, and this 13-second duration excludes prolonged silence and background noise. Check out our Voice Input Guidelines for more useful tips on recording good-quality audio.

Did not find what you are looking for? Send us your question to