You will soon hear a smile, sadness or excitement in machines' voices

After graduating as an electrical engineer, Noé Tits pursued a PhD at the Numediart Institute – UMONS on the application of machine learning techniques to expressive speech synthesis.

He is currently in charge of R&D for Flowchase’s speech technology, which automatically analyzes English learners' pronunciation and gives them feedback, using machine learning, deep learning, and signal processing. Flowchase is a mobile app that helps you improve your English pronunciation through voice technology.

In 2020, he studied the controllability of an expressive text-to-speech system trained on a dataset intended for continuous control. By controllability, we mean the ability to modify the expressiveness of the synthesized speech at will through parameters.

Controllability is evaluated with both an objective and a subjective experiment. The objective assessment is based on a measure of correlation between acoustic features and a latent representation of expressiveness. The subjective assessment is based on a perceptual experiment in which users are shown a 2D interface for controllable expressive text-to-speech and asked to retrieve a synthetic utterance by exploring that interface. His work can be consulted at: https://www.mdpi.com/2227-9709/8/4/84/htm
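To make the objective assessment more concrete, here is a minimal sketch of how a correlation between acoustic features and latent dimensions could be computed. It is not the method from the paper: the text above does not say which correlation measure is used, so Pearson correlation is assumed here, and the function name, the feature names (mean F0, speech rate) and the toy data are all invented for illustration.

```python
import numpy as np


def feature_latent_correlation(latents, features):
    """Pearson correlation between each latent dimension and each acoustic feature.

    latents:  array of shape (n_utterances, n_latent_dims)
    features: array of shape (n_utterances, n_features)
    Returns an array of shape (n_latent_dims, n_features).
    """
    latents = np.asarray(latents, dtype=float)
    features = np.asarray(features, dtype=float)
    # Standardize every column to zero mean and unit variance.
    z_lat = (latents - latents.mean(axis=0)) / latents.std(axis=0)
    z_feat = (features - features.mean(axis=0)) / features.std(axis=0)
    # The mean product of standardized values is the Pearson coefficient.
    return z_lat.T @ z_feat / latents.shape[0]


# Synthetic toy data (hypothetical, not taken from the paper):
# a 2D latent space, where only the first dimension drives mean F0.
rng = np.random.default_rng(0)
latents = rng.uniform(-1.0, 1.0, size=(200, 2))
mean_f0 = 120.0 + 40.0 * latents[:, 0] + rng.normal(0.0, 5.0, size=200)
speech_rate = 4.0 + rng.normal(0.0, 0.5, size=200)  # unrelated to the latents
features = np.column_stack([mean_f0, speech_rate])

corr = feature_latent_correlation(latents, features)
print(np.round(corr, 2))  # rows: latent dimensions, columns: acoustic features
```

In this toy setup, a coefficient close to 1 or -1 would suggest that moving along a latent dimension changes the corresponding acoustic feature in a predictable way, which is one possible reading of "controllable". A coefficient near 0 would suggest that the dimension has little linear influence on that feature.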

According to Noé Tits, a major challenge in interacting with intelligent systems is to make the interaction as intuitive and natural as possible for users.

For voice interaction, this means synthesizing a natural voice whose expressiveness is consistent with the context. Possible applications include virtual characters with expressive voices, animated films, and synthetic audiobooks.