Text-to-audio

ElevenLabs provides one of the most realistic text-to-speech and voice-cloning models. The node is available in the multimodal section.

To get it running, two nodes are needed: an input node, where the text is written, and the “Text to Audio” node, which generates the audio. The audio can then be played in the Stack AI platform or downloaded.

Whisper model

Default voices are available with Stack AI’s API key (the default configuration). They are:

['Rachel', 'Domi', 'Bella', 'Antoni', 'Elli', 'Josh', 'Arnold', 'Adam', 'Sam']
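
When a voice name is supplied programmatically, it may help to check it against the default list before sending a request. The following is a minimal sketch only: the function name `build_payload` and the field names `text` and `voice` are hypothetical and do not reflect Stack AI's documented request schema.

```python
# Default voices listed in this section (available with Stack AI's API key).
DEFAULT_VOICES = [
    "Rachel", "Domi", "Bella", "Antoni", "Elli",
    "Josh", "Arnold", "Adam", "Sam",
]


def build_payload(text: str, voice: str) -> dict:
    """Build a request body for a text-to-audio flow.

    Assumption: the keys "text" and "voice" are illustrative placeholders,
    not the platform's actual parameter names.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    if voice not in DEFAULT_VOICES:
        raise ValueError(
            f"unknown default voice {voice!r}; expected one of {DEFAULT_VOICES}"
        )
    return {"text": text, "voice": voice}
```

For example, `build_payload("Hello there", "Bella")` returns a dictionary with the two fields, while an unlisted voice name raises a `ValueError` before any request is made. With a custom API key, other voices may be available, so this check would need to be relaxed.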

A custom API key can be set in the settings section of the “Text to Audio” node. Voice then appears as a parameter that can be passed as an input in the deployment.

3. Text to Image