Audio to Text
Transcribe audio to text
Deepgram model
Deepgram is an audio-to-text model that transcribes audio files in real time with a high degree of accuracy.
The available models are:
- nova-2
- nova
- enhanced
- base
- whisper (i.e., Deepgram’s own hosting of OpenAI’s model)
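As an illustration, Deepgram's pre-recorded transcription endpoint accepts the model name as a query parameter. The sketch below builds (but does not send) such a request with Python's standard library; the API key and audio URL are placeholders you would replace with real values.

```python
import json
import urllib.request

# Placeholders -- substitute real values before sending.
API_KEY = "YOUR_DEEPGRAM_API_KEY"
AUDIO_URL = "https://example.com/sample.mp3"

def build_transcription_request(model: str) -> urllib.request.Request:
    """Build (but do not send) a Deepgram pre-recorded transcription request."""
    endpoint = f"https://api.deepgram.com/v1/listen?model={model}"
    payload = json.dumps({"url": AUDIO_URL}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={
            "Authorization": f"Token {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_transcription_request("nova-2")
# urllib.request.urlopen(req) would perform the actual call.
```

Swapping `"nova-2"` for any of the other model names above selects that model for the same request shape.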
OpenAI Whisper model
Whisper is an audio-to-text model developed by OpenAI. Trained on a large and diverse collection of audio data, it is a versatile speech recognition model: beyond monolingual transcription, it can also translate speech and identify the spoken language.
When using audio-to-text, at least two nodes are required: the model node (currently only the OpenAI model is available) and an output node.
- The model node requires a URL pointing to the audio file (e.g., .mp3, .wav).
- The output node displays the transcription produced by the model.
The result of the model node can instead be sent to an LLM node for further processing. In the video below, the transcription is sent to an LLM that combines it with data retrieved and processed from a URL.
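The flow just described (audio URL → model node → LLM node → output) can be sketched with stub functions. The transcription and LLM calls below are placeholders standing in for the real model services, not the platform's actual API:

```python
def transcribe(audio_url: str) -> str:
    """Model node stub: a real implementation would call Whisper or Deepgram."""
    return f"transcript of {audio_url}"

def combine_with_context(transcript: str, web_context: str) -> str:
    """LLM node stub: merges the transcript with retrieved page content."""
    return f"{transcript} | combined with: {web_context}"

def run_workflow(audio_url: str, page_url: str) -> str:
    transcript = transcribe(audio_url)       # model node
    context = f"content of {page_url}"       # URL retrieval stub
    return combine_with_context(transcript, context)  # LLM node -> output

result = run_workflow("https://example.com/talk.mp3",
                      "https://example.com/article")
```

The point of the sketch is only the wiring: the model node's output is just text, so any downstream node that accepts text can consume it.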
Audio input options
URL as input
The most common way to use the audio-to-text node is to provide a URL pointing to the audio file. This can be the URL of a file stored in a cloud storage service (e.g., Google Drive, Dropbox) or of a file hosted on a server.
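Before handing a URL to the model node, a workflow might first check that it points to a supported audio format. The helper below is hypothetical (not part of the platform), and the set of extensions is an assumption for illustration:

```python
from urllib.parse import urlparse
from pathlib import PurePosixPath

# Assumed set of supported formats, for illustration only.
SUPPORTED_AUDIO = {".mp3", ".wav", ".m4a", ".flac"}

def looks_like_audio(url: str) -> bool:
    """Check the URL path's file extension against supported audio formats."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    return suffix in SUPPORTED_AUDIO

looks_like_audio("https://example.com/audio/sample.mp3")  # True
looks_like_audio("https://example.com/report.pdf")        # False
```

Note that this only inspects the path, so a URL without a file extension (common with cloud-storage share links) would need a different check, such as the response's Content-Type header.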
Upload file as input
You can also upload a file from your computer, and it will be transcribed.
Record your voice as input
Finally, you can record your voice directly in the browser to test the workflow.