Speech

Transcription

Our transcription pipeline combines Deepgram and Whisper. We don’t currently support other speech-to-text (STT) providers. The pipeline includes:

Noise cancellation: Filters background interference
Voice isolation: Focuses on the speaker, even in noisy environments
Turn-taking detection: Helps the agent know when to speak vs. listen

We continuously improve the pipeline, and updates are rolled out automatically.

If you’re experiencing low transcription quality, let us know. We can explore fine-tuning options or support for bringing your own model.

Speech synthesis

We offer six high-quality built-in voices, each professionally cloned and tuned for:

Natural prosody and intonation
Accurate pronunciation
Low-latency streaming

Each voice is replicated across multiple model providers to ensure reliability. If one provider goes down, we can switch to another without interruption. We recommend starting with a built-in voice for most deployments.
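The failover idea can be illustrated with a minimal sketch. This is not our actual implementation: the provider functions, the `synthesize_with_failover` helper, and the voice name are all hypothetical stand-ins, and a real system would also handle streaming, timeouts, and health checks.

```python
from typing import Callable, Sequence

# Hypothetical type: a provider takes (voice, text) and returns audio bytes.
Synthesizer = Callable[[str, str], bytes]


class AllProvidersFailed(Exception):
    """Raised when every provider in the list fails."""


def synthesize_with_failover(
    providers: Sequence[tuple[str, Synthesizer]], voice: str, text: str
) -> tuple[str, bytes]:
    """Try each provider in order; return (provider_name, audio) from the first success."""
    errors: list[str] = []
    for name, synthesize in providers:
        try:
            return name, synthesize(voice, text)
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stand-in providers for illustration only (assumptions, not real APIs).
def primary_down(voice: str, text: str) -> bytes:
    raise ConnectionError("provider unavailable")


def backup_ok(voice: str, text: str) -> bytes:
    return f"{voice}:{text}".encode()


used, audio = synthesize_with_failover(
    [("primary", primary_down), ("backup", backup_ok)], "voice-a", "Hello"
)
```

In this sketch the first provider raises an error, so the request falls through to the second one, and the caller receives audio for the same voice either way.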

If you’d like to use a custom voice instead, we support ElevenLabs and Cartesia. Get in touch to set one up.
