Transcription

Our transcription pipeline combines Deepgram and Whisper. At the moment, we don’t support other speech-to-text (STT) providers.

Our pipeline includes:

  • Noise cancellation: Filters background interference
  • Voice isolation: Focuses on the speaker, even in noisy environments
  • Turn-taking detection: Helps the agent know when to speak vs. listen
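
To make the turn-taking idea concrete, here is a minimal sketch of one common approach: treating a long enough run of silence after speech as the end of the speaker's turn. This is an illustration only, not a description of how our pipeline works. The function name, the per-frame energy input, and both thresholds are hypothetical.

```python
from typing import Optional, Sequence


def end_of_turn_index(
    frame_energies: Sequence[float],
    energy_threshold: float = 0.01,   # hypothetical: energy below this counts as silence
    min_silent_frames: int = 25,      # hypothetical: silent frames needed to end a turn
) -> Optional[int]:
    """Return the index of the frame where the turn ends, or None if it has not ended."""
    heard_speech = False
    silent_run = 0
    for i, energy in enumerate(frame_energies):
        if energy >= energy_threshold:
            # Speech frame: the speaker is (still) talking, so reset the silence counter.
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            # Silence only counts once the speaker has actually said something.
            silent_run += 1
            if silent_run >= min_silent_frames:
                return i
    return None


# 5 frames of leading silence, 10 frames of speech, 30 frames of trailing silence.
energies = [0.0] * 5 + [0.5] * 10 + [0.0] * 30
turn_end = end_of_turn_index(energies)
```

Leading silence is ignored so that the agent does not start speaking before the user has said anything. A lower `min_silent_frames` makes the agent respond faster but raises the risk of interrupting a speaker who is only pausing.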

We continuously improve the pipeline, and updates are rolled out automatically.

If you’re experiencing low transcription quality, let us know. We can explore fine-tuning options or support for bring-your-own models.

Speech synthesis

We offer six high-quality built-in voices, each professionally cloned and tuned for:

  • Natural prosody and intonation
  • Accurate pronunciation
  • Low-latency streaming

Each voice is replicated across multiple model providers for reliability. If one provider goes down, we can switch to another with no interruption. We recommend starting with a built-in voice for most deployments.
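
The failover behavior can be pictured as trying providers in order until one succeeds. The sketch below is a simplified illustration under that assumption, not our actual implementation. The `Synthesizer` type, the function names, and the two stand-in providers are all hypothetical.

```python
from typing import Callable, List, Sequence

# Hypothetical type: a provider is any function that turns text into audio bytes.
Synthesizer = Callable[[str], bytes]


def synthesize_with_failover(text: str, providers: Sequence[Synthesizer]) -> bytes:
    """Try each provider in order and return the first successful result."""
    errors: List[Exception] = []
    for provider in providers:
        try:
            return provider(text)
        except Exception as exc:  # a real system would catch narrower error types
            errors.append(exc)
    raise RuntimeError(f"all {len(providers)} providers failed: {errors}")


def failing_provider(text: str) -> bytes:
    # Stand-in for a provider that is currently down.
    raise ConnectionError("provider unavailable")


def healthy_provider(text: str) -> bytes:
    # Stand-in for a working provider; returns fake audio bytes.
    return f"audio:{text}".encode()


audio = synthesize_with_failover("hello", [failing_provider, healthy_provider])
```

Because the same voice exists on every provider in the list, the caller gets the same voice back regardless of which provider ends up serving the request.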

We also support custom voices from ElevenLabs and Cartesia. Get in touch if you’d like to bring your own.