Fundamentals

A conversational agent in Operator is a language model powered by context, instructions, and tools—wrapped in infrastructure built for real-time interaction. Unlike basic chatbots, agents can retrieve external data, make informed decisions, and escalate to humans when appropriate.

Runtime modes

Conversational agents support two runtime modes, each with its own technical characteristics and capabilities:

Real-time conversations

Voice calls and real-time chats are real-time conversations where customers expect immediate responses and natural interaction flow.

Each real-time conversation runs as a dedicated process connected over a WebSocket API. The process keeps state active for the entire interaction, with response times typically under 500 ms.

Key characteristics:

Dedicated process: Each conversation runs in its own isolated worker for the full duration.
Persistent WebSocket: Maintains a live connection for real-time audio or text streaming.
Low latency: Responses typically under 500ms for natural-feeling back-and-forth.
Interruptible: Handles user interruptions smoothly with advanced turn-taking logic.
Stateful runtime: Maintains live context throughout the conversation.
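The interruptible behavior described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function names (`stream_reply`, `customer_barges_in`, `run_turn`) are hypothetical, an `asyncio.Event` stands in for whatever turn-taking signal the runtime actually uses, and no real WebSocket or audio is involved.

```python
import asyncio


async def stream_reply(chunks, interrupted, sent):
    """Send reply chunks until finished or until the customer interrupts."""
    for chunk in chunks:
        if interrupted.is_set():
            return False  # stop talking; the customer has the floor
        sent.append(chunk)
        await asyncio.sleep(0)  # yield so other tasks (the customer) can run
    return True


async def customer_barges_in(interrupted, sent):
    """Simulated customer who interrupts once the agent has started replying."""
    while not sent:
        await asyncio.sleep(0)
    interrupted.set()


async def run_turn(chunks):
    """Run one agent turn alongside the simulated customer."""
    interrupted = asyncio.Event()
    sent = []
    completed, _ = await asyncio.gather(
        stream_reply(chunks, interrupted, sent),
        customer_barges_in(interrupted, sent),
    )
    return completed, sent
```

Running `asyncio.run(run_turn(["Your order", "shipped on", "Monday."]))` shows the reply being cut short once the interruption flag is set, while the `sent` list (the live state) remains available for the next turn.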

Async conversations

SMS, email, and async chat are asynchronous conversations where messages may arrive hours or days apart, but agents maintain context and respond promptly and coherently whenever a new message comes in.

Async conversations use a suspend/resume model where the agent process sleeps between messages but can quickly resume and respond when new messages arrive.

Key characteristics:

Suspend/resume lifecycle: The agent process pauses when idle and resumes on new activity.
Responsive: Replies within seconds when a new message is received.
Persistent context: Maintains conversation state across suspend/resume cycles.
Time-tolerant: Designed for long gaps—hours or days—between messages.
Cross-session continuity: Preserves context across multiple interactions over time.
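To make the suspend/resume lifecycle concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `ConversationState`, `InMemoryStore`, and `handle_message` are hypothetical names, and a dictionary of JSON strings stands in for whatever durable storage the platform uses.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ConversationState:
    """Hypothetical state that survives between messages."""
    conversation_id: str
    history: list = field(default_factory=list)


class InMemoryStore:
    """Stand-in for durable storage; state is kept as serialized JSON."""

    def __init__(self):
        self._blobs = {}

    def save(self, state):
        self._blobs[state.conversation_id] = json.dumps(asdict(state))

    def load(self, conversation_id):
        blob = self._blobs.get(conversation_id)
        if blob is None:
            return ConversationState(conversation_id)
        return ConversationState(**json.loads(blob))


def handle_message(store, conversation_id, text):
    # Resume: rebuild the conversation state from storage.
    state = store.load(conversation_id)
    state.history.append({"role": "customer", "text": text})
    count = sum(1 for m in state.history if m["role"] == "customer")
    reply = f"Message {count} received"
    state.history.append({"role": "agent", "text": reply})
    # Suspend: persist the state; nothing stays in memory until the next message.
    store.save(state)
    return reply
```

Because the state is reloaded on every message, the gap between two calls to `handle_message` can be seconds or days without changing the result.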

Conversation lifecycle

Starting conversations

Conversations begin through several trigger mechanisms:

Incoming calls: When customers call a phone number connected to your agent
API-triggered calls: When you programmatically initiate calls via the API
Async messages: When customers send SMS, email, or chat messages to channels connected to your agent
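One way to picture how these triggers relate to the two runtime modes is a simple lookup. The trigger names and the `runtime_mode_for` helper below are hypothetical and are not part of the Operator API; the mapping just restates the list above, where calls are real-time and SMS, email, and chat messages are async.

```python
from enum import Enum


class RuntimeMode(Enum):
    REAL_TIME = "real-time"
    ASYNC = "async"


# Hypothetical trigger names, mapped according to the list above.
TRIGGER_MODES = {
    "incoming_call": RuntimeMode.REAL_TIME,
    "api_call": RuntimeMode.REAL_TIME,
    "sms": RuntimeMode.ASYNC,
    "email": RuntimeMode.ASYNC,
    "chat_message": RuntimeMode.ASYNC,
}


def runtime_mode_for(trigger):
    """Return the runtime mode a trigger would start, or raise for unknown triggers."""
    try:
        return TRIGGER_MODES[trigger]
    except KeyError:
        raise ValueError(f"unknown trigger: {trigger!r}") from None
```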

Active conversation state

Once triggered, the agent maintains active conversation state:

Real-time: Dedicated process runs continuously, handling streaming audio/text
Async: Process activates on new messages, maintains context between activations
Context management: Tracks conversation history, tool usage, and business logic state
Multi-modal handling: Manages voice, text, and tool interactions simultaneously

Ending conversations

Conversations end in one of the following ways, depending on the runtime mode:

Real-time conversations:

Agent or caller hangs up the phone
WebSocket connection is closed

Async conversations:

Inactivity timeout (configurable duration)
Explicitly ending the conversation using conversation.end
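The two async end conditions can be sketched as a single check. This is an illustrative assumption, not Operator's implementation: `end_reason` and its return values are hypothetical, and the `explicitly_ended` flag stands in for a call to `conversation.end`.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def end_reason(
    last_activity: datetime,
    now: datetime,
    timeout: timedelta,
    explicitly_ended: bool = False,
) -> Optional[str]:
    """Return why an async conversation should end, or None if it stays open."""
    if explicitly_ended:
        return "explicit_end"
    if now - last_activity >= timeout:
        return "inactivity_timeout"
    return None
```

For example, with a 24-hour timeout, a conversation whose last message arrived 25 hours ago would be closed for inactivity, while one that was active an hour ago stays open.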

Memory and context

Conversational agents maintain memory across conversations to deliver personalized, continuous service. Memory is built from two main sources: context provided at call initiation, and customer identification (e.g. phone number), which enables preloading relevant history.

This memory enables agents to:

Handle callbacks gracefully: If a customer returns a missed call, the agent can pick up where it left off.
Recall prior interactions: Understand why the customer reached out previously and continue the conversation naturally.

By default, agents remember the last two weeks of conversations per customer.
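As a rough sketch of how a two-week memory window keyed on customer identity could work, consider the filter below. The record shape (`customer_id`, `ended_at`, `summary`) and the `recall` function are assumptions made for illustration; only the two-week default comes from this page.

```python
from datetime import datetime, timedelta, timezone

MEMORY_WINDOW = timedelta(weeks=2)  # default window stated above


def recall(conversations, customer_id, now, window=MEMORY_WINDOW):
    """Return one customer's past conversations that fall inside the memory window."""
    cutoff = now - window
    return [
        c for c in conversations
        if c["customer_id"] == customer_id and c["ended_at"] >= cutoff
    ]
```

Here `customer_id` could be a phone number, matching the customer identification example given earlier in this section.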

See Context for more information on how to preload context to your agent.

Multi-channel behavior

Operator agents run the same core logic across all communication channels, with adaptations tailored to each medium.

Channel behavior

Voice: Optimized for spoken interaction: shorter responses, natural phrasing, and real-time responsiveness.
Text: Supports longer, more detailed replies with links, formatting, and structured content.
Channel transitions: Customers can move between channels (e.g. SMS to voice) without losing context.

Technical considerations

Real-time channels require consistent low latency and handle interruptions gracefully, while async channels can support longer processing times and more complex responses.
Voice-specific features like DTMF handling and call transfers are only available during voice conversations, while text-specific features like rich formatting and media sharing apply to text-based channels.
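The channel-specific feature split can be expressed as a small capability check. The feature names and the `is_available` helper are hypothetical, and treating every non-voice channel as text-based is a simplifying assumption; actual support may differ per channel.

```python
from enum import Enum


class Channel(Enum):
    VOICE = "voice"
    SMS = "sms"
    EMAIL = "email"
    CHAT = "chat"


# Hypothetical feature names based on the examples above.
VOICE_ONLY = {"dtmf", "call_transfer"}
TEXT_ONLY = {"rich_formatting", "media_sharing"}


def is_available(feature, channel):
    """Return True if a feature can be used on the given channel."""
    if feature in VOICE_ONLY:
        return channel is Channel.VOICE
    if feature in TEXT_ONLY:
        return channel is not Channel.VOICE
    return True  # features not listed are assumed channel-independent
```

A check like this lets the same core agent logic run on every channel while skipping tools or response formats that the current medium cannot carry.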

Getting started

Understanding these fundamentals helps you design agents that work effectively across different conversation types and business scenarios.

For simple use cases: Start with basic prompting and single-channel deployment to understand the core concepts.
For complex scenarios: Leverage multi-channel capabilities, memory management, and sophisticated tool integration as your use case evolves.

Learn about agent prompting to start building your agent’s conversation logic, or explore channels to understand deployment options.
