Fundamentals
Understanding how conversational agents work in the Operator platform
A conversational agent in Operator is a language model powered by context, instructions, and tools—wrapped in infrastructure built for real-time interaction. Unlike basic chatbots, agents can retrieve external data, make informed decisions, and escalate to humans when appropriate.
Runtime modes
Conversational agents support two runtime modes, each with its own technical characteristics and capabilities:
Real-time conversations
Voice calls and real-time chats are real-time conversations where customers expect immediate responses and natural interaction flow.
Real-time conversations run as a dedicated process connected via WebSocket APIs, maintaining active state throughout the entire interaction, with response times typically under 500ms.
Key characteristics:
- Dedicated process: Each conversation runs in its own isolated worker for the full duration.
- Persistent WebSocket: Maintains a live connection for real-time audio or text streaming.
- Low latency: Responses typically under 500ms for natural-feeling back-and-forth.
- Interruptible: Handles user interruptions smoothly with advanced turn-taking logic.
- Stateful runtime: Maintains live context throughout the conversation.
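The interruptible behavior described above can be pictured with a small asyncio sketch. This is only an illustration under assumptions: `speak`, `listen_for_interruption`, and `run_turn` are hypothetical names, not part of the Operator API, and the platform's actual turn-taking logic is not documented here. The agent emits its reply chunk by chunk and stops as soon as an interruption is signaled.

```python
import asyncio


async def speak(chunks, interrupted, spoken):
    """Emit reply chunks one at a time, stopping if the user interrupts."""
    for chunk in chunks:
        if interrupted.is_set():
            break  # the user started talking: stop mid-reply
        spoken.append(chunk)
        await asyncio.sleep(0)  # yield control so other tasks can run


async def listen_for_interruption(interrupted, after_yields):
    """Hypothetical stand-in for detecting that the user has started speaking."""
    for _ in range(after_yields):
        await asyncio.sleep(0)
    interrupted.set()


async def run_turn(chunks, interrupt_after_yields):
    """Run one agent turn alongside an interruption detector."""
    interrupted = asyncio.Event()
    spoken = []
    await asyncio.gather(
        speak(chunks, interrupted, spoken),
        listen_for_interruption(interrupted, interrupt_after_yields),
    )
    return spoken


if __name__ == "__main__":
    reply = [f"chunk-{i}" for i in range(10)]
    delivered = asyncio.run(run_turn(reply, interrupt_after_yields=2))
    print(f"delivered {len(delivered)} of {len(reply)} chunks before the interruption")
```

Because the whole turn lives in one process, the partially delivered reply and the interruption signal are both available in memory, which is the practical benefit of a stateful runtime.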
Async conversations
SMS, email, and async chat are asynchronous conversations where messages may arrive hours or days apart, but agents maintain context and respond promptly and coherently whenever a new message comes in.
Async conversations use a suspend/resume model where the agent process sleeps between messages but can quickly resume and respond when new messages arrive.
Key characteristics:
- Suspend/resume lifecycle: The agent process pauses when idle and resumes on new activity.
- Responsive: Replies within seconds when a new message is received.
- Persistent context: Maintains conversation state across suspend/resume cycles.
- Time-tolerant: Designed for long gaps—hours or days—between messages.
- Cross-session continuity: Preserves context across multiple interactions over time.
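A minimal sketch of the suspend/resume idea follows, assuming that conversation state is serialized to storage between messages. The names `ConversationState`, `InMemoryStore`, and `handle_message` are hypothetical and are not part of the Operator API; the in-memory dictionary stands in for whatever durable storage a real system would use.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ConversationState:
    """Hypothetical snapshot of what an agent needs in order to resume."""
    conversation_id: str
    history: list = field(default_factory=list)


class InMemoryStore:
    """Stand-in for durable storage (an assumption for this sketch)."""

    def __init__(self):
        self._rows = {}

    def save(self, state):
        # Serialize so that nothing depends on a live process between messages.
        self._rows[state.conversation_id] = json.dumps(asdict(state))

    def load(self, conversation_id):
        raw = self._rows.get(conversation_id)
        if raw is None:
            return ConversationState(conversation_id)
        return ConversationState(**json.loads(raw))


def handle_message(store, conversation_id, text):
    """Resume, process one inbound message, then suspend again."""
    state = store.load(conversation_id)  # resume from saved context
    state.history.append({"role": "customer", "text": text})
    customer_turns = sum(1 for m in state.history if m["role"] == "customer")
    reply = f"Received message {customer_turns}"
    state.history.append({"role": "agent", "text": reply})
    store.save(state)  # suspend until the next message arrives
    return reply
```

Each call to `handle_message` is independent of the previous one apart from what was saved, so the gap between messages can be seconds or days without changing the logic.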
Conversation lifecycle
Starting conversations
Conversations begin through several trigger mechanisms:
- Incoming calls: When customers call a phone number connected to your agent
- API-triggered calls: When you programmatically initiate calls via the API
- Async messages: When customers send SMS, email, or chat messages to channels connected to your agent
Active conversation state
Once triggered, the agent maintains active conversation state:
- Real-time: Dedicated process runs continuously, handling streaming audio/text
- Async: Process activates on new messages, maintains context between activations
- Context management: Tracks conversation history, tool usage, and business logic state
- Multi-modal handling: Manages voice, text, and tool interactions simultaneously
Ending conversations
Conversations end in one of the following ways, depending on the runtime mode:
Real-time conversations:
- Agent or caller hangs up the phone
- WebSocket connection is closed
Async conversations:
- Inactivity timeout (configurable duration)
- Explicitly ending the conversation using conversation.end
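The two async ending conditions can be expressed as a small decision function. This is a sketch under assumptions: `end_reason`, its parameters, and the returned labels are hypothetical, and the function does not call or model the real conversation.end interface.

```python
from datetime import datetime, timedelta
from typing import Optional


def end_reason(
    last_activity: datetime,
    now: datetime,
    inactivity_timeout: timedelta,
    ended_explicitly: bool = False,
) -> Optional[str]:
    """Return why an async conversation should end, or None to keep it open."""
    if ended_explicitly:
        # Something asked for the conversation to end (for example via conversation.end).
        return "explicit_end"
    if now - last_activity >= inactivity_timeout:
        # No message has arrived within the configured window.
        return "inactivity_timeout"
    return None
```

Checking the explicit flag first means a deliberate end is reported as such even when the timeout has also elapsed.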
Memory and context
Conversational agents maintain memory across conversations to deliver personalized, continuous service. Memory is built from two main sources: context provided at call initiation, and customer identification (e.g. phone number), which enables preloading relevant history.
This memory enables agents to:
- Handle callbacks gracefully: If a customer returns a missed call, the agent can pick up where it left off.
- Recall prior interactions: Understand why the customer reached out previously and continue the conversation naturally.
By default, agents remember the last two weeks of conversations per customer.
See Context for more information on how to preload context to your agent.
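To illustrate the default two-week memory window, the sketch below selects a customer's recent conversations from a list of records. The record fields (`customer_id`, `ended_at`) and the `recall` function are assumptions made for this example; they do not describe how Operator stores or retrieves memory.

```python
from datetime import datetime, timedelta

# Default window taken from the text: the last two weeks per customer.
MEMORY_WINDOW = timedelta(days=14)


def recall(conversations, customer_id, now, window=MEMORY_WINDOW):
    """Return one customer's conversations inside the window, oldest first."""
    cutoff = now - window
    matches = [
        c for c in conversations
        if c["customer_id"] == customer_id and c["ended_at"] >= cutoff
    ]
    return sorted(matches, key=lambda c: c["ended_at"])
```

Keying the lookup on a customer identifier such as a phone number is what lets a returning caller be matched to earlier interactions.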
Multi-channel behavior
Operator agents run the same core logic across all communication channels, with adaptations tailored to each medium.
Channel behavior
- Voice: Optimized for spoken interaction: shorter responses, natural phrasing, and real-time responsiveness.
- Text: Supports longer, more detailed replies with links, formatting, and structured content.
- Channel transitions: Customers can move between channels (e.g. SMS to voice) without losing context.
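One way to picture "same core logic, channel-specific adaptation" is a rendering step applied after the agent has produced its reply. The function below is a hypothetical sketch, not Operator's implementation: it assumes replies may contain Markdown-style links, keeps them for text channels, and for voice drops the URLs and keeps only the first couple of sentences.

```python
import re


def render_for_channel(reply: str, channel: str, voice_sentence_limit: int = 2) -> str:
    """Adapt one reply to the channel it will be delivered on."""
    if channel != "voice":
        # Text channels can carry links, formatting, and longer replies as-is.
        return reply
    # URLs cannot be spoken usefully, so keep only the link label.
    spoken = re.sub(r"\[([^\]]+)\]\([^)]+\)", r"\1", reply)
    # Keep the spoken reply short: split on sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", spoken.strip())
    return " ".join(sentences[:voice_sentence_limit])


if __name__ == "__main__":
    reply = "Your order shipped. Track it [here](https://example.com/t). It arrives Friday."
    print(render_for_channel(reply, "sms"))
    print(render_for_channel(reply, "voice"))
```

Keeping the adaptation in a separate step means the decision about what to say stays identical across channels, and only the delivery changes.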
Technical considerations
- Real-time channels require consistent low latency and handle interruptions gracefully, while async channels can support longer processing times and more complex responses.
- Voice-specific features like DTMF handling and call transfers are only available during voice conversations, while text-specific features like rich formatting and media sharing apply to text-based channels.
Getting started
Understanding these fundamentals helps you design agents that work effectively across different conversation types and business scenarios.
- For simple use cases: Start with basic prompting and single-channel deployment to understand the core concepts.
- For complex scenarios: Leverage multi-channel capabilities, memory management, and sophisticated tool integration as your use case evolves.
Learn about agent prompting to start building your agent’s conversation logic, or explore channels to understand deployment options.