Donna
How to Build an Event-Driven Voice-to-Voice Architecture for Low-Latency AI Agents

Written by Xander Berk
One of the challenges in building Donna, our voice agent for field sales, is eliminating awkward silences while maintaining intelligence. When a user asks "What's the status of the opportunity for Acme Corp?", a simple voice agent goes silent for seconds while it queries the CRM, validates the entity, and generates a response. That delay destroys conversational naturalness.
Building real-time AI voice agents requires an architecture that can maintain conversational fluidity while still executing heavy reasoning, validation, and context-loading operations. In Donna, we architected the system around an event-driven, fully parallelized pipeline that separates speech-to-text (STT), text-to-text (LLM reasoning), and text-to-speech (TTS). This decoupling can eliminate seconds of awkward silence and enables both low latency and high reliability. The reasoning behind using a composed pipeline instead of a voice-to-voice model, such as the OpenAI Realtime API, will be covered in a future post.
Pipeline Overview
The Donna pipeline consists of three core components:
- Speech-to-Text (STT) → rapid transcription of the user's incoming audio (listening)
- Text-to-Text (LLM) → Donna's reasoning, state-updates, and response generation (thinking)
- Text-to-Speech (TTS) → real-time audio synthesis back to the user (speaking)

Each stage is event-driven via LiveKit, allowing all processing blocks to run concurrently rather than sequentially.
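To make the idea of concurrently running stages concrete, here is a minimal sketch in plain Python `asyncio`. It is not Donna's implementation and does not use the LiveKit API: the queues stand in for LiveKit events, and `stt_stage`, `llm_stage`, `tts_stage`, and `run_pipeline` are hypothetical names chosen for illustration. Each stage consumes events as soon as the previous stage emits them, instead of waiting for the whole input to finish.

```python
import asyncio

END = None  # sentinel that marks the end of a stream (assumption for this sketch)


async def stt_stage(audio_chunks, transcripts: asyncio.Queue) -> None:
    """Stand-in STT: emits one transcript event per audio chunk."""
    for chunk in audio_chunks:
        await asyncio.sleep(0)  # yield control, as a real streaming call would
        await transcripts.put(f"text({chunk})")
    await transcripts.put(END)


async def llm_stage(transcripts: asyncio.Queue, replies: asyncio.Queue) -> None:
    """Stand-in LLM: starts replying as soon as a transcript event arrives."""
    while (text := await transcripts.get()) is not END:
        await replies.put(f"reply-to[{text}]")
    await replies.put(END)


async def tts_stage(replies: asyncio.Queue, spoken: list) -> None:
    """Stand-in TTS: synthesizes each reply as it becomes available."""
    while (reply := await replies.get()) is not END:
        spoken.append(f"audio<{reply}>")


async def run_pipeline(audio_chunks):
    transcripts, replies = asyncio.Queue(), asyncio.Queue()
    spoken = []
    # All three stages run concurrently; none waits for another to finish.
    await asyncio.gather(
        stt_stage(audio_chunks, transcripts),
        llm_stage(transcripts, replies),
        tts_stage(replies, spoken),
    )
    return spoken


if __name__ == "__main__":
    print(asyncio.run(run_pipeline(["chunk-1", "chunk-2"])))
```

In a real deployment the stages would be driven by the transport's own events rather than in-process queues; the queues here only show the shape of the hand-off.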
Why Event-Driven Parallelization Matters
Traditional voice agents run STT → LLM → TTS as a strictly linear, synchronous pipeline. Each component blocks the next: the LLM waits for STT to finish, TTS waits for the LLM to complete, and any additional operation either extends the response time or gets skipped entirely. This introduces two critical problems:
- Latency: the user must wait for the entire chain to complete before receiving audio, resulting in seconds of silence during complex operations.
- Fragility: STT misrecognitions or missing context propagate directly into the LLM call, degrading accuracy with no opportunity for correction.
By contrast, our event-driven architecture treats each STT result as an event that can trigger multiple independent workflows simultaneously. These workflows can be leveraged for specific conversational features to increase the intelligence and relevance of Donna while maintaining fluidity. Let’s dive deeper into some concrete improvements that we implemented!
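The "one event, many workflows" idea can be sketched as a tiny publish/subscribe bus. This is an illustrative assumption, not Donna's code: `EventBus`, the `"transcript"` event name, and the three handlers are hypothetical. The key point is that `emit` schedules every handler as its own task and returns immediately, so no handler waits for another.

```python
import asyncio
from collections import defaultdict


class EventBus:
    """Hypothetical in-process event bus used only for this sketch."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_name, handler):
        self._handlers[event_name].append(handler)

    def emit(self, event_name, payload):
        # Schedule each handler as an independent task and return at once.
        return [
            asyncio.create_task(handler(payload))
            for handler in self._handlers[event_name]
        ]


async def demo():
    bus = EventBus()
    log = []

    async def respond(transcript):
        # Main path: starts working on the raw transcript right away.
        log.append(("respond", transcript))

    async def correct(transcript):
        await asyncio.sleep(0.01)  # simulated validation work
        log.append(("correct", transcript))

    async def fetch_crm(transcript):
        await asyncio.sleep(0.01)  # simulated CRM round trip
        log.append(("crm", transcript))

    for handler in (respond, correct, fetch_crm):
        bus.subscribe("transcript", handler)

    tasks = bus.emit("transcript", "status of acme")
    await asyncio.gather(*tasks)  # awaited here only so the demo can inspect the log
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```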
Parallel Correction Layer for More Accurate Transcripts
As soon as STT emits a transcription event, we immediately execute multiple parallel workflows:
Main path: Keep the conversation flowing. The raw transcript is submitted directly to the LLM, which begins reasoning and generating a response. This keeps latency minimal and Donna starts "thinking" and responding using the earliest possible interpretation, without waiting for any validation or enrichment to complete.
Side paths: Enrich and validate in parallel. While the main conversation loop continues, independent workflows execute asynchronously:
Transcript correction and enrichment: A semantic-validation pipeline corrects the transcript by:
- Matching ambiguous names to known entities (entity-resolution)
- Repairing product names, organizations, quantities, and domain-specific terminology
- Fixing punctuation and utterance boundaries (where one thought ends and another begins)
- Replacing low-confidence STT fragments with higher-confidence deterministic matches from structured domain data (CRM records, product catalogs)
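As an illustration of the entity-resolution step, the sketch below matches a possibly misheard name against a list of known accounts using fuzzy string matching from Python's standard library. This is a swapped-in technique for illustration only; the article does not say how Donna performs the matching. The account list, the `resolve_entity` helper, and the `cutoff` value are all assumptions.

```python
import difflib

# Hypothetical account names, standing in for records loaded from a CRM.
KNOWN_ACCOUNTS = ["Acme Corp", "Globex", "Initech", "Umbrella Group"]


def resolve_entity(fragment, known=KNOWN_ACCOUNTS, cutoff=0.6):
    """Return the closest known account name, or None if nothing is close enough."""
    by_lowercase = {name.lower(): name for name in known}
    matches = difflib.get_close_matches(
        fragment.lower(), list(by_lowercase), n=1, cutoff=cutoff
    )
    return by_lowercase[matches[0]] if matches else None


if __name__ == "__main__":
    print(resolve_entity("Acne Corp"))  # a plausible STT mishearing
    print(resolve_entity("zzz"))        # nothing similar in the account list
```

A production system would likely combine this kind of lookup with STT confidence scores and conversation context, so that a deterministic match only replaces a fragment the recognizer was unsure about.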
CRM operations: Data retrieval (fetching status, customer history) and writes (updating records based on conversation outcomes) happen in the background without blocking the response.
This architecture optimizes the speed-accuracy tradeoff: the main path keeps the conversation natural and responsive, while side paths progressively improve transcript quality and update systems without the user ever noticing the work happening behind the scenes.
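One way to picture how the two paths fit together is shown below: the main path replies from the raw transcript immediately, while a background task patches the stored turn once the correction arrives, so later turns see the improved text. This is a sketch under stated assumptions, not Donna's implementation; `generate_reply`, `correct_transcript`, `handle_turn`, and the turn dictionary layout are hypothetical.

```python
import asyncio


async def generate_reply(raw_transcript):
    """Main-path stand-in: answers from the raw transcript right away."""
    return f"Let me check on {raw_transcript!r}."


async def correct_transcript(raw_transcript):
    """Side-path stand-in: a slower correction against domain data."""
    await asyncio.sleep(0.05)  # simulated validation latency
    return raw_transcript.replace("acne corp", "Acme Corp")


async def handle_turn(history, raw_transcript, background):
    turn = {"raw": raw_transcript, "text": raw_transcript}
    history.append(turn)

    async def patch_turn():
        # Overwrite the stored text once the corrected transcript is ready.
        turn["text"] = await correct_transcript(raw_transcript)

    # Fire and forget, but keep a reference so the task is not garbage collected.
    task = asyncio.create_task(patch_turn())
    background.add(task)
    task.add_done_callback(background.discard)

    # The reply does not wait for the correction.
    return await generate_reply(raw_transcript)


async def demo():
    history, background = [], set()
    reply = await handle_turn(history, "status of acne corp", background)
    text_at_reply_time = history[0]["text"]
    await asyncio.gather(*background)  # let the side path finish for the demo
    return reply, text_at_reply_time, history[0]["text"]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```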
The result is a system that achieves production-grade transcript accuracy from real-time STT models, which would be impossible with either path alone, and that makes Donna feel smarter over time.
Why This Architecture Enables the Future of Voice AI
The fundamental constraint in building intelligent voice agents is simple: every additional capability you want to add (deeper reasoning, compliance validation, quality monitoring, context retrieval) requires processing time. In a synchronous architecture, each new capability directly adds latency. You're forced to choose between a fast but shallow agent, or a smart but frustratingly slow one.
Event-driven parallelization breaks this constraint. The main conversational path stays constant regardless of how many side paths you enable. Want to add real-time compliance monitoring? Spin up a new side path. Need sentiment analysis to adjust tone? Another side path. Multi-turn reasoning that considers conversation history? Side path. Each new capability runs independently, never blocking the user-facing response.
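The arithmetic behind this claim can be written down directly. The sketch below assumes an idealized model in which side paths run fully concurrently and do not compete with the main path for resources; the millisecond figures are made-up illustrative numbers, not measurements from Donna, and the function names are hypothetical.

```python
def synchronous_latency(main_ms, side_ms):
    """Every capability sits on the response path, so the costs add up."""
    return main_ms + sum(side_ms)


def event_driven_latency(main_ms, side_ms):
    """Side paths run off the response path; the user only feels the main path."""
    return main_ms


def background_completion(main_ms, side_ms):
    """Time until all work has finished when everything starts together."""
    return max([main_ms, *side_ms])


if __name__ == "__main__":
    main_path = 400                  # illustrative: raw transcript to first reply
    side_paths = [300, 250, 600]     # illustrative: correction, sentiment, CRM lookup
    print(synchronous_latency(main_path, side_paths))    # 1550
    print(event_driven_latency(main_path, side_paths))   # 400
    print(background_completion(main_path, side_paths))  # 600
```

Under this model, adding a fourth side path increases the synchronous figure but leaves the user-facing figure unchanged. Real systems will deviate once side paths contend for the same compute or rate limits.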
This architectural pattern is what will enable voice AI systems to become genuinely sophisticated. As models improve and domain requirements grow, the ability to compose multiple specialized processes without sacrificing conversational naturalness becomes essential. The gap between "responds quickly" and "responds intelligently" only closes when you can do both simultaneously.
This isn't theoretical; it's the architectural reality that determines which voice agents can scale in complexity and which hit a ceiling. Systems built on synchronous pipelines will always face the intelligence-latency tradeoff. Systems built on event-driven parallelization can continuously add intelligence without compromising the user experience.
Building for Scale, Consistency, and the Future
When we designed Donna's architecture, we made a deliberate choice: optimize for modularity and extensibility, not just immediate performance. The event-driven, asynchronous design enables us to build a voice agent that can grow in intelligence without sacrificing conversational naturalness.
The voice AI landscape is evolving rapidly, with different teams taking different approaches. We chose event-driven parallelization because it aligns with our goals for building a scalable, consistent, and modular system. This architecture gives us the flexibility to:
- Add specialist capabilities incrementally: compliance monitoring, CRM enrichment, objection handling, or product lookup can be added as independent modules without refactoring the core pipeline
- Run powerful retrieval pipelines without blocking: vector search over large document stores, predictive "look-ahead" planning, and context enrichment happen in parallel with conversation
- Deploy domain-expert validators: multiple specialized evaluators can audit outputs for correctness, compliance, or completeness in real-time
- Adapt processing dynamically: different conversation contexts can trigger different combinations of background processes
This isn't speculative; it's how Donna operates today. The parallel correction layer, the CRM enrichment, and the multiple evaluators are all independent, event-driven processes that we can enable, disable, or modify without touching the core conversation loop.
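A small registry is one possible way to get this kind of enable, disable, and context-dependent behavior. The sketch below is an assumption for illustration, not Donna's actual design: `SidePathRegistry`, the context labels (`"pricing"`, `"smalltalk"`), and the two handlers are hypothetical.

```python
import asyncio


class SidePathRegistry:
    """Hypothetical registry mapping side paths to the contexts that trigger them."""

    def __init__(self):
        self._paths = {}

    def register(self, name, handler, contexts):
        self._paths[name] = {
            "handler": handler,
            "contexts": set(contexts),
            "enabled": True,
        }

    def set_enabled(self, name, enabled):
        self._paths[name]["enabled"] = enabled

    def select(self, context):
        """Names of the enabled side paths that apply to this context."""
        return [
            name
            for name, path in self._paths.items()
            if path["enabled"] and context in path["contexts"]
        ]

    async def dispatch(self, context, transcript):
        # Awaited here so the demo can show results; a live agent would
        # schedule these in the background instead of waiting on them.
        names = self.select(context)
        results = await asyncio.gather(
            *(self._paths[name]["handler"](transcript) for name in names)
        )
        return dict(zip(names, results))


async def compliance_check(transcript):
    return "discount" not in transcript  # toy rule for illustration


async def product_lookup(transcript):
    return [word for word in transcript.split() if word.isupper()]


registry = SidePathRegistry()
registry.register("compliance", compliance_check, contexts=["pricing"])
registry.register("product_lookup", product_lookup, contexts=["pricing", "demo"])

if __name__ == "__main__":
    print(registry.select("pricing"))
    print(registry.select("smalltalk"))
    print(asyncio.run(registry.dispatch("pricing", "quote for the XJ9 pump")))
```

Because the registry is separate from the conversation loop, turning a side path off is a one-line change (`registry.set_enabled("compliance", False)`) that does not touch the main path.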
As we scale Donna across different industries and use cases, this architectural flexibility becomes increasingly valuable. The ability to compose specialized agents and evaluators without adding latency is what allows us to maintain conversational naturalness while continuously expanding Donna's capabilities.
Conclusion
By structuring Donna as an event-driven STT → LLM → TTS pipeline, augmented with parallel transcript validation and asynchronous context loaders, we built a system that is both:
- low-latency: immediate response generation with no awkward silences
- high-accuracy: production-grade transcript correction and domain-specific enrichment running in parallel
This architectural approach enables real-time, production-grade voice-to-voice AI systems that can operate reliably in complex field-sales environments. More importantly, it gives us the flexibility to continuously expand Donna's intelligence by adding new reasoning capabilities, validators, and domain expertise without ever compromising conversational naturalness.

