Give Your Characters a Voice — Literally
Assign unique voices to your characters and let Paintbrush automatically read their dialog. Write narration and dialog in your scenes, and every character speaks in their own voice.
AI video tools can generate stunning visuals, but they've always been silent films. You'd generate your scenes, export them, and then spend as long on audio as you did on video — recording voiceover, syncing timing, juggling text-to-speech tools in separate tabs. The gap between visuals and audio has been one of the biggest friction points in the entire workflow.
That changes today. Paintbrush now lets you assign a unique voice to each character and write dialog directly into your scenes. When you generate, every character speaks in their own voice — automatically. No external TTS tools, no manual syncing, no post-production audio editing.
How it works
The system has two parts: voice assignment and scene dialog.
When you edit a character, you'll see a new Voice dropdown. Choose from over 50 professionally recorded voices spanning different genders, ages, accents, and styles — from a deep, calm narrator to a young British storyteller to a raspy old-timer. Each voice has a preview so you can hear exactly what it sounds like before committing.
Then, in any scene, write narration or dialog in the audio text field. Paintbrush generates the speech using ElevenLabs' latest voice synthesis and automatically matches the video duration to the audio. Your scene's visuals and voiceover are perfectly in sync from the start.
Voices that match your characters
The voice library is organized by the attributes that matter most when casting: gender, age, accent, and tone. Looking for an American woman with a warm, conversational delivery? Filter by those traits. Need a British man with gravitas for your villain? He's in there.
Every voice comes with a description — "deep", "calm", "raspy", "warm", "authoritative" — so you can quickly scan for the right fit without listening to all 50+ options. And the preview button lets you hear a sample instantly, right in the character editor.
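To make the filtering idea concrete, here is a minimal sketch of trait-based voice lookup. It is an illustration only, not Paintbrush's actual data model: the `Voice` fields, the sample entries, and the `find_voices` helper are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Voice:
    # Hypothetical fields mirroring the casting attributes described above.
    name: str
    gender: str
    age: str
    accent: str
    tags: Tuple[str, ...]  # descriptive words such as "deep" or "warm"


# Made-up sample entries; these are not real Paintbrush voices.
LIBRARY = [
    Voice("Narrator A", "male", "middle-aged", "american", ("deep", "calm")),
    Voice("Storyteller B", "male", "young", "british", ("warm",)),
    Voice("Host C", "female", "adult", "american", ("warm", "conversational")),
]


def find_voices(
    library: List[Voice],
    *,
    gender: Optional[str] = None,
    age: Optional[str] = None,
    accent: Optional[str] = None,
    tag: Optional[str] = None,
) -> List[Voice]:
    """Return the voices that match every trait that was supplied."""
    matches = []
    for voice in library:
        if gender is not None and voice.gender != gender:
            continue
        if age is not None and voice.age != age:
            continue
        if accent is not None and voice.accent != accent:
            continue
        if tag is not None and tag not in voice.tags:
            continue
        matches.append(voice)
    return matches


# An American woman with a warm delivery.
candidates = find_voices(LIBRARY, gender="female", accent="american", tag="warm")
print([voice.name for voice in candidates])
```

Traits left unset are ignored, so the same helper covers both a broad scan ("anything warm") and a narrow casting call.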
Once a voice is assigned, it sticks. Every scene where that character has dialog uses the same voice automatically. Recast a character's voice at any point and future generations pick up the change.
Write dialog, not prompts
Each scene has an audio text field where you write what should be spoken during that scene. This can be straight narration, character dialog, or a mix of both. Write naturally — the way you'd write a script.
The audio text is separate from the scene's visual description. Your visual prompt tells the AI what to show; the audio text tells it what to say. This separation means you can describe complex visual action in the prompt while keeping the spoken dialog simple and natural, or vice versa.
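One way to picture this separation, along with the way an assigned voice follows a character from scene to scene, is the sketch below. The `Character` and `Scene` classes and the `speech_request` helper are assumptions made for illustration; the post does not describe how Paintbrush stores scenes internally.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Character:
    name: str
    voice_id: Optional[str] = None  # hypothetical: set via the Voice dropdown


@dataclass
class Scene:
    visual_prompt: str  # what to show
    audio_text: str     # what to say
    speaker: Character


def speech_request(scene: Scene) -> Optional[Dict[str, str]]:
    """Build a text-to-speech input for a scene, or None if it is silent.

    The voice is read from the character at call time, so a recast
    affects every later request without editing individual scenes.
    """
    if not scene.audio_text.strip():
        return None
    if scene.speaker.voice_id is None:
        raise ValueError(f"{scene.speaker.name} has no voice assigned")
    return {"voice_id": scene.speaker.voice_id, "text": scene.audio_text}


villain = Character("Villain", voice_id="british-gravitas")  # made-up id
scene = Scene(
    visual_prompt="Rain lashes the tower as the villain turns from the window.",
    audio_text="You're late.",
    speaker=villain,
)
print(speech_request(scene))

# Recasting the character changes what later requests use.
villain.voice_id = "raspy-old-timer"  # made-up id
print(speech_request(scene))
```

Because the scene holds a reference to the character rather than a copy of the voice, the visual prompt can stay elaborate while the spoken line stays short, and a recast needs no per-scene edits.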
Automatic duration matching
One of the most tedious parts of video production is matching audio to video length. Paintbrush handles this automatically. When you generate a scene with audio text, the system first generates the speech, measures its duration, and then generates video to match. A 3-second line of dialog produces a 3-second clip. An 8-second narration produces an 8-second clip.
If the audio runs longer than 10 seconds, Paintbrush automatically splits the scene into continuation segments. Each segment gets its own video generation, and scene chaining ensures visual continuity across the split. You write one block of narration and the system handles the segmentation — no manual splitting required.
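The duration logic described above can be sketched in a few lines. This is a simplified model under stated assumptions: it cuts greedily at the 10-second limit, whereas the real system may well choose split points differently (for example at sentence boundaries). The `plan_segments` function and `MAX_SEGMENT_SECONDS` constant are hypothetical names.

```python
from typing import List

MAX_SEGMENT_SECONDS = 10.0  # the limit mentioned in the text


def plan_segments(
    audio_seconds: float, max_segment: float = MAX_SEGMENT_SECONDS
) -> List[float]:
    """Return the video clip lengths needed to cover one audio track.

    Assumption: segments are filled greedily up to max_segment, and the
    final segment takes whatever audio remains.
    """
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    segments = []
    remaining = audio_seconds
    while remaining > max_segment:
        segments.append(max_segment)
        remaining -= max_segment
    segments.append(remaining)
    return segments


print(plan_segments(3.0))   # one clip matching the line of dialog
print(plan_segments(25.0))  # a long narration split into continuation segments
```

Audio at or under the limit yields a single clip of the same length; anything longer yields full-length segments followed by a shorter final one.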
Works with Import Story
This is where things get powerful. When you use Import Story to paste in a Reddit post, book passage, or script, the AI already extracts narration text for each scene. Now that characters have voices, those narration lines are ready to be spoken the moment you generate.
The workflow becomes: paste a story, review the breakdown, assign voices to the extracted characters, and generate. You go from a wall of text to a fully voiced, visually consistent animated video in minutes. The narration that used to require a separate recording session is now part of the generation pipeline.
Choosing the right voice
A few tips for getting the most out of character voices:
- Contrast your cast. If you have two male characters, give them noticeably different voices: one deep and measured, one younger and energetic. Distinct voices help viewers follow dialog without visual cues.
- Match tone to genre. A horror story benefits from a calm, understated narrator. Comedy works better with expressive, dynamic voices. The voice sets emotional expectations before the visuals even register.
- Preview in context. A voice that sounds great in isolation might not fit your character. Listen to the preview while looking at your character's reference sheet; your instinct for the match is usually right.
- Keep narration concise. Short, punchy lines produce tighter scenes. If a narration line runs long, consider splitting it into two scenes for better pacing.
What this unlocks
Character voices turn Paintbrush from a visual generation tool into a complete production pipeline. Several use cases open up:
- YouTube story channels — Reddit stories, creepypasta, and drama channels can go from text to fully voiced animated video without any external tools
- Children's content — Storybook narration with distinct character voices, perfectly synced to animated scenes
- Explainer videos — A narrator walks viewers through concepts while a mascot character appears on screen, both with consistent voices
- Social media series — Episodic content where characters speak in the same voice across every episode, building audience familiarity
- Prototyping and pitches — Quickly produce voiced storyboards for pitch meetings or concept validation
Audio was the last missing piece in the AI video production chain. Now it's built in.