Complete Guide

How to Create an Audiobook with AI

From manuscript to published audiobook in one afternoon. This guide walks you through every step of creating a professional, multi-voice audiobook using Notevibes — with UI screenshots, tips, and everything you need to publish on Audible, Apple Books, and Spotify.

12 min read · 10 steps · 550+ AI voices · 57 languages

Creating an audiobook used to mean booking a recording studio, hiring voice actors, scheduling engineers, and spending weeks (and thousands of dollars) in post-production. A single finished hour of audiobook narration costs between $5,000 and $15,000 in a traditional studio — and that's for a single voice, in a single language.

AI has changed this entirely. With Notevibes, you can upload your manuscript, let AI detect every speaking character in your book, assign each one a unique voice, generate professional-quality narration for every chapter, and publish your finished audiobook — all in a single sitting.

This guide covers every step in detail, with mockups of the actual Notevibes interface so you know exactly what to expect. Whether you're a self-published author, a publisher looking to expand your catalog, or a content creator exploring audio, this is the complete playbook.

What You'll Need

  • Your book file: EPUB, Kindle (.mobi/.azw3), PDF, DOCX, or plain text
  • A Notevibes account: free to sign up, plans start at $19/month
  • An afternoon: enough time to go from upload to finished audiobook

That’s it. No studio, no actors, no audio engineering skills needed.

1

Upload Your Manuscript

Start by navigating to the Audiobook Narration page and clicking “Create Your Audiobook.” You'll see a drag-and-drop upload area that accepts five book formats:

Notevibes — Upload Your Book

Drop your book file here, or click to browse

  • EPUB (.epub)
  • Kindle (.mobi / .azw3)
  • PDF (.pdf)
  • Word (.docx)
  • Plain Text (.txt)

EPUB — The gold standard. Notevibes reads the EPUB table of contents, detects chapters automatically, and preserves the full document structure. If your book has chapter headings, they'll be extracted perfectly.

Kindle (.mobi / .azw3) — Direct upload with chapters preserved. Book metadata (title, author, publisher) is extracted automatically.

PDF — AI-powered text extraction with layout detection. Handles multi-column layouts, detects and skips page numbers, headers, and footers. Converts tables to spoken descriptions.

Word (.docx) — Heading styles (H1, H2, H3) are automatically converted to chapter boundaries.

Plain Text (.txt) — Use custom chapter markers (like --- or Chapter N) for flexible delimiter support.
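To see what marker-based chapter splitting involves, here is a minimal Python sketch. It assumes two marker styles (a line of three or more dashes, or a line starting with "Chapter N"); the pattern and the `split_chapters` helper are illustrative and not the actual Notevibes parser.

```python
import re

# Assumed marker pattern: a line of 3+ dashes, or a line like "Chapter 12".
# This is an illustration only, not Notevibes' real delimiter logic.
MARKER = re.compile(r"^(?:-{3,}|Chapter\s+\d+.*)$", re.IGNORECASE)

def split_chapters(text: str) -> list:
    """Split plain text into chapters at marker lines."""
    chapters = []
    title, body = "Untitled", []

    def close_chapter():
        # Only keep a chapter if it has some non-blank content.
        if any(ln.strip() for ln in body):
            chapters.append({"title": title, "text": "\n".join(body).strip()})

    for line in text.splitlines():
        stripped = line.strip()
        if MARKER.match(stripped):
            close_chapter()
            # A "Chapter N" line becomes the title; a bare "---" gets a numbered one.
            if stripped.startswith("-"):
                title = f"Chapter {len(chapters) + 1}"
            else:
                title = stripped
            body = []
        else:
            body.append(line)
    close_chapter()
    return chapters

sample = "Chapter 1\nThe rain fell.\n---\nThe letter arrived.\nChapter 3\nThe end."
for chapter in split_chapters(sample):
    print(chapter["title"], "->", chapter["text"])
```

If your manuscript uses a different delimiter, the same idea applies: pick one marker, use it consistently, and check the detected chapters in the workspace after upload.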

Tip: EPUB gives the best results for chapter detection. If you have your book in multiple formats, EPUB is the way to go. Most ebook creation tools (Calibre, Vellum, Scrivener) can export to EPUB.
2

The Audiobook Workspace

Once your book is uploaded and parsed, you'll land in the audiobook workspace. This is your command center for the entire production process. On the left, you'll see a chapter sidebar listing every chapter detected from your book:

Chapters (5 chapters)

  • Chapter 1 — The Beginning (3,420 words)
  • Chapter 2 — Old Friends (4,180 words)
  • Chapter 3 — The Letter (2,870 words)
  • Chapter 4 — Into the Garden (5,210 words)
  • Chapter 5 — The Crossing (3,950 words)

Each chapter shows:

  • Chapter title (editable — click to rename)
  • Word count for estimating narration length
  • Status indicator: green (audio generated), orange (content changed since last generation), gray (not yet generated)
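As a rough illustration of how word counts translate into narration length, the sketch below uses the five sample chapters from the sidebar and an assumed pace of 150 words per minute. That pace is an assumption for this example; actual length depends on the voice and pause settings you choose.

```python
# Assumed narration pace; real output varies by voice and settings.
WORDS_PER_MINUTE = 150

def estimate_minutes(word_count: int, wpm: int = WORDS_PER_MINUTE) -> float:
    """Estimated narration time in minutes for a given word count."""
    return word_count / wpm

def format_duration(minutes: float) -> str:
    """Render minutes as 'H h MM min', rounded to the nearest minute."""
    total = round(minutes)
    return f"{total // 60} h {total % 60:02d} min"

# Word counts from the sample sidebar above.
chapters = {
    "Chapter 1": 3420,
    "Chapter 2": 4180,
    "Chapter 3": 2870,
    "Chapter 4": 5210,
    "Chapter 5": 3950,
}

for title, words in chapters.items():
    print(f"{title}: about {estimate_minutes(words):.1f} min")

total_words = sum(chapters.values())
print("Whole book:", format_duration(estimate_minutes(total_words)))  # 2 h 11 min
```

At this assumed pace, the 19,630-word sample book comes out at a little over two hours of audio.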

You can reorder, duplicate, delete, and create new chapters directly from the sidebar. The main content area shows a rich text editor for the selected chapter — powered by Tiptap — where you can edit your text, format with bold/italic/underline, and add bullet lists.

At the top of the workspace, you'll find the toolbar with the main voice selector, generate button, download options, and the characters panel toggle:

Audiobook Workspace Toolbar

Main voice: Aoede (en-US · Female)

The toolbar gives you quick access to:

  • Main voice selector — sets the default narrator voice for the entire book. Click to change it. This voice is used for all narration paragraphs unless overridden by character detection.
  • Characters button — opens the character detection panel (covered in Step 3). The badge shows how many characters have been detected.
  • Credits balance — your remaining generation credits.
  • Generate button — click to generate audio. A dropdown lets you choose between generating the current chapter or the full book.
  • Download button — export options including per-chapter MP3, merged full book, and ZIP archive.
Tip: If the AI misidentified chapter boundaries (especially common with PDFs), you can manually split or merge chapters in the sidebar. The content editor supports full text editing, so you can fix any extraction issues before generating audio.
3

Detect Characters with AI

This is where the magic happens. Click the “Detect Characters” button, and Notevibes AI scans your entire manuscript to identify every speaking character. The AI doesn't just find names — it understands narrative roles, counts dialogue lines, maps every paragraph to the right speaker, and automatically detects your book's genre to shape the narration style. No manual preset selection needed — the AI figures out whether your book is a thriller, romance, fantasy, or memoir and adapts the delivery accordingly.

Characters — The Midnight Garden
Detected Characters (5)

  • Narrator · narrator · Voice: Aoede · 142 lines
  • Elena · protagonist · Voice: Kore · 87 lines
  • Marcus · antagonist · Voice: Orus · 54 lines
  • Old Keeper · mentor · Voice: Puck · 31 lines
  • Rinn · sidekick · Voice: Leda · 23 lines

For each detected character, the AI determines:

  • Name and any aliases used in the text
  • Role classification: narrator, protagonist, antagonist, mentor, sidekick, supporting, or minor
  • Gender (for voice matching)
  • Dialogue line count and chapter appearances
  • Narrative arc and character evolution across the book
  • A suggested AI voice based on the character’s attributes
  • Per-paragraph delivery directions (“whispered, trembling”, “cold, controlled”) generated from the scene context
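One way to picture the detection output is as a small record per character. The Python sketch below is a hypothetical data model built from the attributes listed above and the sample cast from the characters panel; the `Character` class and its field names are assumptions, not the Notevibes schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Role labels taken from the list above.
ROLES = {"narrator", "protagonist", "antagonist", "mentor",
         "sidekick", "supporting", "minor"}

@dataclass
class Character:
    """Hypothetical record for one detected character."""
    name: str
    role: str
    voice: str
    dialogue_lines: int
    gender: Optional[str] = None
    aliases: List[str] = field(default_factory=list)
    chapters: List[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.role not in ROLES:
            raise ValueError(f"unknown role: {self.role}")

# Sample cast from the characters panel mockup.
cast = [
    Character("Narrator", "narrator", "Aoede", 142),
    Character("Elena", "protagonist", "Kore", 87),
    Character("Marcus", "antagonist", "Orus", 54),
    Character("Old Keeper", "mentor", "Puck", 31),
    Character("Rinn", "sidekick", "Leda", 23),
]

voice_by_character = {c.name: c.voice for c in cast}
print(voice_by_character["Elena"])            # Kore
print(sum(c.dialogue_lines for c in cast))    # 337
```

Thinking of the cast this way makes it clear what you are editing in the panel: each rename, role change, or voice swap updates one field of one record.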

The characters panel is fully editable. Rename characters, change their roles, swap voices, or remove false detections. When you're happy with the lineup, click “Apply to All Chapters” to propagate the voice assignments across your entire book.

The detected genre (fiction, thriller, romance, fantasy, sci-fi, memoir, etc.) also shapes character portraits, scene illustrations, and the overall visual theme if you later publish with Storybook Mode. There is no manual genre or narration style selector to configure.

Tip: Adaptive Delivery is enabled automatically when character detection is active. This generates scene-aware delivery directions for each paragraph — like “whispered, trembling” or “cold, controlled” — so the voice adjusts its emotional tone to match the narrative context. For non-fiction books, you can skip character detection entirely and use a single narrator voice.
4

Assign Unique Voices

Each character needs a distinct voice. The AI suggests voices automatically based on gender, role, and personality, but you can override any assignment. Click on a character's voice to open the voice selector:

Select Voice

  • Aoede (Female · en-US)
  • Kore (Female · en-US)
  • Leda (Female · en-US)
  • Orus (Male · en-US)
  • Puck (Male · en-US)
  • Charon (Male · en-US)

550+ voices across 57 languages

The voice selector offers 550+ AI voices organized by:

  • Gender: Male, Female, or All
  • Type: All, Starred (your favorites), Recent (last 5 used), Natural (Chirp3-HD), Standard
  • Language: 57 languages including regional accents (American, British, Australian, Indian, Scottish, Nigerian, and more)
  • Preview: Click the play button on any voice card to hear a sample before selecting
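The filters above combine like an ordinary list filter. The Python sketch below models that with the six voices from the selector mockup; the `Voice` class, the starred flag on Puck, and the `filter_voices` function are illustrative assumptions, and the "Recent" filter is left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Voice:
    name: str
    gender: str      # "Male" or "Female"
    language: str    # e.g. "en-US"
    kind: str        # "Natural" or "Standard"
    starred: bool = False

# Voices from the selector mockup; the star on Puck is made up for the example.
VOICES = [
    Voice("Aoede", "Female", "en-US", "Natural"),
    Voice("Kore", "Female", "en-US", "Natural"),
    Voice("Leda", "Female", "en-US", "Natural"),
    Voice("Orus", "Male", "en-US", "Natural"),
    Voice("Puck", "Male", "en-US", "Natural", starred=True),
    Voice("Charon", "Male", "en-US", "Natural"),
]

def filter_voices(voices, gender="All", language=None, kind="All"):
    """Apply gender, language, and type filters; starred voices sort first."""
    matches = []
    for v in voices:
        if gender != "All" and v.gender != gender:
            continue
        if language is not None and v.language != language:
            continue
        if kind == "Starred" and not v.starred:
            continue
        if kind in ("Natural", "Standard") and v.kind != kind:
            continue
        matches.append(v)
    # sorted() is stable, so the original order is kept within each group.
    return sorted(matches, key=lambda v: not v.starred)

print([v.name for v in filter_voices(VOICES, gender="Male")])
```

With Puck starred, the male voices come back as Puck, Orus, Charon, which mirrors the tip that starred voices appear at the top.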

The latest Chirp3-HD voices are the most natural-sounding, with 30 unique voice profiles named after astronomical objects: Aoede, Kore, Orus, Puck, Leda, Charon, and more. Each one supports all 57 languages natively — meaning the same voice can narrate your book in English, French, Japanese, or Arabic with authentic accents.

Tip: Use the star button to favorite voices you like. Your starred voices appear at the top of the selector, making it fast to assign consistent voices across projects. For audiobooks with many characters, try to pick voices with distinctly different timbres — a deep male voice and a high female voice are easier to distinguish than two similar-sounding voices.
5

Per-Paragraph Voice Control

After character detection, every paragraph in your book gets assigned to a specific voice. The paragraph gutter — the left-side overlay in the editor — gives you a visual map of who's speaking where:

Chapter 3 — The Letter

Narrator

The rain hammered against the window. She knew, even before opening the letter, that everything was about to change.

Narrator
Reflective, slow — building tension

The envelope was thin — just a single sheet folded in thirds. No return address. Her name in handwriting she hadn’t seen in seventeen years.

Elena
Whispered, trembling

“Wait,” she whispered, pressing the paper flat on the table. “This can’t be right. Unless... unless he knew all along.”

Marcus
Cold, controlled

“You weren’t supposed to find that.” His voice came from the doorway, flat and measured. “Not yet.”

The paragraph gutter shows:

  • Voice avatar: A visual indicator of which character is speaking this paragraph. Click to change the voice.
  • Speaker name: The character name displayed below the avatar.
  • Sync status: Green dot = audio matches current text. Orange dot = text has changed since audio was generated.
  • Play button: Click to hear this specific paragraph. The currently playing paragraph is highlighted.
  • Scene direction: When Adaptive Delivery is enabled, each paragraph shows an italicized delivery cue (e.g., “Whispered, trembling” or “Cold, controlled”).
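A sync indicator like this can be modeled by storing a fingerprint of the text and voice that were used at generation time and comparing it with the current state. The Python sketch below shows the idea; the function names and the choice of SHA-256 are assumptions, not a description of how Notevibes tracks changes.

```python
import hashlib
from typing import Optional

def fingerprint(text: str, voice: str) -> str:
    """Hash the paragraph text (whitespace-normalized) together with its voice."""
    normalized = " ".join(text.split())
    return hashlib.sha256(f"{voice}\n{normalized}".encode("utf-8")).hexdigest()

def sync_status(text: str, voice: str, generated: Optional[str]) -> str:
    """Return 'gray' (never generated), 'green' (in sync), or 'orange' (stale)."""
    if generated is None:
        return "gray"
    return "green" if fingerprint(text, voice) == generated else "orange"

line = "“Wait,” she whispered, pressing the paper flat on the table."
stored = fingerprint(line, "Kore")   # saved when the audio was generated

print(sync_status(line, "Kore", stored))                 # green
print(sync_status(line + " “No.”", "Kore", stored))      # orange: text edited
print(sync_status(line, "Leda", stored))                 # orange: voice changed
print(sync_status(line, "Kore", None))                   # gray: no audio yet
```

Including the voice in the fingerprint means a voice swap also flags the paragraph for regeneration, not only a text edit.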
Tip: If the AI assigned the wrong speaker to a paragraph (common in scenes with multiple characters talking), just click the voice avatar in the gutter and select the correct character. The change applies immediately and will be used in the next generation.
6

Generate Audio

Click the “Generate” button in the toolbar to start producing your audiobook. A dropdown gives you two options:

Generate Options

Generate Chapter

Generate audio for the current chapter only. Great for testing voice assignments before committing to the whole book.

Generate Full Book

Generate all chapters in parallel. This is the fastest way to produce your complete audiobook.

Character Voices

Detect characters and assign unique voices

Adaptive Delivery

Scene-aware vocal directions per paragraph

When Character Voices is enabled, clicking generate will first run AI character detection (if characters haven't been detected yet), then generate audio with per-character voice assignments. When Adaptive Delivery is on, the AI computes scene-aware delivery directions for each paragraph before synthesis.

After clicking generate, the progress dialog opens and shows real-time status for each chapter:

Generating Audiobook...

The Midnight Garden

2 of 5 chapters complete

Credits used: 12,480

  • Chapter 1 — The Beginning: Done
  • Chapter 2 — Old Friends: Done
  • Chapter 3 — The Letter: Generating...
  • Chapter 4 — Into the Garden
  • Chapter 5 — The Crossing

Behind the scenes, Notevibes processes each chapter through a multi-stage pipeline: paragraphs are grouped into voice segments, chunked for the TTS engine, and synthesized with per-character voices and delivery directions. For dialogue-heavy scenes, the engine batches consecutive short dialogue segments from two speakers into a single multi-speaker request, which produces more natural conversational flow with fewer audio stitching artifacts.
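The grouping and batching described here can be sketched in a few lines of Python. This is a simplified model under stated assumptions: a segment counts as "short" at eight words or fewer (an invented threshold), any voice can take part in a two-speaker batch, and the chunking step for the TTS engine is omitted. It is not the actual Notevibes pipeline.

```python
from itertools import groupby

MAX_SHORT_WORDS = 8  # assumed cutoff for a "short" segment

def group_segments(paragraphs):
    """Merge consecutive paragraphs spoken by the same voice into one segment."""
    return [
        (voice, " ".join(text for _, text in run))
        for voice, run in groupby(paragraphs, key=lambda p: p[0])
    ]

def batch_requests(segments):
    """Batch runs of short segments that involve at most two voices."""
    requests, batch = [], []

    def flush():
        if batch:
            requests.append(list(batch))
            batch.clear()

    for voice, text in segments:
        is_short = len(text.split()) <= MAX_SHORT_WORDS
        voices = {v for v, _ in batch} | {voice}
        if is_short and len(voices) <= 2:
            batch.append((voice, text))
            continue
        flush()
        if is_short:
            batch.append((voice, text))        # starts a new batch
        else:
            requests.append([(voice, text)])   # long segment goes alone
    flush()
    return requests

paragraphs = [
    ("Narrator", "The rain hammered against the window."),
    ("Narrator", "The envelope was thin, just a single sheet."),
    ("Elena", "Wait. This can't be right."),
    ("Marcus", "You weren't supposed to find that."),
    ("Elena", "Not yet?"),
    ("Rinn", "Run!"),
]

for request in batch_requests(group_segments(paragraphs)):
    print([voice for voice, _ in request])
```

In this example the two narrator paragraphs merge into one long single-voice request, the Elena and Marcus exchange becomes one two-speaker request, and Rinn's line starts a new request because it would introduce a third voice.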

Tip: Generate a single chapter first to verify you're happy with the voice assignments and narration style. Once everything sounds right, generate the full book. This saves credits and avoids re-generating chapters you're not happy with.
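For readers curious what "generate all chapters in parallel" with a running progress count looks like in code, here is a minimal Python sketch using a thread pool. The `synthesize_chapter` function is a stand-in that only counts characters; a real text-to-speech call would go in its place, and nothing here reflects the actual Notevibes backend.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict

def synthesize_chapter(title: str, text: str) -> dict:
    """Placeholder for a TTS call; returns only the input size."""
    return {"title": title, "characters": len(text)}

def generate_book(chapters: Dict[str, str], max_workers: int = 4) -> Dict[str, dict]:
    """Process all chapters concurrently and report progress as each finishes."""
    finished = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(synthesize_chapter, title, text): title
            for title, text in chapters.items()
        }
        for done, future in enumerate(as_completed(futures), start=1):
            title = futures[future]
            finished[title] = future.result()
            print(f"{done} of {len(futures)} chapters complete: {title}")
    # Return results in book order, not completion order.
    return {title: finished[title] for title in chapters}

book = {
    "Chapter 1 - The Beginning": "The rain hammered against the window.",
    "Chapter 2 - Old Friends": "She had not seen him in seventeen years.",
}
results = generate_book(book)
```

Chapters may finish in any order, which is why the sketch reassembles the results in the book's own chapter order before returning them.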
7

Preview & Edit

After generation completes, every paragraph becomes playable. Click the play button in the paragraph gutter to hear individual paragraphs, or use the audio player bar at the bottom to play through the entire chapter.

The editing workflow is non-destructive:

  • Edit text freely — the orange sync indicator tells you which paragraphs need re-generation
  • Change a character’s voice — only the paragraphs using that voice are marked stale
  • Adjust delivery directions — toggle Adaptive Delivery on/off or edit individual directions
  • Regenerate selectively — you can regenerate a single chapter without touching the rest
  • The generated audio for unchanged chapters is preserved and cached
Tip: Pay attention to chapter transitions. Listen to the last paragraph of one chapter and the first paragraph of the next to make sure the pacing feels natural. You can adjust paragraph pause duration in the TTS settings (default: 600ms between paragraphs, 400ms at periods, 200ms at commas).
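To get a feel for how much silence those default pauses add, the Python sketch below totals them for two sample paragraphs. It is a naive estimate that assumes every period and comma triggers its pause and that paragraph gaps are added on top; an ellipsis would be counted as three periods, so treat the result as a rough figure.

```python
# Default pause lengths quoted in the tip above, in milliseconds.
PAUSE_MS = {"paragraph": 600, "period": 400, "comma": 200}

def total_pause_ms(paragraphs, pauses=PAUSE_MS) -> int:
    """Rough total of inserted silence for a list of paragraphs."""
    periods = sum(p.count(".") for p in paragraphs)
    commas = sum(p.count(",") for p in paragraphs)
    gaps = max(len(paragraphs) - 1, 0)   # pauses between paragraphs
    return (gaps * pauses["paragraph"]
            + periods * pauses["period"]
            + commas * pauses["comma"])

sample = [
    "The rain hammered against the window. She knew, even before opening "
    "the letter, that everything was about to change.",
    "The envelope was thin. No return address.",
]

print(total_pause_ms(sample), "ms")  # 2600 ms
```

Here one paragraph gap, four periods, and two commas add up to 2.6 seconds of silence, so small changes to these settings can noticeably shift the pacing of a long chapter.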
8

Generate Illustrations

Once your audio is generated, you can bring your audiobook to life visually. Notevibes generates two types of AI illustrations: character portraits and scene illustrations. These power Storybook Mode when you publish your book.

Character Portraits (3 of 5 ready)

  • Narrator (Aoede)
  • Elena (Kore)
  • Marcus (Orus)
  • Old Keeper (Puck)
  • Rinn (Leda)

Character portraits are generated using your book's character descriptions, genre, and role classifications. The AI applies genre-specific visual styles:

  • Fantasy: Painterly illustration, golden and amber lighting, detailed textures
  • Sci-Fi: Sharp digital art, cool blue and neon accents, futuristic rim lighting
  • Romance: Soft focus, warm golden hour lighting, delicate and emotional
  • Thriller: Noir-influenced, high contrast, chiaroscuro lighting, moody shadows
  • Horror: Dark, desaturated palette with eerie undertones and dramatic shadows
  • Children's: Bright, colorful illustration with soft rounded shapes and warmth

Scene Illustrations — Chapter 7

The forest clearing · 1–3
Through the archway · 4–7
Generating...
The stone fountain · 8–12

Scene illustrations are generated at key narrative moments in each chapter. The AI uses the paragraph text, delivery directions, and character information to create contextual illustrations. These appear on the left page in Storybook Mode's Illustrated layout, creating a book-spread experience where listeners see the scene while hearing the narration.

Tip: Portrait and scene generation can run in batch mode (cheaper, asynchronous) or fast mode (instant, real-time). If you're not in a rush, batch mode processes everything in the background and notifies you when complete.
9

Download Your Audiobook

Once you're happy with the narration, click the download button. Three export options are available:

Download Your Audiobook

  • Download Chapter MP3: current chapter as a single MP3 file
  • Full Book (Merged MP3), marked Popular: all chapters combined into one file
  • All Chapters (ZIP), marked Recommended: individual MP3 per chapter (Chapter_01.mp3, Chapter_02.mp3, ...)

Audio Quality (auto-applied)

  • Format: 192 kbps CBR MP3
  • Sample Rate: 44.1 kHz
  • Loudness: -20 dB RMS (broadcast standard)
  • Peak Level: ≤ -3 dB

Every export is broadcast-quality. Notevibes handles volume normalization, peak limiting, and format encoding automatically — no audio engineering tools or post-processing needed. The files are ready to upload directly to Audible, Google Play Books, Apple Books, or any other audiobook distributor.
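If you want to understand what the loudness and peak targets mean, the Python sketch below measures RMS and peak level in decibels relative to full scale and applies a plain gain to reach -20 dB RMS. It is a teaching example on a synthetic test tone, assuming samples in the range -1.0 to 1.0; real mastering also uses limiting, and this is not how Notevibes processes audio.

```python
import math

TARGET_RMS_DB = -20.0   # loudness target from the spec above
MAX_PEAK_DB = -3.0      # peak ceiling from the spec above

def to_db(amplitude: float) -> float:
    """Convert a linear amplitude (1.0 = full scale) to decibels."""
    return 20 * math.log10(amplitude) if amplitude > 0 else float("-inf")

def measure(samples):
    """Return (rms_db, peak_db) for a list of float samples."""
    if not samples:
        raise ValueError("no samples")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    peak = max(abs(s) for s in samples)
    return to_db(rms), to_db(peak)

def normalize_to_rms(samples, target_db: float = TARGET_RMS_DB):
    """Scale samples by a constant gain so their RMS level hits target_db."""
    rms_db, _ = measure(samples)
    gain = 10 ** ((target_db - rms_db) / 20)
    return [s * gain for s in samples]

# One second of a 1 kHz test tone at half of full scale, 44.1 kHz sample rate.
rate = 44_100
tone = [0.5 * math.sin(2 * math.pi * 1000 * n / rate) for n in range(rate)]

leveled = normalize_to_rms(tone)
rms_db, peak_db = measure(leveled)
print(f"RMS {rms_db:.1f} dB, peak {peak_db:.1f} dB")
print("within peak ceiling:", peak_db <= MAX_PEAK_DB)
```

For this test tone the leveled signal sits at -20 dB RMS with a peak of roughly -17 dB, comfortably under the -3 dB ceiling. Speech has a much wider gap between RMS and peak than a sine wave, which is why peak limiting matters for narration.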

Tip: If you're distributing through Audible, use the All Chapters (ZIP) download. Audible requires separate files per chapter. The ZIP export names files as Chapter_01.mp3, Chapter_02.mp3, etc. — ready to upload.
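The per-chapter naming scheme is simple zero-padded numbering. If you ever need to rebuild such an archive yourself, a Python sketch could look like this; the `build_zip` helper and the placeholder bytes are illustrative, not part of Notevibes.

```python
import io
import zipfile

def chapter_filename(index: int) -> str:
    """Zero-padded name matching the export pattern, e.g. Chapter_01.mp3."""
    return f"Chapter_{index:02d}.mp3"

def build_zip(chapter_audio) -> bytes:
    """Pack a list of MP3 byte strings into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    # MP3 data is already compressed, so store files without recompressing.
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
        for i, audio in enumerate(chapter_audio, start=1):
            archive.writestr(chapter_filename(i), audio)
    return buffer.getvalue()

# Placeholder bytes stand in for real MP3 data.
data = build_zip([b"chapter one audio", b"chapter two audio"])
with zipfile.ZipFile(io.BytesIO(data)) as archive:
    print(archive.namelist())  # ['Chapter_01.mp3', 'Chapter_02.mp3']
```

Zero padding keeps the files in the right order when a distributor or file browser sorts them alphabetically.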
10

Publish Your Audiobook

You have two publishing paths: publish directly on the Notevibes Library for instant availability with rich features, or export and distribute to external platforms.

Publish on Notevibes Library

When you generate a book, it's automatically added to your library as a private draft. To publish publicly:

  1. Open the book in your Library and fill in metadata: author, description, genre, language, tags
  2. Optionally generate AI character portraits and scene illustrations
  3. Enable Read Along (word-level sync) for interactive reading
  4. Click “Publish” to make the book available on the Notevibes Library
  5. Share the public URL with your audience

Published books on Notevibes come with Storybook Mode, Read Along, character portraits, scene illustrations, and listener analytics included — features you won't find on any other audiobook platform.

Export to External Platforms

Download your audio files and distribute to any major platform:

  • Audible: The world’s largest audiobook marketplace
  • Google Play Books: Reach Android and Google Home users
  • Apple Books: iPhone, iPad, and Mac listeners
  • Spotify: Audiobooks on the streaming platform
  • Kobo: 190+ countries worldwide
  • Findaway Voices: Wide distribution to 40+ retailers

Full Commercial Rights

All paid Notevibes plans include full commercial rights to generated audio. No royalty splits to Notevibes, no per-sale fees, no platform lock-in. Your audiobook, your revenue. Publish and sell on any platform worldwide.

Sell Your Audiobook

Coming Soon

Direct sales powered by Stripe

We're building direct audiobook sales into Notevibes. Set your own price, accept payments through Stripe, and keep your revenue. No middleman, no revenue share to Notevibes. We handle hosting, delivery, and the listening experience — you keep the earnings.

Set your price

You decide what your audiobook is worth

Stripe checkout

Trusted, global payment processing

Instant payouts

Revenue goes directly to your Stripe account

No revenue share

Notevibes takes $0 from your sales

Your audiobook + Storybook Mode + Read Along + Analytics

Buyers get the full Notevibes listening experience. You get a storefront without building one.


What Your Listeners Experience

Publishing on the Notevibes Library gives your audience three ways to experience your audiobook — features you won't find on any other platform. These are powered by the character portraits, scene illustrations, and audio sync you generated in the previous steps.

Storybook Mode

Transforms your audiobook into a visual reading experience. Listeners see beautifully typeset pages with the narration synced to the text — a hybrid of audiobook and illustrated storybook.

The Midnight Garden
Chapter 7 — The Crossing

Elena (protagonist)

The forest was silent save for the distant call of a nightingale. Elena pressed her palm against the ancient oak, feeling the bark rough beneath her fingers.

“We shouldn't have come here,” Marcus whispered from behind the thornwall. “Not tonight.”

She didn't answer. Instead, she stepped forward, through the archway of twisted branches, into the moonlit clearing.

— 127 —
4:12 / 11:05

Genre-specific themes

Fantasy gets parchment and gold. Romance gets soft pinks. Thrillers get noir contrast. 9 themes auto-applied from detected genre.

Three layout modes

Illustrated (scene + text spread), Classic (portrait badges inline), Cinematic (visual novel style).

Page-turn animations

Slide, flip, or none. Drop caps, ornamental dividers, and chapter title pages.

Character portraits & scenes

Your generated portraits float beside character dialogue. Scene illustrations fill the left page.

Read Along

Word-level highlight sync that follows the narration in real time. Each word lights up as it's spoken, making it easy to follow along.

Chapter 3 — The Letter
Synced · Word highlight

The rain hammered against the window pane. She knew, even before opening the letter, that everything was about to change.

The envelope was thin — just a single sheet folded in thirds. No return address. Her name in handwriting she hadn't seen in seventeen years.

“Wait,” she whispered, pressing the paper flat on the table. “This can't be right.”

0:42 / 5:32

3-level highlighting

Active word, surrounding sentence, and current paragraph — each with its own highlight intensity.

Click to jump

Click any word to jump the narration there. Auto-advance between chapters.

Accessibility

Valuable for language learners, readers with dyslexia, and multi-modal reading.
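Word-level sync of this kind usually comes down to a list of word start times and a fast lookup. The Python sketch below uses a binary search to find the active word for a playback position and reuses the same timings for click-to-jump. The timestamps are invented for illustration, and the function names are assumptions.

```python
from bisect import bisect_right

# Hypothetical word timings (seconds from the start of the chapter audio).
words = ["The", "rain", "hammered", "against", "the", "window", "pane."]
starts = [0.00, 0.18, 0.42, 0.95, 1.30, 1.44, 1.85]

def active_word_index(start_times, position: float) -> int:
    """Index of the word being spoken at the given playback position."""
    # bisect_right counts how many words have started by `position`.
    return max(bisect_right(start_times, position) - 1, 0)

def seek_time(start_times, index: int) -> float:
    """Playback position to jump to when a word is clicked."""
    return start_times[index]

print(words[active_word_index(starts, 0.50)])    # hammered
print(seek_time(starts, words.index("window")))  # 1.44
```

Because the lookup is a binary search, it stays fast even for long chapters with thousands of words, which matters when the highlight updates many times per second.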

Listener Analytics

Track how listeners engage with every chapter of your audiobook with the built-in analytics dashboard.

  • Total Plays: 12,847
  • Unique Listeners: 3,291
  • Completion Rate: 68%

Plus geographic insights (listener locations by country), device & behavior data (mobile vs. desktop, session length, chapter drop-off points), and daily aggregation for trend analysis.
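To make these metrics concrete, here is a small Python sketch that derives total plays, unique listeners, completion rate, and chapter drop-off points from play sessions. The session data is invented, and the definition of completion used here (a session that reaches the final chapter) is an assumption; the dashboard may define it differently.

```python
from collections import Counter

TOTAL_CHAPTERS = 5

# Invented sample data: (listener_id, last_chapter_reached) per play session.
sessions = [("a", 5), ("a", 2), ("b", 5), ("c", 3), ("d", 5)]

def summarize(sessions, total_chapters: int) -> dict:
    """Aggregate play sessions into headline listener metrics."""
    total_plays = len(sessions)
    completed = sum(1 for _, last in sessions if last >= total_chapters)
    drop_off = Counter(last for _, last in sessions if last < total_chapters)
    return {
        "total_plays": total_plays,
        "unique_listeners": len({listener for listener, _ in sessions}),
        "completion_rate": completed / total_plays if total_plays else 0.0,
        "drop_off_by_chapter": dict(drop_off),
    }

print(summarize(sessions, TOTAL_CHAPTERS))
```

With this sample data, five plays from four listeners give a 60% completion rate, with one session each ending at chapters 2 and 3. Drop-off counts like these show which chapters lose listeners.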

Traditional Studio vs. AI Narration

Traditional Studio

  • $5,000 – $15,000+ per finished hour
  • 2–6 weeks turnaround
  • Schedule actors, engineers, studio time
  • Re-records cost extra per session
  • One language per production

Notevibes AI (Recommended)

  • Starting at $19/month
  • One afternoon, from upload to finished audiobook
  • Upload your book and click generate
  • Unlimited revisions included
  • 57 languages from the same manuscript

A 10-hour audiobook produced in a traditional studio would cost between $50,000 and $150,000, take weeks of studio time, and be locked to a single language. With Notevibes, the same audiobook can be created in an afternoon, revised unlimited times, and published in 57 languages — all from a single manuscript.

Frequently Asked Questions

How long does it take to create an audiobook?

Most books go from upload to finished audiobook in a single afternoon. Upload takes seconds, character detection runs in under a minute, and audio generation processes all chapters in parallel. A 10-chapter novel typically generates in 5–15 minutes.

Can AI really detect characters in my novel?

Yes. The AI scans your entire manuscript, identifies every speaking character by name, assigns narrative roles (protagonist, antagonist, narrator, mentor, sidekick, supporting, minor), counts dialogue lines, tracks chapter appearances, and even detects narrative arc and character evolution. You can review and override every assignment.

What if I want to change a voice after generating?

Fully supported. Change any character’s voice in the characters panel, and the affected paragraphs are marked stale (orange indicator). Regenerate only the changed chapters — the rest of your audiobook is preserved.

Can I use this for non-fiction books?

Absolutely. Character detection is optional: for non-fiction, you can skip it and use a single narrator voice for the entire book, and the AI adapts the delivery to the detected genre automatically. The AI handles tables, citations, and multi-column layouts.

Do I need audio editing skills?

No. Notevibes handles all audio engineering automatically: volume normalization, peak limiting, and noise floor management. The exported files are broadcast-quality and ready to upload to any platform with zero post-processing.

Can I publish in multiple languages?

Yes. The voice library covers 57 languages, and each Chirp3-HD voice supports all of them natively. Upload your translated manuscript and generate with the same (or different) voices. The voices produce authentic accents in every supported language.

What’s the difference between Storybook Mode and Read Along?

Read Along adds word-level highlight sync to the text display. Storybook Mode is a full visual experience with illustrated pages, genre themes, character portraits, scene illustrations, drop caps, and page-turn animations. Storybook Mode includes Read Along’s highlighting capabilities.

Do I own the rights to my AI-generated audiobook?

Yes. All paid plans include full commercial rights. No royalty splits to Notevibes, no per-sale fees, no platform lock-in. Publish and sell anywhere.

Ready to Create Your Audiobook?

Upload your manuscript, let AI do the heavy lifting, and publish a professional audiobook — all in one sitting. No studio, no actors, no audio engineering.


550+ voices · 57 languages · Full commercial rights · Broadcast-quality audio