Digital Narration with ElevenLabs: What I Wish I’d Known

Like so many other digital narration tools, ElevenLabs is constantly evolving, and your experience will vary significantly depending on your project and what you want to achieve.

This tutorial was written in May 2026, after I completed the production of the audiobook for The Fragments, The Spheres Book One. The book is about 143k words long, structured into parts and chapters, and narrated from two POV characters plus a camera POV. Each chapter has an epigraph—an entry in a fictional dictionary. The genre is a cross between dystopian, science fiction, and fantasy.

My aim was an audiobook with a calm, almost understated narration using one voice. I wanted to get it as close to a human-narrated book as I could. The main lesson I learned—it’s hard; way harder than you think.

New here? I am Minu Freitag, a New Zealand-based writer, artist, and maker. If you like this post, why not sign up for my newsletter to stay in the loop?

Why ElevenLabs?

There are a handful of AI narration tools out there. I landed on ElevenLabs for a few reasons.

  • First, it’s established. ElevenLabs has been around long enough to have an active user community.
  • Second, ElevenLabs Studio and its audiobook feature. Rather than treating a book as just another long-form text input, ElevenLabs lets you structure your project by chapter and export in the formats you actually need.
  • Third, pronunciation dictionaries. The ability to define how a word should sound—and apply it across an entire project—was a significant factor. (See Section 4 for how well this actually works.)
  • Fourth, the ElevenReader platform. ElevenLabs has its own app that lets you publish your audiobook directly to listeners and keep 60% of royalties. It’s not a replacement for broader distribution, but it’s built in, so you might as well use it.

A word of caution: ElevenLabs isn’t cheap, and the credit system can bite you if you’re not careful. You can upgrade and downgrade your subscription as you go. I am on the Creator Plan during audiobook productions; otherwise the Starter Plan is fine for my needs.

1—Set it up right from the start

You can upload your manuscript, and ElevenLabs will break it down into chapters automatically. I decided to create the chapters manually and broke the book down further, using one ElevenLabs chapter (EL-Chapter) for each scene. Breaking the project down this way means you can retouch individual scenes without touching the rest of the project, a small decision that saves a lot of pain later. Take the time to name each EL-Chapter. I start with a shortcode for the book title, then the chapter, the scene, and the first few words of the scene, for example: TF-CH01-S01-The-forest-clung-to-its-fragment. This makes it easier to identify chapters if you want to edit your audiobook externally, especially if you need to make changes later.
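
If you have many scenes, the naming scheme above can be generated rather than typed. This is only a sketch of one way to do it: the function name `el_chapter_name` and its parameters are made up for illustration, and the word count is left adjustable because the example name above carries six words.

```python
import re


def el_chapter_name(book_code: str, chapter: int, scene: int,
                    opening_text: str, n_words: int = 5) -> str:
    """Build a name such as TF-CH01-S01-The-forest-clung-to-its-fragment."""
    # Pull out words, keeping apostrophes attached so "don't" stays one word.
    words = re.findall(r"[\w'’]+", opening_text)
    # Drop the apostrophes themselves so the name stays file-system friendly.
    cleaned = [w.replace("'", "").replace("’", "") for w in words[:n_words]]
    # Zero-pad chapter and scene numbers so the names sort correctly.
    return f"{book_code}-CH{chapter:02d}-S{scene:02d}-" + "-".join(cleaned)


# Opening words taken from the example name in the text.
name = el_chapter_name("TF", 1, 1, "The forest clung to its fragment", n_words=6)
```

Zero-padding matters more than it looks: without it, CH10 sorts before CH2 in most file browsers and audio editors.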

I am working on a tutorial on audiobooks in general, including the technical set-up and file breakdowns that will provide more context. I’ll link it here when it’s published.

2—Create your own voice

ElevenLabs voices can expire. Create your own to be sure you have access to it over time.

Voice creation is easy. Write a prompt describing how you want to use the voice: age, gender, accent, and cadence. Play around with the settings and test the voice on all POVs, descriptions, and dialogues. Experiment with the settings until you are sure you have nailed it.

3—Consider creating different cadences for different POVs

I tried using different voices for different POVs, but nothing sounded right, so I decided to use the same voice but create “variants.” One sounds older and more wary, one younger, one has a slight female inflection.

4—Create pronunciation dictionaries

Pronunciation dictionaries help you to fine-tune fictional names and world-building terms.

A pronunciation dictionary is a list of words (usually invented names, places, or world-building terms) paired with a phonetic spelling that tells ElevenLabs how to say them. You create it once and apply it at the project level, so it runs across every chapter automatically.

To build one, go to your project settings and look for the Pronunciation Dictionary option. You’ll enter the word as it appears in your manuscript, then the phonetic equivalent, either using standard spelling (write it the way it sounds) or IPA notation if you’re comfortable with that. For example, if your character’s name is Niuen and ElevenLabs keeps mispronouncing it, you’d enter something like Nyoo-en or the IPA equivalent.
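
If you would rather keep your terms in a file than retype them, a dictionary can also be written as a PLS lexicon (the W3C Pronunciation Lexicon Specification format). The sketch below assumes that ElevenLabs accepts a PLS upload for pronunciation dictionaries, so check the current documentation before relying on it. The function name `build_pls` is made up for illustration, and the only entry is the Niuen example from above, using a plain respelling (an alias) rather than IPA.

```python
from xml.sax.saxutils import escape

# Manuscript spelling -> how the voice should read it.
# "Niuen" / "Nyoo-en" is the example from the text; add your own terms.
ENTRIES = {"Niuen": "Nyoo-en"}


def build_pls(entries: dict[str, str], lang: str = "en-US") -> str:
    """Return the text of a PLS lexicon with one alias entry per word."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<lexicon version="1.0"',
        '    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"',
        f'    alphabet="ipa" xml:lang="{lang}">',
    ]
    for word, respelling in entries.items():
        lines += [
            "  <lexeme>",
            # escape() protects characters such as & and < inside the XML.
            f"    <grapheme>{escape(word)}</grapheme>",
            f"    <alias>{escape(respelling)}</alias>",
            "  </lexeme>",
        ]
    lines.append("</lexicon>")
    return "\n".join(lines) + "\n"


pls_text = build_pls(ENTRIES)
```

Save `pls_text` to a `.pls` file with UTF-8 encoding. Keeping the list in one place also gives you a record of every invented term, which is handy when the next book in the series comes around.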


The dictionary, however, does not always work reliably, so you may need to reach for other tools.

5—Workarounds for pronunciation

Emphasis can be set by capitalising a word or, for more emphasis, writing the whole word in uppercase. You can also experiment with writing a word phonetically.

For example: a character name like Niuen might be written as nyoo-en in the script for a natural reading, or Nyooen / NYOOEN if you need the voice to hit it harder. The phonetic approach is often more reliable than the dictionary for high-frequency terms—especially names that appear dozens of times per chapter.
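
If you go the phonetic route, it is safer to leave the manuscript alone and apply the respellings to a copy used only for narration. A minimal sketch, assuming you keep the script as plain text; the function name `respell` and the sample sentence are made up for illustration.

```python
import re

# Manuscript spelling -> spelling used in the narration script.
# "Niuen" -> "nyoo-en" is the example from the text.
RESPELLINGS = {"Niuen": "nyoo-en"}


def respell(text: str, respellings: dict[str, str]) -> str:
    """Replace whole words only, so a longer word containing 'Niuen' is left alone."""
    for word, spoken in respellings.items():
        pattern = r"\b" + re.escape(word) + r"\b"
        # A function as replacement keeps the respelling literal,
        # even if it ever contains a backslash.
        text = re.sub(pattern, lambda match, s=spoken: s, text)
    return text


# Hypothetical sample sentence, not a quote from the book.
narration_script = respell("Niuen waited. Nobody called for Niuen.", RESPELLINGS)
```

Because the substitution is scripted, you can rerun it after every manuscript change instead of hunting for names by hand.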

6—Pacing with pauses

I found placing pauses essential for controlling pacing. You can also slow down or speed up a passage using the speed controls in the voice editor. Select the paragraph before you change the settings so the change applies only to that section. Both tools, pauses and speed, are worth experimenting with early.
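
Pauses can also be written into the text before you paste it in. The sketch below assumes the model you are using honours SSML-style `<break time="..." />` tags; that is an assumption to verify for your model version, and the function name `add_pauses` is made up for illustration.

```python
def add_pauses(paragraphs: list[str], seconds: float = 1.0) -> str:
    """Join paragraphs with a pause tag between each pair."""
    if seconds <= 0:
        raise ValueError("pause length must be positive")
    # Assumed tag syntax; check that your model version supports it.
    tag = f'<break time="{seconds:.1f}s" />'
    return f"\n{tag}\n".join(paragraphs)


# Hypothetical two-paragraph scene.
scene_text = add_pauses(["One.", "Two."], seconds=0.5)
```

Start with short pauses and listen back before committing to a length, since every regeneration costs credits.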

7—Dialogues are horrible

The digital narration for dialogue is horrible in V2, and my trial with V3 was a nightmare. The descriptive tags introduced in V3 are in theory the perfect tool, but in practice I only got noise. Without them, ElevenLabs is just guessing at cadence, and you can only hope it guesses right before you lose your patience or clean out your credits.

Break your dialogue down into manageable blocks, use breaks and emphasis, and don’t despair.

8—Adapt your manuscript as needed

At first, I was too hesitant to make changes to the text when it just did not sound right. On the second release candidate, I changed the text right away. It keeps you sane, and no one cares.

9—Quality varies

Another regeneration nightmare is the inconsistency of the output. You might have gotten the perfect cadence, but the audio quality is so awful you need to start again. Break larger paragraphs down into smaller sections so that a bad take only forces you to regenerate a short passage.
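
Splitting long paragraphs can be done before the text ever reaches ElevenLabs. A minimal sketch that cuts at sentence ends and keeps each block under a size limit; the function name `split_into_blocks` and the 400-character default are arbitrary choices for illustration, not ElevenLabs limits.

```python
import re


def split_into_blocks(paragraph: str, max_chars: int = 400) -> list[str]:
    """Group whole sentences into blocks of at most max_chars where possible."""
    # Naive sentence split: cut after ., ! or ? when whitespace follows.
    # Sentences that end inside closing quotation marks are not split.
    sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    blocks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # The block is full: close it and start a new one.
            blocks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        blocks.append(current)
    return blocks


blocks = split_into_blocks("One. Two. Three.", max_chars=10)
```

A single sentence longer than the limit is kept whole rather than cut mid-sentence, so very long sentences still need a manual look.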

10—Mind the export settings

The export settings default to exporting the whole project. A dialogue box tells you how much it’s going to cost. But if you’ve been driven mad by endless regeneration cycles, you might just press export—and there is no undo. Absolutely none. Nada. For an audiobook, the export can take hours, and I once lost 153k credits in one go because I changed a setting on a voice without being aware of the ramifications. Export individual chapters to verify the output. Lock your file. Take a breath and check the settings before you press export.

11—Take your time

I have seen countless YouTube videos where all of this just magically works. There are no issues, everything sounds perfect on the first try—this is not my experience. It took me two to three hours per recorded hour for the first release candidate and another hour per recorded hour to make changes.
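
To turn those ratios into a rough plan, multiply them by the length of your finished audio. A small sketch using only the figures above (two to three hours per recorded hour for the first pass, plus one more for changes); the function name `editing_hours` and the 10-hour example are hypothetical, and your own ratios may differ.

```python
def editing_hours(recorded_hours: float) -> tuple[float, float]:
    """Return a (low, high) estimate of total working hours."""
    first_pass_low = 2 * recorded_hours   # two hours per recorded hour
    first_pass_high = 3 * recorded_hours  # three hours per recorded hour
    revisions = 1 * recorded_hours        # one more hour per recorded hour
    return first_pass_low + revisions, first_pass_high + revisions


# Hypothetical 10-hour audiobook, not the length of The Fragments.
low, high = editing_hours(10)
```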

This tutorial might not sound like it, but digital narration is an amazing tool. As with all AI tools, the legality of the origin of the training material is in doubt. ElevenLabs’ T&C state that they have paid for the rights to train their AI on human voices. I hope this is true.

In a perfect world, I would love to pay a human narrator, but writing and publishing my books is already an expensive habit as it is, and there is no way I can afford that investment at this moment in time.

I hope you find this tutorial helpful. Let me know what you think on Reddit, Instagram, or Facebook.

See you soon!
