In the last three years, audiobooks have gained tremendous momentum, reshaping the audiobook industry and the opportunities it offers authors.
Audiobook sales in the United States generated over $1.81 billion in 2022, up 3.43% over 2021, a slower rate of growth than in previous years. US audiobook sales revenue increased by over 50% over the last five years. As of 2022, audiobooks account for 9% of all book sales in the US.
Spotify now makes over 200,000 audiobooks available to everyone with a US Premium account, and Premium subscribers get 15 hours of audiobook listening per month. Spotify joins other platforms, such as Google Podcasts and Apple Podcasts, that allow AI-generated audiobooks.
Going by these statistics, audiobooks are clearly booming.
But as with any new technology, the rise of audiobooks has brought its share of controversies, chief among them the role of artificial intelligence in audiobook production. If you have ever listened to an audiobook, you most likely heard the voice of a professional actor who narrates books for a living. Or, if you were lucky, you may have listened to an audiobook narrated by its author, such as Trevor Noah’s Born a Crime.
Now imagine a world where, instead of listening to audiobooks voiced by human actors, you listen to audiobooks narrated by synthetic voices generated by artificial intelligence. If this sounds far-fetched, it is not. AI audiobook creation is gaining traction fast and shaking up a traditional audiobook industry that relies chiefly on human voice actors.
How Will AI Affect Audiobooks and Authors?
Quoted in Publishers Weekly, Hillary Hubert, a board member of the Professional Audiobook Narrators Association (PANA) and a member of the steering committee for the Screen Actors Guild – American Federation of Television and Radio Artists (SAG-AFTRA), said the role of AI in audiobook production has ignited much debate among professional narrators. It is easy to understand why narrators would be wary of authors using AI to create audiobooks.
“Use AI to create your nonfiction book, build up your profit, then hire a narrator for a more professional-sounding book.”
Most voice actors and narrators are wary of AI in audiobook production for several reasons. The first is the possibility of losing their livelihoods. The audiobook industry employs thousands of people in roles such as voice actors, audio engineers, and audiobook editors. What happens to these people if their work can be automated with AI? This is one of the most important questions critics of AI in audiobook production have raised.
The second concern involves licensing an actor’s voice. Say you license your voice to Company X for AI-assisted audiobook production. How much say do you retain over how your voice is used? What if it is used to narrate content you find objectionable? And how well will you be paid for its use? These are some of the key questions audiobook narrators are grappling with. To many of them, licensing their voices to AI companies may feel like ceding control over how those voices are used.
Lastly, critics are worried that the use of AI in audiobook production obliterates the connection that currently exists between a voice actor narrating an audiobook and the listener. Is it possible for listeners to appreciate an AI voice as much as they appreciate the voice of Stephen Fry or Neil Gaiman? Using synthetic voices for audiobooks, the argument goes, dilutes the special storytelling experience that listeners get from listening to a human voice.
Who Are the Proponents of AI for Audiobooks?
While most audiobook narrators are looking at AI with skepticism, there is another group of people in the industry who see huge possibilities in using AI for audiobook production. These are chiefly AI entrepreneurs and small publishers. What are the reasons for their optimism?
First, proponents of AI in audiobook production cite the high cost of hiring human narrators. According to a Wired article on synthetic voices, narrators charge about $250 per finished hour. Other sources estimate that top narrators can charge as much as $1,000 per finished hour. Once you add editing and other production costs, a single audiobook can easily cost several thousand dollars.
Big publishers may be able to absorb these costs, but spending that much on a single audiobook is simply not feasible for small or individual publishers. This is where AI comes in: it drastically reduces the cost of creating an audiobook. For example, DeepZen, an AI company that specializes in audiobook production, charges about $120 per finished hour, or less depending on the services the client chooses. Other companies, such as Speechki, promise even lower prices.
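To put these per-hour rates in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes a hypothetical 10-hour finished audiobook (an illustrative length, not an industry average) and uses only the rates cited above. It covers narration alone, not editing or other production costs.

```python
# Rough narration-cost comparison using the per-finished-hour rates cited above.
# FINISHED_HOURS is a hypothetical example length, not an industry figure.

FINISHED_HOURS = 10  # hypothetical length of the finished audiobook, in hours

RATES_PER_FINISHED_HOUR = {
    "human narrator (typical, per Wired)": 250,
    "human narrator (top end)": 1000,
    "DeepZen (approximate)": 120,
}


def narration_cost(rate_per_hour: float, finished_hours: float) -> float:
    """Return narration cost only; editing and other production costs are excluded."""
    return rate_per_hour * finished_hours


if __name__ == "__main__":
    for label, rate in RATES_PER_FINISHED_HOUR.items():
        cost = narration_cost(rate, FINISHED_HOURS)
        print(f"{label}: ${cost:,.0f}")
```

For this hypothetical book, narration alone comes to $2,500 at $250 per hour, $10,000 at the top-end rate, and $1,200 at DeepZen’s approximate rate, before any editing costs are added.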
Besides reducing production costs, AI proponents argue that the technology is the best way to scale audiobook creation. In an interview with Joanna Penn on The Creative Penn, Tylan Kamis, the CEO of DeepZen, said there are almost 50 million eBooks in the world but only half a million in audio format, and nearly 90% of audiobooks are in a single language: English. Tylan believes that if AI is widely adopted in audiobook production, more audiobooks could be produced at a faster rate. If the industry continues to rely on human narrators, it will take an extremely long time to convert the world’s millions of books into audio. That means the people who benefit most from audiobooks, including those who are print-disabled, will keep missing out. AI will also make it easier to produce audiobooks in languages other than English: instead of finding dozens of actors to narrate a book in different languages, publishers can use AI to automate the process and shorten the time it takes to make a book available in multiple languages.
Lastly, supporters of AI audiobooks believe the technology will give narrators new revenue streams through voice licensing. As an example, Tylan cited the case of Edward Herrmann, a renowned audiobook narrator who died in 2014. According to Tylan, DeepZen licensed his voice for audiobook production. This way, his legacy lives on, and his estate continues to earn income from sales of the audiobooks DeepZen produces with his cloned voice.
AI Content Creation Tools
If you would like to venture into AI-supported audiobook production, there are a number of platforms you can experiment with. Below we highlight a few examples:
Podcastle AI is my current favorite.
It converts text into speech and offers features that make the process simple and efficient, including studio-quality recording, audio detection, and voice-to-text transcription. The audio detection feature lets you easily transcribe any sound or dialogue, while the voice-to-text tool converts spoken words into written form. A paid plan unlocks more features, but the free version also provides plenty of useful tools.
I especially like its “magic dust” feature, which smooths out rough spots and makes your voice sound remarkably good. Its podcasting component is the most advanced of the programs I’ve seen, with one-click background noise removal (also great for audiobooks). $29 a month gets you up to 200,000 words of text-to-speech narration, far more than a typical book needs.
I’d recommend starting with the free version, which covers roughly 6–8 pages of text-to-speech narration, and upgrading once you need more.
Audiobooks and podcasts are far from the only things you can create with this program.
Google Play Books’ auto-narrated audiobook service leverages Google’s research and technology to give publishers a fast, inexpensive way to create audiobooks. The service offers voices with a range of accents and genders. To use it, you first need to provide an eBook in EPUB format and offer it for sale on Google Play. You must also hold the audio rights to the eBook. Ideally, the eBook should have little emotion or dialogue; books with many charts and graphs are also poorly suited to the service if you want to get the most out of it.
Once your audiobook is ready, you can download it and sell it on any platform that accepts audiobooks narrated by synthetic voices. The service is currently free and available only to users in the United States, Australia, Canada, the United Kingdom, New Zealand, and Spain. To learn more, see Google’s step-by-step guide to creating an auto-narrated audiobook.
Based in the United Kingdom, DeepZen provides AI-generated audiobooks and voiceovers. The company says all its synthetic voices are generated from licensed replicas of human voice actors. Combined with experienced audio editors, this yields a service the company claims is “indistinguishable from traditional narration.” According to CEO Tylan Kamis, DeepZen plans to launch a portal where narrators can create synthetic versions of their own voices and use them to produce audiobooks more quickly.
Speechki is another popular AI audiobook production platform, promising “natural-sounding synthetic narration using artificial intelligence.” Speechki claims it can deliver an AI-generated audiobook in just 15 minutes at one-tenth the cost of traditional audiobook production. Clients can choose from 341 synthetic voices in 77 languages.
Beyond Words promises users “ethically created AI voices” available for unlimited use. It offers the latest text-to-speech voices from WaveNet (Google’s AI audio research program), Microsoft Azure, Yandex, and Amazon Polly. Its clients include the United Nations, the Irish Times, and the Japan Times. Packages range from free to $250 per month.
This platform offers users 146 voices in 43 languages and claims to produce an audiobook in just 10 minutes. Potential clients can try the service for free before buying. If you don’t want to handle proofreading yourself, the company also offers a “white-glove service” that takes care of editing and proofreading.
Speechify is a text-to-speech app that promises ‘natural voices’ that read text on a web page or PDF aloud, letting users listen instead of read. The app offers a free trial so you can test its voices, as well as a premium subscription. The Free Plan includes 10 standard text-to-speech voices at regular speed. The Premium Plan costs $139/year and offers 30+ high-quality voices in 20+ languages, 5x faster listening, and features such as scanning and skipping.
VoiceDream is another text-to-speech app designed for reading eBooks. It lets you customize pronunciation by specifying how a particular word should be pronounced in a book. The app is exclusive to iOS, costs $19.99, and includes one free Acapela voice of your choice. Its most recent update adds support for premium voices, including Apple’s iOS text-to-speech voices. If you use VoiceDream, I recommend the premium voices, which are a significant improvement over the standard ones.
Artificial Intelligence Won’t Replace You
Some authors fear it won’t be long before AI writing generators take over storytelling itself. Proponents of AI counter that a story-writing AI will always need human guidance. According to Forbes, AI “can’t massage the phrasing or other intangibles.” AI is already common in publishing: editing programs such as ProWritingAid and Grammarly show how AI can benefit an author’s content creation.