When voice actor Heath Miller sits down in his boatshed-turned-home studio in Maine to record a new audiobook narration, he has already read the text through carefully at least once. To deliver his best performance, he takes notes on each character and any hints of how they should sound. Over the past two years, audiobook roles, like narrating the popular fantasy series He Who Fights With Monsters, have become Miller’s main source of work. But in December he briefly turned online detective after he saw a tweet from UK sci-fi author Jon Richter disclosing that his latest audiobook had no need for the kind of artistry Miller offers: It was narrated by a synthetic voice.
Richter’s book listing on Amazon’s Audible credited that voice as “Nicholas Smith” without disclosing that it wasn’t human. To Miller’s surprise, he found that “Smith” voiced a total of around half a dozen titles on the site from multiple publishers, breaching Audible rules that say audiobooks “must be narrated by a human.” Although “Smith” sounded more expressive than a typical synthetic voice, to Miller’s ear it was plainly artificial and offered a worse experience than a human narrator. It made giveaway mistakes, like pronouncing Covid as “kah-viid” when referring to the pandemic.
Miller tracked down “Smith”: the voice matched a sample posted to SoundCloud by Speechki, a San Francisco startup that offers more than 300 synthetic voices for audiobook publishing across 77 dialects and languages. He and other narrators and audio fans who discussed the artificial audiobooks online reported the titles to Audible, which eventually removed them. Although it wasn’t a large number, discovering that synthetic voices were good enough for some publishers to put them to work prompted Miller to wonder about the future of his art and income. “It’s a little terrifying because it’s my livelihood and that of many people I respect,” he says.
Richter says he chose an artificial voice because the concept and its “uncanny valley” sound suited his book, which has a piece of intelligence software as one of its main characters, and that he was unaware of Audible’s policies. “My intention was never to upset or offend anyone,” he says. Speechki says it recommends publishers identify that narrations are synthetic and that it informs them of Audible’s policies. Will Farrell-Green, a senior director at Audible, said in an emailed statement that the company uses automated and manual processes to enforce its rules but that “due to the volume of content on our service, titles that are not compliant do slip through from time to time.” Audible’s “humans only” policy dates back to at least 2014, when synthetic voices were much less convincing, and the company has said the rule helps provide listeners the performances they expect.
Synthetic voices have become less grating in recent years, in part due to artificial intelligence research by companies such as Google and Amazon, which compete to offer virtual assistants and cloud services with smoother artificial tones. Those advances have also been used to make reality-spoofing “deepfakes.” Speechki is one of several startups developing speech synthesis for audiobooks. It analyzes text with in-house software to mark up how to inflect different words, voices it with technology adapted from cloud providers including Amazon, Microsoft, and Google, and employs proof listeners who check for mistakes. Google is testing its own “Auto-narration” service that publishers can use to generate English audiobooks for free, using more than 20 different synthetic voices. Audiobooks published through the program include an academic history of theater and a novelist’s exploration of cultural attitudes to sex. Google spokesperson Dan Jackson says its auto-narrated books supplement rather than replace professionally narrated books. “Our goal with auto-narration is to make it possible to create a low-cost audiobook for any ebook title and increase content accessibility for those that are unable to read via ebook,” he says.
Some publishers see synthetic voices as a way to tap the growing demand for audiobooks, a segment healthier than other parts of the book business. Total US book publisher revenue declined slightly between 2015 and 2020 and ebook revenue shrank, but audiobook revenue surged by 157 percent, according to the Association of American Publishers. Consumers have steadily grown more comfortable with the format, helped along by technical improvements to mobile apps, smart speakers, and wireless headphones. But due to the cost of a narrator and audio production, most titles never become audiobooks, particularly at smaller publishers, says Brian Carroll, rights manager at Indiana University Press.
IU Press licenses a fraction of its catalog for traditional audio production but is now a customer of Speechki. It plans to release its first synthetically narrated audiobooks later this year. “All the other books at last have a chance of becoming audiobooks now,” says Carroll.
Speechki’s technology has been impressive in tests so far, Carroll says, navigating the academic language of titles on paleontology and philosophy. One book chosen for production is Around the World in 80 Toasts, in which the software has to handle text sprinkled with words from other languages. “We thought if it could do this it would probably be able to do anything, and it did a pretty good job,” Carroll says.