
How Apple’s AI voice cloning works for those at risk of speech loss

By Tim Biggs

For the millions of people who are non-speaking or have lost their ability to speak, smartphones and the rise of text communication have provided many benefits. But there are still countless situations, in person and online, where an inability to speak can be a barrier.

With its latest software updates, Apple has included a new category of accessibility features for speech, joining the existing vision, physical and motor, hearing, and cognitive tools. Users can enter text to have their device speak for them in person or on calls, and people at risk of losing their ability to speak can even use Apple’s advanced AI to create a model of their voice, potentially allowing them to continue to sound like themselves.

Apple’s Personal Voice is trained by the user reading out randomised prompts, and it can be used with Live Speech in person or on calls.

“The idea is to try and support individuals who are either non-speaking or at risk of speech loss. For example, one in three people diagnosed with ALS [amyotrophic lateral sclerosis] is likely to lose their ability to speak over the course of their life,” said Sarah Herrlinger, Apple’s senior director of global accessibility policy and initiatives.

“And we worked with the ALS community to try and understand some of the nuance of their experience, to make sure we made this as simple and easy as possible.”

The main new feature, Live Speech, is straightforward to use on iPhones, iPads, Macs and Apple Watches. Once you turn the feature on, each device has a quick shortcut to access the keyboard, and once you type your message and hit “send”, your words are spoken aloud. There’s also a button to access phrases you’ve set ahead of time, such as your favourite coffee order, and there are a few different options for voices.

But those with newer devices also have the option of using machine learning to create a custom voice that sounds like them. Apple calls this feature Personal Voice, and it’s a bit like those synthesised vocal deepfakes you might have heard on YouTube where a Barack Obama soundalike ranks Pokémon, except Apple’s take doesn’t require an enormous amount of training data, is easy to make, and has security and privacy features built in to keep it from being misused.

On iPhone, you tap the side button three times to activate Live Speech. Credit: Tim Biggs

“We believe that both accessibility and privacy should live very well in the same space, and one should not surpass the other,” Herrlinger said.

“We don’t want your files, we don’t want to have any of that. We want to just know that at the end of the day, you have your voice for your own use. It’s all done on device, nothing goes to the cloud.”


To start creating a Personal Voice, you select it from the accessibility settings and your device asks you to read aloud a series of 150 randomly generated phrases. This process provides audio for the model to train on, while also making it difficult to clone other people’s voices, since you don’t know what the sentences will be until you begin. Once that’s done, processing the voice can take hours and a lot of power, so iPhones wait until they’re charging overnight to do the work.

Recording my clips took around 20 minutes of reading, but if you’re not up to speaking aloud for that long you can stop at any time and come back later. The device also keeps track of what you’re saying, so it can move to the next phrase automatically. The crunching of data began once I’d attached my phone to a charger and gone to bed, and in the morning there was a notification saying the voice model had been completed.

Once it’s all done, you can select one of your Personal Voices (you can create several) to use with Live Speech, for speaking out loud or in apps such as FaceTime.
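
Everything above is Personal Voice as a user sees it, but Apple also exposes it to third-party apps through its standard speech synthesis framework, so an app can speak in that voice once the user explicitly allows it. The sketch below is illustrative rather than anything described in the article: it assumes iOS 17 or later, and the PersonalVoiceSpeaker and speakInPersonalVoice names are invented for the example.

```swift
import AVFoundation

/// Illustrative helper: speaks text in the user's Personal Voice, if one exists
/// and the user has granted this app access (requires iOS 17 / macOS 14 or later).
@available(iOS 17.0, macOS 14.0, *)
final class PersonalVoiceSpeaker {
    private let synthesizer = AVSpeechSynthesizer()

    func speakInPersonalVoice(_ text: String) {
        // The system shows a prompt; nothing is spoken unless the user approves.
        AVSpeechSynthesizer.requestPersonalVoiceAuthorization { [weak self] status in
            guard status == .authorized else { return }

            // Personal Voices appear alongside the built-in voices,
            // flagged with the .isPersonalVoice trait.
            let personalVoice = AVSpeechSynthesisVoice.speechVoices()
                .first { $0.voiceTraits.contains(.isPersonalVoice) }

            let utterance = AVSpeechUtterance(string: text)
            utterance.voice = personalVoice   // nil falls back to the default voice

            DispatchQueue.main.async {
                self?.synthesizer.speak(utterance)
            }
        }
    }
}
```

That per-app permission prompt fits the privacy approach Herrlinger describes: another app can’t quietly borrow your voice, because nothing plays until you approve the request.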

It’s clear, at least from listening to my own Personal Voice, that I’m not going to fool anyone into thinking it’s me speaking. It still sounds like a robot, albeit a robot doing a decent impersonation of my voice. Sometimes it sounds almost natural, while at other times it says words too fast or gets the intonation entirely wrong. But my kids easily recognised it as me.

The model is stored locally on the device where it was made, meaning you need to be able to unlock your device (with a passcode, for example, or your face) to access it. If you do want to use it on multiple devices, you can choose to send it to iCloud, where it’s end-to-end encrypted and can be used on a maximum of three devices.

The new features join a long list of accessibility tools on Apple devices, many of which also use AI. These include Magnifier, which can use the camera to read out everything from the labels on buttons to the distance to an upcoming door, and Sound Recognition, which can alert users if the phone hears a baby crying, a dog barking, running water or another important sound.

But Herrlinger said Apple’s accessibility team wasn’t interested in exploring what AI could do just for the sake of it.

“Machine learning from an accessibility perspective is not new to us. We love our machine learning teams, and I know we’ll be working with them a lot more,” Herrlinger said.

“But as always, we want to really figure out what are the unique problems of the communities, by hearing from them, by understanding what they would like technology to do, and then solving those specific problems.”

Some accessibility options make it easier for people to use the devices themselves, while others provide functionality that would otherwise require a separate dedicated device, and some help make the wider world more accessible. The benefits, as always, increase as people use more Apple devices and services.

For example, many accessibility settings will apply across a person’s entire collection of devices, assuming they all come from Apple, and a combination of phone and watch, or computer and earbuds, will unlock new potential for accommodations.


Herrlinger said her team had already ported over many accessibility features to Apple’s upcoming Vision Pro headset and were working on new ones, and that making devices and services easier for everyone to use had benefits that went far beyond the devices themselves.

“I know a couple of people who have built houses ground-up to be HomeKit enabled because they’re quadriplegics who want to be in control of their lives,” she said, referring to Apple’s ecosystem that lets users control a smart home with their devices or voices.

“They don’t want an aide who might come to their home to have a key and can get in and out when they want. They want to have that person ring the doorbell, and they unlock the door via a smart lock. Or you know, in the afternoon, not have to call somebody to close the blinds when the sun starts to go down, that they can just say ‘bring the deck blinds down 50 per cent’ and keep working.

“That’s power. That’s dignity and respect. That’s the core of what we want to do, is to give everybody opportunity.”


