NewsBite

Human speech becoming elementary to IBM’s Watson

IBM is leveraging the Watson supercomputer’s power to change the rules of online interaction.

Testing Watson's speech recognition

It would be nice to slice the functions of my brain into 18 pieces and sell clones of each as a money-spinner on eBay. In my case, it would end badly. If you deployed my cloned navigation skills and asked for directions, you’d be in Timbuktu rather than Tarcutta.

But that’s exactly what IBM has done with Watson, its supercomputer famous for beating humans on the quiz show Jeopardy! in 2011. Instead of reserving Watson for an elite set of activities such as scientific research, IBM is leveraging its considerable powers by making each of its 18 functions available to the public as an online service, via a website.

For example, Watson’s capacity to speak with human-like emphasis is cloned in its “text to speech” engine. Paste in text and the engine will read it out with aplomb.

I downloaded a couple of sound clips from federal parliament as .wav files, selected them as input to the “speech to text” engine and, within seconds, Watson was typing out a pretty accurate rendition of what was said, akin to a proof copy. Take note, politicians and political journalists. However, the audio has to be crystal clear, as Watson is unable to deal with speech distortions and background noise.
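
The round trip is essentially one HTTP request. Here is a rough sketch of how such a recognition request could be assembled — the endpoint URL and credentials below are placeholders of my own, not IBM’s documented values, so check your service dashboard for the real ones:

```python
# Sketch of submitting a .wav file to a Watson-style speech-to-text service.
# SERVICE_URL, USERNAME and PASSWORD are placeholders (assumptions), not
# IBM's documented values.
import base64

SERVICE_URL = "https://example.ibm.com/speech-to-text/api/v1/recognize"  # placeholder
USERNAME, PASSWORD = "your-username", "your-password"                    # placeholders

def build_request(audio_bytes):
    """Assemble the URL, headers and body for a recognition request."""
    auth = base64.b64encode(f"{USERNAME}:{PASSWORD}".encode()).decode()
    headers = {
        "Authorization": f"Basic {auth}",
        "Content-Type": "audio/wav",  # tell the service the audio format
    }
    return SERVICE_URL, headers, audio_bytes

# To send it for real, read your .wav file and hand the result to
# urllib.request:
#   with open("question_time.wav", "rb") as f:
#       url, headers, body = build_request(f.read())
#   req = urllib.request.Request(url, data=body, headers=headers)
#   print(urllib.request.urlopen(req).read().decode())
```

The returned transcript would arrive as text in the HTTP response body.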

The supercomputer also offers concept expansion and insights, document conversion, real-time language translation, and an engine that seeks to understand a picture’s content and context. If that engine noticed some “sold” signs on photos of houses on your Facebook page, it could prompt removalist ads to come your way.

Its AlchemyData News engine indexes news from 250,000 to 300,000 English-language newspapers daily. I asked it to collate favourable and unfavourable mentions of various politicians in news coverage, and to list company acquisitions in the past 24 hours. All 18 services are accessible from the Watson developer website.
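
What such a query boils down to can be sketched locally: given article records already tagged with an entity and a sentiment (the records and field names below are invented examples, not AlchemyData’s actual schema), collating mentions is a simple tally:

```python
# Local sketch of what an AlchemyData-style sentiment query does: tally
# favourable and unfavourable mentions per politician. The records and field
# names are invented for illustration.
from collections import defaultdict

articles = [
    {"entity": "Politician A", "sentiment": "positive"},
    {"entity": "Politician A", "sentiment": "negative"},
    {"entity": "Politician B", "sentiment": "positive"},
    {"entity": "Politician A", "sentiment": "positive"},
]

def collate(records):
    """Count favourable/unfavourable mentions for each entity."""
    tally = defaultdict(lambda: {"favourable": 0, "unfavourable": 0})
    for r in records:
        key = "favourable" if r["sentiment"] == "positive" else "unfavourable"
        tally[r["entity"]][key] += 1
    return dict(tally)

print(collate(articles))
# {'Politician A': {'favourable': 2, 'unfavourable': 1},
#  'Politician B': {'favourable': 1, 'unfavourable': 0}}
```

In the real service, of course, the tagging itself is the hard part — Watson does that upstream.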


IBM offers examples for each service, and trying them out is as simple as pressing a few option buttons. If you’re more serious, you’ll need a few programming skills to adapt these functions to your needs. That includes getting across GitHub, a hugely popular project hosting and version-control tool that IBM uses with Watson.

GitHub has tutorials on how to use it, as does lynda.com, a great resource for getting across newer computing concepts. You assemble these services on IBM’s Bluemix cloud platform.

A quest for meaning

Underpinning Watson is “machine learning”. Conventionally we input vast arrays of data into computers, which perform calculations and spit the information back in selected, summarised and sorted forms. Computers like Watson are instead programmed to seek out information and observe the world. The result is an exponentially increasing capacity for accurate and meaningful answers, and an ability to emulate human qualities.

Take Watson’s “personality insights” and “tone analyser” functions. It would be hard for a conventional computer to measure the amount of anger in a slab of text, but Watson does that, having learnt speech patterns representing emotions.
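
As a toy illustration of the principle — nothing like Watson’s actual models — an emotion scorer can be “learnt” from labelled examples simply by counting which words appear with which emotion:

```python
# Toy illustration of learnt emotion detection: count how often each word
# appears in labelled training text, then score new text by the emotion its
# words are most associated with. Watson's real models are far more
# sophisticated; this only sketches the principle.
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (text, emotion) pairs -> per-emotion word counts."""
    counts = defaultdict(Counter)
    for text, emotion in examples:
        counts[emotion].update(text.lower().split())
    return counts

def score(counts, text):
    """Return the emotion whose training vocabulary best matches the text."""
    words = text.lower().split()
    totals = {e: sum(c[w] for w in words) for e, c in counts.items()}
    return max(totals, key=totals.get)

model = train([
    ("this is outrageous and unacceptable", "anger"),
    ("i am furious about this outrageous delay", "anger"),
    ("what a delightful and happy surprise", "joy"),
    ("so happy and thankful today", "joy"),
])
print(score(model, "an outrageous and unacceptable delay"))  # anger
```

The point is that nobody hand-writes a rule for “anger”; the association falls out of the training data.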

Take Donald Trump’s announcement last year that he would run for the US presidency. From this, Watson concluded Trump was “social, somewhat verbose and can be perceived as shortsighted”.

“You are assertive: you tend to speak up and take charge of situations, and you are comfortable leading groups ... you are relatively unconcerned with tradition: you care more about making your own path than following what others have done.” Watson gave Trump an “agreeableness” rating of 90 per cent and an achievement rating of 16 per cent. I’m not so sure about that.

Likewise “tone analysis”. Watson scored Winston Churchill’s “finest hour” speech (out of 1.00) as 0.63 analytical, 0.90 confident and 0.69 tentative, and as having a strong element of emotion, a slice of anger and some disgust and fear.

Martin Luther King’s “I have a dream” speech rated 0.33 analytical, 0.64 confident and only 0.13 tentative. The most prevalent social aspect was his extroversion, and his dominant emotion was disgust (0.23). The sentence that attracted the highest rank for anger was “Let us not seek to satisfy our thirst for freedom by drinking from the cup of bitterness and hatred”.
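
Consuming such scores programmatically is straightforward. Assuming a simplified response shape — a plain dictionary of tone scores, not the service’s exact JSON schema — picking the dominant tone of each speech looks like this:

```python
# Post-processing tone scores. The dictionaries below use the scores quoted
# above; the flat shape is a simplifying assumption, not the service's
# actual JSON schema.
churchill = {"analytical": 0.63, "confident": 0.90, "tentative": 0.69}
king = {"analytical": 0.33, "confident": 0.64, "tentative": 0.13}

def dominant_tone(scores):
    """Return (tone, score) for the strongest tone in a document."""
    tone = max(scores, key=scores.get)
    return tone, scores[tone]

print(dominant_tone(churchill))  # ('confident', 0.9)
print(dominant_tone(king))       # ('confident', 0.64)
```

Both speeches come out as predominantly confident, which matches the figures above.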

Leanne LeBlanc, IBM Watson project manager.

IBM doesn’t rate these services as party tricks, but as serious tools for decision making. That could include a job agency getting a computer to decide which of 200 applicants would best fit a particular job, personality-wise. Input all the job applications into Watson, along with the applicants’ Facebook feeds, Twitter handles and blog addresses, and Watson will pump out profiles.
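
One hypothetical way to turn such profiles into a ranking — the trait names and numbers below are invented for illustration — is to order applicants by cosine similarity to a target personality profile:

```python
# Hypothetical applicant ranking: each profile is a dictionary of trait
# scores (0-1), and candidates are ordered by cosine similarity to a target
# profile. Trait names and numbers are invented for illustration.
import math

def cosine(a, b):
    """Cosine similarity between two trait dictionaries with shared keys."""
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

target = {"assertiveness": 0.8, "agreeableness": 0.6, "openness": 0.9}
applicants = {
    "applicant_a": {"assertiveness": 0.7, "agreeableness": 0.5, "openness": 0.9},
    "applicant_b": {"assertiveness": 0.2, "agreeableness": 0.9, "openness": 0.3},
}
ranked = sorted(applicants, key=lambda n: cosine(target, applicants[n]),
                reverse=True)
print(ranked)  # applicant_a is the closer match
```

Whether such a ranking is fair or accurate is another question — the mechanics, at least, are simple.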

The tone analysis tool could be used to analyse a customer’s language during an online chat session to gauge their anger, and could automatically prescribe follow-up action. A politician preparing a speech before an election might use it to ensure the speech strikes the right tone; the same goes for a senior executive ahead of a corporate announcement.
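
A minimal sketch of that tone-triggered follow-up, assuming the analyser returns an anger score between 0 and 1 (the thresholds here are arbitrary):

```python
# Tone-triggered follow-up in a chat system, assuming an anger score in
# [0, 1]. The thresholds are arbitrary illustrative choices.
ANGER_THRESHOLD = 0.75

def follow_up(anger_score):
    """Decide what to do with a chat session given its anger score."""
    if anger_score >= ANGER_THRESHOLD:
        return "escalate to senior staff"
    if anger_score >= 0.5:
        return "flag for supervisor review"
    return "no action"

print(follow_up(0.82))  # escalate to senior staff
```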

I don’t dispute Watson’s cleverness and its ability to learn, analyse data, score against linguistic criteria and compute mightily quickly, but can we trust a computer’s personality analysis? Intuitively I find this hard to do, but there is research suggesting computers make better judgments than humans about personality using a person’s digital footprint.

Watson’s judgment is far from foolproof, but it’s an incredible tool and, whether we like it or not, its formidable capabilities look set to transform how we surf the web and interact with businesses.


Original URL: https://www.theaustralian.com.au/business/technology/human-speech-becoming-elementary-to-ibms-watson/news-story/5bf98b6c40fb3bc273162ed6a2d84d1c