Speak on the dotted line
BBC Radio 4, Sat 16th June 2007, 10:30 - 11:00
Presenter: Rory Bremner
Speakers (in order of appearance):
- Melvyn Hunt, Novauris
- Chris Woodward, Anana
- Colin Howman, SRC
- Ian Gibbons, SRC
- Tony Benn
- Judith Markowitz, speech recognition consultant
- Mark Huckvale, UCL
- Vance Harris, Voice Vault
- Nick Ogden, VoicePay
This programme was an introduction to speaker recognition and speaker verification. It was all pleasant enough: Rory Bremner affable, lashings of Tony Benn charisma, everyone else ruthlessly edited. I have only one comment on the content, a few more on the form:
Content
It would be silly of me to complain that the programme wasn’t technical enough: it wasn’t meant for people like me; it was aimed at the layperson listening while they washed the dishes or drove back from the supermarket. My main disappointment was that the programme didn’t differentiate clearly enough between the different speech tech functions it introduced - ‘limited domain’ speech recognition, speech synthesis, translation, ‘personas’, digital dictation (i.e., large vocabulary continuous speech recognition), voice morphing, and finally speaker recognition itself were all covered in a seamless flow. The pace was gentle, and the speakers were good, but I would have liked clear blue water between the types. In my experience, for many non-experts the different speech technologies blend together in a hi-tech blur. Some of these non-experts are writing briefs and setting budgets.
OK, another grumble. The programme did not explain how speaker recognition worked. Judith Markowitz came closest when she said that speaker recognition systems ignored the content of the speech and listened for characteristics of the speaker’s vocal tract. Then Vance Harris was wheeled on and asked how speaker recognition worked. He answered to the effect that it used hidden Markov models (see below). This is true enough, but all of the speech recognition on the programme would have used HMMs - and I bet even some of the speech synthesis demonstrated by Melvyn Hunt, and the voice morphing too. So, not much use.
Of course, it’s not fair to say ‘Vance answered …’. We weren’t really listening to Vance.
Form
As none of the content was new to me, I was able to think about the form. All of the speakers (apart from Rory and Tony, and possibly Judith) were heavily edited. There was lots of padding: jingles, mostly, or collages of synthetic voices. The topic of the programme was broached (by Judith Markowitz) just over 80% of the way in, and some technical terms were uttered soon after (“hidden Markov model” and “vectors” by Vance Harris).
This is how I imagine the programme was constructed:
- Requirements analysis: when is the programme going out? Who is the intended audience? Questions like this will indicate who your presenter should be (if it’s a documentary going out in the UK it has to be a comedian), what music to use, and what you might call information density. And yes, I imagine all that comes before
- What is the programme about? Decide on your story, spreading your points over the time given (no. of points of information = programme length * information density)
- Go out and do your interviews, collect your raw material
- Cut up the raw material until it says what you want it to, and spread it out over the time span. Using the 80/20 rule, the point of the programme should be broached 80% of the way through.
- Fill the rest of the time in with aural decoration.
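For what it's worth, the arithmetic in that recipe can be sketched in a few lines of Python. The 30-minute length comes from the broadcast slot; the density of 0.5 points per minute is purely an assumed figure for illustration, and the function name is hypothetical.

```python
def plan_programme(length_min, density_per_min, point_fraction=0.8):
    """Return (number of points, minute at which the main topic is broached).

    density_per_min is an assumed 'information density' (points per minute);
    point_fraction is the 80% mark from the 80/20 rule in the recipe above.
    """
    # no. of points of information = programme length * information density
    n_points = round(length_min * density_per_min)
    # the point of the programme is broached this many minutes in
    topic_minute = length_min * point_fraction
    return n_points, topic_minute


# A 30-minute slot (10:30 - 11:00) with an assumed density of 0.5 points/minute
points, minute = plan_programme(30, 0.5)
print(points, minute)
```

On those assumptions, the main topic arrives about 24 minutes in, i.e. at roughly 10:54.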