The Robot: an experiment with speech synthesis

From the press release

A British music producer has released a single, ‘Do you wanna dance’, which features a lead vocal part constructed using a simple freeware speech synthesis engine, similar in sound to the artificial voices commonly installed for document-reading on computers. The track also features a second singer whose parts were derived from a vocal ‘toolkit’ library.

The single’s creator, Mark Marrington, wanted to see if it was possible to come up with a vocal-led pop song without involving any real singers in the production process. ‘During the course of the last century’, says Marrington, ‘we have become increasingly inclined to transfer music-making responsibilities over to machines. However, singing synthesis offers a particular challenge to Western listeners who regard the human voice as sacred.’

He continues: ‘The Japanese were perhaps the first to wholeheartedly embrace this idea, as shown by the runaway success of one of their recent pop phenomena, called Hatsune Miku, who doesn’t actually exist – she is a synthesized vocalist!’ Marrington argues that the public is predisposed to accept this level of artificiality because it has already, to some extent, become acclimatized to it: ‘I already have on my side the fact that auto-tune, one of the most ubiquitous vocal effects in contemporary pop music, is well established as an aesthetic. It’s not much of a step from this kind of sound to a fully-fledged synthetic voice.’

Marrington’s main challenge was to make the speech synthesizer sing, which involved recording it speaking the song’s lyrics one phrase at a time. He then imported the resulting vocal snippets into a sequencer and manipulated them in a waveform editor to obtain the correct rhythms, as well as a greater range of pitches from which to create a melody.
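For readers curious how a spoken snippet can be moved to a different pitch, the following is a minimal Python sketch of one common approach, 'varispeed' resampling, in which the audio is read back faster or slower. The press release does not say which technique or software Marrington used, so this is an illustration only; the function names `semitone_ratio` and `varispeed` are hypothetical. Note that this simple method changes the length of the snippet along with its pitch, so the rhythm would still need adjusting separately.

```python
def semitone_ratio(semitones: float) -> float:
    """Frequency ratio for a shift of `semitones` in equal temperament."""
    return 2.0 ** (semitones / 12.0)


def varispeed(samples: list[float], ratio: float) -> list[float]:
    """Read `samples` back `ratio` times faster, using linear interpolation.

    A ratio above 1 raises the pitch and shortens the snippet;
    a ratio below 1 lowers the pitch and lengthens it.
    (Illustrative helper, not the producer's actual method.)
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    if not samples:
        return []
    last = len(samples) - 1
    out_len = max(1, int(len(samples) / ratio))
    out = []
    for i in range(out_len):
        pos = i * ratio                 # fractional read position in the input
        lo = min(int(pos), last)        # sample just before the read position
        hi = min(lo + 1, last)          # sample just after (clamped at the end)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


# Shifting a snippet up one octave (12 semitones) halves its length.
snippet = [0.0, 1.0, 2.0, 3.0]
shifted = varispeed(snippet, semitone_ratio(12))
```

Shifting a whole phrase by a different number of semitones for each note, then placing the results on a sequencer grid, would be one way to arrive at a melody from spoken material.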

The single features the synthetic vocalist serenading a female dance floor diva, and this itself is a somewhat artificial contrivance: ‘I built the diva voice using a royalty-free female vocal toolkit’, says Marrington, ‘which to all intents and purposes is a library of short sung phrases meant for use by bedroom dance producers who can’t afford to hire a vocalist. Luckily I found at least four phrases that were sung in the same key as my song and which just about amounted to a passable lyric.’ Marrington continues: ‘These phrases were in fact originally recorded by a session singer called Mandy Edge, who is unlikely to have any inkling that she has ended up on my record!’ And to save Ms Edge from embarrassment, Marrington has even disguised her name, calling her ‘Beth Toro’ (an anagram of ‘The Robot’).