
Re: Business models and eye candy



Brandon J. Van Every wrote:
2 months ago I asked several text-to-speech researchers whether synthesis for dramatic purposes was currently possible. As in, controlling tone of voice, expression, etc. They said no. Estimates of when it'll be available ranged from "that's our next generation product" to "people have been saying they'll have this in 5 years for, like, forever."

Transplanted prosody will help: it takes the pitch, timing, and volume from a real audio recording and applies them to the text-to-speech output. I put some examples at http://www.mxac.com.au/m3d/tts.htm. It's still a long way from what you're probably looking for.
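
To make the idea concrete, here is a rough Python sketch of transplanting prosody. It assumes the recording has already been aligned phoneme-by-phoneme with the synthesized text, which is the hard part in practice. The Phoneme class, its fields, and transplant_prosody are hypothetical names made up for illustration; they are not taken from any real TTS engine.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Phoneme:
    """One phoneme plus its prosody (hypothetical structure for illustration)."""
    symbol: str         # phoneme label, e.g. "HH"
    duration_ms: float  # timing
    pitch_hz: float     # pitch
    volume: float       # relative loudness, 0.0 to 1.0


def transplant_prosody(synth: List[Phoneme], recorded: List[Phoneme]) -> List[Phoneme]:
    """Copy timing, pitch, and volume from the recording onto the TTS phonemes.

    Assumes both sequences contain the same phonemes in the same order.
    """
    if [p.symbol for p in synth] != [p.symbol for p in recorded]:
        raise ValueError("recording and synthesized text are not aligned")
    return [
        replace(s, duration_ms=r.duration_ms, pitch_hz=r.pitch_hz, volume=r.volume)
        for s, r in zip(synth, recorded)
    ]


# Flat, default TTS prosody for the word "hi".
synth = [Phoneme("HH", 80.0, 120.0, 0.5), Phoneme("AY", 150.0, 120.0, 0.5)]
# Prosody measured from a (hypothetical) actor's excited reading.
recorded = [Phoneme("HH", 60.0, 180.0, 0.7), Phoneme("AY", 300.0, 240.0, 0.9)]

result = transplant_prosody(synth, recorded)
```

The synthesized voice keeps its own phonemes but borrows the actor's delivery, which is also why the approach stays tied to whatever lines were recorded.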


I am emphasizing TTS because it's a cost issue: voice talent is expensive, particularly if you want to create a large world. Pre-recorded speech and transplanted prosody are also inflexible, since they can only play canned responses.

Of course, TTS may not be enough "eye candy," and some players might even see it as worse than raw text.


Mike Rozak
http://www.mxac.com.au