Now, this suggestion is rife with potential problems, I know. For one, it would require players to have a reasonably high-quality microphone, but headphones with built-in mikes are becoming increasingly common. Adventure gamers are typically among the last to embrace new equipment, but we get there in the end if properly motivated. A next-gen talkie would also demand a bit more privacy to play, unless you've got understanding friends and family who don't mind hearing you constantly blurt out "use rubber chicken on cable" while they're watching TV. And playing on mobile platforms in public? Fuhgeddaboudit. (Just ask anyone who yelled "Objection!" into their Nintendo DS along with Phoenix Wright.) But I suspect many of us play games at home, in private, and wouldn't find this a problem.
Star Trek's Scotty might have been getting ahead of himself back in 1986, but he had the right idea
Far more important is the software's ability to recognize speech. Dragon promises a success rate of "up to 99%", but it's impossible to guarantee a minimum. If we gamers have to enunciate like we're training for a theatre performance just to be understood, the experience will get old fast. And what about non-native speakers who aren't proficient in the language? Or people with heavy accents? Just how flexible are these programs? I confess I don't know. Any such system will surely come with some restrictions; there's no getting around that. Ultimately, for this idea to succeed worldwide, it would require software that supports multiple languages. But hey, even Windows Vista comes with a simplified speech program that recognizes English, French, Spanish, German, Japanese, and Chinese. It's certainly possible.
Speed is also a serious consideration. A (hypothetical) three-second delay would be entirely acceptable in medical transcription, but it would feel like an eternity when you're sitting in front of your computer waiting, time and time again, for a game to react. Perhaps the software is already fast enough, or perhaps that's a bridge yet to be crossed, but we'll never know unless someone tries.
Then, of course, there's the problem of how to manage increased interactivity in a new talkie adventure. Assuming the software understands most of our commands, more available interactions inevitably mean more animations and/or dialogue. This is an issue of money, however, not technology. Ironically, it may well be that the first developers to try this will need to rely more on text to respond to our voice commands, simply to account for the added possibilities. (Especially since I'd expect an indie designer to be the first to take on a challenge like this.) I could certainly live with that, as a first step at least.
It would probably be necessary to scale back the visual presentation as well, perhaps forsaking a third-person perspective for first-person. Much as in text adventures, many possible interactions would likely need to be described rather than shown, or shown only briefly. I'm okay with that as well. Give me a funny text response to "sit on mailbox" or a static image of the result (rather than the action displayed in progress) and I'll be satisfied. Then again, maybe "sit on" shouldn't be an option. Developers might want to reduce both the risk of misrecognized words and the overwhelming number of interactive contingencies by limiting the commands available at any one time. I don't think a full-blown return to text adventure-level freedom is particularly advisable in this case.
Imagine playing a game like Eric the Unready with hi-res graphics and full speech recognition
So how do we restrict options without resorting to the dreaded, arbitrary "no"? Why not something along the lines of early Legend Entertainment titles like Eric the Unready or the Gateway and Spellcasting series? More flexible (but less cinematic) than SCUMM, these games provided a large list of clickable actions, some of them tailored to current circumstances, along with another list of accompanying objects to try using them on. (They also included a pure text parser for additional experimentation, but little or no actual typing was required.) With a 21st-century talkie, you'd simply speak your commands instead of clicking them.
That may not seem like much of a change on paper, but I suspect the effect would be profound. First, dramatically increasing the number of actions beyond the current one- or two-click formula would make players feel far more personally invested in games again. Second, where constantly typing or clicking text on screen used to feel more finicky than fun, sitting back in a comfy chair and vocally directing events on screen would restore the feeling that you're in control of the game, not vice versa. With the shackles of keyboard, mouse, and even trackpad removed, there would be no more "awkward interface" complaints (well, there could be, but such problems would now be easily avoidable), and you wouldn't need to be chained to a desk. Such games would work great on an Android phone, an iPad, your laptop, or even your big-screen TV.
I admit that the thought of returning to intrusive word lists and voiceless text responses to many verbal inputs is far from ideal. But these are just stepping stones, with the logistics yet to be truly explored. (Perhaps the word list could be the equivalent of a hotspot-highlighting hint, available only on request?) For now, the concept merely lays the groundwork, and the sky's the limit for creative designers looking to build on the core idea. The first few efforts may be rough, as pioneering efforts often are, but innovations can be refined until we wonder how we ever did without them. At the very least, incorporating speech recognition software into a modern-era talkie would be something genuinely NEW in a genre that's all too often stagnant and uninspired. Hands up, all those who are open to something different! ... Wait a second, never mind hands. Speak up, and let your voices be heard!