Speaking Out for Change: The Next Evolution of the “Talkie”

Speaking Out for Change: The Evolution of the “Talkie”

Written by
Jack Allin — September 21, 2012

It's been two decades since the emergence of the video game "talkie", and the inescapable truth is that there's been very little evolution in adventures ever since. But as the saying goes, the more things change the more they stay the same, and the future of the genre may very well be the talkie all over again. A brand new type of talkie, of course, but a talkie nonetheless.

It's all very well and good for a game to talk to you, with pre-scripted dialogue lines recorded by voice actors in a studio long before you ever hear them. But that's not really talking, just playback. So what about a game where you talk to it? Impossible? Not at all! I've spoken to automated routing services that understood my commands on the phone before, so why not a game?

One of the unfortunate byproducts of the move from text to graphic adventures was a significant loss of interactive freedom. Even a basic text adventure can offer so much more personal control (however illusory it may be) than its modern day point-and-click counterparts. Encountering a small mailbox in front of a white house, you could try opening it, looking at it, depositing something, emptying it, kicking it, kissing it, pushing it over, covering it with graffiti, or talking to it (hey, you never know). True, far too often in the genre's early days the parser didn't understand what you were trying to tell it, but that was a technical limitation of the time that no longer applies. That said, staring at a blank screen and typing just isn't going to cut it for most people today, so more text adventures really isn't the answer.

Early SCUMM games like Maniac Mansion offered many more interactive options than modern adventures

SCUMM-era graphic adventures scaled back the interactivity, but still allowed a wide variety of verbs to play with. With each hotspot giving you as many as fifteen generic options (not including inventory), there was still plenty of choice available. And yet it was a nuisance to continually drag your cursor back and forth between action commands and the environment. Before long, experimenting became more of a pain than pleasure. Sierra refined the process further, reducing interactive options yet again and allowing right-clicks to cycle commands, but this too grew tedious over time. The "verb coin" provided an elegant solution, eliminating additional mouse clicks at the expense of still more possibilities. But even click-hold-slide-select is a hassle when multiplied by hundreds or thousands over the course of a game.

Most graphic adventures nowadays dispense with all semblance of individual control in favour of purely linear scripting. Sure, you can click what you want (so long as it's a hotspot), but your only input is to click, guess what might happen, and hope it's what you seek to accomplish. (Often you get nothing more than a "that won't work" for your troubles.) Many games add a "look" option as well, and the verb coin is still around, occasionally offering another choice or two. But for the most part, we've been reduced to one-click-fits-all interaction. It's very restrictive, but simple, streamlined, and fast.

I'm thankful for the "fast" part. My time is limited, and I have no desire to spend literally hours of it "playing the interface" rather than the game itself. But I do lament the loss of personal involvement in my adventures, and I wish there were an alternative to typing or a tedious series of mouse clicks to accomplish what text adventures could (at least theoretically) do from the start. If only we could sit back and simply TELL the game what we wanted to do!

Well, why can't we?

Of all video game genres, arguably none are less tactile than adventures (excluding direct control titles like Dreamfall and Sherlock Holmes, which remain few and far between). You simply couldn't take hands-on control away from a shooter, RPG, or strategy game, but there's a reason the term "point-and-click" is largely referred to (outside genre circles) with derision: the act itself is boring. There is nothing intrinsically fun or inspiring about sweeping the screen, clicking hotspots, watching a character plod around, then being force-fed whatever scripted action (or response) the developer saw fit to provide. That's one second of active engagement for many times that of passive spectacle. Yawn.

No, the appeal of adventures comes from the thinking, not the doing, and with most adventures resigned to click-and-pray mechanics these days, not only isn't there all that much thinking involved anymore, we spend far more time watching than acting. We accept it because it's so efficient, and because we'd never sacrifice the pretty pictures that come with modern games, but as entertainment it's a far cry from the more rewarding means of interaction we once enjoyed. By eliminating the mouse in favour of speech, could we finally have both?

Retail products like Dragon NaturallySpeaking make speech recognition easily accessible

We've all seen futuristic sci-fi where everything is controlled by voice commands, and it's an appealing prospect. Usually things go disastrously wrong (HAL says hi), but that's only when computers are so smart they're able to think and talk back. I'm not asking them to do that much, merely listen and respond as they've been programmed. And that's not science fiction, merely science. In fact, speech recognition programs have been around for quite a while, but like all new technological breakthroughs, it's taken until now for them to reach a reliably functional level. They may still largely be an automated annoyance on the phone, but voice-to-text programs are currently in use in many professional fields, from healthcare to law to education. If it's good enough for "important" jobs, is it not good enough for a game?

And you know what? It's pretty cheap. I was under the mistaken impression that such an option would be cost-prohibitive. It probably was in years past, but now there are highly respected retail programs like Dragon NaturallySpeaking for only $200. Surely that's well within reach for an enterprising developer looking to forge a new path. And if not... well, coughKickstartercough. This is the sort of tangible, justifiable expense I'd gladly contribute to if necessary. There are even open source options to explore for those more technically than financially inclined.

Now, this suggestion is rife with potential problems, I know. For one, it would require players to have a reasonably high-quality microphone, but headphones with built-in mikes are becoming more and more commonplace. Adventure gamers are typically among the last to embrace equipment change, but we get there in the end if properly motivated. A next-gen talkie would also demand a bit more privacy to play, unless you've got understanding friends and family who don't mind hearing you constantly blurt out "use rubber chicken on cable" while they're watching TV. And playing on mobile platforms in public? Fuhgeddaboudit. (Just ask anyone who yelled "Objection!" into their Nintendo DS along with Phoenix Wright.) But I suspect many of us play games at home, in private, and wouldn't find this a problem.

Star Trek's Scotty might have been getting ahead of himself back in 1986, but he had the right idea

Far more relevant is the software's ability to recognize speech. Dragon promises a success rate of "up to 99%", but it's impossible to guarantee a minimum. If gamers have to enunciate like we're training for a theatre performance just to be understood, the experience will get old fast. And what about foreign speakers who aren't proficient in the language? Or people with heavy accents? Just how flexible are these programs? I confess I don't know. It will surely come with some restrictions; there's no getting around that. Ultimately for this idea to be successful worldwide, it would require software that supports multiple languages. But hey, even Windows Vista comes with a simplified speech program that recognizes English, French, Spanish, German, Japanese, and Chinese. It's certainly possible.

Speed is also a serious consideration. A (hypothetical) three-second delay would be entirely acceptable in medical transcription, but it would feel like an eternity when you're sitting in front of your computer waiting for a game to react time and time again. Perhaps the software is already efficient enough now, or perhaps that's a bridge yet to be crossed, but we'll never know unless someone tries.

Then of course there's the problem of how to manage increased interactivity in a new talkie adventure. Assuming the software understands most of our commands, more available interactions inevitably means more animations and/or dialogue. This is an issue of money, however, not technology. Ironically, it may well be that the early adopters need to rely more on text to respond to our voice commands, simply to account for the added possibilities. (Especially since I'd expect it to be an indie designer who takes on a challenge like this at first.) I could certainly live with that, as a first step at least.

It would probably be necessary to minimize visual presentation as well, perhaps forsaking third-person for first. Much like the text adventure, many possible interactions would likely need to be described rather than shown, or shown only briefly. I'm okay with that as well. Give me a funny text response to "sit on mailbox" or an unanimated image of the results (rather than the action displayed in progress) and I'll be satisfied. Then again, maybe "sit on" shouldn't be an option. Developers might want to minimize both the word recognition risk and overwhelming interactive contingencies by limiting the number of available commands at any one time. I don't think a full-blown return to text adventure-level freedom is particularly advisable in this case.

Imagine playing a game like Eric the Unready with hi-res graphics and full speech recognition

So how to restrict options without resorting to the dreaded, arbitrary "no"? Why not something along the lines of early Legend Entertainment titles like Eric the Unready or the Gateway and Spellcasting series? More flexible (but less cinematic) than SCUMM, these games provided a large list of clickable actions, some of them tailored to current circumstances, along with another list of accompanying objects to try using them on. (They also included a pure text parser for additional experimentation, but little or no actual typing was required.) With a 21st century talkie, rather than clicking you'd simply verbalize your commands.
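For the technically curious, here is a rough idea of how a limited verb list and object list might tame a speech recognizer's mistakes. This is only a minimal sketch in Python, not how any real game or speech product works: it assumes the recognizer has already turned your voice into a line of text, and the word lists, the `closest` and `parse_command` helpers, and the 0.6 similarity cutoff are all made up for illustration. The only real library call is `difflib.get_close_matches` from Python's standard library, which picks the nearest entry from a list of allowed words.

```python
import difflib

# Hypothetical scene data: the verbs and objects currently on offer,
# in the spirit of a Legend-style action list and object list.
VERBS = ["open", "look at", "push", "take", "talk to"]
OBJECTS = ["mailbox", "white house", "rubber chicken", "cable"]


def closest(phrase, choices, cutoff=0.6):
    """Return the allowed choice most similar to phrase, or None if none is close."""
    matches = difflib.get_close_matches(phrase, choices, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def parse_command(transcript, verbs=VERBS, objects=OBJECTS):
    """Map a transcribed utterance to a (verb, object) pair from the allowed lists.

    Every split point is tried so that multi-word verbs such as "look at"
    can be matched. Returns None when no allowed pair is close enough.
    """
    words = transcript.lower().split()
    for i in range(1, len(words)):
        verb = closest(" ".join(words[:i]), verbs)
        obj = closest(" ".join(words[i:]), objects)
        if verb and obj:
            return verb, obj
    return None


print(parse_command("open mailbox"))   # exact match
print(parse_command("opem mailbox"))   # slightly misheard verb
print(parse_command("kiss mailbox"))   # verb not on the current list
```

Because only a handful of words are valid at any moment, a slightly misheard "opem" can still be nudged to "open", while something off the list entirely (sorry, "kiss") comes back as nothing, leaving the game free to answer with a hint or the word list rather than a flat "no".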

That may not seem like much of a change on paper, but I suspect the effect would be profound. First, by dramatically increasing the number of actions from the current one- or two-click formula, players would feel far more personally invested in games again. And where constantly typing or clicking text on screen used to feel more finicky than fun, sitting back in a comfy chair and vocally directing events on screen would restore the feeling that you're in control of the game, not vice versa. Freed from the shackles of keyboard, mouse, and even trackpads, there would be no more "awkward interface" complaints (well, there could be, but they'd now be easily avoidable), and you wouldn't need to be chained to a desk. Such games would work great on an Android phone, an iPad, your laptop, or even your big screen TV.

I admit, the thought of returning to intrusive word lists and voiceless text responses to many verbal inputs is far from ideal. But these are just stepping stones, the logistics yet to truly be explored. (Perhaps the word list should be the "hotspot highlight" hint equivalent, available only on request?) For now, the concept merely lays the groundwork, and the sky's the limit for creative designers looking to build on the core idea. The first few efforts may be rough, as pioneers often are, but innovations can be improved and enhanced until we wonder how we ever did without them. At the very least, incorporating speech recognition software into a modern era talkie would be something genuinely NEW in a genre that's all too often stagnant and uninspired. Hands up all those who are open to something different? ... Wait a second, never mind hands. Speak up, and let your voices be heard!