The Myths of Multimodal Interaction

Computer Interaction with Eyes, Hands and Voice

Anonymous
Multimodal interaction refers to using several different ways to interact with a computer: speech, hand gestures, body gestures, and the like. New technologies are being designed in which control is based on hand gestures rather than on voice commands alone. Multimodal systems are more robust than unimodal systems based on speech, pen, or vision alone.

The first myth is that if you build a multimodal system, users will interact multimodally; in practice, users do not always combine modes and instead move between different modes of interaction. Another myth is that speech combined with pointing is the dominant form of multimodal interaction, when there are many other ways to combine modes. Multimodal input does not require simultaneous signals. For example, a person may point at the computer screen and say something a moment later, instead of at the same instant. Nor is speech the primary input mode in a multimodal system. Communicating with multimodal software by spoken word is far more difficult than by drawing pictures or typing on the screen. Some people assume that when modes are used together, each one simply repeats what the others convey, when in fact the modes complement one another rather than overlap. Many people worry about how error-prone multimodal systems are, but the article states that they are more robust and less error-prone than unimodal ones. As users interact with a multimodal system, the system becomes more familiar with how each user interacts and can improve its recognition rates. Different input modes also convey information to the computer in different ways. Voice and pen, for example, do not convey the same kinds of commands, so the content carried by each mode is distinct.

Multiple modes can increase the speed and flexibility of a computer system. The way to improve multimodal systems is to learn how people interact with computers and what accommodations they need.
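The point that multimodal input does not require simultaneous signals can be illustrated with a small sketch. This is not taken from the article; it is a hypothetical example that assumes a system receives timestamped pointing and speech events and pairs a spoken command with a gesture made shortly before it. The names `InputEvent` and `fuse_events` and the four-second window are assumptions chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class InputEvent:
    mode: str      # "point" or "speech" (hypothetical mode labels)
    content: str   # what was pointed at, or what was said
    time: float    # seconds since the session started


def fuse_events(
    events: List[InputEvent], window: float = 4.0
) -> List[Tuple[str, Optional[str]]]:
    """Pair each spoken command with the most recent pointing gesture
    made at most `window` seconds earlier. Speech with no recent
    gesture is treated as a unimodal command (target is None).
    The window length is an assumed value, not a measured one."""
    commands = []
    last_point = None
    for event in sorted(events, key=lambda e: e.time):
        if event.mode == "point":
            last_point = event
        elif event.mode == "speech":
            if last_point is not None and event.time - last_point.time <= window:
                commands.append((event.content, last_point.content))
                last_point = None  # each gesture is used by one command
            else:
                commands.append((event.content, None))
    return commands


events = [
    InputEvent("point", "photo.jpg", 1.0),
    InputEvent("speech", "delete this", 2.5),         # spoken after pointing
    InputEvent("speech", "open the calendar", 10.0),  # speech only
]
result = fuse_events(events)
print(result)
```

Here the first spoken command is matched to the earlier gesture even though the two did not occur at the same moment, while the second command is handled as speech alone, which also reflects the observation that users do not always interact multimodally.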

Before reading about multimodal interaction, I knew of only a few modes, voice recognition being one. I had never thought of controlling a computer with hand or eye gestures, and I imagine that building such a machine would be extremely difficult. Multimodal systems sound like a good idea, but they may inadvertently make computers too complex to use. However, multimodal systems may prove quite useful for blind people and for other individuals with disabilities who could not use computers before. So someone may well benefit from this developing technology, but I will stick with unimodal systems for now.

Source: http://ieeexplore.ieee.org/Xplore/login.jsp?url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel3%2F5%2F14574%2F00664275.pdf&authDecision=-203

  • Multimodal refers to the different ways to interact with a computer.
  • Many myths exist about multimodal interaction.
