Deb Roy

My passion is to understand how language works, and to leverage that understanding toward building machines that use language in fluid, meaningful, human-like ways.

How do we learn from experience? How do we use lessons learned to behave intelligently? What is the role of language in cognition? How can we build machines that learn, think, and communicate in human-like ways? These are questions at the heart of my research program. They are, of course, vast unanswered questions, but I believe that we are in a position to shed new light on their answers.

I am particularly fascinated with how language works. Although language is often studied in isolation from other capacities, I instead view language as a lens through which to examine more central cognitive processes. Words are the only tangible handles for the otherwise invisible structures and processes running under the hood of consciousness. Words provide clues to how we carve up our world into concepts. Grammar, the rules by which we combine words, hints at deep and powerful processes by which we combine concepts to form unlimited new thoughts. I believe that words and grammar are shaped by many aspects of the mind, including perception and motor skills. Thus, for me, to study language is also to study many other parts of the mind and body and how they interact.

Our approach is to develop, simulate, and implement models of natural language acquisition by machine. Our tools are drawn from robotics, machine learning, computer vision, computational linguistics, and spoken language processing. We build models that focus on the roles of perception and action in the formation of language. Specifically, we create machines, often robotic in form, that learn to talk about what they see and do. This choice of problem forces us to bridge the symbolic world of words and sentences with the fluid, ever-changing physical world of sensory stimuli and motor actions.
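As a rough illustration of what "learning to talk about what it sees" can mean in practice, the sketch below implements a toy cross-situational word learner: each utterance is paired with the perceptual features present in the scene where it was heard, and word meanings are estimated from word-feature co-occurrence. The episodes, feature names, and learning rule are illustrative assumptions, not a description of our actual systems.

```python
# Toy cross-situational word grounding (illustrative only): pair utterances
# with the perceptual features detected in the same scene, then estimate how
# strongly each word predicts each feature.
from collections import Counter, defaultdict

def learn_groundings(episodes):
    """Accumulate word-feature co-occurrence counts across (utterance, features) episodes."""
    counts = defaultdict(Counter)   # word -> Counter over perceptual features
    word_totals = Counter()         # how many episodes each word occurred in
    for utterance, features in episodes:
        for word in utterance.lower().split():
            counts[word].update(features)
            word_totals[word] += 1
    # Normalize: estimated probability a feature is present when the word is used.
    return {w: {f: c / word_totals[w] for f, c in feats.items()}
            for w, feats in counts.items()}

episodes = [
    ("the blue cup",  {"color:blue", "shape:cup"}),
    ("a heavy ball",  {"weight:heavy", "shape:ball"}),
    ("the blue ball", {"color:blue", "shape:ball"}),
    ("a heavy cup",   {"weight:heavy", "shape:cup"}),
]

groundings = learn_groundings(episodes)
print(groundings["blue"])   # color:blue scores 1.0; shapes split at 0.5 each
print(groundings["cup"])    # shape:cup scores 1.0; color and weight split at 0.5
```

Even this crude co-occurrence scheme pulls "blue" toward a color feature and "cup" toward a shape feature without any dictionary definitions, which is the intuition behind grounding word meanings in experience rather than in other words.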

We refer to the process of relating symbolic and sensory-motor representations as "conceptual grounding." In essence, we are trying to resolve the problem of "dictionary definitions" that is inherent in language processing systems today. In a dictionary, although some words may be fully defined in terms of other words, the meaning of "blue" or "heavy" cannot be captured using words alone. The underlying concepts must already have been experienced in order to understand the definition. If we are to build a robot that can understand commands such as "get the cup from under the counter," the robot needs to link words such as "cup" and "under" to perceptual, non-linguistic representations. With only symbolic representations of word meanings, the robot has no way to break out of the world of symbols and into the physical world in which it must act. Solving the grounding problem is not a simple matter of associating some words with world experiences; rather, relations between words must be associated with relations between experiences. Only then can symbols be truly hooked into the world.
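To make the point about relations concrete, here is a small, hypothetical sketch in which a relational word such as "under" is grounded in a geometric test over perceived objects, so that "the cup under the counter" resolves to a particular object in the scene rather than to further symbols. The object representation and the 0.5 m alignment threshold are invented for illustration and stand in for whatever a real perceptual system would provide.

```python
# Toy grounding of a relational word (illustrative only): "under" is tied to a
# geometric predicate over perceived objects, so a referring phrase picks out
# an object in the scene instead of bottoming out in more words.
from dataclasses import dataclass

@dataclass
class PerceivedObject:
    label: str   # category from a (hypothetical) object recognizer
    x: float     # horizontal position, metres
    y: float     # height above the floor, metres

def under(a: PerceivedObject, b: PerceivedObject) -> bool:
    """Grounded meaning of 'under': lower than b and roughly aligned horizontally."""
    return a.y < b.y and abs(a.x - b.x) < 0.5

def resolve(head: str, relation, landmark_label: str, scene):
    """Find objects matching 'the <head> <relation> the <landmark>' in the scene."""
    landmarks = [o for o in scene if o.label == landmark_label]
    return [o for o in scene
            if o.label == head and any(relation(o, l) for l in landmarks)]

scene = [
    PerceivedObject("counter", x=1.0, y=0.9),
    PerceivedObject("cup",     x=1.1, y=0.3),   # the cup under the counter
    PerceivedObject("cup",     x=3.0, y=0.8),   # a second cup elsewhere
]

# "get the cup from under the counter" -> which cup should the robot fetch?
print(resolve("cup", under, "counter", scene))
```

The phrase is resolved by checking a relation between experiences (the geometry of two perceived objects), not a relation between symbols, which is the distinction the grounding problem turns on.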

The potential payoff of building grounded language learning systems is immense. I envision a host of new human-machine interfaces and, more fundamentally, new insights into how humans learn language and into the structure of the conceptual knowledge that underlies it. Five to ten years from now, I imagine we will build a primitive but complete working model of the conceptual structures underlying the vocabulary of a three-year-old. Although primitive, and thus missing many of the subtle details of human conceptual structure, this model will be embodied in an interactive personal robot that will be able to participate in natural spoken conversations with people, fluidly translating the meaning of sentences to and from perception and action.


Favorite childhood toy: Meccano (known as Erector Set in the US)