Visitors to Jesse Spencer-Smith’s office quickly learn there is no such thing as a simple smile. Gazing back at them from his computer screen is an avatar with the complex facial expressions we have come to expect from our fellow humans. The avatar’s support software includes programming for an easily recognizable smile, but with a slight turn of the computer’s joystick, it becomes obvious this is no ordinary virtual visage. As an eyebrow lifts here, the corners of the mouth move there, and the brow furrows, the smiling computer-generated image reveals additional, deeper emotions like surprise or satisfaction, and with them the potential for a new kind of human-computer intelligent interaction.
A computational psychologist and full-time Beckman Institute faculty member in the Human Perception and Performance (HPP) group, Spencer-Smith has had remarkable success in quantifying human facial expressions. He then translates those computations into applications such as this avatar, whose smiles display emotional depth and also offer a striking confirmation of the research. Spencer-Smith is adding dynamics to facial recognition research, which previously relied on static images in which expressions had to be exaggerated to convey even simple emotions.
By fashioning an avatar with facial features that move, and by writing computations that express the range of emotions found in the human face, Spencer-Smith has created unique opportunities for human-computer interfaces. In addition, the ability to decode facial expressions gives a computer more information about a user’s intent and emotional state, as well as a more precise understanding of the language being used.
Jesse Spencer-Smith developed an avatar with a wide range of expressions.
Spencer-Smith said the technology has a number of applications in industry and education. But it is the potential clinical uses that cause this psychologist and former Beckman Fellow to lean back in his chair and truly savor the possibilities of the technology. “Oh, yeahhh,” he says slowly, then proceeds to give examples that illustrate his excitement over the prospects. Spencer-Smith said a clinical psychologist could use a video camera to capture a patient’s facial expressions and draw on that information to tell whether the person is depressed or simply anxious. “We know that there are differences in how people perceive expressions based on their level of depression,” he said. “We could use this for a very fine test to say this person is not responding well to drug therapy, or for a differential diagnosis between depression and anxiety. You really want to know how someone is doing. Rather than saying, ‘How are you doing?’ it would be very nice to have a quantitative measure.”
That kind of quantitative measurement has been the basis for adding a dynamic dimension to the understanding of facial expressions.
“What I have found over the past year is the dynamics do matter, not just for perceiving emotions but also so that you can differentiate expressions that you cannot differentiate without the dynamics,” Spencer-Smith said. “Previously in the literature it was held that pride in achievement, sensual pleasure, satisfaction, contentment, happiness, all shared the same sentiment, which was a smile. And it was confusing because even though we can tell them apart, it looked like the signal was exactly the same. What I found was that if you include dynamics you can differentiate between them. The dynamics are really where all the information is.”
Measuring human emotions is the goal of another project in the Human-Computer Intelligent Interaction (HCII) Research Initiative. HCII Co-chair Tom Huang of the Image Formation and Processing (IFP) group also focuses on facial recognition and on ways the knowledge gained from it could benefit people. Huang has been working with middle school students to develop a computer that is more proactive in recognizing students’ emotional states.
Tom Huang develops programs allowing computers to understand a user’s emotions.
Huang, a professor in Electrical and Computer Engineering, seeks to use face tracking to program computers to read the cognitive and emotional states of users and respond accordingly. His current focus is on using human-computer interfaces to teach students scientific principles through building with Legos. Algorithms have been written for real-time face tracking and voice-tone recognition so that the computer can understand the students’ emotional states. Huang works on this project with Artificial Intelligence group members Stephen Levinson of the Department of Electrical and Computer Engineering and Dan Roth of the Department of Computer Science, along with Kevin Miller (CS) of the Department of Psychology and David Brown of the College of Education.
“This is a good example of a really interdisciplinary project, involving both the technical, engineering side, and the psychology side,” Huang said.
Huang’s co-chair in HCII, Art Kramer (HPP), continues to be involved in numerous projects that cut across many disciplines. One of his primary research areas is aging, and improving the quality of life for older adults through different interventions. One such intervention is aerobic fitness training. Working with Beckman Fellow Stanley Colcombe, postdoctoral researchers Kirk Erickson and Paige Scalf, and graduate students in his laboratory, Kramer uses a variety of methods to study its neurological and cognitive effects on older adults. The research uses functional magnetic resonance imaging (fMRI) to study the functional and structural changes in the brain that occur after test subjects improve their fitness. Kramer also uses near-infrared optical imaging with his colleagues Monica Fabiani and Gabriele Gratton of the Biological Intelligence Research Initiative to study aging and cortical plasticity.
“A big focus of what we’ve been doing is trying to understand, both from a psychological perspective and a neurophysiological perspective, change in the brain that occurs as a function of age and how these changes are reflected in changes in behavior or performance in a variety of computer-based memory, attention, decision-making tasks that we use in the laboratory,” Kramer said. “And we’ve been looking at the effectiveness within this vein of a number of different interventions as a way to slow or reverse age-related decline in memory and attention and in brain function and structure that underlie those performance-based changes.”
Kramer’s studies on the neurological and cognitive benefits of aerobic fitness training for older adults have received national attention.
“What we’ve been finding is that there are increases in the volume of different brain regions both in terms of gray matter, the neurons, and white matter, the interconnections between brain regions,” Kramer said. He said those increases are related to more efficient neural circuits, which translate into faster and more accurate processing of information for people who engage in fitness training. The research has also shown that improved fitness leads to better performance on psychomotor skills that require switching between tasks, such as driving, which is another area of investigation for Kramer.
“The remarkable thing about fitness training is it promotes improvement in memory and attention and a variety of different processes,” Kramer said. “Fitness training has relatively broad effects on a number of different perceptual, cognitive and motor processes. There’s a fitness fix in general.”
Increased social and intellectual engagement has also been shown to be an effective intervention for improving the mental health of older adults, as evidenced by the results of a collaborative study by Kramer’s group and the Center on Aging and Health at Johns Hopkins. The study focused on the Baltimore Experience Corps program, in which a group of isolated, sedentary older women increased their social and intellectual engagement by mentoring schoolchildren. The results showed that, compared with a control group, the mentors had measurable improvements in their cognitive abilities and more efficient neural circuits, as revealed by fMRI.
“We found comparable changes for these women as people involved in physical exercise,” Kramer said. “So we can improve their function and the double bonus here is the kids tend to improve their scores in reading and math. That’s collaborative beyond Beckman.”
Elizabeth Stine-Morrow researches the benefits of problem-solving for older adults.
Elizabeth A.L. Stine-Morrow (HPP) is also interested in optimizing cognitive function over a lifetime. She studies reading across the life span, as well as how experience and engagement promote cognitive functioning. She created the Senior Odyssey project, which promotes creative problem-solving among seniors as a path toward better mental health. Modeled on Odyssey of the Mind for students, Senior Odyssey features teams that work together over a 20-week period to prepare for a competition that demands mental flexibility and the ability to look at problems from different perspectives in order to generate solutions. Stine-Morrow said the project shows that learning can be a lifelong process, not just something people do early in life.
“The idea of constraining education within a short period of the life span is untenable,” Stine-Morrow said. “Odyssey gives older adults a chance to play intellectually. We can’t think of education as something you store away to be tapped later. Rather, education helps you to develop habits of thinking and engagement that have to be practiced. If you don’t maintain those habits, they are going to decay.”
Productive interactions between disciplines and with other facilities can be found throughout the Human-Computer Intelligent Interaction RI. Richard Sproat (AI) holds joint appointments in the University of Illinois departments of Linguistics and Electrical and Computer Engineering. One of Sproat’s research topics is transliteration, the spelling of words from one alphabet or writing system in the letters of another.
Working with Roth and Chengxiang Zhai from Computer Science and Elabbas Benmamoun from the Department of Linguistics, Sproat is researching ways to find the names of people, places, and organizations in text from various languages. The project, called multilingual named entity recognition, is developing methods that analysts can use to track the same word across texts written in widely different languages. Sproat and his colleagues are attacking the problem by developing computational models for a program that can compare two parallel streams of text, for example a Chinese newspaper and an English newspaper covering the news of the same day.
Because words take different forms, especially when carried into languages as different as Chinese or Hindi, the idea is to track all the instances of a word in its various forms across languages. The program will search for a specific word or term both phonetically and by the frequency with which it occurs in the texts.
“It turns out if you combine those two kinds of evidence, the phonetic evidence and the evidence that says they co-occur over time, in a similar pattern, you can do better than just using the phonetic information,” Sproat said.
Sproat said the software developed from this research would be a tool for analysts to get at information coming in multiple languages more quickly than having language specialists go over it. “You can imagine that they have streams coming in from multiple languages and they want to be able to quickly identify that these stories are about the same thing and here’s the key players,” he said.
Mark Hasegawa-Johnson studies speech recognition devices for automobiles.
Other researchers within HCII are attempting to quantify speech patterns for real-world applications. Mark Hasegawa-Johnson (AI) has been developing mathematical models that have proven successful in detecting speech prosody, the rhythms and intonations of our speech. The models are based on landmarks such as a consonant release, where the speaker opens the mouth to end a consonant. This research could lead to speech recognition devices for automobiles that allow more efficient hands-free cell phone dialing or interaction with an onboard computer. Hasegawa-Johnson said the relationship between phonemes, the basic sound units that make up spoken words, and prosody, the way those sounds are pronounced in context, is key to understanding language.
“The surprising result we did find was that there’s a synergistic effect you get by modeling the prosodic effects on each phoneme and by modeling the relationship between prosody and syntax,” he said. “When we put them both into the speech recognizer we actually get an improvement that is a bigger improvement than the sum of the two parts. I don’t think anybody had done that experiment before.”
A number of HCII researchers are working on practical applications that will improve the relationship between man and machine.
Jason McCarley (HPP), a former Beckman Fellow and current faculty member at the Institute for Aviation and the Department of Psychology, used a grant from the Transportation Security Administration to study the effects of mental workload on the visual performance of airport baggage screeners. McCarley said the project relates to an earlier study on cell phone use and the perception of visual scenes, in that both examine the effects of distractions, such as conversations, on task performance.
“We found in both older and younger subjects that carrying on a conversation makes you less likely to notice information in the environment, potentially important information,” McCarley said.
In the baggage screening study, the researchers looked at secondary-workload distractions, such as filtering out background noise and talking to passengers, and at how they affect visual attention, as reflected in eye movements, and object recognition. McCarley said eye movement and object recognition skills both contribute to how well screeners perform their tasks.
“If you lapse in either regard then you can miss the target,” he said. “We sort of expected that but what we found was those processes were pretty much independent and that only the object recognition skills, the ability to perceptually pull it out and distinguish it from that background noise, gets better when you practice people on the task. Your ability to forage it out doesn’t get any better.”
Jean Ponce (AI) does vision research, including work on a modeling technique intriguing enough to earn a sketch at Siggraph 2005, the annual convention that has been called the Academy Awards of computer graphics. The sketch described work by Yasutaka Furukawa and Ponce’s group on a novel method for acquiring high-accuracy solid models of complex 3-D shapes from multiple calibrated photographs by enforcing the photometric and geometric information available in those images. Ponce said the software for this technique makes it possible to create 3-D models less expensively than other methods do.
“We have this technology that allows us to build these very precise 3-D models,” Ponce said. “Normally when people build those they use laser-based systems, which are relatively expensive. These you can just build from photographs with normal digital [cameras], but with the same level of accuracy.”
The computer-generated model can then be fabricated as a physical object on a 3-D printer. Ponce said he has already used this method to create a model of a Neanderthal for an anthropologist.
Siggraph 2005 also featured a presentation from Beckman’s Computer Vision and Robotics Laboratory based on a paper by lab director Narendra Ahuja (AI) and others about techniques the group developed for compactly representing multidimensional visual datasets for efficient image-based rendering on a PC.
Using their methods, the group began with a set of images of a wave-filled swimming pool, taken from a small number of viewing directions under a limited variety of lighting conditions. They then showed what the same scene would look like at different times of day as the light changes, from continuously varying perspectives, and with the waves moving faster or the water becoming placid. The continuous, interpolated images introduced temporal, spatial, and illumination changes into the scene – factors missing from the original image set – in close to real time.
“What we have done gives us the ability to redisplay a scene, i.e., to create a video, which depicts a new viewpoint, or new lighting conditions, or new display speeds, starting with a relatively small number of images of the original scene,” Ahuja said.
Another of Ahuja’s research interests is developing video camera systems that railroads could use to prevent derailments or save fuel. His group has developed a video system that captures images after a sensor detects an oncoming train. The system combines the multiple images into a panoramic image and then extracts a description of the loading pattern, including what each rail car is carrying. The group’s analysis software processes the data, which is then sent to the railroad company for review by human analysts.
Communication between humans, and between humans and computers, is only part of the HCII mission.
Frances Wang (HPP) collaborates with a number of researchers on projects involving the CUBE™, Beckman’s fully immersive 3-D virtual reality environment. Wang and fellow HPP members Dan Simons, David Irwin, Alejandro Lleras, and Kramer have been studying the nature of object memory, asking, ‘How do people remember different properties of objects, such as location, size, and color?’
Wang said their research, which tested subjects in the CUBE™ using colored cubic objects, has shown that location plays a key role in our ability to remember an object.
“It tells us spatial information is very special in the way we represent an object,” Wang said. “That’s a default that cannot be deleted. So, we cannot remember an object without first specifying there’s an object here, then you get to the idea about what it looks like, what does it belong to? This is very new.”
Wang’s research also focuses on memory in visual search (how we keep track of things) and on the nature of spatial representation in navigation (how we find our way while walking in a natural setting). To learn more about navigation, Wang and CUBE™ personnel created a tunnel-like virtual maze to find out what kinds of errors people make in navigating it and what those errors reveal about how they computed their movements along the way. Wang said the results contradict the view in the literature that we rely on allocentric, or environment-centered, spatial memory to make our way through the world.
“We believe that we navigate in an egocentric way, which means that we keep track of where objects are relative to ourselves, so in that way we can access and act on the object in an immediate way,” Wang said.
For the visual search memory project, test subjects viewed a virtual tree with leaves and used a wand to point at the leaves they remembered as having fruit beneath them. Wang said that during a search people have difficulty returning to places they have already looked at, because of a process called inhibition.
“There is somewhere in the brain that keeps track of places in space,” Wang said. “If you look at a place, it will put a tag, inhibit, that place. That prevents you from getting back to that place.”
Whether promoting a better life for seniors or improving the computer interaction experience, HCII researchers are exploring the many facets of man and machine and producing exciting new results. Many of those results will lead to technological innovations; others will allow us to lead more productive lives; and some will simply help us better understand our world and the devices that have become such an important part of it.
