My Computer Knows how to French Kiss

10 07 2008

By Christian Laforte and Joshua Koopferstock

You’re watching a feel-good romantic comedy. It comes to that scene: Mr. Hollywood Hunk and Ms. Beverly Hills Perfect10 are staring into each other’s eyes. They lean in. She blinks, slowly. You know what’s going to happen. As they come together for that trite, big-romantic-scene-of-the-movie kiss, you think to yourself, “All these Hollywood rom-coms are exactly the same!”

Your computer watches on silently, intently. It’s trying to learn how to French kiss.

Teaching computers how to recognize human actions is one of the biggest ongoing challenges in the field of computer vision. A new paper shows a first step in the right direction: recognizing specific human actions, such as kissing or answering the phone.

Learning realistic human actions from movies (PDF, video mpg, abstract)
Ivan Laptev, Marcin Marszalek, Cordelia Schmid, Benjamin Rozenfeld
Published at CVPR 2008

How does it work? First, the authors needed examples of human actions, like actors kissing, answering the phone or getting out of a car.

Examples of actions recognized by Laptev et al:
Kiss, Answer the phone, Get out of a car

They took clips from Hollywood movies and annotated them (e.g. a long kiss starts at 1m53s). Doing this manually for dozens of movies would have been tedious, so the authors developed a technique that automatically aligns subtitles with movie scripts (e.g. from http://www.dailyscript.com). This is a hard problem considering the variety of expressions and the ambiguities in the text. The authors shared their results at http://www.irisa.fr/vista/actions, along with detailed technical explanations.
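To make the alignment idea concrete, here is a minimal Python sketch. It is not the authors' method: it assumes the script is already split into dialogue and action lines, that subtitles come as (start, end, text) tuples in seconds, and it uses the standard library's difflib for fuzzy matching. The function names, the 0.6 threshold and the sample data are all made up for illustration.

```python
import re
from difflib import SequenceMatcher


def similarity(a, b):
    """Word-level similarity in [0, 1], ignoring case and punctuation."""
    words_a = re.findall(r"[a-z']+", a.lower())
    words_b = re.findall(r"[a-z']+", b.lower())
    return SequenceMatcher(None, words_a, words_b).ratio()


def match_dialogue(script, subtitles, min_ratio=0.6):
    """Map script index -> (start, end) for dialogue lines found in the subtitles."""
    times = {}
    if not subtitles:
        return times
    for i, (kind, text) in enumerate(script):
        if kind != "dialogue":
            continue
        best = max(subtitles, key=lambda s: similarity(text, s[2]))
        if similarity(text, best[2]) >= min_ratio:
            times[i] = (best[0], best[1])
    return times


def locate_actions(script, subtitles, min_ratio=0.6):
    """Give each action line the time gap between its neighbouring matched dialogue."""
    times = match_dialogue(script, subtitles, min_ratio)
    located = []
    for i, (kind, text) in enumerate(script):
        if kind != "action":
            continue
        before = [times[j][1] for j in range(i) if j in times]
        after = [times[j][0] for j in range(i + 1, len(script)) if j in times]
        if before and after:
            located.append((text, before[-1], after[0]))
    return located


# Hypothetical sample data, invented for this example.
script = [
    ("dialogue", "I have waited so long for this."),
    ("action", "They kiss."),
    ("dialogue", "Me too, darling."),
]
subtitles = [
    (110.0, 112.5, "I have waited so long for this, you know."),
    (118.0, 119.5, "Me too, darling."),
]

print(locate_actions(script, subtitles))  # [('They kiss.', 112.5, 118.0)]
```

The script says what happens but not when; the subtitles say when but not what. Matching the dialogue the two have in common lets the timing carry over to the action descriptions in between.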

Automated machine learning and computer vision techniques were then used to recognize what a specific action, like kissing, looks like in successive images. The computer can therefore notice that in most kisses, two regions of the image (e.g. the eyes, lips and noses of the two actors) slowly approach and touch each other.

We will cover interesting potential applications in a future post. In the meantime, here are a few additional technical details.

Technical details

Although the automatic generation of the data sets is itself interesting, I paid particular attention to the video classification problem for action recognition. First, sparse space-time features are detected with a space-time extension of the Harris operator, applied at multiple temporal and spatial scales to further improve accuracy. To characterize the motion and appearance of local features, histograms are computed in the space-time volume surrounding each point feature, much as SIFT encodes 2D point features.
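As a rough illustration of the space-time Harris idea, the sketch below scores a single voxel of a tiny synthetic video. It rests on several assumptions: the response is taken in the commonly cited form H = det(mu) - k * trace(mu)^3, the gradients are plain central differences, the window is a simple box rather than a Gaussian, only one scale is used, and k = 0.005 is an illustrative constant. All function names are made up; this is not the authors' implementation.

```python
def gradients(video, t, y, x):
    """Central-difference gradients (Lx, Ly, Lt) at an interior voxel."""
    lx = (video[t][y][x + 1] - video[t][y][x - 1]) / 2.0
    ly = (video[t][y + 1][x] - video[t][y - 1][x]) / 2.0
    lt = (video[t + 1][y][x] - video[t - 1][y][x]) / 2.0
    return lx, ly, lt


def second_moment(video, t, y, x, radius=1):
    """3x3 matrix of summed gradient products over a cubic window."""
    mu = [[0.0] * 3 for _ in range(3)]
    for dt in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                g = gradients(video, t + dt, y + dy, x + dx)
                for i in range(3):
                    for j in range(3):
                        mu[i][j] += g[i] * g[j]
    return mu


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def harris3d(video, t, y, x, k=0.005, radius=1):
    """Space-time corner response; positive values suggest an interest point."""
    mu = second_moment(video, t, y, x, radius)
    trace = mu[0][0] + mu[1][1] + mu[2][2]
    return det3(mu) - k * trace ** 3


def make_video(size, inside):
    """Binary video indexed as video[t][y][x]."""
    return [[[1.0 if inside(t, y, x) else 0.0 for x in range(size)]
             for y in range(size)] for t in range(size)]


# A corner that never changes, and the same corner popping into view at t = 2.
static = make_video(5, lambda t, y, x: x >= 2 and y >= 2)
appearing = make_video(5, lambda t, y, x: x >= 2 and y >= 2 and t >= 2)

print(harris3d(static, 2, 2, 2) < 0)     # True: no variation over time
print(harris3d(appearing, 2, 2, 2) > 0)  # True: varies in x, y and t
```

The static corner has no temporal gradient, so the determinant vanishes and the response is negative. Only a point where the image changes in space and in time at once gets a positive score, which is why such features tend to fire on motion events rather than on background clutter.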

A spatio-temporal bag of features (BoF) is built from the features, arranged along several spatio-temporal grids shown empirically to produce good results. A non-linear support vector machine is then used to classify actions among the 8 possible classes. Basically, this allows the system to automatically learn which important visual features appear in a given sequence. For example, for the handshake action, we would expect some features to move up and down over time.
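The bag-of-features step can be sketched in a few lines of Python. Everything here is an assumption made for illustration: features are (x, y, t, descriptor) tuples with coordinates normalized to [0, 1], the visual vocabulary is given rather than learned, and a chi-square kernel stands in for the non-linear kernel that an SVM would consume. Training the SVM itself is left out.

```python
import math


def nearest_word(descriptor, vocabulary):
    """Index of the visual word closest to the descriptor (squared Euclidean)."""
    return min(range(len(vocabulary)),
               key=lambda i: sum((d - w) ** 2
                                 for d, w in zip(descriptor, vocabulary[i])))


def grid_bof(features, vocabulary, grid=(1, 1, 2)):
    """Concatenate one word histogram per (x, y, t) grid cell, then normalize."""
    nx, ny, nt = grid
    cells = [[0.0] * len(vocabulary) for _ in range(nx * ny * nt)]
    for x, y, t, descriptor in features:
        cx = min(int(x * nx), nx - 1)
        cy = min(int(y * ny), ny - 1)
        ct = min(int(t * nt), nt - 1)
        cell = (ct * ny + cy) * nx + cx
        cells[cell][nearest_word(descriptor, vocabulary)] += 1.0
    flat = [count for cell in cells for count in cell]
    total = sum(flat)
    return [count / total for count in flat] if total else flat


def chi2_distance(h1, h2):
    """Chi-square distance between two histograms of equal length."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def chi2_kernel(h1, h2, scale=1.0):
    """Kernel value in (0, 1]; 1 means identical histograms."""
    return math.exp(-chi2_distance(h1, h2) / scale)


# Hypothetical two-word vocabulary and two clips with the same features
# occurring in opposite temporal order.
vocabulary = [(0.0, 0.0), (1.0, 1.0)]
clip_a = [(0.5, 0.5, 0.1, (0.1, 0.0)), (0.5, 0.5, 0.9, (0.9, 1.0))]
clip_b = [(0.5, 0.5, 0.1, (0.9, 1.0)), (0.5, 0.5, 0.9, (0.1, 0.0))]

# Without a grid the two clips look identical.
print(chi2_kernel(grid_bof(clip_a, vocabulary, (1, 1, 1)),
                  grid_bof(clip_b, vocabulary, (1, 1, 1))))  # 1.0

# Splitting the clip into two temporal halves tells them apart.
print(chi2_kernel(grid_bof(clip_a, vocabulary, (1, 1, 2)),
                  grid_bof(clip_b, vocabulary, (1, 1, 2))) < 1.0)  # True
```

A plain bag of features throws away where and when each feature occurred. The grid keeps a coarse version of that information, which matters for actions defined by the order of events, such as sitting down versus standing up.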

Examples of spatio-temporal grids

This new technique outperforms previous ones in simple scenes, and works for natural movies with cluttered backgrounds. For the simpler KTH action dataset, the approach of Laptev et al. achieves an average classification accuracy of 91.8%, higher than any other technique published so far.

Actions in the KTH data sets

Action recognition in real-world video is much harder, so unsurprisingly the accuracy is much lower, varying between 18% and 53% depending on the type of action. Still, this type of approach is promising, and it’s reasonable to expect that improved versions will approach human-level action recognition within a few years. What new applications will this technology make possible? We’ll explore the exciting possibilities in a future post.


4 responses to “My Computer Knows how to French Kiss”

10 07 2008
Yongdong (21:26:16) :

Wow! Great work!

10 07 2008
Ehsan (22:16:54) :

It was really excellent. Last week, one of my friends said that he doesn’t believe that computers might mimic the emotions of humans. I told him that I’m sure it’s possible, but I had no proof to back up my idea. Now I’ll give him the link to this page :)

11 07 2008
Christian Laforte (09:36:21) :

Thanks for the kind words, but we’re not going to take the credit… all we do is report on the advances from smart researchers like Laptev and his colleagues. ;-)

Regarding computers mimicking the emotions of humans, there are a lot of interesting papers out there. If you google for “facial action aam” you’ll find some of the best ones. I’ll try to cover some in a future post.

29 12 2008
Bookmarks about Algorithm (18:30:11) :

[...] - bookmarked by 5 members originally found by Kingjay134 on 2008-12-02 My Computer Knows how to French Kiss http://www.enlighten3d.com/2008/07/10/my-computer-knows-how-to-french-kiss/ - bookmarked by 3 [...]
