The 5 Senses: Why Stop at Vision?

20 11 2008

by Joshua Koopferstock

Unless you are Haley Joel Osment, odds are you possess 5 senses.  On this blog, we are focused on how computers can interpret one of those senses, vision.  It’s the field where we have our technical expertise, and where we believe we can make our biggest contribution to technological advancement (see our current R&D project to turn photos to 3D models).  However, in the same way that computer vision scientists are trying to help machines understand what they see, the researchers over at mufin are trying to teach computers how to hear.

mufin is an automated music recommendation system that takes a different, one could say more technical, approach to helping you find music that you like.  Services like Pandora and iTunes Genius rely on human expertise or song metadata, whereas mufin analyzes the audio content itself.  This is from the mufin website:

How does mufin work?

mufin knows the musical essence of millions of songs and connects those songs that have a similar essence. This essence consists of sound properties like tempo, instruments, sound density or harmony. Whether the music is well-known or not, which genre it belongs to, when it was released or where in the world it was made, plays no role when you discover music using mufin. What matters is the sound!

Much like those of us in Computer Vision, Computer Audition scientists sit at the meeting point between art and mathematics.  However effectively Computer Audition algorithms can break music down into a series of objective variables, it still takes a subjective human to decide which combination of those variables makes two songs “similar”.  It is this subjectivity that keeps both of our fields fascinating.
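To make that subjectivity concrete, here is a minimal sketch of similarity as a weighted distance over a few sound properties. It is not mufin's algorithm: the feature names, the 0-to-1 scaling, the weights and the two catalog entries are all assumptions made up for illustration. The point is only that the "nearest" song changes when a human changes the weights.

```python
import math

# Hypothetical per-song features, each assumed to be pre-scaled to 0..1.
FEATURES = ("tempo", "sound_density", "harmony")


def similarity(song_a, song_b, weights):
    """Return a score in (0, 1]; 1.0 means identical on every weighted feature."""
    squared = sum(
        weights[name] * (song_a[name] - song_b[name]) ** 2 for name in FEATURES
    )
    return 1.0 / (1.0 + math.sqrt(squared))


def most_similar(query, catalog, weights):
    """Return the title of the catalog song that scores highest against the query."""
    return max(catalog, key=lambda title: similarity(query, catalog[title], weights))


# Made-up example data (not real measurements).
query = {"tempo": 0.80, "sound_density": 0.50, "harmony": 0.20}
catalog = {
    "fast_sparse": {"tempo": 0.82, "sound_density": 0.10, "harmony": 0.20},
    "slow_dense": {"tempo": 0.30, "sound_density": 0.50, "harmony": 0.25},
}

# Two listeners who care about different things get different answers.
tempo_weights = {"tempo": 1.0, "sound_density": 0.1, "harmony": 0.1}
density_weights = {"tempo": 0.1, "sound_density": 1.0, "harmony": 0.1}

print(most_similar(query, catalog, tempo_weights))    # fast_sparse
print(most_similar(query, catalog, density_weights))  # slow_dense
```

The math is objective; choosing the weights is where the human judgment comes in.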

Though the early reviews of this site seem to be not entirely positive, the folks over at mufin already have my respect for taking a stab at a truly complex problem.  Whether mufin succeeds or not, I expect to see Computer Audition techniques applied in other music recommendation services in the future, though undoubtedly in combination with other, more manual, techniques rather than in isolation.




Facial Reconstruction: Photos to 3D

14 11 2008

By Joshua Koopferstock

Who hasn’t always dreamed of having a $2000 sculpture of their own head?  I know I have.  And finally, thatsmyface.com will make this dream come true.  With the upload of 2 photos of your face, one in front view and one in profile, thatsmyface.com will reconstruct a 3D model of your head in a semi-automatic fashion. This is a pretty neat attempt to bring computer vision and 3D technology to the consumer market.

The technology behind the site is nothing new.  I can’t say for sure, but I would be willing to bet that thatsmyface.com is using FaceGen technology, which has been around for 10 years.

Similar?

Still, you don’t need cutting edge technology AND an exciting product to make a successful business; one of the two will usually do just fine.  Thatsmyface.com now just needs to convince people that what they’ve always wanted was random gear with their head as the main feature.  For the sake of entrepreneurship, good luck to them!




Is Computer Vision the Next Big Thing for Advertising?

31 10 2008

By Joshua Koopferstock

Today I came across a neat computer-vision based installation/game that was presented at the FILE 2008 expo in Brazil.  The company that put this together, Colmeia, is an interactive production company that creates digital installations for advertising.  The game involves using one’s silhouette, reprojected as dark blobs, to block out certain parts on a screen.


Tantalus Quest at FILE 2008 from eduardo omine on Vimeo.

We wrote a couple of weeks ago about Float4, a Montreal-based company also working on interactive display installations.  I am hopeful that what we are observing is the beginning of a trend, and that in the next couple of years we will see a rising number of computer vision applications in advertising.  This is certainly a trend that we will be following closely on ENLIGHTEN3D.




Computer vision for all your secret agent needs

30 10 2008

By Joshua Koopferstock

Spies rejoice!  Computer vision scientists at UC San Diego have devised a way to create a duplicate of a key using only one digital photo, apparently taken in normal conditions.

By identifying a few “key” points (lame pun intended) in the photograph, the researchers are able to extract all the information required by a locksmith to make a copy of that key.  So, to all our loyal readers in the secret agent and espionage professions, send this report to your R&D department today and get this mildly amusing, er, incredibly practical technology out of the labs!

Read the full article here.




Google’s Orkut filters sexually-explicit photos

28 10 2008

By Joshua Koopferstock

The folks over at the Orkut blog announced a few days ago a new filter for their social network that can screen out “sexually suggestive phrases or images”.  However, no detail has been published (to my knowledge) on exactly how they filter out these images, whether they use text-based clues or somehow incorporate computer vision to detect the content of photos.  Until they respond to my e-mail I cannot say for sure, but my guess is that they are using only a text-based filter like the one used for Google Images SafeSearch.  If this is the case, I doubt it will be as effective on a social network where there are far fewer textual clues to put an image in context than there are on, say, a pornography web site.

Regardless of whether Orkut is using it or not, it is only a matter of time before computer vision techniques drive the next wave of “safe content” filtering.  As we discussed a few weeks ago, VideoSurf, the video search engine, has already introduced an image/video-based filter.  Once one major player in search adopts computer vision for content filtering, their competitors will have a strong incentive  to follow or risk being seen as the “unsafe” search engine.  Whether or not Google started the trend with Orkut this week, it’s only a matter of time until this application of computer vision hits the mainstream.




We’re ahead of the Times

16 10 2008

by Joshua Koopferstock

The New York Times, that is.  In what can only be perceived as the deepest form of flattery, last week the NY Times wrote a full length article on the “beautification algorithm”; the exact same paper that we wrote about 2 months ago.  Personally, I was excited to see a major mainstream publication taking an interest in computer vision, and I hope that our field continues to push into the thoughts of those who are not working directly in it.

Most folks that I encounter who aren’t working directly in computer vision or computer science are unfamiliar with the field entirely, and, more importantly, are unaware of the amazing things that can be, and are being, done with computer vision today.  In my own experience, some of the best ideas come from those who are not the scientists directly doing the research, but who are working on something completely different yet see the potential of the research in their own, seemingly unrelated, application.  So, New York Times, show the potential of computer vision to the world!  And any time you need inspiration for an article, we will be flattered if you stop by ENLIGHTEN3D.




Experience Flying like a Bird with Float4 Interactive

1 10 2008

By Christian Laforte

In the last month, Josh and I have had the opportunity to meet with several companies on the cutting edge of the computer vision and graphics field. First, we sat down at SIGGRAPH with the team leaders on Intel’s Larrabee project (the topic of a future post). More recently, we talked to Videosurf CTO Eitan Sharon, our pick for top company presenting at the TechCrunch50 conference two weeks ago. Last week, we got to chat with the founders of Float4 Interactive, a Montreal start-up that is turning image processing and computer vision techniques into an interactive art form.


Although Float4’s custom software works with cheap cameras (e.g. the EyeToy) and regular screens, the effects are most impressive when displayed life-size using a back-projected display and two high-performance cameras. They provide turn-key solutions and even rent the hardware for special events, such as extravagant wedding ceremonies, industry expositions, and advertising installations.

Applications and clients.

Other companies are experimenting with similar technologies (I remember seeing some at SIGGRAPH), but Float4 has raised the bar in interactivity, robustness and aesthetic quality.

Here are a couple of examples of the applications of their current technology:

  • Move your body to create unique animations (such as juggling a soccer ball).
  • Experience the joy of flying like an airplane or a bird, by shifting or waving your arms.

Already, the company has attracted attention from notable clients such as the Cirque du Soleil. Despite our best efforts to pry it out of them, the Float4 folks are staying quiet on exactly what kind of display they’re building for the Cirque. Personally, I’m excited to see how this state-of-the-art graphical technology gets integrated into what is already a visually astounding performance.

Visit Float4 Interactive




Computer Vision Comes to Video Search

18 09 2008

By Joshua Koopferstock and Christian Laforte

With 50 startups launching at the TechCrunch50 conference last week, the blogosphere has been abuzz with high tech news. In all that media frenzy, one company stood out and impressed us more than the rest. Videosurf, a video search engine, combines traditional text metadata search with computer vision to provide higher quality search results. After seeing a video of their presentation at TC50, I wanted to learn more, so I arranged an interview with Videosurf CTO and computer vision expert Dr. Eitan Sharon. The following article is based on our discussion.

How Videosurf Works

Videosurf starts by looking through textual tags and dividing video into shots. Unlike other video search engines though, Videosurf goes much further in its analysis. A facial recognition system goes through the films frame-by-frame, collects faces that look alike in different shots, and tries to match them against the appearance of known actors using the cast information. In the beginning, an operator had to assist the system, but now, most of the common actors are easily recognized automatically.
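For readers who want a feel for the "collect look-alike faces, then match them to the cast" idea, here is a rough sketch. It is not Videosurf's system: it assumes some face detector has already turned each face into a small numeric descriptor, and the descriptors, the threshold and the "Actor A" / "Actor B" reference entries are all hypothetical. The grouping is a simple greedy pass, chosen for brevity rather than accuracy.

```python
import math


def centroid(group):
    """Average descriptor of a group of equally sized descriptors."""
    count = len(group)
    return tuple(sum(d[i] for d in group) / count for i in range(len(group[0])))


def group_faces(descriptors, threshold):
    """Greedy grouping: a face joins the first group whose centre is close enough."""
    groups = []
    for descriptor in descriptors:
        for group in groups:
            if math.dist(centroid(group), descriptor) <= threshold:
                group.append(descriptor)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([descriptor])
    return groups


def label_groups(groups, cast_refs, threshold):
    """Name each group after the nearest cast reference, or None if none is close."""
    labels = []
    for group in groups:
        centre = centroid(group)
        name, ref = min(cast_refs.items(), key=lambda item: math.dist(centre, item[1]))
        labels.append(name if math.dist(centre, ref) <= threshold else None)
    return labels


# Made-up 2-D descriptors for faces found in different shots.
faces = [(0.10, 0.90), (0.90, 0.10), (0.12, 0.88), (0.88, 0.14), (0.50, 0.50)]
# Hypothetical reference descriptors built from the cast list.
cast_refs = {"Actor A": (0.10, 0.90), "Actor B": (0.90, 0.10)}

groups = group_faces(faces, threshold=0.2)
labels = label_groups(groups, cast_refs, threshold=0.2)
print(labels)  # ['Actor A', 'Actor B', None]
```

The `None` label is the case where a human operator would presumably have stepped in during the early days: a face that recurs but does not match anyone on the cast list.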

Here’s an example. I put in a search for “Star Trek”. At the top of the page, I am presented with a thumbnail of people that are associated with my search. In this case the list includes William Shatner, Jeri Ryan, Leonard Nimoy, and Patrick Stewart, all actors who have played major characters in a Star Trek TV series. By clicking on one of the thumbnails (Patrick Stewart), I can find all of the Star Trek-related clips in which Patrick Stewart appears.

From the screenshots, you may have noticed that next to each clip there appears to be a storyboard. Appearing simple at first glance, these “video summaries” actually employ a complex computer vision approach in trying to automatically determine the most “important” frames in a clip. According to Dr. Sharon, creating these visual summaries involves a combination of techniques such as determining the uniqueness or similarity of objects across frames, the depth of field, motion in the scenes, etc. If these automatic storyboards work effectively, and from my own testing they seem to do a good job, this should save users time so that they don’t have to wait for a video to finish downloading just to realize that it is not what they were looking for.
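As a toy illustration of just one of those cues, the similarity of content across frames, the sketch below keeps a frame only when it differs enough from the last frame it kept. This is an assumption-laden simplification, not the method Dr. Sharon described: frames are stand-in lists of grayscale values, the threshold is arbitrary, and depth of field and motion are ignored entirely.

```python
def frame_difference(frame_a, frame_b):
    """Mean absolute difference between two equally sized grayscale frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def pick_keyframes(frames, threshold):
    """Return indices of frames that differ enough from the last kept frame."""
    if not frames:
        return []
    kept = [0]  # always keep the first frame as a starting point
    for index in range(1, len(frames)):
        if frame_difference(frames[kept[-1]], frames[index]) > threshold:
            kept.append(index)
    return kept


# Made-up 4-pixel "frames": two dark, two bright, one mid-grey.
frames = [
    [10, 10, 10, 10],
    [12, 11, 10, 9],
    [200, 190, 210, 205],
    [198, 192, 209, 204],
    [50, 60, 55, 52],
]

print(pick_keyframes(frames, threshold=30))  # [0, 2, 4]
```

A real storyboard would also need to rank the kept frames by how "important" they are, which is where the harder computer vision work presumably lies.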

Other Features

Perhaps the feature with the most immediate profit potential is their adult content filter. Though we did not get too far into the details, they have employed a type of “safe search” filter that adds computer vision techniques to traditional text search to determine whether content should be filtered. Like Google’s Safe Search, this filter can be toggled on or off. If I were Yahoo’s chief marketing officer, I would immediately license this technology exclusively for a few years. Then I would run an ad campaign about how our search engine is the only safe place for children on the internet. But that’s just me.
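Since we did not get the details, the following is only a guess at the general shape of such a filter, not Videosurf's implementation. It assumes an image classifier already produces a `visual_score` between 0 and 1, and the flagged-term list, the threshold and the "take the stronger signal" rule are hypothetical choices.

```python
# Placeholder term list; a real filter would use a much larger vocabulary.
FLAGGED_TERMS = {"explicit", "nsfw"}


def text_score(tags):
    """Fraction of a clip's text tags that appear in the flagged-term list."""
    if not tags:
        return 0.0
    flagged = sum(1 for tag in tags if tag.lower() in FLAGGED_TERMS)
    return flagged / len(tags)


def should_filter(tags, visual_score, safe_search=True, threshold=0.5):
    """Hide a clip when either the text or the visual signal is strong enough.

    visual_score is assumed to come from an image classifier (0 = safe, 1 = adult).
    """
    if not safe_search:
        return False  # the user has toggled the filter off
    return max(text_score(tags), visual_score) >= threshold


print(should_filter(["holiday", "beach"], visual_score=0.9))  # True
print(should_filter(["holiday", "beach"], visual_score=0.1))  # False
print(should_filter(["NSFW"], visual_score=0.0))              # True
print(should_filter(["NSFW"], visual_score=0.9, safe_search=False))  # False
```

The first call is the interesting one: harmless-looking tags would slip past a text-only filter, and only the visual signal catches the clip.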

One other neat feature lets you share a specific segment from a video, increasing the viral potential of online video beyond where it already is today.

Will it take off?

Asking “Will I use this?” is not really the most unbiased kind of market analysis, but I can say that I will use this (free!) service. I already have, actually: I have “Videosurfed” 3 times in the 24 hours since receiving my beta invite, and I am confident that I will keep coming back if it keeps giving me quality results.

From a profitability perspective, Videosurf may have an ace in the hole that they haven’t talked about yet: copyright detection. After Google (owner of YouTube) was sued for one billion dollars by Viacom last year, it is safe to bet that they and others are interested in being able to better detect whether the material on their sites is copyrighted. Computer vision has finally come to online video search, and we can safely say that it is here to stay.

You can find a list of Dr. Sharon’s computer vision publications on his personal web site: http://www.dam.brown.edu/people/eitans/




Create 3D Models from Photos

12 09 2008

By Joshua Koopferstock

If creating 3D models were as easy as taking photos, it is safe to say that the use of 3D would be far more widespread than it is today.  From e-commerce to virtual tourism to casual games, reducing the cost and complexity of creating 3D models would affect multiple industries.

Feeling Software is making that possible.  Over the last 2 years, we have worked to develop a technology that allows anyone to create 3D models with little effort and no training.  Our goal: simplicity.  You take a bunch of photos with a regular camera from any angle you please, and we automatically create a 3D model.  The demo video below discusses our project in detail.


Feeling Software Demo from joshk on Vimeo.

We have thought of a variety of ways that this technology can be applied to solve problems for consumers.  For our readers, imagine that you could take photos of an object or scene, press a button, and instantly have a high-quality 3D model of that object or scene.  If this technology were available today, how would you use it?




DARPA Binoculars Use Haze to See Farther

28 08 2008

By Joshua Koopferstock

Being able to see through haze would be neat enough in its own right, but DARPA-funded scientists are going one step beyond that and using the haze to actually see farther than they would be able to see if it weren’t there.  More specifically, these researchers believe that the shimmering of heat “waves” can in fact be used as a lens, with the right image recognition technology.

Photo by Keirn

The goal? 90% accurate facial recognition at 1 km with a 6 cm lens.  As the system combines the data from multiple images, you will not be able to see 1 km away in real time; the aim is 1 frame per second.

What I like about this approach is that it takes something that is impeding the goal — haze blocks your ability to see far — and turns it into an improvement — haze helps you see farther!  More problems should be solved this way.

View the technical presentation

Source: New Scientist
