Experience Flying like a Bird with Float4 Interactive

1 10 2008

By Christian Laforte

In the last month, Josh and I have had the opportunity to meet with several companies on the cutting edge of the computer vision and graphics field. First, we sat down at SIGGRAPH with the team leaders on Intel’s Larrabee project (the topic of a future post). More recently, we talked to Videosurf CTO Eitan Sharon, our pick for top company presenting at the TechCrunch50 conference two weeks ago. Last week, we got to chat with the founders of Float4 Interactive, a Montreal start-up that is turning image processing and computer vision techniques into an interactive art form.


Although Float4’s custom software technology works with cheap cameras (e.g. EyeToy) and regular screens, the effects are most impressive when displayed life-size using a back-projected display and two high-performance cameras. They provide turn-key solutions and even rent the hardware for special events, such as extravagant wedding ceremonies, industry expositions, and advertising installations.

Applications and clients.

Other companies experiment with similar technologies (I remember seeing some at SIGGRAPH), but Float4 has raised the bar in interactivity, robustness and aesthetic quality.

Here are a couple of examples of applications of their current technology:

  • Move your body to create unique animations (such as juggling a soccer ball)
  • Experience the joy of flying like an airplane or a bird, by shifting or waving your arms.

Already, the company has attracted attention from notable clients such as the Cirque du Soleil. Despite our best efforts to pry it out of them, the Float4 folks are staying quiet on exactly what kind of display they’re building for the Cirque. Personally, I’m excited to see how this state-of-the-art graphical technology gets integrated into what is already a visually astounding performance.

Visit Float4 Interactive




Create 3D Models from Photos

12 09 2008

By Joshua Koopferstock

If creating 3D models were as easy as taking photos, it is safe to say that the use of 3D would be far more widespread than it is today.  From e-commerce to virtual tourism to casual games, reducing the cost and complexity of creating 3D models would have a widespread effect on multiple industries.

Feeling Software is making that possible.  Over the last 2 years, we have worked to develop a technology that allows anyone to create 3D models with little effort and no training.  Our goal: simplicity.  You take a bunch of photos with a regular camera from any angle you please, and we automatically create a 3D model.  The demo video below discusses our project in detail.


Feeling Software Demo from joshk on Vimeo.

We have thought of a variety of ways that this technology can be applied to solve problems for consumers.  For our readers, imagine that you could take photos of an object or scene, press a button, and instantly have a high-quality 3D model of that object or scene.  If this technology were available today, how would you use it?




Facial Recognition + Search = Cool and Creepy

3 09 2008

By Joshua Koopferstock

You tag this photo:

Source: www.bt.dk

Polar Rose finds this photo:

Source: www.newprophecy.net

With facial recognition in Picasa Web Albums launched yesterday, an exciting computer vision application once again bumps heads against privacy concerns.  On the one hand, automatic tagging of photos through facial recognition can be a useful time saver; if I have an album of 100 pictures of my family vacation with the same 5 people, I would much rather have my computer tag them for me than have to sift through them one by one adding tags.  On the other hand, I might not necessarily want all photos of me to be so easily found by anyone.

Picasa Web Albums seems to be OK in how it deals with this issue, at least for now.  Users only tag their own albums, and, as far as I can tell, the information gathered is not used to search Google Images and automatically tag images of the people you tag in your web albums.

This is not the case with every player stepping into this industry.  Polar Rose, announced in late 2006, uses facial recognition on user-tagged photos to search for more photos of an individual in any publicly available images.  The service is designed as a browser plug-in and an embeddable widget for photo-sharing sites, and rumour has it that a partnership with one of the major sites is imminent.  Users tag photos of people that they find anywhere online, and Polar Rose uses that information to find that same face across other images.

The example at the beginning of this post should illustrate why this may be a concern.  For Paris Hilton, perhaps her image benefits when scandalous pictures surface, but this is not the case for most of us.  Should photos of people really be that searchable?

In reality, though, the point is moot.  Using Google Image search, I probably could have found the same 2 pictures shown above; with facial recognition, this just becomes more efficient.  If you have been using Facebook for the past few years, you have probably already come to terms with the fact that people can quickly find many pictures of you, including ones that others took without your consent (though in fairness to Facebook, you can untag and render unsearchable pictures you don’t like).

Conclusion: from a computer vision standpoint, it is neat to see these technologies reach the mass market.  From a privacy outlook, more (visual) information about us is going to become accessible without our direct consent, but only information that was publicly available in the first place, and this development is probably inevitable.

On a final note, there is one feature proposed by Polar Rose which I can’t help but find far more creepy than useful: personal photo RSS feeds.  Basically, get instantly updated by RSS each time a new photo of a targeted person is found.  Stalkers rejoice!




Make way for remote surgery!

19 08 2008

By Joshua Koopferstock

As if getting the required precision for surgery wasn’t hard enough in person, surgeons can now perform surgery remotely from thousands of miles away.  While it is far from mainstream, robotic surgery, including remote robotic surgery, has made leaps and bounds in the past decade.  At SIGGRAPH, I came across a technology that might give robotic surgery another shove forward.

Butterfly Haptics, launched earlier this year as a spinoff company from Carnegie Mellon University, is hard at work trying to commercialize a magnetic levitation haptic device (pictured below).  The grey handle in the center of the device floats in a magnetic field.

What Sets it Apart

The device allows 6 degrees of freedom (translation in any direction, rotation in any direction) like many haptic devices.  However, the Butterfly Haptic device separates itself from the rest in two ways.

First, despite the fact that the device is floating in a magnetic field, it can still very effectively stop your motion completely.  In one demo, you could control objects in 3D space; when you ran the object into a wall, the feeling of hitting a solid object was extremely convincing.

Second, no static friction is present, since the device is floating in a magnetic field and is not mechanical.  The surface texture demo illustrating this property sold me completely on the benefits of maglev haptics.  In this demo, you were presented on screen with different surface textures that you could run over with the device, such as a solid wavy surface, a tiny ridged surface which felt something like running your fingernail over the paper edge of a closed book and, most impressively, the “ice” (frictionless surface).  When you pushed down on the ice surface, it felt completely solid, but as you moved along it in the other two dimensions, it felt absolutely frictionless, like perfect ice.

Butterfly Haptics Device

Back to Surgery

It does not require explaining that in surgery, having maximum freedom of movement and realistic force feedback is optimal, if not necessary.  And it is on these two fronts that Butterfly Haptics excels.  Not only would this technology be beneficial in remote robotic surgery, but also for surgical training simulations.  With immersive 3D displays like those used in currently available robotic surgery devices and realistic force feedback, surgeons-in-training can perform highly realistic surgeries on “humans” (anatomically correct 3D models) without ever making a true incision!

The Future for Butterfly Haptics

While the maglev haptic device is currently more academic than commercial, the fact that Butterfly Haptics has been spun out of academia into the business world suggests to me that these devices may find exciting real-world applications in the near future.  What exactly those applications may be is uncertain, but the company suggests on its site that beyond medicine, the devices may be used for CAD applications, data visualization, and character animation.  The medical applications appear most promising to me, but in any case, this is a company and technology well worth keeping an eye on in the next few years.

This is only one of the many interesting technologies and research papers that we came across during SIGGRAPH last week. Expect to find more blog posts about what we saw at SIGGRAPH in the upcoming days now that I am back in beautiful Montreal.




Feeling Software launches Presto3D in closed beta

8 08 2008

By Joshua Koopferstock & Christian Laforte

For the last few months, we have been quietly working on Presto3D, a 3D model marketplace which integrates our 3D web viewer to display user-generated 3D models in 3D within the browser.  Finally, it is out the door and in closed beta!  If you want to check it out, go to www.presto3d.com and enter this beta referral key: “presto0845” (without quotes).


Presto3D Tutorial from joshk on Vimeo.

What is Presto3D?

Presto3D is a marketplace where 3D models can be bought or sold.  All 3D assets are user-generated and user-priced.  When a model is uploaded, we convert the 3D model into COLLADA to create a 3D preview for our web viewer.  With a small, one-time plugin download, potential buyers can view, rotate and zoom the models in 3D within the browser.

To ensure optimal performance and to keep the models safe from petty theft, we automatically reduce the resolution of textures, then compress and encrypt all the data.

Why is Presto3D so exciting?

To the best of our knowledge, there exists nothing on the web that allows such openness for the display of user-generated 3D content.  Due to our automatic conversion, on Presto3D, users can upload files in any Maya (.ma, .mb) or 3ds Max (.max, .3ds) format and see them in 3D in the browser (.dae, .obj, & .fbx are also supported).  Even outside of 3D marketplace websites, other sites will require that you use file formats specially created for the specific web viewer, or create the files within their proprietary platform (such as within some web games and virtual worlds).

The goals of Presto3D are two-fold.  First, we aim to drastically improve the experience of buying and selling 3D content.  Second, we will create the most direct path to display 3D content online, regardless of the software used to create it.

Go give Presto3D a try, and tell us what you think!




Advanced real-time facial tracking ready to leave the labs

24 07 2008

By Christian Laforte and Joshua Koopferstock

The goal of facial tracking is to recognize elements of a face in an image, and to follow them in a series of images. It may sound simple, but in reality it’s so complex that the human brain evolved a special area just for this task. Some unfortunate folks are born without that area and can’t even recognize themselves in the mirror.

Is this me? Welcome to Prosopagnosia
(original image)

AAM (Active Appearance Model) is the best family of facial tracking algorithms out there right now. The technique was first published by Cootes, Edwards, and Taylor in 2001, then heavily optimized and extended by a group of CMU researchers, primarily Iain Matthews, Simon Baker and Ralph Gross.

Here’s why this algorithm rocks:

  • It’s fast: 300Hz on a regular desktop PC.
  • It’s robust. It can deal with occlusions, e.g. sunglasses.
  • It’s relatively straightforward to implement.
  • It requires no special calibration for the user.

First, statistical model…

To do its magic, AAM must be taught what faces look like in various conditions. To achieve this, hundreds of images of faces must be annotated by human operators. These faces display a wide range of conditions including different races, different expressions, illumination, etc. For each image, someone must manually mark special points, e.g. tip of the nose, to build a mesh:

This training data is converted into a statistical model of face shapes and appearances. This is a tedious process, but once it works for a few faces, the rest of the algorithm can be used to “bootstrap” other faces, so adding new examples becomes faster with time.
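To make the idea of a "statistical model of face shapes" a bit more concrete, here is a minimal sketch using principal component analysis (PCA) on the annotated points. It only covers the shape half of the model, it assumes the annotated faces have already been aligned to a common position and scale, and the function names are made up for this illustration rather than taken from any AAM library.

```python
import numpy as np


def build_shape_model(shapes, n_modes):
    """Build a linear shape model from annotated landmarks.

    shapes: array of shape (n_faces, n_points, 2), one (x, y) pair per landmark.
    Assumption: the shapes are already aligned (position, scale, rotation).
    """
    n_faces = shapes.shape[0]
    flat = shapes.reshape(n_faces, -1)      # one long vector per face
    mean_shape = flat.mean(axis=0)
    centered = flat - mean_shape
    # The rows of vt are the principal directions of shape variation.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    modes = vt[:n_modes]
    variances = singular_values[:n_modes] ** 2 / (n_faces - 1)
    return mean_shape, modes, variances


def project_shape(mean_shape, modes, shape):
    """Describe a shape with a handful of model parameters."""
    return modes @ (shape.reshape(-1) - mean_shape)


def synthesize_shape(mean_shape, modes, params):
    """Rebuild the landmark positions from model parameters."""
    return (mean_shape + params @ modes).reshape(-1, 2)
```

A face is then described by a few numbers (the output of `project_shape`) instead of dozens of point coordinates, and tracking amounts to finding the numbers that best explain the current video frame.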

… then track the face.

Once we have our statistical model, tracking can be performed in real-time by fitting the model on an image, such as a frame from a real-time video. Technically speaking, this is a non-linear optimization problem that consists of minimizing the error between the image and the model. Because the problem is non-linear, we need a good first estimate and a robust fitting algorithm, otherwise the tracking gets stuck in the wrong part of the image, so a face will be detected in some guy’s ear.

Having a good first estimate used to be the hardest part. Basically, we need to tell AAM roughly where to look. Five years ago this would have required a special face detection algorithm, making the system twice as complicated to implement. Now that AAM is super fast, it’s probably simpler to just run it randomly in the image until we catch a face.

AAM improved by CMU researchers

Prior to the CMU papers, AAM was promising but not robust or fast enough for practical applications. The CMU researchers invented a fitting technique called the inverse compositional image alignment algorithm. Basically, they inverted one key step of the original algorithm (comparing the image against the model) which allowed them to compute some expensive calculations much less frequently. The end result was a much faster and more robust algorithm, capable of running hundreds of times per second.
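Here is a toy sketch of that trick, reduced to the simplest case I could think of: aligning a 1D signal whose only unknown is a horizontal shift. This is not the CMU code, and a real AAM fits many shape and appearance parameters at once, but it shows the key point: the gradient and the (here scalar) Hessian are computed once from the template, outside the loop. The function name and the test signal are invented for this illustration.

```python
import numpy as np


def inverse_compositional_1d(template, image, n_iters=30):
    """Estimate the shift p such that image(x + p) matches template(x)."""
    x = np.arange(template.size, dtype=float)

    # Expensive quantities, computed once because they depend only on the template.
    grad_t = np.gradient(template)      # steepest-descent "image"
    hessian = np.sum(grad_t ** 2)       # 1x1 Gauss-Newton Hessian

    p = 0.0
    for _ in range(n_iters):
        warped = np.interp(x + p, x, image)     # sample the image at x + p
        error = warped - template
        delta = np.sum(grad_t * error) / hessian
        p -= delta                              # compose with the inverted update
        if abs(delta) < 1e-6:
            break
    return p


# Hypothetical test signal: a smooth bump, and the same bump shifted by 3.5 samples.
x = np.arange(100, dtype=float)


def bump(t):
    return np.exp(-((t - 50.0) ** 2) / (2 * 8.0 ** 2))


template = bump(x)
image = bump(x - 3.5)
estimated_shift = inverse_compositional_1d(template, image)
```

In a forward formulation, the gradient and Hessian would have to be recomputed from the warped image at every iteration; moving them to the template side is what makes each iteration so cheap.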

The CMU researchers then further improved AAM to deal with occlusions (i.e. partially hidden face) and to track a 3D face instead of a 2D approximation.

Regular AAM on the left,
improved AAM to deal with occlusions on the right
(video)

Surprisingly, when the CMU researchers extended AAM to 3D, using one or more cameras, the extension not only produced precise 3D results, it also ran faster and more robustly!

3D AAM in action (video)

From lab to the real world

Tons of cool applications could leverage this algorithm. So why hasn’t it happened yet? I think the primary reason is that most people just don’t know it’s possible.

The time has come for AAM to leave the lab and make the real world a more technologically advanced place. If you have a good application idea and funding to bring this to market, give us a call and we’ll be happy to help and pitch in!




Trying on clothes in 3D: We have a long way to go.

23 07 2008

By Joshua Koopferstock & Christian Laforte

For you technology lovers who are still kids at heart, Disneyland has recently opened up the Innoventions Dream Home, showcasing cool high tech integration in a futuristic home. What caught my attention was one invention called the Magic Mirror. Getting its name from the mirror in Snow White, this Magic Mirror does not tell you “who is the fairest of them all” (for that you’ll still need HotOrNot.com). What it does do is allow you to virtually try on clothes in your wardrobe. In fact, the Magic Mirror is not a mirror at all; it is a large display monitor with a video camera next to it.

Magic Mirror

Trying on a dress in the Magic Mirror. Photo: cepro.com

While the concept is neat and would probably be even more useful in the department store dressing room than the bedroom, by the looks of it in the video below, the concept is still far from the realism necessary for a technology like this to take off. A few years back, it was thought that by today, virtual clothes shopping would be mainstream, and companies like My Virtual Model had signed contracts with major apparel retailers to integrate their technology into online stores.

It turns out that the technology wasn’t ready, and by the looks of this Magic Mirror, it still has a long way to go.

Here’s how I think they do it:

The dress moves roughly according to the orientation of the head, so they are most likely using a simple real-time head tracker and applying the pose of the head to the top of the dress. The bottom seems to be animated randomly, or maybe through secondary animation.

Later this week I’ll post on a face tracking algorithm that could make this easily possible.

How could we do it better?

One imperfection is very noticeable: the dress doesn’t follow the shoulders and hips properly. Part of it may be anatomical (this is a guy after all), but I think this problem should be easily solved by tracking the silhouette (using background subtraction) and identifying the shoulders and hips using simple heuristics, e.g. areas of low curvature and roughly horizontal or vertical slopes. This would immediately improve the realism of this solution.
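As a rough sketch of what I have in mind, here is background subtraction followed by a deliberately crude shoulder heuristic. Note that the heuristic below is a simplification of the curvature and slope idea above: it just looks for the row where the silhouette suddenly gets much wider, scanning down from the top of the head. The function names, the threshold value and the assumption of a static camera with a known empty background frame are all mine.

```python
import numpy as np


def silhouette_mask(frame, background, threshold=30):
    """Mark pixels that differ noticeably from the empty background frame."""
    # Convert to a signed type first so uint8 subtraction cannot wrap around.
    diff = np.abs(frame.astype(np.int32) - background.astype(np.int32))
    if diff.ndim == 3:              # color image: keep the strongest channel
        diff = diff.max(axis=2)
    return diff > threshold


def find_shoulder_row(mask):
    """Return the row where the silhouette widens the most below the head top.

    Returns None if the silhouette covers fewer than two rows.
    """
    widths = mask.sum(axis=1)               # silhouette pixels per row
    occupied = np.flatnonzero(widths)
    if occupied.size < 2:
        return None
    top = occupied[0]
    # growth[i] is the width change between row top + i and row top + i + 1.
    growth = np.diff(widths[top:])
    return int(top + 1 + np.argmax(growth))
```

The same scan, run from the bottom up or restricted to the lower half of the silhouette, could give a first guess for the hips. A real installation would also need to cope with shadows, lighting changes and loose clothing, which this sketch ignores.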

Another improvement would be to track features on the user’s T-shirt, so we can have a better estimate of the body pose, its size and maybe even the person’s sex. I’d start my search with Automatic Non-Rigid 3D Modeling from Video (Torresani and Hertzmann, 2004), since I remember being impressed with the results way back then: it handles occlusions and variations in illumination very nicely. In the picture below, you can see one of the researchers moving his hands in front of his T-shirt… the algorithm can still capture a 3D representation of the deforming T-shirt. Doing this in real-time may be challenging, but fast GPUs and multi-core systems should make it possible.

Still a Long Way to Go

With the method that we have suggested, there is one major sticking point that we have not addressed: content creation.  Assuming you have the ability to accurately track a person and render the image in real time, you still need a way to create the clothes in 3D.  This is not a simple task, as clothes can be highly variable in elasticity, reflectiveness, etc., which would make automation of the modeling process complex.

Before we see these Magic Mirrors in department stores like Sears or Macy’s that have hundreds of thousands of different apparel items each year, a method for automatic creation of clothing content will have to be developed.  And while automatic 3D content creation is going to take great strides in the next couple of years, the quality of 3D reconstruction needed for clothing is still a long way off.




Google Earth in the browser… is it really useful?

17 07 2008

By Christian Laforte

It’s been over a month since Google launched the browser plug-in for Google Earth, combining the best of Google Maps (fast, easy, in the browser, extendable through JavaScript) with the capability to navigate in a 3D terrain.

So does the 3D capability really add a lot over the regular Google Maps? It’s still too early to tell, but a few examples show the potential applications in education, entertainment and planning.

The most visually compelling example I’ve seen so far comes from Bjorn Sandvik’s ThematicMapping blog:

In a glance, you can see in which countries infant mortality is a critical problem. A color legend alone doesn’t fire the imagination the same way.

Another example, the Google Monster Milktruck, is kind of fun:

This mini-game allows you to drive the milk truck around. Unfortunately, many limitations of the Google Earth engine become apparent, especially the lack of collision detection with walls.

A third example, from GolfNation’s blog, allows you to see a golf course in 3D:

This example demonstrates the main problem with Google Earth right now: for the 3D capability to be worthwhile, we need more 3D content. Trees, cars and buildings look like they are painted on the ground, because we don’t have a 3D representation. We’re basically just looking at 2D data (satellite imagery) from a different perspective.

Google and Microsoft are apparently working hard on this problem of reconstructing buildings and landmarks. Feeling Software is also investing in 3D reconstruction from images… but we’re taking a different strategy, that hopefully will put us one step ahead of these giants, in one promising niche. Incidentally, our Feeling 3D Engine also supports KML and KMZ, along with geo-referencing and geographic measurements.

There are also other 3D GIS (geographic information systems) that work on the web. One interesting example comes from Korea, according to this informative ZDNet article:

If you know of other compelling examples of 3D use in GIS, by all means, reply to this post!




New Research: Face Recognition from any Angle

3 07 2008

By Christian Laforte

Look directly at the camera, or the computer can’t see you. Or at least, it can’t recognize that it is you. The major drawback of all facial recognition systems in commercial use today is that they require people to face the camera directly, like a passport photo. A newly published paper takes aim at this problem, helping computers better recognize a face in more natural conditions.

Through Tied Factor Analysis, computers can recognize a face seen from the side by comparing the side image with the passport picture, achieving an accuracy of 92%, a major technological leap when compared with 60% in the previous state-of-the-art technique. The algorithm learns how to extrapolate other views by analyzing thousands of pictures of varied people, taken from many angles. The approach is relatively simple to implement and reportedly much faster than other state-of-the-art techniques.

Side photos (bottom) automatically generated from “Passport” photos (top)

This new model assumes very little about the structure of a face, geometry or lighting, so it could easily be adapted to applications such as recognizing vehicles or animals in a semi-controlled environment.

Technical details

The tied factor analysis (TFA) technique uses machine learning techniques such as Expectation Maximization to automatically learn the relationship between frontal and nonfrontal faces, e.g. pictures taken from the side or at an angle.

Along the way, the system automatically learns an identity space to represent a face in a few hundred parameters. This identity space doesn’t vary significantly with pose, angle or lighting, so in theory, all images of an individual would map to the identity position in that space.

To perform this feat, the researchers started from a large number of pictures, like the 320 individuals from the FERET database taken with multiple poses and angles. These faces were manually altered and annotated to increase accuracy. After running TFA on these pictures, the system can extrapolate from a known frontal face (e.g. passport photo) to an unknown non-frontal face (e.g. side photo), as shown in the image above.
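For the curious, here is a small sketch of the kind of model involved. As I understand the paper, an image x of a person seen in pose k is modeled as x = W_k h + mu_k + noise, where the identity vector h is shared ("tied") across all poses and only W_k and mu_k change with the pose. The code below assumes those parameters have already been learned, fills them with random numbers for the sake of the demo, estimates h from a frontal image and then predicts the side view. It uses a simple point estimate of h, which may differ from how the paper performs the actual matching, and all names are made up for this example.

```python
import numpy as np


def infer_identity(x, W, mu, noise_var):
    """Posterior mean of the identity vector h for one observed image.

    Model assumed here: x = W h + mu + noise, with h ~ N(0, I) and
    independent Gaussian noise of variance noise_var[i] on pixel i.
    """
    wt_prec = W.T / noise_var                       # W^T Sigma^-1
    a = wt_prec @ W + np.eye(W.shape[1])
    return np.linalg.solve(a, wt_prec @ (x - mu))


def predict_in_pose(h, W, mu):
    """Expected image of identity h in the pose described by (W, mu)."""
    return W @ h + mu


# Toy parameters (random stand-ins; a real system would learn them with EM).
rng = np.random.default_rng(0)
n_pixels, n_identity = 40, 5
W_front, W_side = rng.normal(size=(2, n_pixels, n_identity))
mu_front, mu_side = rng.normal(size=(2, n_pixels))
noise_var = np.full(n_pixels, 1e-4)

# One person: a hidden identity vector and a noisy frontal "photo" of it.
h_true = rng.normal(size=n_identity)
x_front = W_front @ h_true + mu_front + rng.normal(scale=1e-2, size=n_pixels)

h_est = infer_identity(x_front, W_front, mu_front, noise_var)
x_side_pred = predict_in_pose(h_est, W_side, mu_side)
```

Because h does not depend on the pose, two photos of the same person taken from different angles should land close to each other in identity space, which is what makes the comparison possible.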

The authors then significantly increased the accuracy by combining 21 TFAs applied around manually-specified positions, identifying standard facial features like the left eye corner in the following figure:

Prince and his colleagues also integrated a relatively simple face part detector inspired by Viola and Jones. This made the system more automatic but reduced the precision in the worst case by 6%. Still, they are confident that this gap could be filled with a more sophisticated detector.

Conclusion

I find these results exciting and promising, but we are still a long way from human-like recognition. Here are the most significant limitations with this approach, along with potential solutions:

Discretized poses: this paper shows how to support many poses, but it doesn’t address how to support arbitrary poses efficiently and accurately, like seeing a face slightly from above or from the bottom. This severely limits the use in non-intrusive monitoring. Building a full 3D model may be better suited to this.

Resistance to occlusions: The paper assumes that the user is cooperative and won’t get partially occluded, e.g. put his hand in front of his face. There are possible solutions that we’ll explore in future posts.

Automatic head pose detection: To make the system completely automatic and unobtrusive, the faces would need to be detected and registered (identified and located) automatically.

Tied factor analysis for face recognition across large pose differences
Simon J.D. Prince, James H. Elder, Jonathan Warrell, Fatima M. Felisberti

IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 30, No. 6, June 2008
(http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4459336)
You need an IEEE Xplore membership to access the paper.

An older, less complete version of the paper is publicly available: http://www.macs.hw.ac.uk/bmvc2006/papers/292.pdf




Technologies behind Spore

30 06 2008

By Christian Laforte

Chances are you have already heard of and are waiting to play Spore, arguably the most ambitious video game ever. Spore allows the player to easily model and control the evolution of an alien species, including an original 3D, fully-animated creature. The possibilities are infinite: your creature can have multiple heads, zero or dozens of legs, multiple sets of arms or claws, a mouth in the middle of its belly, etc. Here’s an example that you can reproduce using the free Creature Creator tool.

For the game to be enjoyable, each unique creature must be animated to show feelings and intent. So how do you make a two-headed worm look excited? And how do you animate a thirteen-legged, three-armed, two-mouthed monster running happily? This would already be challenging for an experienced 3D animator. But because the user can change his creature anytime, the animations must actually be adapted on the fly.

So how did they do it? The EA engineers have opened up a lot of their secrets, most of which are collected on Chris Hecker’s page.

Their most detailed research paper will soon be published at Siggraph 2008:

Real-Time Motion Retargeting to Highly Varied User-Created Morphologies
Chris Hecker, Bernd Raabe, Ryan W. Enslow, John DeWeese (Maxis/Electronic Arts), Jordan Maynard (Trion World Network), Kees van Prooijen (Total Immersion Software)

If you’re a game or 3D graphics developer, by all means, read this paper. As far as Siggraph papers are concerned, it’s refreshingly practical and accessible. About half of the paper concerns special cases that are only applicable if you try to animate alien characters, e.g. how to handle look-at constraints for two-headed characters.

Still, there are a lot of techniques applicable to any game or animation system, including a much improved IK (inverse kinematics) solver which could help produce more believable animations in games. In case you don’t know, an IK solver is used to pose the character according to some constraints or goals, e.g. bring one arm to the food, bring the food to the mouth, shake hands with another character. The IK system in Spore is designed for real-time and elegantly handles conflicting goals. The same system could be applied to complex dynamic animations on human characters, e.g. animate the character aiming a gun, running and reaching for an object at the same time, in the most natural way.
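If you have never seen an IK solver, here is about the smallest one possible: a textbook analytic solution for a two-bone arm in 2D (say, upper arm and forearm reaching for the food). To be clear, this is not Spore’s solver, which deals with arbitrary creatures and conflicting goals; it is only meant to show what "posing a limb to reach a goal" means in code. The function names are made up, and the shoulder is assumed to sit at the origin.

```python
import math


def two_link_ik(target_x, target_y, upper_len, lower_len):
    """Return (shoulder_angle, elbow_angle) in radians so the hand reaches the target.

    If the target is out of reach, the arm is stretched straight toward it.
    """
    dist_sq = target_x ** 2 + target_y ** 2
    # Law of cosines gives the elbow bend; clamping handles unreachable targets.
    cos_elbow = (dist_sq - upper_len ** 2 - lower_len ** 2) / (2 * upper_len * lower_len)
    cos_elbow = max(-1.0, min(1.0, cos_elbow))
    elbow = math.acos(cos_elbow)
    shoulder = math.atan2(target_y, target_x) - math.atan2(
        lower_len * math.sin(elbow), upper_len + lower_len * math.cos(elbow)
    )
    return shoulder, elbow


def forward_kinematics(shoulder, elbow, upper_len, lower_len):
    """Hand position for the given joint angles (used to check the solution)."""
    elbow_x = upper_len * math.cos(shoulder)
    elbow_y = upper_len * math.sin(shoulder)
    hand_x = elbow_x + lower_len * math.cos(shoulder + elbow)
    hand_y = elbow_y + lower_len * math.sin(shoulder + elbow)
    return hand_x, hand_y
```

Feeding the angles from `two_link_ik` back into `forward_kinematics` should return the original target whenever it is within reach. Things get interesting, and much harder, once a creature has several limbs with goals that pull in different directions.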

Their animation retargeting technique should also help artists be more productive, by creating animations once and having the computer automatically adapt it to characters with longer legs or arms.

So how well does it work, and will it be fun? I tried the Creature Creator tool. I found it exciting to create my first two creatures. The authoring tool is extremely simple, and there’s a lot of smart usability design in there, like automatic mirroring of body parts, a simple but useful idea that isn’t as easily supported in Maya and 3ds Max.

The animations are fun to watch and listen to, especially the first ten minutes. The gait animation (walking and running cycles) is really cool to watch in action. Even the weirdest character ends up conveying fear, happiness and excitement effectively.

After ten minutes I started noticing small problems here and there. The keyboard movement is annoyingly jerky. The animation and simulation system doesn’t check for interpenetration… so one leg of my creature frequently passed through the torso, breaking the illusion. Sometimes the heads even disconnected from the rest of the body. Finally, the system doesn’t model the mass of the body parts, so you can end up with creatures that look like they should topple, but mysteriously hang in balance.

I suspect that some of these problems will be addressed in the game, especially the keyboard control jerkiness.

So how much effort would it take to reproduce this complex animation system? It all depends on what you want to achieve and where you start from. Hecker and his Spore colleagues made their own animation authoring tool, which makes sense since they wanted to integrate the functionality right inside their game. If all you’re interested in is some pre-processed animation retargeting and improved IK, you could save time by building the functionality as a plug-in to Maya or your favorite animation package… although you’ll still need to implement the runtime IK evaluation if you want to adapt the animation in the game.

If you’re interested in supporting unusual creatures, with multiple heads, arms in unusual locations, etc. then you could implement a variant of Spore’s selection menu and branching predicates. I suspect this could save a lot of time in pre-visualization for CG movies, when animating creatures like these:

Candlestick character from Disney’s Beauty and the Beast

Cerberus mythological character, unknown artist

If we were asked by a client to implement some of this functionality, we would likely start from the Feeling Engine or a Maya plug-in. The Maya plug-in would be a natural approach if the final desired result is an offline animation, to be touched up by an experienced animator. Otherwise, the Feeling Engine already supports complex character animations, physics and 3D manipulators, and easy back-and-forth with Maya or 3ds Max, so this solution would be most appropriate for a stand-alone tool.
