Semantic texton forests for image categorization and segmentation
24 07 2008, by Christian Laforte
Warning: this post is pretty technical.
I forgot an important newcomer in my earlier post on segmentation algorithms published at CVPR 2008:
- Semantic Texton Forests for Image Categorization and Segmentation (PDF, extra results, video)
- Jamie Shotton, Matthew Johnson, Roberto Cipolla
Semantic texton forests (STFs) are not a typical segmentation algorithm. Unlike traditional segmentation algorithms, which rely primarily on edge information or other low-level image processing, STFs use an ensemble of decision trees built from training examples, e.g. manually segmented images. This allows STFs to determine not only that pixels belong together, but also that they represent, say, a sheep, or some grass. STFs and the related algorithms described in this paper therefore solve two problems: segmentation and categorization.
Other categorization techniques typically rely on feature descriptors or manually tuned filter banks. In contrast, STFs operate directly on pixels, resulting in an algorithm that is very fast and relatively simple to implement, especially compared with state-of-the-art classification systems like Marszałek’s.
Once an STF is built, it can be used to identify that a given green pixel in an image is likely to be grass, by examining neighboring pixels in, say, the 21×21-pixel window surrounding it. Likewise, a different pixel could be identified as part of a sheep.
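To make the per-pixel classification more concrete, here is a minimal Python sketch of the idea. It is not the authors' implementation: the tree layout, the probe offsets, the thresholds and the class probabilities are all made up for illustration, and the split test (a difference between two raw pixel values near the pixel being classified) is just one plausible kind of test that operates directly on pixels. A real forest would be learned from labeled training images rather than written by hand.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

Pixel = Tuple[int, int, int]           # (R, G, B)
Image = List[List[Pixel]]              # rows of pixels
Probe = Tuple[int, int, int]           # (dy, dx, channel), relative to the pixel


@dataclass
class Leaf:
    class_probs: Dict[str, float]      # hypothetical class distribution at this leaf


@dataclass
class Split:
    probe_a: Probe
    probe_b: Probe
    threshold: float
    left: "Node"                       # taken when value(a) - value(b) < threshold
    right: "Node"                      # taken otherwise


Node = Union[Leaf, Split]


def read_probe(image: Image, y: int, x: int, probe: Probe) -> int:
    """Read one channel at an offset from (y, x), clamping to the image border."""
    dy, dx, channel = probe
    yy = min(max(y + dy, 0), len(image) - 1)
    xx = min(max(x + dx, 0), len(image[0]) - 1)
    return image[yy][xx][channel]


def classify_with_tree(node: Node, image: Image, y: int, x: int) -> Dict[str, float]:
    """Walk one tree from the root to a leaf and return the leaf's distribution."""
    while isinstance(node, Split):
        diff = read_probe(image, y, x, node.probe_a) - read_probe(image, y, x, node.probe_b)
        node = node.left if diff < node.threshold else node.right
    return node.class_probs


def classify_with_forest(trees: List[Node], image: Image, y: int, x: int) -> Dict[str, float]:
    """Average the leaf distributions of all trees in the forest."""
    totals: Dict[str, float] = {}
    for tree in trees:
        for label, p in classify_with_tree(tree, image, y, x).items():
            totals[label] = totals.get(label, 0.0) + p
    return {label: s / len(trees) for label, s in totals.items()}


# Two hand-written toy trees (assumed values, not learned ones).
# Tree 1 compares green against red at the pixel itself.
# Tree 2 compares green against blue 10 pixels below, i.e. inside a 21x21 window.
toy_forest: List[Node] = [
    Split((0, 0, 1), (0, 0, 0), 30.0,
          Leaf({"sheep": 0.8, "grass": 0.2}),
          Leaf({"sheep": 0.1, "grass": 0.9})),
    Split((10, 0, 1), (10, 0, 2), 20.0,
          Leaf({"sheep": 0.6, "grass": 0.4}),
          Leaf({"sheep": 0.3, "grass": 0.7})),
]

green_image: Image = [[(40, 160, 40)] * 21 for _ in range(21)]
white_image: Image = [[(230, 230, 230)] * 21 for _ in range(21)]

grass_guess = classify_with_forest(toy_forest, green_image, 10, 10)
sheep_guess = classify_with_forest(toy_forest, white_image, 10, 10)
print(max(grass_guess, key=grass_guess.get), max(sheep_guess, key=sheep_guess.get))
```

Each tree only performs a handful of pixel reads and comparisons per pixel, which gives some intuition for why this kind of classifier can be fast compared with approaches that first compute feature descriptors or filter-bank responses.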
By computing histograms of semantic textons over a region or a full image, we end up with a higher-level representation, the bag of semantic textons (BoST). We can then perform semantic segmentation, e.g. capture the notion that sheep often stand on grass. This greatly increases the recognition and segmentation accuracy. The authors explore several optional implementation details and optimizations, and report whether each of them improves the segmentation quality.
Shotton and his colleagues report that their implementation, running at 8 fps, achieves a segmentation accuracy of 66.9% on the medium-difficulty MSRC segmentation data set:
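The histogram step can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the paper's exact formulation: it assumes every pixel has already been assigned the id of the tree leaf it reached (the hypothetical leaf_map below), it counts only leaves from a single tree, and the leaf ids 3 and 7 are arbitrary.

```python
from collections import Counter
from typing import Dict, List, Tuple

Region = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def bag_of_textons(leaf_map: List[List[int]], region: Region) -> Dict[int, float]:
    """Normalized histogram of leaf ids inside a rectangular region."""
    top, left, bottom, right = region
    counts = Counter(
        leaf_map[y][x] for y in range(top, bottom) for x in range(left, right)
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {leaf: n / total for leaf, n in counts.items()}


# Hypothetical 4x4 leaf map: the top half reached leaf 3 (say, a sheep-like
# texton) and the bottom half reached leaf 7 (say, a grass-like texton).
leaf_map = [
    [3, 3, 3, 3],
    [3, 3, 3, 3],
    [7, 7, 7, 7],
    [7, 7, 7, 7],
]

whole_image = bag_of_textons(leaf_map, (0, 0, 4, 4))
bottom_half = bag_of_textons(leaf_map, (2, 0, 4, 4))
print(whole_image, bottom_half)
```

A histogram over the whole image summarizes which textons co-occur in it, while a histogram over a smaller region describes the local context around a pixel. That kind of context is what lets a later classifier pick up on regularities such as sheep-like textons appearing above grass-like ones.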

Original images (top) from the MSRC data set
and the final categorized segmentation (bottom)
The accuracy drops significantly when the algorithm is confronted with the much harder VOC 2007 data set: 24% by itself, or 42% when combined with TKK, a state-of-the-art detector.

Original image (left) from the VOC2007 data set
and the final categorized segmentation (right)
You’ll notice that the segmentation is rough and not pixel-perfect. The authors mention that a much cleaner segmentation could be performed using a Markov or conditional random field to precisely follow image edges.