A Video Representation Using Temporal Superpixels
J. Chang, D. Wei, J. W. Fisher III
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
[pdf] [poster] [slides] [code]

Summary: Following Ren and Malik [8], superpixels became ubiquitous as a preprocessing step in many vision systems (e.g., Felzenszwalb and Huttenlocher [7], Grundmann et al. [6]). Superpixels form an intermediate-level representation that preserves significant image structure while being orders of magnitude more compact than a pixel-based representation. Separately, work on motion analysis, ranging from Lucas and Kanade [10] and optical flow [9] to structure from motion, considered methods for establishing correspondences between image pixels or locations in consecutive frames. Inspired by both lines of work, this work developed a representation for videos that parallels the superpixel representation in images. We call these new elementary components temporal superpixels (TSPs for short).
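To give a rough sense of why a superpixel-style representation is so compact, the sketch below summarizes an image by one mean color per superpixel. This is not the paper's TSP algorithm: the block-grid "superpixels" stand in for a real oversegmentation, and all function names are invented for illustration.

```python
import numpy as np

def grid_superpixels(h, w, block=8):
    """Toy 'superpixels': label each pixel by its block in a regular grid.

    A real oversegmentation (e.g., SLIC or TSP) would produce irregular,
    boundary-respecting regions; a grid suffices to show the compactness.
    """
    rows = np.arange(h) // block
    cols = np.arange(w) // block
    n_cols = (w + block - 1) // block
    return rows[:, None] * n_cols + cols[None, :]

def superpixel_means(image, labels):
    """Mean color of each superpixel: the compact summary of the image."""
    n = labels.max() + 1
    flat = labels.ravel()
    counts = np.bincount(flat, minlength=n)
    means = np.zeros((n, image.shape[2]))
    for c in range(image.shape[2]):
        means[:, c] = np.bincount(
            flat, weights=image[..., c].ravel(), minlength=n
        ) / counts
    return means

h, w = 64, 64
image = np.random.rand(h, w, 3)          # stand-in for a video frame
labels = grid_superpixels(h, w)          # 8x8 = 64 superpixels
means = superpixel_means(image, labels)  # 64 x 3 summary
# 64*64*3 = 12288 per-pixel values vs. 64*3 = 192 per-superpixel values
print(means.shape)
```

In a TSP-style video representation, the same region identities would additionally persist across frames, so each superpixel also carries a temporal correspondence, not just a color summary.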

Related SLI papers: Though different in nature, prior work on topologically constrained shape sampling [5] led to thinking about TSPs. The formulation and implementation of TSPs involved birth/death sampling, which influenced later work on parallel sampling methods for Dirichlet processes [4] and hierarchical Dirichlet processes [2]. GP-based flow models were used for layered tracking [3] and influenced the use of Gaussian processes for Bayesian intrinsic image estimation [1].


Temporal Superpixel Overview presented at CVPR 2013

Figures

Example of TSPs. Note that the same TSPs track the same points across frames on the parachute and rock. We show a subset of TSPs, though each frame is entirely segmented.

Example superpixels at two granularities of images from the Berkeley Segmentation Dataset.

References

[1] J. Chang, R. Cabezas, J. W. Fisher III; "Bayesian nonparametric intrinsic image decomposition," in Proceedings of the European Conference on Computer Vision (ECCV), 2014.
[2] J. Chang, J. W. Fisher III; "Parallel sampling of HDPs using sub-cluster splits," in Proceedings of the Neural Information Processing Systems (NIPS), 2014.
[3] J. Chang, J. W. Fisher III; "Topology-constrained layered tracking with latent flow," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2013.
[4] J. Chang, J. W. Fisher III; "Parallel sampling of DP mixture models using sub-cluster splits," in Proceedings of the Neural Information Processing Systems (NIPS), 2013.
[5] J. Chang, J. W. Fisher III; "Efficient topology-controlled sampling of implicit shapes," in Proceedings of the IEEE International Conference on Image Processing (ICIP), 2012.
[6] M. Grundmann, V. Kwatra, M. Han, I. Essa; "Efficient hierarchical graph-based video segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
[7] P. F. Felzenszwalb, D. P. Huttenlocher; "Efficient graph-based image segmentation," in International Journal of Computer Vision (IJCV), 2004.
[8] X. Ren, J. Malik; "Learning a classification model for segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2003.
[9] B. K. P. Horn, B. G. Schunck; "Determining optical flow," in Artificial Intelligence, 1981.
[10] B. D. Lucas, T. Kanade; "An iterative image registration technique with an application to stereo vision," in Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 1981.