VC-Dimension Analysis of Object Recognition Tasks


    loading  Checking for direct PDF access through Ovid

Abstract

We analyze the amount of data needed to carry out various model-based recognition tasks in the context of a probabilistic data collection model. We focus on objects that may be described as semi-algebraic subsets of a Euclidean space. This is a very rich class that includes polynomially described bodies, as well as polygonal objects, as special cases. The class of object transformations considered is wide, and includes perspective and affine transformations of 2D objects, and perspective projections of 3D objects.We derive upper bounds on the number of data features (associated with non-zero spatial error) which provably suffice for drawing reliable conclusions. Our bounds are based on a quantitative analysis of the complexity of the hypotheses class that one has to choose from. Our central tool is the VC-dimension, which is a well-studied parameter measuring the combinatorial complexity of families of sets. It turns out that these bounds grow linearly with the task complexity, measured via the VC-dimension of the class of objects one deals with. We show that this VC-dimension is at most logarithmic in the algebraic complexity of the objects and in the cardinality of the model library.Our approach borrows from computational learning theory. Both learning and recognition use evidence to infer hypotheses but as far as we know, their similarity was not exploited previously. We draw close relations between recognition tasks and a certain learnability framework and then apply basic techniques of learnability theory to derive our sample size upper bounds. We believe that other relations between learning procedures and visual tasks exist and hope that this work will trigger further fruitful study along these lines.

    loading  Loading Related Articles