Yuanlu Xu's Homepage

Datasets: CAMPUS-Human&EPFL

General Introduction

We address an emerging task in visual surveillance: re-identifying people at a distance by matching body information, given several reference examples.

Our approach builds a compositional part-based template to represent the target individual and matches the template with input images by employing a stochastic cluster sampling algorithm, as illustrated in Fig.1.

Fig.1 An illustration of the proposed approach. A query individual is represented as a compositional part-based template, and part proposals are extracted from multiple instances for each part. Human re-identification is thus posed as compositional template matching.

In the experiments, we demonstrate the superior performance of our approach over existing methods on three public databases. One of the databases is newly proposed by us and includes various real challenges in human re-identification.

Representation

Compositional Template

We organize the template of a query individual with an expressive tree representation that can be produced in a very simple way. Human body part detectors [2,3] are run on several reference images of the individual, and the images of detected parts are grouped according to their semantics. We refer to this representation as the multiple-instance-based compositional template (MICT).
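The grouping step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the `PartInstance` class, the `build_template` function, and the bounding-box format `(x1, y1, x2, y2)` are hypothetical names and conventions chosen for the example, and the detections are made-up values standing in for the output of a part detector.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PartInstance:
    """One detected body part (hypothetical container for this sketch)."""
    part_type: str  # semantic label, e.g. "head" or "torso"
    image_id: int   # index of the reference image the part came from
    box: tuple      # assumed (x1, y1, x2, y2) bounding box


def build_template(detections):
    """Group detected part instances by their semantic part type."""
    template = defaultdict(list)
    for inst in detections:
        template[inst.part_type].append(inst)
    return dict(template)


# Made-up detections from two reference images of the same individual.
detections = [
    PartInstance("head", 0, (10, 0, 30, 20)),
    PartInstance("torso", 0, (5, 20, 35, 70)),
    PartInstance("head", 1, (12, 2, 32, 22)),
]
template = build_template(detections)
```

Each part type then holds multiple instances, one per reference image in which the part was detected, which is the sense in which the template is multiple-instance-based.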

Given target images (scene shots) to be matched, we obtain the target proposal set by a process similar to that used to construct the MICT. The problem of human re-identification can then be posed as the task of part-based template matching. A re-identification example is shown in Fig.2.

Fig.2 A re-identification example used to illustrate our inference algorithm. Given (a) reference images and (b) a scene shot, proposals of four parts (head, torso, left thigh and left calf) are drawn and numbered in the image.

Candidacy Graph

The matching algorithm is built upon the candidacy graph, where each vertex denotes a pair of matching part proposals, and each edge represents the contextual interaction (i.e., a compatible or competitive relation) between two matching pairs.

Compatible relations encourage vertices to activate together. We represent compatible relations by how two target part proposals are coupled together and mainly explore two cases: (i) kinematics relations, which couple kinematically dependent parts, and (ii) symmetry relations, which couple symmetrical parts. An illustration of compatible relations is shown in Fig.3, where navy blue and brown edges denote kinematics and symmetry relations, respectively.

Fig.3 An illustration of compatible relations. (a) Kinematics (navy blue edges) and symmetry (brown edges) relations within the compositional template. (b) An example to show how target part proposals are coupled together by kinematics and symmetry relations.

Competitive relations suppress conflicting vertices from being activated at the same time. We also develop two cases for competitive relations: (i) two target proposals with the same part type cannot be activated simultaneously; (ii) the overlapped region between two target part proposals should only be compared once.
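A minimal sketch of how the two competitive cases could be turned into graph edges is given below. It rests on several assumptions that are not stated in the text: each vertex is reduced to a dictionary with a `part` label and a target `box` in `(x1, y1, x2, y2)` form, overlap is measured as intersection area divided by the smaller box area, and the `max_overlap` threshold of 0.5 is an arbitrary illustrative value. The function names are hypothetical.

```python
from itertools import combinations


def overlap_ratio(a, b):
    """Intersection area divided by the smaller box area (assumed measure)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    smaller = min((a[2] - a[0]) * (a[3] - a[1]),
                  (b[2] - b[0]) * (b[3] - b[1]))
    return (iw * ih) / smaller if smaller > 0 else 0.0


def competitive_edges(vertices, max_overlap=0.5):
    """Return index pairs of vertices that should not be active together."""
    edges = set()
    for (i, u), (j, v) in combinations(enumerate(vertices), 2):
        same_part = u["part"] == v["part"]                             # case (i)
        overlapping = overlap_ratio(u["box"], v["box"]) > max_overlap  # case (ii)
        if same_part or overlapping:
            edges.add((i, j))
    return edges


# Made-up candidate matches in a scene shot.
vertices = [
    {"part": "head", "box": (10, 0, 30, 20)},
    {"part": "head", "box": (40, 0, 60, 20)},
    {"part": "torso", "box": (5, 20, 35, 70)},
    {"part": "left thigh", "box": (6, 22, 34, 70)},
]
edges = competitive_edges(vertices)
```

In this toy input the two head candidates compete under case (i), and the torso and left-thigh candidates compete under case (ii) because one box lies almost entirely inside the other.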

An illustration of the candidacy graph representation is shown in Fig.4, corresponding to the example given in Fig.2.

Fig.4 An illustration of the candidacy graph representation. In the graph, vertices denote candidate matches, and blue and red edges indicate compatible and competitive relations between vertices, respectively.

Inference Algorithm

We employ the Composite Cluster Sampling algorithm [4] to search for the optimal match between the template and the correct target. The algorithm iterates over two steps to search for the optimal matching solution. (i) It forms several possible partial matches (clusters) by turning off edge links probabilistically and deterministically. (ii) It activates clusters to confirm partial matches, leading to a new matching solution that is accepted according to the Markov Chain Monte Carlo (MCMC) mechanism [5].
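Step (i) can be illustrated with a simplified, Swendsen-Wang-style sketch: each edge is kept with some probability, and the connected components of the surviving edges become the clusters. This is not the composite cluster sampling algorithm of [4] itself; it omits the distinction between compatible and competitive edges and the acceptance step, and the `sample_clusters` function and the `edge_probs` mapping are hypothetical names introduced for illustration. An edge given probability 0.0 is always turned off, which stands in for the deterministic case.

```python
import random


def sample_clusters(num_vertices, edge_probs, rng):
    """Keep each edge with its probability, then return connected components."""
    parent = list(range(num_vertices))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (u, v), keep_prob in edge_probs.items():
        if rng.random() < keep_prob:  # the edge stays on
            parent[find(u)] = find(v)

    clusters = {}
    for v in range(num_vertices):
        clusters.setdefault(find(v), []).append(v)
    return sorted(clusters.values())


# Made-up edge probabilities over four candidate matches.
edge_probs = {(0, 1): 0.9, (1, 2): 0.3, (2, 3): 0.8}
clusters = sample_clusters(4, edge_probs, random.Random(0))
```

Every vertex ends up in exactly one cluster, so a cluster of size one simply means that all edges touching that vertex were turned off in this draw.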

We show an example of one transition in composite cluster sampling in Fig.5.

Fig.5 An illustration of one transition in composite cluster sampling. The first and second rows show the labels of part proposals, the labels of the composite cluster, and the matching configurations for two successive states (A and B) of one reversible transition, respectively.

Experiments

We validate our method on three public databases: (i) the VIPeR dataset, (ii) the EPFL dataset, and (iii) the CAMPUS-Human dataset, which covers most of the challenges in human re-identification.

We compare our approach with state-of-the-art methods: Pictorial Structures (PS) [2], View-based Pictorial Structures (VPS) [3], Custom Pictorial Structures (CPS) [6], Symmetry-driven Accumulation of Local Features (SDALF) [7] and Ensemble of Localized Features (ELF) [8].

We first evaluate our method by re-identifying individuals in segmented images, and adopt the cumulative match characteristic (CMC) curve for quantitative analysis, as shown in the following figure. Our approach demonstrates superior performance over the competing approaches in both the single-shot and the multi-shot case.
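For readers unfamiliar with the CMC curve, the following sketch shows one common way to compute it: the value at rank k is the fraction of queries whose true match appears among the k closest gallery entries. The `cmc_curve` function, the distance-matrix layout (one row per query, one column per gallery entry) and the toy numbers are assumptions made for this example and are not taken from the experiments.

```python
def cmc_curve(distances, true_index):
    """Cumulative match characteristic from a query-by-gallery distance matrix."""
    num_queries = len(distances)
    num_gallery = len(distances[0])
    hits = [0] * num_gallery
    for row, target in zip(distances, true_index):
        # Zero-based rank: number of gallery entries strictly closer
        # than the true match.
        rank = sum(1 for d in row if d < row[target])
        hits[rank] += 1
    curve, total = [], 0
    for h in hits:
        total += h
        curve.append(total / num_queries)
    return curve


# Toy distances: three queries against a gallery of three individuals.
distances = [
    [0.1, 0.5, 0.9],
    [0.7, 0.6, 0.2],
    [0.3, 0.8, 0.4],
]
true_index = [0, 1, 2]  # column of the correct gallery entry for each query
curve = cmc_curve(distances, true_index)
```

Here only the first query ranks its true match first, while the other two rank it second, so the curve rises from one third at rank 1 to one at rank 2.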

We then evaluate our method by re-identifying individuals from scene shots without provided segmentations. We adopt the PASCAL Challenge criterion to evaluate the localization results: a match is counted as correct only if its intersection-over-union (IoU) ratio with the ground-truth bounding box is greater than 50%. We compare our method with PS [2] and VPS [3], which can localize the body at the same time as localizing the parts. The quantitative results are reported in Table 1.
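The localization criterion above amounts to a short check. The sketch below assumes axis-aligned boxes in `(x1, y1, x2, y2)` form; the function names `iou` and `is_correct_match` and the example boxes are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def is_correct_match(predicted, ground_truth, threshold=0.5):
    """A match counts as correct only if IoU strictly exceeds the threshold."""
    return iou(predicted, ground_truth) > threshold


ground_truth = (0, 0, 10, 10)
good = is_correct_match((0, 0, 10, 8), ground_truth)   # IoU = 80 / 100
bad = is_correct_match((5, 5, 15, 15), ground_truth)   # IoU = 25 / 175
```

The first prediction covers most of the ground-truth box and passes, while the second overlaps only one corner and is rejected.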

A number of representative results generated by our method are exhibited in the following figure.

We further analyze the benefit of each component of our approach in the following figure. It is apparent that the combined feature and constraints help refine the results.

References

[1] Human Re-identification by Matching Compositional Template with Cluster Sampling. Yuanlu Xu, Liang Lin, Wei-Shi Zheng, Xiaobai Liu. International Conference on Computer Vision (ICCV), 2013.
[2] Pictorial Structures Revisited: People Detection and Articulated Pose Estimation. M. Andriluka, S. Roth and B. Schiele. In Proc. of IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
[3] Monocular 3D Pose Estimation and Tracking by Detection. M. Andriluka, S. Roth and B. Schiele. In Proc. of IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
[4] C4: Exploring Multiple Solutions in Graphical Models by Cluster Sampling. J. Porway and S.C. Zhu. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 33(9):1713–1727, 2011.
[5] Generalizing Swendsen-Wang to Sampling Arbitrary Posterior Probabilities. A. Barbu and S.C. Zhu. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 27(8):1239–1253, 2005.
[6] Custom Pictorial Structures for Re-Identification. D.S. Cheng, M. Cristani, M. Stoppa, L. Bazzani and V. Murino. In Proc. of British Machine Vision Conference (BMVC), 2011.
[7] Person Re-Identification by Symmetry-Driven Accumulation of Local Features. M. Farenzena, L. Bazzani, A. Perina, V. Murino and M. Cristani. In Proc. of IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
[8] Viewpoint Invariant Pedestrian Recognition with an Ensemble of Localized Features. D. Gray and H. Tao. In Proc. of IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2008.