Abstract
This paper studies five different methods for combining multi-view
data from an uncalibrated smart camera network for human activity
recognition. The multi-view classification scenarios studied fall into
two categories: view selection and view fusion. Selection uses a single
view to classify, whereas fusion merges multi-view data at either the
feature level or the label level. The five methods are compared on the
task of classifying human activities in three fully annotated datasets,
MAS, VIHASI and HOMELAB, and in a combined dataset, MAS+VIHASI.
Classification is based on image features computed from silhouette
images, using a binary-tree-structured classifier with a 1D CRF for
temporal modeling. The results show that fusion methods outperform
practical selection methods. Selection methods have their advantages,
but they depend strongly on how good the selection criterion is and on
how well that criterion adapts to different environments. Furthermore,
feature fusion outperforms the other scenarios in more controlled
settings. However, the more variability there is in camera placement
and in the characteristics of the persons observed, the more likely it
is that combining candidate labels improves multi-view activity
recognition accuracy.
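
The three scenario types named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation, and it does not reproduce the five methods, which the abstract does not spell out. The assumptions are: each view is already reduced to a feature vector; selection uses a caller-supplied scoring function (hypothetical); feature-level fusion is plain concatenation; and label-level fusion is a majority vote over per-view candidate labels.

```python
from collections import Counter
from typing import Callable, List, Sequence

Features = List[float]


def select_view(views: Sequence[Features],
                score: Callable[[Features], float]) -> Features:
    """View selection: keep the single view with the highest score.

    The scoring function is a hypothetical stand-in for a selection criterion.
    """
    return max(views, key=score)


def fuse_features(views: Sequence[Features]) -> Features:
    """Feature-level fusion (assumed here to be simple concatenation)."""
    return [x for view in views for x in view]


def fuse_labels(labels: Sequence[str]) -> str:
    """Label-level fusion (assumed here to be a majority vote).

    Ties go to the label that appears first in the input.
    """
    return Counter(labels).most_common(1)[0][0]


if __name__ == "__main__":
    # Three illustrative per-view feature vectors (made-up numbers).
    views = [[0.1, 0.2], [0.9, 0.4], [0.3, 0.3]]
    print(select_view(views, score=sum))
    print(fuse_features(views))
    print(fuse_labels(["walk", "run", "walk"]))
```

In this sketch, selection and feature fusion both produce one feature vector that a single classifier would then label, while label fusion assumes each view has already been classified on its own and only the resulting labels are combined.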
Original language | English |
---|---|
Title of host publication | Proceedings of the 4th ACM/IEEE International Conference on Distributed Smart Cameras, 31 August - 4 September 2010, Atlanta, Georgia |
Publisher | Association for Computing Machinery, Inc |
Pages | 1-8 |
ISBN (Print) | 978-1-4503-0317-0 |
Publication status | Published - 2010 |