This paper addresses the problem of handling spatial misalignments due to camera-view changes or human-pose variations in person re-identification. We first introduce a boosting-based approach to learn a correspondence structure which indicates the patch-wise matching probabilities between images from a target camera pair. The learned correspondence structure can not only capture the spatial correspondence pattern between cameras but also handle the viewpoint or human-pose variation in individual images. We further introduce a global-based matching process. It integrates a global matching constraint over the learned correspondence structure to exclude cross-view misalignments during the image patch matching process, hence achieving a more reliable matching score between images. Experimental results on various datasets demonstrate the effectiveness of our approach.
In this paper we show that by carefully making good choices for various detailed but important factors in a visual recognition framework using deep learning features, one can achieve a simple, efficient, yet highly accurate image classification system. We first list 5 important factors, based on both existing researches and ideas proposed in this paper. These important detailed factors include: 1) $\ell_2$ matrix normalization is more effective than unnormalized or $\ell_2$ vector normalization, 2) the proposed natural deep spatial pyramid is very effective, and 3) a very small $K$ in Fisher Vectors surprisingly achieves higher accuracy than normally used large $K$ values. Along with other choices (convolutional activations and multiple scales), the proposed DSP framework is not only intuitive and efficient, but also achieves excellent classification accuracy on many benchmark datasets. For example, DSP's accuracy on SUN397 is 59.78%, significantly higher than previous state-of-the-art (53.86%).
Image deblurring techniques play important roles in many image processing applications. As the blur varies spatially across the image plane, it calls for robust and effective methods to deal with the spatially-variant blur problem. In this paper, a Saliency-based Deblurring (SD) approach is proposed based on the saliency detection for salient-region segmentation and a corresponding compensate method for image deblurring. We also propose a PDE-based deblurring method which introduces an anisotropic Partial Differential Equation (PDE) model for latent image prediction and employs an adaptive optimization model in the kernel estimation and deconvolution steps. Experimental results demonstrate the effectiveness of the proposed algorithm.
In this paper, a macroblock classification method is proposed for various video processing applications involving motions. Based on the analysis of the Motion Vector field in the compressed video, we propose to classify Macroblocks of each video frame into different classes and use this class information to describe the frame content. We demonstrate that this low-computation-complexity method can efficiently catch the characteristics of the frame. Based on the proposed macroblock classification, we further propose algorithms for different video processing applications, including shot change detection, motion discontinuity detection, and outlier rejection for global motion estimation. Experimental results demonstrate that the methods based on the proposed approach can work effectively on these applications.
This paper presents a novel approach for automatic recognition of group activities for video surveillance applications. We propose to use a group representative to handle the recognition with a varying number of group members, and use an Asynchronous Hidden Markov Model (AHMM) to model the relationship between people. Furthermore, we propose a group activity detection algorithm which can handle both symmetric and asymmetric group activities, and demonstrate that this approach enables the detection of hierarchical interactions between people. Experimental results show the effectiveness of our approach.
This paper presents a novel approach for automatic recognition of human activities for video surveillance applications. We propose to represent an activity by a combination of category components, and demonstrate that this approach offers flexibility to add new activities to the system and an ability to deal with the problem of building models for activities lacking training data. For improving the recognition accuracy, a Confident-Frame- based Recognition algorithm is also proposed, where the video frames with high confidence for recognizing an activity are used as a specialized local model to help classify the remainder of the video frames. Experimental results show the effectiveness of the proposed approach.
Video enhancement plays an important role in various video applications. In this paper, we propose a new intra-and-inter-constraint-based video enhancement approach aiming to 1) achieve high intra-frame quality of the entire picture where multiple region-of-interests (ROIs) can be adaptively and simultaneously enhanced, and 2) guarantee the inter-frame quality consistencies among video frames. We first analyze features from different ROIs and create a piecewise tone mapping curve for the entire frame such that the intra-frame quality of a frame can be enhanced. We further introduce new inter-frame constraints to improve the temporal quality consistency. Experimental results show that the proposed algorithm obviously outperforms the state-of-the-art algorithms.
In this paper, a new heat-map-based (HMB) algorithm is proposed for group activity recognition. The proposed algorithm first models human trajectories as series of "heat sources" and then applies a thermal diffusion process to create a heat map (HM) for representing the group activities. Based on this heat map, a new key-point based (KPB) method is used for handling the alignments among heat maps with different scales and rotations. And a surface-fitting (SF) method is also proposed for recognizing group activities. Our proposed HM feature can efficiently embed the temporal motion information of the group activities while the proposed KPB and SF methods can effectively utilize the characteristics of the heat map for activity recognition. Experimental results demonstrate the effectiveness of our proposed algorithms.
In this paper, a new network-transmission-based (NTB) algorithm is proposed for human activity recognition in videos. The proposed NTB algorithm models the entire scene as an error-free network. In this network, each node corresponds to a patch of the scene and each edge represents the activity correlation between the corresponding patches. Based on this network, we further model people in the scene as packages while human activities can be modeled as the process of package transmission in the network. By analyzing these specific "package transmission" processes, various activities can be effectively detected. The implementation of our NTB algorithm into abnormal activity detection and group activity recognition are described in detail in the paper. Experimental results demonstrate the effectiveness of our proposed algorithm.