Sound analysis research has mainly focused on speech and music processing. The methodologies deployed there are not suitable for analyzing sounds with varying background noise, which in many cases have a very low signal-to-noise ratio (SNR). In this paper, we present a method for the detection of patterns of interest in audio signals. We propose novel trainable feature extractors, which we call COPE (Combination of Peaks of Energy). The structure of a COPE feature extractor is determined from a single prototype sound pattern in an automatic configuration process, which is a type of representation learning. We construct a set of COPE feature extractors, configured on a number of training patterns, and use their responses to build feature vectors that, in combination with a classifier, detect and classify patterns of interest in audio signals. We carried out experiments on four public data sets: MIVIA audio events, MIVIA road events, ESC-10 and TU Dortmund. The results that we achieved (recognition rates of 91.71% on MIVIA audio events, 94% on MIVIA road events, 81.25% on ESC-10 and 94.27% on TU Dortmund) demonstrate the effectiveness of the proposed method and are higher than those obtained by other existing approaches. The COPE feature extractors are highly robust to variations of SNR. Real-time performance is achieved even when a large number of features are computed.
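To make the configure-then-respond idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes the sound is already available as a non-negative time-frequency energy map (bands by frames), that configuration records local energy peaks relative to the last frame of the prototype, and that the response is the geometric mean of the energies found near the expected peak positions, maximized over time. The function names, the `threshold` and `tolerance` parameters, and the choice of geometric mean are all illustrative assumptions.

```python
import numpy as np

def configure(energy, threshold=0.5):
    """Record strong local energy peaks of a prototype as (time offset, band).

    energy: 2-D non-negative array of shape (bands, frames).
    Offsets are relative to the last frame (assumed reference point).
    """
    bands, frames = energy.shape
    ref = frames - 1
    # Pad with -inf so border cells can still be compared with a 3x3 window.
    padded = np.pad(energy, 1, mode="constant", constant_values=-np.inf)
    floor = threshold * energy.max()
    peaks = []
    for b in range(bands):
        for t in range(frames):
            v = energy[b, t]
            if v >= floor and v == padded[b:b + 3, t:t + 3].max():
                peaks.append((t - ref, b))
    return peaks

def response(peaks, energy, tolerance=1):
    """Best match over time between the stored peak layout and a new energy map."""
    if not peaks:
        return 0.0
    frames = energy.shape[1]
    span = -min(dt for dt, _ in peaks)  # how far back the earliest peak lies
    best = 0.0
    for ref in range(span, frames):
        vals = []
        for dt, b in peaks:
            t = ref + dt
            lo, hi = max(0, t - tolerance), min(frames, t + tolerance + 1)
            vals.append(energy[b, lo:hi].max())  # allow small time shifts
        # Geometric mean (assumed combination rule): one missing peak gives 0.
        best = max(best, float(np.prod(vals) ** (1.0 / len(vals))))
    return best

def feature_vector(extractors, energy):
    """One response per configured extractor, ready to pass to a classifier."""
    return np.array([response(p, energy) for p in extractors])
```

Under these assumptions, each training pattern yields one list of peaks, and the feature vector of an unknown sound is simply the list of responses of all configured extractors to its energy map.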
Pose detection is one of the fundamental steps in the recognition of human actions. In this paper we propose a novel trainable detector that recognizes human poses by analyzing the skeleton. The main idea is that a skeleton pose can be described by the spatial arrangement of its joints. Starting from this consideration, we propose a trainable pose detector that can be configured on a prototype skeleton in an automatic configuration process. The result of the configuration is a model of the positions of the joints in the prototype skeleton. In the application phase, the joint positions contained in the model are compared with those of the homologous joints in the skeleton under test. The similarity of two skeletons is computed as a combination of the position scores achieved by homologous joints. We also describe an action classification method that uses the proposed trainable detectors to extract features from the skeletons. We performed experiments on the publicly available MSDRA data set, and the results confirm the effectiveness of the proposed approach.
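The comparison of homologous joints can be sketched as follows. This is a hedged illustration rather than the authors' method: it assumes joints are given as an array of 3-D coordinates in a fixed order, that positions are expressed relative to a root joint, that each position score is a Gaussian function of the distance between homologous joints, and that the scores are combined by a geometric mean. The names `configure_pose`, `pose_similarity`, `root` and `sigma` are hypothetical.

```python
import numpy as np

def configure_pose(prototype_joints, root=0):
    """Build a pose model: joint positions relative to a root joint (assumed)."""
    joints = np.asarray(prototype_joints, dtype=float)
    return joints - joints[root]

def pose_similarity(model, skeleton, root=0, sigma=0.1):
    """Combine per-joint position scores into a single similarity in [0, 1]."""
    joints = np.asarray(skeleton, dtype=float)
    rel = joints - joints[root]
    # Distance between each joint and its homologous joint in the model.
    dist = np.linalg.norm(rel - model, axis=1)
    # Gaussian position score (assumed): 1 for a perfect match, decaying with distance.
    scores = np.exp(-dist ** 2 / (2.0 * sigma ** 2))
    # Geometric mean (assumed combination rule).
    return float(np.prod(scores) ** (1.0 / len(scores)))
```

With one model per prototype pose, the similarities of a test skeleton to all models could serve as the feature vector passed to the action classifier.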