In this paper we present a new database acquired with three different sensors (visible, near-infrared and thermal) under different illumination conditions. The database covers 41 people recorded in four acquisition sessions, with five images per session and three illumination conditions, for a total of 7,380 pictures. Experimental results are obtained through single-sensor experiments as well as combinations of two and three sensors under different illumination conditions (natural, infrared and artificial illumination). We have found that the three spectral bands studied contribute in nearly equal proportion to a combined system. Experimental results show a significant improvement when combining the three spectral bands, even with a simple classifier and feature extractor. We obtained identification rates greater than or equal to 98% in six of the nine scenarios studied when using a trained combination rule, and in two of the nine when using a fixed rule.
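The database size quoted in the abstract above follows from the acquisition protocol. The short check below reproduces it, assuming every person was captured by every sensor under every illumination condition in every session; the variable names are illustrative only.

```python
# Acquisition protocol as stated in the abstract.
people = 41
sessions = 4
images_per_session = 5
illuminations = 3   # natural, infrared, artificial
sensors = 3         # visible, near-infrared, thermal

# Assumption: the protocol is fully crossed (no missing captures).
images_per_person = sessions * images_per_session * illuminations * sensors
total_images = people * images_per_person

print(images_per_person)  # 180
print(total_images)       # 7380
```

The product matches the total reported in the abstract, which suggests the five images per session were taken for each sensor and illumination pair.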
In this paper we discuss the relevance of bandwidth extension for speaker identification tasks. Our main goal is to study whether it is possible to recognize voices that have been bandwidth extended. For this purpose, we created two different databases (microphonic and ISDN) of speech signals that were bandwidth extended from telephone bandwidth ([300, 3400] Hz) to full bandwidth ([100, 8000] Hz). We have evaluated different parameterizations, and we have found that the MELCEPST parameterization can take advantage of the bandwidth extension algorithms in several situations.
In this paper we propose a Non-Linear Predictive Vector Quantizer (PVQ) for speech coding, based on Multi-Layer Perceptrons. With this scheme we have improved the results of our previous ADPCM coder with nonlinear prediction, and we have reduced the bit rate down to 1 bit per sample.
This paper presents an exhaustive study of the robustness of several parameterizations in speaker verification and identification tasks. We have studied several mismatch conditions: different recording sessions, different microphones, and different languages (obtained from a bilingual set of speakers). The study reveals that combining several parameterizations can improve robustness in all the scenarios for both tasks, identification and verification. In addition, two different methods have been evaluated: vector quantization, and covariance matrices with an arithmetic-harmonic sphericity measure.
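As an illustration of the second method mentioned above, the following is a minimal sketch of the arithmetic-harmonic sphericity measure between two covariance matrices, in the form commonly given in the speaker recognition literature: log(tr(Y X^-1) tr(X Y^-1)) - 2 log(p), where p is the feature dimension. The abstract does not state which variant the authors used, so the function name and the example matrices are assumptions.

```python
import numpy as np

def ahs_measure(cov_x, cov_y):
    """Arithmetic-harmonic sphericity measure between two covariance matrices.

    Zero when the matrices are equal, positive otherwise.
    """
    cov_x = np.asarray(cov_x, dtype=float)
    cov_y = np.asarray(cov_y, dtype=float)
    p = cov_x.shape[0]  # feature dimension
    t1 = np.trace(cov_y @ np.linalg.inv(cov_x))
    t2 = np.trace(cov_x @ np.linalg.inv(cov_y))
    return float(np.log(t1 * t2) - 2.0 * np.log(p))

# Hypothetical 2-D covariance matrices standing in for two speaker models.
model_a = np.array([[2.0, 0.3], [0.3, 1.0]])
model_b = np.array([[1.0, 0.0], [0.0, 3.0]])

same = ahs_measure(model_a, model_a)       # close to 0
different = ahs_measure(model_a, model_b)  # strictly positive
```

In an identification setting, a test utterance would be assigned to the speaker model with the smallest measure; in verification, the measure would be compared against a threshold.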
Many speech coders are based on linear prediction coding (LPC); however, LPC cannot model the nonlinearities present in the speech signal. For this reason there is a growing interest in nonlinear techniques. In this paper we discuss ADPCM schemes with a nonlinear predictor based on neural nets, which yields an increase of 1-2.5 dB in the SEGSNR over classical methods. Both block-adaptive and sample-adaptive prediction are discussed.
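The SEGSNR figure quoted above is a frame-averaged signal-to-noise ratio. A minimal sketch of how it can be computed is given below; the frame length of 160 samples (20 ms at 8 kHz) and the rule of skipping frames with zero energy are assumptions, not details taken from the paper.

```python
import numpy as np

def segmental_snr_db(reference, decoded, frame_len=160):
    """Mean of the per-frame SNR values (in dB) between two signals."""
    reference = np.asarray(reference, dtype=float)
    decoded = np.asarray(decoded, dtype=float)
    n_frames = len(reference) // frame_len
    frame_snrs = []
    for i in range(n_frames):
        s = reference[i * frame_len:(i + 1) * frame_len]
        e = s - decoded[i * frame_len:(i + 1) * frame_len]
        signal_energy = np.sum(s ** 2)
        noise_energy = np.sum(e ** 2)
        if signal_energy == 0.0 or noise_energy == 0.0:
            continue  # assumption: skip silent or perfectly coded frames
        frame_snrs.append(10.0 * np.log10(signal_energy / noise_energy))
    if not frame_snrs:
        raise ValueError("no frame with non-zero signal and noise energy")
    return float(np.mean(frame_snrs))

# Toy check: a decoder that scales the signal by 0.9 leaves a 10% error,
# which corresponds to 20 dB in every frame.
t = np.arange(1600)
clean = np.sin(2 * np.pi * 0.013 * t)
value = segmental_snr_db(clean, 0.9 * clean)
```

Because the average is taken over per-frame values in dB, low-energy frames weigh as much as loud ones, which is why SEGSNR is often preferred to a global SNR for speech coders.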
Multi-Angled Parallelism (MAP) is a method for recognizing lines in binary images. It is well suited to implementation on parallel processing and image processing hardware. The binary image is transformed into directional planes, upon which directional erosion-dilation operators are iteratively applied. From a set of basic operators, more complex ones are built, which make it possible to extract the several types of lines. Each type is extracted with a different set of operations, so the lines are identified as they are extracted. In this paper we give an overview of MAP and adapt it to line recognition in Spanish topographical maps, with the double purpose of testing the method in a real case and studying the process of adapting it to a custom application.
This paper applies the quadtree structure to image coding. The goal is to adapt the block size and thus increase the compression ratio without reducing the SNR and without significantly increasing the computational time. The approach has been applied to Block Truncation Coding of still images and to motion vector coding (interframe). An inter/intraframe application is also discussed.
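The idea of adapting the block size with a quadtree can be sketched as follows. The split criterion used here (block variance above a threshold) and the parameter names are assumptions chosen for illustration; the abstract does not say which criterion the authors used. The sketch also assumes a square image whose side is a power of two.

```python
import numpy as np

def quadtree_blocks(image, threshold, min_size=4):
    """Return the leaf blocks (row, col, size) of a variance-driven quadtree."""
    image = np.asarray(image, dtype=float)
    leaves = []

    def split(row, col, size):
        block = image[row:row + size, col:col + size]
        # Keep the block whole if it is smooth enough or already minimal.
        if size <= min_size or block.var() <= threshold:
            leaves.append((row, col, size))
            return
        half = size // 2
        for d_row in (0, half):
            for d_col in (0, half):
                split(row + d_row, col + d_col, half)

    split(0, 0, image.shape[0])
    return leaves

# Toy image: flat everywhere except a detailed top-left quadrant.
img = np.zeros((16, 16))
img[:8, :8] = np.arange(64).reshape(8, 8) * 4.0
blocks = quadtree_blocks(img, threshold=10.0)
```

Flat regions end up as a few large blocks while detailed regions are covered by small ones, so a block-based coder spends its bits where the image content requires them.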
A novel approach to speech segmentation is proposed, based on Multilevel Hybrid (mean/min) Filters (MHF). It offers two main features: accurate transition location and good performance in noisy environments (Gaussian and impulsive noise). The proposed method is based on spectral changes, with the goal of segmenting the voice into homogeneous acoustic segments. The algorithm is being used in a phonetically segmented speech coder, with successful results.
A comparative study of different block matching alternatives for motion estimation is presented. The study focuses on the computational burden and on objective measures of prediction accuracy. Together with existing algorithms, several new variations have been tested. An interesting modification of the conjugate direction method previously reported in the literature is described. This new algorithm shows a good trade-off between computational complexity and accuracy of motion vector estimation. Computational complexity is evaluated using a sequence of artificial images designed to incorporate a great variety of motion vectors. The performance of the block matching methods has been measured in terms of the entropy of the error signal between the motion-compensated and the original frames.
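To make the evaluation setting above concrete, here is a minimal sketch of the exhaustive (full-search) block matching baseline together with a first-order entropy estimate of the prediction error. It is not the modified conjugate direction method of the paper; the matching cost (sum of absolute differences), the block size, the search radius, and the function names are all assumptions.

```python
import numpy as np

def full_search(ref, cur, top, left, block=8, radius=4):
    """Exhaustive block matching using the sum of absolute differences (SAD)."""
    target = cur[top:top + block, left:left + block].astype(float)
    best_vector, best_sad = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            r, c = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if r < 0 or c < 0 or r + block > ref.shape[0] or c + block > ref.shape[1]:
                continue
            candidate = ref[r:r + block, c:c + block].astype(float)
            sad = np.abs(target - candidate).sum()
            if sad < best_sad:
                best_sad, best_vector = sad, (dy, dx)
    return best_vector, best_sad

def entropy_bits(values):
    """First-order entropy, in bits per sample, of an integer-valued signal."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Artificial frame pair: the current frame is the reference shifted
# 2 rows down and 1 column left (with wrap-around at the borders).
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(32, 32))
cur = np.roll(ref, shift=(2, -1), axis=(0, 1))

vector, sad = full_search(ref, cur, top=12, left=12)
dy, dx = vector
error = cur[12:20, 12:20] - ref[12 + dy:20 + dy, 12 + dx:20 + dx]
error_entropy = entropy_bits(error)
```

Full search examines every candidate in the window, which is what faster strategies such as conjugate direction search try to avoid; a lower entropy of the error signal indicates a better motion-compensated prediction.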
In this paper we propose the inversion of nonlinear distortions in order to improve the recognition rates of a speaker recognition system. We study the effect of saturation on the test signals, in order to account for real situations where the training material has been recorded under controlled conditions but the test signals present some mismatch in the input signal level (saturation). The experimental results show that data fusion combining systems with and without nonlinear distortion compensation can improve the recognition rate on saturated test sentences from 80% to 88.57%, while the result with clean speech (without saturation) is 87.76% for one microphone.