This paper compares speech coding and speaker recognition applications and highlights the parallels between them. Several approaches used for speaker recognition are applied to speech coding in order to improve the prediction accuracy. Experimental results show an improvement in segmental SNR (SEGSNR).
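For reference, segmental SNR is commonly computed as the mean of the per-frame SNR values in dB. The following is a minimal sketch under that common definition, not the evaluation code of the paper; the function name, the default frame length of 160 samples and the handling of silent frames are assumptions made for illustration.

```python
import math


def segmental_snr(original, decoded, frame_len=160, eps=1e-12):
    """Mean of per-frame SNRs (dB) over non-overlapping frames.

    frame_len=160 is an assumed default (20 ms at 8 kHz), not a value
    taken from the paper.
    """
    if len(original) != len(decoded):
        raise ValueError("signals must have equal length")
    frame_snrs = []
    for start in range(0, len(original) - frame_len + 1, frame_len):
        ref = original[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal_energy = sum(x * x for x in ref)
        noise_energy = sum((x - y) ** 2 for x, y in zip(ref, dec))
        if signal_energy <= eps:
            continue  # assumption: frames without signal energy are skipped
        # eps guards against division by zero on perfectly coded frames
        frame_snrs.append(10.0 * math.log10(signal_energy / max(noise_energy, eps)))
    if not frame_snrs:
        raise ValueError("no frame with signal energy")
    return sum(frame_snrs) / len(frame_snrs)
```

Because the average is taken over frames in the log domain, low-energy frames weigh as much as loud ones, which is why the measure is often preferred over a single global SNR for speech.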
In this paper we propose a nonlinear scalar predictor based on a combination of Multi Layer Perceptron, Radial Basis Function and Elman networks. This system is applied to speech coding in an ADPCM backward scheme. The combination of these predictors improves on the results obtained with a single predictor. A comparative study of these three neural networks for speech prediction is also presented.
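The principle of a backward scheme can be sketched as follows: the predictor only sees previously decoded samples, so the decoder can reproduce every prediction without side information. This is an illustrative sketch, not the system of the paper. The two toy predictors and their averaging stand in for the neural networks and for a combination rule the abstract does not specify, and the quantizer step is fixed for brevity, whereas an ADPCM coder would adapt it. All names are hypothetical.

```python
def quantize(value, step):
    """Uniform quantizer: integer index of the nearest reconstruction level."""
    return int(round(value / step))


def combined_predictor(history):
    """Hypothetical combination: average of two toy predictors.

    The paper combines neural networks; the rule used here is an assumption.
    """
    last, prev = history[-1], history[-2]
    hold = last                     # predictor 1: repeat the last decoded sample
    extrapolate = 2 * last - prev   # predictor 2: linear extrapolation
    return 0.5 * (hold + extrapolate)


def adpcm_encode(samples, predictor, step=0.1):
    """Encode samples as quantizer indices of the prediction error."""
    history = [0.0, 0.0]  # decoded samples only, shared start state
    indices = []
    for sample in samples:
        prediction = predictor(history)
        index = quantize(sample - prediction, step)
        indices.append(index)
        # Track the decoder's reconstruction, not the original sample.
        history.append(prediction + index * step)
    return indices


def adpcm_decode(indices, predictor, step=0.1):
    """Rebuild the signal by repeating the encoder's predictions."""
    history = [0.0, 0.0]
    for index in indices:
        history.append(predictor(history) + index * step)
    return history[2:]
```

Since the encoder tracks the decoder's reconstruction, the quantization error does not accumulate: each decoded sample stays within half a quantizer step of the input.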
In this paper, we consider the effect on speaker verification of extending the bandwidth of narrow-band speech signals (0.3-3.4 kHz) to 0.3-8 kHz. Using covariance matrix based verification systems together with detection error trade-off curves, we compare the performance of systems operating on narrow-band, wide-band (0-8 kHz), and bandwidth-extended speech. The experiments were conducted using different short-time spectral parameterizations derived from microphone and ISDN speech databases. The studied bandwidth-extension algorithm did not introduce artifacts that affected the speaker verification task, and we achieved improvements between 1 and 10 percent (depending on the model order) over the verification system designed for narrow-band speech when mel-frequency cepstral coefficients were used for the short-time spectral parameterization.
In this paper we propose a nonlinear vectorial prediction scheme based on a Multi Layer Perceptron. This system is applied to speech coding in an ADPCM backward scheme. In addition, a procedure to obtain a vectorial quantizer is given, in order to achieve a fully vectorial speech encoder. We also present several results obtained with the proposed system.
In this paper we evaluate the relevance of the model size for speaker identification. We show that it is possible to improve the identification rates if a different model size is used for each speaker. We also present some criteria for selecting the model size, and a new algorithm that outperforms the classical system with a fixed model size.
This paper presents an exhaustive study of the robustness of several parameterizations, using a new database specially acquired for a speaker recognition application. This database includes the following variations: different recording sessions (including telephonic and microphonic recordings), recording rooms, and languages (it has been obtained from a bilingual set of speakers). The study has been performed with covariance matrices in a text-independent speaker verification application. It reveals that the combination of several parameterizations can improve the robustness in all the scenarios.
This paper describes a novel face identification method that combines eigenfaces theory with neural networks. We use the eigenfaces methodology to reduce the dimensionality of the input image, and a neural net classifier to perform the identification. The method presented recognizes faces in the presence of variations in facial expression, facial details and lighting conditions. A recognition rate of more than 87% has been achieved, while the classical method of Turk and Pentland achieves 75.5%.
This paper focuses on nonlinear predictive coding, which consists of predicting a speech sample from a nonlinear combination of previous samples. It is known that in the generation of the glottal pulse, the wave equation does not behave linearly [2], [10], and we model these effects by means of a nonlinear prediction of speech based on a parametric neural network model. This work is centred on the quantization of the neural net weights and on the compression gain.
Advanced motion models (4 or 6 parameters) are needed for a good representation of the motion undergone by the different objects contained in a sequence of images. If the image is split into very small blocks, then an accurate description of complex movements can be achieved with only 2 parameters. This alternative implies a large set of vectors per image. We propose a new approach to reduce the number of vectors, using different block sizes as a function of the local characteristics of the image, without increasing the error accepted with the smallest blocks. A second algorithm is proposed for an inter/intraframe coder.
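The idea of variable block sizes can be illustrated with a quadtree-style merge: a large block keeps a single 2-parameter vector (dx, dy) only when that vector is no worse than the set of vectors found for its sub-blocks. This is a sketch of that general idea under assumed choices (exhaustive search, sum of absolute differences, bottom-up merging); the abstract does not describe the actual algorithm, and all function names are hypothetical. Frames are lists of rows of integer pixel values.

```python
def block_sad(prev, cur, x, y, size, dx, dy):
    """Sum of absolute differences between a block of cur and the displaced block of prev."""
    total = 0
    for row in range(y, y + size):
        for col in range(x, x + size):
            total += abs(cur[row][col] - prev[row + dy][col + dx])
    return total


def best_vector(prev, cur, x, y, size, search):
    """Exhaustive search over +/-search pixels; returns (error, dx, dy)."""
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if not (0 <= x + dx and x + dx + size <= width):
                continue  # displaced block would leave the frame
            if not (0 <= y + dy and y + dy + size <= height):
                continue
            error = block_sad(prev, cur, x, y, size, dx, dy)
            # Ties are broken in favour of the shortest vector.
            candidate = (error, abs(dx) + abs(dy), dx, dy)
            if best is None or candidate < best:
                best = candidate
    return best[0], best[2], best[3]


def variable_blocks(prev, cur, x, y, size, min_size, search, tolerance=0):
    """Return leaves (x, y, size, dx, dy, error) covering the block at (x, y)."""
    error, dx, dy = best_vector(prev, cur, x, y, size, search)
    if size <= min_size:
        return [(x, y, size, dx, dy, error)]
    half = size // 2
    leaves = []
    for oy in (0, half):
        for ox in (0, half):
            leaves += variable_blocks(prev, cur, x + ox, y + oy, half,
                                      min_size, search, tolerance)
    small_error = sum(leaf[5] for leaf in leaves)
    if error <= small_error + tolerance:
        # One vector matches the error of the finer partition: merge.
        return [(x, y, size, dx, dy, error)]
    return leaves
```

With `tolerance=0` a region is merged only when a single vector reproduces the error of the smallest blocks exactly, so uniformly moving areas collapse to one vector while motion boundaries stay finely partitioned.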
In this paper we summarize several applications based on thermal imaging. We emphasize the importance of emissivity adjustment for a proper temperature measurement. A new set of face images acquired at different emissivity values with steps of 0.01 is also presented and will be distributed for free for research purposes. Among the utilities, we can mention: a) the possibility to apply corrections once an image is acquired with a wrong emissivity value and it is not possible to acquire a new one; b) privacy protection in thermal images, which can be obtained with a low emissivity factor, which is still suitable for several applications, but hides the identity of a user; c) image processing for improving temperature detection in scenes containing objects of different emissivity.
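Utility a) can be illustrated with a simplified grey-body model in which the camera signal is proportional to eps * T_obj^4 + (1 - eps) * T_refl^4. This is a rough sketch under stated assumptions, not the correction procedure of the paper: it assumes a broadband T^4 law, no atmospheric attenuation, and a known reflected (ambient) temperature, whereas real cameras use a band-limited calibration curve. The function and parameter names are hypothetical.

```python
def correct_temperature(t_measured_k, eps_used, eps_true, t_reflected_k):
    """Re-estimate an object temperature (kelvin) read with a wrong emissivity.

    Simplified grey-body model; the Stefan-Boltzmann constant cancels out.
    """
    if not (0.0 < eps_used <= 1.0 and 0.0 < eps_true <= 1.0):
        raise ValueError("emissivity must be in (0, 1]")
    # Total signal the camera must have seen to report t_measured_k with eps_used.
    total = eps_used * t_measured_k ** 4 + (1.0 - eps_used) * t_reflected_k ** 4
    # Remove the reflected part for the true emissivity and rescale.
    object_term = (total - (1.0 - eps_true) * t_reflected_k ** 4) / eps_true
    if object_term <= 0.0:
        raise ValueError("inputs are inconsistent with the grey-body model")
    return object_term ** 0.25
```

In this model, an object warmer than its surroundings that was imaged with too high an emissivity setting is reported as cooler than it is, and the correction raises the estimate accordingly.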