Automatic question-answering is a classical problem in natural language processing, which aims at designing systems that can automatically answer a question, in the same way as human does. In this work, we propose a deep learning based model for automatic question-answering. First the questions and answers are embedded using neural probabilistic modeling. Then a deep similarity neural network is trained to find the similarity score of a pair of answer and question. Then for each question, the best answer is found as the one with the highest similarity score. We first train this model on a large-scale public question-answering database, and then fine-tune it to transfer to the customer-care chat data. We have also tested our framework on a public question-answering database and achieved very good performance.
Subspace learning is an important problem, which has many applications in image and video processing. It can be used to find a low-dimensional representation of signals and images. But in many applications, the desired signal is heavily distorted by outliers and noise, which negatively affect the learned subspace. In this work, we present a novel algorithm for learning a subspace for signal representation, in the presence of structured outliers and noise. The proposed algorithm tries to jointly detect the outliers and learn the subspace for images. We present an alternating optimization algorithm for solving this problem, which iterates between learning the subspace and finding the outliers. This algorithm has been trained on a large number of image patches, and the learned subspace is used for image segmentation, and is shown to achieve better segmentation results than prior methods, including least absolute deviation fitting, k-means clustering based segmentation in DjVu, and shape primitive extraction and coding algorithm.
Text extraction is an important problem in image processing with applications from optical character recognition to autonomous driving. Most of the traditional text segmentation algorithms consider separating text from a simple background (which usually has a different color from texts). In this work we consider separating texts from a textured background, that has similar color to texts. We look at this problem from a signal decomposition perspective, and consider a more realistic scenario where signal components are overlaid on top of each other (instead of adding together). When the signals are overlaid, to separate signal components, we need to find a binary mask which shows the support of each component. Because directly solving the binary mask is intractable, we relax this problem to the approximated continuous problem, and solve it by alternating optimization method. We show that the proposed algorithm achieves significantly better results than other recent works on several challenging images.
Signal decomposition is a classical problem in signal processing, which aims to separate an observed signal into two or more components each with its own property. Usually each component is described by its own subspace or dictionary. Extensive research has been done for the case where the components are additive, but in real world applications, the components are often non-additive. For example, an image may consist of a foreground object overlaid on a background, where each pixel either belongs to the foreground or the background. In such a situation, to separate signal components, we need to find a binary mask which shows the location of each component. Therefore it requires to solve a binary optimization problem. Since most of the binary optimization problems are intractable, we relax this problem to the approximated continuous problem, and solve it by alternating optimization technique, which solves the mask and the subspace projection for each component, alternatingly and iteratively. We show the application of the proposed algorithm for three applications: separation of text from background in images, separation of moving objects from a background undergoing global camera motion in videos, separation of sinusoidal and spike components in one dimensional signals. We demonstrate in each case that considering the non-additive nature of the problem can lead to significant improvement.
Iris is one of the popular biometrics that is widely used for identity authentication. Different features have been used to perform iris recognition in the past. Most of them are based on hand-crafted features designed by biometrics experts. Due to tremendous success of deep learning in computer vision problems, there has been a lot of interest in applying features learned by convolutional neural networks on general image recognition to other tasks such as segmentation, face recognition, and object detection. In this paper, we have investigated the application of deep features extracted from VGG-Net for iris recognition. The proposed scheme has been tested on two well-known iris databases, and has shown promising results with the best accuracy rate of 99.4\%, which outperforms the previous best result.
Sparse decomposition has been widely used for different applications, such as source separation, image classification and image denoising. This paper presents a new algorithm for segmentation of an image into background and foreground text and graphics using sparse decomposition. First, the background is represented using a suitable smooth model, which is a linear combination of a few smoothly varying basis functions, and the foreground text and graphics are modeled as a sparse component overlaid on the smooth background. Then the background and foreground are separated using a sparse decomposition framework and imposing some prior information, which promote the smoothness of background, and the sparsity and connectivity of foreground pixels. This algorithm has been tested on a dataset of images extracted from HEVC standard test sequences for screen content coding, and is shown to outperform prior methods, including least absolute deviation fitting, k-means clustering based segmentation in DjVu, and shape primitive extraction and coding algorithm.
Sparse decomposition has been widely used for different applications, such as source separation, image classification, image denoising and more. This paper presents a new algorithm for segmentation of an image into background and foreground text and graphics using sparse decomposition and total variation minimization. The proposed method is designed based on the assumption that the background part of the image is smoothly varying and can be represented by a linear combination of a few smoothly varying basis functions, while the foreground text and graphics can be modeled with a sparse component overlaid on the smooth background. The background and foreground are separated using a sparse decomposition framework regularized with a few suitable regularization terms which promotes the sparsity and connectivity of foreground pixels. This algorithm has been tested on a dataset of images extracted from HEVC standard test sequences for screen content coding, and is shown to have superior performance over some prior methods, including least absolute deviation fitting, k-means clustering based segmentation in DjVu and shape primitive extraction and coding (SPEC) algorithm.
This paper considers how to separate text and/or graphics from smooth background in screen content and mixed document images and proposes two approaches to perform this segmentation task. The proposed methods make use of the fact that the background in each block is usually smoothly varying and can be modeled well by a linear combination of a few smoothly varying basis functions, while the foreground text and graphics create sharp discontinuity. The algorithms separate the background and foreground pixels by trying to fit background pixel values in the block into a smooth function using two different schemes. One is based on robust regression, where the inlier pixels will be considered as background, while remaining outlier pixels will be considered foreground. The second approach uses a sparse decomposition framework where the background and foreground layers are modeled with a smooth and sparse components respectively. These algorithms have been tested on images extracted from HEVC standard test sequences for screen content coding, and are shown to have superior performance over previous approaches. The proposed methods can be used in different applications such as text extraction, separate coding of background and foreground for compression of screen content, and medical image segmentation.
Palmprint recognition has drawn a lot of attention during the recent years. Many algorithms have been proposed for palmprint recognition in the past, majority of them being based on features extracted from the transform domain. Many of these transform domain features are not translation or rotation invariant, and therefore a great deal of preprocessing is needed to align the images. In this paper, a powerful image representation, called scattering network/transform, is used for palmprint recognition. Scattering network is a convolutional network where its architecture and filters are predefined wavelet transforms. The first layer of scattering network captures similar features to SIFT descriptors and the higher-layer features capture higher-frequency content of the signal which are lost in SIFT and other similar descriptors. After extraction of the scattering features, their dimensionality is reduced by applying principal component analysis (PCA) which reduces the computational complexity of the recognition task. Two different classifiers are used for recognition: multi-class SVM and minimum-distance classifier. The proposed scheme has been tested on a well-known palmprint database and achieved accuracy rate of 99.95% and 100% using minimum distance classifier and SVM respectively.
Personal identification problem has been a major field of research in recent years. Biometrics-based technologies that exploit fingerprints, iris, face, voice and palmprints, have been in the center of attention to solve this problem. Palmprints can be used instead of fingerprints that have been of the earliest of these biometrics technologies. A palm is covered with the same skin as the fingertips but has a larger surface, giving us more information than the fingertips. The major features of the palm are palm-lines, including principal lines, wrinkles and ridges. Using these lines is one of the most popular approaches towards solving the palmprint recognition problem. Another robust feature is the wavelet energy of palms. In this paper we used a hybrid feature which combines both of these features. %Moreover, multispectral analysis is applied to improve the performance of the system. At the end, minimum distance classifier is used to match test images with one of the training samples. The proposed algorithm has been tested on a well-known multispectral palmprint dataset and achieved an average accuracy of 98.8\%.