We develop and analyze a projected particle Langevin optimization method to learn the distribution in the Sch\"{o}nberg integral representation of the radial basis functions from training samples. More specifically, we characterize a distributionally robust optimization method with respect to the Wasserstein distance to optimize the distribution in the Sch\"{o}nberg integral representation. To provide theoretical performance guarantees, we analyze the scaling limits of a projected particle online (stochastic) optimization method in the mean-field regime. In particular, we prove that in the scaling limits, the empirical measure of the Langevin particles converges to the law of a reflected It\^{o} diffusion-drift process. Moreover, the drift is also a function of the law of the underlying process. Using It\^{o} lemma for semi-martingales and Grisanov's change of measure for the Wiener processes, we then derive a Mckean-Vlasov type partial differential equation (PDE) with Robin boundary conditions that describes the evolution of the empirical measure of the projected Langevin particles in the mean-field regime. In addition, we establish the existence and uniqueness of the steady-state solutions of the derived PDE in the weak sense. We apply our learning approach to train radial kernels in the kernel locally sensitive hash (LSH) functions, where the training data-set is generated via a $k$-mean clustering method on a small subset of data-base. We subsequently apply our kernel LSH with a trained kernel for image retrieval task on MNIST data-set, and demonstrate the efficacy of our kernel learning approach. We also apply our kernel learning approach in conjunction with the kernel support vector machines (SVMs) for classification of benchmark data-sets.
The counterfeit goods trade represents nowadays more than 3.3% of the whole world trade and thus it's a problem that needs now more than ever a lot of attention and a reliable solution that would reduce the negative impact it has over the modern society. This paper presents the design and early stage development of a novel counterfeit goods detection platform that makes use of the outstsanding learning capabilities of the classical VGG16 convolutional model trained through the process of "transfer learning" and a multi-stage fake detection procedure that proved to be not only reliable but also very robust in the experiments we have conducted so far using an image dataset of various goods which we gathered ourselves.
Thin structures, such as wire-frame sculptures, fences, cables, power lines, and tree branches, are common in the real world. It is extremely challenging to acquire their 3D digital models using traditional image-based or depth-based reconstruction methods because thin structures often lack distinct point features and have severe self-occlusion. We propose the first approach that simultaneously estimates camera motion and reconstructs the geometry of complex 3D thin structures in high quality from a color video captured by a handheld camera. Specifically, we present a new curve-based approach to estimate accurate camera poses by establishing correspondences between featureless thin objects in the foreground in consecutive video frames, without requiring visual texture in the background scene to lock on. Enabled by this effective curve-based camera pose estimation strategy, we develop an iterative optimization method with tailored measures on geometry, topology as well as self-occlusion handling for reconstructing 3D thin structures. Extensive validations on a variety of thin structures show that our method achieves accurate camera pose estimation and faithful reconstruction of 3D thin structures with complex shape and topology at a level that has not been attained by other existing reconstruction methods.
Image Segmentation is a technique of partitioning the original image into some distinct classes. Many possible solutions may be available for segmenting an image into a certain number of classes, each one having different quality of segmentation. In our proposed method, multilevel thresholding technique has been used for image segmentation. A new approach of Cuckoo Search (CS) is used for selection of optimal threshold value. In other words, the algorithm is used to achieve the best solution from the initial random threshold values or solutions and to evaluate the quality of a solution correlation function is used. Finally, MSE and PSNR are measured to understand the segmentation quality.
Temporal segmentation of untrimmed videos and photo-streams is currently an active area of research in computer vision and image processing. This paper proposes a new approach to improve the temporal segmentation of photo-streams. The method consists in enhancing image representations by encoding long-range temporal dependencies. Our key contribution is to take advantage of the temporal stationarity assumption of photostreams for modeling each frame by its nonlocal self-similarity function. The proposed approach is put to test on the EDUB-Seg dataset, a standard benchmark for egocentric photostream temporal segmentation. Starting from seven different (CNN based) image features, the method yields consistent improvements in event segmentation quality, leading to an average increase of F-measure of 3.71% with respect to the state of the art.
We propose the design of an original scalable image coder/decoder that is inspired from the mammalians retina. Our coder accounts for the time-dependent and also nondeterministic behavior of the actual retina. The present work brings two main contributions: As a first step, (i) we design a deterministic image coder mimicking most of the retinal processing stages and then (ii) we introduce a retinal noise in the coding process, that we model here as a dither signal, to gain interesting perceptual features. Regarding our first contribution, our main source of inspiration will be the biologically plausible model of the retina called Virtual Retina. The main novelty of this coder is to show that the time-dependent behavior of the retina cells could ensure, in an implicit way, scalability and bit allocation. Regarding our second contribution, we reconsider the inner layers of the retina. We emit a possible interpretation for the non-determinism observed by neurophysiologists in their output. For this sake, we model the retinal noise that occurs in these layers by a dither signal. The dithering process that we propose adds several interesting features to our image coder. The dither noise whitens the reconstruction error and decorrelates it from the input stimuli. Furthermore, integrating the dither noise in our coder allows a faster recognition of the fine details of the image during the decoding process. Our present paper goal is twofold. First, we aim at mimicking as closely as possible the retina for the design of a novel image coder while keeping encouraging performances. Second, we bring a new insight concerning the non-deterministic behavior of the retina.
With the development of image segmentation in computer vision, biomedical image segmentation have achieved remarkable progress on brain tumor segmentation and Organ At Risk (OAR) segmentation. However, most of the research only uses single modality such as Computed Tomography (CT) scans while in real world scenario doctors often use multiple modalities to get more accurate result. To better leverage different modalities, we have collected a large dataset consists of 136 cases with CT and MR images which diagnosed with nasopharyngeal cancer. In this paper, we propose to use Generative Adversarial Network to perform CT to MR transformation to synthesize MR images instead of aligning two modalities. The synthesized MR can be jointly trained with CT to achieve better performance. In addition, we use instance segmentation model to extend the OAR segmentation task to segment both organs and tumor region. The collected dataset will be made public soon.
Polarimetric synthetic aperture radar (PolSAR) images are widely used in disaster detection and military reconnaissance and so on. However, their interpretation faces some challenges, e.g., deficiency of labeled data, inadequate utilization of data information and so on. In this paper, a complex-valued generative adversarial network (GAN) is proposed for the first time to address these issues. The complex number form of model complies with the physical mechanism of PolSAR data and in favor of utilizing and retaining amplitude and phase information of PolSAR data. GAN architecture and semi-supervised learning are combined to handle deficiency of labeled data. GAN expands training data and semi-supervised learning is used to train network with generated, labeled and unlabeled data. Experimental results on two benchmark data sets show that our model outperforms existing state-of-the-art models, especially for conditions with fewer labeled data.
In this paper, we implemented both sequential and parallel version of fractal image compression algorithms using CUDA (Compute Unified Device Architecture) programming model for parallelizing the program in Graphics Processing Unit for medical images, as they are highly similar within the image itself. There are several improvement in the implementation of the algorithm as well. Fractal image compression is based on the self similarity of an image, meaning an image having similarity in majority of the regions. We take this opportunity to implement the compression algorithm and monitor the effect of it using both parallel and sequential implementation. Fractal compression has the property of high compression rate and the dimensionless scheme. Compression scheme for fractal image is of two kind, one is encoding and another is decoding. Encoding is very much computational expensive. On the other hand decoding is less computational. The application of fractal compression to medical images would allow obtaining much higher compression ratios. While the fractal magnification an inseparable feature of the fractal compression would be very useful in presenting the reconstructed image in a highly readable form. However, like all irreversible methods, the fractal compression is connected with the problem of information loss, which is especially troublesome in the medical imaging. A very time consuming encoding pro- cess, which can last even several hours, is another bothersome drawback of the fractal compression.
Planning is a powerful approach to control problems with known environment dynamics. In unknown environments the agent needs to learn a model of the system dynamics to make planning applicable. This is particularly challenging when the underlying states are only indirectly observable through images. We propose to learn a deep latent Gaussian process dynamics (DLGPD) model that learns low-dimensional system dynamics from environment interactions with visual observations. The method infers latent state representations from observations using neural networks and models the system dynamics in the learned latent space with Gaussian processes. All parts of the model can be trained jointly by optimizing a lower bound on the likelihood of transitions in image space. We evaluate the proposed approach on the pendulum swing-up task while using the learned dynamics model for planning in latent space in order to solve the control problem. We also demonstrate that our method can quickly adapt a trained agent to changes in the system dynamics from just a few rollouts. We compare our approach to a state-of-the-art purely deep learning based method and demonstrate the advantages of combining Gaussian processes with deep learning for data efficiency and transfer learning.