Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Topic": models, code, and papers

Image inpainting: A review

Sep 13, 2019
Omar Elharrouss, Noor Almaadeed, Somaya Al-Maadeed, Younes Akbari

Although image inpainting, or the art of repairing the old and deteriorated images, has been around for many years, it has gained even more popularity because of the recent development in image processing techniques. With the improvement of image processing tools and the flexibility of digital image editing, automatic image inpainting has found important applications in computer vision and has also become an important and challenging topic of research in image processing. This paper is a brief review of the existing image inpainting approaches we first present a global vision on the existing methods for image inpainting. We attempt to collect most of the existing approaches and classify them into three categories, namely, sequential-based, CNN-based and GAN-based methods. In addition, for each category, a list of methods for the different types of distortion on the images is presented. Furthermore, collect a list of the available datasets and discuss these in our paper. This is a contribution for digital image inpainting researchers trying to look for the available datasets because there is a lack of datasets available for image inpainting. As the final step in this overview, we present the results of real evaluations of the three categories of image inpainting methods performed on the datasets used, for the different types of image distortion. In the end, we also present the evaluations metrics and discuss the performance of these methods in terms of these metrics. This overview can be used as a reference for image inpainting researchers, and it can also facilitate the comparison of the methods as well as the datasets used. The main contribution of this paper is the presentation of the three categories of image inpainting methods along with a list of available datasets that the researchers can use to evaluate their proposed methodology against.

  Access Paper or Ask Questions

ELKPPNet: An Edge-aware Neural Network with Large Kernel Pyramid Pooling for Learning Discriminative Features in Semantic Segmentation

Jun 27, 2019
Xianwei Zheng, Linxi Huan, Hanjiang Xiong, Jianya Gong

Semantic segmentation has been a hot topic across diverse research fields. Along with the success of deep convolutional neural networks, semantic segmentation has made great achievements and improvements, in terms of both urban scene parsing and indoor semantic segmentation. However, most of the state-of-the-art models are still faced with a challenge in discriminative feature learning, which limits the ability of a model to detect multi-scale objects and to guarantee semantic consistency inside one object or distinguish different adjacent objects with similar appearance. In this paper, a practical and efficient edge-aware neural network is presented for semantic segmentation. This end-to-end trainable engine consists of a new encoder-decoder network, a large kernel spatial pyramid pooling (LKPP) block, and an edge-aware loss function. The encoder-decoder network was designed as a balanced structure to narrow the semantic and resolution gaps in multi-level feature aggregation, while the LKPP block was constructed with a densely expanding receptive field for multi-scale feature extraction and fusion. Furthermore, the new powerful edge-aware loss function is proposed to refine the boundaries directly from the semantic segmentation prediction for more robust and discriminative features. The effectiveness of the proposed model was demonstrated using Cityscapes, CamVid, and NYUDv2 benchmark datasets. The performance of the two structures and the edge-aware loss function in ELKPPNet was validated on the Cityscapes dataset, while the complete ELKPPNet was evaluated on the CamVid and NYUDv2 datasets. A comparative analysis with the state-of-the-art methods under the same conditions confirmed the superiority of the proposed algorithm.

  Access Paper or Ask Questions

Knowledge-rich Image Gist Understanding Beyond Literal Meaning

Apr 18, 2019
Lydia Weiland, Ioana Hulpus, Simone Paolo Ponzetto, Wolfgang Effelsberg, Laura Dietz

We investigate the problem of understanding the message (gist) conveyed by images and their captions as found, for instance, on websites or news articles. To this end, we propose a methodology to capture the meaning of image-caption pairs on the basis of large amounts of machine-readable knowledge that has previously been shown to be highly effective for text understanding. Our method identifies the connotation of objects beyond their denotation: where most approaches to image understanding focus on the denotation of objects, i.e., their literal meaning, our work addresses the identification of connotations, i.e., iconic meanings of objects, to understand the message of images. We view image understanding as the task of representing an image-caption pair on the basis of a wide-coverage vocabulary of concepts such as the one provided by Wikipedia, and cast gist detection as a concept-ranking problem with image-caption pairs as queries. To enable a thorough investigation of the problem of gist understanding, we produce a gold standard of over 300 image-caption pairs and over 8,000 gist annotations covering a wide variety of topics at different levels of abstraction. We use this dataset to experimentally benchmark the contribution of signals from heterogeneous sources, namely image and text. The best result with a Mean Average Precision (MAP) of 0.69 indicate that by combining both dimensions we are able to better understand the meaning of our image-caption pairs than when using language or vision information alone. We test the robustness of our gist detection approach when receiving automatically generated input, i.e., using automatically generated image tags or generated captions, and prove the feasibility of an end-to-end automated process.

* Data & Knowledge Engineering, Volume 117, September 2018, Pages 114-132 

  Access Paper or Ask Questions

EM-like Learning Chaotic Dynamics from Noisy and Partial Observations

Mar 25, 2019
Duong Nguyen, Said Ouala, Lucas Drumetz, Ronan Fablet

The identification of the governing equations of chaotic dynamical systems from data has recently emerged as a hot topic. While the seminal work by Brunton et al. reported proof-of-concepts for idealized observation setting for fully-observed systems, {\em i.e.} large signal-to-noise ratios and high-frequency sampling of all system variables, we here address the learning of data-driven representations of chaotic dynamics for partially-observed systems, including significant noise patterns and possibly lower and irregular sampling setting. Instead of considering training losses based on short-term prediction error like state-of-the-art learning-based schemes, we adopt a Bayesian formulation and state this issue as a data assimilation problem with unknown model parameters. To solve for the joint inference of the hidden dynamics and of model parameters, we combine neural-network representations and state-of-the-art assimilation schemes. Using iterative Expectation-Maximization (EM)-like procedures, the key feature of the proposed inference schemes is the derivation of the posterior of the hidden dynamics. Using a neural-network-based Ordinary Differential Equation (ODE) representation of these dynamics, we investigate two strategies: their combination to Ensemble Kalman Smoothers and Long Short-Term Memory (LSTM)-based variational approximations of the posterior. Through numerical experiments on the Lorenz-63 system with different noise and time sampling settings, we demonstrate the ability of the proposed schemes to recover and reproduce the hidden chaotic dynamics, including their Lyapunov characteristic exponents, when classic machine learning approaches fail.

  Access Paper or Ask Questions

Joint Embedding Learning and Low-Rank Approximation: A Framework for Incomplete Multi-view Learning

Dec 25, 2018
Hong Tao, Chenping Hou, Dongyun Yi, Jubo Zhu

In real-world applications, not all instances in multi-view data are fully represented. To deal with incomplete multi-view data, traditional multi-view algorithms usually throw away the incomplete instances, resulting in loss of available information. To overcome this loss, Incomplete Multi-view Learning (IML) has become a hot research topic. In this paper, we propose a general IML framework for unifying existing IML methods and gaining insight into IML. The proposed framework jointly performs embedding learning and low-rank approximation. Concretely, it approximates the incomplete data by a set of low-rank matrices and learns a full and common embedding by linear transformation. Several existing IML methods can be unified as special cases of the framework. More interestingly, some linear transformation based full-view methods can be adapted to IML directly with the guidance of the framework. This bridges the gap between full multi-view learning and IML. Moreover, the framework can provide guidance for developing new algorithms. For illustration, within the framework, we propose a specific method, termed as Incomplete Multi-view Learning with Block Diagonal Representation (IML-BDR). Based on the assumption that the sampled examples have approximate linear subspace structure, IML-BDR uses the block diagonal structure prior to learn the full embedding, which would lead to more correct clustering. A convergent alternating iterative algorithm with the Successive Over-Relaxation (SOR) optimization technique is devised for optimization. Experimental results on various datasets demonstrate the effectiveness of IML-BDR.

  Access Paper or Ask Questions

Matching with Text Data: An Experimental Evaluation of Methods for Matching Documents and of Measuring Match Quality

Oct 03, 2018
Reagan Mozer, Luke Miratrix, Aaron Russell Kaufman, L. Jason Anastasopoulos

Matching for causal inference is a well-studied problem, but standard methods fail when the units to match are text documents: the high-dimensional and rich nature of the data renders exact matching infeasible, causes propensity scores to produce incomparable matches, and makes assessing match quality difficult. In this paper, we characterize a framework for matching text documents that decomposes existing methods into: (1) the choice of text representation, and (2) the choice of distance metric. We investigate how different choices within this framework affect both the quantity and quality of matches identified through a systematic multifactor evaluation experiment using human subjects. Altogether we evaluate over 100 unique text matching methods along with 5 comparison methods taken from the literature. Our experimental results identify methods that generate matches with higher subjective match quality than current state-of-the-art techniques. We enhance the precision of these results by developing a predictive model to estimate the match quality of pairs of text documents as a function of our various distance scores. This model, which we find successfully mimics human judgment, also allows for approximate and unsupervised evaluation of new procedures. We then employ the identified best method to illustrate the utility of text matching in two applications. First, we engage with a substantive debate in the study of media bias by using text matching to control for topic selection when comparing news articles from thirteen news sources. We then show how conditioning on text data leads to more precise causal inferences in an observational study examining the effects of a medical intervention.

  Access Paper or Ask Questions

Synthesizing CT from Ultrashort Echo-Time MR Images via Convolutional Neural Networks

Jul 27, 2018
Snehashis Roy, John A. Butman, Dzung L. Pham

With the increasing popularity of PET-MR scanners in clinical applications, synthesis of CT images from MR has been an important research topic. Accurate PET image reconstruction requires attenuation correction, which is based on the electron density of tissues and can be obtained from CT images. While CT measures electron density information for x-ray photons, MR images convey information about the magnetic properties of tissues. Therefore, with the advent of PET-MR systems, the attenuation coefficients need to be indirectly estimated from MR images. In this paper, we propose a fully convolutional neural network (CNN) based method to synthesize head CT from ultra-short echo-time (UTE) dual-echo MR images. Unlike traditional $T_1$-w images which do not have any bone signal, UTE images show some signal for bone, which makes it a good candidate for MR to CT synthesis. A notable advantage of our approach is that accurate results were achieved with a small training data set. Using an atlas of a single CT and dual-echo UTE pair, we train a deep neural network model to learn the transform of MR intensities to CT using patches. We compared our CNN based model with a state-of-the-art registration based as well as a Bayesian model based CT synthesis method, and showed that the proposed CNN model outperforms both of them. We also compared the proposed model when only $T_1$-w images are available instead of UTE, and show that UTE images produce better synthesis than using just $T_1$-w images.

*, International Workshop on Simulation and Synthesis in Medical Imaging (SASHIMI 2017) 

  Access Paper or Ask Questions

Troubling Trends in Machine Learning Scholarship

Jul 26, 2018
Zachary C. Lipton, Jacob Steinhardt

Collectively, machine learning (ML) researchers are engaged in the creation and dissemination of knowledge about data-driven algorithms. In a given paper, researchers might aspire to any subset of the following goals, among others: to theoretically characterize what is learnable, to obtain understanding through empirically rigorous experiments, or to build a working system that has high predictive accuracy. While determining which knowledge warrants inquiry may be subjective, once the topic is fixed, papers are most valuable to the community when they act in service of the reader, creating foundational knowledge and communicating as clearly as possible. Recent progress in machine learning comes despite frequent departures from these ideals. In this paper, we focus on the following four patterns that appear to us to be trending in ML scholarship: (i) failure to distinguish between explanation and speculation; (ii) failure to identify the sources of empirical gains, e.g., emphasizing unnecessary modifications to neural architectures when gains actually stem from hyper-parameter tuning; (iii) mathiness: the use of mathematics that obfuscates or impresses rather than clarifies, e.g., by confusing technical and non-technical concepts; and (iv) misuse of language, e.g., by choosing terms of art with colloquial connotations or by overloading established technical terms. While the causes behind these patterns are uncertain, possibilities include the rapid expansion of the community, the consequent thinness of the reviewer pool, and the often-misaligned incentives between scholarship and short-term measures of success (e.g., bibliometrics, attention, and entrepreneurial opportunity). While each pattern offers a corresponding remedy (don't do it), we also discuss some speculative suggestions for how the community might combat these trends.

* Presented at ICML 2018: The Debates 

  Access Paper or Ask Questions

Expecting the Unexpected: Training Detectors for Unusual Pedestrians with Adversarial Imposters

Apr 10, 2017
Shiyu Huang, Deva Ramanan

As autonomous vehicles become an every-day reality, high-accuracy pedestrian detection is of paramount practical importance. Pedestrian detection is a highly researched topic with mature methods, but most datasets focus on common scenes of people engaged in typical walking poses on sidewalks. But performance is most crucial for dangerous scenarios, such as children playing in the street or people using bicycles/skateboards in unexpected ways. Such "in-the-tail" data is notoriously hard to observe, making both training and testing difficult. To analyze this problem, we have collected a novel annotated dataset of dangerous scenarios called the Precarious Pedestrian dataset. Even given a dedicated collection effort, it is relatively small by contemporary standards (around 1000 images). To allow for large-scale data-driven learning, we explore the use of synthetic data generated by a game engine. A significant challenge is selected the right "priors" or parameters for synthesis: we would like realistic data with poses and object configurations that mimic true Precarious Pedestrians. Inspired by Generative Adversarial Networks (GANs), we generate a massive amount of synthetic data and train a discriminative classifier to select a realistic subset, which we deem the Adversarial Imposters. We demonstrate that this simple pipeline allows one to synthesize realistic training data by making use of rendering/animation engines within a GAN framework. Interestingly, we also demonstrate that such data can be used to rank algorithms, suggesting that Adversarial Imposters can also be used for "in-the-tail" validation at test-time, a notoriously difficult challenge for real-world deployment.

* To appear in CVPR 2017 

  Access Paper or Ask Questions

Making Early Predictions of the Accuracy of Machine Learning Applications

Dec 05, 2012
J. E. Smith, P. Caleb-Solly, M. A. Tahir, D. Sannen, H. van-Brussel

The accuracy of machine learning systems is a widely studied research topic. Established techniques such as cross-validation predict the accuracy on unseen data of the classifier produced by applying a given learning method to a given training data set. However, they do not predict whether incurring the cost of obtaining more data and undergoing further training will lead to higher accuracy. In this paper we investigate techniques for making such early predictions. We note that when a machine learning algorithm is presented with a training set the classifier produced, and hence its error, will depend on the characteristics of the algorithm, on training set's size, and also on its specific composition. In particular we hypothesise that if a number of classifiers are produced, and their observed error is decomposed into bias and variance terms, then although these components may behave differently, their behaviour may be predictable. We test our hypothesis by building models that, given a measurement taken from the classifier created from a limited number of samples, predict the values that would be measured from the classifier produced when the full data set is presented. We create separate models for bias, variance and total error. Our models are built from the results of applying ten different machine learning algorithms to a range of data sets, and tested with "unseen" algorithms and datasets. We analyse the results for various numbers of initial training samples, and total dataset sizes. Results show that our predictions are very highly correlated with the values observed after undertaking the extra training. Finally we consider the more complex case where an ensemble of heterogeneous classifiers is trained, and show how we can accurately estimate an upper bound on the accuracy achievable after further training.

* 35 pagers, 12 figures 

  Access Paper or Ask Questions