"Image": models, code, and papers

NUBIA: NeUral Based Interchangeability Assessor for Text Generation

May 01, 2020
Hassan Kane, Muhammed Yusuf Kocyigit, Ali Abdalla, Pelkins Ajanoh, Mohamed Coulibali

We present NUBIA, a methodology to build automatic evaluation metrics for text generation using only machine learning models as core components. A typical NUBIA model is composed of three modules: a neural feature extractor, an aggregator and a calibrator. We demonstrate an implementation of NUBIA which outperforms metrics currently used to evaluate machine translation and summaries, and slightly exceeds or matches state-of-the-art metrics on correlation with human judgement on the WMT segment-level Direct Assessment task, sentence-level ranking and image captioning evaluation. The model implemented is modular, explainable and set to continuously improve over time.

* 8 pages, 5 tables, and 2 figures
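
The three-module layout described in the abstract can be illustrated with a minimal sketch. Everything below is a hypothetical stand-in, not the authors' implementation: two cheap lexical features take the place of the neural feature extractor, fixed weights take the place of a learned aggregator, and a simple clamp takes the place of the calibrator. All function names and weights are assumptions made for illustration.

```python
from typing import Dict

Features = Dict[str, float]


def toy_feature_extractor(reference: str, candidate: str) -> Features:
    """Stand-in for the neural feature extractor (assumption: lexical features)."""
    ref = set(reference.lower().split())
    cand = set(candidate.lower().split())
    union = ref | cand
    overlap = len(ref & cand) / len(union) if union else 1.0
    length_ratio = min(len(ref), len(cand)) / max(len(ref), len(cand), 1)
    return {"overlap": overlap, "length_ratio": length_ratio}


def toy_aggregator(features: Features, weights: Dict[str, float]) -> float:
    """Stand-in for the aggregator: a fixed weighted sum of the features."""
    return sum(weights[name] * value for name, value in features.items())


def toy_calibrator(raw_score: float, low: float = 0.0, high: float = 1.0) -> float:
    """Stand-in for the calibrator: clamp the score to a fixed range."""
    return max(low, min(high, raw_score))


def nubia_like_score(reference: str, candidate: str) -> float:
    """Chain the three modules: extract, aggregate, calibrate."""
    weights = {"overlap": 0.8, "length_ratio": 0.2}  # hypothetical weights
    features = toy_feature_extractor(reference, candidate)
    return toy_calibrator(toy_aggregator(features, weights))
```

With these toy components, identical sentences receive the maximum score and unrelated sentences of the same length score lower, which is the kind of behaviour the modular design is meant to make easy to inspect.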

Automatic Image Filtering on Social Networks Using Deep Learning and Perceptual Hashing During Crises

Apr 09, 2017
Dat Tien Nguyen, Firoj Alam, Ferda Ofli, Muhammad Imran

The extensive use of social media platforms, especially during disasters, creates unique opportunities for humanitarian organizations to gain situational awareness and launch relief operations accordingly. In addition to the textual content, people post overwhelming amounts of imagery data on social networks within minutes of a disaster striking. Studies point to the importance of this online imagery content for emergency response. Despite recent advances in the computer vision field, automatic processing of crisis-related social media imagery data remains a challenging task, because the majority of this data consists of redundant and irrelevant content. In this paper, we present an image processing pipeline that comprises de-duplication and relevancy filtering mechanisms to collect and filter social media image content in real-time during a crisis event. Results obtained from extensive experiments on real-world crisis datasets demonstrate the significance of the proposed pipeline for optimal utilization of both human and machine computing resources.

* Accepted for publication in the 14th International Conference on Information Systems For Crisis Response and Management (ISCRAM), 2017
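
The de-duplication step mentioned in the abstract can be sketched with a simple perceptual hash. The abstract does not say which hash is used, so the average-hash variant below, the Hamming-distance threshold, and the assumption that images are already reduced to small grayscale grids are all illustrative choices rather than the authors' pipeline.

```python
from typing import List

Image = List[List[int]]  # small grayscale grid, values 0..255 (assumption)


def average_hash(pixels: Image) -> int:
    """Set one bit per pixel: 1 if the pixel is brighter than the image mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def deduplicate(images: List[Image], max_distance: int = 2) -> List[int]:
    """Return indices of images whose hash is not close to any earlier kept hash."""
    kept_indices: List[int] = []
    kept_hashes: List[int] = []
    for index, image in enumerate(images):
        image_hash = average_hash(image)
        if all(hamming_distance(image_hash, h) > max_distance for h in kept_hashes):
            kept_indices.append(index)
            kept_hashes.append(image_hash)
    return kept_indices
```

Because the hash depends only on which pixels are brighter than the mean, small changes in brightness or compression noise tend to leave it unchanged, which is why near-duplicates collapse onto the same or nearby hashes.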

Image reconstruction from dense binary pixels

Dec 06, 2015
Or Litany, Tal Remez, Alex Bronstein

Recently, the dense binary pixel Gigavision camera was introduced, emulating a digital version of photographic film. While it seems to be a promising solution for HDR imaging, its output is not directly usable and requires an image reconstruction process. In this work, we formulate this problem as the minimization of a convex objective combining a maximum-likelihood term with a sparse synthesis prior. We present MLNet - a novel feed-forward neural network that produces acceptable output quality at a fixed complexity and is two orders of magnitude faster than iterative algorithms. We present state of the art results in the abstract.

* Signal Processing with Adaptive Sparse Structured Representations (SPARS 2015)
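
To give a feel for the maximum-likelihood term, the sketch below estimates light intensity from a group of binary pixels in the simplest possible setting. It assumes Poisson photon arrivals and a pixel that fires when at least one photon is detected, so each pixel is 1 with probability 1 - exp(-lambda). Neither assumption is stated in the abstract, and the sparse prior and the MLNet network are not modelled here.

```python
import math
from typing import Sequence


def ml_intensity(binary_pixels: Sequence[int]) -> float:
    """Maximum-likelihood estimate of the Poisson rate from one-photon-threshold pixels.

    With P(pixel = 1) = 1 - exp(-lam), the ML estimate of that probability is the
    fraction of ones, so lam = -ln(1 - fraction_of_ones).
    """
    count = len(binary_pixels)
    ones = sum(binary_pixels)
    if ones == count:
        return math.inf  # every pixel saturated: the estimate is unbounded
    return -math.log(1.0 - ones / count)
```

The saturation case shows why the raw binary output is not directly usable: once most pixels fire, small changes in the fraction of ones correspond to large changes in estimated intensity.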

AQD: Towards Accurate Quantized Object Detection

Aug 03, 2020
Jing Liu, Bohan Zhuang, Peng Chen, Mingkui Tan, Chunhua Shen

Network quantization aims to lower the bitwidth of weights and activations and hence reduce the model size and accelerate the inference of deep networks. Even though existing quantization methods have achieved promising performance on image classification, applying aggressively low bitwidth quantization on object detection while preserving the performance is still a challenge. In this paper, we demonstrate that the poor performance of the quantized network on object detection comes from the inaccurate batch statistics of batch normalization. To solve this, we propose an accurate quantized object detection (AQD) method. Specifically, we propose to employ multi-level batch normalization (multi-level BN) to estimate the batch statistics of each detection head separately. We further propose a learned interval quantization method to improve how the quantizer itself is configured. To evaluate the performance of the proposed methods, we apply AQD to two one-stage detectors (i.e., RetinaNet and FCOS). Experimental results on COCO show that our methods achieve near-lossless performance compared with the full-precision model by using extremely low bitwidth regimes such as 3-bit. In particular, we even outperform the full-precision counterpart by a large margin with a 4-bit detector, which is of great practical value.

* Code is available at https://github.com/blueardour/model-quantization
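
As a rough illustration of what lowering the bitwidth means, the sketch below applies uniform quantization to non-negative activations over a clipping interval. It is not taken from the repository above: the function name is hypothetical, and the interval is passed in as a fixed number, whereas the abstract describes learning how the quantizer is configured.

```python
from typing import List, Sequence


def uniform_quantize(values: Sequence[float], bits: int, interval: float) -> List[float]:
    """Clip values to [0, interval] and snap them to 2**bits evenly spaced levels."""
    levels = 2 ** bits - 1          # number of steps between the lowest and highest level
    step = interval / levels
    quantized = []
    for value in values:
        clipped = max(0.0, min(interval, value))
        quantized.append(round(clipped / step) * step)
    return quantized
```

For example, with `bits=2` and `interval=1.5` every input is mapped to one of the four values 0.0, 0.5, 1.0 or 1.5, and anything above the interval is clipped to 1.5. Choosing the interval well matters more as the bitwidth shrinks, since fewer levels must cover the same range.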

Lesion Mask-based Simultaneous Synthesis of Anatomic and Molecular MR Images using a GAN

Jul 05, 2020
Pengfei Guo, Puyang Wang, Jinyuan Zhou, Vishal Patel, Shanshan Jiang

Data-driven automatic approaches have demonstrated their great potential in resolving various clinical diagnostic dilemmas for patients with malignant gliomas in neuro-oncology with the help of conventional and advanced molecular MR images. However, the lack of sufficient annotated MRI data has vastly impeded the development of such automatic methods. Conventional data augmentation approaches, including flipping, scaling, rotation, and distortion are not capable of generating data with diverse image content. In this paper, we propose a generative adversarial network (GAN), which can simultaneously synthesize data from arbitrary manipulated lesion information on multiple anatomic and molecular MRI sequences, including T1-weighted (T1w), gadolinium enhanced T1w (Gd-T1w), T2-weighted (T2w), fluid-attenuated inversion recovery (FLAIR), and amide proton transfer-weighted (APTw). The proposed framework consists of a stretch-out up-sampling module, a brain atlas encoder, a segmentation consistency module, and multi-scale labelwise discriminators. Extensive experiments on real clinical data demonstrate that the proposed model can perform significantly better than the state-of-the-art synthesis methods.

* MICCAI 2020

Fast Glare Detection in Document Images

Oct 24, 2019
Dmitry Rodin, Nikita Orlov

Glare is a phenomenon that occurs when the scene has a reflection of a light source or has one in it. This luminescence can hide useful information from the image, making text recognition virtually impossible. In this paper, we propose an approach to detect glare in images taken by users via mobile devices. Our method divides the document into blocks and collects luminance features from the original image and black-white strokes histograms of the binarized image. Finally, glare is detected using a convolutional neural network on the aforementioned histograms and luminance features. The network consists of several feature extraction blocks, one for each type of input, and the detection block, which calculates the resulting glare heatmap based on the output of the extraction part. The proposed solution detects glare with high recall and f-score.

* 4 pages, Workshop on Industrial Applications of Document Analysis and Recognition 2019
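
The block-wise feature collection described in the abstract can be sketched as follows. This is only an assumed form of the luminance features (mean and maximum per block); the stroke histograms of the binarized image and the convolutional network that produces the heatmap are not shown, and the function name is hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale luminance values, 0..255 (assumption)


def block_luminance_features(image: Image, block: int) -> List[List[Tuple[float, int]]]:
    """Split the image into block x block tiles and return (mean, max) luminance per tile."""
    rows, cols = len(image), len(image[0])
    features = []
    for top in range(0, rows, block):
        row_features = []
        for left in range(0, cols, block):
            values = [
                image[r][c]
                for r in range(top, min(top + block, rows))
                for c in range(left, min(left + block, cols))
            ]
            row_features.append((sum(values) / len(values), max(values)))
        features.append(row_features)
    return features
```

A tile covering a glare spot shows up with both a high mean and a high maximum, which is the kind of signal a downstream detector could combine with other inputs.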

Novel min-max reformulations of Linear Inverse Problems

Jul 05, 2020
Mohammed Rayyan Sheriff, Debasish Chatterjee

In this article, we delve into the class of so-called ill-posed Linear Inverse Problems (LIP), which simply refers to the task of recovering the entire signal from its relatively few random linear measurements. Such problems arise in a variety of settings, with applications ranging from medical image processing to recommender systems. We propose a slightly generalized version of the error constrained linear inverse problem and obtain a novel and equivalent convex-concave min-max reformulation by providing an exposition of its convex geometry. Saddle points of the min-max problem are completely characterized in terms of a solution to the LIP, and vice versa. Applying simple saddle point seeking ascent-descent type algorithms to solve the min-max problems provides novel and simple algorithms to find a solution to the LIP. Moreover, the reformulation of an LIP as the min-max problem provided in this article is crucial in developing methods to solve the dictionary learning problem with almost sure recovery constraints.
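
The kind of saddle point seeking ascent-descent iteration mentioned in the abstract can be shown on a toy problem. The objective below, f(x, y) = 0.5*x^2 + x*y - 0.5*y^2, is an assumption chosen only because it is convex in x, concave in y, and has its saddle point at the origin; it is not the reformulation derived in the article.

```python
from typing import Tuple


def gradient_descent_ascent(
    x: float, y: float, step: float = 0.1, iterations: int = 500
) -> Tuple[float, float]:
    """Simultaneous descent in x and ascent in y on f(x, y) = 0.5*x^2 + x*y - 0.5*y^2."""
    for _ in range(iterations):
        grad_x = x + y   # partial derivative of f with respect to x
        grad_y = x - y   # partial derivative of f with respect to y
        # Both gradients are evaluated at the current point before either variable moves.
        x, y = x - step * grad_x, y + step * grad_y
    return x, y
```

Starting from any finite point, the iterates of this toy problem spiral in towards the saddle point at (0, 0), since each step shrinks the distance to the origin by a constant factor for this step size.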

Light Field Spatial Super-resolution via Deep Combinatorial Geometry Embedding and Structural Consistency Regularization

Apr 05, 2020
Jing Jin, Junhui Hou, Jie Chen, Sam Kwong

Light field (LF) images acquired by hand-held devices usually suffer from low spatial resolution as the limited sampling resources have to be shared with the angular dimension. LF spatial super-resolution (SR) thus becomes an indispensable part of the LF camera processing pipeline. The high-dimensionality characteristic and complex geometrical structure of LF images make the problem more challenging than traditional single-image SR. The performance of existing methods is still limited as they fail to thoroughly explore the coherence among LF views and are insufficient in accurately preserving the parallax structure of the scene. In this paper, we propose a novel learning-based LF spatial SR framework, in which each view of an LF image is first individually super-resolved by exploring the complementary information among views with combinatorial geometry embedding. For accurate preservation of the parallax structure among the reconstructed views, a regularization network trained over a structure-aware loss function is subsequently appended to enforce correct parallax relationships over the intermediate estimation. Our proposed approach is evaluated over datasets with a large number of testing images including both synthetic and real-world scenes. Experimental results demonstrate the advantage of our approach over state-of-the-art methods, i.e., our method not only improves the average PSNR by more than 1.0 dB but also preserves more accurate parallax details, at a lower computational cost.

* This paper was accepted by CVPR 2020
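
For readers unfamiliar with the PSNR figure quoted in the abstract, the standard definition is sketched below for flattened images with a peak value of 255. The function name and the list-based image representation are assumptions made for illustration; the paper's evaluation protocol may differ in details such as colour channels or border cropping.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float], peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / mean squared error)."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)
```

Under this definition, a gain of 1.0 dB at a fixed peak value corresponds to reducing the mean squared error by a factor of about 1.26.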

Online Invariance Selection for Local Feature Descriptors

Jul 20, 2020
Rémi Pautrat, Viktor Larsson, Martin R. Oswald, Marc Pollefeys

To be invariant, or not to be invariant: that is the question formulated in this work about local descriptors. A limitation of current feature descriptors is the trade-off between generalization and discriminative power: more invariance means less informative descriptors. We propose to overcome this limitation with a disentanglement of invariance in local descriptors and with an online selection of the most appropriate invariance given the context. Our framework consists of jointly learning multiple local descriptors with different levels of invariance, together with meta descriptors encoding the regional variations of an image. The similarity of these meta descriptors across images is used to select the right invariance when matching the local descriptors. Our approach, named Local Invariance Selection at Runtime for Descriptors (LISRD), enables descriptors to adapt to adverse changes in images, while remaining discriminative when invariance is not required. We demonstrate that our method can boost the performance of current descriptors and outperforms state-of-the-art descriptors in several matching tasks, when evaluated on challenging datasets with day-night illumination as well as viewpoint changes.

* 27 pages, Accepted at ECCV 2020 (Oral)
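
One plausible reading of the online selection step is sketched below: the similarity of meta descriptors is turned into weights, and those weights blend the similarities of the corresponding local descriptors. The softmax weighting, the dot-product similarity, and the variant names are assumptions made for illustration and may not match the exact formulation in LISRD.

```python
import math
from typing import Dict, List

Descriptor = List[float]
DescriptorSet = Dict[str, Descriptor]  # one descriptor per invariance variant


def dot(a: Descriptor, b: Descriptor) -> float:
    return sum(x * y for x, y in zip(a, b))


def selection_weights(meta_a: DescriptorSet, meta_b: DescriptorSet) -> Dict[str, float]:
    """Softmax over meta descriptor similarities, one weight per variant."""
    similarities = {name: dot(meta_a[name], meta_b[name]) for name in meta_a}
    highest = max(similarities.values())
    exps = {name: math.exp(s - highest) for name, s in similarities.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}


def combined_similarity(
    local_a: DescriptorSet, local_b: DescriptorSet,
    meta_a: DescriptorSet, meta_b: DescriptorSet,
) -> float:
    """Blend local descriptor similarities using the selection weights."""
    weights = selection_weights(meta_a, meta_b)
    return sum(weights[name] * dot(local_a[name], local_b[name]) for name in weights)
```

The variant whose meta descriptors agree most across the two images receives the largest weight, so its local descriptors dominate the match score for that image pair.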

Learning from Multiple Datasets with Heterogeneous and Partial Labels for Universal Lesion Detection in CT

Sep 05, 2020
Ke Yan, Jinzheng Cai, Youjing Zheng, Adam P. Harrison, Dakai Jin, You-Bao Tang, Yu-Xing Tang, Lingyun Huang, Jing Xiao, Le Lu

Large-scale datasets with high-quality labels are desired for training accurate deep learning models. However, due to annotation costs, medical imaging datasets are often either partially-labeled or small. For example, DeepLesion is a large-scale CT image dataset with lesions of various types, but it also has many unlabeled lesions (missing annotations). When training a lesion detector on a partially-labeled dataset, the missing annotations will generate incorrect negative signals and degrade performance. Besides DeepLesion, there are several small single-type datasets, such as LUNA for lung nodules and LiTS for liver tumors. Such datasets have heterogeneous label scopes, i.e., different lesion types are labeled in different datasets with other types ignored. In this work, we aim to tackle the problem of heterogeneous and partial labels, and develop a universal lesion detection algorithm to detect a comprehensive variety of lesions. First, we build a simple yet effective lesion detection framework named Lesion ENSemble (LENS). LENS can efficiently learn from multiple heterogeneous lesion datasets in a multi-task fashion and leverage their synergy by feature sharing and proposal fusion. Next, we propose strategies to mine missing annotations from partially-labeled datasets by exploiting clinical prior knowledge and cross-dataset knowledge transfer. Finally, we train our framework on four public lesion datasets and evaluate it on 800 manually-labeled sub-volumes in DeepLesion. On this challenging task, our method brings a relative improvement of 49% compared to the current state-of-the-art approach.

* In submission
