"Image": models, code, and papers

Deep Decoding of $\ell_\infty$-coded Light Field Images

Jan 24, 2022
Muhammad Umair Mukati, Xi Zhang, Xiaolin Wu, Søren Forchhammer

To enrich the functionalities of traditional cameras, light field cameras record both the intensity and direction of light rays, so that images can be rendered with user-defined camera parameters via computations. The added capability and flexibility are gained at the cost of gathering typically more than $100\times$ as much information as conventional images. To cope with this issue, several light field compression schemes have been introduced. However, their ways of exploiting correlations in multidimensional light field data are complex and hence not suited for inexpensive light field cameras. In this work, we propose a novel $\ell_\infty$-constrained light-field image compression system that has a very low-complexity DPCM encoder and a CNN-based deep decoder. Targeting high-fidelity reconstruction, the CNN decoder capitalizes on the $\ell_\infty$-constraint and light field properties to remove the compression artifacts and achieves significantly better performance than existing state-of-the-art $\ell_2$-based light field compression methods.
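
As a rough illustration of what an $\ell_\infty$-constrained DPCM encoder can look like, the sketch below quantizes prediction residuals with step $2\tau+1$, which keeps every reconstructed sample within $\tau$ of the original. This is a generic near-lossless DPCM, not the authors' encoder: the previous-sample predictor, the initial predictor state of 0, and the names `dpcm_encode`, `dpcm_decode`, and `tau` are assumptions made for this example, and the CNN decoder is not shown.

```python
def dpcm_encode(samples, tau):
    """Encode integers with previous-sample prediction; per-sample error stays <= tau."""
    step = 2 * tau + 1              # uniform quantizer step for an error bound of tau
    indices = []
    prediction = 0                  # assumed initial predictor state
    for x in samples:
        residual = x - prediction
        # Round the residual to the nearest multiple of `step`, symmetric around zero.
        if residual >= 0:
            q = (residual + tau) // step
        else:
            q = -((-residual + tau) // step)
        indices.append(q)
        # Predict from the reconstructed value so encoder and decoder stay in sync.
        prediction = prediction + q * step
    return indices


def dpcm_decode(indices, tau):
    """Rebuild the samples by accumulating the dequantized residuals."""
    step = 2 * tau + 1
    out = []
    prediction = 0                  # must match the encoder's initial state
    for q in indices:
        prediction = prediction + q * step
        out.append(prediction)
    return out


pixels = [10, 12, 15, 11, 200, 198, 0, 255]   # made-up sample row
codes = dpcm_encode(pixels, tau=2)
recon = dpcm_decode(codes, tau=2)
max_error = max(abs(a - b) for a, b in zip(pixels, recon))
print(codes, recon, max_error)
```

Setting `tau=0` makes the scheme lossless. A larger `tau` shrinks the range of quantization indices while still bounding the worst-case error per sample, which is the property a decoder-side network could exploit.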

ADOP: Approximate Differentiable One-Pixel Point Rendering

Oct 13, 2021
Darius Rückert, Linus Franke, Marc Stamminger

We present a novel point-based, differentiable neural rendering pipeline for scene refinement and novel view synthesis. The inputs are an initial estimate of the point cloud and the camera parameters. The outputs are synthesized images from arbitrary camera poses. The point cloud rendering is performed by a differentiable renderer using multi-resolution one-pixel point rasterization. Spatial gradients of the discrete rasterization are approximated by the novel concept of ghost geometry. After rendering, the neural image pyramid is passed through a deep neural network for shading calculations and hole-filling. A differentiable, physically-based tonemapper then converts the intermediate output to the target image. Since all stages of the pipeline are differentiable, we optimize all of the scene's parameters, i.e., camera model, camera pose, point position, point color, environment map, rendering network weights, vignetting, camera response function, per-image exposure, and per-image white balance. We show that our system is able to synthesize sharper and more consistent novel views than existing approaches because the initial reconstruction is refined during training. The efficient one-pixel point rasterization allows us to use arbitrary camera models and display scenes with well over 100M points in real time.
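
To make the idea of one-pixel point rasterization concrete, here is a minimal single-resolution sketch in which each projected point writes to at most one pixel and a depth buffer keeps the nearest point. It is not the ADOP renderer: the pinhole camera, the function name `rasterize_points`, and the string "colors" are assumptions for illustration, and the multi-resolution pyramid, ghost-geometry gradients, and neural shading are not reproduced.

```python
import math


def rasterize_points(points, colors, focal, width, height):
    """Project 3D points with an assumed pinhole camera; one pixel per point."""
    depth = [[float("inf")] * width for _ in range(height)]
    image = [[None] * width for _ in range(height)]   # None marks a hole
    cx, cy = width / 2.0, height / 2.0                # assumed principal point
    for (x, y, z), color in zip(points, colors):
        if z <= 0:
            continue                                  # point is behind the camera
        u = math.floor(focal * x / z + cx)
        v = math.floor(focal * y / z + cy)
        # Depth test: keep only the closest point that lands on this pixel.
        if 0 <= u < width and 0 <= v < height and z < depth[v][u]:
            depth[v][u] = z
            image[v][u] = color
    return image, depth


points = [(0.0, 0.0, 2.0), (0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]   # made-up scene
colors = ["far", "near", "right"]
image, depth = rasterize_points(points, colors, focal=2.0, width=8, height=8)
print(image[4][4], image[4][6])
```

Pixels that no point hits stay `None`, which shows why a rasterizer of this kind leaves holes that a later stage has to fill.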

Hierarchical Image Classification using Entailment Cone Embeddings

Apr 02, 2020
Ankit Dhall, Anastasia Makarova, Octavian Ganea, Dario Pavllo, Michael Greeff, Andreas Krause

Image classification has been studied extensively, but there has been limited work on using unconventional, external guidance other than traditional image-label pairs for training. We present a set of methods for leveraging information about the semantic hierarchy embedded in class labels. We first inject label-hierarchy knowledge into an arbitrary CNN-based classifier and empirically show that the availability of such external semantic information, in conjunction with the visual semantics from images, boosts overall performance. Taking a step further in this direction, we model the label-label and label-image interactions more explicitly using order-preserving embeddings governed by both Euclidean and hyperbolic geometries, prevalent in natural language, and tailor them to hierarchical image classification and representation learning. We empirically validate all the models on the hierarchical ETHEC dataset.

Learning to Improve Image Compression without Changing the Standard Decoder

Sep 29, 2020
Yannick Strümpler, Ren Yang, Radu Timofte

In recent years we have witnessed an increasing interest in applying Deep Neural Networks (DNNs) to improve the rate-distortion performance in image compression. However, the existing approaches either train a post-processing DNN on the decoder side or propose learning for image compression in an end-to-end manner. As a result, the trained DNNs are required in the decoder, making these approaches incompatible with the standard image decoders (e.g., JPEG) in personal computers and mobile devices. Therefore, we propose learning to improve the encoding performance with the standard decoder. In this paper, we work on JPEG as an example. Specifically, a frequency-domain pre-editing method is proposed to optimize the distribution of DCT coefficients, aiming at facilitating the JPEG compression. Moreover, we propose learning the JPEG quantization table jointly with the pre-editing network. Most importantly, we do not modify the JPEG decoder, and therefore our approach is applicable when viewing images with the widely used standard JPEG decoder. The experiments validate that our approach successfully improves the rate-distortion performance of JPEG in terms of various quality metrics, such as PSNR, MS-SSIM and LPIPS. Visually, this translates to better overall color retention, especially when strong compression is applied. The code is available at https://github.com/YannickStruempler/LearnedJPEG.

* Accepted to ECCV AIM Workshop

Multi-relation Message Passing for Multi-label Text Classification

Feb 10, 2022
Muberra Ozmen, Hao Zhang, Pengyun Wang, Mark Coates

A well-known challenge associated with the multi-label classification problem is modelling dependencies between labels. Most attempts at modelling label dependencies focus on co-occurrences, ignoring the valuable information that can be extracted by detecting label subsets that rarely occur together. For example, consider customer product reviews; a product probably would not simultaneously be tagged by both "recommended" (i.e., reviewer is happy and recommends the product) and "urgent" (i.e., the review suggests immediate action to remedy an unsatisfactory experience). Aside from the consideration of positive and negative dependencies, the direction of a relationship should also be considered. For a multi-label image classification problem, the "ship" and "sea" labels have an obvious dependency, but the presence of the former implies the latter much more strongly than the other way around. These examples motivate the modelling of multiple types of bi-directional relationships between labels. In this paper, we propose a novel method, entitled Multi-relation Message Passing (MrMP), for the multi-label classification problem. Experiments on benchmark multi-label text classification datasets show that the MrMP module yields similar or superior performance compared to state-of-the-art methods. The approach imposes only minor additional computational and memory overheads.
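
The asymmetry in the "ship" and "sea" example can be made concrete by estimating conditional co-occurrence in both directions. The sketch below is only an illustration of directional label statistics, not the MrMP method: the function name `conditional_cooccurrence` and the toy label sets are invented for this example.

```python
def conditional_cooccurrence(label_sets, a, b):
    """Estimate P(b present | a present) from a list of per-sample label sets."""
    with_a = [labels for labels in label_sets if a in labels]
    if not with_a:
        return None                      # label `a` never occurs, estimate undefined
    return sum(1 for labels in with_a if b in labels) / len(with_a)


# Made-up annotations: every "ship" image also shows "sea", but not vice versa.
label_sets = [
    {"ship", "sea"},
    {"ship", "sea"},
    {"sea"},
    {"sea", "beach"},
    {"sea"},
    {"car"},
]

p_sea_given_ship = conditional_cooccurrence(label_sets, "ship", "sea")
p_ship_given_sea = conditional_cooccurrence(label_sets, "sea", "ship")
p_car_given_sea = conditional_cooccurrence(label_sets, "sea", "car")
print(p_sea_given_ship, p_ship_given_sea, p_car_given_sea)
```

On this toy data the estimate is high in one direction and much lower in the other, while the "sea" and "car" pair never co-occurs. These are the kinds of directional and negative relations the abstract argues a model should distinguish.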

Shape-consistent Generative Adversarial Networks for multi-modal Medical segmentation maps

Jan 24, 2022
Leo Segre, Or Hirschorn, Dvir Ginzburg, Dan Raviv

Image translation across domains for unpaired datasets has attracted growing interest and seen great improvement lately. In medical imaging, there are multiple imaging modalities with very different characteristics. Our goal is to use cross-modality adaptation between CT and MRI whole cardiac scans for semantic segmentation. We present a segmentation network using synthesized cardiac volumes for extremely limited datasets. Our solution is based on a 3D cross-modality generative adversarial network to share information between modalities and generate synthesized data using unpaired datasets. Our network utilizes semantic segmentation to improve generator shape consistency, thus creating more realistic synthesized volumes to be used when re-training the segmentation network. We show that improved segmentation can be achieved on small datasets when using spatial augmentations to improve a generative adversarial network. These augmentations improve the generator capabilities, thus enhancing the performance of the Segmentor. Using only 16 CT and 16 MRI cardiovascular volumes, improved results are shown over other segmentation methods while using the suggested architecture.

Multiscale Generative Models: Improving Performance of a Generative Model Using Feedback from Other Dependent Generative Models

Jan 24, 2022
Changyu Chen, Avinandan Bose, Shih-Fen Cheng, Arunesh Sinha

Realistic fine-grained multi-agent simulation of real-world complex systems is crucial for many downstream tasks such as reinforcement learning. Recent work has used generative models (GANs in particular) for providing high-fidelity simulation of real-world systems. However, such generative models are often monolithic and miss out on modeling the interaction in multi-agent systems. In this work, we take a first step towards building multiple interacting generative models (GANs) that reflect the interaction in the real world. We build and analyze a hierarchical set-up where a higher-level GAN is conditioned on the output of multiple lower-level GANs. We present a technique of using feedback from the higher-level GAN to improve the performance of lower-level GANs. We mathematically characterize the conditions under which our technique is impactful, including understanding the transfer learning nature of our set-up. We present three distinct experiments on synthetic data, time series data, and the image domain, revealing the wide applicability of our technique.

Panoptic Segmentation Meets Remote Sensing

Nov 23, 2021
Osmar Luiz Ferreira de Carvalho, Osmar Abílio de Carvalho Júnior, Cristiano Rosa e Silva, Anesmar Olino de Albuquerque, Nickolas Castro Santana, Dibio Leandro Borges, Roberto Arnaldo Trancoso Gomes, Renato Fontes Guimarães

Panoptic segmentation combines instance and semantic predictions, allowing the detection of "things" and "stuff" simultaneously. Effectively approaching panoptic segmentation in remotely sensed data can be promising for many challenging problems since it allows continuous mapping and specific target counting. Several difficulties have prevented the growth of this task in remote sensing: (a) most algorithms are designed for traditional images, (b) image labelling must encompass "things" and "stuff" classes, and (c) the annotation format is complex. Thus, aiming to solve and increase the operability of panoptic segmentation in remote sensing, this study has five objectives: (1) create a novel data preparation pipeline for panoptic segmentation, (2) propose annotation conversion software to generate panoptic annotations, (3) propose a novel dataset on urban areas, (4) modify Detectron2 for the task, and (5) evaluate difficulties of this task in the urban setting. We used an aerial image with a 0.24-meter spatial resolution considering 14 classes. Our pipeline considers three image inputs, and the proposed software uses point shapefiles for creating samples in the COCO format. Our study generated 3,400 samples with 512x512 pixel dimensions. We used the Panoptic-FPN with two backbones (ResNet-50 and ResNet-101), and the model evaluation considered semantic, instance, and panoptic metrics. We obtained 93.9, 47.7, and 64.9 for the mean IoU, box AP, and PQ, respectively. Our study presents the first effective pipeline for panoptic segmentation and an extensive database for other researchers to use and deal with other data or related problems requiring a thorough scene understanding.
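
For readers unfamiliar with the PQ metric reported above, the sketch below computes panoptic quality for one class under its usual definition: the sum of IoUs over matched segment pairs divided by the number of true positives plus half the false positives plus half the false negatives. The function name `panoptic_quality` and the input numbers are made up for illustration; the study's own evaluation code is not reproduced here, and the matching step (a predicted and a ground-truth segment are typically paired when their IoU exceeds 0.5) is assumed to have been done already.

```python
def panoptic_quality(matched_ious, num_fp, num_fn):
    """PQ for one class from matched-pair IoUs and unmatched segment counts."""
    num_tp = len(matched_ious)
    denominator = num_tp + 0.5 * num_fp + 0.5 * num_fn
    if denominator == 0:
        return 0.0                       # no segments of this class at all
    return sum(matched_ious) / denominator


# Made-up example: two matched segments, one spurious prediction, one missed object.
pq = panoptic_quality([0.8, 0.6], num_fp=1, num_fn=1)
print(round(100 * pq, 1))
```

A dataset-level score is commonly obtained by averaging the per-class values, so a figure such as 64.9 would be this quantity scaled to a percentage.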

* 43 pages, 10 figures, submitted to journal

Semi-automated Virtual Unfolded View Generation Method of Stomach from CT Volumes

Jan 14, 2022
Masahiro Oda, Tomoaki Suito, Yuichiro Hayashi, Takayuki Kitasaka, Kazuhiro Furukawa, Ryoji Miyahara, Yoshiki Hirooka, Hidemi Goto, Gen Iinuma, Kazunari Misawa, Shigeru Nawano, Kensaku Mori

CT image-based diagnosis of the stomach has been developed as a new diagnostic method. A virtual unfolded (VU) view is suitable for displaying the stomach wall. In this paper, we propose a semi-automated method for generating VU views of the stomach. Our method requires minimal manual operation. The determination of the unfolding forces and the termination of the unfolding process are automated. The unfolded shape of the stomach is estimated based on its radius. The unfolding forces are determined so that the stomach wall is deformed to the expected shape. The iterative deformation process is terminated when the difference between the deformed shape and the expected shape is small. Our experiments using 67 CT volumes showed that our proposed method can generate good VU views for 76.1% of cases.

* Published in Proceedings of MICCAI 2013, LNCS 8149, pp.332-339, 2013
* Accepted paper as a poster presentation at MICCAI 2013 (International Conference on Medical Image Computing and Computer-Assisted Intervention), Nagoya, Japan

Towards Instance-level Image-to-Image Translation

May 05, 2019
Zhiqiang Shen, Mingyang Huang, Jianping Shi, Xiangyang Xue, Thomas Huang

Unpaired image-to-image translation is an emerging and challenging vision problem that aims to learn a mapping between unaligned image pairs in diverse domains. Recent advances in this field like MUNIT and DRIT mainly focus on disentangling content and style/attribute from a given image first, then directly adopting the global style to guide the model to synthesize new domain images. However, this kind of approach incurs severe contradictions if the target domain images are content-rich with multiple discrepant objects. In this paper, we present a simple yet effective instance-aware image-to-image translation approach (INIT), which applies the fine-grained local (instance) and global styles to the target image spatially. The proposed INIT exhibits three important advantages: (1) the instance-level objective loss can help learn a more accurate reconstruction and incorporate diverse attributes of objects; (2) the styles used for the target domain of local/global areas are from corresponding spatial regions in the source domain, which intuitively is a more reasonable mapping; (3) the joint training process can benefit both fine and coarse granularity and incorporates instance information to improve the quality of global translation. We also collect a large-scale benchmark for the new instance-level translation task. We observe that our synthetic images can even benefit real-world vision tasks like generic object detection.

* Accepted to CVPR 2019. Project page: http://zhiqiangshen.com/projects/INIT/index.html
