
"photo": models, code, and papers

Lensless Compressive Sensing Imaging

Feb 07, 2013
Gang Huang, Hong Jiang, Kim Matthews, Paul Wilford

In this paper, we propose a lensless compressive sensing imaging architecture. The architecture consists of two components: an aperture assembly and a sensor. No lens is used. The aperture assembly consists of a two-dimensional array of aperture elements. The transmittance of each aperture element is independently controllable. The sensor is a single detection element, such as a single photo-conductive cell. Each aperture element together with the sensor defines a cone of a bundle of rays, and the cones of the aperture assembly define the pixels of an image. Each pixel value of an image is the integration of the bundle of rays in a cone. The sensor is used for taking compressive measurements. Each measurement is the integration of rays in the cones modulated by the transmittance of the aperture elements. A compressive sensing matrix is implemented by adjusting the transmittance of the individual aperture elements according to the values of the sensing matrix. The proposed architecture is simple and reliable because no lens is used. Furthermore, the sharpness of an image from our device is limited only by the resolution of the aperture assembly and is not affected by defocus blur. The architecture can be used for capturing images of visible light as well as other spectra such as infrared or millimeter waves. Such devices may be used in surveillance applications for detecting anomalies or extracting features such as the speed of moving objects. Multiple sensors may be used with a single aperture assembly to capture multi-view images simultaneously. A prototype was built using an LCD panel and a photoelectric sensor to capture images in the visible spectrum.
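The measurement model described above can be made concrete with a minimal sketch (not the authors' implementation): each row of a random binary sensing matrix plays the role of one transmittance pattern displayed on the aperture assembly, and the single sensor integrates the modulated rays into one scalar measurement. The pixel count and number of patterns below are assumed values for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_pixels = 32 * 32          # resolution of the aperture assembly (assumed)
n_measurements = 256        # number of transmittance patterns displayed (assumed)

# Hypothetical scene: pixel values are the per-cone ray integrals.
scene = rng.random(n_pixels)

# Random binary transmittance patterns (0 = opaque, 1 = transparent).
sensing_matrix = rng.integers(0, 2, size=(n_measurements, n_pixels)).astype(float)

# Each compressive measurement is the sensor reading for one displayed pattern.
measurements = sensing_matrix @ scene

# Image recovery would then solve this underdetermined system with a sparsity
# prior, e.g. basis pursuit or orthogonal matching pursuit (not shown here).
print(measurements.shape)   # (256,)
```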

* 12 pages, 13 figures 
  

Vision-Only Robot Navigation in a Neural Radiance World

Oct 01, 2021
Michal Adamkiewicz, Timothy Chen, Adam Caccavale, Rachel Gardner, Preston Culbertson, Jeannette Bohg, Mac Schwager

Neural Radiance Fields (NeRFs) have recently emerged as a powerful paradigm for the representation of natural, complex 3D scenes. NeRFs represent continuous volumetric density and RGB values in a neural network, and generate photo-realistic images from unseen camera viewpoints through ray tracing. We propose an algorithm for navigating a robot through a 3D environment represented as a NeRF using only an on-board RGB camera for localization. We assume the NeRF for the scene has been pre-trained offline, and the robot's objective is to navigate through unoccupied space in the NeRF to reach a goal pose. We introduce a trajectory optimization algorithm that avoids collisions with high-density regions in the NeRF based on a discrete time version of differential flatness that is amenable to constraining the robot's full pose and control inputs. We also introduce an optimization based filtering method to estimate 6DoF pose and velocities for the robot in the NeRF given only an onboard RGB camera. We combine the trajectory planner with the pose filter in an online replanning loop to give a vision-based robot navigation pipeline. We present simulation results with a quadrotor robot navigating through a jungle gym environment, the inside of a church, and Stonehenge using only an RGB camera. We also demonstrate an omnidirectional ground robot navigating through the church, requiring it to reorient to fit through the narrow gap. Videos of this work can be found at https://mikh3x4.github.io/nerf-navigation/ .
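As a rough sketch of the idea (not the paper's implementation), the trajectory optimizer can penalize waypoints that fall in high-density regions of the pre-trained NeRF; here `query_density` is a stand-in for the NeRF's volumetric density output, and the smoothness term is a simple proxy for the flatness-based dynamics constraints.

```python
import numpy as np

def query_density(points):
    """Placeholder for a pre-trained NeRF density query at 3D points.
    Here: a dummy spherical obstacle of radius 1 at the origin."""
    return np.exp(-np.maximum(np.linalg.norm(points, axis=-1) - 1.0, 0.0) * 10.0)

def collision_cost(waypoints, weight=10.0):
    """Sum of NeRF densities sampled at the trajectory waypoints."""
    return weight * query_density(waypoints).sum()

def smoothness_cost(waypoints):
    """Finite-difference penalty on accelerations (stand-in for dynamics costs)."""
    acc = waypoints[2:] - 2.0 * waypoints[1:-1] + waypoints[:-2]
    return (acc ** 2).sum()

# Evaluate the combined objective for a straight-line initial guess; an optimizer
# would iteratively deform the waypoints to reduce this cost.
start, goal = np.array([-3.0, 0.0, 0.0]), np.array([3.0, 0.0, 0.0])
waypoints = np.linspace(start, goal, 20)
print(collision_cost(waypoints) + smoothness_cost(waypoints))
```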

  

Image color correction, enhancement, and editing

Jul 28, 2021
Mahmoud Afifi

This thesis presents methods and approaches to image color correction, color enhancement, and color editing. To begin, we study the color correction problem from the standpoint of the camera's image signal processor (ISP). A camera's ISP is hardware that applies a series of in-camera image processing and color manipulation steps, many of which are nonlinear in nature, to render the initial sensor image to its final photo-finished representation saved in the 8-bit standard RGB (sRGB) color space. As white balance (WB) is one of the major procedures applied by the ISP for color correction, this thesis presents two different methods for ISP white balancing. Afterward, we discuss another scenario of correcting and editing image colors, where we present a set of methods to correct and edit WB settings for images that have been improperly white-balanced by the ISP. Then, we explore another factor that has a significant impact on the quality of camera-rendered colors, in which we outline two different methods to correct exposure errors in camera-rendered images. Lastly, we discuss post-capture auto color editing and manipulation. In particular, we propose auto image recoloring methods to generate different realistic versions of the same camera-rendered image with new colors. Through extensive evaluations, we demonstrate that our methods provide superior solutions compared to existing alternatives targeting color correction, color enhancement, and color editing.
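For context on the white-balance step, here is a classical gray-world baseline, shown purely to make the WB operation concrete; it is not one of the thesis' learned methods.

```python
import numpy as np

def gray_world_white_balance(image):
    """Scale each channel so its mean matches the global mean intensity.

    `image` is a float array of shape (H, W, 3) in linear RGB, values in [0, 1].
    """
    channel_means = image.reshape(-1, 3).mean(axis=0)
    gains = channel_means.mean() / np.maximum(channel_means, 1e-8)
    return np.clip(image * gains, 0.0, 1.0)
```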

* PhD dissertation 
  

Turbulence Enrichment using Physics-informed Generative Adversarial Networks

Mar 06, 2020
Akshay Subramaniam, Man Long Wong, Raunak D Borker, Sravya Nimmagadda, Sanjiva K Lele

Generative Adversarial Networks (GANs) have been widely used for generating photo-realistic images. A variant of GANs called super-resolution GAN (SRGAN) has already been used successfully for image super-resolution, where low-resolution images can be upsampled to a $4\times$ larger image that is perceptually more realistic. However, when such generative models are used for data describing physical processes, there are additional known constraints that the models must satisfy, including governing equations and boundary conditions. In general, these constraints may not be obeyed by the generated data. In this work, we develop physics-based methods for generative enrichment of turbulence. We incorporate a physics-informed learning approach through a modification to the loss function that minimizes the residuals of the governing equations for the generated data. We analyze two trained physics-informed models: a supervised model based on convolutional neural networks (CNN) and a generative model based on SRGAN, the Turbulence Enrichment GAN (TEGAN), and show that both outperform simple bicubic interpolation in turbulence enrichment. We also show that physics-informed learning significantly improves the models' ability to generate data that satisfies the physical governing equations. Finally, we examine the enriched data from TEGAN and show that it recovers statistical metrics of the flow field, including energy metrics as well as inter-scale energy dynamics and flow morphology.
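A minimal sketch of such a physics-informed residual term (an illustration under assumptions, not the TEGAN code at the linked repository): for incompressible turbulence, the continuity equation requires the generated velocity field to be divergence-free, so the mean squared divergence can be added to the generator loss.

```python
import numpy as np

def divergence_residual(u, v, w, dx=1.0):
    """Mean squared divergence of a generated 3D velocity field (u, v, w),
    each an array of shape (Nx, Ny, Nz), using central differences."""
    dudx = np.gradient(u, dx, axis=0)
    dvdy = np.gradient(v, dx, axis=1)
    dwdz = np.gradient(w, dx, axis=2)
    div = dudx + dvdy + dwdz
    return np.mean(div ** 2)

# Conceptually, the total generator objective then takes the form
# total_loss = adversarial_loss + content_loss + lambda_physics * divergence_residual(u, v, w)
```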

* for associated code, see https://github.com/akshaysubr/TEGAN 
  

Turbulence Enrichment using Generative Adversarial Networks

Mar 04, 2020
Akshay Subramaniam, Man Long Wong, Raunak D Borker, Sravya Nimmagadda, Sanjiva K Lele

Generative Adversarial Networks (GANs) have been widely used for generating photo-realistic images. A variant of GANs called super-resolution GAN (SRGAN) has already been used successfully for image super-resolution, where low-resolution images can be upsampled to a $4\times$ larger image that is perceptually more realistic. However, when such generative models are used for data describing physical processes, there are additional known constraints that the models must satisfy, including governing equations and boundary conditions. In general, these constraints may not be obeyed by the generated data. In this work, we develop physics-based methods for generative enrichment of turbulence. We incorporate a physics-informed learning approach through a modification to the loss function that minimizes the residuals of the governing equations for the generated data. We analyze two trained physics-informed models: a supervised model based on convolutional neural networks (CNN) and a generative model based on SRGAN, the Turbulence Enrichment GAN (TEGAN), and show that both outperform simple bicubic interpolation in turbulence enrichment. We also show that physics-informed learning significantly improves the models' ability to generate data that satisfies the physical governing equations. Finally, we examine the enriched data from TEGAN and show that it recovers statistical metrics of the flow field, including energy metrics as well as inter-scale energy dynamics and flow morphology.
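For the SRGAN-style $4\times$ upsampling mentioned above, a common building block is sub-pixel convolution; the sketch below shows the standard design, which may differ from the exact architecture the authors used for turbulence enrichment.

```python
import torch
import torch.nn as nn

class UpsampleBlock(nn.Module):
    """Doubles spatial resolution via sub-pixel convolution (PixelShuffle)."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels * 4, kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(2)   # rearranges channels into 2x2 spatial blocks
        self.act = nn.PReLU()

    def forward(self, x):
        return self.act(self.shuffle(self.conv(x)))

# Two blocks in sequence give the 4x upsampling typical of SRGAN generators.
upsampler = nn.Sequential(UpsampleBlock(64), UpsampleBlock(64))
print(upsampler(torch.randn(1, 64, 16, 16)).shape)   # torch.Size([1, 64, 64, 64])
```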

* for associated code, see https://github.com/akshaysubr/TEGAN 
  

A Study on Various State-of-the-Art Face Recognition Systems using Deep Learning Techniques

Nov 19, 2019
Sukhada Chokkadi, Sannidhan M S, Sudeepa K B, Abhir Bhandary

Given the existence of very large data repositories and access to advanced hardware, facial identification systems have evolved enormously over the past few decades. Sketch recognition is one of the most important of these areas and has become an integral component adopted by law enforcement agencies in current forensic science. Matching derived sketches to face photographs is also a difficult task, since the sketches are produced from the verbal descriptions given by eyewitnesses of the crime scene and, owing to natural human error, may lack subtle elements that exist in the photograph. A substantial amount of the research carried out in this area until recently relied on recognition systems built from traditional feature extraction and classification models. Very recently, however, a few research works have focused on deep learning techniques, taking advantage of learned models for feature extraction and classification to overcome potential domain challenges. The first part of this review focuses on deep learning techniques used in face recognition and matching, which have improved the accuracy of face recognition by training on huge data sets. The paper also includes a survey of techniques used to match composite sketches to face images, including the component-based representation approach, automatic composite sketch recognition, and others.

* International Journal of Advanced Trends in Computer Science and Engineering, 8(4), July- August 2019, 1590 
  

TileGen: Tileable, Controllable Material Generation and Capture

Jun 20, 2022
Xilong Zhou, Miloš Hašan, Valentin Deschaintre, Paul Guerrero, Kalyan Sunkavalli, Nima Kalantari

Recent methods (e.g. MaterialGAN) have used unconditional GANs to generate per-pixel material maps, or as a prior to reconstruct materials from input photographs. These models can generate varied random material appearance, but do not have any mechanism to constrain the generated material to a specific category or to control the coarse structure of the generated material, such as the exact brick layout on a brick wall. Furthermore, materials reconstructed from a single input photo commonly have artifacts and are generally not tileable, which limits their use in practical content creation pipelines. We propose TileGen, a generative model for SVBRDFs that is specific to a material category, always tileable, and optionally conditional on a provided input structure pattern. TileGen is a variant of StyleGAN whose architecture is modified to always produce tileable (periodic) material maps. In addition to the standard "style" latent code, TileGen can optionally take a condition image, giving a user direct control over the dominant spatial (and optionally color) features of the material. For example, in brick materials, the user can specify a brick layout and the brick color, or in leather materials, the locations of wrinkles and folds. Our inverse rendering approach can find a material perceptually matching a single target photograph by optimization. This reconstruction can also be conditional on a user-provided pattern. The resulting materials are tileable, can be larger than the target image, and are editable by varying the condition.
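One mechanism for enforcing tileability in a convolutional generator is circular (wrap-around) padding, so that feature maps, and hence the generated material maps, are periodic. The sketch below illustrates that idea only; the exact TileGen/StyleGAN modifications may differ.

```python
import torch
import torch.nn as nn

# A convolution whose padding wraps around, making its output periodic-friendly.
tileable_conv = nn.Conv2d(32, 32, kernel_size=3, padding=1, padding_mode="circular")

x = torch.randn(1, 32, 64, 64)
y = tileable_conv(x)

# With circular padding, circularly shifting the input shifts the output identically,
# which is the property that lets the final maps tile seamlessly.
shifted = torch.roll(x, shifts=16, dims=-1)
print(torch.allclose(tileable_conv(shifted), torch.roll(y, shifts=16, dims=-1), atol=1e-5))
```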

* 18 pages, 19 figures 
  

Learning Efficient Multi-Agent Cooperative Visual Exploration

Oct 12, 2021
Chao Yu, Xinyi Yang, Jiaxuan Gao, Huazhong Yang, Yu Wang, Yi Wu

We consider the task of visual indoor exploration with multiple agents, where the agents need to cooperatively explore the entire indoor region using as few steps as possible. Classical planning-based methods often suffer from particularly expensive computation at each inference step and a limited expressiveness of cooperation strategy. By contrast, reinforcement learning (RL) has become a trending paradigm for tackling this challenge due to its modeling capability of arbitrarily complex strategies and minimal inference overhead. We extend the state-of-the-art single-agent RL solution, Active Neural SLAM (ANS), to the multi-agent setting by introducing a novel RL-based global-goal planner, Spatial Coordination Planner (SCP), which leverages spatial information from each individual agent in an end-to-end manner and effectively guides the agents to navigate towards different spatial goals with high exploration efficiency. SCP consists of a transformer-based relation encoder to capture intra-agent interactions and a spatial action decoder to produce accurate goals. In addition, we also implement a few multi-agent enhancements to process local information from each agent for an aligned spatial representation and more precise planning. Our final solution, Multi-Agent Active Neural SLAM (MAANS), combines all these techniques and substantially outperforms 4 different planning-based methods and various RL baselines in the photo-realistic physical testbed, Habitat.
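A rough structural sketch of such a planner (an assumption about the high-level shape, not the MAANS code): a transformer-based relation encoder over per-agent feature vectors, followed by a spatial decoder that scores candidate global goals on a coarse grid.

```python
import torch
import torch.nn as nn

class SpatialCoordinationPlannerSketch(nn.Module):
    """Hypothetical planner head: relate agents, then score goal cells per agent."""
    def __init__(self, feat_dim=128, grid_size=8, num_heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=num_heads, batch_first=True)
        self.relation_encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.goal_decoder = nn.Linear(feat_dim, grid_size * grid_size)

    def forward(self, agent_features):
        # agent_features: (batch, num_agents, feat_dim), e.g. encoded local maps and poses.
        relations = self.relation_encoder(agent_features)
        return self.goal_decoder(relations)   # per-agent logits over goal grid cells

planner = SpatialCoordinationPlannerSketch()
logits = planner(torch.randn(1, 3, 128))       # 3 agents
goals = logits.argmax(dim=-1)                  # chosen grid cell per agent
print(goals.shape)                             # torch.Size([1, 3])
```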

* First three authors share equal contribution 
  

Photon-Driven Neural Path Guiding

Oct 05, 2020
Shilin Zhu, Zexiang Xu, Tiancheng Sun, Alexandr Kuznetsov, Mark Meyer, Henrik Wann Jensen, Hao Su, Ravi Ramamoorthi

Although Monte Carlo path tracing is a simple and effective algorithm to synthesize photo-realistic images, it is often very slow to converge to noise-free results when involving complex global illumination. One of the most successful variance-reduction techniques is path guiding, which can learn better distributions for importance sampling to reduce pixel noise. However, previous methods require a large number of path samples to achieve reliable path guiding. We present a novel neural path guiding approach that can reconstruct high-quality sampling distributions for path guiding from a sparse set of samples, using an offline trained neural network. We leverage photons traced from light sources as the input for sampling density reconstruction, which is highly effective for challenging scenes with strong global illumination. To fully make use of our deep neural network, we partition the scene space into an adaptive hierarchical grid, in which we apply our network to reconstruct high-quality sampling distributions for any local region in the scene. This allows for highly efficient path guiding for any path bounce at any location in path tracing. We demonstrate that our photon-driven neural path guiding method can generalize well on diverse challenging testing scenes that are not seen in training. Our approach achieves significantly better rendering results of testing scenes than previous state-of-the-art path guiding methods.
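The underlying idea can be illustrated without the neural reconstruction: photons arriving near a shading point are binned into a directional histogram, which is normalized into a distribution used to importance-sample the outgoing direction. The sketch below is a classical, non-neural stand-in for the learned reconstruction.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_guiding_distribution(photon_dirs, n_theta=8, n_phi=16):
    """Bin unit photon directions into a (theta, phi) histogram and normalize it."""
    theta = np.arccos(np.clip(photon_dirs[:, 2], -1.0, 1.0))
    phi = np.arctan2(photon_dirs[:, 1], photon_dirs[:, 0]) + np.pi
    hist, _, _ = np.histogram2d(theta, phi, bins=(n_theta, n_phi),
                                range=[[0.0, np.pi], [0.0, 2.0 * np.pi]])
    hist += 1e-3                      # regularize empty bins so every direction stays sampleable
    return hist / hist.sum()

def sample_direction_bin(pdf_grid):
    """Pick a (theta, phi) bin proportionally to the reconstructed distribution."""
    flat_index = rng.choice(pdf_grid.size, p=pdf_grid.ravel())
    return np.unravel_index(flat_index, pdf_grid.shape)

# Hypothetical photons: random unit directions standing in for traced photons.
photons = rng.normal(size=(500, 3))
photons /= np.linalg.norm(photons, axis=1, keepdims=True)
pdf = build_guiding_distribution(photons)
print(sample_direction_bin(pdf))
```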

* Keywords: computer graphics, rendering, path tracing, path guiding, machine learning, neural networks, denoising, reconstruction 
  

Ensemble Transfer Learning for Emergency Landing Field Identification on Moderate Resource Heterogeneous Kubernetes Cluster

Jun 26, 2020
Andreas Klos, Marius Rosenbaum, Wolfram Schiffmann

The full loss of thrust of an aircraft requires fast and reliable decisions by the pilot. If no published landing field is within reach, an emergency landing field must be selected. The choice of a suitable emergency landing field is a crucial task to avoid unnecessary damage to the aircraft and risk to the civil population as well as the crew and all passengers on board. Especially in the case of instrument meteorological conditions, it is indispensable to use a database of suitable emergency landing fields. Thus, based on publicly available digital orthographic photos and digital surface models, we created various datasets with different sample sizes to facilitate training and testing of neural networks. Each dataset consists of a set of data layers. The best compositions of these data layers as well as the best performing transfer learning models are selected. Subsequently, certain hyperparameters of the chosen models for each sample size are optimized with Bayesian and Bandit optimization. The hyperparameter tuning is performed on a self-built Kubernetes cluster. The models' outputs were investigated with respect to the input data using layer-wise relevance propagation. From the optimized models, we created an ensemble model to improve the segmentation performance. Finally, an area around the airport of Arnsberg in North Rhine-Westphalia was segmented and emergency landing fields were identified, while verification of the final approach's obstacle clearance is left unconsidered. These emergency landing fields are stored in a PostgreSQL database.
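A small sketch of the ensembling step, under the assumption that it averages the per-pixel class probabilities of the individually optimized segmentation models (the paper's exact combination rule may differ):

```python
import numpy as np

def ensemble_segmentation(probability_maps):
    """Average per-model probability maps of shape (n_models, H, W, n_classes)
    and return the per-pixel argmax class map."""
    mean_probs = np.mean(probability_maps, axis=0)
    return np.argmax(mean_probs, axis=-1)

# Example with 3 hypothetical models, a 4x4 tile, and 2 classes (landing field / not).
rng = np.random.default_rng(0)
maps = rng.random((3, 4, 4, 2))
maps /= maps.sum(axis=-1, keepdims=True)
print(ensemble_segmentation(maps))
```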

  