Underwater 3D reconstruction is challenging because light refracts at the water-air interface (most electronic devices cannot be submerged directly in water). In this paper, we present an underwater 3D reconstruction solution using light field cameras. We first develop a light field camera calibration algorithm that simultaneously estimates the camera parameters and the geometry of the water-air interface. We then design a novel depth estimation algorithm for 3D reconstruction. Specifically, we match correspondences on the curved epipolar lines caused by water refraction. We also observe that view-dependent specular reflection is very weak in underwater environments, so the angularly sampled rays in the light field have approximately uniform intensity. We therefore propose an angular uniformity constraint for depth optimization. We also develop a fast algorithm for locating the angular patches in the presence of non-linear light paths. Extensive synthetic and real experiments demonstrate that our method performs underwater 3D reconstruction with high accuracy.
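The depth search combines two steps. First, each ray is bent at the interface using Snell's law. Second, each depth hypothesis is scored by how uniform the intensities in its angular patch are. The following is a minimal sketch under two assumptions. The interface is treated as locally planar with a known normal, whereas the paper estimates the interface geometry during calibration. The angular samples for each hypothesis are also assumed to have already been gathered along the refracted rays. All function names are hypothetical.

```python
import numpy as np

def refract(d, n, eta):
    """Refract unit direction d at an interface with unit normal n (vector Snell's law).

    n must point back toward the incoming side (dot(n, d) < 0);
    eta = n1 / n2 is the ratio of refractive indices.
    Returns the refracted unit direction, or None on total internal reflection.
    """
    d = np.asarray(d, dtype=float)
    n = np.asarray(n, dtype=float)
    cos_i = -np.dot(n, d)
    sin2_t = eta ** 2 * (1.0 - cos_i ** 2)
    if sin2_t > 1.0:
        return None  # total internal reflection: no transmitted ray
    cos_t = np.sqrt(1.0 - sin2_t)
    return eta * d + (eta * cos_i - cos_t) * n

def angular_uniformity_cost(samples):
    """Intensity variance of one angular patch; low when the point is Lambertian-like."""
    return float(np.var(samples))

def select_depth(depths, angular_samples):
    """Return the depth hypothesis whose angular patch is most uniform.

    angular_samples[i] holds the intensities gathered (after tracing the
    refracted rays) for the scene point hypothesized at depths[i].
    """
    costs = [angular_uniformity_cost(s) for s in angular_samples]
    return depths[int(np.argmin(costs))]
```

For a camera in air looking into water, eta would be roughly 1.0/1.33 (ignoring any housing glass). A real pipeline would add the correspondence term along the curved epipolar lines to this cost.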
We present a novel Relightable Neural Renderer (RNR) for simultaneous view synthesis and relighting using multi-view image inputs. Existing neural rendering (NR) methods do not explicitly model the physical rendering process and hence have limited relighting capability. RNR instead models image formation in terms of environment lighting, object intrinsic attributes, and the light transport function (LTF), each corresponding to a learnable component. In particular, incorporating a physically based rendering process not only enables relighting but also improves the quality of novel view synthesis. Comprehensive experiments on synthetic and real data show that RNR provides a practical and effective solution for free-viewpoint relighting.
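Separating lighting from transport is what makes relighting possible: the same intrinsic attributes and transport can be re-rendered under a new environment map. The following toy sketch illustrates the idea under a strong assumption. The LTF is taken to be a fixed per-pixel weight matrix over K sampled light directions, whereas in RNR it is a learned component. The function name and array shapes are hypothetical.

```python
import numpy as np

def render(albedo, ltf, env_light):
    """Toy linear image formation: color = albedo * (transport @ lighting).

    albedo:    (P, 3) per-pixel RGB intrinsic reflectance
    ltf:       (P, K) per-pixel light transport weights for K light directions
    env_light: (K, 3) RGB radiance of the environment map at K sampled directions
    returns:   (P, 3) rendered pixel colors
    """
    albedo = np.asarray(albedo, dtype=float)
    ltf = np.asarray(ltf, dtype=float)
    env_light = np.asarray(env_light, dtype=float)
    return albedo * (ltf @ env_light)
```

To relight, call `render` with a different `env_light` while keeping `albedo` and `ltf` fixed. Because this toy model is linear in the lighting, rendering under the sum of two environment maps equals the sum of the two renderings.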
Recovering the shape and reflectance of non-Lambertian surfaces remains a challenging problem in computer vision, since view-dependent appearance invalidates the traditional photo-consistency constraint. In this paper, we introduce a novel concentric multi-spectral light field (CMSLF) design that recovers the shape and reflectance of surfaces with arbitrary materials in one shot. Our CMSLF system consists of an array of cameras arranged on concentric circles, where each ring captures a specific spectrum. Coupled with a multi-spectral ring light, the system samples viewpoint and lighting variations in a single shot via spectral multiplexing. We further show that this concentric camera/light configuration produces a unique pattern of specular changes across views that enables robust depth estimation. We formulate a physically based reflectance model on the CMSLF to estimate the depth and a multi-spectral reflectance map without imposing any surface prior. Extensive synthetic and real experiments show that our method outperforms state-of-the-art light field-based techniques, especially in non-Lambertian scenes.
Particle Image Velocimetry (PIV) estimates the flow of a fluid by analyzing the motion of injected particles. The problem is challenging because the particles lie at different depths but have similar appearances, and tracking a large number of particles is particularly difficult. In this paper, we present a PIV solution that uses a densely sampled light field to reconstruct and track 3D particles. We exploit the refocusing capability and the focal symmetry constraint of the light field for reliable particle depth estimation. We further propose a new motion-constrained optical flow estimation scheme that enforces local motion rigidity and the Navier-Stokes constraint. Comprehensive experiments on synthetic and real data show that, using a single light field camera, our technique can recover dense and accurate 3D fluid flows in small to medium volumes.
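The abstract does not state the exact form of the Navier-Stokes constraint. One common physical prior derived from the incompressible Navier-Stokes equations is mass conservation, ∇·u = 0. The following sketch computes a divergence penalty that a flow estimator could add to its objective. This is an assumption for illustration, not the paper's formulation, and the function names are hypothetical.

```python
import numpy as np

def divergence(u, v, w, spacing=1.0):
    """Finite-difference divergence of a 3D velocity field on a regular grid.

    np.gradient uses central differences in the interior and one-sided
    differences at the boundary. u, v, w are the x, y, z velocity components,
    indexed [x, y, z] (e.g. built with np.meshgrid(..., indexing="ij")).
    """
    return (np.gradient(u, spacing, axis=0)
            + np.gradient(v, spacing, axis=1)
            + np.gradient(w, spacing, axis=2))

def incompressibility_penalty(u, v, w, spacing=1.0):
    """Mean squared divergence; zero for a divergence-free (incompressible) flow."""
    return float(np.mean(divergence(u, v, w, spacing) ** 2))
```

A rigid rotation such as (u, v, w) = (-y, x, 0) incurs no penalty. A uniform expansion (x, y, z) has divergence 3 everywhere and is penalized.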
We present a new color photometric stereo (CPS) method that can recover high-quality, detailed 3D face geometry in a single shot. Our system uses three uncalibrated near point lights of different colors and a single camera. We first use a 3D morphable model (3DMM) and semantic segmentation of facial parts to achieve robust self-calibration of the light sources. We then address the spectral ambiguity problem by incorporating albedo consensus, albedo similarity, and a proxy prior into a unified framework. We avoid the need for spatially constant albedo and use a new measure of albedo similarity based on the albedo norm profile. Experiments show that our approach produces state-of-the-art results from a single image, with high-fidelity geometry that includes details such as wrinkles.
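Under a simplified Lambertian model with distant lights, each color channel c records I_c = ρ·s_c·(l_c·n). Here l_c is the direction of the light of that color, ρ is the albedo magnitude, and s_c is the albedo chromaticity for that channel. If s is known, the per-pixel normal follows from a 3x3 linear solve. Determining s is the spectral ambiguity the paper resolves with albedo consensus, albedo similarity, and a proxy prior.

This sketch assumes known distant light directions, no crosstalk between channels, and no shadows. The paper instead uses uncalibrated near point lights with self-calibration. The function name is hypothetical.

```python
import numpy as np

def color_ps_normal(intensity, light_dirs, chroma=(1.0, 1.0, 1.0)):
    """Recover a unit normal and albedo magnitude from one RGB pixel.

    intensity:  (3,) observed RGB values, one channel per colored light
    light_dirs: (3, 3) row c is the unit direction of the light seen in channel c
    chroma:     (3,) assumed per-channel albedo chromaticity s_c
    """
    # Divide out chromaticity: I_c / s_c = rho * (l_c . n) = l_c . g, with g = rho * n
    scaled = np.asarray(intensity, dtype=float) / np.asarray(chroma, dtype=float)
    g = np.linalg.solve(np.asarray(light_dirs, dtype=float), scaled)
    rho = np.linalg.norm(g)
    return g / rho, rho
```

The three light directions must be linearly independent for the solve to succeed. In practice, the solve would be applied per pixel over the whole face region.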
Neural network (NN) accelerators built on emerging ReRAM (resistive random access memory) technologies have been investigated as a promising solution to the \textit{memory wall} challenge, owing to the unique \textit{processing-in-memory} capability of ReRAM-crossbar-based processing elements (PEs). However, the high efficiency and high density of ReRAM have not been fully exploited because of the huge communication demands among PEs and the overhead of peripheral circuits. In this paper, we propose a full-stack system solution consisting of a reconfigurable architecture, the Field Programmable Synapse Array (FPSA), and its software system, which includes a neural synthesizer, a temporal-to-spatial mapper, and a placement & routing tool. We rely heavily on the software system to keep the hardware design compact and efficient. To meet the high communication demand, we use a reconfigurable routing architecture together with the placement & routing tool. To improve computational density, we greatly simplify the PE circuit with a spiking scheme and then use the neural synthesizer to let these high-density computation resources support different kinds of NN operations. In addition, we provide spiking memory blocks (SMBs) and configurable logic blocks (CLBs) in hardware, and the temporal-to-spatial mapper uses them to balance the storage and computation requirements of NNs. Owing to the end-to-end software system, existing deep neural networks can be deployed to FPSA efficiently. Evaluations show that, compared to PRIME, a state-of-the-art ReRAM-based NN accelerator, FPSA improves computational density by 31x, and for representative NNs its inference performance achieves up to 1000x speedup.
The intensive computation and memory requirements of generative adversarial networks (GANs) hinder their real-world deployment on edge devices such as smartphones. Despite the success of model reduction for CNNs, neural network quantization methods have not yet been studied for GANs. For GANs, the main difficulties are the effectiveness of the quantization algorithm and the instability of GAN training. In this paper, we start with an extensive study of applying existing successful quantization methods to GANs. Our observations reveal that none of them generates samples of reasonable quality, because the quantized values underrepresent the model weights. We also find that the generator and discriminator networks show different sensitivities to quantization. Motivated by these observations, we develop a novel quantization method for GANs based on the expectation-maximization (EM) algorithm, named QGAN. We also propose a multi-precision algorithm that helps find the optimal number of bits for quantized GAN models together with the corresponding result quality. Experiments on CIFAR-10 and CelebA show that QGAN can quantize GANs to even 1-bit or 2-bit representations while producing results of quality comparable to the original models.
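EM-style quantization alternates between assigning each weight to a quantization level and refitting the levels to the weights assigned to them. The following is a minimal sketch of that idea for a linear quantizer w ≈ α·q + β with integer codes q in [0, 2^bits - 1]. It uses hard assignments (a Lloyd/k-means-style variant of EM) and is an illustration of the general technique, not necessarily QGAN's exact formulation. The function name is hypothetical.

```python
import numpy as np

def em_linear_quantize(w, bits, iters=20):
    """Fit a linear quantizer w ~ alpha * q + beta by alternating minimization.

    Returns (dequantized weights, integer codes, alpha, beta).
    """
    w = np.asarray(w, dtype=float).ravel()
    levels = 2 ** bits
    # Initialize with levels spread uniformly over the weight range
    beta = w.min()
    span = w.max() - w.min()
    alpha = span / (levels - 1) if span > 0 else 1.0
    for _ in range(iters):
        # E-step (hard assignment): nearest level for each weight
        q = np.clip(np.round((w - beta) / alpha), 0, levels - 1)
        if q.max() == q.min():
            break  # degenerate: all weights share one code
        # M-step: least-squares refit of scale and offset given the codes
        alpha, beta = np.polyfit(q, w, 1)
    q = np.clip(np.round((w - beta) / alpha), 0, levels - 1)
    return alpha * q + beta, q.astype(int), alpha, beta
```

Each step can only lower (or keep) the squared reconstruction error. Running the fit for several values of `bits` and comparing result quality is one way to explore the precision trade-off that the multi-precision algorithm targets.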
Neural network (NN) trojaning attacks are an emerging and important attack model that can broadly damage systems deployed with NN models. Existing studies have explored the outsourced training and transfer learning attack scenarios on small, domain-specific datasets, with a limited number of fixed target classes. In this paper, we propose a more powerful trojaning attack method for both outsourced training and transfer learning attacks, which outperforms existing studies in capability, generality, and stealthiness. First, the attack is programmable: the malicious misclassification target is not fixed and can be generated on demand, even after the victim's deployment. Second, our trojan attack is not limited to a small domain; one trojaned model trained on a large-scale dataset can affect applications in different domains that reuse its general features. Third, our trojan is hard to detect or eliminate, even if the victim fine-tunes the whole model.
Crossbar-based devices have been widely adopted in neural network accelerators because of their high efficiency on vector-matrix multiplication (VMM) operations. However, for convolutional neural networks (CNNs), this efficiency is dramatically compromised by the large amount of data reuse. Although some mapping methods have been designed to balance execution throughput and resource overhead, the resource cost of maintaining throughput remains huge. Network pruning is a promising and widely studied technique for shrinking model size. However, previous pruning work did not consider the crossbar architecture and the corresponding mapping method, so it cannot be directly used by crossbar-based neural network accelerators. Tightly combining the crossbar structure and its mapping, this paper proposes a crossbar-aware pruning framework based on a formulated L0-norm constrained optimization problem. Specifically, we design an L0-norm constrained gradient descent (LGD) with relaxant probabilistic projection (RPP) to solve this problem. Two grains of sparsity are achieved: i) intuitive crossbar-grain sparsity and ii) column-grain sparsity with output recombination. Building on these, we further propose an input feature map (FM) reordering method to improve model accuracy. We evaluate our crossbar-aware pruning framework on the medium-scale CIFAR10 dataset and the large-scale ImageNet dataset with VGG and ResNet models. Our method reduces the crossbar overhead by 44%-72% with little accuracy degradation. This work substantially reduces resource usage and the related energy cost, providing a new co-design solution for mapping CNNs onto various crossbar devices with significantly higher efficiency.
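The crossbar-grain L0 constraint limits how many crossbar-sized blocks of the weight matrix may be nonzero. The following is a minimal sketch of one projected gradient descent step. It assumes the weight matrix is already laid out in crossbar tiles whose dimensions divide the matrix shape. It also uses a hard projection that keeps the k blocks with the largest Frobenius norm. This hard projection is a simplification standing in for the paper's relaxant probabilistic projection (RPP), which the abstract does not detail. All names are hypothetical.

```python
import numpy as np

def project_l0_crossbar(W, xbar_rows, xbar_cols, k):
    """Keep the k crossbar blocks with the largest Frobenius norm; zero the rest.

    Returns the projected matrix and a boolean (row_blocks, col_blocks) mask.
    """
    W = np.asarray(W, dtype=float)
    R, C = W.shape
    assert R % xbar_rows == 0 and C % xbar_cols == 0, "matrix must tile evenly"
    # View W as a grid of blocks: blocks[rb, r, cb, c] == W[rb*xbar_rows + r, cb*xbar_cols + c]
    blocks = W.reshape(R // xbar_rows, xbar_rows, C // xbar_cols, xbar_cols)
    norms = np.sqrt((blocks ** 2).sum(axis=(1, 3)))
    keep = np.zeros(norms.size, dtype=bool)
    keep[np.argsort(norms, axis=None)[::-1][:k]] = True  # indices of the top-k blocks
    mask = keep.reshape(norms.shape)
    projected = blocks * mask[:, None, :, None]
    return projected.reshape(R, C), mask

def lgd_step(W, grad, lr, xbar_rows, xbar_cols, k):
    """One L0-constrained gradient step: ordinary descent, then block projection."""
    return project_l0_crossbar(W - lr * grad, xbar_rows, xbar_cols, k)
```

Every zeroed block corresponds to a crossbar that no longer needs to be allocated, which is where the resource savings come from. The column-grain variant would apply the same top-k idea to individual crossbar columns instead of whole blocks.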