Xingguang Zhang

Physics-Driven Turbulence Image Restoration with Stochastic Refinement

Jul 20, 2023
Ajay Jaiswal, Xingguang Zhang, Stanley H. Chan, Zhangyang Wang

Image distortion caused by atmospheric turbulence is a stochastic degradation and a critical problem in long-range optical imaging systems. A large body of research has been conducted over the past decades, including model-based methods and emerging deep-learning solutions aided by synthetic data. Although fast, physics-grounded simulation tools have recently been introduced to help deep-learning models adapt to real-world turbulence conditions, the training of such models relies only on pairs of synthetic data and ground truth. This paper proposes the Physics-integrated Restoration Network (PiRN), which brings the physics-based simulator directly into the training process to help the network disentangle the stochasticity from the degradation and the underlying image. Furthermore, to overcome the ``average effect'' introduced by deterministic models and the domain gap between synthetic and real-world degradation, we introduce PiRN with Stochastic Refinement (PiRN-SR) to boost perceptual quality. Overall, PiRN and PiRN-SR improve generalization to real-world unknown turbulence conditions and provide state-of-the-art restoration in both pixel-wise accuracy and perceptual quality. Our code is available at \url{https://github.com/VITA-Group/PiRN}.
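
A minimal sketch of the simulator-in-the-loop training idea described above: a turbulence simulator degrades clean images on the fly and the restoration network is supervised against the clean frames. Everything here is an illustrative assumption — the toy degradation and the tiny backbone stand in for PiRN's actual physics-based simulator and architecture.

```python
# Sketch of simulator-in-the-loop training (PyTorch assumed; the degradation
# below is a toy stand-in for the physics-based turbulence simulator).
import torch
import torch.nn as nn
import torch.nn.functional as F

def toy_turbulence(clean: torch.Tensor) -> torch.Tensor:
    """Hypothetical degradation: random blur mix, global tilt, and sensor noise."""
    b, c, h, w = clean.shape
    k = torch.rand(b, 1, 1, 1, device=clean.device) * 0.5 + 0.5   # blur strength
    blur_kernel = torch.ones(c, 1, 5, 5, device=clean.device) / 25.0
    blurred = F.conv2d(clean, blur_kernel, padding=2, groups=c)
    degraded = k * blurred + (1 - k) * clean
    shift = torch.randint(-2, 3, (2,))                             # crude "tilt"
    degraded = torch.roll(degraded, shifts=(int(shift[0]), int(shift[1])), dims=(2, 3))
    return degraded + 0.01 * torch.randn_like(degraded)

class TinyRestorer(nn.Module):
    """Placeholder restoration backbone (PiRN itself is far larger)."""
    def __init__(self, ch=3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, ch, 3, padding=1),
        )
    def forward(self, x):
        return x + self.body(x)          # residual prediction

model = TinyRestorer()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
for step in range(100):                      # training loop sketch
    clean = torch.rand(4, 3, 64, 64)         # stands in for ground-truth images
    degraded = toy_turbulence(clean)         # the simulator runs inside the loop
    loss = F.l1_loss(model(degraded), clean) # pixel-wise fidelity loss
    opt.zero_grad(); loss.backward(); opt.step()
```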

* Accepted by ICCV 2023 

FarSight: A Physics-Driven Whole-Body Biometric System at Large Distance and Altitude

Jun 29, 2023
Feng Liu, Ryan Ashbaugh, Nicholas Chimitt, Najmul Hassan, Ali Hassani, Ajay Jaiswal, Minchul Kim, Zhiyuan Mao, Christopher Perry, Zhiyuan Ren, Yiyang Su, Pegah Varghaei, Kai Wang, Xingguang Zhang, Stanley Chan, Arun Ross, Humphrey Shi, Zhangyang Wang, Anil Jain, Xiaoming Liu

Whole-body biometric recognition is an important area of research due to its vast applications in law enforcement, border security, and surveillance. This paper presents the end-to-end design, development, and evaluation of FarSight, an innovative software system for whole-body (fusion of face, gait, and body shape) biometric recognition. FarSight accepts videos from elevated platforms and drones as input and outputs a candidate list of identities from a gallery. The system is designed to address several challenges, including (i) low-quality imagery, (ii) large yaw and pitch angles, (iii) robust feature extraction to accommodate large intra-person variabilities and large inter-person similarities, and (iv) the large domain gap between training and test sets. FarSight combines the physics of imaging and deep learning models to enhance image restoration and biometric feature encoding. We test FarSight's effectiveness using the newly acquired IARPA Biometric Recognition and Identification at Altitude and Range (BRIAR) dataset. Notably, FarSight demonstrates a substantial performance increase on the BRIAR dataset, with gains of +11.82% in Rank-20 identification and +11.3% in TAR@1% FAR.
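
The fusion of face, gait, and body-shape cues is described only at a high level above; a generic score-level fusion sketch (weights, score ranges, and the fusion rule are assumptions for illustration, not FarSight's actual design) might look as follows.

```python
# Generic score-level fusion sketch for face / gait / body-shape matchers
# (normalization and weights are illustrative assumptions).
import numpy as np

def fuse_scores(scores, weights):
    """Min-max normalize each modality's gallery scores, then sum with weights."""
    fused = np.zeros_like(next(iter(scores.values())), dtype=float)
    for modality, s in scores.items():
        s = (s - s.min()) / (s.max() - s.min() + 1e-8)
        fused += weights[modality] * s
    return fused

gallery_size = 1000
scores = {m: np.random.rand(gallery_size) for m in ("face", "gait", "shape")}
weights = {"face": 0.5, "gait": 0.3, "shape": 0.2}   # assumed weights
ranked = np.argsort(-fuse_scores(scores, weights))    # candidate list, best match first
print(ranked[:20])                                    # e.g., a Rank-20 candidate list
```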

* 11 pages, 7 figures 

HDR Imaging with Spatially Varying Signal-to-Noise Ratios

Apr 16, 2023
Yiheng Chi, Xingguang Zhang, Stanley H. Chan

While today's high dynamic range (HDR) image fusion algorithms are capable of blending multiple exposures, the acquisition is often controlled so that the dynamic range within one exposure is narrow. For HDR imaging in photon-limited situations, the dynamic range can be enormous and the noise within one exposure is spatially varying. Existing image denoising algorithms and HDR fusion algorithms both fail to handle this situation, leading to severe limitations in low-light HDR imaging. This paper presents two contributions. First, we identify the source of the problem: the co-existence of (1) a spatially varying signal-to-noise ratio, especially the excessive noise in very dark regions, and (2) a wide luminance range within each exposure. We show that while the issue can be handled by a bank of denoisers, the complexity is high. Second, we propose a new method called the spatially varying high dynamic range (SV-HDR) fusion network to simultaneously denoise and fuse images. We introduce a new exposure-shared block within our custom-designed multi-scale transformer framework. In a variety of testing conditions, the proposed SV-HDR outperforms existing methods.
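
To make the "spatially varying signal-to-noise ratio" concrete, here is a small sketch that computes a per-pixel SNR map under an assumed Poisson-Gaussian sensor model and turns it into a per-pixel denoising strength; the sensor parameters and mapping are illustrative assumptions, not values from the paper.

```python
# Sketch: per-pixel SNR under an assumed Poisson-Gaussian sensor model,
# used to vary denoising strength spatially (illustration only).
import numpy as np

def snr_map(photon_flux: np.ndarray, read_noise_std: float = 2.0) -> np.ndarray:
    """SNR = mean signal / noise std, with shot-noise variance equal to the signal."""
    noise_std = np.sqrt(photon_flux + read_noise_std ** 2)
    return photon_flux / np.maximum(noise_std, 1e-8)

# A scene whose brightness spans orders of magnitude within a single exposure.
flux = np.concatenate([np.full((64, 32), 2.0),      # very dark, photon-limited region
                       np.full((64, 32), 2000.0)],  # bright region
                      axis=1)
snr = snr_map(flux)
# Map low SNR -> strong smoothing, high SNR -> light smoothing.
denoise_strength = np.clip(1.0 / (snr + 1e-3), 0.0, 1.0)
print(snr.min(), snr.max())   # dark pixels ~0.8, bright pixels ~45
```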

Scattering and Gathering for Spatially Varying Blurs

Mar 10, 2023
Nicholas Chimitt, Xingguang Zhang, Yiheng Chi, Stanley H. Chan

A spatially varying blur kernel $h(\mathbf{x},\mathbf{u})$ is specified by an input coordinate $\mathbf{u} \in \mathbb{R}^2$ and an output coordinate $\mathbf{x} \in \mathbb{R}^2$. For computational efficiency, we sometimes write $h(\mathbf{x},\mathbf{u})$ as a linear combination of spatially invariant basis functions. The associated pixelwise coefficients, however, can be indexed by either the input coordinate or the output coordinate. While the difference appears subtle, the two indexing schemes lead to two different forms of convolution known as scattering and gathering, respectively. We discuss the origin of these operations and the conditions under which the two are identical. We show that scattering is more suitable for simulating how light propagates, while gathering is more suitable for image filtering such as denoising.
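
The distinction can be written in a few lines. In the sketch below, the spatially varying kernel is expanded in a basis $\{\psi_i\}$ with pixelwise coefficient maps $a_i$, indexed either by the input coordinate (scattering: weight, then convolve) or the output coordinate (gathering: convolve, then weight). The basis and coefficients here are arbitrary placeholders.

```python
# Sketch of scattering vs. gathering for a spatially varying blur written as a
# linear combination of invariant basis kernels (placeholder basis/coefficients).
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)
img = rng.random((128, 128))
basis = [np.outer(w, w) / np.outer(w, w).sum()        # a few normalized 2D kernels
         for w in (np.ones(3), np.ones(7), np.hanning(11) + 1e-3)]
coeffs = rng.random((len(basis), 128, 128))
coeffs /= coeffs.sum(axis=0, keepdims=True)            # convex combination per pixel

def scatter(img, basis, coeffs):
    # Coefficients indexed by the INPUT coordinate: weight first, then convolve.
    return sum(fftconvolve(a * img, psi, mode="same") for a, psi in zip(coeffs, basis))

def gather(img, basis, coeffs):
    # Coefficients indexed by the OUTPUT coordinate: convolve first, then weight.
    return sum(a * fftconvolve(img, psi, mode="same") for a, psi in zip(coeffs, basis))

print(np.abs(scatter(img, basis, coeffs) - gather(img, basis, coeffs)).max())
# Nonzero in general; the two coincide when the coefficient maps are constant.
```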

Real-Time Dense Field Phase-to-Space Simulation of Imaging through Atmospheric Turbulence

Oct 13, 2022
Nicholas Chimitt, Xingguang Zhang, Zhiyuan Mao, Stanley H. Chan

Numerical simulation of atmospheric turbulence is one of the biggest bottlenecks in developing computational techniques for solving the inverse problem in long-range imaging. The classical split-step method is based on numerical wave propagation: it splits the propagation path into many segments and propagates every pixel in each segment individually via the Fresnel integral. This repeated evaluation becomes increasingly time-consuming for larger images. As a result, split-step simulation is often done only on a sparse grid of points, followed by interpolation to the other pixels. Even so, the computation is too expensive for real-time applications. In this paper, we present a new simulation method that enables \emph{real-time} processing over a \emph{dense} grid of points. Building upon the recently developed multi-aperture model and the phase-to-space transform, we overcome the memory bottleneck in drawing random samples from the Zernike correlation tensor. We show that the cross-correlation of the Zernike modes has an insignificant contribution to the statistics of the random samples. By approximating these cross-correlation blocks in the Zernike tensor, we restore the homogeneity of the tensor, which then enables Fourier-based random sampling. On a $512\times512$ image, the new simulator achieves 0.025 seconds per frame over a dense field. On a $3840 \times 2160$ image, which would have taken 13 hours to simulate using the split-step method, the new simulator runs at approximately 60 seconds per frame.
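
The key enabler above is that, once homogeneity (wide-sense stationarity) is restored, correlated random fields can be drawn with FFTs instead of factorizing a huge covariance matrix. The sketch below shows that generic FFT-based sampling trick for a single stationary Gaussian field with an assumed power spectrum; the actual simulator applies the idea to the Zernike correlation tensor.

```python
# Sketch: drawing a stationary (homogeneous) Gaussian random field via the FFT.
# The power spectrum here is an assumed example, not the Zernike statistics.
import numpy as np

def sample_stationary_field(n: int, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    fx = np.fft.fftfreq(n)
    kx, ky = np.meshgrid(fx, fx, indexing="ij")
    k = np.sqrt(kx ** 2 + ky ** 2)
    psd = 1.0 / (k ** 2 + 1e-3) ** (11 / 6)       # Kolmogorov-like spectrum (assumed)
    white = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    field = np.fft.ifft2(np.sqrt(psd) * white)    # color the white noise in Fourier domain
    return np.real(field)

field = sample_stationary_field(512)              # one dense 512x512 field per FFT pair
print(field.shape, field.std())
```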

Imaging through the Atmosphere using Turbulence Mitigation Transformer

Jul 13, 2022
Xingguang Zhang, Zhiyuan Mao, Nicholas Chimitt, Stanley H. Chan

Restoring images distorted by atmospheric turbulence is a long-standing problem due to the spatially varying nature of the distortion, the nonlinearity of the image formation process, and the scarcity of training and testing data. Existing methods often impose strong statistical assumptions on the distortion model, which in many cases limits their performance in real-world scenarios because they do not generalize. To overcome this challenge, this paper presents an end-to-end physics-driven approach that is efficient and can generalize to real-world turbulence. On the data synthesis front, we significantly increase the image resolution that can be handled by the state-of-the-art turbulence simulator by approximating the random field via wide-sense stationarity. The new data synthesis process enables the generation of large-scale, multi-level turbulence and ground truth pairs for training. On the network design front, we propose the turbulence mitigation transformer (TMT), a two-stage, U-Net-shaped multi-frame restoration network with a novel, efficient self-attention mechanism named temporal-channel joint attention (TCJA). We also introduce a new training scheme enabled by the new simulator and design new transformer units to reduce memory consumption. Experimental results on both static and dynamic scenes are promising, including on various real turbulence scenarios.
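
The abstract names a temporal-channel joint attention (TCJA) mechanism; the exact design is in the paper, but one plausible reading of the general idea — attending jointly over the frame and channel axes while leaving the spatial axes out of the attention, so the cost does not grow with image size — is sketched below. The module name, shapes, and gating are illustrative assumptions, not the actual TMT implementation.

```python
# Hedged sketch of joint temporal-channel self-attention: tokens are (frame, channel)
# pairs, spatial content is pooled into the token features. Not the exact TMT design.
import torch
import torch.nn as nn

class TemporalChannelAttention(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj_in = nn.Linear(1, dim)     # lift the pooled scalar per (t, c) token
        self.proj_out = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, C, H, W) multi-frame feature stack
        b, t, c, h, w = x.shape
        tokens = x.mean(dim=(3, 4)).reshape(b, t * c, 1)   # one token per (frame, channel)
        tokens = self.proj_in(tokens)
        attended, _ = self.attn(tokens, tokens, tokens)     # attention over T*C tokens
        gate = torch.sigmoid(self.proj_out(attended)).reshape(b, t, c, 1, 1)
        return x * gate                                      # reweight features per (t, c)

x = torch.randn(2, 10, 8, 32, 32)        # 10 frames, 8 channels
print(TemporalChannelAttention()(x).shape)
```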

* 13 pages, 12 figures, project page: https://xg416.github.io/TMT/ 

DESK: A Robotic Activity Dataset for Dexterous Surgical Skills Transfer to Medical Robots

Mar 03, 2019
Naveen Madapana, Md Masudur Rahman, Natalia Sanchez-Tamayo, Mythra V. Balakuntala, Glebys Gonzalez, Jyothsna Padmakumar Bindu, L. N. Vishnunandan Venkatesh, Xingguang Zhang, Juan Barragan Noguera, Thomas Low, Richard Voyles, Yexiang Xue, Juan Wachs

Datasets are an essential component for training effective machine learning models. In particular, surgical robotic datasets have been key to many advances in semi-autonomous surgeries, skill assessment, and training. Simulated surgical environments can enhance the data collection process by making it faster, simpler, and cheaper than real systems. In addition, combining data from multiple robotic domains can provide rich and diverse training data for transfer learning algorithms. In this paper, we present the DESK (Dexterous Surgical Skill) dataset. It comprises a set of surgical robotic skills collected during a surgical training task using three robotic platforms: the Taurus II robot, the simulated Taurus II robot, and the YuMi robot. This dataset was used to test the idea of transferring knowledge across different domains (e.g., from the Taurus to the YuMi robot) for a surgical gesture classification task with seven gestures. We explored three scenarios: 1) no transfer, 2) transfer from the simulated Taurus to the real Taurus, and 3) transfer from the simulated Taurus to the YuMi robot. We conducted extensive experiments with three supervised learning models and provide baselines for each of these scenarios. Results show that using simulation data during training enhances performance on the real robot when limited real data is available. In particular, we obtained an accuracy of 55% on the real Taurus data using a model trained only on simulated data. Furthermore, we achieved an accuracy improvement of 34% when 3% of the real data was added to the training process.
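
A toy sketch of the sim-to-real transfer protocol described above, using placeholder synthetic features and scikit-learn rather than the actual DESK recordings: train a gesture classifier on simulated-robot features, evaluate on the real robot, then add a small slice of real data and retrain.

```python
# Toy sketch of the transfer scenarios (placeholder synthetic features, not DESK data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n_gestures, dim = 7, 32

def make_domain(n, shift):
    """Hypothetical feature generator; 'shift' mimics a sim-to-real domain gap."""
    y = rng.integers(0, n_gestures, n)
    x = rng.standard_normal((n, dim)) + y[:, None] * 0.5 + shift
    return x, y

x_sim, y_sim = make_domain(2000, shift=0.0)
x_real, y_real = make_domain(600, shift=0.3)

# Train on simulation only, test on the real robot.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(x_sim, y_sim)
print("sim-only:", accuracy_score(y_real[100:], clf.predict(x_real[100:])))

# Add a small slice of real data to the training set and retrain.
x_mix = np.vstack([x_sim, x_real[:100]]); y_mix = np.concatenate([y_sim, y_real[:100]])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(x_mix, y_mix)
print("sim + few real:", accuracy_score(y_real[100:], clf.predict(x_real[100:])))
```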

* 8 pages, 5 figures, 4 tables, submitted to IROS 2019 conference 