Jiahao Pang

WrappingNet: Mesh Autoencoder via Deep Sphere Deformation

Aug 29, 2023
Eric Lei, Muhammad Asad Lodhi, Jiahao Pang, Junghyun Ahn, Dong Tian

There have been recent efforts to learn more meaningful representations from mesh data via fixed-length codewords, since a mesh, unlike a point cloud, serves as a complete model of the underlying 3D shape. However, mesh connectivity presents new difficulties when constructing a deep learning pipeline for meshes. Previous unsupervised mesh learning approaches typically assume category-specific templates, e.g., human face/body templates. This restricts the learned latent codes to being meaningful only for objects in a specific category, so the learned latent spaces cannot be used across different types of objects. In this work, we present WrappingNet, the first mesh autoencoder enabling general unsupervised mesh learning over heterogeneous objects. It introduces a novel base graph in the bottleneck dedicated to representing mesh connectivity, which is shown to facilitate learning a shared latent space representing object shape. The superiority of WrappingNet is further demonstrated via improved reconstruction quality and competitive classification compared to point cloud learning, as well as latent interpolation between meshes of different categories.

GRASP-Net: Geometric Residual Analysis and Synthesis for Point Cloud Compression

Sep 09, 2022
Jiahao Pang, Muhammad Asad Lodhi, Dong Tian

Point cloud compression (PCC) is a key enabler for various 3D applications, owing to the universality of the point cloud format. Ideally, 3D point clouds depict continuous object/scene surfaces. In practice, as a set of discrete samples, point clouds are locally disconnected and sparsely distributed. This sparse nature hinders the discovery of local correlation among points for compression. Motivated by an analysis with fractal dimension, we propose a heterogeneous approach with deep learning for lossy point cloud geometry compression. On top of a base layer compressing a coarse representation of the input, an enhancement layer is designed to cope with the challenging geometric residual/details. Specifically, a point-based network is applied to convert the erratic local details to latent features residing on the coarse point cloud. Then a sparse convolutional neural network operating on the coarse point cloud is launched. It utilizes the continuity/smoothness of the coarse geometry to compress the latent features as an enhancement bit-stream that greatly benefits the reconstruction quality. When this bit-stream is unavailable, e.g., due to packet loss, we support a skip mode with the same architecture, which generates geometric details from the coarse point cloud directly. Experiments on both dense and sparse point clouds demonstrate the state-of-the-art compression performance achieved by our proposal. Our code is available at https://github.com/InterDigitalInc/GRASP-Net.
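The base/enhancement split described above can be sketched in a few lines of NumPy. The voxel-grid coarsening and nearest-coarse-point residuals below are illustrative stand-ins, not the paper's actual coder; all function names and the voxel size are hypothetical:

```python
# Sketch: base layer = coarse point cloud via voxel-grid averaging;
# enhancement layer would model the per-point geometric residuals.
import numpy as np

def voxel_coarsen(points: np.ndarray, voxel: float) -> np.ndarray:
    """Coarse point cloud: one centroid per occupied voxel."""
    keys = np.floor(points / voxel).astype(np.int64)
    # Encode the 3-D integer voxel keys into 1-D for grouping
    # (assumes non-negative keys smaller than 1024 per axis).
    flat = (keys[:, 0] * 1024 + keys[:, 1]) * 1024 + keys[:, 2]
    _, inv = np.unique(flat, return_inverse=True)
    counts = np.bincount(inv).astype(float)
    coarse = np.zeros((inv.max() + 1, 3))
    for d in range(3):
        coarse[:, d] = np.bincount(inv, weights=points[:, d]) / counts
    return coarse

def residuals(points: np.ndarray, coarse: np.ndarray) -> np.ndarray:
    """Geometric residual of each point w.r.t. its nearest coarse point."""
    d2 = ((points[:, None, :] - coarse[None, :, :]) ** 2).sum(-1)
    return points - coarse[d2.argmin(axis=1)]

rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 1.0, size=(200, 3))
coarse = voxel_coarsen(pts, voxel=0.25)
res = residuals(pts, coarse)
# Residual magnitudes are bounded by the voxel scale, so they are easier
# to model and entropy-code than absolute coordinates.
```

The point of the split: the residuals live on a bounded, roughly stationary scale, which is what makes them amenable to a learned enhancement layer.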

* Accepted at ACM MM 2022 Workshop on Advances in Point Cloud Compression, Processing and Analysis 

Graph-Based Depth Denoising & Dequantization for Point Cloud Enhancement

Nov 09, 2021
Xue Zhang, Gene Cheung, Jiahao Pang, Yash Sanghvi, Abhiram Gnanasambandam, Stanley H. Chan

A 3D point cloud is typically constructed from depth measurements acquired by sensors at one or more viewpoints. The measurements suffer from both quantization and noise corruption. To improve quality, previous works denoise a point cloud a posteriori, after projecting the imperfect depth data onto 3D space. Instead, we enhance depth measurements directly on the sensed images a priori, before synthesizing a 3D point cloud. By enhancing near the physical sensing process, we tailor our optimization to our depth formation model, before subsequent processing steps that obscure measurement errors. Specifically, we model depth formation as a combined process of signal-dependent noise addition and non-uniform log-based quantization. The model is validated, with parameters fitted, using empirical data collected from an actual depth sensor. To enhance each pixel row in a depth image, we first encode intra-view similarities between available row pixels as edge weights via feature graph learning. We next establish inter-view similarities with another rectified depth image via viewpoint mapping and sparse linear interpolation. This leads to a maximum a posteriori (MAP) graph filtering objective that is convex and differentiable. We optimize the objective efficiently using accelerated gradient descent (AGD), where the optimal step size is approximated via the Gershgorin circle theorem (GCT). Experiments show that our method significantly outperforms recent point cloud denoising schemes and state-of-the-art image denoising schemes on two established point cloud quality metrics.
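The MAP objective and its AGD solver can be illustrated with a minimal NumPy sketch: minimize ||x - y||^2 + mu * x^T L x, with the step size derived from a Gershgorin bound on the largest eigenvalue of the Hessian. The unweighted path graph below is a simple stand-in for the learned feature graph in the paper:

```python
# Sketch: MAP graph filtering via Nesterov-accelerated gradient descent,
# step size from a Gershgorin circle theorem bound (no eigendecomposition).
import numpy as np

def path_laplacian(n: int) -> np.ndarray:
    """Unweighted path-graph Laplacian (each pixel connects to its neighbor)."""
    L = np.zeros((n, n))
    for i in range(n - 1):
        L[i, i] += 1.0; L[i + 1, i + 1] += 1.0
        L[i, i + 1] -= 1.0; L[i + 1, i] -= 1.0
    return L

def denoise(y: np.ndarray, L: np.ndarray, mu: float, iters: int = 200) -> np.ndarray:
    # Gershgorin: lambda_max(L) <= max_i (L_ii + sum_{j != i} |L_ij|)
    radii = np.abs(L).sum(axis=1)            # = 2 * degree for a Laplacian
    lip = 2.0 * (1.0 + mu * radii.max())     # Lipschitz const. of the gradient
    step = 1.0 / lip
    x, z, t = y.copy(), y.copy(), 1.0
    for _ in range(iters):                   # Nesterov acceleration
        grad = 2.0 * (z - y) + 2.0 * mu * (L @ z)
        x_new = z - step * grad
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x

rng = np.random.default_rng(1)
clean = np.linspace(0.0, 1.0, 64)            # a smooth depth "row"
noisy = clean + 0.1 * rng.standard_normal(64)
out = denoise(noisy, path_laplacian(64), mu=2.0)
```

Since the objective is an unconstrained convex quadratic, the iterates converge to the closed-form solution of (I + mu*L) x = y; AGD is attractive here because the GCT bound gives a valid step size without computing eigenvalues.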

* 13 pages, 14 figures 

FESTA: Flow Estimation via Spatial-Temporal Attention for Scene Point Clouds

Apr 01, 2021
Haiyan Wang, Jiahao Pang, Muhammad A. Lodhi, Yingli Tian, Dong Tian

Scene flow depicts the dynamics of a 3D scene, which is critical for various applications such as autonomous driving, robot navigation, and AR/VR. Conventionally, scene flow is estimated from dense/regular RGB video frames. With the development of depth-sensing technologies, precise 3D measurements are available via point clouds, which have sparked new research on 3D scene flow. Nevertheless, it remains challenging to extract scene flow from point clouds due to the sparsity and irregularity of typical point cloud sampling patterns. One major issue related to irregular sampling is the randomness during point set abstraction/feature extraction -- an elementary process in many flow estimation scenarios. A novel Spatial Abstraction with Attention (SA^2) layer is accordingly proposed to alleviate this unstable abstraction problem. Moreover, a Temporal Abstraction with Attention (TA^2) layer is proposed to rectify attention in the temporal domain, which benefits motions over a larger range of scales. Extensive analysis and experiments verify the motivation and significant performance gains of our method, dubbed Flow Estimation via Spatial-Temporal Attention (FESTA), when compared to several state-of-the-art scene flow estimation benchmarks.
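The instability the SA^2 layer targets can be illustrated with a toy NumPy sketch: instead of a hard, sampling-dependent choice of representative point, a neighborhood is pooled with softmax attention weights, making the abstracted point a smooth function of the input. The distance-based scoring function below is a deliberately simple stand-in for the learned attention of the paper:

```python
# Sketch: attention-weighted point set abstraction (toy version).
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def attentive_abstract(points: np.ndarray, center: np.ndarray, tau: float = 0.1):
    """Pool a neighborhood into one point, weighting nearer points more."""
    d2 = ((points - center) ** 2).sum(axis=1)
    w = softmax(-d2 / tau)                   # attention over neighbors
    return (w[:, None] * points).sum(axis=0), w

rng = np.random.default_rng(2)
neigh = rng.normal(0.0, 0.05, size=(16, 3))  # a local cluster of points
abstracted, w = attentive_abstract(neigh, center=np.zeros(3))
# abstracted is a convex combination of the neighbors, so it varies
# continuously as points are perturbed -- unlike a hard argmin/sample.
```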

* Accepted at CVPR 2021 

Graph Signal Processing for Geometric Data and Beyond: Theory and Applications

Aug 05, 2020
Wei Hu, Jiahao Pang, Xianming Liu, Dong Tian, Chia-Wen Lin, Anthony Vetro

Geometric data acquired from real-world scenes, e.g., 2D depth images, 3D point clouds, and 4D dynamic point clouds, have found a wide range of applications including immersive telepresence, autonomous driving, and surveillance. Due to the irregular sampling patterns of most geometric data, traditional image/video processing methodologies are limited. Graph Signal Processing (GSP), a fast-developing field in the signal processing community, enables processing signals that reside on irregular domains and plays a critical role in numerous applications of geometric data, from low-level processing to high-level analysis. To further advance research in this field, we provide the first timely and comprehensive overview of GSP methodologies for geometric data in a unified manner, by bridging the connections between geometric data and graphs, among the various geometric data modalities, and with spectral/nodal graph filtering techniques. We also discuss the recently developed Graph Neural Networks (GNNs) and interpret the operation of these networks from the perspective of GSP. We conclude with a brief discussion of open problems and challenges.
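The kind of spectral graph filtering the survey covers fits in a few lines: decompose a graph signal in the Laplacian eigenbasis (the graph Fourier transform) and attenuate high graph frequencies with a chosen response h(lambda). A 4-cycle graph is used here purely for illustration:

```python
# Sketch: low-pass spectral filtering of a signal on a 4-cycle graph.
import numpy as np

# Adjacency and combinatorial Laplacian of the cycle graph C4
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

evals, U = np.linalg.eigh(L)                 # graph frequencies + Fourier basis
signal = np.array([1.0, -1.0, 1.0, -1.0])    # highest-frequency signal on C4

h = np.exp(-evals)                           # low-pass response h(lambda)
filtered = U @ (h * (U.T @ signal))          # GFT -> filter -> inverse GFT
# The alternating signal is a pure eigenvector with eigenvalue 4, so the
# filter simply scales it by exp(-4).
```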

* 16 pages, 6 figures 

TearingNet: Point Cloud Autoencoder to Learn Topology-Friendly Representations

Jun 17, 2020
Jiahao Pang, Duanshun Li, Dong Tian

Topology matters. Despite the recent success of point cloud processing with geometric deep learning, it remains arduous to capture the complex topologies of point cloud data with a learning model. Given a point cloud dataset containing objects of various genera, or scenes with multiple objects, we propose an autoencoder, TearingNet, which tackles the challenging task of representing point clouds with a fixed-length descriptor. Unlike existing works that deform a primitive of genus zero (e.g., a 2D square patch) into an object-level point cloud, we propose a function that tears the primitive during deformation, letting it emulate the topology of the target point cloud. From the torn primitive, we construct a locally connected graph to further enforce the learned topology via filtering. Moreover, we analyze a widespread problem, which we call point-collapse, that arises when processing point clouds with diverse topologies. Correspondingly, we propose a subtractive-sculpture strategy to train our TearingNet model. Experiments show the superiority of our proposal in reconstructing more faithful point clouds as well as generating more topology-friendly representations than benchmarks.

* Submitted to NeurIPS 2020 

Deep End-to-End Alignment and Refinement for Time-of-Flight RGB-D Module

Sep 17, 2019
Di Qiu, Jiahao Pang, Wenxiu Sun, Chengxi Yang

Recently, it has become increasingly popular to equip mobile RGB cameras with Time-of-Flight (ToF) sensors for active depth sensing. However, for off-the-shelf ToF sensors, one must tackle two problems to obtain high-quality depth with respect to the RGB camera: 1) online calibration and alignment; and 2) complicated error correction for ToF depth sensing. In this work, we propose a framework for joint alignment and refinement via deep learning. First, a cross-modal optical flow between the RGB image and the ToF amplitude image is estimated for alignment. The aligned depth is then refined via an improved kernel-predicting network that performs kernel normalization and applies the bias prior to the dynamic convolution. To enrich our data for end-to-end training, we have also synthesized a dataset using tools from computer graphics. Experimental results demonstrate the effectiveness of our approach, achieving state-of-the-art performance for ToF refinement.
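The kernel-normalization idea in the refinement step can be sketched with NumPy: a per-pixel predicted kernel is normalized to sum to one (here via softmax) before being applied to the local depth neighborhood, with a per-pixel bias added afterwards. The random "predicted" kernels and the 1-D row are stand-ins for the network outputs and image in the paper:

```python
# Sketch: normalized per-pixel dynamic convolution on a 1-D depth row.
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_filter(depth: np.ndarray, kernels: np.ndarray,
                   bias: np.ndarray, k: int = 3) -> np.ndarray:
    """Apply a normalized per-pixel kernel of size k to a 1-D depth row."""
    pad = k // 2
    padded = np.pad(depth, pad, mode="edge")
    kn = softmax(kernels, axis=1)            # kernel normalization
    out = np.empty_like(depth)
    for i in range(depth.size):
        out[i] = kn[i] @ padded[i:i + k] + bias[i]
    return out

rng = np.random.default_rng(3)
row = rng.uniform(0.5, 2.0, size=32)         # a depth row in meters
kernels = rng.standard_normal((32, 3))       # stand-in network predictions
bias = np.zeros(32)
refined = dynamic_filter(row, kernels, bias)
# With normalized kernels and zero bias, each output is a convex
# combination of local depths, so the filter cannot overshoot the input range.
```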

* ICCV 2019 

DSR: Direct Self-rectification for Uncalibrated Dual-lens Cameras

Sep 26, 2018
Ruichao Xiao, Wenxiu Sun, Jiahao Pang, Qiong Yan, Jimmy Ren

With the development of dual-lens camera modules, depth information representing the third dimension of the captured scenes becomes available for smartphones. It is estimated by stereo matching algorithms, taking as input the two views captured by dual-lens cameras at slightly different viewpoints. Depth-of-field rendering (also referred to as synthetic defocus or bokeh) is one of the trending depth-based applications. However, to achieve fast depth estimation on smartphones, the stereo pairs need to be rectified in the first place. In this paper, we propose a cost-effective solution to perform stereo rectification for dual-lens cameras, called direct self-rectification (DSR). It removes the need for individual offline calibration of every pair of dual-lens cameras. In addition, the proposed solution is robust to slight movements of the dual-lens cameras after fabrication, e.g., due to collisions. Different from existing self-rectification approaches, our approach computes the homography in a novel way, with zero geometric distortion introduced to the master image. This is achieved by directly minimizing the vertical displacements of corresponding points between the original master image and the transformed slave image. Our method is evaluated on both realistic and synthetic stereo image pairs, and produces superior results compared to calibrated rectification and other self-rectification approaches.
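The core objective — keep the master image untouched and warp only the slave so that corresponding points land on the same row — can be illustrated with a least-squares sketch. Here the warped slave's row coordinate is modeled affinely for simplicity (the paper estimates a full homography), and the data are synthetic:

```python
# Sketch: minimize vertical displacements of correspondences by fitting
# a row transform for the slave image; the master is left unchanged.
import numpy as np

def fit_row_transform(master_xy: np.ndarray, slave_xy: np.ndarray) -> np.ndarray:
    """Least-squares (a, b, c) so that a*xs + b*ys + c ~= ym."""
    A = np.column_stack([slave_xy[:, 0], slave_xy[:, 1], np.ones(len(slave_xy))])
    coef, *_ = np.linalg.lstsq(A, master_xy[:, 1], rcond=None)
    return coef

rng = np.random.default_rng(4)
slave = rng.uniform(0.0, 100.0, size=(50, 2))       # matched slave points
# Synthetic ground truth: master rows are a sheared/shifted copy of slave rows
master = slave.copy()
master[:, 1] = 0.02 * slave[:, 0] + 1.0 * slave[:, 1] + 3.0

coef = fit_row_transform(master, slave)
A = np.column_stack([slave[:, 0], slave[:, 1], np.ones(50)])
residual = A @ coef - master[:, 1]                  # remaining vertical error
```

Because the master image is never transformed, zero geometric distortion is introduced to it by construction; only the slave's warp absorbs the misalignment.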

* Accepted at 3DV 2018 

Deep Graph Laplacian Regularization

Jul 31, 2018
Jin Zeng, Jiahao Pang, Wenxiu Sun, Gene Cheung, Ruichao Xiao

We propose to combine the robustness of model-based approaches with the learning power of data-driven approaches for image restoration. Specifically, by integrating graph Laplacian regularization as a trainable module into a deep learning framework, we are less susceptible to overfitting than pure CNN-based approaches, achieving higher robustness to small datasets and cross-domain denoising. First, a sparse neighborhood graph is built from the output of a convolutional neural network (CNN). The image is then restored by solving an unconstrained quadratic programming problem, with the corresponding graph Laplacian regularizer as a prior term. The proposed restoration pipeline is fully differentiable and can hence be trained end-to-end. Experimental results demonstrate that our method avoids overfitting given small training data and is endowed with strong cross-domain generalization power, outperforming state-of-the-art approaches by a remarkable margin.
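The quadratic-programming step described above has a closed form: for min_x ||x - y||^2 + mu * x^T L x with graph Laplacian L, the optimum solves (I + mu*L) x = y. A small NumPy sketch, with a hand-built chain graph standing in for the CNN-derived neighborhood graph of the paper:

```python
# Sketch: graph-Laplacian-regularized restoration as a linear solve.
import numpy as np

def laplacian_from_edges(n, edges, weights):
    """Combinatorial Laplacian from a weighted edge list."""
    L = np.zeros((n, n))
    for (i, j), w in zip(edges, weights):
        L[i, i] += w; L[j, j] += w
        L[i, j] -= w; L[j, i] -= w
    return L

# Chain graph over 5 "pixels", uniform weights
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
L = laplacian_from_edges(5, edges, [1.0] * 4)

y = np.array([0.0, 1.0, 0.0, 1.0, 0.0])       # noisy observation
mu = 1.5                                       # prior strength
x = np.linalg.solve(np.eye(5) + mu * L, y)     # restored signal
# x is smoother than y: the Laplacian prior penalizes differences
# across graph edges, pulling neighboring values together.
```

The solve is differentiable in both y and the edge weights, which is what lets the whole pipeline (CNN + QP) be trained end-to-end.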

Cascade Residual Learning: A Two-stage Convolutional Neural Network for Stereo Matching

Jul 30, 2018
Jiahao Pang, Wenxiu Sun, Jimmy SJ. Ren, Chengxi Yang, Qiong Yan

Leveraging recent developments in convolutional neural networks (CNNs), matching dense correspondences from a stereo pair has been cast as a learning problem, with performance exceeding traditional approaches. However, it remains challenging to generate high-quality disparities for inherently ill-posed regions. To tackle this problem, we propose a novel cascade CNN architecture composed of two stages. The first stage advances the recently proposed DispNet by equipping it with extra up-convolution modules, leading to disparity images with more details. The second stage explicitly rectifies the disparity initialized by the first stage; it couples with the first stage and generates residual signals across multiple scales. The summation of the outputs from the two stages gives the final disparity. As opposed to directly learning the disparity at the second stage, we show that residual learning provides more effective refinement. Moreover, it also benefits the training of the overall cascade network. Experiments show that our cascade residual learning scheme provides state-of-the-art performance for stereo correspondence matching. At the time of submission, our method ranks first in the KITTI 2015 stereo benchmark, surpassing prior works by a noteworthy margin.
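The two-stage summation can be sketched as a toy in NumPy: stage one produces an initial (here, quantized) disparity, stage two predicts only a residual correction, and the final output is their sum. Both "stages" below are placeholder functions, not the CNNs of the paper; the residual predictor simply recovers a fixed fraction of the remaining error to show the mechanics:

```python
# Sketch: two-stage disparity estimation with residual refinement.
import numpy as np

def stage1(costs: np.ndarray) -> np.ndarray:
    """Coarse disparity: winner-take-all over a cost volume (placeholder)."""
    return costs.argmin(axis=-1).astype(float)

def stage2(init_disp: np.ndarray, true_disp: np.ndarray) -> np.ndarray:
    """Residual predictor (placeholder: recovers 80% of the error)."""
    return 0.8 * (true_disp - init_disp)

rng = np.random.default_rng(5)
true_disp = rng.uniform(0.0, 9.0, size=100)
# Cost volume whose argmin is the *rounded* true disparity (quantization error)
costs = np.abs(np.arange(10)[None, :] - np.round(true_disp)[:, None])

d1 = stage1(costs)                  # integer-valued initial estimate
final = d1 + stage2(d1, true_disp)  # summation of the two stages
```

The design point: predicting a residual around a reasonable initialization is an easier regression target than predicting the full disparity from scratch, which is the refinement advantage the abstract describes.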

* Accepted at ICCVW 2017. The first two authors contributed equally to this paper 