Xin Huang

HumanNorm: Learning Normal Diffusion Model for High-quality and Realistic 3D Human Generation

Oct 02, 2023
Xin Huang, Ruizhi Shao, Qi Zhang, Hongwen Zhang, Ying Feng, Yebin Liu, Qing Wang

Recent text-to-3D methods employing diffusion models have made significant advancements in 3D human generation. However, these approaches face challenges due to the limitations of the text-to-image diffusion model, which lacks an understanding of 3D structures. Consequently, these methods struggle to achieve high-quality human generation, resulting in smooth geometry and cartoon-like appearances. In this paper, we observed that fine-tuning text-to-image diffusion models with normal maps enables their adaptation into text-to-normal diffusion models, which enhances the 2D perception of 3D geometry while preserving the priors learned from large-scale datasets. Therefore, we propose HumanNorm, a novel approach for high-quality and realistic 3D human generation by learning the normal diffusion model including a normal-adapted diffusion model and a normal-aligned diffusion model. The normal-adapted diffusion model can generate high-fidelity normal maps corresponding to prompts with view-dependent text. The normal-aligned diffusion model learns to generate color images aligned with the normal maps, thereby transforming physical geometry details into realistic appearance. Leveraging the proposed normal diffusion model, we devise a progressive geometry generation strategy and coarse-to-fine texture generation strategy to enhance the efficiency and robustness of 3D human generation. Comprehensive experiments substantiate our method's ability to generate 3D humans with intricate geometry and realistic appearances, significantly outperforming existing text-to-3D methods in both geometry and texture quality. The project page of HumanNorm is https://humannorm.github.io/.

SeaEval for Multilingual Foundation Models: From Cross-Lingual Alignment to Cultural Reasoning

Sep 09, 2023
Bin Wang, Zhengyuan Liu, Xin Huang, Fangkai Jiao, Yang Ding, Ai Ti Aw, Nancy F. Chen

We present SeaEval, a benchmark for multilingual foundation models. In addition to characterizing how these models understand and reason with natural language, we also investigate how well they comprehend cultural practices, nuances, and values. Alongside standard accuracy metrics, we investigate the brittleness of foundation models in the dimensions of semantics and multilinguality. Our analyses span both open-sourced and closed models, leading to empirical results across classic NLP tasks, reasoning, and cultural comprehension. Key findings indicate that (1) most models exhibit varied behavior when given paraphrased instructions; (2) many models still suffer from exposure bias (e.g., positional bias, majority label bias); (3) for questions rooted in factual, scientific, and commonsense knowledge, consistent responses are expected across semantically equivalent multilingual queries, yet most models surprisingly demonstrate inconsistent performance on these queries; and (4) multilingually-trained models have not attained "balanced multilingual" capabilities. Our endeavors underscore the need for more generalizable semantic representations and enhanced multilingual contextualization. SeaEval can serve as a launchpad for more thorough investigations and evaluations for multilingual and multicultural scenarios.

* 15 pages, 7 figures

GAME-UP: Game-Aware Mode Enumeration and Understanding for Trajectory Prediction

May 28, 2023
Justin Lidard, Oswin So, Yanxia Zhang, Jonathan DeCastro, Xiongyi Cui, Xin Huang, Yen-Ling Kuo, John Leonard, Avinash Balachandran, Naomi Leonard, Guy Rosman

Interactions between road agents present a significant challenge in trajectory prediction, especially in cases involving multiple agents. Because existing diversity-aware predictors do not account for the interactive nature of multi-agent predictions, they may miss these important interaction outcomes. In this paper, we propose GAME-UP, a framework for trajectory prediction that leverages game-theoretic inverse reinforcement learning to improve coverage of multi-modal predictions. We use a training-time game-theoretic numerical analysis as an auxiliary loss, resulting in improved coverage and accuracy without presuming a taxonomy of actions for the agents. We demonstrate our approach on the interactive subset of the Waymo Open Motion Dataset, including three subsets involving scenarios with high interaction complexity. Experimental results show that our predictor produces accurate predictions while covering twice as many possible interactions as a baseline model.

* 10 pages, 6 figures

Inverting the Imaging Process by Learning an Implicit Camera Model

Apr 25, 2023
Xin Huang, Qi Zhang, Ying Feng, Hongdong Li, Qing Wang

Representing visual signals with implicit coordinate-based neural networks, as an effective replacement of the traditional discrete signal representation, has gained considerable popularity in computer vision and graphics. In contrast to existing implicit neural representations which focus on modelling the scene only, this paper proposes a novel implicit camera model which represents the physical imaging process of a camera as a deep neural network. We demonstrate the power of this new implicit camera model on two inverse imaging tasks: i) generating all-in-focus photos, and ii) HDR imaging. Specifically, we devise an implicit blur generator and an implicit tone mapper to model the aperture and exposure of the camera's imaging process, respectively. Our implicit camera model is jointly learned together with implicit scene models under multi-focus stack and multi-exposure bracket supervision. We have demonstrated the effectiveness of our new model on a large number of test images and videos, producing accurate and visually appealing all-in-focus and high dynamic range images. In principle, our new implicit neural camera model has the potential to benefit a wide array of other inverse imaging tasks.

* Accepted to CVPR 2023. Project page: https://xhuangcv.github.io/neucam/

Local Implicit Ray Function for Generalizable Radiance Field Representation

Apr 25, 2023
Xin Huang, Qi Zhang, Ying Feng, Xiaoyu Li, Xuan Wang, Qing Wang

We propose LIRF (Local Implicit Ray Function), a generalizable neural rendering approach for novel view rendering. Current generalizable neural radiance fields (NeRF) methods sample a scene with a single ray per pixel and may therefore render blurred or aliased views when the input views and rendered views capture scene content with different resolutions. To solve this problem, we propose LIRF to aggregate the information from conical frustums to construct a ray. Given 3D positions within conical frustums, LIRF takes 3D coordinates and the features of conical frustums as inputs and predicts a local volumetric radiance field. Since the coordinates are continuous, LIRF renders high-quality novel views at a continuously-valued scale via volume rendering. Besides, we predict the visible weights for each input view via transformer-based feature matching to improve the performance in occluded areas. Experimental results on real-world scenes validate that our method outperforms state-of-the-art methods on novel view rendering of unseen scenes at arbitrary scales.

* Accepted to CVPR 2023. Project page: https://xhuangcv.github.io/lirf/

Privileged Prior Information Distillation for Image Matting

Nov 25, 2022
Cheng Lyu, Jiake Xie, Bo Xu, Cheng Lu, Han Huang, Xin Huang, Ming Wu, Chuang Zhang, Yong Tang

Performance of trimap-free image matting methods is limited when trying to decouple the deterministic and undetermined regions, especially in the scenes where foregrounds are semantically ambiguous, chromaless, or high transmittance. In this paper, we propose a novel framework named Privileged Prior Information Distillation for Image Matting (PPID-IM) that can effectively transfer privileged prior environment-aware information to improve the performance of students in solving hard foregrounds. The prior information of trimap regulates only the teacher model during the training stage, while not being fed into the student network during actual inference. In order to achieve effective privileged cross-modality (i.e. trimap and RGB) information distillation, we introduce a Cross-Level Semantic Distillation (CLSD) module that reinforces the trimap-free students with more knowledgeable semantic representations and environment-aware information. We also propose an Attention-Guided Local Distillation module that efficiently transfers privileged local attributes from the trimap-based teacher to trimap-free students for the guidance of local-region optimization. Extensive experiments demonstrate the effectiveness and superiority of our PPID framework on the task of image matting. In addition, our trimap-free IndexNet-PPID surpasses the other competing state-of-the-art methods by a large margin, especially in scenarios with chromaless, weak texture, or irregular objects.

* 15 pages, 7 figures

Characterizing the Efficiency of Graph Neural Network Frameworks with a Magnifying Glass

Nov 06, 2022
Xin Huang, Jongryool Kim, Bradley Rees, Chul-Ho Lee

Graph neural networks (GNNs) have received great attention due to their success in various graph-related learning tasks. Several GNN frameworks have since been developed for fast and easy implementation of GNN models. Despite their popularity, they are not well documented, and their implementations and system performance are not well understood. In particular, unlike traditional GNNs, which are trained on the entire graph in a full-batch manner, recent GNNs have been developed with different graph sampling techniques for mini-batch training on large graphs. While these techniques improve scalability, training times still depend on the implementations in the frameworks, as sampling and its associated operations can introduce non-negligible overhead and computational cost. In addition, it is unknown how 'eco-friendly' the frameworks are from a green computing perspective. In this paper, we provide an in-depth study of two mainstream GNN frameworks along with three state-of-the-art GNNs to analyze their performance in terms of runtime and power/energy consumption. We conduct extensive benchmark experiments at several different levels and present detailed analysis results and observations, which could be helpful for further improvement and optimization.

* Accepted by IEEE IISWC 2022

P4P: Conflict-Aware Motion Prediction for Planning in Autonomous Driving

Nov 03, 2022
Qiao Sun, Xin Huang, Brian C. Williams, Hang Zhao

Motion prediction is crucial in enabling safe motion planning for autonomous vehicles in interactive scenarios. It allows the planner to identify potential conflicts with other traffic agents and generate safe plans. Existing motion predictors often focus on reducing prediction errors, yet it remains an open question how well they help the planner identify conflicts. In this paper, we evaluate state-of-the-art predictors through novel conflict-related metrics, such as the success rate of identifying conflicts. Surprisingly, the predictors suffer from a low success rate and thus lead to a large percentage of collisions when we test the prediction-planning system in an interactive simulator. To fill the gap, we propose a simple but effective alternative that combines a physics-based trajectory generator and a learning-based relation predictor to identify conflicts and infer conflict relations. We demonstrate that our predictor, P4P, achieves superior performance over existing learning-based predictors in realistic interactive driving scenarios from the Waymo Open Motion Dataset.

* 7 pages, 4 figures, 3 tables

InterSim: Interactive Traffic Simulation via Explicit Relation Modeling

Oct 26, 2022
Qiao Sun, Xin Huang, Brian C. Williams, Hang Zhao

Interactive traffic simulation is crucial to autonomous driving systems because it enables planners to be tested in a more scalable and safer way than real-world road testing. Existing approaches learn an agent model from large-scale driving data to simulate realistic traffic scenarios, yet producing consistent and diverse multi-agent interactive behaviors in crowded scenes remains an open problem. In this work, we present InterSim, an interactive traffic simulator for testing autonomous driving planners. Given a test plan trajectory from the ego agent, InterSim reasons about the interaction relations between the agents in the scene and generates realistic trajectories for each environment agent that are consistent with the relations. We train and validate our model on a large-scale interactive driving dataset. Experimental results show that InterSim achieves better simulation realism and reactivity in two simulation tasks compared to a state-of-the-art learning-based traffic simulator.

* Accepted at IROS 2022. Author version with 8 pages, 4 figures, and 2 tables. Code and demo available at paper website: https://tsinghua-mars-lab.github.io/InterSim/

Multi-Agent Chance-Constrained Stochastic Shortest Path with Application to Risk-Aware Intelligent Intersection

Oct 03, 2022
Majid Khonji, Rashid Alyassi, Wolfgang Merkt, Areg Karapetyan, Xin Huang, Sungkweon Hong, Jorge Dias, Brian Williams

In transportation networks, where traffic lights have traditionally been used for vehicle coordination, intersections act as natural bottlenecks. A formidable challenge for existing automated intersections lies in detecting and reasoning about uncertainty from the operating environment and human-driven vehicles. In this paper, we propose a risk-aware intelligent intersection system for autonomous vehicles (AVs) as well as human-driven vehicles (HVs). We cast the problem as a novel class of Multi-agent Chance-Constrained Stochastic Shortest Path (MCC-SSP) problems and devise an exact Integer Linear Programming (ILP) formulation that is scalable in the number of agents' interaction points (e.g., potential collision points at the intersection). In particular, when the number of agents within an interaction point is small, which is often the case in intersections, the ILP has a polynomial number of variables and constraints. To further improve the running time performance, we show that the collision risk computation can be performed offline. Additionally, a trajectory optimization workflow is provided to generate risk-aware trajectories for any given intersection. The proposed framework is implemented in the CARLA simulator and evaluated under a fully autonomous intersection with AVs only as well as in a hybrid setup with a signalized intersection for HVs and an intelligent scheme for AVs. As verified via simulations, the featured approach improves the intersection's efficiency by up to 200% while also conforming to the specified tunable risk threshold.
