Xiaojie Guo

TSTTC: A Large-Scale Dataset for Time-to-Contact Estimation in Driving Scenarios

Sep 06, 2023
Yuheng Shi, Zehao Huang, Yan Yan, Naiyan Wang, Xiaojie Guo

Time-to-Contact (TTC) estimation is a critical task for assessing collision risk and is widely used in various driver-assistance and autonomous driving systems. The past few decades have witnessed the development of related theories and algorithms, and the now-prevalent learning-based methods call for a large-scale TTC dataset collected in real-world scenarios. In this work, we present a large-scale, object-oriented TTC dataset for driving scenes to promote TTC estimation with a monocular camera. To collect valuable samples and keep data with different TTC values relatively balanced, we sift through thousands of hours of driving data and select over 200K sequences according to a preset data distribution. To augment the number of small-TTC cases, we also generate clips using the latest neural rendering methods. Additionally, we provide several simple yet effective TTC estimation baselines and evaluate them extensively on the proposed dataset to demonstrate their effectiveness. The proposed dataset is publicly available at https://open-dataset.tusen.ai/TSTTC.
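
For context, a classic monocular baseline (not necessarily the one used in the paper) estimates TTC purely from the inter-frame scale change of an object: under a pinhole camera and constant closing speed, if an object appears s times larger after dt seconds, its TTC is approximately dt / (s - 1). A minimal sketch:

    def ttc_from_scale(width_prev: float, width_curr: float, dt: float) -> float:
        """Estimate time-to-contact from apparent scale change.

        Image size is inversely proportional to distance, so with scale
        ratio s = width_curr / width_prev over a gap of dt seconds,
        TTC ~= dt / (s - 1) under constant closing speed.
        """
        s = width_curr / width_prev
        if s <= 1.0:
            return float("inf")  # object is not approaching
        return dt / (s - 1.0)

    # Example: a bounding box grows from 80 px to 88 px over 0.1 s
    print(ttc_from_scale(80.0, 88.0, 0.1))  # -> 1.0 second to contact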

* 19 pages, 9 figures 

Practical Edge Detection via Robust Collaborative Learning

Aug 27, 2023
Yuanbin Fu, Xiaojie Guo

Edge detection, as a core component in a wide range of vision-oriented tasks, aims to identify object boundaries and prominent edges in natural images. An edge detector should be both efficient and accurate for practical use. To achieve this goal, two key issues need to be addressed: 1) how to liberate deep edge models from the inefficient pre-trained backbones leveraged by most existing deep learning methods, so as to save computational cost and cut model size; and 2) how to mitigate the negative influence of noisy or even wrong labels in training data, which are widespread in edge detection due to the subjectivity and ambiguity of annotators, so as to preserve robustness and accuracy. In this paper, we attempt to address both problems simultaneously by developing a collaborative learning based model, termed PEdger. The principle behind PEdger is that information learned from different training moments and heterogeneous (recurrent and non-recurrent in this work) architectures can be assembled to extract knowledge robust to noisy annotations, even without the help of pre-training on extra data. Extensive ablation studies together with quantitative and qualitative experimental comparisons on the BSDS500 and NYUD datasets verify the effectiveness of our design and demonstrate its superiority over other competitors in terms of accuracy, speed, and model size. Code can be found at https://github.com/ForawardStar/PEdger.
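
A rough sketch of the collaborative-learning idea (the exact losses and architectures are our assumptions, not the paper's): two heterogeneous students supervise each other through an ensemble of their predictions, so neither can overfit the noisy labels alone.

    import torch
    import torch.nn.functional as F

    def collaborative_step(model_a, model_b, images, noisy_labels, opt):
        """One hedged training step: assemble two heterogeneous models'
        predictions into a soft consensus target against noisy labels."""
        pred_a = torch.sigmoid(model_a(images))  # e.g., recurrent branch
        pred_b = torch.sigmoid(model_b(images))  # e.g., non-recurrent branch
        consensus = (0.5 * (pred_a + pred_b)).detach()  # assembled knowledge
        # Blend noisy annotations with the consensus to soften wrong labels
        target = 0.5 * noisy_labels + 0.5 * consensus
        loss = (F.binary_cross_entropy(pred_a, target)
                + F.binary_cross_entropy(pred_b, target))
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()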

Single Image Reflection Separation via Component Synergy

Aug 19, 2023
Qiming Hu, Xiaojie Guo

The reflection superposition phenomenon is complex and widely encountered in the real world, which has motivated various simplified linear and nonlinear formulations of the problem. In this paper, based on an investigation of the weaknesses of existing models, we propose a more general form of the superposition model by introducing a learnable residue term, which can effectively capture residual information during decomposition and guide the separated layers to be complete. To fully capitalize on its advantages, we further design the network structure elaborately, including a novel dual-stream interaction mechanism and a powerful decomposition network with a semantic pyramid encoder. Extensive experiments and ablation studies verify our superiority over state-of-the-art approaches on multiple real-world benchmark datasets. Our code is publicly available at https://github.com/mingcv/DSRNet.
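
To make the modeling choice concrete, the classic linear superposition and the kind of residue-augmented generalization described above can be written as follows (notation ours, not necessarily the paper's), with T the transmission layer, R the reflection layer, and phi a learnable residue term:

    I = \alpha T + \beta R \quad \text{(simplified linear superposition)}

    I = \hat{T} + \hat{R} + \phi(\hat{T}, \hat{R}) \quad \text{(general form with learnable residue)}

The residue \phi absorbs whatever the two separated layers cannot explain, rather than forcing their sum to reproduce the input exactly.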

* Accepted to ICCV 2023 

Embracing Uncertainty: Adaptive Vague Preference Policy Learning for Multi-round Conversational Recommendation

Jun 07, 2023
Gangyi Zhang, Chongming Gao, Wenqiang Lei, Xiaojie Guo, Shijun Li, Lingfei Wu, Hongshen Chen, Zhuozhi Ding, Sulong Xu, Xiangnan He

Conversational recommendation systems (CRS) effectively address information asymmetry by dynamically eliciting user preferences through multi-turn interactions. Existing CRS widely assume that users have clear preferences. Under this assumption, the agent completely trusts user feedback and treats accepted or rejected signals as strong indicators for filtering items and reducing the candidate space, which may lead to over-filtering. In reality, however, users' preferences are often vague and volatile: they are uncertain about their desires and change their decisions during interactions. To address this issue, we introduce a novel scenario called Vague Preference Multi-round Conversational Recommendation (VPMCR), which accounts for users' vague and volatile preferences in CRS. VPMCR employs a soft estimation mechanism to assign a non-zero confidence score to every candidate item to be displayed, naturally avoiding the over-filtering problem. In the VPMCR setting, we introduce a solution called Adaptive Vague Preference Policy Learning (AVPPL), which consists of two main components: Uncertainty-aware Soft Estimation (USE) and Uncertainty-aware Policy Learning (UPL). USE estimates the uncertainty of users' vague feedback and captures their dynamic preferences using a choice-based preference extraction module and a time-aware decaying strategy. UPL leverages the preference distribution estimated by USE to guide the conversation and adapts to changes in users' preferences when making recommendations or asking about attributes. Our extensive experiments demonstrate the effectiveness of our method in the VPMCR scenario, highlighting its potential for practical applications and for improving the overall performance and applicability of CRS in real-world settings, particularly for users with vague or dynamic preferences.
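
As a toy illustration of soft estimation with time-aware decay (our own simplification, not the paper's actual USE module): instead of hard-filtering rejected items, every candidate keeps a non-zero score, and older feedback is down-weighted exponentially.

    import math

    def soft_scores(candidates, feedback, turn_now, decay=0.8):
        """candidates: list of item ids; feedback: list of
        (turn, item, liked) tuples. Returns non-zero scores for all items."""
        scores = {c: 1.0 for c in candidates}  # no item is filtered out
        for turn, item, liked in feedback:
            weight = decay ** (turn_now - turn)  # time-aware decay
            scores[item] *= math.exp(weight if liked else -weight)
        total = sum(scores.values())
        return {c: s / total for c, s in scores.items()}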

Automated Code generation for Information Technology Tasks in YAML through Large Language Models

May 05, 2023
Saurabh Pujar, Luca Buratti, Xiaojie Guo, Nicolas Dupuis, Burn Lewis, Sahil Suneja, Atin Sood, Ganesh Nalawade, Matthew Jones, Alessandro Morari, Ruchir Puri

The recent improvement in code generation capabilities brought by large language models has mainly benefited general-purpose programming languages. Domain-specific languages, such as those used for IT automation, have received far less attention, despite involving many active developers and being an essential component of modern cloud platforms. This work focuses on the generation of Ansible-YAML, a widely used markup language for IT automation. We present Ansible Wisdom, a natural-language-to-Ansible-YAML code generation tool aimed at improving IT automation productivity. Ansible Wisdom is a transformer-based model, extended by training with a new dataset containing Ansible-YAML. We also develop two novel performance metrics for YAML and Ansible to capture the specific characteristics of this domain. Results show that Ansible Wisdom can accurately generate Ansible scripts from natural language prompts, with performance comparable to or better than existing state-of-the-art code generation models.
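
To illustrate the task format (a hypothetical input/output pair of our own making, not drawn from the paper's dataset): a natural-language prompt maps to Ansible-YAML tasks.

    # Hypothetical example of natural-language-to-Ansible-YAML generation
    prompt = "Install the nginx package and make sure the service is running"

    expected_yaml = """\
    - name: Install the nginx package
      ansible.builtin.package:
        name: nginx
        state: present
    - name: Ensure the nginx service is running
      ansible.builtin.service:
        name: nginx
        state: started
    """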

Adaptive Texture Filtering for Single-Domain Generalized Segmentation

Mar 06, 2023
Xinhui Li, Mingjia Li, Yaxing Wang, Chuan-Xian Ren, Xiaojie Guo

Domain generalization in semantic segmentation aims to alleviate the performance degradation on unseen domains by learning domain-invariant features. Existing methods diversify images in the source domain by adding complex or even abnormal textures to reduce the sensitivity to domain-specific features. However, these approaches depend heavily on the richness of the texture bank, and training them can be time-consuming. In contrast to importing textures arbitrarily or augmenting styles randomly, we focus on the single source domain itself to achieve generalization. In this paper, we present a novel adaptive texture filtering mechanism to suppress the influence of texture without using augmentation, thus eliminating the interference of domain-specific features. Further, we design a hierarchical guidance generalization network equipped with structure-guided enhancement modules, whose purpose is to learn domain-invariant generalized knowledge. Extensive experiments together with ablation studies on widely used datasets verify the effectiveness of the proposed model and reveal its superiority over other state-of-the-art alternatives.
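
As a rough analogue of texture suppression (the paper's filter is learned and adaptive; this fixed edge-preserving filter only conveys the idea): smooth away fine texture while keeping the object structure that segmentation relies on.

    import cv2

    def suppress_texture(image_bgr, d=9, sigma_color=75, sigma_space=75):
        """Edge-preserving smoothing as a stand-in for adaptive texture
        filtering: bilateral filtering removes fine, domain-specific
        texture while preserving structural boundaries."""
        return cv2.bilateralFilter(image_bgr, d, sigma_color, sigma_space)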

* Accepted by AAAI 2023 

Face Inverse Rendering via Hierarchical Decoupling

Jan 17, 2023
Meng Wang, Xiaojie Guo, Wenjing Dai, Jiawan Zhang

Previous face inverse rendering methods often require synthetic data with ground truth and/or professional equipment such as a light stage. However, a model trained on synthetic data or with pre-defined lighting priors typically fails to generalize well to real-world situations, due to the gap between synthetic data/lighting priors and real data. Furthermore, for common users, the required professional equipment and skills make the task expensive and complex. In this paper, we propose a deep learning framework to disentangle face images in the wild into their corresponding albedo, normal, and lighting components. Specifically, a decomposition network is built with a hierarchical subdivision strategy, which takes image pairs captured from arbitrary viewpoints as input. In this way, our approach greatly eases the burden of data preparation and significantly broadens the applicability of face inverse rendering. Extensive experiments demonstrate the efficacy of our design and show its superior performance in face relighting over other state-of-the-art alternatives. Our code is available at https://github.com/AutoHDR/HD-Net.git.
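
Such decompositions commonly rest on a Lambertian image formation model with low-order spherical-harmonic lighting (our notation; the paper's exact formulation may differ): albedo A, normals N, and nine SH lighting coefficients l_k combine per pixel p as

    I(p) = A(p)\, S\big(N(p), \mathbf{l}\big), \qquad S(n, \mathbf{l}) = \sum_{k=1}^{9} l_k\, H_k(n)

where H_k are the second-order spherical-harmonic basis functions evaluated at normal n.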

* IEEE Transactions on Image Processing, vol. 31, pp. 5748-5761, 2022 

Theoretical Characterization of How Neural Network Pruning Affects its Generalization

Jan 05, 2023
Hongru Yang, Yingbin Liang, Xiaojie Guo, Lingfei Wu, Zhangyang Wang

It has been observed in practice that applying pruning-at-initialization methods to neural networks and training the sparsified networks can not only retain the testing performance of the original dense models, but sometimes even slightly boost generalization. A theoretical understanding of such experimental observations has yet to be developed. This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization. Specifically, it considers a classification task for overparameterized two-layer neural networks, where the network is randomly pruned at initialization according to different rates. It is shown that, as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero and the network exhibits good generalization performance; more surprisingly, the generalization bound improves as the pruning fraction grows. To complement this positive result, the work further shows a negative result: there exists a large pruning fraction such that, while gradient descent is still able to drive the training loss toward zero (by memorizing noise), the generalization performance is no better than random guessing. This further suggests that pruning can change the feature learning process, leading to the performance drop of the pruned network.
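
A minimal sketch of the setup being analyzed (framework details are our assumption): randomly prune a layer's weights at initialization with a fixed fraction, then train only the surviving weights.

    import torch

    def prune_at_init(weight: torch.Tensor, fraction: float) -> torch.Tensor:
        """Randomly zero out `fraction` of the entries at initialization
        and return the binary mask; the mask stays fixed during training."""
        mask = (torch.rand_like(weight) >= fraction).float()
        with torch.no_grad():
            weight.mul_(mask)
        return mask  # reapply after each optimizer step: weight.mul_(mask)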

Multi-objective Deep Data Generation with Correlated Property Control

Oct 06, 2022
Shiyu Wang, Xiaojie Guo, Xuanyang Lin, Bo Pan, Yuanqi Du, Yinkai Wang, Yanfang Ye, Ashley Ann Petersen, Austin Leitgeb, Saleh AlKhalifa, Kevin Minbiole, Bill Wuest, Amarda Shehu, Liang Zhao

Developing deep generative models has been an emerging field due to their ability to model and generate complex data for various purposes, such as image synthesis and molecular design. However, the advancement of deep generative models is limited by challenges in generating objects that possess multiple desired properties: 1) complex correlations among real-world properties are common but hard to identify; 2) controlling one property implicitly and partially controls its correlated properties, which is difficult to model; 3) simultaneously controlling multiple properties in various manners is hard and under-explored. We address these challenges by proposing a novel deep generative framework that recovers the semantics and the correlation of properties through disentangled latent vectors. The correlation is handled via an explainable mask pooling layer, and properties are precisely retained by generated objects via the mutual dependence between latent vectors and properties. Our generative model preserves properties of interest while handling correlations and conflicts among properties under a multi-objective optimization framework. Experiments demonstrate our model's superior performance in generating data with desired properties.
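
One way to read the "explainable mask pooling layer" (our guess at a minimal form, not the paper's exact layer): a learnable, near-binary mask selects which latent dimensions feed each property head, so correlated properties visibly share dimensions.

    import torch
    import torch.nn as nn

    class MaskPooling(nn.Module):
        """Hypothetical mask pooling: a sigmoid-gated mask per property
        selects latent dims, making property-latent dependence inspectable."""
        def __init__(self, latent_dim: int, num_properties: int):
            super().__init__()
            self.mask_logits = nn.Parameter(torch.zeros(num_properties, latent_dim))
            self.heads = nn.ModuleList(
                nn.Linear(latent_dim, 1) for _ in range(num_properties))

        def forward(self, z: torch.Tensor) -> torch.Tensor:
            mask = torch.sigmoid(self.mask_logits)  # (P, D), pushed toward 0/1
            # Property p sees only its masked view of the latent vector
            return torch.cat(
                [head(z * mask[p]) for p, head in enumerate(self.heads)], dim=-1)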

* This paper has been accepted by NeurIPS 2022 

Adaptive Perception Transformer for Temporal Action Localization

Aug 25, 2022
Yizheng Ouyang, Tianjin Zhang, Weibo Gu, Hongfa Wang, Liming Wang, Xiaojie Guo

Temporal action localization aims to predict the boundary and category of each action instance in untrimmed long videos. Most previous methods, based on anchors or proposals, neglect the global-local context interaction across entire video sequences. Besides, their multi-stage designs cannot generate action boundaries and categories straightforwardly. To address these issues, this paper proposes a novel end-to-end model, called adaptive perception transformer (AdaPerFormer for short). Specifically, AdaPerFormer explores a dual-branch multi-head self-attention mechanism: one branch handles global perception attention, which can model entire video sequences and aggregate globally relevant contexts, while the other branch concentrates on local convolutional shift, aggregating intra-frame and inter-frame information through our bidirectional shift operation. The end-to-end design produces the boundaries and categories of video actions without extra steps. Extensive experiments together with ablation studies reveal the effectiveness of our design. Our method achieves state-of-the-art accuracy on the THUMOS14 dataset (65.8% mAP@0.5, 42.6% mAP@0.7, and 62.7% mAP@Avg), and obtains competitive performance on the ActivityNet-1.3 dataset with an average mAP of 36.1%. The code and models are available at https://github.com/SouperO/AdaPerFormer.
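
The bidirectional shift can be pictured along the lines of the classic temporal-shift trick (a guess at its spirit, not the paper's exact operator): move one slice of channels a step forward in time and another slice a step backward, so each frame mixes with its neighbors before subsequent layers.

    import torch

    def bidirectional_shift(x: torch.Tensor, fold: int = 8) -> torch.Tensor:
        """x: (batch, time, channels). Shift 1/fold of the channels forward
        in time and another 1/fold backward; the rest stay in place."""
        b, t, c = x.shape
        g = c // fold
        out = x.clone()
        out[:, 1:, :g] = x[:, :-1, :g]            # past -> present
        out[:, :-1, g:2 * g] = x[:, 1:, g:2 * g]  # future -> present
        return out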
