Yu Liu

Devil in the Landscapes: Inferring Epidemic Exposure Risks from Street View Imagery

Nov 09, 2023
Zhenyu Han, Yanxin Xi, Tong Xia, Yu Liu, Yong Li

The built environment supports our daily activities and shapes our health. Leveraging informative street view imagery, previous research has established profound correlations between the built environment and chronic, non-communicable diseases; however, predicting the exposure risk of infectious diseases remains largely unexplored. Person-to-person contacts and interactions contribute to the complexity of infectious diseases, making them inherently different from non-communicable diseases. Moreover, the complex relationship between street view imagery and epidemic exposure further hinders accurate prediction. To address these problems, we construct a regional mobility graph informed by the gravity model, on which we build a transmission-aware graph convolutional network (GCN) to capture disease transmission patterns arising from human mobility. Experiments show that the proposed model significantly outperforms baseline models by 8.54% in weighted F1, shedding light on a low-cost, scalable approach to assessing epidemic exposure risks from street view imagery.
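
As a concrete illustration of the two ingredients above, here is a minimal sketch (not the authors' code) of gravity-model edge weights between regions and one symmetric-normalized GCN propagation step over the resulting mobility graph; the populations, distances, and feature dimensions are made-up placeholders.

```python
import torch

def gravity_adjacency(population: torch.Tensor, distance: torch.Tensor) -> torch.Tensor:
    """Gravity-model edge weight between regions i and j: P_i * P_j / d_ij^2."""
    w = population[:, None] * population[None, :] / distance.clamp(min=1.0) ** 2
    w.fill_diagonal_(0.0)
    return w

def gcn_layer(x: torch.Tensor, adj: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + torch.eye(adj.size(0))
    d_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)
    norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    return torch.relu(norm @ x @ weight)

# Toy example: 4 regions, each with an 8-dim street-view feature vector.
pop = torch.tensor([120.0, 80.0, 200.0, 50.0])   # hypothetical region populations
dist = torch.rand(4, 4) * 10 + 1
dist = (dist + dist.T) / 2                       # symmetric pairwise distances
adj = gravity_adjacency(pop, dist)
x = torch.randn(4, 8)                            # street-view image embeddings
h = gcn_layer(x, adj, torch.randn(8, 16))        # mobility-aware region embeddings
```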

* Published in ACM SIGSPATIAL 2023 

CHAIN: Exploring Global-Local Spatio-Temporal Information for Improved Self-Supervised Video Hashing

Oct 29, 2023
Rukai Wei, Yu Liu, Jingkuan Song, Heng Cui, Yanzhao Xie, Ke Zhou

Compressing videos into binary codes can improve retrieval speed and reduce storage overhead. However, learning accurate hash codes for video retrieval can be challenging due to high local redundancy and complex global dependencies between video frames, especially in the absence of labels. Existing self-supervised video hashing methods have been effective in designing expressive temporal encoders, but have not fully exploited the temporal dynamics and spatial appearance of videos because their learning tasks are insufficiently challenging or unreliable. To address these challenges, we first employ a contrastive learning task to capture global spatio-temporal information of videos for hashing. With the aid of our designed augmentation strategies, which focus on spatial and temporal variations to create positive pairs, the learning framework can generate hash codes that are invariant to motion, scale, and viewpoint. Furthermore, we incorporate two collaborative learning tasks, i.e., frame order verification and scene change regularization, to capture local spatio-temporal details within video frames, thereby enhancing the perception of temporal structure and the modeling of spatio-temporal relationships. Our proposed Contrastive Hashing with Global-Local Spatio-temporal Information (CHAIN) outperforms state-of-the-art self-supervised video hashing methods on four video benchmark datasets. Our code will be released.
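
The global contrastive task can be pictured as a standard InfoNCE loss over codes from two augmented views of the same video; the sketch below is a hedged illustration with placeholder code length and temperature, not the CHAIN implementation.

```python
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Matching views are positives; all other videos in the batch are negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / temperature          # (B, B) similarity matrix
    targets = torch.arange(z1.size(0))        # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# Toy batch: 16 videos, 64-bit real-valued codes from two augmentations.
codes_view1 = torch.tanh(torch.randn(16, 64))  # tanh keeps codes near {-1, +1}
codes_view2 = torch.tanh(torch.randn(16, 64))
loss = info_nce(codes_view1, codes_view2)
```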

* 12 pages, 8 figures, accepted by ACM MM 2023 

Towards Large-scale Masked Face Recognition

Oct 25, 2023
Manyuan Zhang, Bingqi Ma, Guanglu Song, Yunxiao Wang, Hongsheng Li, Yu Liu

During the COVID-19 pandemic, almost everyone wears masks, which poses a huge challenge for deep learning-based face recognition algorithms. In this paper, we present our championship solutions for the ICCV MFR WebFace260M and InsightFace unconstrained tracks. We focus on four challenges in large-scale masked face recognition: super-large-scale training, data noise handling, balancing masked and non-masked face recognition accuracy, and designing an inference-friendly model architecture. We hope that the discussion of these four aspects can guide future research towards more robust masked face recognition systems.

* Top-1 solution for the ICCV 2021 MFR challenge 

Decoupled DETR: Spatially Disentangling Localization and Classification for Improved End-to-End Object Detection

Oct 24, 2023
Manyuan Zhang, Guanglu Song, Yu Liu, Hongsheng Li

The introduction of DETR represents a new paradigm for object detection. However, its decoder conducts classification and box localization using shared queries and cross-attention layers, leading to suboptimal results. We observe that different regions of interest in the visual feature map are suitable for the query classification and box localization tasks, even for the same object. Salient regions provide vital information for classification, while the boundaries around them are more favorable for box regression. Unfortunately, this spatial misalignment between the two tasks greatly hinders DETR's training. Therefore, in this work, we focus on decoupling the localization and classification tasks in DETR. To achieve this, we introduce a new design scheme called spatially decoupled DETR (SD-DETR), which includes a task-aware query generation module and a disentangled feature learning process. We elaborately design the task-aware query initialization process and divide the cross-attention block in the decoder so that the task-aware queries can match different visual regions. Meanwhile, we also observe a prediction misalignment problem, in which high classification confidence does not guarantee precise localization, and therefore propose an alignment loss to further guide the training of spatially decoupled DETR. Through extensive experiments, we demonstrate that our approach achieves a significant improvement on the MSCOCO dataset compared to previous work. For instance, we improve the performance of Conditional DETR by 4.5 AP. By spatially disentangling the two tasks, our method overcomes the misalignment problem and greatly improves the performance of DETR for object detection.
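
The decoupling idea can be sketched as a decoder layer with separate classification and localization queries, each running its own cross-attention over the encoder memory. The module below is an assumption-laden toy, not SD-DETR's code; query counts and dimensions are placeholders.

```python
import torch
import torch.nn as nn

class DecoupledDecoderLayer(nn.Module):
    def __init__(self, dim: int = 256, num_classes: int = 80):
        super().__init__()
        self.cls_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.loc_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.cls_head = nn.Linear(dim, num_classes)
        self.box_head = nn.Linear(dim, 4)   # (cx, cy, w, h)

    def forward(self, cls_q, loc_q, memory):
        # Each task-aware query set attends to the visual features independently,
        # so salient regions can drive classification while boundaries drive regression.
        cls_feat, _ = self.cls_attn(cls_q, memory, memory)
        loc_feat, _ = self.loc_attn(loc_q, memory, memory)
        return self.cls_head(cls_feat), self.box_head(loc_feat).sigmoid()

layer = DecoupledDecoderLayer()
memory = torch.randn(2, 900, 256)   # flattened encoder features
cls_q = torch.randn(2, 100, 256)    # 100 classification queries
loc_q = torch.randn(2, 100, 256)    # 100 localization queries
logits, boxes = layer(cls_q, loc_q, memory)
```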

* accepted by ICCV2023 

Masked Pretraining for Multi-Agent Decision Making

Oct 18, 2023
Jie Liu, Yinmin Zhang, Chuming Li, Chao Yang, Yaodong Yang, Yu Liu, Wanli Ouyang

Building a single generalist agent with zero-shot capability has recently sparked significant advancements in decision-making. However, extending this capability to multi-agent scenarios presents challenges. Most current works struggle with zero-shot capability due to two challenges particular to multi-agent settings: a mismatch between centralized pretraining and decentralized execution, and varying agent numbers and action spaces, which make it difficult to create generalizable representations across diverse downstream tasks. To overcome these challenges, we propose a Masked pretraining framework for Multi-agent decision making (MaskMA). This transformer-based model employs a mask-based collaborative learning strategy suited for decentralized execution with partial observation. Moreover, MaskMA integrates a generalizable action representation by dividing the action space into actions toward self-information and actions related to other entities. This flexibility allows MaskMA to tackle tasks with varying agent numbers and thus different action spaces. Extensive experiments in SMAC reveal that MaskMA, with a single model pretrained on 11 training maps, achieves an impressive 77.8% zero-shot win rate on 60 unseen test maps via decentralized execution, while also performing effectively on other types of downstream tasks (e.g., varied-policy collaboration and ad hoc team play).
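
A simplified, hypothetical sketch of the masked pretraining step: randomly mask some agents' tokens in a joint observation and train a transformer to predict the masked agents' actions from the remaining context. The dimensions and masking ratio below are placeholders, not MaskMA's configuration.

```python
import torch
import torch.nn as nn

n_agents, obs_dim, n_actions, dim = 8, 32, 10, 64
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True), num_layers=2
)
obs_proj = nn.Linear(obs_dim, dim)
action_head = nn.Linear(dim, n_actions)
mask_token = nn.Parameter(torch.zeros(dim))           # learned placeholder token

obs = torch.randn(4, n_agents, obs_dim)               # batch of joint observations
actions = torch.randint(0, n_actions, (4, n_agents))  # ground-truth actions
mask = torch.rand(4, n_agents) < 0.5                  # mask roughly half of the agents

tokens = obs_proj(obs)
tokens = torch.where(mask.unsqueeze(-1), mask_token.expand_as(tokens), tokens)
logits = action_head(encoder(tokens))                 # (batch, n_agents, n_actions)
# Reconstruct actions only at the masked positions, as in masked prediction.
loss = nn.functional.cross_entropy(logits[mask], actions[mask])
```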

* 17 pages 

Beyond One-Preference-for-All: Multi-Objective Direct Preference Optimization for Language Models

Oct 17, 2023
Zhanhui Zhou, Jie Liu, Chao Yang, Jing Shao, Yu Liu, Xiangyu Yue, Wanli Ouyang, Yu Qiao

A single language model (LM), despite aligning well with an average labeler through reinforcement learning from human feedback (RLHF), may not universally suit diverse human preferences. Recent approaches therefore pursue customization, training separate principle-based reward models to represent different alignment objectives (e.g., helpfulness, harmlessness, or honesty). Different LMs can then be trained for different preferences through multi-objective RLHF (MORLHF) with different objective weightings. Yet, RLHF is unstable and resource-heavy, especially for MORLHF with diverse and usually conflicting objectives. In this paper, we present Multi-Objective Direct Preference Optimization (MODPO), an RL-free algorithm that extends Direct Preference Optimization (DPO) to multiple alignment objectives. Essentially, MODPO folds LM learning directly into reward modeling, aligning LMs with the weighted sum of all principle-based rewards using a pure cross-entropy loss. While theoretically guaranteed to produce the same optimal solutions as MORLHF, MODPO is practically more stable and computationally efficient, obviating value function modeling and online sample collection. Empirical results in safety alignment and long-form question answering confirm that MODPO matches or outperforms existing methods, consistently producing one of the most competitive LM fronts catering to diverse preferences with 3 times less computation than MORLHF.
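
The "fold the other objectives into the loss" idea can be pictured as a DPO-style cross-entropy on preference pairs whose implicit reward margin is offset by a weighted sum of frozen reward models for the remaining objectives. The weighting and normalization below are illustrative assumptions, not the paper's exact equations.

```python
import torch
import torch.nn.functional as F

def modpo_style_loss(logp_chosen, logp_rejected,          # log-probs under the policy
                     ref_logp_chosen, ref_logp_rejected,  # log-probs under the reference model
                     other_rewards_chosen, other_rewards_rejected,  # other objectives' rewards
                     weights, beta: float = 0.1):
    """Preference loss on one objective, with the remaining objectives folded in as a margin."""
    implicit_margin = beta * ((logp_chosen - ref_logp_chosen) -
                              (logp_rejected - ref_logp_rejected))
    other_margin = (weights * (other_rewards_chosen - other_rewards_rejected)).sum(dim=-1)
    return -F.logsigmoid(implicit_margin - other_margin).mean()

# Toy batch of 8 preference pairs with 2 auxiliary objectives.
b = 8
loss = modpo_style_loss(torch.randn(b), torch.randn(b), torch.randn(b), torch.randn(b),
                        torch.randn(b, 2), torch.randn(b, 2),
                        weights=torch.tensor([0.3, 0.2]))
```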


Distance-rank Aware Sequential Reward Learning for Inverse Reinforcement Learning with Sub-optimal Demonstrations

Oct 13, 2023
Lu Li, Yuxin Pan, Ruobing Chen, Jie Liu, Zilin Wang, Yu Liu, Zhiheng Li

Inverse reinforcement learning (IRL) aims to explicitly infer an underlying reward function from collected expert demonstrations. Since obtaining expert demonstrations can be costly, current IRL techniques focus on learning a better-than-demonstrator policy using a reward function derived from sub-optimal demonstrations. However, existing IRL algorithms primarily tackle the challenge of trajectory ranking ambiguity when learning the reward function. They overlook the degree of difference between trajectories in terms of their returns, which is essential for further removing reward ambiguity. Additionally, the reward of a single transition is heavily influenced by the context information within the trajectory. To address these issues, we introduce the Distance-rank Aware Sequential Reward Learning (DRASRL) framework. Unlike existing approaches, DRASRL considers both the ranking of trajectories and the degree of dissimilarity between them to collaboratively eliminate reward ambiguity when learning a sequence of contextually informed reward signals. Specifically, we leverage the distance between the policies from which the trajectories are generated as a measure of the degree of difference between trajectories. This distance-aware information is then used to infer embeddings in the representation space for reward learning, employing a contrastive learning technique. Meanwhile, we integrate a pairwise ranking loss to incorporate ranking information into the latent features. Moreover, we employ a Transformer architecture to capture the contextual dependencies within trajectories in the latent space, leading to more accurate reward estimation. Through extensive experiments, our DRASRL framework demonstrates significant performance improvements over previous SOTA methods.
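
One way to picture the combination of ranking and distance information is a pairwise trajectory-ranking loss plus a hinge term that asks the predicted return gap to grow with the distance between the generating policies. The sketch below is illustrative only; the reward networks and distance measure are placeholders, not the DRASRL implementation.

```python
import torch
import torch.nn.functional as F

def rank_loss(pred_return_better: torch.Tensor, pred_return_worse: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry style ranking: the better trajectory should score higher."""
    return -F.logsigmoid(pred_return_better - pred_return_worse).mean()

def distance_aware_loss(pred_better, pred_worse, policy_distance, scale: float = 1.0):
    """Encourage the return gap to grow with the distance between generating policies."""
    gap = pred_better - pred_worse
    return F.relu(scale * policy_distance - gap).mean()   # hinge on the desired margin

pred_b, pred_w = torch.randn(16), torch.randn(16)   # predicted returns for trajectory pairs
policy_dist = torch.rand(16)                        # distance between the generating policies
loss = rank_loss(pred_b, pred_w) + 0.5 * distance_aware_loss(pred_b, pred_w, policy_dist)
```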


LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios

Oct 12, 2023
Yazhe Niu, Yuan Pu, Zhenjie Yang, Xueyan Li, Tong Zhou, Jiyuan Ren, Shuai Hu, Hongsheng Li, Yu Liu

Building agents with tree-search planning capabilities and learned models has achieved remarkable success in classic decision-making problems, such as Go and Atari. However, it has been deemed challenging or even infeasible to extend Monte Carlo Tree Search (MCTS) based algorithms to diverse real-world applications, especially when these environments involve complex action spaces, significant simulation costs, or inherent stochasticity. In this work, we introduce LightZero, the first unified benchmark for deploying MCTS/MuZero in general sequential decision scenarios. Specifically, we summarize the most critical challenges in designing a general MCTS-style decision-making solver, then decompose the tightly-coupled algorithm and system design of tree-search RL methods into distinct sub-modules. By incorporating more appropriate exploration and optimization strategies, we can significantly enhance these sub-modules and construct powerful LightZero agents to tackle tasks across a wide range of domains, such as board games, Atari, MuJoCo, MiniGrid and GoBigger. Detailed benchmark results reveal the significant potential of such methods in building scalable and efficient decision intelligence. The code is available as part of OpenDILab at https://github.com/opendilab/LightZero.
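
For readers unfamiliar with MCTS-style solvers, the sketch below shows a generic PUCT selection rule of the kind used in MuZero-like tree search; it is not LightZero's API, and in practice the priors and values would come from a learned model.

```python
import math

def puct_score(parent_visits: int, child_visits: int, child_value: float,
               prior: float, c_puct: float = 1.25) -> float:
    """Value estimate plus an exploration bonus weighted by the policy prior."""
    exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return child_value + exploration

# Pick the child with the highest PUCT score during tree traversal.
children = [  # (visit_count, mean_value, prior) for each candidate action
    (10, 0.42, 0.5),
    (3, 0.55, 0.3),
    (0, 0.0, 0.2),
]
parent_n = sum(n for n, _, _ in children)
best_action = max(range(len(children)),
                  key=lambda a: puct_score(parent_n, children[a][0],
                                           children[a][1], children[a][2]))
```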

* NeurIPS 2023 Spotlight 

Efficient-VQGAN: Towards High-Resolution Image Generation with Efficient Vision Transformers

Oct 09, 2023
Shiyue Cao, Yueqin Yin, Lianghua Huang, Yu Liu, Xin Zhao, Deli Zhao, Kaiqi Huang

Vector-quantized image modeling has shown great potential in synthesizing high-quality images. However, generating high-resolution images remains a challenging task due to the quadratic computational overhead of self-attention. In this study, we explore a more efficient two-stage framework for high-resolution image generation with improvements in the following three aspects. (1) Based on the observation that the first quantization stage exhibits strong locality, we employ a local attention-based quantization model instead of the global attention mechanism used in previous methods, leading to better efficiency and reconstruction quality. (2) We emphasize the importance of multi-grained feature interaction during image generation and introduce an efficient attention mechanism that combines global attention (long-range semantic consistency within the whole image) and local attention (fine-grained details). This approach results in faster generation speed, higher generation fidelity, and improved resolution. (3) We propose a new generation pipeline incorporating autoencoding training and an autoregressive generation strategy, demonstrating a better paradigm for image synthesis. Extensive experiments demonstrate the superiority of our approach in high-quality and high-resolution image reconstruction and generation.
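
The local-attention idea in aspect (1) can be sketched as windowed self-attention: attention is computed independently within non-overlapping windows of the feature map, so the quadratic cost depends on the window size rather than the full resolution. The window size and dimensions below are placeholders, not the paper's settings.

```python
import torch
import torch.nn as nn

def window_self_attention(x: torch.Tensor, attn: nn.MultiheadAttention, window: int = 8):
    """x: (B, H, W, C) feature map; attention is restricted to each window independently."""
    b, h, w, c = x.shape
    x = x.view(b, h // window, window, w // window, window, c)
    x = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window * window, c)   # (B*nWin, win*win, C)
    out, _ = attn(x, x, x)
    out = out.reshape(b, h // window, w // window, window, window, c)
    return out.permute(0, 1, 3, 2, 4, 5).reshape(b, h, w, c)

attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
feat = torch.randn(2, 32, 32, 64)
out = window_self_attention(feat, attn)   # same shape; quadratic only in the 8x8 window
```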

* This paper is accepted to ICCV2023 

Continuous Invariance Learning

Oct 09, 2023
Yong Lin, Fan Zhou, Lu Tan, Lintao Ma, Jiameng Liu, Yansu He, Yuan Yuan, Yu Liu, James Zhang, Yujiu Yang, Hao Wang

Invariance learning methods aim to learn invariant features in the hope that they generalize under distributional shifts. Although many tasks are naturally characterized by continuous domains, current invariance learning techniques generally assume categorically indexed domains. For example, auto-scaling in cloud computing often needs a CPU utilization prediction model that generalizes across different times (e.g., time of day and date of the year), where time is a continuous domain index. In this paper, we start by theoretically showing that existing invariance learning methods can fail on continuous domain problems. Specifically, the naive solution of splitting continuous domains into discrete ones ignores the underlying relationships among domains and therefore potentially leads to suboptimal performance. To address this challenge, we propose Continuous Invariance Learning (CIL), which extracts invariant features across continuously indexed domains. CIL is a novel adversarial procedure that measures and controls the conditional independence between the labels and continuous domain indices given the extracted features. Our theoretical analysis demonstrates the superiority of CIL over existing invariance learning methods. Empirical results on both synthetic and real-world datasets (including data collected from production systems) show that CIL consistently outperforms strong baselines across all tasks.
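
The adversarial procedure can be pictured as a regressor that tries to predict the continuous domain index t from the extracted features (conditioned on the label), while the featurizer is penalized whenever the regressor succeeds. The architectures and conditioning scheme below are assumptions, not CIL's exact formulation.

```python
import torch
import torch.nn as nn

featurizer = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 32))
classifier = nn.Linear(32, 2)
domain_regressor = nn.Sequential(nn.Linear(32 + 1, 64), nn.ReLU(), nn.Linear(64, 1))

x = torch.randn(128, 20)
y = torch.randint(0, 2, (128,))
t = torch.rand(128, 1)                      # continuous domain index, e.g. time of day

feats = featurizer(x)
task_loss = nn.functional.cross_entropy(classifier(feats), y)
t_pred = domain_regressor(torch.cat([feats, y.float().unsqueeze(1)], dim=1))
domain_loss = nn.functional.mse_loss(t_pred, t)

# The regressor minimizes domain_loss; the featurizer minimizes
# task_loss - lambda * domain_loss (adversarial), e.g. via a gradient-reversal layer.
featurizer_objective = task_loss - 0.1 * domain_loss
```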
