Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yizhou Wang

Bayesian Intervention Optimization for Causal Discovery

Jun 16, 2024

Yuxuan Wang, Mingzhou Liu, Xinwei Sun, Wei Wang, Yizhou Wang

Figure 1 for Bayesian Intervention Optimization for Causal Discovery

Figure 2 for Bayesian Intervention Optimization for Causal Discovery

Figure 3 for Bayesian Intervention Optimization for Causal Discovery

Figure 4 for Bayesian Intervention Optimization for Causal Discovery

Abstract:Causal discovery is crucial for understanding complex systems and informing decisions. While observational data can uncover causal relationships under certain assumptions, it often falls short, making active interventions necessary. Current methods, such as Bayesian and graph-theoretical approaches, do not prioritize decision-making and often rely on ideal conditions or information gain, which is not directly related to hypothesis testing. We propose a novel Bayesian optimization-based method inspired by Bayes factors that aims to maximize the probability of obtaining decisive and correct evidence. Our approach uses observational data to estimate causal models under different hypotheses, evaluates potential interventions pre-experimentally, and iteratively updates priors to refine interventions. We demonstrate the effectiveness of our method through various experiments. Our contributions provide a robust framework for efficient causal discovery through active interventions, enhancing the practical application of theoretical advancements.

Via

Access Paper or Ask Questions

CLoG: Benchmarking Continual Learning of Image Generation Models

Jun 07, 2024

Haotian Zhang, Junting Zhou, Haowei Lin, Hang Ye, Jianhua Zhu, Zihao Wang, Liangcai Gao, Yizhou Wang, Yitao Liang

Figure 1 for CLoG: Benchmarking Continual Learning of Image Generation Models

Figure 2 for CLoG: Benchmarking Continual Learning of Image Generation Models

Figure 3 for CLoG: Benchmarking Continual Learning of Image Generation Models

Figure 4 for CLoG: Benchmarking Continual Learning of Image Generation Models

Abstract:Continual Learning (CL) poses a significant challenge in Artificial Intelligence, aiming to mirror the human ability to incrementally acquire knowledge and skills. While extensive research has focused on CL within the context of classification tasks, the advent of increasingly powerful generative models necessitates the exploration of Continual Learning of Generative models (CLoG). This paper advocates for shifting the research focus from classification-based CL to CLoG. We systematically identify the unique challenges presented by CLoG compared to traditional classification-based CL. We adapt three types of existing CL methodologies, replay-based, regularization-based, and parameter-isolation-based methods to generative tasks and introduce comprehensive benchmarks for CLoG that feature great diversity and broad task coverage. Our benchmarks and results yield intriguing insights that can be valuable for developing future CLoG methods. Additionally, we will release a codebase designed to facilitate easy benchmarking and experimentation in CLoG publicly at https://github.com/linhaowei1/CLoG. We believe that shifting the research focus to CLoG will benefit the continual learning community and illuminate the path for next-generation AI-generated content (AIGC) in a lifelong learning paradigm.

Via

Access Paper or Ask Questions

Cross-Dimensional Medical Self-Supervised Representation Learning Based on a Pseudo-3D Transformation

Jun 03, 2024

Fei Gao, Siwen Wang, Churan Wang, Fandong Zhang, Hong-Yu Zhou, Yizhou Wang, Gang Yu, Yizhou Yu

Figure 1 for Cross-Dimensional Medical Self-Supervised Representation Learning Based on a Pseudo-3D Transformation

Figure 2 for Cross-Dimensional Medical Self-Supervised Representation Learning Based on a Pseudo-3D Transformation

Figure 3 for Cross-Dimensional Medical Self-Supervised Representation Learning Based on a Pseudo-3D Transformation

Figure 4 for Cross-Dimensional Medical Self-Supervised Representation Learning Based on a Pseudo-3D Transformation

Abstract:Medical image analysis suffers from a shortage of data, whether annotated or not. This becomes even more pronounced when it comes to 3D medical images. Self-Supervised Learning (SSL) can partially ease this situation by using unlabeled data. However, most existing SSL methods can only make use of data in a single dimensionality (e.g. 2D or 3D), and are incapable of enlarging the training dataset by using data with differing dimensionalities jointly. In this paper, we propose a new cross-dimensional SSL framework based on a pseudo-3D transformation (CDSSL-P3D), that can leverage both 2D and 3D data for joint pre-training. Specifically, we introduce an image transformation based on the im2col algorithm, which converts 2D images into a format consistent with 3D data. This transformation enables seamless integration of 2D and 3D data, and facilitates cross-dimensional self-supervised learning for 3D medical image analysis. We run extensive experiments on 13 downstream tasks, including 2D and 3D classification and segmentation. The results indicate that our CDSSL-P3D achieves superior performance, outperforming other advanced SSL methods.

* MICCAI 2024 accept

Via

Access Paper or Ask Questions

Instruct-ReID++: Towards Universal Purpose Instruction-Guided Person Re-identification

May 28, 2024

Weizhen He, Yiheng Deng, Yunfeng Yan, Feng Zhu, Yizhou Wang, Lei Bai, Qingsong Xie, Donglian Qi, Wanli Ouyang, Shixiang Tang

Figure 1 for Instruct-ReID++: Towards Universal Purpose Instruction-Guided Person Re-identification

Figure 2 for Instruct-ReID++: Towards Universal Purpose Instruction-Guided Person Re-identification

Figure 3 for Instruct-ReID++: Towards Universal Purpose Instruction-Guided Person Re-identification

Figure 4 for Instruct-ReID++: Towards Universal Purpose Instruction-Guided Person Re-identification

Abstract:Human intelligence can retrieve any person according to both visual and language descriptions. However, the current computer vision community studies specific person re-identification (ReID) tasks in different scenarios separately, which limits the applications in the real world. This paper strives to resolve this problem by proposing a novel instruct-ReID task that requires the model to retrieve images according to the given image or language instructions. Instruct-ReID is the first exploration of a general ReID setting, where existing 6 ReID tasks can be viewed as special cases by assigning different instructions. To facilitate research in this new instruct-ReID task, we propose a large-scale OmniReID++ benchmark equipped with diverse data and comprehensive evaluation methods e.g., task specific and task-free evaluation settings. In the task-specific evaluation setting, gallery sets are categorized according to specific ReID tasks. We propose a novel baseline model, IRM, with an adaptive triplet loss to handle various retrieval tasks within a unified framework. For task-free evaluation setting, where target person images are retrieved from task-agnostic gallery sets, we further propose a new method called IRM++ with novel memory bank-assisted learning. Extensive evaluations of IRM and IRM++ on OmniReID++ benchmark demonstrate the superiority of our proposed methods, achieving state-of-the-art performance on 10 test sets. The datasets, the model, and the code will be available at https://github.com/hwz-zju/Instruct-ReID

* arXiv admin note: substantial text overlap with arXiv:2306.07520

Via

Access Paper or Ask Questions

SocialGFs: Learning Social Gradient Fields for Multi-Agent Reinforcement Learning

May 03, 2024

Qian Long, Fangwei Zhong, Mingdong Wu, Yizhou Wang, Song-Chun Zhu

Figure 1 for SocialGFs: Learning Social Gradient Fields for Multi-Agent Reinforcement Learning

Figure 2 for SocialGFs: Learning Social Gradient Fields for Multi-Agent Reinforcement Learning

Figure 3 for SocialGFs: Learning Social Gradient Fields for Multi-Agent Reinforcement Learning

Figure 4 for SocialGFs: Learning Social Gradient Fields for Multi-Agent Reinforcement Learning

Abstract:Multi-agent systems (MAS) need to adaptively cope with dynamic environments, changing agent populations, and diverse tasks. However, most of the multi-agent systems cannot easily handle them, due to the complexity of the state and task space. The social impact theory regards the complex influencing factors as forces acting on an agent, emanating from the environment, other agents, and the agent's intrinsic motivation, referring to the social force. Inspired by this concept, we propose a novel gradient-based state representation for multi-agent reinforcement learning. To non-trivially model the social forces, we further introduce a data-driven method, where we employ denoising score matching to learn the social gradient fields (SocialGFs) from offline samples, e.g., the attractive or repulsive outcomes of each force. During interactions, the agents take actions based on the multi-dimensional gradients to maximize their own rewards. In practice, we integrate SocialGFs into the widely used multi-agent reinforcement learning algorithms, e.g., MAPPO. The empirical results reveal that SocialGFs offer four advantages for multi-agent systems: 1) they can be learned without requiring online interaction, 2) they demonstrate transferability across diverse tasks, 3) they facilitate credit assignment in challenging reward settings, and 4) they are scalable with the increasing number of agents.

* AAAI 2024 Cooperative Multi-Agent Systems Decision-Making and Learning (CMASDL) Workshop

Via

Access Paper or Ask Questions

The 8th AI City Challenge

Apr 15, 2024

Shuo Wang, David C. Anastasiu, Zheng Tang, Ming-Ching Chang, Yue Yao, Liang Zheng, Mohammed Shaiqur Rahman, Meenakshi S. Arya, Anuj Sharma, Pranamesh Chakraborty(+14 more)

Abstract:The eighth AI City Challenge highlighted the convergence of computer vision and artificial intelligence in areas like retail, warehouse settings, and Intelligent Traffic Systems (ITS), presenting significant research opportunities. The 2024 edition featured five tracks, attracting unprecedented interest from 726 teams in 47 countries and regions. Track 1 dealt with multi-target multi-camera (MTMC) people tracking, highlighting significant enhancements in camera count, character number, 3D annotation, and camera matrices, alongside new rules for 3D tracking and online tracking algorithm encouragement. Track 2 introduced dense video captioning for traffic safety, focusing on pedestrian accidents using multi-camera feeds to improve insights for insurance and prevention. Track 3 required teams to classify driver actions in a naturalistic driving analysis. Track 4 explored fish-eye camera analytics using the FishEye8K dataset. Track 5 focused on motorcycle helmet rule violation detection. The challenge utilized two leaderboards to showcase methods, with participants setting new benchmarks, some surpassing existing state-of-the-art achievements.

* Summary of the 8th AI City Challenge Workshop in conjunction with CVPR 2024

Via

Access Paper or Ask Questions

Rewrite the Stars

Mar 29, 2024

Xu Ma, Xiyang Dai, Yue Bai, Yizhou Wang, Yun Fu

Abstract:Recent studies have drawn attention to the untapped potential of the "star operation" (element-wise multiplication) in network design. While intuitive explanations abound, the foundational rationale behind its application remains largely unexplored. Our study attempts to reveal the star operation's ability to map inputs into high-dimensional, non-linear feature spaces -- akin to kernel tricks -- without widening the network. We further introduce StarNet, a simple yet powerful prototype, demonstrating impressive performance and low latency under compact network structure and efficient budget. Like stars in the sky, the star operation appears unremarkable but holds a vast universe of potential. Our work encourages further exploration across tasks, with codes available at https://github.com/ma-xu/Rewrite-the-Stars.

* Accepted by CVPR 2024. Codes are made publically available at https://github.com/ma-xu/Rewrite-the-Stars

Via

Access Paper or Ask Questions

Don't Judge by the Look: Towards Motion Coherent Video Representation

Mar 25, 2024

Yitian Zhang, Yue Bai, Huan Wang, Yizhou Wang, Yun Fu

Figure 1 for Don't Judge by the Look: Towards Motion Coherent Video Representation

Figure 2 for Don't Judge by the Look: Towards Motion Coherent Video Representation

Figure 3 for Don't Judge by the Look: Towards Motion Coherent Video Representation

Figure 4 for Don't Judge by the Look: Towards Motion Coherent Video Representation

Abstract:Current training pipelines in object recognition neglect Hue Jittering when doing data augmentation as it not only brings appearance changes that are detrimental to classification, but also the implementation is inefficient in practice. In this study, we investigate the effect of hue variance in the context of video understanding and find this variance to be beneficial since static appearances are less important in videos that contain motion information. Based on this observation, we propose a data augmentation method for video understanding, named Motion Coherent Augmentation (MCA), that introduces appearance variation in videos and implicitly encourages the model to prioritize motion patterns, rather than static appearances. Concretely, we propose an operation SwapMix to efficiently modify the appearance of video samples, and introduce Variation Alignment (VA) to resolve the distribution shift caused by SwapMix, enforcing the model to learn appearance invariant representations. Comprehensive empirical evaluation across various architectures and different datasets solidly validates the effectiveness and generalization ability of MCA, and the application of VA in other augmentation methods. Code is available at https://github.com/BeSpontaneous/MCA-pytorch.

* Accepted by ICLR2024

Via

Access Paper or Ask Questions

DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM

Mar 19, 2024

Yixuan Wu, Yizhou Wang, Shixiang Tang, Wenhao Wu, Tong He, Wanli Ouyang, Jian Wu, Philip Torr

Abstract:We present DetToolChain, a novel prompting paradigm, to unleash the zero-shot object detection ability of multimodal large language models (MLLMs), such as GPT-4V and Gemini. Our approach consists of a detection prompting toolkit inspired by high-precision detection priors and a new Chain-of-Thought to implement these prompts. Specifically, the prompts in the toolkit are designed to guide the MLLM to focus on regional information (e.g., zooming in), read coordinates according to measure standards (e.g., overlaying rulers and compasses), and infer from the contextual information (e.g., overlaying scene graphs). Building upon these tools, the new detection chain-of-thought can automatically decompose the task into simple subtasks, diagnose the predictions, and plan for progressive box refinements. The effectiveness of our framework is demonstrated across a spectrum of detection tasks, especially hard cases. Compared to existing state-of-the-art methods, GPT-4V with our DetToolChain improves state-of-the-art object detectors by +21.5% AP50 on MS COCO Novel class set for open-vocabulary detection, +24.23% Acc on RefCOCO val set for zero-shot referring expression comprehension, +14.5% AP on D-cube describe object detection FULL setting.

Via

Access Paper or Ask Questions

Efficient Action Counting with Dynamic Queries

Mar 05, 2024

Zishi Li, Xiaoxuan Ma, Qiuyan Shang, Wentao Zhu, Hai Ci, Yu Qiao, Yizhou Wang

Abstract:Temporal repetition counting aims to quantify the repeated action cycles within a video. The majority of existing methods rely on the similarity correlation matrix to characterize the repetitiveness of actions, but their scalability is hindered due to the quadratic computational complexity. In this work, we introduce a novel approach that employs an action query representation to localize repeated action cycles with linear computational complexity. Based on this representation, we further develop two key components to tackle the essential challenges of temporal repetition counting. Firstly, to facilitate open-set action counting, we propose the dynamic update scheme on action queries. Unlike static action queries, this approach dynamically embeds video features into action queries, offering a more flexible and generalizable representation. Secondly, to distinguish between actions of interest and background noise actions, we incorporate inter-query contrastive learning to regularize the video representations corresponding to different action queries. As a result, our method significantly outperforms previous works, particularly in terms of long video sequences, unseen actions, and actions at various speeds. On the challenging RepCountA benchmark, we outperform the state-of-the-art method TransRAC by 26.5% in OBO accuracy, with a 22.7% mean error decrease and 94.1% computational burden reduction. Code is available at https://github.com/lizishi/DeTRC.

* code: https://github.com/lizishi/DeTRC, proj page: https://shirleymaxx.github.io/DeTRC/

Via

Access Paper or Ask Questions