Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xuan Zhang

Airon Technology CO. LTD, Carnegie Mellon University

RICASSO: Reinforced Imbalance Learning with Class-Aware Self-Supervised Outliers Exposure

Oct 14, 2024

Xuan Zhang, Sin Chee Chin, Tingxuan Gao, Wenming Yang

Figure 1 for RICASSO: Reinforced Imbalance Learning with Class-Aware Self-Supervised Outliers Exposure

Figure 2 for RICASSO: Reinforced Imbalance Learning with Class-Aware Self-Supervised Outliers Exposure

Figure 3 for RICASSO: Reinforced Imbalance Learning with Class-Aware Self-Supervised Outliers Exposure

Figure 4 for RICASSO: Reinforced Imbalance Learning with Class-Aware Self-Supervised Outliers Exposure

Abstract:In real-world scenarios, deep learning models often face challenges from both imbalanced (long-tailed) and out-of-distribution (OOD) data. However, existing joint methods rely on real OOD data, which leads to unnecessary trade-offs. In contrast, our research shows that data mixing, a potent augmentation technique for long-tailed recognition, can generate pseudo-OOD data that exhibit the features of both in-distribution (ID) data and OOD data. Therefore, by using mixed data instead of real OOD data, we can address long-tailed recognition and OOD detection holistically. We propose a unified framework called Reinforced Imbalance Learning with Class-Aware Self-Supervised Outliers Exposure (RICASSO), where "self-supervised" denotes that we only use ID data for outlier exposure. RICASSO includes three main strategies: Norm-Odd-Duality-Based Outlier Exposure: Uses mixed data as pseudo-OOD data, enabling simultaneous ID data rebalancing and outlier exposure through a single loss function. Ambiguity-Aware Logits Adjustment: Utilizes the ambiguity of ID data to adaptively recalibrate logits. Contrastive Boundary-Center Learning: Combines Virtual Boundary Learning and Dual-Entropy Center Learning to use mixed data for better feature separation and clustering, with Representation Consistency Learning for robustness. Extensive experiments demonstrate that RICASSO achieves state-of-the-art performance in long-tailed recognition and significantly improves OOD detection compared to our baseline (27% improvement in AUROC and 61% reduction in FPR on the iNaturalist2018 dataset). On iNaturalist2018, we even outperforms methods using real OOD data. The code will be made public soon.

* 14 pages, 2 figures

Via

Access Paper or Ask Questions

IRIS: Interactive Responsive Intelligent Segmentation for 3D Affordance Analysis

Sep 17, 2024

Meng Chu, Xuan Zhang

Abstract:Recent advancements in large language and vision-language models have significantly enhanced multimodal understanding, yet translating high-level linguistic instructions into precise robotic actions in 3D space remains challenging. This paper introduces IRIS (Interactive Responsive Intelligent Segmentation), a novel training-free multimodal system for 3D affordance segmentation, alongside a benchmark for evaluating interactive language-guided affordance in everyday environments. IRIS integrates a large multimodal model with a specialized 3D vision network, enabling seamless fusion of 2D and 3D visual understanding with language comprehension. To facilitate evaluation, we present a dataset of 10 typical indoor environments, each with 50 images annotated for object actions and 3D affordance segmentation. Extensive experiments demonstrate IRIS's capability in handling interactive 3D affordance segmentation tasks across diverse settings, showcasing competitive performance across various metrics. Our results highlight IRIS's potential for enhancing human-robot interaction based on affordance understanding in complex indoor environments, advancing the development of more intuitive and efficient robotic systems for real-world applications.

Via

Access Paper or Ask Questions

Ask-before-Plan: Proactive Language Agents for Real-World Planning

Jun 18, 2024

Xuan Zhang, Yang Deng, Zifeng Ren, See-Kiong Ng, Tat-Seng Chua

Figure 1 for Ask-before-Plan: Proactive Language Agents for Real-World Planning

Figure 2 for Ask-before-Plan: Proactive Language Agents for Real-World Planning

Figure 3 for Ask-before-Plan: Proactive Language Agents for Real-World Planning

Figure 4 for Ask-before-Plan: Proactive Language Agents for Real-World Planning

Abstract:The evolution of large language models (LLMs) has enhanced the planning capabilities of language agents in diverse real-world scenarios. Despite these advancements, the potential of LLM-powered agents to comprehend ambiguous user instructions for reasoning and decision-making is still under exploration. In this work, we introduce a new task, Proactive Agent Planning, which requires language agents to predict clarification needs based on user-agent conversation and agent-environment interaction, invoke external tools to collect valid information, and generate a plan to fulfill the user's demands. To study this practical problem, we establish a new benchmark dataset, Ask-before-Plan. To tackle the deficiency of LLMs in proactive planning, we propose a novel multi-agent framework, Clarification-Execution-Planning (\texttt{CEP}), which consists of three agents specialized in clarification, execution, and planning. We introduce the trajectory tuning scheme for the clarification agent and static execution agent, as well as the memory recollection mechanism for the dynamic execution agent. Extensive evaluations and comprehensive analyses conducted on the Ask-before-Plan dataset validate the effectiveness of our proposed framework.

Via

Access Paper or Ask Questions

Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs

Jun 13, 2024

Xuan Zhang, Chao Du, Tianyu Pang, Qian Liu, Wei Gao, Min Lin

Figure 1 for Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs

Figure 2 for Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs

Figure 3 for Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs

Figure 4 for Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs

Abstract:The recent development of chain-of-thought (CoT) decoding has enabled large language models (LLMs) to generate explicit logical reasoning paths for complex problem-solving. However, research indicates that these paths are not always deliberate and optimal. The tree-of-thought (ToT) method employs tree-searching to extensively explore the reasoning space and find better reasoning paths that CoT decoding might overlook. This deliberation, however, comes at the cost of significantly increased inference complexity. In this work, we demonstrate that fine-tuning LLMs leveraging the search tree constructed by ToT allows CoT to achieve similar or better performance, thereby avoiding the substantial inference burden. This is achieved through Chain of Preference Optimization (CPO), where LLMs are fine-tuned to align each step of the CoT reasoning paths with those of ToT using the inherent preference information in the tree-search process. Extensive experimental results show that CPO significantly improves LLM performance in solving a variety of complex problems, including question answering, fact verification, and arithmetic reasoning, demonstrating its effectiveness. Our code is available at https://github.com/sail-sg/CPO.

Via

Access Paper or Ask Questions

Non-destructive Degradation Pattern Decoupling for Ultra-early Battery Prototype Verification Using Physics-informed Machine Learning

Jun 01, 2024

Shengyu Tao, Mengtian Zhang, Zixi Zhao, Haoyang Li, Ruifei Ma, Yunhong Che, Xin Sun, Lin Su, Xiangyu Chen, Zihao Zhou(+11 more)

Figure 1 for Non-destructive Degradation Pattern Decoupling for Ultra-early Battery Prototype Verification Using Physics-informed Machine Learning

Figure 2 for Non-destructive Degradation Pattern Decoupling for Ultra-early Battery Prototype Verification Using Physics-informed Machine Learning

Figure 3 for Non-destructive Degradation Pattern Decoupling for Ultra-early Battery Prototype Verification Using Physics-informed Machine Learning

Figure 4 for Non-destructive Degradation Pattern Decoupling for Ultra-early Battery Prototype Verification Using Physics-informed Machine Learning

Abstract:Manufacturing complexities and uncertainties have impeded the transition from material prototypes to commercial batteries, making prototype verification critical to quality assessment. A fundamental challenge involves deciphering intertwined chemical processes to characterize degradation patterns and their quantitative relationship with battery performance. Here we show that a physics-informed machine learning approach can quantify and visualize temporally resolved losses concerning thermodynamics and kinetics only using electric signals. Our method enables non-destructive degradation pattern characterization, expediting temperature-adaptable predictions of entire lifetime trajectories, rather than end-of-life points. The verification speed is 25 times faster yet maintaining 95.1% accuracy across temperatures. Such advances facilitate more sustainable management of defective prototypes before massive production, establishing a 19.76 billion USD scrap material recycling market by 2060 in China. By incorporating stepwise charge acceptance as a measure of the initial manufacturing variability of normally identical batteries, we can immediately identify long-term degradation variations. We attribute the predictive power to interpreting machine learning insights using material-agnostic featurization taxonomy for degradation pattern decoupling. Our findings offer new possibilities for dynamic system analysis, such as battery prototype degradation, demonstrating that complex pattern evolutions can be accurately predicted in a non-destructive and data-driven fashion by integrating physics-informed machine learning.

Via

Access Paper or Ask Questions

Reinforcement Retrieval Leveraging Fine-grained Feedback for Fact Checking News Claims with Black-Box LLM

Apr 26, 2024

Xuan Zhang, Wei Gao

Figure 1 for Reinforcement Retrieval Leveraging Fine-grained Feedback for Fact Checking News Claims with Black-Box LLM

Figure 2 for Reinforcement Retrieval Leveraging Fine-grained Feedback for Fact Checking News Claims with Black-Box LLM

Figure 3 for Reinforcement Retrieval Leveraging Fine-grained Feedback for Fact Checking News Claims with Black-Box LLM

Figure 4 for Reinforcement Retrieval Leveraging Fine-grained Feedback for Fact Checking News Claims with Black-Box LLM

Abstract:Retrieval-augmented language models have exhibited promising performance across various areas of natural language processing (NLP), including fact-critical tasks. However, due to the black-box nature of advanced large language models (LLMs) and the non-retrieval-oriented supervision signal of specific tasks, the training of retrieval model faces significant challenges under the setting of black-box LLM. We propose an approach leveraging Fine-grained Feedback with Reinforcement Retrieval (FFRR) to enhance fact-checking on news claims by using black-box LLM. FFRR adopts a two-level strategy to gather fine-grained feedback from the LLM, which serves as a reward for optimizing the retrieval policy, by rating the retrieved documents based on the non-retrieval ground truth of the task. We evaluate our model on two public datasets for real-world news claim verification, and the results demonstrate that FFRR achieves significant improvements over strong LLM-enabled and non-LLM baselines.

* Accepted by COLING 2024

Via

Access Paper or Ask Questions

SineNet: Learning Temporal Dynamics in Time-Dependent Partial Differential Equations

Mar 28, 2024

Xuan Zhang, Jacob Helwig, Yuchao Lin, Yaochen Xie, Cong Fu, Stephan Wojtowytsch, Shuiwang Ji

Figure 1 for SineNet: Learning Temporal Dynamics in Time-Dependent Partial Differential Equations

Figure 2 for SineNet: Learning Temporal Dynamics in Time-Dependent Partial Differential Equations

Figure 3 for SineNet: Learning Temporal Dynamics in Time-Dependent Partial Differential Equations

Figure 4 for SineNet: Learning Temporal Dynamics in Time-Dependent Partial Differential Equations

Abstract:We consider using deep neural networks to solve time-dependent partial differential equations (PDEs), where multi-scale processing is crucial for modeling complex, time-evolving dynamics. While the U-Net architecture with skip connections is commonly used by prior studies to enable multi-scale processing, our analysis shows that the need for features to evolve across layers results in temporally misaligned features in skip connections, which limits the model's performance. To address this limitation, we propose SineNet, consisting of multiple sequentially connected U-shaped network blocks, referred to as waves. In SineNet, high-resolution features are evolved progressively through multiple stages, thereby reducing the amount of misalignment within each stage. We furthermore analyze the role of skip connections in enabling both parallel and sequential processing of multi-scale information. Our method is rigorously tested on multiple PDE datasets, including the Navier-Stokes equations and shallow water equations, showcasing the advantages of our proposed approach over conventional U-Nets with a comparable parameter budget. We further demonstrate that increasing the number of waves in SineNet while maintaining the same number of parameters leads to a monotonically improved performance. The results highlight the effectiveness of SineNet and the potential of our approach in advancing the state-of-the-art in neural PDE solver design. Our code is available as part of AIRS (https://github.com/divelab/AIRS).

* The Twelfth International Conference on Learning Representations

Via

Access Paper or Ask Questions

Enhancing Continuous Domain Adaptation with Multi-Path Transfer Curriculum

Feb 26, 2024

Hanbing Liu, Jingge Wang, Xuan Zhang, Ye Guo, Yang Li

Abstract:Addressing the large distribution gap between training and testing data has long been a challenge in machine learning, giving rise to fields such as transfer learning and domain adaptation. Recently, Continuous Domain Adaptation (CDA) has emerged as an effective technique, closing this gap by utilizing a series of intermediate domains. This paper contributes a novel CDA method, W-MPOT, which rigorously addresses the domain ordering and error accumulation problems overlooked by previous studies. Specifically, we construct a transfer curriculum over the source and intermediate domains based on Wasserstein distance, motivated by theoretical analysis of CDA. Then we transfer the source model to the target domain through multiple valid paths in the curriculum using a modified version of continuous optimal transport. A bidirectional path consistency constraint is introduced to mitigate the impact of accumulated mapping errors during continuous transfer. We extensively evaluate W-MPOT on multiple datasets, achieving up to 54.1\% accuracy improvement on multi-session Alzheimer MR image classification and 94.7\% MSE reduction on battery capacity estimation.

Via

Access Paper or Ask Questions

On the Multi-turn Instruction Following for Conversational Web Agents

Feb 23, 2024

Yang Deng, Xuan Zhang, Wenxuan Zhang, Yifei Yuan, See-Kiong Ng, Tat-Seng Chua

Abstract:Web agents powered by Large Language Models (LLMs) have demonstrated remarkable abilities in planning and executing multi-step interactions within complex web-based environments, fulfilling a wide range of web navigation tasks. Despite these advancements, the potential for LLM-powered agents to effectively engage with sequential user instructions in real-world scenarios has not been fully explored. In this work, we introduce a new task of Conversational Web Navigation, which necessitates sophisticated interactions that span multiple turns with both the users and the environment, supported by a specially developed dataset named Multi-Turn Mind2Web (MT-Mind2Web). To tackle the limited context length of LLMs and the context-dependency issue of the conversational tasks, we further propose a novel framework, named self-reflective memory-augmented planning (Self-MAP), which employs memory utilization and self-reflection techniques. Extensive experiments are conducted to benchmark the MT-Mind2Web dataset, and validate the effectiveness of the proposed method.

Via

Access Paper or Ask Questions

SATac: A Thermoluminescence Enabled Tactile Sensor for Concurrent Perception of Temperature, Pressure, and Shear

Feb 01, 2024

Ziwu Song, Ran Yu, Xuan Zhang, Kit Wa Sou, Shilong Mu, Dengfeng Peng, Xiao-Ping Zhang, Wenbo Ding

Figure 1 for SATac: A Thermoluminescence Enabled Tactile Sensor for Concurrent Perception of Temperature, Pressure, and Shear

Figure 2 for SATac: A Thermoluminescence Enabled Tactile Sensor for Concurrent Perception of Temperature, Pressure, and Shear

Figure 3 for SATac: A Thermoluminescence Enabled Tactile Sensor for Concurrent Perception of Temperature, Pressure, and Shear

Figure 4 for SATac: A Thermoluminescence Enabled Tactile Sensor for Concurrent Perception of Temperature, Pressure, and Shear

Abstract:Most vision-based tactile sensors use elastomer deformation to infer tactile information, which can not sense some modalities, like temperature. As an important part of human tactile perception, temperature sensing can help robots better interact with the environment. In this work, we propose a novel multimodal vision-based tactile sensor, SATac, which can simultaneously perceive information of temperature, pressure, and shear. SATac utilizes thermoluminescence of strontium aluminate (SA) to sense a wide range of temperatures with exceptional resolution. Additionally, the pressure and shear can also be perceived by analyzing Voronoi diagram. A series of experiments are conducted to verify the performance of our proposed sensor. We also discuss the possible application scenarios and demonstrate how SATac could benefit robot perception capabilities.

Via

Access Paper or Ask Questions