Yi Cai

Comparison of Single Carrier FTN-QAM and PCS-QAM for Amplifier-less Coherent Communication Systems

Jan 25, 2026

Dongdong Zou, Fan Li, Wei Wang, Zhongxing Tian, Yuheng Liu, Gangxiang Shen, Yi Cai

Abstract:A performance comparison of FTN-QAM and PCS-QAM for amplifier-less short-reach coherent communication systems is provided. With a phase-tracking partial-response DFE and a turbo equalization strategy applied, FTN-16QAM exhibits a power margin advantage of about 0.9 dB over PCS-64QAM.

Rethinking Explanation Evaluation under the Retraining Scheme

Nov 11, 2025

Yi Cai, Thibaud Ardoin, Mayank Gulati, Gerhard Wunder

Abstract:Feature attribution has gained prominence as a tool for explaining model decisions, yet evaluating explanation quality remains challenging due to the absence of ground-truth explanations. To circumvent this, explanation-guided input manipulation has emerged as an indirect evaluation strategy, measuring explanation effectiveness through the impact of input modifications on model outcomes during inference. Despite the widespread use, a major concern with inference-based schemes is the distribution shift caused by such manipulations, which undermines the reliability of their assessments. The retraining-based scheme ROAR overcomes this issue by adapting the model to the altered data distribution. However, its evaluation results often contradict the theoretical foundations of widely accepted explainers. This work investigates this misalignment between empirical observations and theoretical expectations. In particular, we identify the sign issue as a key factor responsible for residual information that ultimately distorts retraining-based evaluation. Based on the analysis, we show that a straightforward reframing of the evaluation process can effectively resolve the identified issue. Building on the existing framework, we further propose novel variants that jointly structure a comprehensive perspective on explanation evaluation. These variants largely improve evaluation efficiency over the standard retraining protocol, thereby enhancing practical applicability for explainer selection and benchmarking. Following our proposed schemes, empirical results across various data scales provide deeper insights into the performance of carefully selected explainers, revealing open challenges and future directions in explainability research.

CADReview: Automatically Reviewing CAD Programs with Error Detection and Correction

May 28, 2025

Jiali Chen, Xusen Hei, HongFei Liu, Yuancheng Wei, Zikun Deng, Jiayuan Xie, Yi Cai, Li Qing

Abstract:Computer-aided design (CAD) is crucial in prototyping 3D objects through geometric instructions (i.e., CAD programs). In practical design workflows, designers often engage in time-consuming reviews and refinements of these prototypes by comparing them with reference images. To bridge this gap, we introduce the CAD review task to automatically detect and correct potential errors, ensuring consistency between the constructed 3D objects and reference images. However, recent advanced multimodal large language models (MLLMs) struggle to recognize multiple geometric components and perform spatial geometric operations within the CAD program, leading to inaccurate reviews. In this paper, we propose the CAD program repairer (ReCAD) framework to effectively detect program errors and provide helpful feedback on error correction. Additionally, we create a dataset, CADReview, consisting of over 20K program-image pairs, with diverse errors for the CAD review task. Extensive experiments demonstrate that our ReCAD significantly outperforms existing MLLMs, which shows great potential in design applications.

* ACL 2025 main conference

Simultaneously Exposing and Jamming Covert Communications via Disco Reconfigurable Intelligent Surfaces

May 18, 2025

Huan Huang, Hongliang Zhang, Yi Cai, Dusit Niyato, A. Lee Swindlehurst, Zhu Han

Abstract:Covert communications provide stronger privacy protection than cryptography and physical-layer security (PLS). However, previous works on covert communications have implicitly assumed the validity of channel reciprocity, i.e., wireless channels remain constant or approximately constant during their coherence time. In this work, we investigate covert communications in the presence of a disco RIS (DRIS) deployed by the warden Willie, where the DRIS with random and time-varying reflective coefficients acts as a "disco ball", introducing time-varying fully-passive jamming (FPJ). Consequently, the channel reciprocity assumption no longer holds. The DRIS not only jams the covert transmissions between Alice and Bob, but also decreases the error probabilities of Willie's detections, without either Bob's channel knowledge or additional jamming power. To quantify the impact of the DRIS on covert communications, we first design a detection rule for the warden Willie in the presence of time-varying FPJ introduced by the DRIS. Then, we define the detection error probabilities, i.e., the false alarm rate (FAR) and the missed detection rate (MDR), as the monitoring performance metrics for Willie's detections, and the signal-to-jamming-plus-noise ratio (SJNR) as a communication performance metric for the covert transmissions between Alice and Bob. Based on the detection rule, we derive the detection threshold for the warden Willie to detect whether communications between Alice and Bob are ongoing, considering the time-varying DRIS-based FPJ. Moreover, we conduct theoretical analyses of the FAR and the MDR at the warden Willie, as well as the SJNR at Bob, and then present unique properties of the DRIS-based FPJ in covert communications. We present numerical results to validate the derived theoretical analyses and evaluate the impact of the DRIS on covert communications.

* This paper has been submitted for publication

Seeing Beyond the Scene: Enhancing Vision-Language Models with Interactional Reasoning

May 14, 2025

Dayong Liang, Changmeng Zheng, Zhiyuan Wen, Yi Cai, Xiao-Yong Wei, Qing Li

Abstract:Traditional scene graphs primarily focus on spatial relationships, limiting vision-language models' (VLMs) ability to reason about complex interactions in visual scenes. This paper addresses two key challenges: (1) conventional detection-to-construction methods produce unfocused, contextually irrelevant relationship sets, and (2) existing approaches fail to form persistent memories for generalizing interaction reasoning to new scenes. We propose Interaction-augmented Scene Graph Reasoning (ISGR), a framework that enhances VLMs' interactional reasoning through three complementary components. First, our dual-stream graph constructor combines SAM-powered spatial relation extraction with interaction-aware captioning to generate functionally salient scene graphs with spatial grounding. Second, we employ targeted interaction queries to activate VLMs' latent knowledge of object functionalities, converting passive recognition into active reasoning about how objects work together. Finally, we introduce a long-term memory reinforcement learning strategy with a specialized interaction-focused reward function that transforms transient patterns into long-term reasoning heuristics. Extensive experiments demonstrate that our approach significantly outperforms baseline methods on interaction-heavy reasoning benchmarks, with particularly strong improvements on complex scene understanding tasks. The source code can be accessed at https://github.com/open_upon_acceptance.

A Reputation System for Large Language Model-based Multi-agent Systems to Avoid the Tragedy of the Commons

May 08, 2025

Siyue Ren, Wanli Fu, Xinkun Zou, Chen Shen, Yi Cai, Chen Chu, Zhen Wang, Shuyue Hu

Abstract:The tragedy of the commons, where individual self-interest leads to collectively disastrous outcomes, is a pervasive challenge in human society. Recent studies have demonstrated that similar phenomena can arise in generative multi-agent systems (MASs). To address this challenge, this paper explores the use of reputation systems as a remedy. We propose RepuNet, a dynamic, dual-level reputation framework that models both agent-level reputation dynamics and system-level network evolution. Specifically, driven by direct interactions and indirect gossip, agents form reputations for both themselves and their peers, and decide whether to connect to or disconnect from other agents for future interactions. Through two distinct scenarios, we show that RepuNet effectively mitigates the 'tragedy of the commons', promoting and sustaining cooperation in generative MASs. Moreover, we find that reputation systems can give rise to rich emergent behaviors in generative MASs, such as the formation of cooperative clusters, the social isolation of exploitative agents, and the preference for sharing positive gossip rather than negative gossip.

Collaborative Multi-LoRA Experts with Achievement-based Multi-Tasks Loss for Unified Multimodal Information Extraction

May 08, 2025

Li Yuan, Yi Cai, Xudong Shen, Qing Li, Qingbao Huang, Zikun Deng, Tao Wang

Abstract:Multimodal Information Extraction (MIE) has gained attention for extracting structured information from multimedia sources. Traditional methods tackle MIE tasks separately, missing opportunities to share knowledge across tasks. Recent approaches unify these tasks into a generation problem using instruction-based T5 models with visual adaptors, optimized through full-parameter fine-tuning. However, this method is computationally intensive, and multi-task fine-tuning often faces gradient conflicts, limiting performance. To address these challenges, we propose collaborative multi-LoRA experts with achievement-based multi-task loss (C-LoRAE) for MIE tasks. C-LoRAE extends the low-rank adaptation (LoRA) method by incorporating a universal expert to learn shared multimodal knowledge from cross-MIE tasks and task-specific experts to learn specialized instructional task features. This configuration enhances the model's generalization ability across multiple tasks while maintaining the independence of various instruction tasks and mitigating gradient conflicts. Additionally, we propose an achievement-based multi-task loss to balance training progress across tasks, addressing the imbalance caused by varying numbers of training samples in MIE tasks. Experimental results on seven benchmark datasets across three key MIE tasks demonstrate that C-LoRAE achieves superior overall performance compared to traditional fine-tuning methods and LoRA methods while utilizing a comparable number of training parameters to LoRA.

* Accepted by IJCAI 2025

RMG: Real-Time Expressive Motion Generation with Self-collision Avoidance for 6-DOF Companion Robotic Arms

Mar 13, 2025

Jiansheng Li, Haotian Song, Jinni Zhou, Qiang Nie, Yi Cai

Abstract:The six-degree-of-freedom (6-DOF) robotic arm has gained widespread application in human-coexisting environments. While previous research has predominantly focused on functional motion generation, the critical aspect of expressive motion in human-robot interaction remains largely unexplored. This paper presents a novel real-time motion generation planner that enhances interactivity by creating expressive robotic motions between arbitrary start and end states within predefined time constraints. Our approach involves three key contributions: first, we develop a mapping algorithm to construct an expressive motion dataset derived from human dance movements; second, we train motion generation models in both Cartesian and joint spaces using this dataset; third, we introduce an optimization algorithm that guarantees smooth, collision-free motion while maintaining the intended expressive style. Experimental results demonstrate the effectiveness of our method, which can generate expressive and generalized motions in under 0.5 seconds while satisfying all specified constraints.

ConsisLoRA: Enhancing Content and Style Consistency for LoRA-based Style Transfer

Mar 13, 2025

Bolin Chen, Baoquan Zhao, Haoran Xie, Yi Cai, Qing Li, Xudong Mao

Abstract:Style transfer involves transferring the style from a reference image to the content of a target image. Recent advancements in LoRA-based (Low-Rank Adaptation) methods have shown promise in effectively capturing the style of a single image. However, these approaches still face significant challenges such as content inconsistency, style misalignment, and content leakage. In this paper, we comprehensively analyze the limitations of the standard diffusion parameterization, which learns to predict noise, in the context of style transfer. To address these issues, we introduce ConsisLoRA, a LoRA-based method that enhances both content and style consistency by optimizing the LoRA weights to predict the original image rather than noise. We also propose a two-step training strategy that decouples the learning of content and style from the reference image. To effectively capture both the global structure and local details of the content image, we introduce a stepwise loss transition strategy. Additionally, we present an inference guidance method that enables continuous control over content and style strengths during inference. Through both qualitative and quantitative evaluations, our method demonstrates significant improvements in content and style consistency while effectively reducing content leakage.

Classic4Children: Adapting Chinese Literary Classics for Children with Large Language Model

Feb 03, 2025

Jiali Chen, Xusen Hei, Yuqi Xue, Zihan Wu, Jiayuan Xie, Yi Cai

Abstract:Chinese literary classics hold significant cultural and educational value, offering deep insights into morality, history, and human nature. These works often include classical Chinese and complex narratives, making them difficult for children to read. To bridge this gap, we introduce a child-friendly literary adaptation (CLA) task to adapt Chinese literary classics into engaging and accessible text for children. However, recent large language models (LLMs) overlook children's reading preferences (i.e., vivid character portrayals, concise narrative structures, and appropriate readability), which poses challenges in CLA. In this paper, we propose a method called InstructChild, which augments the LLM with these preferences for adaptation. Specifically, we first obtain the characters' personalities and narrative structure as additional information for fine-grained instruction tuning. Then, we devise a readability metric as the reward to align the LLM with the children's reading level. Finally, a lookahead decoding strategy is applied to improve the readability of the generated text during inference. To support the evaluation of the CLA task, we construct the Classic4Children dataset, which comprises both the original and child-friendly versions of the Four Great Classical Novels of Chinese literature. Experimental results show that our InstructChild significantly improves automatic and human evaluation performance.

* Accepted at NAACL 2025 Findings
