Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xinyu Li

NeuroBridge: Bio-Inspired Self-Supervised EEG-to-Image Decoding via Cognitive Priors and Bidirectional Semantic Alignment

Nov 10, 2025

Wenjiang Zhang, Sifeng Wang, Yuwei Su, Xinyu Li, Chen Zhang, Suyu Zhong

Abstract:Visual neural decoding seeks to reconstruct or infer perceived visual stimuli from brain activity patterns, providing critical insights into human cognition and enabling transformative applications in brain-computer interfaces and artificial intelligence. Current approaches, however, remain constrained by the scarcity of high-quality stimulus-brain response pairs and the inherent semantic mismatch between neural representations and visual content. Inspired by perceptual variability and co-adaptive strategy of the biological systems, we propose a novel self-supervised architecture, named NeuroBridge, which integrates Cognitive Prior Augmentation (CPA) with Shared Semantic Projector (SSP) to promote effective cross-modality alignment. Specifically, CPA simulates perceptual variability by applying asymmetric, modality-specific transformations to both EEG signals and images, enhancing semantic diversity. Unlike previous approaches, SSP establishes a bidirectional alignment process through a co-adaptive strategy, which mutually aligns features from two modalities into a shared semantic space for effective cross-modal learning. NeuroBridge surpasses previous state-of-the-art methods under both intra-subject and inter-subject settings. In the intra-subject scenario, it achieves the improvements of 12.3% in top-1 accuracy and 10.2% in top-5 accuracy, reaching 63.2% and 89.9% respectively on a 200-way zero-shot retrieval task. Extensive experiments demonstrate the effectiveness, robustness, and scalability of the proposed framework for neural visual decoding.

* AAAI 2026

Via

Access Paper or Ask Questions

Evaluating LLM Alignment on Personality Inference from Real-World Interview Data

Sep 16, 2025

Jianfeng Zhu, Julina Maharjan, Xinyu Li, Karin G. Coifman, Ruoming Jin

Abstract:Large Language Models (LLMs) are increasingly deployed in roles requiring nuanced psychological understanding, such as emotional support agents, counselors, and decision-making assistants. However, their ability to interpret human personality traits, a critical aspect of such applications, remains unexplored, particularly in ecologically valid conversational settings. While prior work has simulated LLM "personas" using discrete Big Five labels on social media data, the alignment of LLMs with continuous, ground-truth personality assessments derived from natural interactions is largely unexamined. To address this gap, we introduce a novel benchmark comprising semi-structured interview transcripts paired with validated continuous Big Five trait scores. Using this dataset, we systematically evaluate LLM performance across three paradigms: (1) zero-shot and chain-of-thought prompting with GPT-4.1 Mini, (2) LoRA-based fine-tuning applied to both RoBERTa and Meta-LLaMA architectures, and (3) regression using static embeddings from pretrained BERT and OpenAI's text-embedding-3-small. Our results reveal that all Pearson correlations between model predictions and ground-truth personality traits remain below 0.26, highlighting the limited alignment of current LLMs with validated psychological constructs. Chain-of-thought prompting offers minimal gains over zero-shot, suggesting that personality inference relies more on latent semantic representation than explicit reasoning. These findings underscore the challenges of aligning LLMs with complex human attributes and motivate future work on trait-specific prompting, context-aware modeling, and alignment-oriented fine-tuning.

* 8 pages, 3 figures

Via

Access Paper or Ask Questions

TimeSeriesGym: A Scalable Benchmark for (Time Series) Machine Learning Engineering Agents

May 19, 2025

Yifu Cai, Xinyu Li, Mononito Goswami, Michał Wiliński, Gus Welter, Artur Dubrawski

Abstract:We introduce TimeSeriesGym, a scalable benchmarking framework for evaluating Artificial Intelligence (AI) agents on time series machine learning engineering challenges. Existing benchmarks lack scalability, focus narrowly on model building in well-defined settings, and evaluate only a limited set of research artifacts (e.g., CSV submission files). To make AI agent benchmarking more relevant to the practice of machine learning engineering, our framework scales along two critical dimensions. First, recognizing that effective ML engineering requires a range of diverse skills, TimeSeriesGym incorporates challenges from diverse sources spanning multiple domains and tasks. We design challenges to evaluate both isolated capabilities (including data handling, understanding research repositories, and code translation) and their combinations, and rather than addressing each challenge independently, we develop tools that support designing multiple challenges at scale. Second, we implement evaluation mechanisms for multiple research artifacts, including submission files, code, and models, using both precise numeric measures and more flexible LLM-based evaluation approaches. This dual strategy balances objective assessment with contextual judgment. Although our initial focus is on time series applications, our framework can be readily extended to other data modalities, broadly enhancing the comprehensiveness and practical utility of agentic AI evaluation. We open-source our benchmarking framework to facilitate future research on the ML engineering capabilities of AI agents.

* Open source code available at https://github.com/moment-timeseries-foundation-model/TimeSeriesGym. YC, XL, MG and MW contributed equally, and should be considered joint first authors

Via

Access Paper or Ask Questions

CheXLearner: Text-Guided Fine-Grained Representation Learning for Progression Detection

May 11, 2025

Yuanzhuo Wang, Junwen Duan, Xinyu Li, Jianxin Wang

Figure 1 for CheXLearner: Text-Guided Fine-Grained Representation Learning for Progression Detection

Figure 2 for CheXLearner: Text-Guided Fine-Grained Representation Learning for Progression Detection

Figure 3 for CheXLearner: Text-Guided Fine-Grained Representation Learning for Progression Detection

Figure 4 for CheXLearner: Text-Guided Fine-Grained Representation Learning for Progression Detection

Abstract:Temporal medical image analysis is essential for clinical decision-making, yet existing methods either align images and text at a coarse level - causing potential semantic mismatches - or depend solely on visual information, lacking medical semantic integration. We present CheXLearner, the first end-to-end framework that unifies anatomical region detection, Riemannian manifold-based structure alignment, and fine-grained regional semantic guidance. Our proposed Med-Manifold Alignment Module (Med-MAM) leverages hyperbolic geometry to robustly align anatomical structures and capture pathologically meaningful discrepancies across temporal chest X-rays. By introducing regional progression descriptions as supervision, CheXLearner achieves enhanced cross-modal representation learning and supports dynamic low-level feature optimization. Experiments show that CheXLearner achieves 81.12% (+17.2%) average accuracy and 80.32% (+11.05%) F1-score on anatomical region progression detection - substantially outperforming state-of-the-art baselines, especially in structurally complex regions. Additionally, our model attains a 91.52% average AUC score in downstream disease classification, validating its superior feature representation.

Via

Access Paper or Ask Questions

GarmentDiffusion: 3D Garment Sewing Pattern Generation with Multimodal Diffusion Transformers

Apr 30, 2025

Xinyu Li, Qi Yao, Yuanda Wang

Figure 1 for GarmentDiffusion: 3D Garment Sewing Pattern Generation with Multimodal Diffusion Transformers

Figure 2 for GarmentDiffusion: 3D Garment Sewing Pattern Generation with Multimodal Diffusion Transformers

Figure 3 for GarmentDiffusion: 3D Garment Sewing Pattern Generation with Multimodal Diffusion Transformers

Figure 4 for GarmentDiffusion: 3D Garment Sewing Pattern Generation with Multimodal Diffusion Transformers

Abstract:Garment sewing patterns are fundamental design elements that bridge the gap between design concepts and practical manufacturing. The generative modeling of sewing patterns is crucial for creating diversified garments. However, existing approaches are limited either by reliance on a single input modality or by suboptimal generation efficiency. In this work, we present \textbf{\textit{GarmentDiffusion}}, a new generative model capable of producing centimeter-precise, vectorized 3D sewing patterns from multimodal inputs (text, image, and incomplete sewing pattern). Our method efficiently encodes 3D sewing pattern parameters into compact edge token representations, achieving a sequence length that is $\textbf{10}\times$ shorter than that of the autoregressive SewingGPT in DressCode. By employing a diffusion transformer, we simultaneously denoise all edge tokens along the temporal axis, while maintaining a constant number of denoising steps regardless of dataset-specific edge and panel statistics. With all combination of designs of our model, the sewing pattern generation speed is accelerated by $\textbf{100}\times$ compared to SewingGPT. We achieve new state-of-the-art results on DressCodeData, as well as on the largest sewing pattern dataset, namely GarmentCodeData. The project website is available at https://shenfu-research.github.io/Garment-Diffusion/.

* The 34th International Joint Conference on Artificial Intelligence (IJCAI 2025)

Via

Access Paper or Ask Questions

DeclutterNeRF: Generative-Free 3D Scene Recovery for Occlusion Removal

Apr 07, 2025

Wanzhou Liu, Zhexiao Xiong, Xinyu Li, Nathan Jacobs

Abstract:Recent novel view synthesis (NVS) techniques, including Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have greatly advanced 3D scene reconstruction with high-quality rendering and realistic detail recovery. Effectively removing occlusions while preserving scene details can further enhance the robustness and applicability of these techniques. However, existing approaches for object and occlusion removal predominantly rely on generative priors, which, despite filling the resulting holes, introduce new artifacts and blurriness. Moreover, existing benchmark datasets for evaluating occlusion removal methods lack realistic complexity and viewpoint variations. To address these issues, we introduce DeclutterSet, a novel dataset featuring diverse scenes with pronounced occlusions distributed across foreground, midground, and background, exhibiting substantial relative motion across viewpoints. We further introduce DeclutterNeRF, an occlusion removal method free from generative priors. DeclutterNeRF introduces joint multi-view optimization of learnable camera parameters, occlusion annealing regularization, and employs an explainable stochastic structural similarity loss, ensuring high-quality, artifact-free reconstructions from incomplete images. Experiments demonstrate that DeclutterNeRF significantly outperforms state-of-the-art methods on our proposed DeclutterSet, establishing a strong baseline for future research.

* Accepted by CVPR 2025 4th CV4Metaverse Workshop. 15 pages, 10 figures. Code and data at: https://github.com/wanzhouliu/declutter-nerf

Via

Access Paper or Ask Questions

Automatic MILP Model Construction for Multi-Robot Task Allocation and Scheduling Based on Large Language Models

Mar 18, 2025

Mingming Peng, Zhendong Chen, Jie Yang, Jin Huang, Zhengqi Shi, Qihao Liu, Xinyu Li, Liang Gao

Abstract:With the accelerated development of Industry 4.0, intelligent manufacturing systems increasingly require efficient task allocation and scheduling in multi-robot systems. However, existing methods rely on domain expertise and face challenges in adapting to dynamic production constraints. Additionally, enterprises have high privacy requirements for production scheduling data, which prevents the use of cloud-based large language models (LLMs) for solution development. To address these challenges, there is an urgent need for an automated modeling solution that meets data privacy requirements. This study proposes a knowledge-augmented mixed integer linear programming (MILP) automated formulation framework, integrating local LLMs with domain-specific knowledge bases to generate executable code from natural language descriptions automatically. The framework employs a knowledge-guided DeepSeek-R1-Distill-Qwen-32B model to extract complex spatiotemporal constraints (82% average accuracy) and leverages a supervised fine-tuned Qwen2.5-Coder-7B-Instruct model for efficient MILP code generation (90% average accuracy). Experimental results demonstrate that the framework successfully achieves automatic modeling in the aircraft skin manufacturing case while ensuring data privacy and computational efficiency. This research provides a low-barrier and highly reliable technical path for modeling in complex industrial scenarios.

Via

Access Paper or Ask Questions

RGB-Phase Speckle: Cross-Scene Stereo 3D Reconstruction via Wrapped Pre-Normalization

Mar 08, 2025

Kai Yang, Zijian Bai, Yang Xiao, Xinyu Li, Xiaohan Shi

Abstract:3D reconstruction garners increasing attention alongside the advancement of high-level image applications, where dense stereo matching (DSM) serves as a pivotal technique. Previous studies often rely on publicly available datasets for training, focusing on modifying network architectures or incorporating specialized modules to extract domain-invariant features and thus improve model robustness. In contrast, inspired by single-frame structured-light phase-shifting encoding, this study introduces RGB-Speckle, a cross-scene 3D reconstruction framework based on an active stereo camera system, designed to enhance robustness. Specifically, we propose a novel phase pre-normalization encoding-decoding method: first, we randomly perturb phase-shift maps and embed them into the three RGB channels to generate color speckle patterns; subsequently, the camera captures phase-encoded images modulated by objects as input to a stereo matching network. This technique effectively mitigates external interference and ensures consistent input data for RGB-Speckle, thereby bolstering cross-domain 3D reconstruction stability. To validate the proposed method, we conduct complex experiments: (1) construct a color speckle dataset for complex scenarios based on the proposed encoding scheme; (2) evaluate the impact of the phase pre-normalization encoding-decoding technique on 3D reconstruction accuracy; and (3) further investigate its robustness across diverse conditions. Experimental results demonstrate that the proposed RGB-Speckle model offers significant advantages in cross-domain and cross-scene 3D reconstruction tasks, enhancing model generalization and reinforcing robustness in challenging environments, thus providing a novel solution for robust 3D reconstruction research.

* Submitted to ICCV 2025

Via

Access Paper or Ask Questions

NowYouSee Me: Context-Aware Automatic Audio Description

Dec 13, 2024

Seon-Ho Lee, Jue Wang, David Fan, Zhikang Zhang, Linda Liu, Xiang Hao, Vimal Bhat, Xinyu Li

Figure 1 for NowYouSee Me: Context-Aware Automatic Audio Description

Figure 2 for NowYouSee Me: Context-Aware Automatic Audio Description

Figure 3 for NowYouSee Me: Context-Aware Automatic Audio Description

Figure 4 for NowYouSee Me: Context-Aware Automatic Audio Description

Abstract:Audio Description (AD) plays a pivotal role as an application system aimed at guaranteeing accessibility in multimedia content, which provides additional narrations at suitable intervals to describe visual elements, catering specifically to the needs of visually impaired audiences. In this paper, we introduce $\mathrm{CA^3D}$, the pioneering unified Context-Aware Automatic Audio Description system that provides AD event scripts with precise locations in the long cinematic content. Specifically, $\mathrm{CA^3D}$ system consists of: 1) a Temporal Feature Enhancement Module to efficiently capture longer term dependencies, 2) an anchor-based AD event detector with feature suppression module that localizes the AD events and extracts discriminative feature for AD generation, and 3) a self-refinement module that leverages the generated output to tweak AD event boundaries from coarse to fine. Unlike conventional methods which rely on metadata and ground truth AD timestamp for AD detection and generation tasks, the proposed $\mathrm{CA^3D}$ is the first end-to-end trainable system that only uses visual cue. Extensive experiments demonstrate that the proposed $\mathrm{CA^3D}$ improves existing architectures for both AD event detection and script generation metrics, establishing the new state-of-the-art performances in the AD automation.

* WACV 2025
* 10 pages

Via

Access Paper or Ask Questions

Beware of Metacognitive Laziness: Effects of Generative Artificial Intelligence on Learning Motivation, Processes, and Performance

Dec 12, 2024

Yizhou Fan, Luzhen Tang, Huixiao Le, Kejie Shen, Shufang Tan, Yueying Zhao, Yuan Shen, Xinyu Li, Dragan Gašević

Figure 1 for Beware of Metacognitive Laziness: Effects of Generative Artificial Intelligence on Learning Motivation, Processes, and Performance

Figure 2 for Beware of Metacognitive Laziness: Effects of Generative Artificial Intelligence on Learning Motivation, Processes, and Performance

Figure 3 for Beware of Metacognitive Laziness: Effects of Generative Artificial Intelligence on Learning Motivation, Processes, and Performance

Figure 4 for Beware of Metacognitive Laziness: Effects of Generative Artificial Intelligence on Learning Motivation, Processes, and Performance

Abstract:With the continuous development of technological and educational innovation, learners nowadays can obtain a variety of support from agents such as teachers, peers, education technologies, and recently, generative artificial intelligence such as ChatGPT. The concept of hybrid intelligence is still at a nascent stage, and how learners can benefit from a symbiotic relationship with various agents such as AI, human experts and intelligent learning systems is still unknown. The emerging concept of hybrid intelligence also lacks deep insights and understanding of the mechanisms and consequences of hybrid human-AI learning based on strong empirical research. In order to address this gap, we conducted a randomised experimental study and compared learners' motivations, self-regulated learning processes and learning performances on a writing task among different groups who had support from different agents (ChatGPT, human expert, writing analytics tools, and no extra tool). A total of 117 university students were recruited, and their multi-channel learning, performance and motivation data were collected and analysed. The results revealed that: learners who received different learning support showed no difference in post-task intrinsic motivation; there were significant differences in the frequency and sequences of the self-regulated learning processes among groups; ChatGPT group outperformed in the essay score improvement but their knowledge gain and transfer were not significantly different. Our research found that in the absence of differences in motivation, learners with different supports still exhibited different self-regulated learning processes, ultimately leading to differentiated performance. What is particularly noteworthy is that AI technologies such as ChatGPT may promote learners' dependence on technology and potentially trigger metacognitive laziness.

Via

Access Paper or Ask Questions