Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jian Li

Robotics and Intelligent Manufacturing & School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, China, Shenzhen Institute of Artificial Intelligence and Robotics for Society, China

From Zero to Detail: Deconstructing Ultra-High-Definition Image Restoration from Progressive Spectral Perspective

Mar 17, 2025

Chen Zhao, Zhizhou Chen, Yunzhe Xu, Enxuan Gu, Jian Li, Zili Yi, Qian Wang, Jian Yang, Ying Tai

Abstract:Ultra-high-definition (UHD) image restoration faces significant challenges due to its high resolution, complex content, and intricate details. To cope with these challenges, we analyze the restoration process in depth through a progressive spectral perspective, and deconstruct the complex UHD restoration problem into three progressive stages: zero-frequency enhancement, low-frequency restoration, and high-frequency refinement. Building on this insight, we propose a novel framework, ERR, which comprises three collaborative sub-networks: the zero-frequency enhancer (ZFE), the low-frequency restorer (LFR), and the high-frequency refiner (HFR). Specifically, the ZFE integrates global priors to learn global mapping, while the LFR restores low-frequency information, emphasizing reconstruction of coarse-grained content. Finally, the HFR employs our designed frequency-windowed kolmogorov-arnold networks (FW-KAN) to refine textures and details, producing high-quality image restoration. Our approach significantly outperforms previous UHD methods across various tasks, with extensive ablation studies validating the effectiveness of each component. The code is available at \href{https://github.com/NJU-PCALab/ERR}{here}.

* Accepted to CVPR 2025

Via

Access Paper or Ask Questions

AoECR: AI-ization of Elderly Care Robot

Feb 27, 2025

Linkun Zhou, Jian Li, Yadong Mo, Xiangyan Zhang, Ying Zhang, Shimin Wei

Abstract:Autonomous interaction is crucial for the effective use of elderly care robots. However, developing universal AI architectures is extremely challenging due to the diversity in robot configurations and a lack of dataset. We proposed a universal architecture for the AI-ization of elderly care robots, called AoECR. Specifically, based on a nursing bed, we developed a patient-nurse interaction dataset tailored for elderly care scenarios and fine-tuned a large language model to enable it to perform nursing manipulations. Additionally, the inference process included a self-check chain to ensure the security of control commands. An expert optimization process further enhanced the humanization and personalization of the interactive responses. The physical experiment demonstrated that the AoECR exhibited zero-shot generalization capabilities across diverse scenarios, understood patients' instructions, implemented secure control commands, and delivered humanized and personalized interactive responses. In general, our research provides a valuable dataset reference and AI-ization solutions for elderly care robots.

* 11 pages

Via

Access Paper or Ask Questions

Understanding Generalization in Transformers: Error Bounds and Training Dynamics Under Benign and Harmful Overfitting

Feb 18, 2025

Yingying Zhang, Zhenyu Wu, Jian Li, Yong Liu

Figure 1 for Understanding Generalization in Transformers: Error Bounds and Training Dynamics Under Benign and Harmful Overfitting

Figure 2 for Understanding Generalization in Transformers: Error Bounds and Training Dynamics Under Benign and Harmful Overfitting

Figure 3 for Understanding Generalization in Transformers: Error Bounds and Training Dynamics Under Benign and Harmful Overfitting

Figure 4 for Understanding Generalization in Transformers: Error Bounds and Training Dynamics Under Benign and Harmful Overfitting

Abstract:Transformers serve as the foundational architecture for many successful large-scale models, demonstrating the ability to overfit the training data while maintaining strong generalization on unseen data, a phenomenon known as benign overfitting. However, research on how the training dynamics influence error bounds within the context of benign overfitting has been limited. This paper addresses this gap by developing a generalization theory for a two-layer transformer with labeled flip noise. Specifically, we present generalization error bounds for both benign and harmful overfitting under varying signal-to-noise ratios (SNR), where the training dynamics are categorized into three distinct stages, each with its corresponding error bounds. Additionally, we conduct extensive experiments to identify key factors that influence test errors in transformers. Our experimental results align closely with the theoretical predictions, validating our findings.

Via

Access Paper or Ask Questions

Dynamic Frequency-Adaptive Knowledge Distillation for Speech Enhancement

Feb 07, 2025

Xihao Yuan, Siqi Liu, Hanting Chen, Lu Zhou, Jian Li, Jie Hu

Abstract:Deep learning-based speech enhancement (SE) models have recently outperformed traditional techniques, yet their deployment on resource-constrained devices remains challenging due to high computational and memory demands. This paper introduces a novel dynamic frequency-adaptive knowledge distillation (DFKD) approach to effectively compress SE models. Our method dynamically assesses the model's output, distinguishing between high and low-frequency components, and adapts the learning objectives to meet the unique requirements of different frequency bands, capitalizing on the SE task's inherent characteristics. To evaluate the DFKD's efficacy, we conducted experiments on three state-of-the-art models: DCCRN, ConTasNet, and DPTNet. The results demonstrate that our method not only significantly enhances the performance of the compressed model (student model) but also surpasses other logit-based knowledge distillation methods specifically for SE tasks.

* 5 pages, 2 figures, accepted by ICASSP2025

Via

Access Paper or Ask Questions

FactorGCL: A Hypergraph-Based Factor Model with Temporal Residual Contrastive Learning for Stock Returns Prediction

Feb 05, 2025

Yitong Duan, Weiran Wang, Jian Li

Figure 1 for FactorGCL: A Hypergraph-Based Factor Model with Temporal Residual Contrastive Learning for Stock Returns Prediction

Figure 2 for FactorGCL: A Hypergraph-Based Factor Model with Temporal Residual Contrastive Learning for Stock Returns Prediction

Figure 3 for FactorGCL: A Hypergraph-Based Factor Model with Temporal Residual Contrastive Learning for Stock Returns Prediction

Figure 4 for FactorGCL: A Hypergraph-Based Factor Model with Temporal Residual Contrastive Learning for Stock Returns Prediction

Abstract:As a fundamental method in economics and finance, the factor model has been extensively utilized in quantitative investment. In recent years, there has been a paradigm shift from traditional linear models with expert-designed factors to more flexible nonlinear machine learning-based models with data-driven factors, aiming to enhance the effectiveness of these factor models. However, due to the low signal-to-noise ratio in market data, mining effective factors in data-driven models remains challenging. In this work, we propose a hypergraph-based factor model with temporal residual contrastive learning (FactorGCL) that employs a hypergraph structure to better capture high-order nonlinear relationships among stock returns and factors. To mine hidden factors that supplement human-designed prior factors for predicting stock returns, we design a cascading residual hypergraph architecture, in which the hidden factors are extracted from the residual information after removing the influence of prior factors. Additionally, we propose a temporal residual contrastive learning method to guide the extraction of effective and comprehensive hidden factors by contrasting stock-specific residual information over different time periods. Our extensive experiments on real stock market data demonstrate that FactorGCL not only outperforms existing state-of-the-art methods but also mines effective hidden factors for predicting stock returns.

Via

Access Paper or Ask Questions

Rethinking Diffusion Posterior Sampling: From Conditional Score Estimator to Maximizing a Posterior

Jan 31, 2025

Tongda Xu, Xiyan Cai, Xinjie Zhang, Xingtong Ge, Dailan He, Ming Sun, Jingjing Liu, Ya-Qin Zhang, Jian Li, Yan Wang

Abstract:Recent advancements in diffusion models have been leveraged to address inverse problems without additional training, and Diffusion Posterior Sampling (DPS) (Chung et al., 2022a) is among the most popular approaches. Previous analyses suggest that DPS accomplishes posterior sampling by approximating the conditional score. While in this paper, we demonstrate that the conditional score approximation employed by DPS is not as effective as previously assumed, but rather aligns more closely with the principle of maximizing a posterior (MAP). This assertion is substantiated through an examination of DPS on 512x512 ImageNet images, revealing that: 1) DPS's conditional score estimation significantly diverges from the score of a well-trained conditional diffusion model and is even inferior to the unconditional score; 2) The mean of DPS's conditional score estimation deviates significantly from zero, rendering it an invalid score estimation; 3) DPS generates high-quality samples with significantly lower diversity. In light of the above findings, we posit that DPS more closely resembles MAP than a conditional score estimator, and accordingly propose the following enhancements to DPS: 1) we explicitly maximize the posterior through multi-step gradient ascent and projection; 2) we utilize a light-weighted conditional score estimator trained with only 100 images and 8 GPU hours. Extensive experimental results indicate that these proposed improvements significantly enhance DPS's performance. The source code for these improvements is provided in https://github.com/tongdaxu/Rethinking-Diffusion-Posterior-Sampling-From-Conditional-Score-Estimator-to-Maximizing-a-Posterior.

* ICLR 2025

Via

Access Paper or Ask Questions

Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints

Jan 12, 2025

Ming Dai, Jian Li, Jiedong Zhuang, Xian Zhang, Wankou Yang

Figure 1 for Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints

Figure 2 for Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints

Figure 3 for Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints

Figure 4 for Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints

Abstract:Multi-task visual grounding involves the simultaneous execution of localization and segmentation in images based on textual expressions. The majority of advanced methods predominantly focus on transformer-based multimodal fusion, aiming to extract robust multimodal representations. However, ambiguity between referring expression comprehension (REC) and referring image segmentation (RIS) is error-prone, leading to inconsistencies between multi-task predictions. Besides, insufficient multimodal understanding directly contributes to biased target perception. To overcome these challenges, we propose a Coarse-to-fine Consistency Constraints Visual Grounding architecture ($\text{C}^3\text{VG}$), which integrates implicit and explicit modeling approaches within a two-stage framework. Initially, query and pixel decoders are employed to generate preliminary detection and segmentation outputs, a process referred to as the Rough Semantic Perception (RSP) stage. These coarse predictions are subsequently refined through the proposed Mask-guided Interaction Module (MIM) and a novel explicit bidirectional consistency constraint loss to ensure consistent representations across tasks, which we term the Refined Consistency Interaction (RCI) stage. Furthermore, to address the challenge of insufficient multimodal understanding, we leverage pre-trained models based on visual-linguistic fusion representations. Empirical evaluations on the RefCOCO, RefCOCO+, and RefCOCOg datasets demonstrate the efficacy and soundness of $\text{C}^3\text{VG}$, which significantly outperforms state-of-the-art REC and RIS methods by a substantial margin. Code and model will be available at \url{https://github.com/Dmmm1997/C3VG}.

* AAAI2025

Via

Access Paper or Ask Questions

Block-Based Multi-Scale Image Rescaling

Dec 16, 2024

Jian Li, Siwang Zhou

Figure 1 for Block-Based Multi-Scale Image Rescaling

Figure 2 for Block-Based Multi-Scale Image Rescaling

Figure 3 for Block-Based Multi-Scale Image Rescaling

Figure 4 for Block-Based Multi-Scale Image Rescaling

Abstract:Image rescaling (IR) seeks to determine the optimal low-resolution (LR) representation of a high-resolution (HR) image to reconstruct a high-quality super-resolution (SR) image. Typically, HR images with resolutions exceeding 2K possess rich information that is unevenly distributed across the image. Traditional image rescaling methods often fall short because they focus solely on the overall scaling rate, ignoring the varying amounts of information in different parts of the image. To address this limitation, we propose a Block-Based Multi-Scale Image Rescaling Framework (BBMR), tailored for IR tasks involving HR images of 2K resolution and higher. BBMR consists of two main components: the Downscaling Module and the Upscaling Module. In the Downscaling Module, the HR image is segmented into sub-blocks of equal size, with each sub-block receiving a dynamically allocated scaling rate while maintaining a constant overall scaling rate. For the Upscaling Module, we introduce the Joint Super-Resolution method (JointSR), which performs SR on these sub-blocks with varying scaling rates and effectively eliminates blocking artifacts. Experimental results demonstrate that BBMR significantly enhances the SR image quality on the of 2K and 4K test dataset compared to initial network image rescaling methods.

* This paper has been accepted by AAAI2025

Via

Access Paper or Ask Questions

Automatic Item Generation for Personality Situational Judgment Tests with Large Language Models

Dec 10, 2024

Chang-Jin Li, Jiyuan Zhang, Yun Tang, Jian Li

Abstract:Personality assessment, particularly through situational judgment tests (SJTs), is a vital tool for psychological research, talent selection, and educational evaluation. This study explores the potential of GPT-4, a state-of-the-art large language model (LLM), to automate the generation of personality situational judgment tests (PSJTs) in Chinese. Traditional SJT development is labor-intensive and prone to biases, while GPT-4 offers a scalable, efficient alternative. Two studies were conducted: Study 1 evaluated the impact of prompt design and temperature settings on content validity, finding that optimized prompts with a temperature of 1.0 produced creative and accurate items. Study 2 assessed the psychometric properties of GPT-4-generated PSJTs, revealing that they demonstrated satisfactory reliability and validity, surpassing the performance of manually developed tests in measuring the Big Five personality traits. This research highlights GPT-4's effectiveness in developing high-quality PSJTs, providing a scalable and innovative method for psychometric test development. These findings expand the possibilities of automatic item generation and the application of LLMs in psychology, and offer practical implications for streamlining test development processes in resource-limited settings.

* Submitted to Organizational Research Methods. 48 pages (main text), 12 pages (appendix), and 3 figures

Via

Access Paper or Ask Questions

Adapting to Non-Stationary Environments: Multi-Armed Bandit Enhanced Retrieval-Augmented Generation on Knowledge Graphs

Dec 10, 2024

Xiaqiang Tang, Jian Li, Nan Du, Sihong Xie

Figure 1 for Adapting to Non-Stationary Environments: Multi-Armed Bandit Enhanced Retrieval-Augmented Generation on Knowledge Graphs

Figure 2 for Adapting to Non-Stationary Environments: Multi-Armed Bandit Enhanced Retrieval-Augmented Generation on Knowledge Graphs

Figure 3 for Adapting to Non-Stationary Environments: Multi-Armed Bandit Enhanced Retrieval-Augmented Generation on Knowledge Graphs

Figure 4 for Adapting to Non-Stationary Environments: Multi-Armed Bandit Enhanced Retrieval-Augmented Generation on Knowledge Graphs

Abstract:Despite the superior performance of Large language models on many NLP tasks, they still face significant limitations in memorizing extensive world knowledge. Recent studies have demonstrated that leveraging the Retrieval-Augmented Generation (RAG) framework, combined with Knowledge Graphs that encapsulate extensive factual data in a structured format, robustly enhances the reasoning capabilities of LLMs. However, deploying such systems in real-world scenarios presents challenges: the continuous evolution of non-stationary environments may lead to performance degradation and user satisfaction requires a careful balance of performance and responsiveness. To address these challenges, we introduce a Multi-objective Multi-Armed Bandit enhanced RAG framework, supported by multiple retrieval methods with diverse capabilities under rich and evolving retrieval contexts in practice. Within this framework, each retrieval method is treated as a distinct ``arm''. The system utilizes real-time user feedback to adapt to dynamic environments, by selecting the appropriate retrieval method based on input queries and the historical multi-objective performance of each arm. Extensive experiments conducted on two benchmark KGQA datasets demonstrate that our method significantly outperforms baseline methods in non-stationary settings while achieving state-of-the-art performance in stationary environments. Code and data are available at https://github.com/FUTUREEEEEE/Dynamic-RAG.git

* AAAI 2025

Via

Access Paper or Ask Questions