Yan Yang

Two-Step Data Augmentation for Masked Face Detection and Recognition: Turning Fake Masks to Real

Dec 13, 2025

Yan Yang, George Bebis, Mircea Nicolescu

Abstract:Data scarcity and distribution shift pose major challenges for masked face detection and recognition. We propose a two-step generative data augmentation framework that combines rule-based mask warping with unpaired image-to-image translation using GANs, enabling the generation of realistic masked-face samples beyond purely synthetic transformations. Compared to rule-based warping alone, the proposed approach yields consistent qualitative improvements and complements existing GAN-based masked face generation methods such as IAMGAN. We introduce a non-mask preservation loss and stochastic noise injection to stabilize training and enhance sample diversity. Experimental observations highlight the effectiveness of the proposed components and suggest directions for future improvements in data-centric augmentation for face recognition tasks.

* (2022) In Proceedings of the 2nd International Conference on Image Processing and Vision Engineering - IMPROVE; ISBN 978-989-758-563-0; ISSN 2795-4943, SciTePress, pages 126-134
* 9 pages, 9 figures. Conference version
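
The abstract names a non-mask preservation loss but does not define it. As a rough illustration only, the sketch below assumes one plausible form: a mean absolute difference between the input and generated images, restricted to pixels outside the mask region. The function name, the binary mask convention (1 = mask pixel), and the nested-list image layout are all assumptions, not the authors' formulation.

```python
def non_mask_preservation_loss(source, generated, mask):
    """Mean absolute difference over pixels where mask == 0.

    Hypothetical form of a preservation term: it penalizes changes the
    generator makes outside the face-mask region. All three arguments are
    2D lists of the same shape; mask holds 1 for mask pixels, 0 elsewhere.
    """
    total = 0.0
    count = 0
    for src_row, gen_row, mask_row in zip(source, generated, mask):
        for s, g, m in zip(src_row, gen_row, mask_row):
            if m == 0:  # only non-mask pixels contribute
                total += abs(s - g)
                count += 1
    return total / count if count else 0.0


# Toy 2x2 grayscale example: the pixel at (0, 1) is part of the mask.
source = [[0.2, 0.4], [0.6, 0.8]]
generated = [[0.2, 0.9], [0.5, 0.8]]
mask = [[0, 1], [0, 0]]
loss = non_mask_preservation_loss(source, generated, mask)
```

In this toy case the large change at the mask pixel is ignored, and only the small change at the non-mask pixel (1, 0) is penalized.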


Live-SWE-agent: Can Software Engineering Agents Self-Evolve on the Fly?

Nov 17, 2025

Chunqiu Steven Xia, Zhe Wang, Yan Yang, Yuxiang Wei, Lingming Zhang

Abstract:Large Language Models (LLMs) are reshaping almost all industries, including software engineering. In recent years, a number of LLM agents have been proposed to solve real-world software problems. Such software agents are typically equipped with a suite of coding tools and can autonomously decide the next actions to form complete trajectories to solve end-to-end software tasks. While promising, they typically require dedicated design and may still be suboptimal, since it can be extremely challenging and costly to exhaust the entire agent scaffold design space. Recognizing that software agents are inherently software themselves that can be further refined/modified, researchers have proposed a number of self-improving software agents recently, including the Darwin-Gödel Machine (DGM). Meanwhile, such self-improving agents require costly offline training on specific benchmarks and may not generalize well across different LLMs or benchmarks. In this paper, we propose Live-SWE-agent, the first live software agent that can autonomously and continuously evolve itself on-the-fly during runtime when solving real-world software problems. More specifically, Live-SWE-agent starts with the most basic agent scaffold with only access to bash tools (e.g., mini-SWE-agent), and autonomously evolves its own scaffold implementation while solving real-world software problems. Our evaluation on the widely studied SWE-bench Verified benchmark shows that Live-SWE-agent can achieve an impressive solve rate of 75.4% without test-time scaling, outperforming all existing open-source software agents and approaching the performance of the best proprietary solution. Moreover, Live-SWE-agent outperforms state-of-the-art manually crafted software agents on the recent SWE-Bench Pro benchmark, achieving the best-known solve rate of 45.8%.


Sparse4DGS: 4D Gaussian Splatting for Sparse-Frame Dynamic Scene Reconstruction

Nov 10, 2025

Changyue Shi, Chuxiao Yang, Xinyuan Hu, Minghao Chen, Wenwen Pan, Yan Yang, Jiajun Ding, Zhou Yu, Jun Yu

Abstract:Dynamic Gaussian Splatting approaches have achieved remarkable performance for 4D scene reconstruction. However, these approaches rely on dense-frame video sequences for photorealistic reconstruction. In real-world scenarios, due to equipment constraints, sometimes only sparse frames are accessible. In this paper, we propose Sparse4DGS, the first method for sparse-frame dynamic scene reconstruction. We observe that dynamic reconstruction methods fail in both canonical and deformed spaces under sparse-frame settings, especially in areas with high texture richness. Sparse4DGS tackles this challenge by focusing on texture-rich areas. For the deformation network, we propose Texture-Aware Deformation Regularization, which introduces a texture-based depth alignment loss to regulate Gaussian deformation. For the canonical Gaussian field, we introduce Texture-Aware Canonical Optimization, which incorporates texture-based noise into the gradient descent process of canonical Gaussians. Extensive experiments show that when taking sparse frames as inputs, our method outperforms existing dynamic or few-shot techniques on NeRF-Synthetic, HyperNeRF, NeRF-DS, and our iPhone-4D datasets.

* AAAI 2026


Fast Time-Varying mmWave MIMO Channel Estimation and Reconstruction: An Efficient Rank-Aware Matrix Completion Method

Nov 08, 2025

Tianyu Jiang, Yan Yang, Hongjin Liu, Runyu Han, Bo Ai, Mohsen Guizani

Abstract:We address the problem of fast time-varying channel estimation in millimeter-wave (mmWave) MIMO systems with imperfect channel state information (CSI) and facilitate efficient channel reconstruction. Specifically, leveraging the low-rank and sparse characteristics of the mmWave channel matrix, a two-phase rank-aware compressed sensing framework is proposed for efficient channel estimation and reconstruction. In the first phase, a robust rank-one matrix completion (R1MC) algorithm is used to reconstruct part of the observed channel matrix through low-rank matrix completion (LRMC). To address abrupt rank changes caused by user mobility, a discrete-time autoregressive (AR) model is established that leverages temporal rank correlations across consecutive time instances to enable adaptive observation matrix completion, thereby improving estimation accuracy under dynamic conditions. In the second phase, a rank-aware block orthogonal matching pursuit (RA-BOMP) algorithm is developed for sparse channel recovery with low computational complexity. Furthermore, a rank-aware measurement matrix design is introduced to improve angle estimation accuracy. Simulation results demonstrate that, compared with existing benchmark algorithms, the proposed approach achieves superior channel estimation performance while significantly reducing computational complexity and training overhead.


T$^3$SVFND: Towards an Evolving Fake News Detector for Emergencies with Test-time Training on Short Video Platforms

Jul 27, 2025

Liyuan Zhang, Zeyun Cheng, Yan Yang, Yong Liu, Jinke Ma

Abstract:Existing methods for fake news video detection may not generalize, because there is a distribution shift between short video news of different events, and their performance drops sharply when news records come from emergencies. We propose a new fake news video detection framework (T$^3$SVFND) that uses Test-Time Training (TTT) to alleviate this limitation and improve the robustness of fake news video detection. Specifically, we design a self-supervised auxiliary task based on Masked Language Modeling (MLM) that masks a certain percentage of words in the text and predicts these masked words by combining contextual information from different modalities (audio and video). In the test-time training phase, the model adapts to the distribution of the test data through the auxiliary task. Extensive experiments on the public benchmark demonstrate the effectiveness of the proposed model, especially for the detection of emergency news.

* 16 pages, 3 figures, published to DASFAA 2025
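
The text-masking step of the auxiliary task described above can be sketched as follows. This covers only the masking of a fixed percentage of words; the prediction of masked words from audio and video context is not shown. The function name, the 15% default ratio, and the `[MASK]` placeholder are assumptions for illustration, not details taken from the paper.

```python
import random


def mask_tokens(tokens, mask_ratio=0.15, mask_token="[MASK]", seed=0):
    """Replace a fraction of tokens with a placeholder.

    Returns the masked sequence and a dict mapping each masked position
    to its original token, which serves as the prediction target.
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)  # seeded for reproducible masking
    n_mask = max(1, round(len(tokens) * mask_ratio))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]
        masked[pos] = mask_token
    return masked, targets


tokens = "a fire broke out near the station this morning".split()
masked, targets = mask_tokens(tokens, mask_ratio=0.3)
```

Because the targets come from the input itself, the same routine can run on unlabeled test samples, which is what makes a masking objective usable at test time.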


NTIRE 2025 Image Shadow Removal Challenge Report

Jun 18, 2025

Florin-Alexandru Vasluianu, Tim Seizinger, Zhuyun Zhou, Cailian Chen, Zongwei Wu, Radu Timofte, Mingjia Li, Jin Hu, Hainuo Wang, Hengxing Liu (+72 more)

Abstract:This work examines the findings of the NTIRE 2025 Shadow Removal Challenge. A total of 306 participants have registered, with 17 teams successfully submitting their solutions during the final evaluation phase. Following the last two editions, this challenge had two evaluation tracks: one focusing on reconstruction fidelity and the other on visual perception through a user study. Both tracks were evaluated with images from the WSRD+ dataset, simulating interactions between self- and cast-shadows with a large number of diverse objects, textures, and materials.


Towards Prospective Medical Image Reconstruction via Knowledge-Informed Dynamic Optimal Transport

May 23, 2025

Taoran Zheng, Xing Li, Yan Yang, Xiang Gu, Zongben Xu, Jian Sun

Abstract:Medical image reconstruction from measurement data is a vital but challenging inverse problem. Deep learning approaches have achieved promising results, but often require paired measurements and high-quality images, which are typically simulated through a forward model, i.e., retrospective reconstruction. However, training on simulated pairs commonly leads to performance degradation on real prospective data due to the retrospective-to-prospective gap caused by incomplete imaging knowledge in simulation. To address this challenge, this paper introduces imaging Knowledge-Informed Dynamic Optimal Transport (KIDOT), a novel dynamic optimal transport framework with optimality in the sense of preserving consistency with imaging physics in transport, that conceptualizes reconstruction as finding a dynamic transport path. KIDOT learns from unpaired data by modeling reconstruction as a continuous evolution path from measurements to images, guided by an imaging knowledge-informed cost function and transport equation. This dynamic and knowledge-aware approach enhances robustness and better leverages unpaired data while respecting acquisition physics. Theoretically, we demonstrate that KIDOT naturally generalizes dynamic optimal transport, ensuring its mathematical rationale and solution existence. Extensive experiments on MRI and CT reconstruction demonstrate KIDOT's superior performance.


A Preliminary Study for GPT-4o on Image Restoration

May 08, 2025

Hao Yang, Yan Yang, Ruikun Zhang, Liyuan Pan

Abstract:OpenAI's GPT-4o model, integrating multi-modal inputs and outputs within an autoregressive architecture, has demonstrated unprecedented performance in image generation. In this work, we investigate its potential impact on the image restoration community. We present the first systematic evaluation of GPT-4o across diverse restoration tasks. Our experiments reveal that, although restoration outputs from GPT-4o are visually appealing, they often lack pixel-level structural fidelity when compared to ground-truth images. Common issues are variations in image proportions, shifts in object positions and quantities, and changes in viewpoint. To address this, taking image dehazing, deraining, and low-light enhancement as representative case studies, we show that GPT-4o's outputs can serve as powerful visual priors, substantially enhancing the performance of existing dehazing networks. This study offers practical guidelines and a baseline framework to facilitate the integration of GPT-4o into future image restoration pipelines. We hope the study on GPT-4o image restoration will accelerate innovation in the broader field of image generation. To support further research, we will release GPT-4o-restored images from over 10 widely used image restoration datasets.


Cross-organ all-in-one parallel compressed sensing magnetic resonance imaging

May 07, 2025

Baoshun Shi, Zheng Liu, Xin Meng, Yan Yang

Abstract:Recent advances in deep learning-based parallel compressed sensing magnetic resonance imaging (p-CSMRI) have significantly improved reconstruction quality. However, current p-CSMRI methods often require training a separate deep neural network (DNN) for each organ due to anatomical variations, creating a barrier to developing generalized medical image reconstruction systems. To address this, we propose CAPNet (cross-organ all-in-one deep unfolding p-CSMRI network), a unified framework that implements a p-CSMRI iterative algorithm via three specialized modules: auxiliary variable module, prior module, and data consistency module. Recognizing that p-CSMRI systems often employ varying sampling ratios for different organs, resulting in organ-specific artifact patterns, we introduce an artifact generation submodule, which extracts and integrates artifact features into the data consistency module to enhance the discriminative capability of the overall network. For the prior module, we design an organ structure-prompt generation submodule that leverages structural features extracted from the segment anything model (SAM) to create cross-organ prompts. These prompts are strategically incorporated into the prior module through an organ structure-aware Mamba submodule. Comprehensive evaluations on a cross-organ dataset confirm that CAPNet achieves state-of-the-art reconstruction performance across multiple anatomical structures using a single unified model. Our code will be published at https://github.com/shibaoshun/CAPNet.


ImPart: Importance-Aware Delta-Sparsification for Improved Model Compression and Merging in LLMs

Apr 17, 2025

Yan Yang, Yixia Li, Hongru Wang, Xuetao Wei, Jianqiao Yu, Yun Chen, Guanhua Chen

Abstract:With the proliferation of task-specific large language models, delta compression has emerged as a method to mitigate the resource challenges of deploying numerous such models by effectively compressing the delta model parameters. Previous delta-sparsification methods either remove parameters randomly or truncate singular vectors directly after singular value decomposition (SVD). However, these methods either disregard parameter importance entirely or evaluate it with too coarse a granularity. In this work, we introduce ImPart, a novel importance-aware delta sparsification approach. Leveraging SVD, it dynamically adjusts sparsity ratios of different singular vectors based on their importance, effectively retaining crucial task-specific knowledge even at high sparsity ratios. Experiments show that ImPart achieves state-of-the-art delta sparsification performance, demonstrating $2\times$ higher compression ratio than baselines at the same performance level. When integrated with existing methods, ImPart sets a new state-of-the-art on delta quantization and model merging.
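
For context, the random-removal style of delta sparsification that the abstract contrasts ImPart against can be sketched as below. This is not ImPart itself: it ignores parameter importance entirely, which is the limitation the abstract points out. The function names and flat-list weights are hypothetical, and rescaling the surviving entries by 1/(1 - drop_rate) is an added assumption that keeps the expected delta unchanged.

```python
import random


def random_delta_sparsify(base, finetuned, drop_rate, seed=0):
    """Randomly zero out entries of the delta (finetuned - base).

    Surviving entries are rescaled by 1 / (1 - drop_rate) so that the
    expected value of each delta entry is preserved (an assumption of
    this sketch, not something stated in the abstract).
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    rng = random.Random(seed)
    keep_scale = 1.0 / (1.0 - drop_rate)
    sparse_delta = []
    for b, f in zip(base, finetuned):
        delta = f - b
        if rng.random() < drop_rate:
            sparse_delta.append(0.0)  # dropped regardless of importance
        else:
            sparse_delta.append(delta * keep_scale)
    return sparse_delta


def reconstruct(base, sparse_delta):
    """Approximate the fine-tuned weights from base plus sparse delta."""
    return [b + d for b, d in zip(base, sparse_delta)]


base = [1.0, 2.0, 3.0, 4.0]
finetuned = [1.5, 1.0, 3.25, 4.0]
sparse = random_delta_sparsify(base, finetuned, drop_rate=0.5)
approx = reconstruct(base, sparse)
```

Only the base weights and the sparse delta need to be stored per task, which is where the compression comes from when many fine-tuned models share one base.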
