Jiarui Zhao

Teacher-Guided Student Self-Knowledge Distillation Using Diffusion Model

Feb 02, 2026

Yu Wang, Chuanguang Yang, Zhulin An, Weilun Feng, Jiarui Zhao, Chengqing Yu, Libo Huang, Boyu Diao, Yongjun Xu

Abstract: Existing Knowledge Distillation (KD) methods often align feature information between teacher and student by exploring meaningful feature processing and loss functions. However, because the teacher and student features follow different distributions, the student model may learn incompatible information from the teacher. To address this problem, we propose teacher-guided student Diffusion Self-KD, dubbed DSKD. Instead of directly aligning teacher and student, we use the teacher classifier to guide the sampling process that denoises student features through a lightweight diffusion model. We then propose a novel locality-sensitive hashing (LSH)-guided feature distillation method between the original and denoised student features. The denoised student features encapsulate teacher knowledge and can play the role of the teacher. In this way, DSKD can eliminate discrepancies in mapping manners and feature distributions between the teacher and student while still learning meaningful knowledge from the teacher. Experiments on visual recognition tasks demonstrate that DSKD significantly outperforms existing KD methods across various models and datasets. Our code is attached in the supplementary material.
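The abstract does not specify how the LSH guidance is computed. As a rough illustration only, the sketch below assumes random-hyperplane (SimHash-style) LSH: the denoised features are hashed into bits, and the original student features are pushed toward the same side of each hyperplane with a binary cross-entropy loss. The number of hyperplanes, the BCE formulation, and every function name here (`make_hyperplanes`, `lsh_code`, `lsh_bce_loss`) are hypothetical and are not taken from the DSKD code.

```python
import math
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes through the origin (random-projection LSH)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def project(x, planes):
    """Dot product of x with each hyperplane normal (one value per hash bit)."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in planes]


def lsh_code(x, planes):
    """Binary LSH code: bit is 1 if x lies on the positive side of the hyperplane."""
    return [1 if p > 0 else 0 for p in project(x, planes)]


def lsh_bce_loss(student, target, planes):
    """Hypothetical LSH-guided loss: BCE between sigmoid(student projections)
    and the hash bits of the target (here, the denoised features)."""
    bits = lsh_code(target, planes)
    loss = 0.0
    for p, b in zip(project(student, planes), bits):
        # Numerically stable BCE-with-logits: max(p, 0) - p*b + log(1 + exp(-|p|))
        loss += max(p, 0.0) - p * b + math.log1p(math.exp(-abs(p)))
    return loss / len(planes)
```

Because each bit depends only on the sign of a projection, rescaling a feature vector leaves its code unchanged, and a student vector that agrees with the target's hash bits incurs a lower loss than one on the opposite side of every hyperplane.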

LongCat-Flash-Thinking-2601 Technical Report

Jan 23, 2026

Meituan LongCat Team, Anchun Gui, Bei Li, Bingyang Tao, Bole Zhou, Borun Chen, Chao Zhang, Chen Gao, Chen Zhang, Chengcheng Han(+151 more)

Abstract: We introduce LongCat-Flash-Thinking-2601, a 560-billion-parameter open-source Mixture-of-Experts (MoE) reasoning model with superior agentic reasoning capability. LongCat-Flash-Thinking-2601 achieves state-of-the-art performance among open-source models on a wide range of agentic benchmarks, including agentic search, agentic tool use, and tool-integrated reasoning. Beyond benchmark performance, the model demonstrates strong generalization to complex tool interactions and robust behavior in noisy real-world environments. Its advanced capability stems from a unified training framework that combines domain-parallel expert training with subsequent fusion, together with an end-to-end co-design of data construction, environments, algorithms, and infrastructure spanning pre-training through post-training. In particular, the model's strong generalization in complex tool use is driven by our in-depth exploration of environment scaling and principled task construction. To optimize long-tailed, skewed generation and multi-turn agentic interactions, and to enable stable training across more than 10,000 environments spanning more than 20 domains, we systematically extend our asynchronous reinforcement learning framework, DORA, for stable and efficient large-scale multi-environment training. Furthermore, recognizing that real-world tasks are inherently noisy, we systematically analyze and decompose real-world noise patterns and design targeted training procedures that explicitly incorporate such imperfections into training, improving robustness in real-world applications. To further enhance performance on complex reasoning tasks, we introduce a Heavy Thinking mode that enables effective test-time scaling by jointly expanding reasoning depth and width through intensive parallel thinking.

DCL-SE: Dynamic Curriculum Learning for Spatiotemporal Encoding of Brain Imaging

Nov 19, 2025

Meihua Zhou, Xinyu Tong, Jiarui Zhao, Min Cheng, Li Yang, Lei Tian, Nan Wan

Abstract: High-dimensional neuroimaging analyses for clinical diagnosis are often constrained by compromises in spatiotemporal fidelity and by the limited adaptability of large-scale, general-purpose models. To address these challenges, we introduce Dynamic Curriculum Learning for Spatiotemporal Encoding (DCL-SE), an end-to-end framework centered on data-driven spatiotemporal encoding (DaSE). We leverage Approximate Rank Pooling (ARP) to efficiently encode three-dimensional volumetric brain data into information-rich, two-dimensional dynamic representations, and then employ a dynamic curriculum learning strategy, guided by a Dynamic Group Mechanism (DGM), to progressively train the decoder, refining feature extraction from global anatomical structures to fine pathological details. Evaluated across six publicly available datasets, covering Alzheimer's disease and brain tumor classification, cerebral artery segmentation, and brain age prediction, DCL-SE consistently outperforms existing methods in accuracy, robustness, and interpretability. These findings underscore the critical importance of compact, task-specific architectures in the era of large-scale pretrained networks.
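For readers unfamiliar with Approximate Rank Pooling, the sketch below uses the simplified ARP weights alpha_t = 2t - T - 1 from the dynamic-image work of Bilen et al. and assumes, hypothetically, that the T slices of a volume are treated as an ordered sequence that collapses into a single 2D image by a weighted sum. DCL-SE may order slices, preprocess intensities, or use different weights; the function names `arp_coefficients` and `approximate_rank_pool` are illustrative only.

```python
def arp_coefficients(T):
    """Simplified approximate rank pooling weights alpha_t = 2t - T - 1 for t = 1..T."""
    return [2 * t - T - 1 for t in range(1, T + 1)]


def approximate_rank_pool(volume):
    """Collapse a 3D volume, given as T slices of H x W nested lists,
    into one 2D 'dynamic image' as the alpha-weighted sum of slices.
    Assumption: slice order along the chosen axis plays the role of time."""
    T = len(volume)
    alphas = arp_coefficients(T)
    H, W = len(volume[0]), len(volume[0][0])
    out = [[0.0] * W for _ in range(H)]
    for alpha, sl in zip(alphas, volume):
        for i in range(H):
            for j in range(W):
                out[i][j] += alpha * sl[i][j]
    return out
```

For T = 4 the weights are [-3, -1, 1, 3]. Because the weights sum to zero, a volume that is constant across slices maps to an all-zero image; only structure that changes along the slice axis survives in the 2D summary.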

Message Feedback Interference Cancellation Aided UAMP Iterative Detector for OTFS Systems

Jan 05, 2024

Xiangxiang Li, Haiyan Wang, Yao Ge, Xiaohong Shen, Jiarui Zhao

Abstract: Designing efficient signal detectors is important yet challenging for orthogonal time frequency space (OTFS) systems in high-mobility scenarios. In this letter, we develop an efficient message feedback interference cancellation aided unitary approximate message passing (denoted UAMP-MFIC) iterative detector, in which the latest feedback messages from variable nodes are used for more reliable interference cancellation and improved performance. A fast recursive scheme is leveraged in the proposed UAMP-MFIC detector to avoid increasing complexity. To further alleviate error propagation and improve receiver performance, we also develop bidirectional symbol detection structures, proposing a Turbo UAMP-MFIC detector and an iterative weight UAMP-MFIC detector that efficiently fuse the estimation results of the forward and backward UAMP-MFIC detectors. Finally, simulation results demonstrate the performance improvement of the proposed detectors over existing detectors.
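The abstract does not say how the forward and backward estimates are fused. One standard way to combine two independent Gaussian estimates of the same symbol, shown here as a hypothetical stand-in rather than the letter's Turbo or iterative weight scheme, is precision-weighted averaging: the fused precision is the sum of the two precisions, and each mean is weighted by its own precision. The names `fuse_gaussian` and `fuse_streams` are illustrative.

```python
def fuse_gaussian(mean_f, var_f, mean_b, var_b):
    """Combine two independent Gaussian estimates of one symbol by multiplying
    their densities: precisions add, means are precision-weighted.
    Works for real or complex means (variances must be positive reals)."""
    precision = 1.0 / var_f + 1.0 / var_b
    var = 1.0 / precision
    mean = var * (mean_f / var_f + mean_b / var_b)
    return mean, var


def fuse_streams(forward, backward):
    """Fuse per-symbol (mean, variance) pairs from a forward and a backward pass."""
    return [fuse_gaussian(mf, vf, mb, vb)
            for (mf, vf), (mb, vb) in zip(forward, backward)]
```

The fused variance is always smaller than either input variance, and the fused mean is pulled toward whichever pass reports the lower variance for that symbol.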