Zhi Li

Collaborative Optimization of Multiclass Imbalanced Learning: Density-Aware and Region-Guided Boosting

Dec 27, 2025

Chuantao Li, Zhi Li, Jiahao Xu, Jie Li, Sheng Li

Abstract: Numerous studies attempt to mitigate classification bias caused by class imbalance. However, existing studies have yet to explore the collaborative optimization of imbalanced learning and model training, which hinders further performance improvements. To bridge this gap, this study proposes a collaborative optimization Boosting model for multiclass imbalanced learning. The model is simple but effective: by integrating a density factor and a confidence factor, this study designs a noise-resistant weight update mechanism and a dynamic sampling strategy. Rather than functioning as independent components, these modules are tightly integrated to orchestrate weight updates, sample region partitioning, and region-guided sampling. Thus, this study achieves the collaborative optimization of imbalanced learning and model training. Extensive experiments on 20 public imbalanced datasets demonstrate that the proposed model significantly outperforms eight state-of-the-art baselines. The code for the proposed model is available at: https://github.com/ChuantaoLi/DARG.

StreamDiffusionV2: A Streaming System for Dynamic and Interactive Video Generation

Nov 10, 2025

Tianrui Feng, Zhi Li, Shuo Yang, Haocheng Xi, Muyang Li, Xiuyu Li, Lvmin Zhang, Keting Yang, Kelly Peng, Song Han(+4 more)

Abstract: Generative models are reshaping the live-streaming industry by redefining how content is created, styled, and delivered. Previous image-based streaming diffusion models have powered efficient and creative live-streaming products but have hit limits on temporal consistency because of their image-based design. Recent advances in video diffusion have markedly improved temporal consistency and sampling efficiency for offline generation. However, offline generation systems primarily optimize throughput by batching large workloads. In contrast, live online streaming operates under strict service-level objectives (SLOs): time-to-first-frame must be minimal, and every frame must meet a per-frame deadline with low jitter. In addition, scalable multi-GPU serving for real-time streams remains largely unresolved. To address this, we present StreamDiffusionV2, a training-free pipeline for interactive live streaming with video diffusion models. StreamDiffusionV2 integrates an SLO-aware batching scheduler and a block scheduler, together with a sink-token-guided rolling KV cache, a motion-aware noise controller, and other system-level optimizations. Moreover, we introduce a scalable pipeline orchestration that parallelizes the diffusion process across denoising steps and network layers, achieving near-linear FPS scaling without violating latency guarantees. The system scales seamlessly across heterogeneous GPU environments and supports flexible denoising steps (e.g., 1-4), enabling both ultra-low-latency and higher-quality modes. Without TensorRT or quantization, StreamDiffusionV2 renders the first frame within 0.5s and attains 58.28 FPS with a 14B-parameter model and 64.52 FPS with a 1.3B-parameter model on four H100 GPUs, making state-of-the-art generative live streaming practical and accessible, from individual creators to enterprise-scale platforms.

* Project Page: http://streamdiffusionv2.github.io

A Survey on Deep Text Hashing: Efficient Semantic Text Retrieval with Binary Representation

Oct 31, 2025

Liyang He, Zhenya Huang, Cheng Yang, Rui Li, Zheng Zhang, Kai Zhang, Zhi Li, Qi Liu, Enhong Chen

Abstract: With the rapid growth of textual content on the Internet, efficient large-scale semantic text retrieval has garnered increasing attention from both academia and industry. Text hashing, which projects original texts into compact binary hash codes, is a crucial method for this task. By using binary codes, the semantic similarity computation for text pairs is significantly accelerated via fast Hamming distance calculations, and storage costs are greatly reduced. With the advancement of deep learning, deep text hashing has demonstrated significant advantages over traditional, data-independent hashing techniques. By leveraging deep neural networks, these methods can learn compact and semantically rich binary representations directly from data, overcoming the performance limitations of earlier approaches. This survey investigates current deep text hashing methods by categorizing them based on their core components: semantic extraction, hash code quality preservation, and other key technologies. We then present a detailed evaluation schema with results on several popular datasets, followed by a discussion of practical applications and open-source tools for implementation. Finally, we conclude by discussing key challenges and future research directions, including the integration of deep text hashing with large language models to further advance the field. The project for this survey can be accessed at https://github.com/hly1998/DeepTextHashing.
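
The Hamming-distance retrieval step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the hash codes have already been produced by some model and are stored as Python integers; the function names, document IDs, and 8-bit codes below are hypothetical and are not taken from the survey or its repository.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two binary hash codes differ."""
    # XOR leaves a 1 wherever the codes disagree; count those 1 bits.
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, codes: dict) -> list:
    """Return (doc_id, distance) pairs sorted from nearest to farthest."""
    scored = [(doc_id, hamming_distance(query, code)) for doc_id, code in codes.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical 8-bit codes for three documents and one query.
codes = {"doc_a": 0b10110010, "doc_b": 0b10110011, "doc_c": 0b01001101}
query = 0b10110110
ranking = rank_by_hamming(query, codes)
print(ranking)
```

Because each comparison is one XOR and a bit count, ranking stays cheap even for long codes, which is the speed and storage advantage the abstract refers to.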

WeaveRec: An LLM-Based Cross-Domain Sequential Recommendation Framework with Model Merging

Oct 30, 2025

Min Hou, Xin Liu, Le Wu, Chenyi He, Hao Liu, Zhi Li, Xin Li, Si Wei

Abstract: Cross-Domain Sequential Recommendation (CDSR) seeks to improve user preference modeling by transferring knowledge from multiple domains. Despite the progress made in CDSR, most existing methods rely on overlapping users or items to establish cross-domain correlations, a requirement that rarely holds in real-world settings. The advent of large language models (LLMs) and model-merging techniques appears to overcome this limitation by unifying multi-domain data without explicit overlaps. Yet, our empirical study shows that naively training an LLM on combined domains, or simply merging several domain-specific LLMs, often degrades performance relative to a model trained solely on the target domain. To address these challenges, we first experimentally investigate the cause of suboptimal performance in LLM-based cross-domain recommendation and model merging. Building on these insights, we introduce WeaveRec, which cross-trains multiple LoRA modules with source and target domain data in a weaving fashion, and fuses them via model merging. WeaveRec can be extended to multi-source domain scenarios and notably does not introduce additional inference-time cost in terms of latency or memory. Furthermore, we provide a theoretical guarantee that WeaveRec can reduce the upper bound of the expected error in the target domain. Extensive experiments on single-source, multi-source, and cross-platform cross-domain recommendation scenarios validate that WeaveRec effectively mitigates performance degradation and consistently outperforms baseline approaches in real-world recommendation tasks.

RCPU: Rotation-Constrained Error Compensation for Structured Pruning of a Large Language Model

Oct 09, 2025

Shuichiro Haruta, Kazunori Matsumoto, Zhi Li, Yanan Wang, Mori Kurokawa

Abstract: In this paper, we propose a rotation-constrained compensation method to address the errors introduced by structured pruning of large language models (LLMs). LLMs are trained on massive datasets and accumulate rich semantic knowledge in their representation space. In contrast, pruning is typically carried out with only a small amount of calibration data, which makes output mismatches unavoidable. Although direct least-squares fitting can reduce such errors, it tends to overfit to the limited calibration set, destructively modifying pretrained weights. To overcome this difficulty, we update the pruned parameters under a rotation constraint. This constrained update preserves the geometry of output representations (i.e., norms and inner products) and simultaneously re-aligns the pruned subspace with the original outputs. Furthermore, in rotation-constrained compensation, removing components that strongly contribute to the principal directions of the output makes error recovery difficult. Since input dimensions with large variance strongly affect these principal directions, we design a variance-aware importance score that ensures such dimensions are preferentially kept in the pruned model. By combining this scoring rule with rotation-constrained updates, the proposed method effectively compensates errors while retaining the components likely to be more important in a geometry-preserving manner. In the experiments, we apply the proposed method to LLaMA-7B and evaluate it on WikiText-2 and multiple language understanding benchmarks. The results demonstrate consistently better perplexity and task accuracy compared with existing baselines.
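
The abstract does not spell out the update rule, so the following is only a toy 2-D sketch of why a rotation constraint preserves norms and inner products while still re-aligning outputs. It uses the closed-form least-squares rotation in the plane (the 2-D case of the orthogonal Procrustes problem), which is an assumption for illustration and not necessarily the paper's method; all names and data points are hypothetical.

```python
import math


def best_rotation_angle(sources, targets):
    """Angle of the 2-D rotation minimizing sum ||R x - y||^2 over paired points."""
    dot = sum(x[0] * y[0] + x[1] * y[1] for x, y in zip(sources, targets))
    cross = sum(x[0] * y[1] - x[1] * y[0] for x, y in zip(sources, targets))
    return math.atan2(cross, dot)


def rotate(point, theta):
    """Apply a counter-clockwise rotation by theta to a 2-D point."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * point[0] - s * point[1], s * point[0] + c * point[1])


# Hypothetical "pruned" outputs and "original" outputs that differ by a 30-degree turn.
pruned = [(1.0, 0.0), (0.0, 2.0), (1.0, 1.0)]
original = [rotate(p, math.pi / 6) for p in pruned]

theta = best_rotation_angle(pruned, original)
aligned = [rotate(p, theta) for p in pruned]
print(round(math.degrees(theta), 6))
```

An unconstrained least-squares fit could also scale or shear the points to match the calibration data; restricting the update to a rotation leaves every vector length and every pairwise inner product unchanged, which is the geometry-preserving property the abstract describes.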

Towards personalized, precise and survey-free environment recognition: AI-enhanced sensor fusion without pre-deployment

Sep 16, 2025

Ruichen Wang, Zhikang Ni, Pengzhou Wang, Xiya Cao, Zhi Li, Bao Zhang

Abstract: Accurate and personalized environment recognition is essential for seamless indoor positioning and optimized connectivity, yet traditional fingerprinting requires costly site surveys and lacks user-level adaptation. We present a survey-free, on-device sensor-fusion framework that builds a personalized, lightweight multi-source fingerprint (FP) database from pedestrian dead reckoning (PDR), WiFi/cellular, GNSS, and interaction time tags. Matching is performed by an AI-enhanced dynamic time warping module (AIDTW) that aligns noisy, asynchronous sequences. To turn perception into continually improving actions, a cloud-edge online Reinforcement Learning from Human Feedback (RLHF) loop aggregates desensitized summaries and human feedback in the cloud to optimize a policy via proximal policy optimization (PPO), and periodically distills updates to devices. Across indoor/outdoor scenarios, our system reduces network-transition latency (measured by time-to-switch, TTS) by 32-65% in daily environments compared with conventional baselines, without site-specific pre-deployment.

* 5 pages, 7 figures, conference
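
For readers unfamiliar with the matching step, here is a minimal sketch of classic dynamic time warping, the baseline technique that the AIDTW module is described as enhancing. The AI-enhanced parts are not specified in the abstract and are not reproduced; the function name and the sample signal-strength sequences are hypothetical.

```python
def dtw_distance(a, b):
    """Classic DTW cost between two numeric sequences (absolute-difference local cost)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical readings: the second sequence repeats one sample (a slower walk).
walk_fast = [1, 2, 3]
walk_slow = [1, 2, 2, 3]
print(dtw_distance(walk_fast, walk_slow))
```

The two sequences have different lengths but the same shape, so the warping path absorbs the repeated sample and the alignment cost is zero; a point-by-point comparison would not even be defined here.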

Unfolding Framework with Complex-Valued Deformable Attention for High-Quality Computer-Generated Hologram Generation

Aug 29, 2025

Haomiao Zhang, Zhangyuan Li, Yanling Piao, Zhi Li, Xiaodong Wang, Miao Cao, Xiongfei Su, Qiang Song, Xin Yuan

Abstract: Computer-generated holography (CGH) has gained wide attention with deep learning-based algorithms. However, due to its nonlinear and ill-posed nature, challenges remain in achieving accurate and stable reconstruction. Specifically, (i) the widely used end-to-end networks treat the reconstruction model as a black box, ignoring underlying physical relationships, which reduces interpretability and flexibility. (ii) CNN-based CGH algorithms have limited receptive fields, hindering their ability to capture long-range dependencies and global context. (iii) Angular spectrum method (ASM)-based models are constrained to finite near-fields. In this paper, we propose a Deep Unfolding Network (DUN) that decomposes gradient descent into two modules: an adaptive bandwidth-preserving model (ABPM) and a phase-domain complex-valued denoiser (PCD), providing more flexibility. ABPM allows for wider working distances compared to ASM-based methods. At the same time, PCD leverages its complex-valued deformable self-attention module to capture global features and enhance performance, achieving a PSNR over 35 dB. Experiments on simulated and real data show state-of-the-art results.

CKD-EHR: Clinical Knowledge Distillation for Electronic Health Records

Jun 18, 2025

Junke Wang, Hongshun Ling, Li Zhang, Longqian Zhang, Fang Wang, Yuan Gao, Zhi Li

Abstract: Electronic Health Records (EHR)-based disease prediction models have demonstrated significant clinical value in promoting precision medicine and enabling early intervention. However, existing large language models face two major challenges: insufficient representation of medical knowledge and low efficiency in clinical deployment. To address these challenges, this study proposes the CKD-EHR (Clinical Knowledge Distillation for EHR) framework, which achieves efficient and accurate disease risk prediction through knowledge distillation techniques. Specifically, the large language model Qwen2.5-7B is first fine-tuned on medical knowledge-enhanced data to serve as the teacher model. It then generates interpretable soft labels through a multi-granularity attention distillation mechanism. Finally, the distilled knowledge is transferred to a lightweight BERT student model. Experimental results show that on the MIMIC-III dataset, CKD-EHR significantly outperforms the baseline model: diagnostic accuracy is increased by 9%, F1-score is improved by 27%, and a 22.2-fold inference speedup is achieved. This solution not only greatly improves resource utilization efficiency but also significantly enhances the accuracy and timeliness of diagnosis, providing a practical technical approach for resource optimization in clinical settings. The code and data for this research are available at https://github.com/209506702/CKD_EHR.

* 20 pages, 5 figures
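
The teacher-to-student transfer via soft labels can be illustrated with a generic temperature-scaled distillation loss. This is a minimal sketch of the standard technique only: the paper's multi-granularity attention distillation mechanism is not described in the abstract and is not reproduced here, and the function names, logits, and temperature value are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by temperature squared."""
    p = softmax(teacher_logits, temperature)  # soft labels from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Hypothetical logits over three disease-risk classes.
teacher = [4.0, 1.0, 0.5]
student = [2.0, 1.5, 0.2]
print(distillation_loss(teacher, student))
```

The loss is zero when the student reproduces the teacher's softened distribution and grows as the two diverge, so minimizing it pushes the small model toward the large model's relative class preferences rather than only its top prediction.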

Human-Robot Collaboration for the Remote Control of Mobile Humanoid Robots with Torso-Arm Coordination

May 09, 2025

Nikita Boguslavskii, Lorena Maria Genua, Zhi Li

Abstract: Humanoid robots are increasingly deployed in facilities such as hospitals and assisted living environments, where they are often remotely controlled by human operators. Their kinematic redundancy enhances reachability and manipulability, enabling them to navigate complex, cluttered environments and perform a wide range of tasks. However, this redundancy also presents significant control challenges, particularly in coordinating the movements of the robot's macro-micro structure (torso and arms). Therefore, we propose several human-robot collaborative (HRC) methods for coordinating the torso and arms of remotely controlled mobile humanoid robots, aiming to balance autonomy and human input to enhance system efficiency and task execution. The proposed methods include human-initiated approaches, where users manually control torso movements, and robot-initiated approaches, which autonomously coordinate the torso and arms based on factors such as reachability, task goal, or inferred human intent. We conducted a user study with N=17 participants to compare the proposed approaches in terms of task performance, manipulability, and energy efficiency, and analyzed which methods participants preferred.

* This work has been accepted for publication in 2025 IEEE International Conference on Robotics and Automation (ICRA 2025). The final published version will be available via IEEE Xplore

InstructAttribute: Fine-grained Object Attributes editing with Instruction

May 01, 2025

Xingxi Yin, Jingfeng Zhang, Zhi Li, Yicheng Li, Yin Zhang

Abstract: Text-to-image (T2I) diffusion models, renowned for their advanced generative abilities, are widely used in image editing applications, where they have proven remarkably effective. However, achieving precise control over fine-grained attributes still presents considerable challenges. Existing image editing techniques either fail to modify the attributes of an object or struggle to preserve its structure and maintain consistency in other areas of the image. To address these challenges, we propose Structure-Preserving and Attribute Amplification (SPAA), a training-free method that enables precise control over the color and material transformations of objects by editing the self-attention maps and cross-attention values. Furthermore, we construct the Attribute Dataset, which encompasses nearly all colors and materials associated with various objects, by integrating multimodal large language models (MLLMs) into an automated pipeline for data filtering and instruction labeling. Training on this dataset, we present InstructAttribute, an instruction-based model designed to facilitate fine-grained editing of color and material attributes. Extensive experiments demonstrate that our method achieves superior performance in object-level color and material editing, outperforming existing instruction-based image editing approaches.
