Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zhi Li

CKD-EHR:Clinical Knowledge Distillation for Electronic Health Records

Jun 18, 2025

Junke Wang, Hongshun Ling, Li Zhang, Longqian Zhang, Fang Wang, Yuan Gao, Zhi Li

Abstract:Electronic Health Records (EHR)-based disease prediction models have demonstrated significant clinical value in promoting precision medicine and enabling early intervention. However, existing large language models face two major challenges: insufficient representation of medical knowledge and low efficiency in clinical deployment. To address these challenges, this study proposes the CKD-EHR (Clinical Knowledge Distillation for EHR) framework, which achieves efficient and accurate disease risk prediction through knowledge distillation techniques. Specifically, the large language model Qwen2.5-7B is first fine-tuned on medical knowledge-enhanced data to serve as the teacher model.It then generates interpretable soft labels through a multi-granularity attention distillation mechanism. Finally, the distilled knowledge is transferred to a lightweight BERT student model. Experimental results show that on the MIMIC-III dataset, CKD-EHR significantly outperforms the baseline model:diagnostic accuracy is increased by 9%, F1-score is improved by 27%, and a 22.2 times inference speedup is achieved. This innovative solution not only greatly improves resource utilization efficiency but also significantly enhances the accuracy and timeliness of diagnosis, providing a practical technical approach for resource optimization in clinical settings. The code and data for this research are available athttps://github.com/209506702/CKD_EHR.

* 20 pages,5 figures

Via

Access Paper or Ask Questions

Human-Robot Collaboration for the Remote Control of Mobile Humanoid Robots with Torso-Arm Coordination

May 09, 2025

Nikita Boguslavskii, Lorena Maria Genua, Zhi Li

Abstract:Recently, many humanoid robots have been increasingly deployed in various facilities, including hospitals and assisted living environments, where they are often remotely controlled by human operators. Their kinematic redundancy enhances reachability and manipulability, enabling them to navigate complex, cluttered environments and perform a wide range of tasks. However, this redundancy also presents significant control challenges, particularly in coordinating the movements of the robot's macro-micro structure (torso and arms). Therefore, we propose various human-robot collaborative (HRC) methods for coordinating the torso and arm of remotely controlled mobile humanoid robots, aiming to balance autonomy and human input to enhance system efficiency and task execution. The proposed methods include human-initiated approaches, where users manually control torso movements, and robot-initiated approaches, which autonomously coordinate torso and arm based on factors such as reachability, task goal, or inferred human intent. We conducted a user study with N=17 participants to compare the proposed approaches in terms of task performance, manipulability, and energy efficiency, and analyzed which methods were preferred by participants.

* This work has been accepted for publication in 2025 IEEE International Conference on Robotics and Automation (ICRA 2025). The final published version will be available via IEEE Xplore

Via

Access Paper or Ask Questions

InstructAttribute: Fine-grained Object Attributes editing with Instruction

May 01, 2025

Xingxi Yin, Jingfeng Zhang, Zhi Li, Yicheng Li, Yin Zhang

Abstract:Text-to-image (T2I) diffusion models, renowned for their advanced generative abilities, are extensively utilized in image editing applications, demonstrating remarkable effectiveness. However, achieving precise control over fine-grained attributes still presents considerable challenges. Existing image editing techniques either fail to modify the attributes of an object or struggle to preserve its structure and maintain consistency in other areas of the image. To address these challenges, we propose the Structure-Preserving and Attribute Amplification (SPAA), a training-free method which enables precise control over the color and material transformations of objects by editing the self-attention maps and cross-attention values. Furthermore, we constructed the Attribute Dataset, which encompasses nearly all colors and materials associated with various objects, by integrating multimodal large language models (MLLM) to develop an automated pipeline for data filtering and instruction labeling. Training on this dataset, we present our InstructAttribute, an instruction-based model designed to facilitate fine-grained editing of color and material attributes. Extensive experiments demonstrate that our method achieves superior performance in object-level color and material editing, outperforming existing instruction-based image editing approaches.

Via

Access Paper or Ask Questions

An End-to-End Framework for Optimizing Foot Trajectory and Force in Dry Adhesion Legged Wall-Climbing Robots

Apr 28, 2025

Jichun Xiao, Jiawei Nie, Lina Hao, Zhi Li

Abstract:Foot trajectory planning for dry adhesion legged climbing robots presents challenges, as the phases of foot detachment, swing, and adhesion significantly influence the adhesion and detachment forces essential for stable climbing. To tackle this, an end-to-end foot trajectory and force optimization framework (FTFOF) is proposed, which optimizes foot adhesion and detachment forces through trajectory adjustments. This framework accepts general foot trajectory constraints and user-defined parameters as input, ultimately producing an optimal single foot trajectory. It integrates three-segment $C^2$ continuous Bezier curves, tailored to various foot structures, enabling the generation of effective climbing trajectories. A dilate-based GRU predictive model establishes the relationship between foot trajectories and the corresponding foot forces. Multi-objective optimization algorithms, combined with a redundancy hierarchical strategy, identify the most suitable foot trajectory for specific tasks, thereby ensuring optimal performance across detachment force, adhesion force and vibration amplitude. Experimental validation on the quadruped climbing robot MST-M3F showed that, compared to commonly used trajectories in existing legged climbing robots, the proposed framework achieved reductions in maximum detachment force by 28 \%, vibration amplitude by 82 \%, which ensures the stable climbing of dry adhesion legged climbing robots.

Via

Access Paper or Ask Questions

A Universal Framework for Compressing Embeddings in CTR Prediction

Feb 21, 2025

Kefan Wang, Hao Wang, Kenan Song, Wei Guo, Kai Cheng, Zhi Li, Yong Liu, Defu Lian, Enhong Chen

Abstract:Accurate click-through rate (CTR) prediction is vital for online advertising and recommendation systems. Recent deep learning advancements have improved the ability to capture feature interactions and understand user interests. However, optimizing the embedding layer often remains overlooked. Embedding tables, which represent categorical and sequential features, can become excessively large, surpassing GPU memory limits and necessitating storage in CPU memory. This results in high memory consumption and increased latency due to frequent GPU-CPU data transfers. To tackle these challenges, we introduce a Model-agnostic Embedding Compression (MEC) framework that compresses embedding tables by quantizing pre-trained embeddings, without sacrificing recommendation quality. Our approach consists of two stages: first, we apply popularity-weighted regularization to balance code distribution between high- and low-frequency features. Then, we integrate a contrastive learning mechanism to ensure a uniform distribution of quantized codes, enhancing the distinctiveness of embeddings. Experiments on three datasets reveal that our method reduces memory usage by over 50x while maintaining or improving recommendation performance compared to existing models. The implementation code is accessible in our project repository https://github.com/USTC-StarTeam/MEC.

* Accepted by DASFAA2025

Via

Access Paper or Ask Questions

LLMs can Perform Multi-Dimensional Analytic Writing Assessments: A Case Study of L2 Graduate-Level Academic English Writing

Feb 17, 2025

Zhengxiang Wang, Veronika Makarova, Zhi Li, Jordan Kodner, Owen Rambow

Abstract:The paper explores the performance of LLMs in the context of multi-dimensional analytic writing assessments, i.e. their ability to provide both scores and comments based on multiple assessment criteria. Using a corpus of literature reviews written by L2 graduate students and assessed by human experts against 9 analytic criteria, we prompt several popular LLMs to perform the same task under various conditions. To evaluate the quality of feedback comments, we apply a novel feedback comment quality evaluation framework. This framework is interpretable, cost-efficient, scalable, and reproducible, compared to existing methods that rely on manual judgments. We find that LLMs can generate reasonably good and generally reliable multi-dimensional analytic assessments. We release our corpus for reproducibility.

* 26 pages, 6 figures, 15 tables

Via

Access Paper or Ask Questions

Improved Extrinsic Calibration of Acoustic Cameras via Batch Optimization

Feb 10, 2025

Zhi Li, Jiang Wang, Xiaoyang Li, He Kong

Figure 1 for Improved Extrinsic Calibration of Acoustic Cameras via Batch Optimization

Figure 2 for Improved Extrinsic Calibration of Acoustic Cameras via Batch Optimization

Figure 3 for Improved Extrinsic Calibration of Acoustic Cameras via Batch Optimization

Figure 4 for Improved Extrinsic Calibration of Acoustic Cameras via Batch Optimization

Abstract:Acoustic cameras have found many applications in practice. Accurate and reliable extrinsic calibration of the microphone array and visual sensors within acoustic cameras is crucial for fusing visual and auditory measurements. Existing calibration methods either require prior knowledge of the microphone array geometry or rely on grid search which suffers from slow iteration speed or poor convergence. To overcome these limitations, in this paper, we propose an automatic calibration technique using a calibration board with both visual and acoustic markers to identify each microphone position in the camera frame. We formulate the extrinsic calibration problem (between microphones and the visual sensor) as a nonlinear least squares problem and employ a batch optimization strategy to solve the associated problem. Extensive numerical simulations and realworld experiments show that the proposed method improves both the accuracy and robustness of extrinsic parameter calibration for acoustic cameras, in comparison to existing methods. To benefit the community, we open-source all the codes and data at https://github.com/AISLAB-sustech/AcousticCamera.

* This paper was accepted and is going to be presented at ICASSP 2025

Via

Access Paper or Ask Questions

Efficient MedSAMs: Segment Anything in Medical Images on Laptop

Dec 20, 2024

Jun Ma, Feifei Li, Sumin Kim, Reza Asakereh, Bao-Hiep Le, Dang-Khoa Nguyen-Vu, Alexander Pfefferle, Muxin Wei, Ruochen Gao, Donghang Lyu(+72 more)

Figure 1 for Efficient MedSAMs: Segment Anything in Medical Images on Laptop

Figure 2 for Efficient MedSAMs: Segment Anything in Medical Images on Laptop

Figure 3 for Efficient MedSAMs: Segment Anything in Medical Images on Laptop

Abstract:Promptable segmentation foundation models have emerged as a transformative approach to addressing the diverse needs in medical images, but most existing models require expensive computing, posing a big barrier to their adoption in clinical practice. In this work, we organized the first international competition dedicated to promptable medical image segmentation, featuring a large-scale dataset spanning nine common imaging modalities from over 20 different institutions. The top teams developed lightweight segmentation foundation models and implemented an efficient inference pipeline that substantially reduced computational requirements while maintaining state-of-the-art segmentation accuracy. Moreover, the post-challenge phase advanced the algorithms through the design of performance booster and reproducibility tasks, resulting in improved algorithms and validated reproducibility of the winning solution. Furthermore, the best-performing algorithms have been incorporated into the open-source software with a user-friendly interface to facilitate clinical adoption. The data and code are publicly available to foster the further development of medical image segmentation foundation models and pave the way for impactful real-world applications.

* CVPR 2024 MedSAM on Laptop Competition Summary: https://www.codabench.org/competitions/1847/

Via

Access Paper or Ask Questions

GaLore$+$: Boosting Low-Rank Adaptation for LLMs with Cross-Head Projection

Dec 15, 2024

Xutao Liao, Shaohui Li, Yuhui Xu, Zhi Li, Yu Liu, You He

Figure 1 for GaLore$+$: Boosting Low-Rank Adaptation for LLMs with Cross-Head Projection

Figure 2 for GaLore$+$: Boosting Low-Rank Adaptation for LLMs with Cross-Head Projection

Figure 3 for GaLore$+$: Boosting Low-Rank Adaptation for LLMs with Cross-Head Projection

Figure 4 for GaLore$+$: Boosting Low-Rank Adaptation for LLMs with Cross-Head Projection

Abstract:Recent low-rank training methods, such as GaLore, have significantly reduced the memory required to optimize large language models (LLMs). However, these methods often suffer from time-consuming low-rank projection estimations. In particular, the singular value decomposition (SVD) in GaLore can consume more than 80\% of the total training time. To address this issue, we propose GaLore$+$, which uses cross-head low-rank projection to reduce the substantial time consumption in estimating low-rank projections for multi-head attention. In addition, we employ randomized subspace iteration to achieve fast SVD. To further enhance performance, we propose sparsely coded residuals to reduce the errors caused by low-rank approximation on the first- and second-order moments of the optimizers and weight updates. We evaluate GaLore$+$ on arithmetic reasoning and natural language generation datasets. Our experiments demonstrate that GaLore$+$ delivers superior performance while achieving approximately $4\times$ fine-tuning speed compared to vanilla GaLore.

Via

Access Paper or Ask Questions

A Real-time Degeneracy Sensing and Compensation Method for Enhanced LiDAR SLAM

Dec 10, 2024

Zongbo Liao, Xuanxuan Zhang, Tianxiang Zhang, Zhi Li, Zhenqi Zheng, Zhichao Wen, You Li

Figure 1 for A Real-time Degeneracy Sensing and Compensation Method for Enhanced LiDAR SLAM

Figure 2 for A Real-time Degeneracy Sensing and Compensation Method for Enhanced LiDAR SLAM

Figure 3 for A Real-time Degeneracy Sensing and Compensation Method for Enhanced LiDAR SLAM

Figure 4 for A Real-time Degeneracy Sensing and Compensation Method for Enhanced LiDAR SLAM

Abstract:LiDAR is widely used in Simultaneous Localization and Mapping (SLAM) and autonomous driving. The LiDAR odometry is of great importance in multi-sensor fusion. However, in some unstructured environments, the point cloud registration cannot constrain the poses of the LiDAR due to its sparse geometric features, which leads to the degeneracy of multi-sensor fusion accuracy. To address this problem, we propose a novel real-time approach to sense and compensate for the degeneracy of LiDAR. Firstly, this paper introduces the degeneracy factor with clear meaning, which can measure the degeneracy of LiDAR. Then, the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) clustering method adaptively perceives the degeneracy with better environmental generalization. Finally, the degeneracy perception results are utilized to fuse LiDAR and IMU, thus effectively resisting degeneracy effects. Experiments on our dataset show the method's high accuracy and robustness and validate our algorithm's adaptability to different environments and LiDAR scanning modalities.

* 13 pages

Via

Access Paper or Ask Questions