Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zihao Zhao

Improving Alignment in LVLMs with Debiased Self-Judgment

Aug 28, 2025

Sihan Yang, Chenhang Cui, Zihao Zhao, Yiyang Zhou, Weilong Yan, Ying Wei, Huaxiu Yao

Figure 1 for Improving Alignment in LVLMs with Debiased Self-Judgment

Figure 2 for Improving Alignment in LVLMs with Debiased Self-Judgment

Figure 3 for Improving Alignment in LVLMs with Debiased Self-Judgment

Figure 4 for Improving Alignment in LVLMs with Debiased Self-Judgment

Abstract:The rapid advancements in Large Language Models (LLMs) and Large Visual-Language Models (LVLMs) have opened up new opportunities for integrating visual and linguistic modalities. However, effectively aligning these modalities remains challenging, often leading to hallucinations--where generated outputs are not grounded in the visual input--and raising safety concerns across various domains. Existing alignment methods, such as instruction tuning and preference tuning, often rely on external datasets, human annotations, or complex post-processing, which limit scalability and increase costs. To address these challenges, we propose a novel approach that generates the debiased self-judgment score, a self-evaluation metric created internally by the model without relying on external resources. This enables the model to autonomously improve alignment. Our method enhances both decoding strategies and preference tuning processes, resulting in reduced hallucinations, enhanced safety, and improved overall capability. Empirical results show that our approach significantly outperforms traditional methods, offering a more effective solution for aligning LVLMs.

* EMNLP 2025 Findings

Via

Access Paper or Ask Questions

SynthTextEval: Synthetic Text Data Generation and Evaluation for High-Stakes Domains

Jul 09, 2025

Krithika Ramesh, Daniel Smolyak, Zihao Zhao, Nupoor Gandhi, Ritu Agarwal, Margrét Bjarnadóttir, Anjalie Field

Abstract:We present SynthTextEval, a toolkit for conducting comprehensive evaluations of synthetic text. The fluency of large language model (LLM) outputs has made synthetic text potentially viable for numerous applications, such as reducing the risks of privacy violations in the development and deployment of AI systems in high-stakes domains. Realizing this potential, however, requires principled consistent evaluations of synthetic data across multiple dimensions: its utility in downstream systems, the fairness of these systems, the risk of privacy leakage, general distributional differences from the source text, and qualitative feedback from domain experts. SynthTextEval allows users to conduct evaluations along all of these dimensions over synthetic data that they upload or generate using the toolkit's generation module. While our toolkit can be run over any data, we highlight its functionality and effectiveness over datasets from two high-stakes domains: healthcare and law. By consolidating and standardizing evaluation metrics, we aim to improve the viability of synthetic text, and in-turn, privacy-preservation in AI development.

Via

Access Paper or Ask Questions

Fine-grained List-wise Alignment for Generative Medication Recommendation

May 26, 2025

Chenxiao Fan, Chongming Gao, Wentao Shi, Yaxin Gong, Zihao Zhao, Fuli Feng

Abstract:Accurate and safe medication recommendations are critical for effective clinical decision-making, especially in multimorbidity cases. However, existing systems rely on point-wise prediction paradigms that overlook synergistic drug effects and potential adverse drug-drug interactions (DDIs). We propose FLAME, a fine-grained list-wise alignment framework for large language models (LLMs), enabling drug-by-drug generation of drug lists. FLAME formulates recommendation as a sequential decision process, where each step adds or removes a single drug. To provide fine-grained learning signals, we devise step-wise Group Relative Policy Optimization (GRPO) with potential-based reward shaping, which explicitly models DDIs and optimizes the contribution of each drug to the overall prescription. Furthermore, FLAME enhances patient modeling by integrating structured clinical knowledge and collaborative information into the representation space of LLMs. Experiments on benchmark datasets demonstrate that FLAME achieves state-of-the-art performance, delivering superior accuracy, controllable safety-accuracy trade-offs, and strong generalization across diverse clinical scenarios. Our code is available at https://github.com/cxfann/Flame.

Via

Access Paper or Ask Questions

UniCAD: Efficient and Extendable Architecture for Multi-Task Computer-Aided Diagnosis System

May 15, 2025

Yitao Zhu, Yuan Yin, Zhenrong Shen, Zihao Zhao, Haiyu Song, Sheng Wang, Dinggang Shen, Qian Wang

Abstract:The growing complexity and scale of visual model pre-training have made developing and deploying multi-task computer-aided diagnosis (CAD) systems increasingly challenging and resource-intensive. Furthermore, the medical imaging community lacks an open-source CAD platform to enable the rapid creation of efficient and extendable diagnostic models. To address these issues, we propose UniCAD, a unified architecture that leverages the robust capabilities of pre-trained vision foundation models to seamlessly handle both 2D and 3D medical images while requiring only minimal task-specific parameters. UniCAD introduces two key innovations: (1) Efficiency: A low-rank adaptation strategy is employed to adapt a pre-trained visual model to the medical image domain, achieving performance on par with fully fine-tuned counterparts while introducing only 0.17% trainable parameters. (2) Plug-and-Play: A modular architecture that combines a frozen foundation model with multiple plug-and-play experts, enabling diverse tasks and seamless functionality expansion. Building on this unified CAD architecture, we establish an open-source platform where researchers can share and access lightweight CAD experts, fostering a more equitable and efficient research ecosystem. Comprehensive experiments across 12 diverse medical datasets demonstrate that UniCAD consistently outperforms existing methods in both accuracy and deployment efficiency. The source code and project page are available at https://mii-laboratory.github.io/UniCAD/.

* 14 pages

Via

Access Paper or Ask Questions

Addressing Overprescribing Challenges: Fine-Tuning Large Language Models for Medication Recommendation Tasks

Mar 05, 2025

Zihao Zhao, Chenxiao Fan, Chongming Gao, Fuli Feng, Xiangnan He

Abstract:Medication recommendation systems have garnered attention within healthcare for their potential to deliver personalized and efficacious drug combinations based on patient's clinical data. However, existing methodologies encounter challenges in adapting to diverse Electronic Health Records (EHR) systems and effectively utilizing unstructured data, resulting in limited generalization capabilities and suboptimal performance. Recently, interest is growing in harnessing Large Language Models (LLMs) in the medical domain to support healthcare professionals and enhance patient care. Despite the emergence of medical LLMs and their promising results in tasks like medical question answering, their practical applicability in clinical settings, particularly in medication recommendation, often remains underexplored. In this study, we evaluate both general-purpose and medical-specific LLMs for medication recommendation tasks. Our findings reveal that LLMs frequently encounter the challenge of overprescribing, leading to heightened clinical risks and diminished medication recommendation accuracy. To address this issue, we propose Language-Assisted Medication Recommendation (LAMO), which employs a parameter-efficient fine-tuning approach to tailor open-source LLMs for optimal performance in medication recommendation scenarios. LAMO leverages the wealth of clinical information within clinical notes, a resource often underutilized in traditional methodologies. As a result of our approach, LAMO outperforms previous state-of-the-art methods by over 10% in internal validation accuracy. Furthermore, temporal and external validations demonstrate LAMO's robust generalization capabilities across various temporal and hospital contexts. Additionally, an out-of-distribution medication recommendation experiment demonstrates LAMO's remarkable accuracy even with medications outside the training data.

Via

Access Paper or Ask Questions

Med-LEGO: Editing and Adapting toward Generalist Medical Image Diagnosis

Mar 03, 2025

Yitao Zhu, Yuan Yin, Jiaming Li, Mengjie Xu, Zihao Zhao, Honglin Xiong, Sheng Wang, Qian Wang

Figure 1 for Med-LEGO: Editing and Adapting toward Generalist Medical Image Diagnosis

Figure 2 for Med-LEGO: Editing and Adapting toward Generalist Medical Image Diagnosis

Figure 3 for Med-LEGO: Editing and Adapting toward Generalist Medical Image Diagnosis

Abstract:The adoption of visual foundation models has become a common practice in computer-aided diagnosis (CAD). While these foundation models provide a viable solution for creating generalist medical AI, privacy concerns make it difficult to pre-train or continuously update such models across multiple domains and datasets, leading many studies to focus on specialist models. To address this challenge, we propose Med-LEGO, a training-free framework that enables the seamless integration or updating of a generalist CAD model by combining multiple specialist models, similar to assembling LEGO bricks. Med-LEGO enhances LoRA (low-rank adaptation) by incorporating singular value decomposition (SVD) to efficiently capture the domain expertise of each specialist model with minimal additional parameters. By combining these adapted weights through simple operations, Med-LEGO allows for the easy integration or modification of specific diagnostic capabilities without the need for original data or retraining. Finally, the combined model can be further adapted to new diagnostic tasks, making it a versatile generalist model. Our extensive experiments demonstrate that Med-LEGO outperforms existing methods in both cross-domain and in-domain medical tasks while using only 0.18% of full model parameters. These merged models show better convergence and generalization to new tasks, providing an effective path toward generalist medical AI.

Via

Access Paper or Ask Questions

Privacy-Preserving Hybrid Ensemble Model for Network Anomaly Detection: Balancing Security and Data Protection

Feb 13, 2025

Shaobo Liu, Zihao Zhao, Weijie He, Jiren Wang, Jing Peng, Haoyuan Ma

Abstract:Privacy-preserving network anomaly detection has become an essential area of research due to growing concerns over the protection of sensitive data. Traditional anomaly de- tection models often prioritize accuracy while neglecting the critical aspect of privacy. In this work, we propose a hybrid ensemble model that incorporates privacy-preserving techniques to address both detection accuracy and data protection. Our model combines the strengths of several machine learning algo- rithms, including K-Nearest Neighbors (KNN), Support Vector Machines (SVM), XGBoost, and Artificial Neural Networks (ANN), to create a robust system capable of identifying network anomalies while ensuring privacy. The proposed approach in- tegrates advanced preprocessing techniques that enhance data quality and address the challenges of small sample sizes and imbalanced datasets. By embedding privacy measures into the model design, our solution offers a significant advancement over existing methods, ensuring both enhanced detection performance and strong privacy safeguards.

* Accepted by 2024 5th International Conference on Big Data, Artificial Intelligence and Internet of Things Engineering(ICBAIE 2024)

Via

Access Paper or Ask Questions

Artistic Neural Style Transfer Algorithms with Activation Smoothing

Nov 12, 2024

Xiangtian Li, Han Cao, Zhaoyang Zhang, Jiacheng Hu, Yuhui Jin, Zihao Zhao

Figure 1 for Artistic Neural Style Transfer Algorithms with Activation Smoothing

Figure 2 for Artistic Neural Style Transfer Algorithms with Activation Smoothing

Figure 3 for Artistic Neural Style Transfer Algorithms with Activation Smoothing

Figure 4 for Artistic Neural Style Transfer Algorithms with Activation Smoothing

Abstract:The works of Gatys et al. demonstrated the capability of Convolutional Neural Networks (CNNs) in creating artistic style images. This process of transferring content images in different styles is called Neural Style Transfer (NST). In this paper, we re-implement image-based NST, fast NST, and arbitrary NST. We also explore to utilize ResNet with activation smoothing in NST. Extensive experimental results demonstrate that smoothing transformation can greatly improve the quality of stylization results.

* 8 pages,7 figures

Via

Access Paper or Ask Questions

Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning

Nov 04, 2024

Zihao Zhao, Yijiang Li, Yuchen Yang, Wenqing Zhang, Nuno Vasconcelos, Yinzhi Cao

Abstract:Machine unlearning--enabling a trained model to forget specific data--is crucial for addressing biased data and adhering to privacy regulations like the General Data Protection Regulation (GDPR)'s "right to be forgotten". Recent works have paid little attention to privacy concerns, leaving the data intended for forgetting vulnerable to membership inference attacks. Moreover, they often come with high computational overhead. In this work, we propose Pseudo-Probability Unlearning (PPU), a novel method that enables models to forget data efficiently and in a privacy-preserving manner. Our method replaces the final-layer output probabilities of the neural network with pseudo-probabilities for the data to be forgotten. These pseudo-probabilities follow either a uniform distribution or align with the model's overall distribution, enhancing privacy and reducing risk of membership inference attacks. Our optimization strategy further refines the predictive probability distributions and updates the model's weights accordingly, ensuring effective forgetting with minimal impact on the model's overall performance. Through comprehensive experiments on multiple benchmarks, our method achieves over 20% improvements in forgetting error compared to the state-of-the-art. Additionally, our method enhances privacy by preventing the forgotten set from being inferred to around random guesses.

Via

Access Paper or Ask Questions

A Recommendation Model Utilizing Separation Embedding and Self-Attention for Feature Mining

Oct 19, 2024

Wenyi Liu, Rui Wang, Yuanshuai Luo, Jianjun Wei, Zihao Zhao, Junming Huang

Figure 1 for A Recommendation Model Utilizing Separation Embedding and Self-Attention for Feature Mining

Figure 2 for A Recommendation Model Utilizing Separation Embedding and Self-Attention for Feature Mining

Figure 3 for A Recommendation Model Utilizing Separation Embedding and Self-Attention for Feature Mining

Figure 4 for A Recommendation Model Utilizing Separation Embedding and Self-Attention for Feature Mining

Abstract:With the explosive growth of Internet data, users are facing the problem of information overload, which makes it a challenge to efficiently obtain the required resources. Recommendation systems have emerged in this context. By filtering massive amounts of information, they provide users with content that meets their needs, playing a key role in scenarios such as advertising recommendation and product recommendation. However, traditional click-through rate prediction and TOP-K recommendation mechanisms are gradually unable to meet the recommendations needs in modern life scenarios due to high computational complexity, large memory consumption, long feature selection time, and insufficient feature interaction. This paper proposes a recommendations system model based on a separation embedding cross-network. The model uses an embedding neural network layer to transform sparse feature vectors into dense embedding vectors, and can independently perform feature cross operations on different dimensions, thereby improving the accuracy and depth of feature mining. Experimental results show that the model shows stronger adaptability and higher prediction accuracy in processing complex data sets, effectively solving the problems existing in existing models.

Via

Access Paper or Ask Questions