Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Gang Li

Senior Member, IEEE

A Semiparametric Bayesian Method for Instrumental Variable Analysis with Partly Interval-Censored Time-to-Event Outcome

Jan 23, 2025

Elvis Han Cui, Xuyang Lu, Jin Zhou, Hua Zhou, Gang Li

Figure 1 for A Semiparametric Bayesian Method for Instrumental Variable Analysis with Partly Interval-Censored Time-to-Event Outcome

Figure 2 for A Semiparametric Bayesian Method for Instrumental Variable Analysis with Partly Interval-Censored Time-to-Event Outcome

Figure 3 for A Semiparametric Bayesian Method for Instrumental Variable Analysis with Partly Interval-Censored Time-to-Event Outcome

Figure 4 for A Semiparametric Bayesian Method for Instrumental Variable Analysis with Partly Interval-Censored Time-to-Event Outcome

Abstract:This paper develops a semiparametric Bayesian instrumental variable analysis method for estimating the causal effect of an endogenous variable when dealing with unobserved confounders and measurement errors with partly interval-censored time-to-event data, where event times are observed exactly for some subjects but left-censored, right-censored, or interval-censored for others. Our method is based on a two-stage Dirichlet process mixture instrumental variable (DPMIV) model which simultaneously models the first-stage random error term for the exposure variable and the second-stage random error term for the time-to-event outcome using a bivariate Gaussian mixture of the Dirichlet process (DPM) model. The DPM model can be broadly understood as a mixture model with an unspecified number of Gaussian components, which relaxes the normal error assumptions and allows the number of mixture components to be determined by the data. We develop an MCMC algorithm for the DPMIV model tailored for partly interval-censored data and conduct extensive simulations to assess the performance of our DPMIV method in comparison with some competing methods. Our simulations revealed that our proposed method is robust under different error distributions and can have superior performance over its parametric counterpart under various scenarios. We further demonstrate the effectiveness of our approach on an UK Biobank data to investigate the causal effect of systolic blood pressure on time-to-development of cardiovascular disease from the onset of diabetes mellitus.

Via

Access Paper or Ask Questions

Focus-N-Fix: Region-Aware Fine-Tuning for Text-to-Image Generation

Jan 11, 2025

Xiaoying Xing, Avinab Saha, Junfeng He, Susan Hao, Paul Vicol, Moonkyung Ryu, Gang Li, Sahil Singla, Sarah Young, Yinxiao Li(+2 more)

Figure 1 for Focus-N-Fix: Region-Aware Fine-Tuning for Text-to-Image Generation

Figure 2 for Focus-N-Fix: Region-Aware Fine-Tuning for Text-to-Image Generation

Figure 3 for Focus-N-Fix: Region-Aware Fine-Tuning for Text-to-Image Generation

Figure 4 for Focus-N-Fix: Region-Aware Fine-Tuning for Text-to-Image Generation

Abstract:Text-to-image (T2I) generation has made significant advances in recent years, but challenges still remain in the generation of perceptual artifacts, misalignment with complex prompts, and safety. The prevailing approach to address these issues involves collecting human feedback on generated images, training reward models to estimate human feedback, and then fine-tuning T2I models based on the reward models to align them with human preferences. However, while existing reward fine-tuning methods can produce images with higher rewards, they may change model behavior in unexpected ways. For example, fine-tuning for one quality aspect (e.g., safety) may degrade other aspects (e.g., prompt alignment), or may lead to reward hacking (e.g., finding a way to increase rewards without having the intended effect). In this paper, we propose Focus-N-Fix, a region-aware fine-tuning method that trains models to correct only previously problematic image regions. The resulting fine-tuned model generates images with the same high-level structure as the original model but shows significant improvements in regions where the original model was deficient in safety (over-sexualization and violence), plausibility, or other criteria. Our experiments demonstrate that Focus-N-Fix improves these localized quality aspects with little or no degradation to others and typically imperceptible changes in the rest of the image. Disclaimer: This paper contains images that may be overly sexual, violent, offensive, or harmful.

Via

Access Paper or Ask Questions

Large Language Models for Bioinformatics

Jan 10, 2025

Wei Ruan, Yanjun Lyu, Jing Zhang, Jiazhang Cai, Peng Shu, Yang Ge, Yao Lu, Shang Gao, Yue Wang, Peilong Wang(+45 more)

Figure 1 for Large Language Models for Bioinformatics

Abstract:With the rapid advancements in large language model (LLM) technology and the emergence of bioinformatics-specific language models (BioLMs), there is a growing need for a comprehensive analysis of the current landscape, computational characteristics, and diverse applications. This survey aims to address this need by providing a thorough review of BioLMs, focusing on their evolution, classification, and distinguishing features, alongside a detailed examination of training methodologies, datasets, and evaluation frameworks. We explore the wide-ranging applications of BioLMs in critical areas such as disease diagnosis, drug discovery, and vaccine development, highlighting their impact and transformative potential in bioinformatics. We identify key challenges and limitations inherent in BioLMs, including data privacy and security concerns, interpretability issues, biases in training data and model outputs, and domain adaptation complexities. Finally, we highlight emerging trends and future directions, offering valuable insights to guide researchers and clinicians toward advancing BioLMs for increasingly sophisticated biological and clinical applications.

* 64 pages, 1 figure

Via

Access Paper or Ask Questions

Adaptable and Precise: Enterprise-Scenario LLM Function-Calling Capability Training Pipeline

Dec 20, 2024

Guancheng Zeng, Wentao Ding, Beining Xu, Chi Zhang, Wenqiang Han, Gang Li, Jingjing Mo, Pengxu Qiu, Xinran Tao, Wang Tao(+1 more)

Figure 1 for Adaptable and Precise: Enterprise-Scenario LLM Function-Calling Capability Training Pipeline

Figure 2 for Adaptable and Precise: Enterprise-Scenario LLM Function-Calling Capability Training Pipeline

Figure 3 for Adaptable and Precise: Enterprise-Scenario LLM Function-Calling Capability Training Pipeline

Figure 4 for Adaptable and Precise: Enterprise-Scenario LLM Function-Calling Capability Training Pipeline

Abstract:Enterprises possess a vast array of API assets scattered across various functions, forming the backbone of existing business processes. By leveraging these APIs as functional tools, enterprises can design diverse, scenario-specific agent applications, driven by on-premise function-calling models as the core engine. However, generic models often fail to meet enterprise requirements in terms of computational efficiency, output accuracy, and stability, necessitating scenario-specific adaptation. In this paper, we propose a training pipeline for function-calling capabilities tailored to real-world business scenarios. This pipeline includes the synthesis and augmentation of scenario-specific function-calling data, model fine-tuning, and performance evaluation and analysis. Using this pipeline, we generated 1,260 fully AI-generated samples and 1,035 augmented manually-labeled samples in digital HR agent scenario. The Qwen2.5-Coder-7B-Instruct model was employed as the base model and fine-tuned using the LoRA method on four GPUs with 24GB VRAM. Our fine-tuned model demonstrated outstanding performance in evaluations and practical applications, surpassing GPT-4 and GPT-4o in accuracy on the test set. These results validate the reliability of the proposed pipeline for training scenario-specific function-calling models.

* 23 pages, 6 figures, 7 tables

Via

Access Paper or Ask Questions

Can video generation replace cinematographers? Research on the cinematic language of generated video

Dec 16, 2024

Xiaozhe Li, Kai WU, Siyi Yang, YiZhan Qu, Guohua. Zhang, Zhiyu Chen, Jiayao Li, Jiangchuan Mu, Xiaobin Hu, Wen Fang(+5 more)

Figure 1 for Can video generation replace cinematographers? Research on the cinematic language of generated video

Figure 2 for Can video generation replace cinematographers? Research on the cinematic language of generated video

Figure 3 for Can video generation replace cinematographers? Research on the cinematic language of generated video

Figure 4 for Can video generation replace cinematographers? Research on the cinematic language of generated video

Abstract:Recent advancements in text-to-video (T2V) generation have leveraged diffusion models to enhance the visual coherence of videos generated from textual descriptions. However, most research has primarily focused on object motion, with limited attention given to cinematic language in videos, which is crucial for cinematographers to convey emotion and narrative pacing. To address this limitation, we propose a threefold approach to enhance the ability of T2V models to generate controllable cinematic language. Specifically, we introduce a cinematic language dataset that encompasses shot framing, angle, and camera movement, enabling models to learn diverse cinematic styles. Building on this, to facilitate robust cinematic alignment evaluation, we present CameraCLIP, a model fine-tuned on the proposed dataset that excels in understanding complex cinematic language in generated videos and can further provide valuable guidance in the multi-shot composition process. Finally, we propose CLIPLoRA, a cost-guided dynamic LoRA composition method that facilitates smooth transitions and realistic blending of cinematic language by dynamically fusing multiple pre-trained cinematic LoRAs within a single video. Our experiments demonstrate that CameraCLIP outperforms existing models in assessing the alignment between cinematic language and video, achieving an R@1 score of 0.81. Additionally, CLIPLoRA improves the ability for multi-shot composition, potentially bridging the gap between automatically generated videos and those shot by professional cinematographers.

* 13 pages

Via

Access Paper or Ask Questions

3D Distance-color-coded Assessment of PCI Stent Apposition via Deep-learning-based Three-dimensional Multi-object Segmentation

Oct 26, 2024

Xiaoyang Qin, Hao Huang, Shuaichen Lin, Xinhao Zeng, Kaizhi Cao, Renxiong Wu, Yuming Huang, Junqing Yang, Yong Liu, Gang Li(+1 more)

Abstract:Coronary artery disease poses a significant global health challenge, often necessitating percutaneous coronary intervention (PCI) with stent implantation. Assessing stent apposition holds pivotal importance in averting and identifying PCI complications that lead to in-stent restenosis. Here we proposed a novel three-dimensional (3D) distance-color-coded assessment (DccA)for PCI stent apposition via deep-learning-based 3D multi-object segmentation in intravascular optical coherence tomography (IV-OCT). Our proposed 3D DccA accurately segments 3D vessel lumens and stents in IV-OCT images, using a spatial matching network and dual-layer training with style transfer. It quantifies and maps stent-lumen distances into a 3D color space, facilitating 3D visual assessment of PCI stent apposition. Achieving over 95% segmentation precision, our proposed DccA enhances clinical evaluation of PCI stent deployment and supports personalized treatment planning.

Via

Access Paper or Ask Questions

Model Developmental Safety: A Safety-Centric Method and Applications in Vision-Language Models

Oct 13, 2024

Gang Li, Wendi Yu, Yao Yao, Wei Tong, Yingbin Liang, Qihang Lin, Tianbao Yang

Figure 1 for Model Developmental Safety: A Safety-Centric Method and Applications in Vision-Language Models

Figure 2 for Model Developmental Safety: A Safety-Centric Method and Applications in Vision-Language Models

Figure 3 for Model Developmental Safety: A Safety-Centric Method and Applications in Vision-Language Models

Figure 4 for Model Developmental Safety: A Safety-Centric Method and Applications in Vision-Language Models

Abstract:In the real world, a learning-enabled system usually undergoes multiple cycles of model development to enhance the system's ability to handle difficult or emerging tasks. This continual model development process raises a significant issue that the model development for acquiring new or improving existing capabilities may inadvertently lose capabilities of the old model, also known as catastrophic forgetting. Existing continual learning studies focus on mitigating catastrophic forgetting by trading off performance on previous tasks and new tasks to ensure good average performance. However, they are inadequate for many applications especially in safety-critical domains, as failure to strictly preserve the performance of the old model not only introduces safety risks and uncertainties but also imposes substantial expenses in the re-improving and re-validation of existing properties. To address this issue, we introduce model developmental safety as a guarantee of a learning system such that in the model development process the new model should strictly preserve the existing protected capabilities of the old model while improving its performance on target tasks. To ensure the model developmental safety, we present a safety-centric framework by formulating the model developmental safety as data-dependent constraints. Under this framework, we study how to develop a pretrained vision-language model (aka the CLIP model) for acquiring new capabilities or improving existing capabilities of image classification. We propose an efficient constrained optimization algorithm with theoretical guarantee and use its insights to finetune a CLIP model with task-dependent heads for promoting the model developmental safety. Our experiments on improving vision perception capabilities on autonomous driving and scene recognition datasets demonstrate the efficacy of the proposed approach.

* 40 pages, 7 figures

Via

Access Paper or Ask Questions

GLRT-Based Metric Learning for Remote Sensing Object Retrieval

Oct 08, 2024

Linping Zhang, Yu Liu, Xueqian Wang, Gang Li, You He

Figure 1 for GLRT-Based Metric Learning for Remote Sensing Object Retrieval

Figure 2 for GLRT-Based Metric Learning for Remote Sensing Object Retrieval

Figure 3 for GLRT-Based Metric Learning for Remote Sensing Object Retrieval

Figure 4 for GLRT-Based Metric Learning for Remote Sensing Object Retrieval

Abstract:With the improvement in the quantity and quality of remote sensing images, content-based remote sensing object retrieval (CBRSOR) has become an increasingly important topic. However, existing CBRSOR methods neglect the utilization of global statistical information during both training and test stages, which leads to the overfitting of neural networks to simple sample pairs of samples during training and suboptimal metric performance. Inspired by the Neyman-Pearson theorem, we propose a generalized likelihood ratio test-based metric learning (GLRTML) approach, which can estimate the relative difficulty of sample pairs by incorporating global data distribution information during training and test phases. This guides the network to focus more on difficult samples during the training process, thereby encourages the network to learn more discriminative feature embeddings. In addition, GLRT is a more effective than traditional metric space due to the utilization of global data distribution information. Accurately estimating the distribution of embeddings is critical for GLRTML. However, in real-world applications, there is often a distribution shift between the training and target domains, which diminishes the effectiveness of directly using the distribution estimated on training data. To address this issue, we propose the clustering pseudo-labels-based fast parameter adaptation (CPLFPA) method. CPLFPA efficiently estimates the distribution of embeddings in the target domain by clustering target domain instances and re-estimating the distribution parameters for GLRTML. We reorganize datasets for CBRSOR tasks based on fine-grained ship remote sensing image slices (FGSRSI-23) and military aircraft recognition (MAR20) datasets. Extensive experiments on these datasets demonstrate the effectiveness of our proposed GLRTML and CPLFPA.

Via

Access Paper or Ask Questions

Deep Learning-based Automated Diagnosis of Obstructive Sleep Apnea and Sleep Stage Classification in Children Using Millimeter-wave Radar and Pulse Oximeter

Sep 28, 2024

Wei Wang, Ruobing Song, Yunxiao Wu, Li Zheng, Wenyu Zhang, Zhaoxi Chen, Gang Li, Zhifei Xu

Figure 1 for Deep Learning-based Automated Diagnosis of Obstructive Sleep Apnea and Sleep Stage Classification in Children Using Millimeter-wave Radar and Pulse Oximeter

Figure 2 for Deep Learning-based Automated Diagnosis of Obstructive Sleep Apnea and Sleep Stage Classification in Children Using Millimeter-wave Radar and Pulse Oximeter

Figure 3 for Deep Learning-based Automated Diagnosis of Obstructive Sleep Apnea and Sleep Stage Classification in Children Using Millimeter-wave Radar and Pulse Oximeter

Abstract:Study Objectives: To evaluate the agreement between the millimeter-wave radar-based device and polysomnography (PSG) in diagnosis of obstructive sleep apnea (OSA) and classification of sleep stage in children. Methods: 281 children, aged 1 to 18 years, who underwent sleep monitoring between September and November 2023 at the Sleep Center of Beijing Children's Hospital, Capital Medical University, were recruited in the study. All enrolled children underwent sleep monitoring by PSG and the millimeter-wave radar-based device, QSA600, simultaneously. QSA600 recordings were automatically analyzed using a deep learning model meanwhile the PSG data was manually scored. Results: The Obstructive Apnea-Hypopnea Index (OAHI) obtained from QSA600 and PSG demonstrates a high level of agreement with an intraclass correlation coefficient of 0.945 (95% CI: 0.93 to 0.96). Bland-Altman analysis indicates that the mean difference of OAHI between QSA600 and PSG is -0.10 events/h (95% CI: -11.15 to 10.96). The deep learning model evaluated through cross-validation showed good sensitivity (81.8%, 84.3% and 89.7%) and specificity (90.5%, 95.3% and 97.1%) values for diagnosing children with OAHI>1, OAHI>5 and OAHI>10. The area under the receiver operating characteristic curve is 0.923, 0.955 and 0.988, respectively. For sleep stage classification, the model achieved Kappa coefficients of 0.854, 0.781, and 0.734, with corresponding overall accuracies of 95.0%, 84.8%, and 79.7% for Wake-sleep classification, Wake-REM-Light-Deep classification, and Wake-REM-N1-N2 N3 classification, respectively. Conclusions: QSA600 has demonstrated high agreement with PSG in diagnosing OSA and performing sleep staging in children. The device is portable, low-load and suitable for follow up and long-term pediatric sleep assessment.

Via

Access Paper or Ask Questions

Detection of Sleep Apnea-Hypopnea Events Using Millimeter-wave Radar and Pulse Oximeter

Sep 28, 2024

Wei Wang, Chenyang Li, Zhaoxi Chen, Wenyu Zhang, Zetao Wang, Xi Guo, Jian Guan, Gang Li

Figure 1 for Detection of Sleep Apnea-Hypopnea Events Using Millimeter-wave Radar and Pulse Oximeter

Figure 2 for Detection of Sleep Apnea-Hypopnea Events Using Millimeter-wave Radar and Pulse Oximeter

Figure 3 for Detection of Sleep Apnea-Hypopnea Events Using Millimeter-wave Radar and Pulse Oximeter

Figure 4 for Detection of Sleep Apnea-Hypopnea Events Using Millimeter-wave Radar and Pulse Oximeter

Abstract:Obstructive Sleep Apnea-Hypopnea Syndrome (OSAHS) is a sleep-related breathing disorder associated with significant morbidity and mortality worldwide. The gold standard for OSAHS diagnosis, polysomnography (PSG), faces challenges in popularization due to its high cost and complexity. Recently, radar has shown potential in detecting sleep apnea-hypopnea events (SAE) with the advantages of low cost and non-contact monitoring. However, existing studies, especially those using deep learning, employ segment-based classification approach for SAE detection, making the task of event quantity estimation difficult. Additionally, radar-based SAE detection is susceptible to interference from body movements and the environment. Oxygen saturation (SpO2) can offer valuable information about OSAHS, but it also has certain limitations and cannot be used alone for diagnosis. In this study, we propose a method using millimeter-wave radar and pulse oximeter to detect SAE, called ROSA. It fuses information from both sensors, and directly predicts the temporal localization of SAE. Experimental results demonstrate a high degree of consistency (ICC=0.9864) between AHI from ROSA and PSG. This study presents an effective method with low-load device for the diagnosis of OSAHS.

Via

Access Paper or Ask Questions