Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Duc-Tai Le

Response-Aware Multimodal Learning for Post-Treatment Visual Acuity Forecasting

May 30, 2026

Phuoc-Nguyen Bui, Van-Vi Vo, Duc-Tai Le, Van-Nguyen Pham, Ki-Young Kim, Seung-Young Yu, Hyunseung Choo

Abstract:Long-term visual acuity (VA) outcomes after anti-VEGF therapy are central to patient counseling, expectation setting, and follow-up planning in diabetic macular edema (DME). However, in clinical practice, physicians must often estimate long-term visual trajectories based only on early post-treatment findings, making reliable prognostication difficult. Although prior OCT-based learning approaches have largely focused on short-term response or single-endpoint prediction, modeling VA trajectories across multiple future time points from early longitudinal observations remains insufficiently explored. In this study, we assembled a real-world cohort of 188 anti-VEGF-treated DME patients with paired baseline and month-1 OCT scans, along with tabular OCT-derived biomarkers and non-imaging clinical variables. Using only these early data, we formulate a multi-horizon VA forecasting problem aimed at predicting visual outcomes at 3, 6, 12, 18, and 24 months, reflecting clinically meaningful follow-up intervals. We propose ReVA, a response-aware multimodal framework that integrates structural features from baseline and month-1 OCT with the tabular variables to capture baseline disease status and early treatment response. ReVA uses spatial attention to preserve localized prognostic imaging features and a dependency-aware tabular encoder to model interactions among clinical variables. These multimodal representations are fused to predict patient-specific long-term visual acuity trajectories. The proposed framework achieves MAE=0.1246, RMSE=0.1621, and R^2=0.6064 for 24-month VA prediction, with consistent performance across all forecast horizons. Our findings show that incorporating early treatment-response signals enables clinically meaningful long-term visual acuity forecasting, supporting data-driven decision support for routine anti-VEGF management.

* Under review

Via

Access Paper or Ask Questions

Frequency Adapter with SAM for Generalized Medical Image Segmentation

May 11, 2026

Phuoc-Nguyen Bui, Van-Nguyen Pham, Duc-Tai Le, Junghyun Bum, Hyunseung Choo

Abstract:Medical image segmentation is a critical task in computer-aided diagnosis and treatment planning. However, deep learning models often struggle to generalize across datasets due to domain shifts arising from variations in imaging protocols, scanner types, and patient populations. Traditional domain generalization (DG) methods utilize causal feature learning, adversarial consistency, and style augmentation to improve segmentation robustness. While effective, these approaches rely on explicit feature alignment, adversarial objectives, or handcrafted augmentations, which may not fully exploit the capabilities of foundation models. Recently, the Segment Anything Model (SAM) has demonstrated strong generalization capabilities in segmentation tasks. SAM-based DG methods attempt to improve medical image segmentation. However, these approaches primarily operate in the spatial domain and overlook frequency-based discrepancies that significantly affect model robustness. In this work, we propose Frequency-based Domain Generalization with SAM (FSAM), a novel framework that integrates Low-Rank Adaptation (LoRA) for efficient fine-tuning and a frequency adapter to incorporate frequency-domain representations for single-source domain generalization. FSAM enhances SAM's segmentation robustness by extracting domain-invariant high-frequency features, mitigating frequency-related domain shifts. Experimental results on fundus and prostate datasets demonstrate that FSAM outperforms existing traditional DG and SAM-based DG approaches in domain generalization. Codes and pre-trained models will be made available on GitHub.

* Under review, 10 pages, 1 figure, 2 tables

Via

Access Paper or Ask Questions

Clinical Graph-Mediated Distillation for Unpaired MRI-to-CFI Hypertension Prediction

Mar 23, 2026

Dillan Imans, Phuoc-Nguyen Bui, Duc-Tai Le, Hyunseung Choo

Abstract:Retinal fundus imaging enables low-cost and scalable hypertension (HTN) screening, but HTN-related retinal cues are subtle, yielding high-variance predictions. Brain MRI provides stronger vascular and small-vessel-disease markers of HTN, yet it is expensive and rarely acquired alongside fundus images, resulting in modality-siloed datasets with disjoint MRI and fundus cohorts. We study this unpaired MRI-fundus regime and introduce Clinical Graph-Mediated Distillation (CGMD), a framework that transfers MRI-derived HTN knowledge to a fundus model without paired multimodal data. CGMD leverages shared structured biomarkers as a bridge by constructing a clinical similarity kNN graph spanning both cohorts. We train an MRI teacher, propagate its representations over the graph, and impute brain-informed representation targets for fundus patients. A fundus student is then trained with a joint objective combining HTN supervision, target distillation, and relational distillation. Experiments on our newly collected unpaired MRI-fundus-biomarker dataset show that CGMD consistently improves fundus-based HTN prediction over standard distillation and non-graph imputation baselines, with ablations confirming the importance of clinically grounded graph connectivity. Code is available at https://github.com/DillanImans/CGMD-unpaired-distillation.

* 10 pages, 2 figures, 2 tables. Under review at MICCAI 2026

Via

Access Paper or Ask Questions

Representation Learning with Semantic-aware Instance and Sparse Token Alignments

Jan 13, 2026

Phuoc-Nguyen Bui, Toan Duc Nguyen, Junghyun Bum, Duc-Tai Le, Hyunseung Choo

Abstract:Medical contrastive vision-language pre-training (VLP) has demonstrated significant potential in improving performance on downstream tasks. Traditional approaches typically employ contrastive learning, treating paired image-report samples as positives and unpaired ones as negatives. However, in medical datasets, there can be substantial similarities between images or reports from different patients. Rigidly treating all unpaired samples as negatives, can disrupt the underlying semantic structure and negatively impact the quality of the learned representations. In this paper, we propose a multi-level alignment framework, Representation Learning with Semantic-aware Instance and Sparse Token Alignments (SISTA) by exploiting the semantic correspondence between medical image and radiology reports at two levels, i.e., image-report and patch-word levels. Specifically, we improve the conventional contrastive learning by incorporating inter-report similarity to eliminate the false negatives and introduce a method to effectively align image patches with relevant word tokens. Experimental results demonstrate the effectiveness of the proposed framework in improving transfer performance across different datasets on three downstream tasks: image classification, image segmentation, and object detection. Notably, our framework achieves significant improvements in fine-grained tasks even with limited labeled data. Codes and pre-trained models will be made available.

* Under review, 8 pages

Via

Access Paper or Ask Questions

Unsupervised Domain Adaptation with SAM-RefiSeR for Enhanced Brain Tumor Segmentation

Jan 11, 2026

Dillan Imans, Phuoc-Nguyen Bui, Duc-Tai Le, Hyunseung Choo

Abstract:Unsupervised Domain Adaptation with SAM-RefiSeR for Enhanced Brain Tumor Segmentation

* Accepted in BIBM 2025

Via

Access Paper or Ask Questions

Multi-scale Feature Enhancement in Multi-task Learning for Medical Image Analysis

Nov 30, 2024

Phuoc-Nguyen Bui, Duc-Tai Le, Junghyun Bum, Hyunseung Choo

Figure 1 for Multi-scale Feature Enhancement in Multi-task Learning for Medical Image Analysis

Figure 2 for Multi-scale Feature Enhancement in Multi-task Learning for Medical Image Analysis

Figure 3 for Multi-scale Feature Enhancement in Multi-task Learning for Medical Image Analysis

Figure 4 for Multi-scale Feature Enhancement in Multi-task Learning for Medical Image Analysis

Abstract:Traditional deep learning methods in medical imaging often focus solely on segmentation or classification, limiting their ability to leverage shared information. Multi-task learning (MTL) addresses this by combining both tasks through shared representations but often struggles to balance local spatial features for segmentation and global semantic features for classification, leading to suboptimal performance. In this paper, we propose a simple yet effective UNet-based MTL model, where features extracted by the encoder are used to predict classification labels, while the decoder produces the segmentation mask. The model introduces an advanced encoder incorporating a novel ResFormer block that integrates local context from convolutional feature extraction with long-range dependencies modeled by the Transformer. This design captures broader contextual relationships and fine-grained details, improving classification and segmentation accuracy. To enhance classification performance, multi-scale features from different encoder levels are combined to leverage the hierarchical representation of the input image. For segmentation, the features passed to the decoder via skip connections are refined using a novel dilated feature enhancement (DFE) module, which captures information at different scales through three parallel convolution branches with varying dilation rates. This allows the decoder to detect lesions of varying sizes with greater accuracy. Experimental results across multiple medical datasets confirm the superior performance of our model in both segmentation and classification tasks, compared to state-of-the-art single-task and multi-task learning methods.

Via

Access Paper or Ask Questions

Regional Correlation Aided Mobile Traffic Prediction with Spatiotemporal Deep Learning

Dec 11, 2023

JeongJun Park, Lusungu J. Mwasinga, Huigyu Yang, Syed M. Raza, Duc-Tai Le, Moonseong Kim, Min Young Chung, Hyunseung Choo

Figure 1 for Regional Correlation Aided Mobile Traffic Prediction with Spatiotemporal Deep Learning

Figure 2 for Regional Correlation Aided Mobile Traffic Prediction with Spatiotemporal Deep Learning

Figure 3 for Regional Correlation Aided Mobile Traffic Prediction with Spatiotemporal Deep Learning

Figure 4 for Regional Correlation Aided Mobile Traffic Prediction with Spatiotemporal Deep Learning

Abstract:Mobile traffic data in urban regions shows differentiated patterns during different hours of the day. The exploitation of these patterns enables highly accurate mobile traffic prediction for proactive network management. However, recent Deep Learning (DL) driven studies have only exploited spatiotemporal features and have ignored the geographical correlations, causing high complexity and erroneous mobile traffic predictions. This paper addresses these limitations by proposing an enhanced mobile traffic prediction scheme that combines the clustering strategy of daily mobile traffic peak time and novel multi Temporal Convolutional Network with a Long Short Term Memory (multi TCN-LSTM) model. The mobile network cells that exhibit peak traffic during the same hour of the day are clustered together. Our experiments on large-scale real-world mobile traffic data show up to 28% performance improvement compared to state-of-the-art studies, which confirms the efficacy and viability of the proposed approach.

* 4 pages, 5 figures, 1 table. This paper is already accepted on IEEE Consumer Communications & Networking Conference(CCNC) 2024

Via

Access Paper or Ask Questions