Muhammad Ishfaq Hussain

Regional Attention-Enhanced Swin Transformer for Clinically Relevant Medical Image Captioning

Nov 13, 2025

Zubia Naz, Farhan Asghar, Muhammad Ishfaq Hussain, Yahya Hadadi, Muhammad Aasim Rafique, Wookjin Choi, Moongu Jeon

Abstract: Automated medical image captioning translates complex radiological images into diagnostic narratives that can support reporting workflows. We present a Swin-BART encoder-decoder system with a lightweight regional attention module that amplifies diagnostically salient regions before cross-attention. Trained and evaluated on ROCO, our model achieves state-of-the-art semantic fidelity while remaining compact and interpretable. We report results as mean $\pm$ std over three seeds and include $95\%$ confidence intervals. Compared with baselines, our approach improves ROUGE (proposed 0.603, ResNet-CNN 0.356, BLIP2-OPT 0.255) and BERTScore (proposed 0.807, BLIP2-OPT 0.645, ResNet-CNN 0.623), with competitive BLEU, CIDEr, and METEOR. We further provide ablations (regional attention on/off and token-count sweep), per-modality analysis (CT/MRI/X-ray), paired significance tests, and qualitative heatmaps that visualize the regions driving each description. Decoding uses beam search (beam size $=4$), length penalty $=1.1$, $\texttt{no\_repeat\_ngram\_size}=3$, and max length $=128$. The proposed design yields accurate, clinically phrased captions and transparent regional attributions, supporting safe research use with a human in the loop.
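
The decoding settings listed in the abstract can be written down as a small configuration, and the n-gram constraint is simple enough to illustrate directly. The following is a minimal sketch, not the authors' code: the dictionary keys follow the keyword names used by Hugging Face `generate` (an assumption, since the abstract does not name the library), and `banned_next_tokens` is a hypothetical helper that shows what a no-repeat n-gram rule of size 3 forbids at one decoding step.

```python
# Decoding settings quoted in the abstract. The key names mirror Hugging Face
# `generate` keyword arguments; that mapping is an assumption, not a fact
# stated by the paper.
DECODING = {
    "num_beams": 4,
    "length_penalty": 1.1,
    "no_repeat_ngram_size": 3,
    "max_length": 128,
}


def banned_next_tokens(tokens, n):
    """Return the tokens that would complete an n-gram already in `tokens`.

    Hypothetical helper for illustration: with n = 3, if the last two tokens
    already occurred earlier followed by some token t, then t is banned now.
    """
    if n <= 0 or len(tokens) < n - 1:
        return set()
    # The last n-1 tokens form the prefix of the n-gram about to be completed.
    prefix = tuple(tokens[len(tokens) - (n - 1):]) if n > 1 else ()
    banned = set()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned


if __name__ == "__main__":
    # Made-up caption prefix, used only to exercise the helper.
    caption = ["the", "left", "lung", "shows", "the", "left"]
    print(banned_next_tokens(caption, DECODING["no_repeat_ngram_size"]))
```

In the made-up prefix, "the left" has already been followed by "lung", so "lung" is the only token excluded at the next step; a beam search would apply this check to every live hypothesis before ranking candidates.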

3D Multi-Object Tracking Employing MS-GLMB Filter for Autonomous Driving

Oct 19, 2024

Linh Van Ma, Muhammad Ishfaq Hussain, Kin-Choong Yow, Moongu Jeon

Abstract: The MS-GLMB filter offers a robust framework for tracking multiple objects through the use of multi-sensor data. Building on this, the MV-GLMB and MV-GLMB-AB filters enhance the MS-GLMB capabilities by employing cameras for 3D multi-sensor multi-object tracking, effectively addressing occlusions. However, both filters depend on overlapping fields of view from the cameras to combine complementary information. In this paper, we introduce an improved approach that integrates an additional sensor, such as LiDAR, into the MS-GLMB framework for 3D multi-object tracking. Specifically, we present a new LiDAR measurement model, along with a multi-camera and LiDAR multi-object measurement model. Our experimental results demonstrate a significant improvement in tracking performance compared to existing MS-GLMB-based methods. Importantly, our method eliminates the need for overlapping fields of view, broadening the applicability of the MS-GLMB filter. Our source code for the nuScenes dataset is available at https://github.com/linh-gist/ms-glmb-nuScenes.

* 2024 International Conference on Control, Automation and Information Sciences (ICCAIS), November 26th to 28th, 2024 in Ho Chi Minh City

Adaptive Confidence Threshold for ByteTrack in Multi-Object Tracking

Dec 06, 2023

Linh Van Ma, Muhammad Ishfaq Hussain, JongHyun Park, Jeongbae Kim, Moongu Jeon

Abstract: We investigate the application of ByteTrack to multiple object tracking. ByteTrack, a simple tracking algorithm, tracks multiple objects simultaneously by strategically incorporating low-confidence detections. Conventionally, objects are first associated with high-confidence detections. When the association between objects and detections becomes ambiguous, ByteTrack extends the association to lower-confidence detections. One notable drawback of the existing ByteTrack approach is its reliance on a fixed threshold to differentiate between high- and low-confidence detections. In response to this limitation, we introduce a novel adaptive approach. Our proposed method dynamically adjusts the confidence threshold, leveraging insights derived from the overall set of detections. Through experimentation, we demonstrate the effectiveness of our adaptive confidence threshold technique while keeping the running time comparable to that of ByteTrack.
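
The abstract does not say how the threshold is derived from the detections, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes one plausible rule (place the threshold in the widest gap between sorted confidence scores in a frame) and then splits the detections into the high- and low-confidence sets that a ByteTrack-style two-stage association would consume. The function names, the `floor`, `ceiling`, and `default` parameters, and the sample scores are all hypothetical.

```python
def adaptive_threshold(scores, floor=0.1, ceiling=0.9, default=0.5):
    """Pick a per-frame threshold at the widest gap between sorted scores.

    Assumed rule for illustration only; the paper's actual rule may differ.
    Scores below `floor` are treated as noise and ignored.
    """
    kept = sorted(s for s in scores if s >= floor)
    if len(kept) < 2:
        return default  # too few detections to estimate a gap
    # (gap width, index of the lower neighbour) for each adjacent pair
    gaps = [(kept[i + 1] - kept[i], i) for i in range(len(kept) - 1)]
    _, i = max(gaps)
    midpoint = (kept[i] + kept[i + 1]) / 2
    return min(max(midpoint, floor), ceiling)


def split_detections(scores, threshold, floor=0.1):
    """Split scores into the two sets used by a two-stage association."""
    high = [s for s in scores if s >= threshold]
    low = [s for s in scores if floor <= s < threshold]
    return high, low


if __name__ == "__main__":
    # Made-up confidence scores for one frame.
    frame_scores = [0.92, 0.88, 0.35, 0.30, 0.85]
    thr = adaptive_threshold(frame_scores)
    high, low = split_detections(frame_scores, thr)
    print(round(thr, 3), high, low)
```

With a fixed threshold, the same cut-off would be applied to every frame regardless of how the detector's scores are distributed; here the cut-off moves with each frame's scores, while the clamping keeps it within a sane range.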

* The 12th International Conference on Control, Automation and Information Sciences (ICCAIS 2023), November 27th to 29th, 2023 in Hanoi
