Mushfiqur Rahman

Less Is More? Selective Visual Attention to High-Importance Regions for Multimodal Radiology Summarization

Mar 31, 2026

Mst. Fahmida Sultana Naznin, Adnan Ibney Faruq, Mushfiqur Rahman, Niloy Kumar Mondal, Md. Mehedi Hasan Shawon, Md Rakibul Hasan

Abstract: Automated radiology report summarization aims to distill verbose findings into concise clinical impressions, but existing multimodal models often struggle with visual noise and fail to meaningfully improve over strong text-only baselines in the FINDINGS $\to$ IMPRESSION transformation. We challenge two prevailing assumptions: (1) that more visual input is always better, and (2) that multimodal models add limited value when findings already contain rich image-derived detail. Through controlled ablations on the MIMIC-CXR benchmark, we show that selectively focusing on pathology-relevant visual patches rather than full images yields substantially better performance. We introduce ViTAS (Visual-Text Attention Summarizer), a multi-stage pipeline that combines ensemble-guided MedSAM2 lung segmentation, bidirectional cross-attention for multi-view fusion, Shapley-guided adaptive patch clustering, and hierarchical visual tokenization feeding a ViT. ViTAS achieves state-of-the-art results with 29.25% BLEU-4 and 69.83% ROUGE-L, improved factual alignment in qualitative analysis, and the highest expert-rated human evaluation scores. Our findings demonstrate that less but more relevant visual input is not only sufficient but superior for multimodal radiology summarization.

3D Spectrum Awareness for Radio Dynamic Zones Using Kriging and Matrix Completion

Mar 11, 2026

Mushfiqur Rahman, Sung Joon Maeng, Ismail Guvenc, Chau-Wai Wong

Abstract: Radio Dynamic Zones (RDZs) are geographically defined areas specifically allocated for testing new wireless technologies. Regular spectrum users outside a zone must be protected from interference caused by equipment deployed inside it. Previous works have utilized sparse reference signal received power (RSRP) measurements collected by unmanned aerial vehicles (UAVs) to construct a dense 3D radio map through ordinary Kriging. In this work, we show that matrix completion can outperform ordinary Kriging. We partition a 2D area of interest into small square grid cells, each corresponding to a single entry of a matrix. The matrix completion algorithm learns the global structure of the radio environment map by leveraging the low-rank property of propagation maps. Additionally, we show that simple Kriging and trans-Gaussian Kriging yield better results when the density of known measurements is lower. Earlier work on RSRP prediction involved a training dataset at a single altitude. In this work, we also show that performance can be improved by utilizing a combined dataset from multiple altitudes.

* 2024 IEEE International Symposium on Dynamic Spectrum Access Networks (DySPAN), 2024, pp. 439-446
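
The low-rank idea in the abstract above can be illustrated with a minimal sketch. It assumes a synthetic rank-2 grid and a simple iterative SVD-truncation scheme (often called hard-impute); the function name `complete_low_rank`, the grid size, the rank, and the 60% sampling ratio are all illustrative assumptions, and this is not claimed to be the algorithm used in the paper.

```python
import numpy as np

def complete_low_rank(observed, mask, rank=2, n_iters=200):
    """Fill unmeasured grid cells by alternating a rank-`rank` SVD truncation
    with re-imposing the known measurements (hard-impute style sketch)."""
    # Initialise unmeasured cells with the mean of the measured ones.
    filled = np.where(mask, observed, observed[mask].mean())
    for _ in range(n_iters):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
        # Keep measured cells fixed; update only the unmeasured ones.
        filled = np.where(mask, observed, low_rank)
    return low_rank

rng = np.random.default_rng(0)
# Synthetic rank-2 "radio map" on a 30 x 30 grid (illustrative, not real RSRP).
truth = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 30))
mask = rng.random(truth.shape) < 0.6          # True where a cell was measured
observed = np.where(mask, truth, 0.0)

estimate = complete_low_rank(observed, mask, rank=2)

# Error on the unmeasured cells, compared with a constant-mean fill.
rmse_missing = np.sqrt(np.mean((estimate - truth)[~mask] ** 2))
baseline_rmse = np.sqrt(np.mean((observed[mask].mean() - truth)[~mask] ** 2))
print(f"completion RMSE: {rmse_missing:.3f}, mean-fill RMSE: {baseline_rmse:.3f}")
```

Each matrix entry plays the role of one grid cell; the mask marks cells where a measurement exists. Real propagation maps are only approximately low-rank, so the rank would have to be tuned rather than fixed as it is here.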

UAV-Based 3D Spectrum Sensing: Insights on Altitude, Bandwidth, Trajectory, and Effective Antenna Patterns on REM Reconstruction

Mar 11, 2026

Mushfiqur Rahman, Sung Joon Maeng, Ismail Guvenc, Chau-Wai Wong, Mihail Sichitiu, Jason A. Abrahamson, Arupjyoti Bhuyan

Abstract:Spectrum sensing and the generation of 3D Radio Environment Maps (REMs) are essential for enabling spectrum sharing within cognitive radio networks. While Uncrewed Aerial Vehicles (UAVs) offer high-mobility 3D sensing, REM accuracy is challenged by dynamic flight behaviors, where fluctuations in UAV speed and direction introduce measurement inconsistencies. Furthermore, the structural influence of the airframe itself impacts the onboard antenna's radiation characteristics. In this paper, we present a comprehensive analysis of REM reconstruction at various altitudes, using real-world data from a fixed base station tower and a ground-vehicle source. We evaluate diverse reconstruction methodologies, including Kriging (simple, ordinary, and trans-Gaussian), matrix completion, and Gaussian process regression (GPR) for recovery from sparse samples. Our results indicate that simple Kriging and GPR remain more robust under extreme sample sparsity. We also propose a framework to enhance reconstruction accuracy in deep-shadowed regions by decomposing the REM into distinct smooth and deep-shadowed spatial components. We further investigate how REM reconstruction performance is influenced by physical and UAV-related external parameters. First, we demonstrate that the impact of UAV altitude on accuracy follows a tri-phasic trend: an initial performance gain up to $h_1$, a performance dip between $h_1$ and $h_2$, and a final stage of increasing accuracy. Additionally, we show that performance improves with increased spectrum bandwidth. Second, our analysis of UAV trajectories reveals that the variance of shadow fading exhibits a non-monotonic trend, peaking at both very low and mid-high elevation angles. Finally, we demonstrate that antenna pattern calibration from in-field measurements significantly enhances REM reconstruction accuracy by accounting for shadowing induced by the UAV airframe.

* Submitted to IEEE Sensors Journal

Bengali-Loop: Community Benchmarks for Long-Form Bangla ASR and Speaker Diarization

Feb 15, 2026

H. M. Shadman Tabib, Istiak Ahmmed Rifti, Abdullah Muhammed Amimul Ehsan, Somik Dasgupta, Md Zim Mim Siddiqee Sowdha, Abrar Jahin Sarker, Md. Rafiul Islam Nijamy, Tanvir Hossain, Mst. Metaly Khatun, Munzer Mahmood (+17 more)

Abstract:Bengali (Bangla) remains under-resourced in long-form speech technology despite its wide use. We present Bengali-Loop, two community benchmarks to address this gap: (1) a long-form ASR corpus of 191 recordings (158.6 hours, 792k words) from 11 YouTube channels, collected via a reproducible subtitle-extraction pipeline and human-in-the-loop transcript verification; and (2) a speaker diarization corpus of 24 recordings (22 hours, 5,744 annotated segments) with fully manual speaker-turn labels in CSV format. Both benchmarks target realistic multi-speaker, long-duration content (e.g., Bangla drama/natok). We establish baselines (Tugstugi: 34.07% WER; pyannote.audio: 40.08% DER) and provide standardized evaluation protocols (WER/CER, DER), annotation rules, and data formats to support reproducible benchmarking and future model development for Bangla long-form ASR and diarization.
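
For readers unfamiliar with the WER metric quoted above, here is a minimal sketch of the standard definition: word-level edit distance divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions for illustration; the benchmark's own protocol may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,              # deletion of a reference word
                curr[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),   # substitution, or match at no cost
            )
        prev = curr
    return prev[-1] / len(ref)

# One substitution and one deletion against a four-word reference.
example_wer = word_error_rate("a b c d", "a x c")
print(example_wer)
```

CER follows the same recurrence with characters in place of words.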

SpectraSentinel: LightWeight Dual-Stream Real-Time Drone Detection, Tracking and Payload Identification

Jul 30, 2025

Shahriar Kabir, Istiak Ahmmed Rifti, H. M. Shadman Tabib, Mushfiqur Rahman, Sadatul Islam Sadi, Hasnaen Adil, Ahmed Mahir Sultan Rumi, Ch Md Rakin Haider

Abstract:The proliferation of drones in civilian airspace has raised urgent security concerns, necessitating robust real-time surveillance systems. In response to the 2025 VIP Cup challenge tasks - drone detection, tracking, and payload identification - we propose a dual-stream drone monitoring framework. Our approach deploys independent You Only Look Once v11-nano (YOLOv11n) object detectors on parallel infrared (thermal) and visible (RGB) data streams, deliberately avoiding early fusion. This separation allows each model to be specifically optimized for the distinct characteristics of its input modality, addressing the unique challenges posed by small aerial objects in diverse environmental conditions. We customize data preprocessing and augmentation strategies per domain - such as limiting color jitter for IR imagery - and fine-tune training hyperparameters to enhance detection performance under conditions of heavy noise, low light, and motion blur. The resulting lightweight YOLOv11n models demonstrate high accuracy in distinguishing drones from birds and in classifying payload types, all while maintaining real-time performance. This report details the rationale for a dual-modality design, the specialized training pipelines, and the architectural optimizations that collectively enable efficient and accurate drone surveillance across RGB and IR channels.

UAV-Assisted Coverage Hole Detection Using Reinforcement Learning in Urban Cellular Networks

Mar 09, 2025

Mushfiqur Rahman, Ismail Guvenc, David Ramirez, Chau-Wai Wong

Abstract: Deployment of cellular networks in urban areas requires addressing various challenges. For example, high-rise buildings with varying geometrical shapes and heights contribute to signal attenuation, reflection, diffraction, and scattering effects. This creates a high possibility of coverage holes (CHs) in the proximity of the buildings. Detecting these CHs is critical for network operators to ensure quality of service, as customers in such areas experience weak or no signal reception. To address this challenge, we propose an approach that uses an autonomous vehicle, such as an unmanned aerial vehicle (UAV), to detect CHs, minimizing drive-test effort and reducing human labor. The UAV leverages reinforcement learning (RL) to find CHs using stored local building maps, its current location, and measured signal strengths. As the UAV moves, it dynamically updates its knowledge of the signal environment and its direction to a nearby CH while avoiding collisions with buildings. We created a wide range of testing scenarios using building maps from OpenStreetMap and signal strength data generated by NVIDIA Sionna ray-tracing simulations. The results demonstrate that the RL-based approach performs better than non-machine-learning, geometry-based methods in detecting CHs in urban areas. Additionally, even with a limited number of UAV measurements, the method achieves performance close to theoretical upper bounds that assume complete knowledge of all signal strengths.

* Accepted at the ICC 2025 Workshop on 6G Connected Robotics for Collaborative Control, Sensing, and Communication

Individualized Deepfake Detection Exploiting Traces Due to Double Neural-Network Operations

Dec 13, 2023

Mushfiqur Rahman, Runze Liu, Chau-Wai Wong, Huaiyu Dai

Abstract: In today's digital landscape, journalists urgently require tools to verify the authenticity of facial images and videos depicting specific public figures before incorporating them into news stories. Existing deepfake detectors are not optimized for this detection task, in which an image is associated with a specific and identifiable individual. This study focuses on the deepfake detection of facial images of individual public figures. We propose to condition the detector on the identity of the individual, given the advantages revealed by our theory-driven simulations. While most detectors in the literature rely on perceptible or imperceptible artifacts present in deepfake facial images, we demonstrate that the detection performance can be improved by exploiting the idempotency property of neural networks. In our approach, the training process involves double neural-network operations where we pass an authentic image through a deepfake-simulating network twice. Experimental results show that the proposed method improves the area under the curve (AUC) from 0.92 to 0.94 and reduces its standard deviation by 17%. Evaluating detection performance for individual public figures requires a facial image dataset labeled with individuals' names, a criterion not met by current deepfake datasets. To address this, we curated a dataset comprising 32k images featuring 45 public figures, which we intend to release to the public after the paper is published.
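
The idempotency property mentioned above can be illustrated with a toy example. This sketch assumes a linear orthogonal projection as a stand-in for the deepfake-simulating network; real networks are at best approximately idempotent, and the names `simulate` and `pass_residual` are hypothetical, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a deepfake-simulating network: an orthogonal projection onto an
# 8-dimensional subspace of a 64-dimensional "image" space. A projection is
# exactly idempotent, i.e. simulate(simulate(x)) == simulate(x).
basis, _ = np.linalg.qr(rng.normal(size=(64, 8)))

def simulate(x):
    return basis @ (basis.T @ x)

def pass_residual(x):
    """How much one more pass through the operator changes the input."""
    return np.linalg.norm(simulate(x) - x)

authentic = rng.normal(size=64)   # has never been through the operator
processed = simulate(authentic)   # has been through the operator once

residual_authentic = pass_residual(authentic)
residual_processed = pass_residual(processed)
print(residual_authentic, residual_processed)
```

An input that has already been through the operator barely changes on a further pass, whereas an unprocessed input changes noticeably. That gap is the kind of trace a double-operation scheme could exploit, under the assumptions stated above.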

Bengali Document Layout Analysis with Detectron2

Aug 26, 2023

Md Ataullha, Mahedi Hassan Rabby, Mushfiqur Rahman, Tahsina Bintay Azam

Abstract: Document digitization is vital for preserving historical records, efficient document management, and advancing OCR (Optical Character Recognition) research. Document Layout Analysis (DLA) involves segmenting documents into meaningful units like text boxes, paragraphs, images, and tables. Challenges arise when dealing with diverse layouts, historical documents, and unique scripts like Bengali, and progress is hindered by the lack of comprehensive Bengali DLA datasets. We improved the accuracy of the DLA model for Bengali documents by utilizing advanced Mask R-CNN models available in the Detectron2 library. Our evaluation involved three variants (Mask R-CNN R-50, R-101, and X-101), each with and without pretrained weights from PubLayNet, on the BaDLAD dataset, which contains human-annotated Bengali documents in four categories: text boxes, paragraphs, images, and tables. Results show the effectiveness of these models in accurately segmenting Bengali documents. We discuss speed-accuracy tradeoffs and underscore the significance of pretrained weights. Our findings expand the applicability of Mask R-CNN in document layout analysis, efficient document management, and OCR research while suggesting future avenues for fine-tuning and data augmentation.

* DL Sprint 2.0 - BUET CSE Fest 2023, 4 pages, 2 figures, 2 tables

Contact-Free Simultaneous Sensing of Human Heart Rate and Canine Breathing Rate for Animal Assisted Interactions

Nov 11, 2022

Timothy Holder, Mushfiqur Rahman, Emily Summers, David Roberts, Chau-Wai Wong, Alper Bozkurt

Abstract: Animal Assisted Interventions (AAIs) involve pleasant interactions between humans and animals and can potentially benefit both types of participants. Research in this field may help to uncover universal insights about cross-species bonding, dynamic affect detection, and the influence of environmental factors on dyadic interactions. However, experiments evaluating these outcomes are limited to methodologies that are qualitative, subjective, and cumbersome due to the ergonomic challenges related to attaching sensors to the body. Current approaches in AAIs also face challenges when translating beyond controlled clinical environments or research contexts. They also often neglect measurements from the animal throughout the interaction. Here, we present our preliminary effort toward a contact-free approach to facilitate AAI assessment via the physiological sensing of humans and canines using consumer-grade cameras. This initial effort focuses on verifying the technological feasibility of remotely sensing the heart rate signal of the human subject and the breathing rate signal of the dog subject while they are interacting. Small amounts of motion, such as patting and involuntary body shaking or movement, can be tolerated by our custom-designed vision-based algorithms. The experimental results show that the physiological measurements obtained by our algorithms were consistent with those provided by the standard reference devices. With further validation and expansion to other physiological parameters, the presented approach offers great promise for many scenarios from the AAI research space to veterinary, surgical, and clinical applications.

* ACM International Conference on Animal-Computer Interaction, Newcastle upon Tyne, UK, 5-8 Dec 2022
