Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zahra Dehghanian

3DLAND: 3D Lesion Abdominal Anomaly Localization Dataset

Feb 13, 2026

Mehran Advand, Zahra Dehghanian, Navid Faraji, Reza Barati, Seyed Amir Ahmad Safavi-Naini, Hamid R. Rabiee

Abstract:Existing medical imaging datasets for abdominal CT often lack three-dimensional annotations, multi-organ coverage, or precise lesion-to-organ associations, hindering robust representation learning and clinical applications. To address this gap, we introduce 3DLAND, a large-scale benchmark dataset comprising over 6,000 contrast-enhanced CT volumes with over 20,000 high-fidelity 3D lesion annotations linked to seven abdominal organs: liver, kidneys, pancreas, spleen, stomach, and gallbladder. Our streamlined three-phase pipeline integrates automated spatial reasoning, prompt-optimized 2D segmentation, and memory-guided 3D propagation, validated by expert radiologists with surface dice scores exceeding 0.75. By providing diverse lesion types and patient demographics, 3DLAND enables scalable evaluation of anomaly detection, localization, and cross-organ transfer learning for medical AI. Our dataset establishes a new benchmark for evaluating organ-aware 3D segmentation models, paving the way for advancements in healthcare-oriented AI. To facilitate reproducibility and further research, the 3DLAND dataset and implementation code are publicly available at https://mehrn79.github.io/3DLAND.

Via

Access Paper or Ask Questions

ReDiF: Reinforced Distillation for Few Step Diffusion

Dec 28, 2025

Amirhossein Tighkhorshid, Zahra Dehghanian, Gholamali Aminian, Chengchun Shi, Hamid R. Rabiee

Abstract:Distillation addresses the slow sampling problem in diffusion models by creating models with smaller size or fewer steps that approximate the behavior of high-step teachers. In this work, we propose a reinforcement learning based distillation framework for diffusion models. Instead of relying on fixed reconstruction or consistency losses, we treat the distillation process as a policy optimization problem, where the student is trained using a reward signal derived from alignment with the teacher's outputs. This RL driven approach dynamically guides the student to explore multiple denoising paths, allowing it to take longer, optimized steps toward high-probability regions of the data distribution, rather than relying on incremental refinements. Our framework utilizes the inherent ability of diffusion models to handle larger steps and effectively manage the generative process. Experimental results show that our method achieves superior performance with significantly fewer inference steps and computational resources compared to existing distillation techniques. Additionally, the framework is model agnostic, applicable to any type of diffusion models with suitable reward functions, providing a general optimization paradigm for efficient diffusion learning.

Via

Access Paper or Ask Questions

RealDrag: The First Dragging Benchmark with Real Target Image

Dec 13, 2025

Ahmad Zafarani, Zahra Dehghanian, Mohammadreza Davoodi, Mohsen Shadroo, MohammadAmin Fazli, Hamid R. Rabiee

Abstract:The evaluation of drag based image editing models is unreliable due to a lack of standardized benchmarks and metrics. This ambiguity stems from inconsistent evaluation protocols and, critically, the absence of datasets containing ground truth target images, making objective comparisons between competing methods difficult. To address this, we introduce \textbf{RealDrag}, the first comprehensive benchmark for point based image editing that includes paired ground truth target images. Our dataset contains over 400 human annotated samples from diverse video sources, providing source/target images, handle/target points, editable region masks, and descriptive captions for both the image and the editing action. We also propose four novel, task specific metrics: Semantical Distance (SeD), Outer Mask Preserving Score (OMPS), Inner Patch Preserving Score (IPPS), and Directional Similarity (DiS). These metrics are designed to quantify pixel level matching fidelity, check preservation of non edited (out of mask) regions, and measure semantic alignment with the desired task. Using this benchmark, we conduct the first large scale systematic analysis of the field, evaluating 17 SOTA models. Our results reveal clear trade offs among current approaches and establish a robust, reproducible baseline to guide future research. Our dataset and evaluation toolkit will be made publicly available.

Via

Access Paper or Ask Questions

CineLOG: A Training Free Approach for Cinematic Long Video Generation

Dec 13, 2025

Zahra Dehghanian, Morteza Abolghasemi, Hamid Beigy, Hamid R. Rabiee

Abstract:Controllable video synthesis is a central challenge in computer vision, yet current models struggle with fine grained control beyond textual prompts, particularly for cinematic attributes like camera trajectory and genre. Existing datasets often suffer from severe data imbalance, noisy labels, or a significant simulation to real gap. To address this, we introduce CineLOG, a new dataset of 5,000 high quality, balanced, and uncut video clips. Each entry is annotated with a detailed scene description, explicit camera instructions based on a standard cinematic taxonomy, and genre label, ensuring balanced coverage across 17 diverse camera movements and 15 film genres. We also present our novel pipeline designed to create this dataset, which decouples the complex text to video (T2V) generation task into four easier stages with more mature technology. To enable coherent, multi shot sequences, we introduce a novel Trajectory Guided Transition Module that generates smooth spatio-temporal interpolation. Extensive human evaluations show that our pipeline significantly outperforms SOTA end to end T2V models in adhering to specific camera and screenplay instructions, while maintaining professional visual quality. All codes and data are available at https://cine-log.pages.dev.

Via

Access Paper or Ask Questions

Beyond Unified Models: A Service-Oriented Approach to Low Latency, Context Aware Phonemization for Real Time TTS

Dec 08, 2025

Mahta Fetrat, Donya Navabi, Zahra Dehghanian, Morteza Abolghasemi, Hamid R. Rabiee

Abstract:Lightweight, real-time text-to-speech systems are crucial for accessibility. However, the most efficient TTS models often rely on lightweight phonemizers that struggle with context-dependent challenges. In contrast, more advanced phonemizers with a deeper linguistic understanding typically incur high computational costs, which prevents real-time performance. This paper examines the trade-off between phonemization quality and inference speed in G2P-aided TTS systems, introducing a practical framework to bridge this gap. We propose lightweight strategies for context-aware phonemization and a service-oriented TTS architecture that executes these modules as independent services. This design decouples heavy context-aware components from the core TTS engine, effectively breaking the latency barrier and enabling real-time use of high-quality phonemization models. Experimental results confirm that the proposed system improves pronunciation soundness and linguistic accuracy while maintaining real-time responsiveness, making it well-suited for offline and end-device TTS applications.

Via

Access Paper or Ask Questions

Fast, Not Fancy: Rethinking G2P with Rich Data and Rule-Based Models

May 19, 2025

Mahta Fetrat Qharabagh, Zahra Dehghanian, Hamid R. Rabiee

Abstract:Homograph disambiguation remains a significant challenge in grapheme-to-phoneme (G2P) conversion, especially for low-resource languages. This challenge is twofold: (1) creating balanced and comprehensive homograph datasets is labor-intensive and costly, and (2) specific disambiguation strategies introduce additional latency, making them unsuitable for real-time applications such as screen readers and other accessibility tools. In this paper, we address both issues. First, we propose a semi-automated pipeline for constructing homograph-focused datasets, introduce the HomoRich dataset generated through this pipeline, and demonstrate its effectiveness by applying it to enhance a state-of-the-art deep learning-based G2P system for Persian. Second, we advocate for a paradigm shift - utilizing rich offline datasets to inform the development of fast, rule-based methods suitable for latency-sensitive accessibility applications like screen readers. To this end, we improve one of the most well-known rule-based G2P systems, eSpeak, into a fast homograph-aware version, HomoFast eSpeak. Our results show an approximate 30% improvement in homograph disambiguation accuracy for the deep learning-based and eSpeak systems.

* 8 main body pages, total 25 pages, 15 figures

Via

Access Paper or Ask Questions

UPL: Uncertainty-aware Pseudo-labeling for Imbalance Transductive Node Classification

Feb 02, 2025

Mohammad T. Teimuri, Zahra Dehghanian, Gholamali Aminian, Hamid R. Rabiee

Figure 1 for UPL: Uncertainty-aware Pseudo-labeling for Imbalance Transductive Node Classification

Figure 2 for UPL: Uncertainty-aware Pseudo-labeling for Imbalance Transductive Node Classification

Figure 3 for UPL: Uncertainty-aware Pseudo-labeling for Imbalance Transductive Node Classification

Figure 4 for UPL: Uncertainty-aware Pseudo-labeling for Imbalance Transductive Node Classification

Abstract:Graph-structured datasets often suffer from class imbalance, which complicates node classification tasks. In this work, we address this issue by first providing an upper bound on population risk for imbalanced transductive node classification. We then propose a simple and novel algorithm, Uncertainty-aware Pseudo-labeling (UPL). Our approach leverages pseudo-labels assigned to unlabeled nodes to mitigate the adverse effects of imbalance on classification accuracy. Furthermore, the UPL algorithm enhances the accuracy of pseudo-labeling by reducing training noise of pseudo-labels through a novel uncertainty-aware approach. We comprehensively evaluate the UPL algorithm across various benchmark datasets, demonstrating its superior performance compared to existing state-of-the-art methods.

Via

Access Paper or Ask Questions

Cueless EEG imagined speech for subject identification: dataset and benchmarks

Jan 16, 2025

Ali Derakhshesh, Zahra Dehghanian, Reza Ebrahimpour, Hamid R. Rabiee

Figure 1 for Cueless EEG imagined speech for subject identification: dataset and benchmarks

Figure 2 for Cueless EEG imagined speech for subject identification: dataset and benchmarks

Figure 3 for Cueless EEG imagined speech for subject identification: dataset and benchmarks

Figure 4 for Cueless EEG imagined speech for subject identification: dataset and benchmarks

Abstract:Electroencephalogram (EEG) signals have emerged as a promising modality for biometric identification. While previous studies have explored the use of imagined speech with semantically meaningful words for subject identification, most have relied on additional visual or auditory cues. In this study, we introduce a cueless EEG-based imagined speech paradigm, where subjects imagine the pronunciation of semantically meaningful words without any external cues. This innovative approach addresses the limitations of prior methods by requiring subjects to select and imagine words from a predefined list naturally. The dataset comprises over 4,350 trials from 11 subjects across five sessions. We assess a variety of classification methods, including traditional machine learning techniques such as Support Vector Machines (SVM) and XGBoost, as well as time-series foundation models and deep learning architectures specifically designed for EEG classification, such as EEG Conformer and Shallow ConvNet. A session-based hold-out validation strategy was employed to ensure reliable evaluation and prevent data leakage. Our results demonstrate outstanding classification accuracy, reaching 97.93%. These findings highlight the potential of cueless EEG paradigms for secure and reliable subject identification in real-world applications, such as brain-computer interfaces (BCIs).

Via

Access Paper or Ask Questions

Beyond Imperfections: A Conditional Inpainting Approach for End-to-End Artifact Removal in VTON and Pose Transfer

Oct 05, 2024

Aref Tabatabaei, Zahra Dehghanian, Maryam Amirmazlaghani

Figure 1 for Beyond Imperfections: A Conditional Inpainting Approach for End-to-End Artifact Removal in VTON and Pose Transfer

Figure 2 for Beyond Imperfections: A Conditional Inpainting Approach for End-to-End Artifact Removal in VTON and Pose Transfer

Figure 3 for Beyond Imperfections: A Conditional Inpainting Approach for End-to-End Artifact Removal in VTON and Pose Transfer

Figure 4 for Beyond Imperfections: A Conditional Inpainting Approach for End-to-End Artifact Removal in VTON and Pose Transfer

Abstract:Artifacts often degrade the visual quality of virtual try-on (VTON) and pose transfer applications, impacting user experience. This study introduces a novel conditional inpainting technique designed to detect and remove such distortions, improving image aesthetics. Our work is the first to present an end-to-end framework addressing this specific issue, and we developed a specialized dataset of artifacts in VTON and pose transfer tasks, complete with masks highlighting the affected areas. Experimental results show that our method not only effectively removes artifacts but also significantly enhances the visual quality of the final images, setting a new benchmark in computer vision and image processing.

Via

Access Paper or Ask Questions

LLM-Powered Grapheme-to-Phoneme Conversion: Benchmark and Case Study

Sep 13, 2024

Mahta Fetrat Qharabagh, Zahra Dehghanian, Hamid R. Rabiee

Figure 1 for LLM-Powered Grapheme-to-Phoneme Conversion: Benchmark and Case Study

Figure 2 for LLM-Powered Grapheme-to-Phoneme Conversion: Benchmark and Case Study

Figure 3 for LLM-Powered Grapheme-to-Phoneme Conversion: Benchmark and Case Study

Figure 4 for LLM-Powered Grapheme-to-Phoneme Conversion: Benchmark and Case Study

Abstract:Grapheme-to-phoneme (G2P) conversion is critical in speech processing, particularly for applications like speech synthesis. G2P systems must possess linguistic understanding and contextual awareness of languages with polyphone words and context-dependent phonemes. Large language models (LLMs) have recently demonstrated significant potential in various language tasks, suggesting that their phonetic knowledge could be leveraged for G2P. In this paper, we evaluate the performance of LLMs in G2P conversion and introduce prompting and post-processing methods that enhance LLM outputs without additional training or labeled data. We also present a benchmarking dataset designed to assess G2P performance on sentence-level phonetic challenges of the Persian language. Our results show that by applying the proposed methods, LLMs can outperform traditional G2P tools, even in an underrepresented language like Persian, highlighting the potential of developing LLM-aided G2P systems.

* 5 pages, 5 figures

Via

Access Paper or Ask Questions