Zahra Ahmadi

LLM-Based Scientific Peer Review: Methods, Benchmarks, and Reliability Challenges

Jun 23, 2026

Thi Huyen Nguyen, Zahra Ahmadi

Abstract: The rapid growth of scientific submissions has pushed traditional peer review toward its scalability limits, motivating the exploration of large language models (LLMs) as intelligent automated evaluation assistants. Although recent studies show that LLMs can generate fluent critiques and approximate reviewer scores, their reliability, robustness, and security as decision-support systems remain insufficiently understood. This survey offers a systems-level analysis of LLM-based scientific peer review, focusing on two core evaluative functions: critique generation and score prediction. We present a structured taxonomy of modeling approaches (including prompt-based, supervised, retrieval-augmented, and alignment-optimized approaches), and synthesize empirical findings across existing benchmarks. We analyze dataset constraints, evaluation shortcomings, and domain concentration biases that limit current assessment practices. Beyond performance metrics, we identify emerging robustness risks, including prompt injection, data poisoning, retrieval vulnerabilities, and reward hacking, which expose automated review pipelines to strategic manipulation. From a data mining perspective, we outline key open challenges in modeling subjective disagreement and cross-domain generalization. By reframing automated peer review as a high-stakes, multi-objective decision problem, this survey provides a roadmap for developing robust, transparent, and trustworthy AI-assisted scientific evaluation systems.

Cross-Modal-Domain Generalization Through Semantically Aligned Discrete Representations

May 13, 2026

Souptik Sen, Raneen Younis, Zahra Ahmadi

Abstract: Multimodal learning seeks to integrate information across diverse sensory sources, yet current approaches struggle to balance cross-modal generalizability with modality-specific structure. Continuous (implicit) methods preserve fine-grained priors but render generalization challenging, while discrete (explicit) approaches enforce shared prototypes at the expense of modality specificity. We introduce CoDAAR (Cross-modal Discrete Alignment And Reconstruction), a novel framework that resolves this long-standing trade-off by establishing semantic consensus across modality-specific codebooks through index-level alignment. This design uniquely allows CoDAAR to preserve modality-unique structures while achieving generalizable cross-modal representations within a unified discrete space. CoDAAR combines two complementary mechanisms: Discrete Temporal Alignment (DTA), which enables fine-grained temporal quantization, and Cascading Semantic Alignment (CSA), which promotes progressive cross-modal semantic agreement. Together, they establish a competition-free unified representation space. Trained with self-supervised reconstruction objectives on paired multimodal sequences, CoDAAR demonstrates robust cross-modal and cross-domain generalization. Across Cross-Modal Generalization benchmarks, including event classification, localization, video segmentation, and cross-dataset transfer, CoDAAR achieves state-of-the-art performance, establishing a new paradigm for discrete and generalizable multimodal representation learning.

* Added missing affiliation for co-authors R. Younis and Z. Ahmadi

Sheaf Diffusion with Adaptive Local Structure for Spatio-Temporal Forecasting

Apr 13, 2026

Abeer Mostafa, Raneen Younis, Zahra Ahmadi

Abstract: Spatio-temporal systems often exhibit highly heterogeneous and non-intuitive responses to localized disruptions, limiting the effectiveness of conventional message passing approaches in modeling higher-order interactions under local heterogeneity. This paper reformulates spatio-temporal forecasting as the problem of learning information flow over locally structured spaces, rather than propagating globally aligned node representations. We introduce a spatio-temporal sheaf diffusion graph neural network (ST-Sheaf GNN) that embeds graph topology into sheaf-theoretic vector spaces connected by learned linear restriction maps. Unlike prior work that relies on static or globally shared transformations, our model learns dynamic restriction maps that evolve over time and adapt to local spatio-temporal patterns to enable substantially more expressive interactions. By explicitly modeling latent local structure, the proposed framework efficiently mitigates the oversmoothing phenomenon in deep GNN architectures. We evaluate our framework on a diverse set of real-world spatio-temporal forecasting benchmarks spanning multiple domains. Experimental results demonstrate state-of-the-art performance, highlighting the effectiveness of sheaf-theoretic topological representations as a powerful foundation for spatio-temporal graph learning. The code is available at: https://anonymous.4open.science/r/ST-SheafGNN-6523/.

From Literature to Hypotheses: An AI Co-Scientist System for Biomarker-Guided Drug Combination Hypothesis Generation

Feb 28, 2026

Raneen Younis, Suvinava Basak, Lukas Chavez, Zahra Ahmadi

Abstract: The rapid growth of biomedical literature and curated databases has made it increasingly difficult for researchers to systematically connect biomarker mechanisms to actionable drug combination hypotheses. We present AI Co-Scientist (CoDHy), an interactive, human-in-the-loop system for biomarker-guided drug combination hypothesis generation in cancer research. CoDHy integrates structured biomedical databases and unstructured literature evidence into a task-specific knowledge graph, which serves as the basis for graph-based reasoning and hypothesis construction. The system combines knowledge graph embeddings with agent-based reasoning to generate, validate, and rank candidate drug combinations, while explicitly grounding each hypothesis in retrievable evidence. Through a web-based interface, users can configure the scientific context, inspect intermediate results, and iteratively refine hypotheses, enabling transparent and researcher-steerable exploration rather than automated decision-making. We demonstrate CoDHy as a system for exploratory hypothesis generation and decision support in translational oncology, highlighting its design, interaction workflow, and practical use cases.

Orthogonalized Multimodal Contrastive Learning with Asymmetric Masking for Structured Representations

Feb 16, 2026

Carolin Cissee, Raneen Younis, Zahra Ahmadi

Abstract: Multimodal learning seeks to integrate information from heterogeneous sources, where signals may be shared across modalities, specific to individual modalities, or emerge only through their interaction. While self-supervised multimodal contrastive learning has achieved remarkable progress, most existing methods predominantly capture redundant cross-modal signals, often neglecting modality-specific (unique) and interaction-driven (synergistic) information. Recent extensions broaden this perspective, yet they either fail to explicitly model synergistic interactions or learn different information components in an entangled manner, leading to incomplete representations and potential information leakage. We introduce COrAL, a principled framework that explicitly and simultaneously preserves redundant, unique, and synergistic information within multimodal representations. COrAL employs a dual-path architecture with orthogonality constraints to disentangle shared and modality-specific features, ensuring a clean separation of information components. To promote synergy modeling, we introduce asymmetric masking with complementary view-specific patterns, compelling the model to infer cross-modal dependencies rather than rely solely on redundant cues. Extensive experiments on synthetic benchmarks and diverse MultiBench datasets demonstrate that COrAL consistently matches or outperforms state-of-the-art methods while exhibiting low performance variance across runs. These results indicate that explicitly modeling the full spectrum of multimodal information yields more stable, reliable, and comprehensive embeddings.

Cross-Modal Binary Attention: An Energy-Efficient Fusion Framework for Audio-Visual Learning

Jan 31, 2026

Mohamed Saleh, Zahra Ahmadi

Abstract: Effective multimodal fusion requires mechanisms that can capture complex cross-modal dependencies while remaining computationally scalable for real-world deployment. Existing audio-visual fusion approaches face a fundamental trade-off: attention-based methods effectively model cross-modal relationships but incur quadratic computational complexity that prevents hierarchical, multi-scale architectures, while efficient fusion strategies rely on simplistic concatenation that fails to extract complementary cross-modal information. We introduce CMQKA, a novel cross-modal fusion mechanism that achieves linear O(N) complexity through efficient binary operations, enabling scalable hierarchical fusion previously infeasible with conventional attention. CMQKA employs bidirectional cross-modal Query-Key attention to extract complementary spatiotemporal features and uses learnable residual fusion to preserve modality-specific characteristics while enriching representations with cross-modal information. Building upon CMQKA, we present SNNergy, an energy-efficient multimodal fusion framework with a hierarchical architecture that processes inputs through progressively decreasing spatial resolutions and increasing semantic abstraction. This multi-scale fusion capability allows the framework to capture both local patterns and global context across modalities. Implemented with event-driven binary spike operations, SNNergy achieves remarkable energy efficiency while maintaining fusion effectiveness and establishing new state-of-the-art results on challenging audio-visual benchmarks, including CREMA-D, AVE, and UrbanSound8K-AV, significantly outperforming existing multimodal fusion baselines. Our framework advances multimodal fusion by introducing a scalable fusion mechanism that enables hierarchical cross-modal integration with practical energy efficiency for real-world audio-visual intelligence systems.

Learning from the Right Patches: A Two-Stage Wavelet-Driven Masked Autoencoder for Histopathology Representation Learning

Nov 19, 2025

Raneen Younis, Louay Hamdi, Lukas Chavez, Zahra Ahmadi

Abstract: Whole-slide images are central to digital pathology, yet their extreme size and scarce annotations make self-supervised learning essential. Masked Autoencoders (MAEs) with Vision Transformer backbones have recently shown strong potential for histopathology representation learning. However, conventional random patch sampling during MAE pretraining often includes irrelevant or noisy regions, limiting the model's ability to capture meaningful tissue patterns. In this paper, we present WISE-MAE, a lightweight and domain-adapted framework that brings structure and biological relevance into MAE-based learning through a wavelet-informed patch selection strategy. WISE-MAE applies a two-step coarse-to-fine process: wavelet-based screening at low magnification to locate structurally rich regions, followed by high-resolution extraction for detailed modeling. This approach mirrors the diagnostic workflow of pathologists and improves the quality of learned representations. Evaluations across multiple cancer datasets, including lung, renal, and colorectal tissues, show that WISE-MAE achieves competitive representation quality and downstream classification performance while maintaining efficiency under weak supervision.

OpenReviewer: A Specialized Large Language Model for Generating Critical Scientific Paper Reviews

Dec 16, 2024

Maximilian Idahl, Zahra Ahmadi

Abstract: We present OpenReviewer, an open-source system for generating high-quality peer reviews of machine learning and AI conference papers. At its core is Llama-OpenReviewer-8B, an 8B parameter language model specifically fine-tuned on 79,000 expert reviews from top ML conferences. Given a PDF paper submission and review template as input, OpenReviewer extracts the full text, including technical content like equations and tables, and generates a structured review following conference-specific guidelines. Our evaluation on 400 test papers shows that OpenReviewer produces significantly more critical and realistic reviews compared to general-purpose LLMs like GPT-4 and Claude-3.5. While other LLMs tend toward overly positive assessments, OpenReviewer's recommendations closely match the distribution of human reviewer ratings. The system provides authors with rapid, constructive feedback to improve their manuscripts before submission, though it is not intended to replace human peer review. OpenReviewer is available as an online demo and open-source tool.

* Demo: https://huggingface.co/spaces/maxidl/openreviewer Model: https://huggingface.co/maxidl/Llama-OpenReviewer-8B

DAMMI: Daily Activities in a Psychologically Annotated Multi-Modal IoT dataset

Oct 05, 2024

Mohsen Falah Rad, Kamrad Khoshhal Roudposhti, Mohammad Hassan Khoobkar, Mohsen Shirali, Zahra Ahmadi, Carlos Fernandez-Llatas

Abstract: The growth in the elderly population and the shift in the age pyramid have increased the demand for healthcare and well-being services. To address this concern, alongside the rising cost of medical care, the concept of ageing at home has emerged, driven by recent advances in medical and technological solutions. Experts in computer science, communication technology, and healthcare have collaborated to develop affordable health solutions by employing sensors in living environments, wearable devices, and smartphones, in association with advanced data mining and intelligent systems with learning capabilities, to monitor, analyze, and predict the health status of elderly individuals. However, implementing intelligent healthcare systems and developing analytical techniques requires testing and evaluating algorithms on real-world data. Despite the need, there is a shortage of publicly available datasets that meet these requirements. To address this gap, we present the DAMMI dataset in this work, designed to support researchers in the field. The dataset includes daily activity data of an elderly individual collected via home-installed sensors, smartphone data, and a wristband over 146 days. It also contains daily psychological reports provided by a team of psychologists. Furthermore, the data collection spans significant events such as the COVID-19 pandemic, New Year's holidays, and the religious month of Ramadan, offering additional opportunities for analysis. In this paper, we outline detailed information about the data collection system, the types of data recorded, and pre-processed event logs. This dataset is intended to assist professionals in IoT and data mining in evaluating and implementing their research ideas.

* 14 pages

LLM-based event abstraction and integration for IoT-sourced logs

Sep 05, 2024

Mohsen Shirali, Mohammadreza Fani Sani, Zahra Ahmadi, Estefania Serral

Abstract: The continuous flow of data collected by Internet of Things (IoT) devices has revolutionised our ability to understand and interact with the world across various applications. However, this data must be prepared and transformed into event data before analysis can begin. In this paper, we shed light on the potential of leveraging Large Language Models (LLMs) in event abstraction and integration. Our approach aims to create event records from raw sensor readings and merge the logs from multiple IoT sources into a single event log suitable for further Process Mining applications. We demonstrate the capabilities of LLMs in event abstraction through a case study of an IoT application in elderly care and longitudinal health monitoring. The results show an average accuracy of 90% in detecting high-level activities. These results highlight LLMs' promising potential in addressing event abstraction and integration challenges, effectively bridging the existing gap.

* 12 pages
