Marc Boubnovski Martell

MechPert: Mechanistic Consensus as an Inductive Bias for Unseen Perturbation Prediction

Feb 14, 2026

Marc Boubnovski Martell, Josefa Lia Stoisser, Lawrence Phillips, Aditya Misra, Robert Kitchen, Jesper Ferkinghoff-Borg, Jialin Yu, Philip Torr, Kaspar Märten

Abstract: Predicting transcriptional responses to unseen genetic perturbations is essential for understanding gene regulation and prioritizing large-scale perturbation experiments. Existing approaches either rely on static, potentially incomplete knowledge graphs, or prompt language models for functionally similar genes, retrieving associations shaped by symmetric co-occurrence in scientific text rather than directed regulatory logic. We introduce MechPert, a lightweight framework that encourages LLM agents to generate directed regulatory hypotheses rather than relying solely on functional similarity. Multiple agents independently propose candidate regulators with associated confidence scores; these are aggregated through a consensus mechanism that filters spurious associations, producing weighted neighborhoods for downstream prediction. We evaluate MechPert on Perturb-seq benchmarks across four human cell lines. For perturbation prediction in low-data regimes (N = 50 observed perturbations), MechPert improves Pearson correlation by up to 10.5% over similarity-based baselines. For experimental design, MechPert-selected anchor genes outperform standard network centrality heuristics by up to 46% in well-characterized cell lines.
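The abstract does not spell out how the consensus step works, so the following is only a minimal sketch of one way to aggregate per-agent proposals into a weighted neighborhood. The function name `consensus_neighborhood`, the `min_votes` threshold, the mean-confidence weighting, and the gene names and scores in the toy input are all assumptions made for illustration, not details taken from the paper.

```python
from collections import defaultdict


def consensus_neighborhood(proposals, min_votes=2):
    """Aggregate per-agent regulator proposals into normalized weights.

    proposals: list of dicts (one per agent) mapping gene -> confidence in [0, 1].
    min_votes: hypothetical threshold; genes proposed by fewer agents are dropped.
    """
    votes = defaultdict(int)
    total_conf = defaultdict(float)
    for agent in proposals:
        for gene, conf in agent.items():
            votes[gene] += 1
            total_conf[gene] += conf

    # Average confidence over ALL agents, so a missing proposal counts as 0.
    kept = {
        gene: total_conf[gene] / len(proposals)
        for gene in votes
        if votes[gene] >= min_votes
    }
    norm = sum(kept.values())
    return {gene: w / norm for gene, w in kept.items()} if norm > 0 else {}


# Toy input: three agents, made-up genes and confidences.
proposals = [
    {"GATA1": 0.9, "TAL1": 0.6},
    {"GATA1": 0.8, "KLF1": 0.5},
    {"GATA1": 0.7, "TAL1": 0.4},
]
weights = consensus_neighborhood(proposals)
```

In this toy run the gene proposed by only one agent is filtered out, and the remaining weights sum to one, which is the kind of weighted neighborhood a downstream predictor could consume.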

Sparks of Tabular Reasoning via Text2SQL Reinforcement Learning

Apr 23, 2025

Josefa Lia Stoisser, Marc Boubnovski Martell, Julien Fauqueur

Abstract: This work reframes the Text-to-SQL task as a pathway for teaching large language models (LLMs) to reason over and manipulate tabular data, moving beyond the traditional focus on query generation. We propose a two-stage framework that leverages SQL supervision to develop transferable table reasoning capabilities. First, we synthesize detailed chain-of-thought (CoT) traces from real-world SQL queries, providing step-by-step, clause-level supervision that teaches the model how to traverse, filter, and aggregate table fields. Second, we introduce a Group Relative Policy Optimization (GRPO) reinforcement learning objective that connects SQL execution accuracy to generalizable reasoning by encouraging steps that extend beyond task-specific syntax and transfer across datasets. Empirically, our approach improves performance on standard Text-to-SQL benchmarks and achieves substantial gains on reasoning-intensive datasets such as BIRD and CRT-QA, demonstrating enhanced generalization and interpretability. Specifically, the distilled-quantized LLaMA model achieved a 20% increase in accuracy when trained on Text-to-SQL tasks, while Qwen achieved a 5% increase. These results suggest that SQL can serve not only as a target formalism but also as an effective scaffold for learning robust, transferable reasoning over structured data.
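As a rough illustration of the group-relative idea behind GRPO, the sketch below standardizes the rewards of several sampled completions for one prompt against their own group mean and standard deviation. The binary execution-accuracy reward used in the toy input is an assumption; the abstract does not describe the paper's actual reward design, and the helper name is hypothetical.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled completions.

    rewards: rewards for completions sampled from the same prompt.
    eps: small constant that avoids division by zero when all rewards are equal.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Assumed toy reward: 1.0 if the generated SQL executes to the correct
# result, 0.0 otherwise, for four sampled completions of one question.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that beat the group average receive a positive advantage and the others a negative one, so no separate value network is needed to form a baseline.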

Deep Representation Learning of Tissue Metabolome and Computed Tomography Images Annotates Non-invasive Classification and Prognosis Prediction of NSCLC

May 26, 2023

Marc Boubnovski Martell, Kristofer Linton-Reid, Sumeet Hindocha, Mitchell Chen, OCTAPUS-AI, Paula Moreno, Marina Álvarez-Benito, Ángel Salvatierra, Richard Lee, Joram M. Posma (+2 more)

Abstract: The rich chemical information from tissue metabolomics provides a powerful means to elaborate tissue physiology or tumor characteristics at cellular and tumor microenvironment levels. However, obtaining such information requires invasive biopsies, is costly, and can delay clinical patient management. Conversely, computed tomography (CT) is a clinical standard of care but does not intuitively harbor histological or prognostic information. Furthermore, the ability to embed metabolome information into CT and subsequently use the learned representation for classification or prognosis has yet to be described. This study develops a deep learning-based framework, tissue-metabolomic-radiomic-CT (TMR-CT), by combining 48 paired CT images and tumor/normal tissue metabolite intensities to generate ten image embeddings that infer a metabolite-derived representation from CT alone. In clinical NSCLC settings, we ascertain whether TMR-CT achieves state-of-the-art results on histology classification and prognosis tasks in an unseen international CT dataset of 742 patients. TMR-CT non-invasively determines histological class (adenocarcinoma or squamous cell carcinoma) with an F1-score of 0.78 and further predicts patients' prognosis with a c-index of 0.72, surpassing the performance of radiomics models and clinical features. Additionally, our work shows the potential to generate informative biology-inspired CT-led features to explore connections between hard-to-obtain tissue metabolic profiles and routine lesion-derived image data.

Development of a Multi-Task Learning V-Net for Pulmonary Lobar Segmentation on Computed Tomography and Application to Diseased Lungs

May 11, 2021

Marc Boubnovski Martell, Mitchell Chen, Kristofer Linton-Reid, Joram M. Posma, Susan J Copley, Eric O. Aboagye

Abstract: Automated lobar segmentation allows regional evaluation of lung disease and is important for diagnosis and therapy planning. Advanced statistical workflows permitting such evaluation are needed within respiratory medicine, yet their adoption remains slow and workflow accuracy poor. Diseased lung regions often produce high-density zones on CT images, limiting an algorithm's ability to delineate damaged lobes because fissures are oblique or absent. This motivated the development of an improved machine learning method to segment lung lobes that utilises tracheobronchial tree information, giving the algorithm the spatial familiarity needed to define lobar extent more accurately. The method segments lobes and auxiliary tissues in parallel by employing multi-task learning (MTL) in conjunction with V-Net-attention, a popular convolutional neural network in the imaging realm. In keeping with the model's capacity for better generalisation, high performance was retained in an external dataset of patients with four distinct diseases: severe lung cancer, COVID-19 pneumonitis, collapsed lungs and Chronic Obstructive Pulmonary Disease (COPD), even though the training data included none of these cases. Our external validation is specifically relevant because it includes patients with diagnosed lung disease and associated radiological abnormalities. To ensure equal rank is given to all segmentations in the main task, we report the following performance (Dice score) on a per-segment basis: normal lungs 0.97, COPD 0.94, lung cancer 0.94, COVID-19 pneumonitis 0.94 and collapsed lung 0.92, all at p<0.05. Even when segmenting lobes with large deformations on CT images, the model maintained high accuracy. The approach can be readily adopted in the clinical setting as a robust tool for radiologists.
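For readers unfamiliar with the Dice score quoted above, here is a minimal sketch of a per-label Dice computation, 2|P ∩ T| / (|P| + |T|), over flattened label maps. The function name, the flat-list input format, and the toy labels are assumptions for illustration and do not reflect the authors' evaluation code.

```python
def dice_per_label(pred, target, labels):
    """Compute the Dice score separately for each label.

    pred, target: equal-length sequences of integer labels (flattened voxels).
    labels: the label values to score, e.g. one per lung lobe.
    """
    scores = {}
    for lab in labels:
        p = [x == lab for x in pred]
        t = [x == lab for x in target]
        intersection = sum(a and b for a, b in zip(p, t))
        denom = sum(p) + sum(t)
        # Convention chosen here: a label absent from both maps scores 1.0.
        scores[lab] = 1.0 if denom == 0 else 2.0 * intersection / denom
    return scores


# Toy example: six voxels, background 0 and two hypothetical lobe labels.
pred = [1, 1, 2, 2, 2, 0]
target = [1, 1, 1, 2, 2, 0]
scores = dice_per_label(pred, target, labels=[1, 2])
```

Scoring each label on its own, rather than pooling all voxels, keeps small lobes from being swamped by large ones, which matches the abstract's stated aim of giving every segment equal rank.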

* 13 pages, 4 figures
