Ben Murrell

Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet

Generative vector search to improve pathology foundation models across multimodal vision-language tasks

Dec 22, 2025

Markus Ekvall, Ludvig Bergenstråhle, Patrick Truong, Ben Murrell, Joakim Lundeberg

Abstract: Retrieval-augmented generation improves large language models by grounding outputs in external knowledge sources, reducing hallucinations and addressing knowledge cutoffs. However, standard embedding-based retrieval fails to capture the complexity of multi-concept queries, particularly in domains like biomedicine, where biological data are inherently high-dimensional. For example, omics datasets and clinical reports simultaneously exhibit numerous molecular, cellular, and physiological features. We present Stochastic Latent Matching (STHLM), a generative vector search method that samples query-conditioned embeddings from text or image inputs to enhance retrieval performance. Analogous to how Chain-of-Thought reasoning enables language models to "think longer" on complex problems, STHLM allows retrieval systems to "search wider" through iterative sampling. STHLM demonstrates critical improvements over classical vector retrieval across diverse benchmarks, including scientific literature, clinical notes, and tissue images, boosting retrieval performance by 10-30% through test-time compute (trading latency for accuracy), while enabling up to a 10-fold compression of embedding dimensions.

* 13 pages main (54 total), 2 main figures (9 total)
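
The "search wider" idea in the abstract above can be illustrated with a minimal sketch. This is not the STHLM implementation: the abstract does not describe the sampler, so `sample_query_embeddings` below is a hypothetical stand-in that simply adds Gaussian noise to a single query vector, and the aggregation rule (keep each document's best score across samples) is likewise an assumption. All function names and parameters are invented for illustration.

```python
import math
import random


def normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def sample_query_embeddings(query_vec, n_samples, noise_scale, rng):
    """Hypothetical sampler (assumption): Gaussian perturbations of the query.
    A learned generative model would take this role in a real system."""
    return [
        [x + rng.gauss(0.0, noise_scale) for x in query_vec]
        for _ in range(n_samples)
    ]


def search_wider(query_vec, doc_vecs, n_samples=8, top_k=3, noise_scale=0.1, seed=0):
    """Retrieve top_k documents per sampled query, then rank the union
    of hits by each document's best score across samples."""
    rng = random.Random(seed)
    best = {}
    for sample in sample_query_embeddings(query_vec, n_samples, noise_scale, rng):
        scores = [cosine(sample, doc) for doc in doc_vecs]
        ranked = sorted(range(len(doc_vecs)), key=lambda i: -scores[i])[:top_k]
        for i in ranked:
            best[i] = max(best.get(i, -math.inf), scores[i])
    # List of (document index, best score), highest score first.
    return sorted(best.items(), key=lambda kv: -kv[1])
```

Raising `n_samples` spends more compute per query in exchange for a wider candidate set, which is the latency-for-accuracy trade the abstract refers to; how much this helps in practice depends entirely on the quality of the sampler.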

Branching Flows: Discrete, Continuous, and Manifold Flow Matching with Splits and Deletions

Nov 12, 2025

Hedwig Nora Nordlinder, Lukas Billera, Jack Collier Ryder, Anton Oresten, Aron Stålmarck, Theodor Mosetti Björk, Ben Murrell

Abstract: Diffusion and flow matching approaches to generative modeling have shown promise in domains where the state space is continuous, such as image generation or protein folding & design, and discrete, exemplified by diffusion large language models. They offer a natural fit when the number of elements in a state is fixed in advance (e.g. images), but require ad hoc solutions when, for example, the length of a response from a large language model, or the number of amino acids in a protein chain, is not known a priori. Here we propose Branching Flows, a generative modeling framework that, like diffusion and flow matching approaches, transports a simple distribution to the data distribution. But in Branching Flows, the elements in the state evolve over a forest of binary trees, branching and dying stochastically with rates that are learned by the model. This allows the model to control, during generation, the number of elements in the sequence. We also show that Branching Flows can compose with any flow matching base process on discrete sets, continuous Euclidean spaces, smooth manifolds, and "multimodal" product spaces that mix these components. We demonstrate this in three domains: small molecule generation (multimodal), antibody sequence generation (discrete), and protein backbone generation (multimodal), and show that Branching Flows is a capable distribution learner with a stable learning objective, and that it enables new capabilities.

* 30 pages, 10 figures
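
To make the branching-and-dying mechanism in the abstract above concrete, here is a toy birth-death simulation. It is only a sketch under strong assumptions: the split and death rates are fixed constants rather than learned by a model, the element states are placeholders that never change, and no flow matching base process is included. The function name and parameters are hypothetical.

```python
import random


def simulate_branching(n_initial, split_rate, death_rate, n_steps=100, seed=0):
    """Simulate element counts over time t in [0, 1] with constant rates.

    Assumes (split_rate + death_rate) / n_steps <= 1 so that the per-step
    event probabilities are valid.
    """
    rng = random.Random(seed)
    dt = 1.0 / n_steps
    elements = [0.0] * n_initial  # placeholder states; a real model would evolve these
    lengths = [len(elements)]
    for _ in range(n_steps):
        next_elements = []
        for state in elements:
            u = rng.random()
            if u < death_rate * dt:
                continue  # element is deleted
            if u < (death_rate + split_rate) * dt:
                next_elements.extend([state, state])  # binary split into two children
            else:
                next_elements.append(state)  # element survives unchanged
        elements = next_elements
        lengths.append(len(elements))
    return lengths  # sequence length after each step, starting with n_initial
```

Calling `simulate_branching(4, split_rate=1.0, death_rate=0.5)` returns the length trajectory of one random run. The final length is not fixed in advance, which is the property the abstract highlights; in the paper the rates are learned so that the final length matches the data.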
