Chaeyun Jang

Parallel Test-Time Scaling with Multi-Sequence Verifiers

Mar 03, 2026

Yegon Kim, Seungyoo Lee, Chaeyun Jang, Hyungi Lee, Juho Lee

Abstract: Parallel test-time scaling, which generates multiple candidate solutions for a single problem, is a powerful technique for improving large language model performance. However, it is hindered by two key bottlenecks: accurately selecting the correct solution from the candidate pool, and the high inference latency of generating many full solutions. We argue that both challenges are fundamentally linked to verifier calibration. A well-calibrated verifier not only improves answer selection but also enables early-stopping strategies that reduce latency. Existing verifiers are limited because they score each candidate in isolation, overlooking rich contextual information across the set of candidates. To address this, we introduce the Multi-Sequence Verifier (MSV), the first verifier designed to jointly process all candidate solutions and model their interactions. MSV achieves improved calibration, which directly enhances best-of-N selection performance. We further introduce a streaming MSV variant that enables a novel early-stopping framework. This framework fully leverages parallel decoding, in contrast to existing multi-sequence early-exit methods, which decode sequences one by one and thus incur significant latency. In this setting, MSV can achieve the same target accuracy with around half the latency required by its counterpart that scores each solution in isolation.
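As a rough illustration of the two ideas the abstract connects, the sketch below shows best-of-N selection from verifier scores and a threshold-based early stop over scores that arrive step by step. It is not MSV: the scores are plain numbers supplied by the caller, each candidate is treated independently (the baseline the abstract contrasts against), and the function names and the stopping rule (`threshold`) are assumptions made for illustration only.

```python
from typing import Iterable, Optional, Sequence, Tuple


def best_of_n(scores: Sequence[float]) -> int:
    """Return the index of the candidate with the highest verifier score."""
    if not scores:
        raise ValueError("need at least one candidate score")
    return max(range(len(scores)), key=lambda i: scores[i])


def early_stop(
    score_stream: Iterable[Sequence[float]], threshold: float
) -> Optional[Tuple[int, int]]:
    """Scan per-step scores for candidates that are decoded in parallel.

    Each element of score_stream holds one score per candidate at that
    decoding step. Returns (step, candidate_index) for the first step at
    which the top score reaches the threshold, or None if it never does.
    The threshold rule is a hypothetical stopping criterion.
    """
    for step, scores in enumerate(score_stream):
        top = best_of_n(scores)
        if scores[top] >= threshold:
            return step, top
    return None


if __name__ == "__main__":
    print(best_of_n([0.2, 0.9, 0.5]))
    print(early_stop([[0.1, 0.2], [0.3, 0.95]], threshold=0.9))
```

With a fixed threshold, a rule like this is only meaningful if a score of 0.9 actually corresponds to a high probability of correctness, which is why the abstract ties early stopping to calibration.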

Dimension Agnostic Neural Processes

Feb 28, 2025

Hyungi Lee, Chaeyun Jang, Dongbok Lee, Juho Lee

Abstract: Meta-learning aims to train models that can generalize to new tasks with limited labeled data by extracting shared features across diverse task datasets. Uncertainty-aware meta-learning additionally accounts for prediction uncertainty during both training and evaluation. The Neural Process (NP) is a well-known uncertainty-aware meta-learning method that constructs implicit stochastic processes using parametric neural networks, enabling rapid adaptation to new tasks. However, existing NP methods face challenges in accommodating diverse input dimensions and learned features, limiting their broad applicability across regression tasks. To address these limitations and advance the utility of NP models as general regressors, we introduce Dimension Agnostic Neural Processes (DANP). DANP incorporates a Dimension Aggregator Block (DAB) to transform input features into a fixed-dimensional space, enhancing the model's ability to handle diverse datasets. Furthermore, leveraging the Transformer architecture and latent encoding layers, DANP learns a wider range of features that are generalizable across various tasks. Through comprehensive experimentation on various synthetic and practical regression tasks, we empirically show that DANP outperforms previous NP variations, showcasing its effectiveness in overcoming the limitations of traditional NP models and its potential for broader applicability in diverse regression scenarios.

* 10 pages, 5 figures, Accepted to ICLR 2025 (International Conference on Learning Representations)

Model Fusion through Bayesian Optimization in Language Model Fine-Tuning

Nov 11, 2024

Chaeyun Jang, Hyungi Lee, Jungtaek Kim, Juho Lee

Abstract: Fine-tuning pre-trained models for downstream tasks is a widely adopted technique known for its adaptability and reliability across various domains. Despite its conceptual simplicity, fine-tuning entails several troublesome engineering choices, such as selecting hyperparameters and determining checkpoints from an optimization trajectory. To tackle the difficulty of choosing the best model, one effective solution is model fusion, which combines multiple models in parameter space. However, we observe a large discrepancy between loss and metric landscapes during the fine-tuning of pre-trained language models. Building on this observation, we introduce a novel model fusion technique that optimizes both the desired metric and the loss through multi-objective Bayesian optimization. In addition, to effectively select hyperparameters, we establish a two-stage procedure by integrating Bayesian optimization processes into our framework. Experiments across various downstream tasks show considerable performance improvements using our Bayesian optimization-guided method.
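To make "combining multiple models in parameter space" concrete, here is a minimal sketch that fuses checkpoints by a weighted average of their parameters. The averaging rule, the `fuse` name, and the dict-of-lists parameter layout are assumptions for illustration; the abstract states that the paper searches for the fusion through multi-objective Bayesian optimization over both loss and metric, whereas the weights here are simply supplied by hand.

```python
from typing import Dict, List, Sequence

# Hypothetical parameter layout: tensor name -> flat list of values.
Params = Dict[str, List[float]]


def fuse(checkpoints: Sequence[Params], weights: Sequence[float]) -> Params:
    """Weighted average of several checkpoints in parameter space.

    Weights are normalized to sum to 1, so each fused value is a convex
    combination when all weights are non-negative.
    """
    if not checkpoints or len(checkpoints) != len(weights):
        raise ValueError("need one weight per checkpoint")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    norm = [w / total for w in weights]

    fused: Params = {}
    for name, reference in checkpoints[0].items():
        fused[name] = [
            sum(w * ckpt[name][k] for w, ckpt in zip(norm, checkpoints))
            for k in range(len(reference))
        ]
    return fused


if __name__ == "__main__":
    a = {"w": [0.0, 2.0]}
    b = {"w": [2.0, 4.0]}
    print(fuse([a, b], [1.0, 1.0]))
```

In a setup like the one the abstract describes, the weights would be the quantities an outer optimizer proposes and evaluates, rather than fixed constants.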
