Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xingyou Song

Richard

Regression Language Models for Code

Sep 30, 2025

Yash Akhauri, Xingyou Song, Arissa Wongpanich, Bryan Lewandowski, Mohamed S. Abdelfattah

Abstract:We study code-to-metric regression: predicting numeric outcomes of code executions, a challenging task due to the open-ended nature of programming languages. While prior methods have resorted to heavy and domain-specific feature engineering, we show that a single unified Regression Language Model (RLM) can simultaneously predict directly from text, (i) the memory footprint of code across multiple high-level languages such as Python and C++, (ii) the latency of Triton GPU kernels, and (iii) the accuracy and speed of trained neural networks represented in ONNX. In particular, a relatively small 300M parameter RLM initialized from T5Gemma, obtains > 0.9 Spearman-rank on competitive programming submissions from APPS, and a single unified model achieves > 0.5 average Spearman-rank across 17 separate languages from CodeNet. Furthermore, the RLM can obtain the highest average Kendall-Tau of 0.46 on five classic NAS design spaces previously dominated by graph neural networks, and simultaneously predict architecture latencies on numerous hardware platforms.

Via

Access Paper or Ask Questions

Decoding-based Regression

Jan 31, 2025

Xingyou Song, Dara Bahri

Abstract:Language models have recently been shown capable of performing regression tasks wherein numeric predictions are represented as decoded strings. In this work, we provide theoretical grounds for this capability and furthermore investigate the utility of causal auto-regressive sequence models when they are applied to any feature representation. We find that, despite being trained in the usual way - for next-token prediction via cross-entropy loss - decoding-based regression is as performant as traditional approaches for tabular regression tasks, while being flexible enough to capture arbitrary distributions, such as in the task of density estimation.

* Google DeepMind Technical Report, 25 pages. Code can be found at https://github.com/google-research/optformer/tree/main/optformer/decoding_regression

Via

Access Paper or Ask Questions

Understanding LLM Embeddings for Regression

Nov 22, 2024

Eric Tang, Bangding Yang, Xingyou Song

Figure 1 for Understanding LLM Embeddings for Regression

Figure 2 for Understanding LLM Embeddings for Regression

Figure 3 for Understanding LLM Embeddings for Regression

Figure 4 for Understanding LLM Embeddings for Regression

Abstract:With the rise of large language models (LLMs) for flexibly processing information as strings, a natural application is regression, specifically by preprocessing string representations into LLM embeddings as downstream features for metric prediction. In this paper, we provide one of the first comprehensive investigations into embedding-based regression and demonstrate that LLM embeddings as features can be better for high-dimensional regression tasks than using traditional feature engineering. This regression performance can be explained in part due to LLM embeddings over numeric data inherently preserving Lipschitz continuity over the feature space. Furthermore, we quantify the contribution of different model effects, most notably model size and language understanding, which we find surprisingly do not always improve regression performance.

* 15 pages, 13 figures

Via

Access Paper or Ask Questions

Predicting from Strings: Language Model Embeddings for Bayesian Optimization

Oct 15, 2024

Tung Nguyen, Qiuyi Zhang, Bangding Yang, Chansoo Lee, Jorg Bornschein, Yingjie Miao, Sagi Perel, Yutian Chen, Xingyou Song

Figure 1 for Predicting from Strings: Language Model Embeddings for Bayesian Optimization

Figure 2 for Predicting from Strings: Language Model Embeddings for Bayesian Optimization

Figure 3 for Predicting from Strings: Language Model Embeddings for Bayesian Optimization

Figure 4 for Predicting from Strings: Language Model Embeddings for Bayesian Optimization

Abstract:Bayesian Optimization is ubiquitous in the field of experimental design and blackbox optimization for improving search efficiency, but has been traditionally restricted to regression models which are only applicable to fixed search spaces and tabular input features. We propose Embed-then-Regress, a paradigm for applying in-context regression over string inputs, through the use of string embedding capabilities of pretrained language models. By expressing all inputs as strings, we are able to perform general-purpose regression for Bayesian Optimization over various domains including synthetic, combinatorial, and hyperparameter optimization, obtaining comparable results to state-of-the-art Gaussian Process-based algorithms. Code can be found at https://github.com/google-research/optformer/tree/main/optformer/embed_then_regress.

Via

Access Paper or Ask Questions

The Vizier Gaussian Process Bandit Algorithm

Aug 21, 2024

Xingyou Song, Qiuyi Zhang, Chansoo Lee, Emily Fertig, Tzu-Kuo Huang, Lior Belenki, Greg Kochanski, Setareh Ariafar, Srinivas Vasudevan, Sagi Perel(+1 more)

Figure 1 for The Vizier Gaussian Process Bandit Algorithm

Figure 2 for The Vizier Gaussian Process Bandit Algorithm

Figure 3 for The Vizier Gaussian Process Bandit Algorithm

Figure 4 for The Vizier Gaussian Process Bandit Algorithm

Abstract:Google Vizier has performed millions of optimizations and accelerated numerous research and production systems at Google, demonstrating the success of Bayesian optimization as a large-scale service. Over multiple years, its algorithm has been improved considerably, through the collective experiences of numerous research efforts and user feedback. In this technical report, we discuss the implementation details and design choices of the current default algorithm provided by Open Source Vizier. Our experiments on standardized benchmarks reveal its robustness and versatility against well-established industry baselines on multiple practical modes.

* Google DeepMind Technical Report. Code can be found in https://github.com/google/vizier

Via

Access Paper or Ask Questions

Position: Leverage Foundational Models for Black-Box Optimization

May 09, 2024

Xingyou Song, Yingtao Tian, Robert Tjarko Lange, Chansoo Lee, Yujin Tang, Yutian Chen

Figure 1 for Position: Leverage Foundational Models for Black-Box Optimization

Figure 2 for Position: Leverage Foundational Models for Black-Box Optimization

Figure 3 for Position: Leverage Foundational Models for Black-Box Optimization

Figure 4 for Position: Leverage Foundational Models for Black-Box Optimization

Abstract:Undeniably, Large Language Models (LLMs) have stirred an extraordinary wave of innovation in the machine learning research domain, resulting in substantial impact across diverse fields such as reinforcement learning, robotics, and computer vision. Their incorporation has been rapid and transformative, marking a significant paradigm shift in the field of machine learning research. However, the field of experimental design, grounded on black-box optimization, has been much less affected by such a paradigm shift, even though integrating LLMs with optimization presents a unique landscape ripe for exploration. In this position paper, we frame the field of black-box optimization around sequence-based foundation models and organize their relationship with previous literature. We discuss the most promising ways foundational language models can revolutionize optimization, which include harnessing the vast wealth of information encapsulated in free-form text to enrich task comprehension, utilizing highly flexible sequence models such as Transformers to engineer superior optimization strategies, and enhancing performance prediction over previously unseen search spaces.

* International Conference on Machine Learning (ICML) 2024

Via

Access Paper or Ask Questions

Position Paper: Leveraging Foundational Models for Black-Box Optimization: Benefits, Challenges, and Future Directions

May 06, 2024

Xingyou Song, Yingtao Tian, Robert Tjarko Lange, Chansoo Lee, Yujin Tang, Yutian Chen

Figure 1 for Position Paper: Leveraging Foundational Models for Black-Box Optimization: Benefits, Challenges, and Future Directions

Figure 2 for Position Paper: Leveraging Foundational Models for Black-Box Optimization: Benefits, Challenges, and Future Directions

Figure 3 for Position Paper: Leveraging Foundational Models for Black-Box Optimization: Benefits, Challenges, and Future Directions

Figure 4 for Position Paper: Leveraging Foundational Models for Black-Box Optimization: Benefits, Challenges, and Future Directions

* Accepted to International Conference on Machine Learning (ICML) 2024

Via

Access Paper or Ask Questions

OmniPred: Language Models as Universal Regressors

Mar 04, 2024

Xingyou Song, Oscar Li, Chansoo Lee, Bangding Yang, Daiyi Peng, Sagi Perel, Yutian Chen

Abstract:Over the broad landscape of experimental design, regression has been a powerful tool to accurately predict the outcome metrics of a system or model given a set of parameters, but has been traditionally restricted to methods which are only applicable to a specific task. In this paper, we propose OmniPred, a framework for training language models as universal end-to-end regressors over $(x,y)$ evaluation data from diverse real world experiments. Using data sourced from Google Vizier, one of the largest blackbox optimization databases in the world, our extensive experiments demonstrate that through only textual representations of mathematical parameters and values, language models are capable of very precise numerical regression, and if given the opportunity to train over multiple tasks, can significantly outperform traditional regression models.

* 24 pages, 10 figures. Code can be found in https://github.com/google-research/optformer/tree/main/optformer/omnipred

Via

Access Paper or Ask Questions

Hardness of Low Rank Approximation of Entrywise Transformed Matrix Products

Nov 03, 2023

Tamas Sarlos, Xingyou Song, David Woodruff, Qiuyi, Zhang

Figure 1 for Hardness of Low Rank Approximation of Entrywise Transformed Matrix Products

Abstract:Inspired by fast algorithms in natural language processing, we study low rank approximation in the entrywise transformed setting where we want to find a good rank $k$ approximation to $f(U \cdot V)$, where $U, V^\top \in \mathbb{R}^{n \times r}$ are given, $r = O(\log(n))$, and $f(x)$ is a general scalar function. Previous work in sublinear low rank approximation has shown that if both (1) $U = V^\top$ and (2) $f(x)$ is a PSD kernel function, then there is an $O(nk^{\omega-1})$ time constant relative error approximation algorithm, where $\omega \approx 2.376$ is the exponent of matrix multiplication. We give the first conditional time hardness results for this problem, demonstrating that both conditions (1) and (2) are in fact necessary for getting better than $n^{2-o(1)}$ time for a relative error low rank approximation for a wide class of functions. We give novel reductions from the Strong Exponential Time Hypothesis (SETH) that rely on lower bounding the leverage scores of flat sparse vectors and hold even when the rank of the transformed matrix $f(UV)$ and the target rank are $n^{o(1)}$, and when $U = V^\top$. Furthermore, even when $f(x) = x^p$ is a simple polynomial, we give runtime lower bounds in the case when $U \neq V^\top$ of the form $\Omega(\min(n^{2-o(1)}, \Omega(2^p)))$. Lastly, we demonstrate that our lower bounds are tight by giving an $O(n \cdot \text{poly}(k, 2^p, 1/\epsilon))$ time relative error approximation algorithm and a fast $O(n \cdot \text{poly}(k, p, 1/\epsilon))$ additive error approximation using fast tensor-based sketching. Additionally, since our low rank algorithms rely on matrix-vector product subroutines, our lower bounds extend to show that computing $f(UV)W$, for even a small matrix $W$, requires $\Omega(n^{2-o(1)})$ time.

* Accepted and formatted in Neurips 2023

Via

Access Paper or Ask Questions

Discovering Adaptable Symbolic Algorithms from Scratch

Jul 31, 2023

Stephen Kelly, Daniel S. Park, Xingyou Song, Mitchell McIntire, Pranav Nashikkar, Ritam Guha, Wolfgang Banzhaf, Kalyanmoy Deb, Vishnu Naresh Boddeti, Jie Tan(+1 more)

Figure 1 for Discovering Adaptable Symbolic Algorithms from Scratch

Figure 2 for Discovering Adaptable Symbolic Algorithms from Scratch

Figure 3 for Discovering Adaptable Symbolic Algorithms from Scratch

Figure 4 for Discovering Adaptable Symbolic Algorithms from Scratch

Abstract:Autonomous robots deployed in the real world will need control policies that rapidly adapt to environmental changes. To this end, we propose AutoRobotics-Zero (ARZ), a method based on AutoML-Zero that discovers zero-shot adaptable policies from scratch. In contrast to neural network adaption policies, where only model parameters are optimized, ARZ can build control algorithms with the full expressive power of a linear register machine. We evolve modular policies that tune their model parameters and alter their inference algorithm on-the-fly to adapt to sudden environmental changes. We demonstrate our method on a realistic simulated quadruped robot, for which we evolve safe control policies that avoid falling when individual limbs suddenly break. This is a challenging task in which two popular neural network baselines fail. Finally, we conduct a detailed analysis of our method on a novel and challenging non-stationary control task dubbed Cataclysmic Cartpole. Results confirm our findings that ARZ is significantly more robust to sudden environmental changes and can build simple, interpretable control policies.

* Published as a conference paper at International Conference on Intelligent Robots and Systems (IROS) 2023. See https://youtu.be/sEFP1Hay4nE for associated video file

Via

Access Paper or Ask Questions