Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Madeleine Udell

AI-Driven Review Systems: Evaluating LLMs in Scalable and Bias-Aware Academic Reviews

Aug 19, 2024

Keith Tyser, Ben Segev, Gaston Longhitano, Xin-Yu Zhang, Zachary Meeks, Jason Lee, Uday Garg, Nicholas Belsten, Avi Shporer, Madeleine Udell(+2 more)

Figure 1 for AI-Driven Review Systems: Evaluating LLMs in Scalable and Bias-Aware Academic Reviews

Figure 2 for AI-Driven Review Systems: Evaluating LLMs in Scalable and Bias-Aware Academic Reviews

Figure 3 for AI-Driven Review Systems: Evaluating LLMs in Scalable and Bias-Aware Academic Reviews

Figure 4 for AI-Driven Review Systems: Evaluating LLMs in Scalable and Bias-Aware Academic Reviews

Abstract:Automatic reviewing helps handle a large volume of papers, provides early feedback and quality control, reduces bias, and allows the analysis of trends. We evaluate the alignment of automatic paper reviews with human reviews using an arena of human preferences by pairwise comparisons. Gathering human preference may be time-consuming; therefore, we also use an LLM to automatically evaluate reviews to increase sample efficiency while reducing bias. In addition to evaluating human and LLM preferences among LLM reviews, we fine-tune an LLM to predict human preferences, predicting which reviews humans will prefer in a head-to-head battle between LLMs. We artificially introduce errors into papers and analyze the LLM's responses to identify limitations, use adaptive review questions, meta prompting, role-playing, integrate visual and textual analysis, use venue-specific reviewing materials, and predict human preferences, improving upon the limitations of the traditional review processes. We make the reviews of publicly available arXiv and open-access Nature journal papers available online, along with a free service which helps authors review and revise their research papers and improve their quality. This work develops proof-of-concept LLM reviewing systems that quickly deliver consistent, high-quality reviews and evaluate their quality. We mitigate the risks of misuse, inflated review scores, overconfident ratings, and skewed score distributions by augmenting the LLM with multiple documents, including the review form, reviewer guide, code of ethics and conduct, area chair guidelines, and previous year statistics, by finding which errors and shortcomings of the paper may be detected by automated reviews, and evaluating pairwise reviewer preferences. This work identifies and addresses the limitations of using LLMs as reviewers and evaluators and enhances the quality of the reviewing process.

* 42 pages

Via

Access Paper or Ask Questions

OptiMUS-0.3: Using Large Language Models to Model and Solve Optimization Problems at Scale

Jul 29, 2024

Ali AhmadiTeshnizi, Wenzhi Gao, Herman Brunborg, Shayan Talaei, Madeleine Udell

Figure 1 for OptiMUS-0.3: Using Large Language Models to Model and Solve Optimization Problems at Scale

Figure 2 for OptiMUS-0.3: Using Large Language Models to Model and Solve Optimization Problems at Scale

Figure 3 for OptiMUS-0.3: Using Large Language Models to Model and Solve Optimization Problems at Scale

Figure 4 for OptiMUS-0.3: Using Large Language Models to Model and Solve Optimization Problems at Scale

Abstract:Optimization problems are pervasive in sectors from manufacturing and distribution to healthcare. However, most such problems are still solved heuristically by hand rather than optimally by state-of-the art solvers because the expertise required to formulate and solve these problems limits the widespread adoption of optimization tools and techniques. We introduce a Large Language Model (LLM)-based system designed to formulate and solve (mixed integer) linear programming problems from their natural language descriptions. Our system is capable of developing mathematical models, writing and debugging solver code, evaluating the generated solutions, and improving efficiency and correctness of its model and code based on these evaluations. OptiMUS-0.3 utilizes a modular structure to process problems, allowing it to handle problems with long descriptions and complex data without long prompts. Experiments demonstrate that OptiMUS-0.3 outperforms existing state-of-the-art methods on easy datasets by more than 12% and on hard datasets (including a new dataset, NLP4LP, released with this paper that features long and complex problems) by more than 8%.

* This paper documents OptiMUS-0.3, improving on OptiMUS-0.1 (arXiv:2310.06116) and OptiMUS-0.2 (arXiv:2402.10172). arXiv admin note: text overlap with arXiv:2402.10172

Via

Access Paper or Ask Questions

Have ASkotch: Fast Methods for Large-scale, Memory-constrained Kernel Ridge Regression

Jul 14, 2024

Pratik Rathore, Zachary Frangella, Madeleine Udell

Abstract:Kernel ridge regression (KRR) is a fundamental computational tool, appearing in problems that range from computational chemistry to health analytics, with a particular interest due to its starring role in Gaussian process regression. However, it is challenging to scale KRR solvers to large datasets: with $n$ training points, a direct solver (i.e., Cholesky decomposition) uses $O(n^2)$ storage and $O(n^3)$ flops. Iterative methods for KRR, such as preconditioned conjugate gradient (PCG), avoid the cubic scaling of direct solvers and often use low-rank preconditioners; a rank $r$ preconditioner uses $O(rn)$ storage and each iteration requires $O(n^2)$ flops. To reduce the storage and iteration complexity of iterative solvers for KRR, we propose ASkotch ($\textbf{A}$ccelerated $\textbf{s}$calable $\textbf{k}$ernel $\textbf{o}$p$\textbf{t}$imization using block $\textbf{c}$oordinate descent with $\textbf{H}$essian preconditioning). For a given block size $|b| << n$, each iteration of ASkotch uses $O(r|b| + n)$ storage and $O(n|b|)$ flops, so ASkotch scales better than Cholesky decomposition and PCG. We prove that ASkotch obtains linear convergence to the optimum, with the convergence rate depending on the square roots of the $\textit{preconditioned}$ block condition numbers. Furthermore, we solve KRR problems that were considered to be impossibly large while using limited computational resources: we show that ASkotch outperforms PCG methods with respect to generalization error on large-scale KRR (up to $n = 10^8$) and KRR classification tasks (up to $n = 10^7$) while running each of our experiments on $\textit{a single 12 GB Titan V GPU}$. Our work opens up the possibility of as-yet-unimagined applications of KRR across a wide range of disciplines.

* 40 pages (including appendices), 14 figures, 3 tables

Via

Access Paper or Ask Questions

Interpretable Prediction and Feature Selection for Survival Analysis

Apr 23, 2024

Mike Van Ness, Madeleine Udell

Abstract:Survival analysis is widely used as a technique to model time-to-event data when some data is censored, particularly in healthcare for predicting future patient risk. In such settings, survival models must be both accurate and interpretable so that users (such as doctors) can trust the model and understand model predictions. While most literature focuses on discrimination, interpretability is equally as important. A successful interpretable model should be able to describe how changing each feature impacts the outcome, and should only use a small number of features. In this paper, we present DyS (pronounced ``dice''), a new survival analysis model that achieves both strong discrimination and interpretability. DyS is a feature-sparse Generalized Additive Model, combining feature selection and interpretable prediction into one model. While DyS works well for all survival analysis problems, it is particularly useful for large (in $n$ and $p$) survival datasets such as those commonly found in observational healthcare studies. Empirical studies show that DyS competes with other state-of-the-art machine learning models for survival analysis, while being highly interpretable.

Via

Access Paper or Ask Questions

OptiMUS: Scalable Optimization Modeling with (MI)LP Solvers and Large Language Models

Feb 15, 2024

Ali AhmadiTeshnizi, Wenzhi Gao, Madeleine Udell

Abstract:Optimization problems are pervasive in sectors from manufacturing and distribution to healthcare. However, most such problems are still solved heuristically by hand rather than optimally by state-of-the-art solvers because the expertise required to formulate and solve these problems limits the widespread adoption of optimization tools and techniques. This paper introduces OptiMUS, a Large Language Model (LLM)-based agent designed to formulate and solve (mixed integer) linear programming problems from their natural language descriptions. OptiMUS can develop mathematical models, write and debug solver code, evaluate the generated solutions, and improve its model and code based on these evaluations. OptiMUS utilizes a modular structure to process problems, allowing it to handle problems with long descriptions and complex data without long prompts. Experiments demonstrate that OptiMUS outperforms existing state-of-the-art methods on easy datasets by more than $20\%$ and on hard datasets (including a new dataset, NLP4LP, released with this paper that features long and complex problems) by more than $30\%$.

Via

Access Paper or Ask Questions

Challenges in Training PINNs: A Loss Landscape Perspective

Feb 02, 2024

Pratik Rathore, Weimu Lei, Zachary Frangella, Lu Lu, Madeleine Udell

Figure 1 for Challenges in Training PINNs: A Loss Landscape Perspective

Figure 2 for Challenges in Training PINNs: A Loss Landscape Perspective

Figure 3 for Challenges in Training PINNs: A Loss Landscape Perspective

Figure 4 for Challenges in Training PINNs: A Loss Landscape Perspective

Abstract:This paper explores challenges in training Physics-Informed Neural Networks (PINNs), emphasizing the role of the loss landscape in the training process. We examine difficulties in minimizing the PINN loss function, particularly due to ill-conditioning caused by differential operators in the residual term. We compare gradient-based optimizers Adam, L-BFGS, and their combination Adam+L-BFGS, showing the superiority of Adam+L-BFGS, and introduce a novel second-order optimizer, NysNewton-CG (NNCG), which significantly improves PINN performance. Theoretically, our work elucidates the connection between ill-conditioned differential operators and ill-conditioning in the PINN loss and shows the benefits of combining first- and second-order optimization methods. Our work presents valuable insights and more powerful optimization strategies for training PINNs, which could improve the utility of PINNs for solving difficult partial differential equations.

* 30 pages (including appendices), 8 figures, 2 tables

Via

Access Paper or Ask Questions

Interpretable Survival Analysis for Heart Failure Risk Prediction

Oct 24, 2023

Mike Van Ness, Tomas Bosschieter, Natasha Din, Andrew Ambrosy, Alexander Sandhu, Madeleine Udell

Figure 1 for Interpretable Survival Analysis for Heart Failure Risk Prediction

Figure 2 for Interpretable Survival Analysis for Heart Failure Risk Prediction

Figure 3 for Interpretable Survival Analysis for Heart Failure Risk Prediction

Figure 4 for Interpretable Survival Analysis for Heart Failure Risk Prediction

Abstract:Survival analysis, or time-to-event analysis, is an important and widespread problem in healthcare research. Medical research has traditionally relied on Cox models for survival analysis, due to their simplicity and interpretability. Cox models assume a log-linear hazard function as well as proportional hazards over time, and can perform poorly when these assumptions fail. Newer survival models based on machine learning avoid these assumptions and offer improved accuracy, yet sometimes at the expense of model interpretability, which is vital for clinical use. We propose a novel survival analysis pipeline that is both interpretable and competitive with state-of-the-art survival models. Specifically, we use an improved version of survival stacking to transform a survival analysis problem to a classification problem, ControlBurn to perform feature selection, and Explainable Boosting Machines to generate interpretable predictions. To evaluate our pipeline, we predict risk of heart failure using a large-scale EHR database. Our pipeline achieves state-of-the-art performance and provides interesting and novel insights about risk factors for heart failure.

Via

Access Paper or Ask Questions

OptiMUS: Optimization Modeling Using mip Solvers and large language models

Oct 09, 2023

Ali AhmadiTeshnizi, Wenzhi Gao, Madeleine Udell

Figure 1 for OptiMUS: Optimization Modeling Using mip Solvers and large language models

Figure 2 for OptiMUS: Optimization Modeling Using mip Solvers and large language models

Figure 3 for OptiMUS: Optimization Modeling Using mip Solvers and large language models

Figure 4 for OptiMUS: Optimization Modeling Using mip Solvers and large language models

Abstract:Optimization problems are pervasive across various sectors, from manufacturing and distribution to healthcare. However, most such problems are still solved heuristically by hand rather than optimally by state-of-the-art solvers, as the expertise required to formulate and solve these problems limits the widespread adoption of optimization tools and techniques. We introduce OptiMUS, a Large Language Model (LLM)-based agent designed to formulate and solve MILP problems from their natural language descriptions. OptiMUS is capable of developing mathematical models, writing and debugging solver code, developing tests, and checking the validity of generated solutions. To benchmark our agent, we present NLP4LP, a novel dataset of linear programming (LP) and mixed integer linear programming (MILP) problems. Our experiments demonstrate that OptiMUS is able to solve 67\% more problems compared to a basic LLM prompting strategy. OptiMUS code and NLP4LP dataset are available at \href{https://github.com/teshnizi/OptiMUS}{https://github.com/teshnizi/OptiMUS}

Via

Access Paper or Ask Questions

PROMISE: Preconditioned Stochastic Optimization Methods by Incorporating Scalable Curvature Estimates

Sep 15, 2023

Zachary Frangella, Pratik Rathore, Shipu Zhao, Madeleine Udell

Figure 1 for PROMISE: Preconditioned Stochastic Optimization Methods by Incorporating Scalable Curvature Estimates

Figure 2 for PROMISE: Preconditioned Stochastic Optimization Methods by Incorporating Scalable Curvature Estimates

Figure 3 for PROMISE: Preconditioned Stochastic Optimization Methods by Incorporating Scalable Curvature Estimates

Figure 4 for PROMISE: Preconditioned Stochastic Optimization Methods by Incorporating Scalable Curvature Estimates

Abstract:This paper introduces PROMISE ($\textbf{Pr}$econditioned Stochastic $\textbf{O}$ptimization $\textbf{M}$ethods by $\textbf{I}$ncorporating $\textbf{S}$calable Curvature $\textbf{E}$stimates), a suite of sketching-based preconditioned stochastic gradient algorithms for solving large-scale convex optimization problems arising in machine learning. PROMISE includes preconditioned versions of SVRG, SAGA, and Katyusha; each algorithm comes with a strong theoretical analysis and effective default hyperparameter values. In contrast, traditional stochastic gradient methods require careful hyperparameter tuning to succeed, and degrade in the presence of ill-conditioning, a ubiquitous phenomenon in machine learning. Empirically, we verify the superiority of the proposed algorithms by showing that, using default hyperparameter values, they outperform or match popular tuned stochastic gradient optimizers on a test bed of $51$ ridge and logistic regression problems assembled from benchmark machine learning repositories. On the theoretical side, this paper introduces the notion of quadratic regularity in order to establish linear convergence of all proposed methods even when the preconditioner is updated infrequently. The speed of linear convergence is determined by the quadratic regularity ratio, which often provides a tighter bound on the convergence rate compared to the condition number, both in theory and in practice, and explains the fast global linear convergence of the proposed methods.

* 128 pages, 31 Figures

Via

Access Paper or Ask Questions

Exploring the MIT Mathematics and EECS Curriculum Using Large Language Models

Jun 24, 2023

Sarah J. Zhang, Samuel Florin, Ariel N. Lee, Eamon Niknafs, Andrei Marginean, Annie Wang, Keith Tyser, Zad Chin, Yann Hicke, Nikhil Singh(+5 more)

Figure 1 for Exploring the MIT Mathematics and EECS Curriculum Using Large Language Models

Figure 2 for Exploring the MIT Mathematics and EECS Curriculum Using Large Language Models

Figure 3 for Exploring the MIT Mathematics and EECS Curriculum Using Large Language Models

Figure 4 for Exploring the MIT Mathematics and EECS Curriculum Using Large Language Models

Abstract:We curate a comprehensive dataset of 4,550 questions and solutions from problem sets, midterm exams, and final exams across all MIT Mathematics and Electrical Engineering and Computer Science (EECS) courses required for obtaining a degree. We evaluate the ability of large language models to fulfill the graduation requirements for any MIT major in Mathematics and EECS. Our results demonstrate that GPT-3.5 successfully solves a third of the entire MIT curriculum, while GPT-4, with prompt engineering, achieves a perfect solve rate on a test set excluding questions based on images. We fine-tune an open-source large language model on this dataset. We employ GPT-4 to automatically grade model responses, providing a detailed performance breakdown by course, question, and answer type. By embedding questions in a low-dimensional space, we explore the relationships between questions, topics, and classes and discover which questions and classes are required for solving other questions and classes through few-shot learning. Our analysis offers valuable insights into course prerequisites and curriculum design, highlighting language models' potential for learning and improving Mathematics and EECS education.

* Did not receive permission to release the data or model fine-tuned on the data

Via

Access Paper or Ask Questions