Abstract:The Model Context Protocol (MCP) has emerged as a transformative standard for connecting large language models (LLMs) with external data sources and tools, and has been rapidly adopted across personal applications and development platforms. However, existing benchmarks predominantly focus on generic information-seeking tools and fail to capture the practical challenges posed by personal social applications, where tools interact with individual accounts or local databases. To bridge this critical gap, we introduce MCP-Persona, the first benchmark specifically designed for evaluating agent performance on real-world, personalized MCP tools. MCP-Persona encompasses a diverse set of widely-used applications, ranging from social media platforms like Reddit and Xiaohongshu (Rednote) to enterprise collaboration suites such as Lark (Feishu) and Slack. Our extensive experiments on various state-of-the-art (SOTA) agents demonstrate their significant struggles with personalized tool use, thereby highlighting the benchmark's crucial role in identifying and addressing these limitations. MCP-Persona is publicly available at https://github.com/wwh0411/MCP-Persona}{https://github.com/wwh0411/MCP-Persona.
Abstract:We study the problem of learning Gaussian mixture models under overparameterization. Prior work has shown that while overparameterization is essential for avoiding spurious local optima and enables global recovery of the ground-truth model using the gradient-EM (expectation-maximization) algorithm, it can dramatically slow down the local rate of convergence. Under certain assumptions on the mixture weights, we show that a standard divergence measure minimized by statistical learning procedures possesses a manifold of slow growth on which the well-known Polyak stepsize reduces the loss geometrically, and design a gradient-based method that converges to minimizers at a locally linear rate. Additionally, we show that our method converges to nearly optimal solutions -- up to a natural misspecification threshold -- for mixtures with arbitrary weights. At a high level, the method alternates between several "short" gradient descent steps that approach the manifold and "long" Polyak steps that contract the distance to minimizers. Our results suggest that slow convergence is not an intrinsic challenge of overparameterization, but can be overcome by exploiting the favorable structure of the loss landscape.




Abstract:The COVID-19 pandemic has affected travel behaviors and transportation system operations, and cities are grappling with what policies can be effective for a phased reopening shaped by social distancing. This edition of the white paper updates travel trends and highlights an agent-based simulation model's results to predict the impact of proposed phased reopening strategies. It also introduces a real-time video processing method to measure social distancing through cameras on city streets.




Abstract:The COVID-19 outbreak has dramatically changed travel behavior in affected cities. The C2SMART research team has been investigating the impact of COVID-19 on mobility and sociability. New York City (NYC) and Seattle, two of the cities most affected by COVID-19 in the U.S. were included in our initial study. An all-in-one dashboard with data mining and cloud computing capabilities was developed for interactive data analytics and visualization to facilitate the understanding of the impact of the outbreak and corresponding policies such as social distancing on transportation systems. This platform is updated regularly and continues to evolve with the addition of new data, impact metrics, and visualizations to assist public and decision-makers to make informed decisions. This paper presents the architecture of the COVID related mobility data dashboard and preliminary mobility and sociability metrics for NYC and Seattle.