Abstract:The role and use of race within health-related artificial intelligence and machine learning (AI/ML) models has sparked increasing attention and controversy. Despite the complexity and breadth of related issues, a robust and holistic framework to guide stakeholders in their examination and resolution remains lacking. This perspective provides a broad-based, systematic, and cross-cutting landscape analysis of race-related challenges, structured around the AI/ML lifecycle and framed through "points to consider" to support inquiry and decision-making.
Abstract:Computer use agents automate digital tasks by directly interacting with graphical user interfaces (GUIs) on computers and mobile devices, offering significant potential to enhance human productivity by completing an open-ended space of user queries. However, current agents face significant challenges: imprecise grounding of GUI elements, difficulties with long-horizon task planning, and performance bottlenecks from relying on single generalist models for diverse cognitive tasks. To this end, we introduce Agent S2, a novel compositional framework that delegates cognitive responsibilities across various generalist and specialist models. We propose a novel Mixture-of-Grounding technique to achieve precise GUI localization and introduce Proactive Hierarchical Planning, dynamically refining action plans at multiple temporal scales in response to evolving observations. Evaluations demonstrate that Agent S2 establishes new state-of-the-art (SOTA) performance on three prominent computer use benchmarks. Specifically, Agent S2 achieves 18.9% and 32.7% relative improvements over leading baseline agents such as Claude Computer Use and UI-TARS on the OSWorld 15-step and 50-step evaluation. Moreover, Agent S2 generalizes effectively to other operating systems and applications, surpassing previous best methods by 52.8% on WindowsAgentArena and by 16.52% on AndroidWorld relatively. Code available at https://github.com/simular-ai/Agent-S.
Abstract:In recent years, Large Language Models (LLMs) have demonstrated remarkable abilities in various natural language processing tasks. However, adapting these models to specialized domains using private datasets stored on resource-constrained edge devices, such as smartphones and personal computers, remains challenging due to significant privacy concerns and limited computational resources. Existing model adaptation methods either compromise data privacy by requiring data transmission or jeopardize model privacy by exposing proprietary LLM parameters. To address these challenges, we propose Prada, a novel privacy-preserving and efficient black-box LLM adaptation system using private on-device datasets. Prada employs a lightweight proxy model fine-tuned with Low-Rank Adaptation (LoRA) locally on user devices. During inference, Prada leverages the logits offset, i.e., difference in outputs between the base and adapted proxy models, to iteratively refine outputs from a remote black-box LLM. This offset-based adaptation approach preserves both data privacy and model privacy, as there is no need to share sensitive data or proprietary model parameters. Furthermore, we incorporate speculative decoding to further speed up the inference process of Prada, making the system practically deployable on bandwidth-constrained edge devices, enabling a more practical deployment of Prada. Extensive experiments on various downstream tasks demonstrate that Prada achieves performance comparable to centralized fine-tuning methods while significantly reducing computational overhead by up to 60% and communication costs by up to 80%.
Abstract:The use of large language models (LLMs) to simulate human behavior has gained significant attention, particularly through personas that approximate individual characteristics. Persona-based simulations hold promise for transforming disciplines that rely on population-level feedback, including social science, economic analysis, marketing research, and business operations. Traditional methods to collect realistic persona data face significant challenges. They are prohibitively expensive and logistically challenging due to privacy constraints, and often fail to capture multi-dimensional attributes, particularly subjective qualities. Consequently, synthetic persona generation with LLMs offers a scalable, cost-effective alternative. However, current approaches rely on ad hoc and heuristic generation techniques that do not guarantee methodological rigor or simulation precision, resulting in systematic biases in downstream tasks. Through extensive large-scale experiments including presidential election forecasts and general opinion surveys of the U.S. population, we reveal that these biases can lead to significant deviations from real-world outcomes. Our findings underscore the need to develop a rigorous science of persona generation and outline the methodological innovations, organizational and institutional support, and empirical foundations required to enhance the reliability and scalability of LLM-driven persona simulations. To support further research and development in this area, we have open-sourced approximately one million generated personas, available for public access and analysis at https://huggingface.co/datasets/Tianyi-Lab/Personas.
Abstract:A robust full-space physical layer security (PLS) transmission scheme is proposed in this paper considering the full-space wiretapping challenge of wireless networks supported by simultaneous transmitting and reflecting reconfigurable intelligent surface (STAR-RIS). Different from the existing schemes, the proposed PLS scheme takes account of the uncertainty on the eavesdropper's position within the 360$^\circ$ service area offered by the STAR-RIS. Specifically, the large system analytical method is utilized to derive the asymptotic expression of the average security rate achieved by the security user, considering that the base station (BS) only has the statistical information of the eavesdropper's channel state information (CSI) and the uncertainty of its location. To evaluate the effectiveness of the proposed PLS scheme, we first formulate an optimization problem aimed at maximizing the weighted sum rate of the security user and the public user. This optimization is conducted under the power allocation constraint, and some practical limitations for STAR-RIS implementation, through jointly designing the active and passive beamforming variables. A novel iterative algorithm based on the minimum mean-square error (MMSE) and cross-entropy optimization (CEO) methods is proposed to effectively address the established non-convex optimization problem with discrete variables. Simulation results indicate that the proposed robust PLS scheme can effectively mitigate the information leakage across the entire coverage area of the STAR-RIS-assisted system, leading to superior performance gain when compared to benchmark schemes encompassing traditional RIS-aided scheme.
Abstract:FastMap was first introduced in the Data Mining community for generating Euclidean embeddings of complex objects. In this dissertation, we first present FastMap to generate Euclidean embeddings of graphs in near-linear time: The pairwise Euclidean distances approximate a desired graph-based distance function on the vertices. We then apply the graph version of FastMap to efficiently solve various graph-theoretic problems of significant interest in AI: including facility location, top-K centrality computations, community detection and block modeling, and graph convex hull computations. We also present a novel learning framework, called FastMapSVM, by combining FastMap and Support Vector Machines. We then apply FastMapSVM to predict the satisfiability of Constraint Satisfaction Problems and to classify seismograms in Earthquake Science.
Abstract:The Mixture of Experts (MoE) is an effective architecture for scaling large language models by leveraging sparse expert activation, optimizing the trade-off between performance and efficiency. However, under expert parallelism, MoE suffers from inference inefficiencies due to imbalanced token-to-expert assignment, where some experts are overloaded while others remain underutilized. This imbalance leads to poor resource utilization and increased latency, as the most burdened expert dictates the overall delay, a phenomenon we define as the \textbf{\textit{Straggler Effect}}. To mitigate this, we propose Capacity-Aware Inference, including two key techniques: (1) \textbf{\textit{Capacity-Aware Token Drop}}, which discards overloaded tokens to regulate the maximum latency of MoE, and (2) \textbf{\textit{Capacity-Aware Token Reroute}}, which reallocates overflowed tokens to underutilized experts, balancing the token distribution. These techniques collectively optimize both high-load and low-load expert utilization, leading to a more efficient MoE inference pipeline. Extensive experiments demonstrate the effectiveness of our methods, showing significant improvements in inference efficiency, e.g., 0.2\% average performance increase and a 1.94$\times$ inference speedup on Mixtral-8$\times$7B-Instruct.
Abstract:Autoregressive models have emerged as a powerful framework for modeling exchangeable sequences - i.i.d. observations when conditioned on some latent factor - enabling direct modeling of uncertainty from missing data (rather than a latent). Motivated by the critical role posterior inference plays as a subroutine in decision-making (e.g., active learning, bandits), we study the inferential and architectural inductive biases that are most effective for exchangeable sequence modeling. For the inference stage, we highlight a fundamental limitation of the prevalent single-step generation approach: inability to distinguish between epistemic and aleatoric uncertainty. Instead, a long line of works in Bayesian statistics advocates for multi-step autoregressive generation; we demonstrate this "correct approach" enables superior uncertainty quantification that translates into better performance on downstream decision-making tasks. This naturally leads to the next question: which architectures are best suited for multi-step inference? We identify a subtle yet important gap between recently proposed Transformer architectures for exchangeable sequences (Muller et al., 2022; Nguyen & Grover, 2022; Ye & Namkoong, 2024), and prove that they in fact cannot guarantee exchangeability despite introducing significant computational overhead. We illustrate our findings using controlled synthetic settings, demonstrating how custom architectures can significantly underperform standard causal masks, underscoring the need for new architectural innovations.
Abstract:Humanoid robots, capable of assuming human roles in various workplaces, have become essential to the advancement of embodied intelligence. However, as robots with complex physical structures, learning a control model that can operate robustly across diverse environments remains inherently challenging, particularly under the discrepancies between training and deployment environments. In this study, we propose HWC-Loco, a robust whole-body control algorithm tailored for humanoid locomotion tasks. By reformulating policy learning as a robust optimization problem, HWC-Loco explicitly learns to recover from safety-critical scenarios. While prioritizing safety guarantees, overly conservative behavior can compromise the robot's ability to complete the given tasks. To tackle this challenge, HWC-Loco leverages a hierarchical policy for robust control. This policy can dynamically resolve the trade-off between goal-tracking and safety recovery, guided by human behavior norms and dynamic constraints. To evaluate the performance of HWC-Loco, we conduct extensive comparisons against state-of-the-art humanoid control models, demonstrating HWC-Loco's superior performance across diverse terrains, robot structures, and locomotion tasks under both simulated and real-world environments.




Abstract:Achieving large-scale aerial swarms is challenging due to the inherent contradictions in balancing computational efficiency and scalability. This paper introduces Primitive-Swarm, an ultra-lightweight and scalable planner designed specifically for large-scale autonomous aerial swarms. The proposed approach adopts a decentralized and asynchronous replanning strategy. Within it is a novel motion primitive library consisting of time-optimal and dynamically feasible trajectories. They are generated utlizing a novel time-optimial path parameterization algorithm based on reachability analysis (TOPP-RA). Then, a rapid collision checking mechanism is developed by associating the motion primitives with the discrete surrounding space according to conflicts. By considering both spatial and temporal conflicts, the mechanism handles robot-obstacle and robot-robot collisions simultaneously. Then, during a replanning process, each robot selects the safe and minimum cost trajectory from the library based on user-defined requirements. Both the time-optimal motion primitive library and the occupancy information are computed offline, turning a time-consuming optimization problem into a linear-complexity selection problem. This enables the planner to comprehensively explore the non-convex, discontinuous 3-D safe space filled with numerous obstacles and robots, effectively identifying the best hidden path. Benchmark comparisons demonstrate that our method achieves the shortest flight time and traveled distance with a computation time of less than 1 ms in dense environments. Super large-scale swarm simulations, involving up to 1000 robots, running in real-time, verify the scalability of our method. Real-world experiments validate the feasibility and robustness of our approach. The code will be released to foster community collaboration.