Weijia Song

ABSTRAL: Automatic Design of Multi-Agent Systems Through Iterative Refinement and Topology Optimization

Mar 24, 2026

Weijia Song, Jiashu Yue, Zhe Pang

Abstract: How should multi-agent systems be designed, and can that design knowledge be captured in a form that is inspectable, revisable, and transferable? We introduce ABSTRAL, a framework that treats MAS architecture as an evolving natural-language document, an artifact refined through contrastive trace analysis. Three findings emerge. First, we provide a precise measurement of the multi-agent coordination tax: under fixed turn budgets, ensembles achieve only 26% turn efficiency, with 66% of tasks exhausting the limit, yet still improve over single-agent baselines by discovering parallelizable task decompositions. Second, design knowledge encoded in documents transfers: topology reasoning and role templates learned on one domain provide a head start on new domains, with transferred seeds matching the iteration-3 performance of a cold start in a single iteration. Third, contrastive trace analysis discovers specialist roles absent from any initial design, a capability no prior system demonstrates. On SOPBench (134 bank tasks, deterministic oracle), ABSTRAL reaches 70% validation / 65.96% test pass rate with a GPT-4o backbone. We release the converged documents as inspectable design rationale.

Compass: A Decentralized Scheduler for Latency-Sensitive ML Workflows

Feb 28, 2024

Yuting Yang, Andrea Merlina, Weijia Song, Tiancheng Yuan, Ken Birman, Roman Vitenberg

Abstract: We consider ML query processing in distributed systems where GPU-enabled workers coordinate to execute complex queries: a computing style often seen in applications that interact with users in support of image processing and natural language processing. In such systems, coscheduling of GPU memory management and task placement represents a promising opportunity. We propose Compass, a novel framework that unifies these functions to reduce job latency while using resources efficiently, placing tasks where data dependencies will be satisfied, collocating tasks from the same job (when this will not overload the host or its GPU), and efficiently managing GPU memory. Comparison with other state-of-the-art schedulers shows a significant reduction in completion times while requiring the same or even fewer resources. In one case, just half the servers were needed to process the same workload.

Cascade: A Platform for Delay-Sensitive Edge Intelligence

Nov 29, 2023

Weijia Song, Thiago Garrett, Yuting Yang, Mingzhao Liu, Edward Tremel, Lorenzo Rosa, Andrea Merlina, Roman Vitenberg, Ken Birman

Abstract: Interactive intelligent computing applications are increasingly prevalent, creating a need for AI/ML platforms optimized to reduce per-event latency while maintaining high throughput and efficient resource management. Yet many intelligent applications run on AI/ML platforms that optimize for high throughput even at the cost of high tail latency. Cascade is a new AI/ML hosting platform intended to untangle this puzzle. Innovations include a legacy-friendly storage layer that moves data with minimal copying and a "fast path" that collocates data and computation to maximize responsiveness. Our evaluation shows that Cascade reduces latency by orders of magnitude with no loss of throughput.

* 14 pages, 12 figures
