Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shengling Qin

EdgeBench: Unveiling Scaling Laws of Learning from Real-World Environments

Jul 06, 2026

Deyao Zhu, Xin Zhou, Shengling Qin, Xuekai Zhu, Hangliang Ding, Shu Zhong, Zixin Wen, Zhonglin Xie, Chenhui Gou, Linxuan Ren(+37 more)

Abstract:Pretraining scaling laws reveal that model capability improves predictably with data and compute. But learning from real world environments after deployment remains far less understood. Analyzing roughly 38,000 hours of agent interaction with the environment across 134 real world tasks, we find, to the best of our knowledge, the first evidence that overall performance during environment learning follows a log-sigmoid scaling law with remarkably high precision, reaching R^2 = 0.998. Across model generations, we also find that agent learning speed roughly doubles every three months. This discovery stems from EdgeBench, a suite of 134 real world tasks with ultra-long horizons, spanning scientific discovery, software engineering, combinatorial optimization, professional knowledge work, formal mathematics, and interactive games. Each task sustains at least 12 hours of continuous agent operation under rich, multilevel feedback, and is built through substantial expert effort. We publicly release 51 tasks and our full evaluation framework to accelerate the study of how agents learn from real world experience.

Via

Access Paper or Ask Questions

VLCache: Computing 2% Vision Tokens and Reusing 98% for Vision-Language Inference

Dec 18, 2025

Shengling Qin, Hao Yu, Chenxin Wu, Zheng Li, Yizhong Cao, Zhengyang Zhuge, Yuxin Zhou, Wentao Yao, Yi Zhang, Zhengheng Wang(+3 more)

Abstract:This paper presents VLCache, a cache reuse framework that exploits both Key-Value (KV) cache and encoder cache from prior multimodal inputs to eliminate costly recomputation when the same multimodal inputs recur. Unlike previous heuristic approaches, we formally identify the cumulative reuse error effect and demonstrate how to minimize the non-prefix cache reuse error effectively. We further analyze the varying importance of model layers and propose a dynamic, layer-aware recomputation strategy to balance accuracy and efficiency. Experimental results show that VLCache achieves an accuracy on par with full recomputation, while requiring only 2-5% of the tokens to compute, yielding 1.2x-16x TTFT speedups. We develop an experimental implementation of the proposed VLCache pipeline based on SGLang, enabling significantly faster inference in practical deployments.

Via

Access Paper or Ask Questions

Optimal Expert Selection for Distributed Mixture-of-Experts at the Wireless Edge

Mar 17, 2025

Shengling Qin, Hai Wu, Hongyang Du, Kaibin Huang

Figure 1 for Optimal Expert Selection for Distributed Mixture-of-Experts at the Wireless Edge

Figure 2 for Optimal Expert Selection for Distributed Mixture-of-Experts at the Wireless Edge

Figure 3 for Optimal Expert Selection for Distributed Mixture-of-Experts at the Wireless Edge

Figure 4 for Optimal Expert Selection for Distributed Mixture-of-Experts at the Wireless Edge

Abstract:The emergence of distributed Mixture-of-Experts (DMoE) systems, which deploy expert models at edge nodes, offers a pathway to achieving connected intelligence in sixth-generation (6G) mobile networks and edge artificial intelligence (AI). However, current DMoE systems lack an effective expert selection algorithm to address the simultaneous task-expert relevance and channel diversity inherent in these systems. Traditional AI or communication systems focus on either performance or channel conditions, and direct application of these methods leads to high communication overhead or low performance. To address this, we propose the DMoE protocol to schedule the expert inference and inter-expert transmission. This protocol identifies expert selection and subcarrier allocation as key optimization problems. We formulate an expert selection problem by incorporating both AI performance and channel conditions, and further extend it to a Joint Expert and Subcarrier Allocation (JESA) problem for comprehensive AI and channel management within the DMoE framework. For the NP-hard expert selection problem, we introduce the Dynamic Expert Selection (DES) algorithm, which leverages a linear relaxation as a bounding criterion to significantly reduce search complexity. For the JESA problem, we discover a unique structural property that ensures asymptotic optimality in most scenarios. We propose an iterative algorithm that addresses subcarrier allocation as a subproblem and integrates it with the DES algorithm. The proposed framework effectively manages the tradeoff between task relevance and channel conditions through a tunable importance factor, enabling flexible adaptation to diverse scenarios. Numerical experiments validate the dual benefits of the proposed expert selection algorithm: high performance and significantly reduced cost.

Via

Access Paper or Ask Questions