Xinhao Kong

Phantora: Live GPU Cluster Simulation for Machine Learning System Performance Estimation

May 02, 2025

Jianxing Qin, Jingrong Chen, Xinhao Kong, Yongji Wu, Liang Luo, Zhaodong Wang, Ying Zhang, Tingjun Chen, Alvin R. Lebeck, Danyang Zhuo

Abstract: To accommodate ever-increasing model complexity, modern machine learning (ML) systems have to scale to large GPU clusters. Changes in ML model architecture, ML system implementation, and cluster configuration can significantly affect overall ML system performance. However, quantifying the performance impact before deployment is challenging. Existing performance estimation methods use performance modeling or static workload simulation. These techniques are not general: they require significant human effort and computation capacity to generate training data or a workload. It is also difficult to adapt ML systems to use these techniques. This paper introduces Phantora, a live GPU cluster simulator for performance estimation. Phantora runs minimally modified ML models and frameworks, intercepting and simulating GPU-related operations to enable high-fidelity performance estimation. Phantora overcomes several research challenges in integrating an event-driven network simulator with live system execution, and introduces a set of techniques to improve simulation speed, scalability, and accuracy. Our evaluation results show that Phantora can deliver similar estimation accuracy to the state-of-the-art workload simulation approach with only one GPU, while reducing human effort and increasing generalizability.
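
To make "intercepting and simulating GPU-related operations" more concrete, here is a toy sketch of the virtual-clock idea only; it is not Phantora's implementation. Intercepted kernel launches and collectives are never executed; they just advance per-rank simulated clocks. The class `ToyClusterSim`, the cost table `KERNEL_COST`, and the uniform link-bandwidth model are hypothetical, and the closed-form ring all-reduce cost 2(n-1)/n * S / B stands in for the event-driven network simulator the abstract describes.

```python
# Hypothetical per-kernel costs in seconds; a real system would obtain
# estimates like these by measuring kernels on the single physical GPU.
KERNEL_COST = {"matmul": 2.0e-3, "softmax": 0.5e-3}


class ToyClusterSim:
    """Toy virtual-clock model: intercepted GPU ops are never executed,
    they only advance each simulated rank's clock."""

    def __init__(self, num_ranks: int, link_bandwidth: float):
        self.num_ranks = num_ranks
        self.link_bandwidth = link_bandwidth  # bytes/second, assumed uniform
        self.clock = [0.0] * num_ranks        # simulated time per rank
        self.log = []                         # (finish_time, op_name) records

    def launch_kernel(self, rank: int, kernel: str) -> None:
        # Intercepted compute op: charge its estimated cost to this rank only.
        self.clock[rank] += KERNEL_COST[kernel]
        self.log.append((self.clock[rank], f"rank{rank}:{kernel}"))

    def all_reduce(self, nbytes: int) -> float:
        # Intercepted collective: it cannot start until the slowest rank
        # arrives, then costs the ring all-reduce bandwidth term.
        n = self.num_ranks
        start = max(self.clock)
        finish = start + 2 * (n - 1) / n * nbytes / self.link_bandwidth
        self.clock = [finish] * n  # every rank leaves the collective together
        self.log.append((finish, f"all_reduce:{nbytes}B"))
        return finish
```

For example, with `ToyClusterSim(num_ranks=4, link_bandwidth=1e9)`, a `matmul` on rank 0, a `softmax` on rank 1, and then `all_reduce(1_000_000)`, the collective finishes at about 3.5 ms: it waits for rank 0's 2 ms matmul and then spends 1.5 ms transferring data.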

OminiAdapt: Learning Cross-Task Invariance for Robust and Environment-Aware Robotic Manipulation

Mar 27, 2025

Yongxu Wang, Weiyun Yi, Xinhao Kong, Wanting Li

Abstract: With the rapid development of embodied intelligence, leveraging large-scale human data for high-level imitation learning on humanoid robots has become a focal point of interest in both academia and industry. However, applying humanoid robots to precision operation domains remains challenging due to the complexities they face in perception and control processes, the long-standing physical differences in morphology and actuation mechanisms between humanoid robots and humans, and the lack of task-relevant features obtained from egocentric vision. To address the issue of covariate shift in imitation learning, this paper proposes an imitation learning algorithm tailored for humanoid robots. By focusing on the primary task objectives, filtering out background information, and incorporating channel feature fusion with spatial attention mechanisms, the proposed algorithm suppresses environmental disturbances and utilizes a dynamic weight update strategy to significantly improve the success rate of humanoid robots in accomplishing target tasks. Experimental results demonstrate that the proposed method exhibits robustness and scalability across various typical task scenarios, providing new ideas and approaches for autonomous learning and control in humanoid robots. The project will be open-sourced on GitHub.
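
The abstract mentions spatial attention as a way to suppress background information. As a rough illustration only, and not the OminiAdapt architecture, the sketch below implements a simplified CBAM-style spatial-attention gate on a C×H×W feature map in plain Python. At each location it pools across channels by average and max, mixes the two with scalar weights (a stand-in for CBAM's 7×7 convolution), and rescales every channel by the sigmoid of the result. The parameters `w_avg`, `w_max`, and `bias` are hypothetical; in a trained model they would be learned.

```python
import math


def spatial_attention(features, w_avg=1.0, w_max=1.0, bias=0.0):
    """Simplified spatial-attention gate.

    features: C x H x W nested lists of floats.
    Returns a new C x H x W map where each location is scaled by a gate in (0, 1).
    """
    C = len(features)
    H, W = len(features[0]), len(features[0][0])
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for i in range(H):
        for j in range(W):
            # Pool across channels at this spatial location.
            column = [features[c][i][j] for c in range(C)]
            avg, mx = sum(column) / C, max(column)
            # One gate per location: strongly activated locations approach 1.
            gate = 1.0 / (1.0 + math.exp(-(w_avg * avg + w_max * mx + bias)))
            for c in range(C):
                out[c][i][j] = features[c][i][j] * gate
    return out
```

A location whose channels are all weak receives a gate near `sigmoid(bias)`, while a strongly activated location keeps nearly all of its magnitude, which is the intuition behind using such a gate to down-weight background regions.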

Conveyor: Efficient Tool-aware LLM Serving with Tool Partial Execution

May 29, 2024

Yechen Xu, Xinhao Kong, Tingjun Chen, Danyang Zhuo

Abstract: The complexity of large language model (LLM) serving workloads has substantially increased due to integration with external tool invocations, such as ChatGPT plugins. In this paper, we identify a new opportunity for efficient LLM serving for requests that trigger tools: tool partial execution alongside LLM decoding. To this end, we design Conveyor, an efficient LLM serving system optimized for handling requests involving external tools. We introduce a novel interface for tool developers to expose partial execution opportunities to the LLM serving system and a request scheduler that facilitates partial tool execution. Our results demonstrate that tool partial execution can improve request completion latency by up to 38.8%.

* 11 pages, 8 figures
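
The core idea, starting a tool while the LLM is still decoding its call, can be illustrated with a small sketch. It assumes a hypothetical tool whose call is a sequence of newline-separated, independent commands, so each complete line can be dispatched to a worker as soon as it has been decoded instead of waiting for the whole call. `LinePartialTool`, `feed`, `flush`, and `serve` are illustrative names, not Conveyor's actual interface or scheduler.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


class LinePartialTool:
    """Hypothetical tool: every newline-terminated line of the tool call
    is an independent unit that may start executing immediately."""

    def __init__(self, run_unit: Callable[[str], str]):
        self.run_unit = run_unit
        self.buffer = ""

    def feed(self, token: str) -> List[str]:
        # Accumulate decoded text and return the lines that are now complete;
        # the trailing partial line stays buffered.
        self.buffer += token
        *complete, self.buffer = self.buffer.split("\n")
        return [line for line in complete if line.strip()]

    def flush(self) -> List[str]:
        # Decoding ended: whatever is left is the final unit.
        rest, self.buffer = self.buffer, ""
        return [rest] if rest.strip() else []


def serve(tokens: Iterable[str], tool: LinePartialTool,
          pool: ThreadPoolExecutor) -> List[str]:
    futures = []
    for token in tokens:              # stands in for the LLM decoding loop
        for unit in tool.feed(token):  # a unit became executable mid-decode
            futures.append(pool.submit(tool.run_unit, unit))
    for unit in tool.flush():
        futures.append(pool.submit(tool.run_unit, unit))
    return [f.result() for f in futures]  # results in submission order
```

For instance, streaming the tokens `"fetch a\nfe"`, `"tch b\n"`, `"fetch c"` dispatches `fetch a` after the first token and `fetch b` after the second, so those units overlap with the rest of decoding; only `fetch c` has to wait until decoding ends.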