Abstract:Diffusion Transformer (DiT) models excel at generating highquality images through iterative denoising steps, but serving them under strict Service Level Objectives (SLOs) is challenging due to their high computational cost, particularly at large resolutions. Existing serving systems use fixed degree sequence parallelism, which is inefficient for heterogeneous workloads with mixed resolutions and deadlines, leading to poor GPU utilization and low SLO attainment. In this paper, we propose step-level sequence parallelism to dynamically adjust the parallel degree of individual requests according to their deadlines. We present TetriServe, a DiT serving system that implements this strategy for highly efficient image generation. Specifically, TetriServe introduces a novel round-based scheduling mechanism that improves SLO attainment: (1) discretizing time into fixed rounds to make deadline-aware scheduling tractable, (2) adapting parallelism at the step level and minimize GPU hour consumption, and (3) jointly packing requests to minimize late completions. Extensive evaluation on state-of-the-art DiT models shows that TetriServe achieves up to 32% higher SLO attainment compared to existing solutions without degrading image quality.
Abstract:Unstructured data, such as text, images, audio, and video, comprises the vast majority of the world's information, yet it remains poorly supported by traditional data systems that rely on structured formats for computation. We argue for a new paradigm, which we call computing on unstructured data, built around three stages: extraction of latent structure, transformation of this structure through data processing techniques, and projection back into unstructured formats. This bi-directional pipeline allows unstructured data to benefit from the analytical power of structured computation, while preserving the richness and accessibility of unstructured representations for human and AI consumption. We illustrate this paradigm through two use cases and present the research components that need to be developed in a new data system called MXFlow.
Abstract:Cloud infrastructure is the cornerstone of the modern IT industry. However, managing this infrastructure effectively requires considerable manual effort from the DevOps engineering team. We make a case for developing AI agents powered by large language models (LLMs) to automate cloud infrastructure management tasks. In a preliminary study, we investigate the potential for AI agents to use different cloud/user interfaces such as software development kits (SDK), command line interfaces (CLI), Infrastructure-as-Code (IaC) platforms, and web portals. We report takeaways on their effectiveness on different management tasks, and identify research challenges and potential solutions.
Abstract:Automating AI research holds immense potential for accelerating scientific progress, yet current AI agents struggle with the complexities of rigorous, end-to-end experimentation. We introduce EXP-Bench, a novel benchmark designed to systematically evaluate AI agents on complete research experiments sourced from influential AI publications. Given a research question and incomplete starter code, EXP-Bench challenges AI agents to formulate hypotheses, design and implement experimental procedures, execute them, and analyze results. To enable the creation of such intricate and authentic tasks with high-fidelity, we design a semi-autonomous pipeline to extract and structure crucial experimental details from these research papers and their associated open-source code. With the pipeline, EXP-Bench curated 461 AI research tasks from 51 top-tier AI research papers. Evaluations of leading LLM-based agents, such as OpenHands and IterativeAgent on EXP-Bench demonstrate partial capabilities: while scores on individual experimental aspects such as design or implementation correctness occasionally reach 20-35%, the success rate for complete, executable experiments was a mere 0.5%. By identifying these bottlenecks and providing realistic step-by-step experiment procedures, EXP-Bench serves as a vital tool for future AI agents to improve their ability to conduct AI research experiments. EXP-Bench is open-sourced at https://github.com/Just-Curieous/Curie/tree/main/benchmark/exp_bench.
Abstract:Relational databases are central to modern data management, yet most data exists in unstructured forms like text documents. To bridge this gap, we leverage large language models (LLMs) to automatically synthesize a relational database by generating its schema and populating its tables from raw text. We introduce SQUiD, a novel neurosymbolic framework that decomposes this task into four stages, each with specialized techniques. Our experiments show that SQUiD consistently outperforms baselines across diverse datasets.
Abstract:Channel coherence time has been widely regarded as a critical parameter in the design of mobile systems. However, a prominent challenge lies in integrating electromagnetic (EM) polarization effects into the derivation of the channel coherence time. In this paper, we develop a framework to analyze the impact of polarization mismatch on the channel coherence time. Specifically, we first establish an EM channel model to capture the essence of EM wave propagation. Based on this model, we then derive the EM temporal correlation function, incorporating the effects of polarization mismatch and beam misalignment. Further, considering the random orientation of the mobile user equipment (UE), we derive a closed-form solution for the EM coherence time in the turning scenario. When the trajectory degenerates into a straight line, we also provide a closed-form lower bound on the EM coherence time. The simulation results validate our theoretical analysis and reveal that neglecting the EM polarization effects leads to overly optimistic estimates of the EM coherence time.
Abstract:The high computational complexity of the multiple signal classification (MUSIC) algorithm is mainly caused by the subspace decomposition and spectrum search, especially for frequent real-time applications or massive sensors. In this paper, we propose a low-complexity MUSIC algorithm from finite-precision arithmetic perspective. First, we analyze the computational bottlenecks of the classic low-complexity randomized unitary-based MUSIC (RU-MUSIC), formulating this computational issue as an inner product problem. Then, a mixed-precision method is introduced to address this problem. Specifically, this method partitions summations in inner products into blocks, where intra-block computations use low-precision arithmetic and inter-block sums use high-precision arithmetic. To further improve computational accuracy, we develop an adaptive-precision method that supports adaptive block sizes and multiple precision levels. Finally, simulation results show that the proposed finite-precision MUSIC design achieves direction-of-arrival (DOA) estimation performance similar to that using full-precision arithmetic while reducing more than 50\% computational cost.
Abstract:Scientific experimentation, a cornerstone of human progress, demands rigor in reliability, methodical control, and interpretability to yield meaningful results. Despite the growing capabilities of large language models (LLMs) in automating different aspects of the scientific process, automating rigorous experimentation remains a significant challenge. To address this gap, we propose Curie, an AI agent framework designed to embed rigor into the experimentation process through three key components: an intra-agent rigor module to enhance reliability, an inter-agent rigor module to maintain methodical control, and an experiment knowledge module to enhance interpretability. To evaluate Curie, we design a novel experimental benchmark composed of 46 questions across four computer science domains, derived from influential research papers, and widely adopted open-source projects. Compared to the strongest baseline tested, we achieve a 3.4$\times$ improvement in correctly answering experimental questions. Curie is open-sourced at https://github.com/Just-Curieous/Curie.
Abstract:With the help of Score Distillation Sampling(SDS) and the rapid development of various trainable 3D representations, Text-to-Image(T2I) diffusion models have been applied to 3D generation tasks and achieved considerable results. There are also some attempts toward the task of editing 3D objects leveraging this Text-to-3D pipeline. However, most methods currently focus on adding additional geometries, overwriting textures or both. But few of them can perform non-rigid transformation of 3D objects. For those who can perform non-rigid editing, on the other hand, suffer from low-resolution, lack of fidelity and poor flexibility. In order to address these issues, we present: Plasticine3D, a general, high-fidelity, photo-realistic and controllable non-rigid editing pipeline. Firstly, our work divides the editing process into a geometry editing stage and a texture editing stage to achieve more detailed and photo-realistic results ; Secondly, in order to perform non-rigid transformation with controllable results while maintain the fidelity towards original 3D models in the same time, we propose a multi-view-embedding(MVE) optimization strategy to ensure that the diffusion model learns the overall features of the original object and an embedding-fusion(EF) to control the degree of editing by adjusting the value of the fusing rate. We also design a geometry processing step before optimizing on the base geometry to cope with different needs of various editing tasks. Further more, to fully leverage the geometric prior from the original 3D object, we provide an optional replacement of score distillation sampling named score projection sampling(SPS) which enables us to directly perform optimization from the origin 3D mesh in most common median non-rigid editing scenarios. We demonstrate the effectiveness of our method on both the non-rigid 3D editing task and general 3D editing task.
Abstract:Positioning and sensing over wireless networks are imperative for many emerging applications. However, traditional wireless channel models cannot be used for sensing the attitude of the user equipment (UE), since they over-simplify the UE as a point target. In this paper, a comprehensive electromagnetic propagation modeling (EPM) based on electromagnetic theory is developed to precisely model the near-field channel. For the noise-free case, the EPM model establishes the non-linear functional dependence of observed signals on both the position and attitude of the UE. To address the difficulty in the non-linear coupling, we first propose to divide the distance domain into three regions, separated by the defined Phase ambiguity distance and Spacing constraint distance. Then, for each region, we obtain the closed-form solutions for joint position and attitude estimation with low complexity. Next, to investigate the impact of random noise on the joint estimation performance, the Ziv-Zakai bound (ZZB) is derived to yield useful insights. The expected Cram\'er-Rao bound (ECRB) is further provided to obtain the simplified closed-form expressions for the performance lower bounds. Our numerical results demonstrate that the derived ZZB can provide accurate predictions of the performance of estimators in all signal-to-noise ratio (SNR) regimes. More importantly, we achieve the millimeter-level accuracy in position estimation and attain the 0.1-level accuracy in attitude estimation.