Abstract:Failures in large-scale cloud systems incur substantial financial losses, making automated Root Cause Analysis (RCA) essential for operational stability. Recent efforts leverage Large Language Model (LLM) agents to automate this task, yet existing systems exhibit low detection accuracy even with capable models, and current evaluation frameworks assess only final answer correctness without revealing why the agent's reasoning failed. This paper presents a process level failure analysis of LLM-based RCA agents. We execute the full OpenRCA benchmark across five LLM models, producing 1,675 agent runs, and classify observed failures into 12 pitfall types across intra-agent reasoning, inter-agent communication, and agent-environment interaction. Our analysis reveals that the most prevalent pitfalls, notably hallucinated data interpretation and incomplete exploration, persist across all models regardless of capability tier, indicating that these failures originate from the shared agent architecture rather than from individual model limitations. Controlled mitigation experiments further show that prompt engineering alone cannot resolve the dominant pitfalls, whereas enriching the inter-agent communication protocol reduces communication-related failures by up to 15 percentage points. The pitfall taxonomy and diagnostic methodology developed in this work provide a foundation for designing more reliable autonomous agents for cloud RCA.




Abstract:Advancements in quantum computing underscore the critical need for sophisticated qubit readout techniques to accurately discern quantum states. This abstract presents our research intended for optimizing readout pulse fidelity for 2D and 3D Quantum Processing Units (QPUs), the latter coupled with Superconducting Radio Frequency (SRF) cavities. Focusing specifically on the application of the Least Mean Squares (LMS) adaptive filtering algorithm, we explore its integration into the FPGA-based control systems to enhance the accuracy and efficiency of qubit state detection by improving Signal-to-Noise Ratio (SNR). Implementing the LMS algorithm on the Zynq UltraScale+ RFSoC Gen 3 devices (RFSoC 4x2 FPGA and ZCU216 FPGA) using the Quantum Instrumentation Control Kit (QICK) open-source platform, we aim to dynamically test and adjust the filtering parameters in real-time to characterize and adapt to the noise profile presented in quantum computing readout signals. Our preliminary results demonstrate the LMS filter's capability to maintain high readout accuracy while efficiently managing FPGA resources. These findings are expected to contribute to developing more reliable and scalable quantum computing architectures, highlighting the pivotal role of adaptive signal processing in quantum technology advancements.




Abstract:This paper presents a method for building a personalized open-domain dialogue system to address the $\textit{WWH}$ ($\textit{WHAT}$, $\textit{WHEN}$, and $\textit{HOW}$) problem for natural response generation in a commercial setting, where personalized dialogue responses are heavily interleaved with casual response turns. The proposed approach involves weighted dataset blending, negative persona information augmentation methods, and the design of personalized conversation datasets to address the challenges of $\textit{WWH}$ in personalized, open-domain dialogue systems. Our work effectively balances dialogue fluency and tendency to ground, while also introducing a response-type label to improve the controllability and explainability of the grounded responses. The combination of these methods leads to more fluent conversations, as evidenced by subjective human evaluations as well as objective evaluations.


Abstract:When training a machine learning model with observational data, it is often encountered that some values are systemically missing. Learning from the incomplete data in which the missingness depends on some covariates may lead to biased estimation of parameters and even harm the fairness of decision outcome. This paper proposes how to adjust the causal effect of covariates on the missingness when training models using stochastic gradient descent (SGD). Inspired by the design of doubly robust estimator and its theoretical property of double robustness, we introduce stochastic doubly robust gradient (SDRG) consisting of two models: weight-corrected gradients for inverse propensity score weighting and per-covariate control variates for regression adjustment. Also, we identify the connection between double robustness and variance reduction in SGD by demonstrating the SDRG algorithm with a unifying framework for variance reduced SGD. The performance of our approach is empirically tested by showing the convergence in training image classifiers with several examples of missing data.