Yajie Zhou

TabQueryBench: A Query-Centric Benchmark for Synthetic Tabular Data

Jul 04, 2026

Jialin Zhang, Fenghao Dong, Yajie Zhou, Vyas Sekar, Shinan Liu

Abstract:Synthetic tabular data support use cases like data sharing, model development under access restrictions, and rapid prototyping of analytical workflows. Modern generative models are evaluated by their statistical similarity, correlation structure, privacy, and downstream machine-learning utility. However, such evaluations leave a gap: they rarely test the structure that matters for analytical queries. We present TabQueryBench, a query-centric benchmark that uses SQL-shaped analytical queries as structural assessors for synthetic data fidelity. It provides an extensible foundation for query-centric synthetic-data evaluation. From 12 public sources of analytical queries, TabQueryBench taxonomizes recurring cross-domain logic into 44 reusable query templates and grounds them to each dataset via a policy-guided template-to-SQL pipeline. This makes queries schema-aware while preserving comparability across generative models. Across 49 datasets and 11 generative models, it activates 10-12 templates per dataset, producing more than 100 executable SQL queries per dataset. Our systematic experiments show five main patterns. First, current tabular generative models can have good distance-based fidelity, but they still fall short on query-centric fidelity: RealTabFormer achieves the highest query-centric fidelity, but it only reaches 0.75 +/- 0.15 (REAL data score is 1.00). Second, tabular generative models struggle with very high-cardinality discrete support. Third, SOTA generative models preserve good global conditional query-centric fidelity, but fail more on local queries. Fourth, tail fidelity deteriorates as queries move toward the extreme tail; even the best model recovers only about 40.7% of real rare values. Finally, there is a fidelity-cost tradeoff in tabular generation: BayesNet offers the strongest tradeoff, with slightly lower query-centric fidelity but much lower generation cost.
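The query-centric idea in this abstract can be illustrated with a small sketch: run the same analytical SQL query on the real table and on a synthetic table, then compare the results. Everything below is an assumption made for illustration, including the table `t(category, amount)`, the toy rows, and the `query_agreement` score; it is not the benchmark's template pipeline or its actual fidelity metric.

```python
import sqlite3


def run_query(rows, sql):
    """Load rows into an in-memory table t(category, amount) and run sql.

    The query is expected to return (group_key, value) pairs.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (category TEXT, amount REAL)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = dict(conn.execute(sql).fetchall())
    conn.close()
    return result


def query_agreement(real_result, synth_result):
    """Hypothetical score in [0, 1], not the paper's metric.

    One minus the mean relative error over the groups present in the real
    result. A group missing from the synthetic result counts as full error.
    """
    if not real_result:
        return 1.0
    errors = []
    for key, real_value in real_result.items():
        if key not in synth_result:
            errors.append(1.0)
            continue
        denom = max(abs(real_value), 1e-12)
        errors.append(min(1.0, abs(real_value - synth_result[key]) / denom))
    return 1.0 - sum(errors) / len(errors)


# One grouped-aggregate query, applied identically to both tables.
SQL = "SELECT category, AVG(amount) FROM t GROUP BY category"

# Toy data: the synthetic table drops the rare category entirely.
real_rows = [("a", 10.0), ("a", 20.0), ("b", 5.0), ("rare", 100.0)]
synth_rows = [("a", 12.0), ("a", 18.0), ("b", 10.0)]

real_result = run_query(real_rows, SQL)
synth_result = run_query(synth_rows, SQL)
score = query_agreement(real_result, synth_result)
print(f"agreement: {score:.2f}")
```

In this toy case the synthetic table matches the average for `a` exactly, yet the score is low because the `b` average is off and the rare group is missing. That is the kind of gap a distance-based summary of the marginals could overlook.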

AIChilles: Automatically Uncovering Hidden Weaknesses in AI-Evolved Systems

Jun 14, 2026

Yajie Zhou, Ao Li, Ashwin Silla, Zaoxing Liu, Vyas Sekar

Abstract:The computer systems community has recently seen growing interest in AI-driven system evolution, where AI agents iteratively rewrite systems. Frameworks such as AdaEvolve and Engram report 12-60% score improvements over human-designed algorithms. While these results are promising, there are practical concerns that these AI-evolved programs can perform worse on unseen workloads and exhibit scalability regressions. Given the speed and scale of AI-generated code, we need automated mechanisms to uncover such hidden weaknesses in AI-evolved systems programs. To this end, we develop AIChilles, which takes as input a baseline program $P$ and an AI-evolved program $P'$ and searches for valid workloads where $P'$ regresses relative to $P$ in correctness, runtime, memory usage, or output quality. To tackle the diversity in system applications, weakness types, and potential bugs, AIChilles combines deterministic workload-parameter extraction, agent-based constraint inference, differential oracles, and code-frequency coverage to discover diverse failures. Across five system applications and 30 AI-evolved programs, AIChilles finds 49 distinct hidden weaknesses. We also show that explicitly including AIChilles in the AI-driven development lifecycle can mitigate several of these weaknesses.
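A differential oracle of the kind this abstract mentions can be sketched in a few lines: run a baseline $P$ and an evolved $P'$ on the same valid workloads and flag any workload where $P'$ does worse. The two bin-packing functions, the workload grid, and the exhaustive search below are hypothetical stand-ins chosen for illustration; they are not AIChilles's extraction, constraint inference, or coverage machinery.

```python
import itertools


def baseline_p(items, capacity):
    """Baseline P (illustrative): first-fit bin packing, returns bins used."""
    free_space = []  # remaining capacity of each open bin
    for size in items:
        for i, free in enumerate(free_space):
            if size <= free:
                free_space[i] = free - size
                break
        else:
            # No open bin fits this item, so open a new one.
            free_space.append(capacity - size)
    return len(free_space)


def evolved_p(items, capacity):
    """Hypothetical P': next-fit, which only looks at the most recent bin.

    It does less work per item but can use more bins on some workloads.
    """
    bins = 0
    free = 0
    for size in items:
        if size > free:
            bins += 1
            free = capacity
        free -= size
    return bins


def find_regressions(capacity=10, sizes=(2, 5, 8), length=3):
    """Enumerate valid workloads (every size <= capacity) and return those
    where P' needs more bins than P, i.e. an output-quality regression."""
    regressions = []
    for workload in itertools.product(sizes, repeat=length):
        base = baseline_p(workload, capacity)
        evolved = evolved_p(workload, capacity)
        if evolved > base:
            regressions.append((workload, base, evolved))
    return regressions


for workload, base, evolved in find_regressions():
    print(f"workload={workload}: P uses {base} bins, P' uses {evolved}")
```

The oracle here compares only output quality (bins used). The same loop could compare correctness, runtime, or memory by swapping the comparison, and a real search would need something smarter than exhaustive enumeration once the workload space grows.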

AFLoRA: Adaptive Federated Fine-Tuning of Large Language Models with Resource-Aware Low-Rank Adaption

May 30, 2025

Yajie Zhou, Xiaoyi Pang, Zhibo Wang

Abstract:Federated fine-tuning has emerged as a promising approach to adapt foundation models to downstream tasks using decentralized data. However, real-world deployment remains challenging due to the high computational and communication demands of fine-tuning Large Language Models (LLMs) on clients with data and system resources that are heterogeneous and constrained. In such settings, the global model's performance is often bottlenecked by the weakest clients and further degraded by the non-IID nature of local data. Although existing methods leverage parameter-efficient techniques such as Low-Rank Adaptation (LoRA) to reduce communication and computation overhead, they often fail to simultaneously ensure accurate aggregation of low-rank updates and maintain low system costs, thereby hindering overall performance. To address these challenges, we propose AFLoRA, an adaptive and lightweight federated fine-tuning framework for LLMs. AFLoRA decouples shared and client-specific updates to reduce overhead and improve aggregation accuracy, incorporates diagonal matrix-based rank pruning to better utilize local resources, and employs rank-aware aggregation with public data refinement to strengthen generalization under data heterogeneity. Extensive experiments demonstrate that AFLoRA outperforms state-of-the-art methods in both accuracy and efficiency, providing a practical solution for efficient LLM adaptation in heterogeneous environments in the real world.
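One commonly discussed reason that aggregating low-rank updates can be inaccurate is that averaging the LoRA factors $B$ and $A$ separately is not the same as averaging the full updates $BA$. The sketch below shows this with two toy rank-1 clients in pure Python. The matrices and helper functions are assumptions made for illustration, and this is not AFLoRA's aggregation scheme, which the abstract does not spell out.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


def mean_matrices(matrices):
    """Element-wise mean of equally shaped matrices."""
    n = len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(m[i][j] for m in matrices) / n for j in range(cols)]
            for i in range(rows)]


# Two hypothetical clients, each holding a rank-1 update B_i @ A_i (2x2).
client_b = [[[1.0], [0.0]], [[0.0], [1.0]]]  # B_1, B_2 with shape 2x1
client_a = [[[1.0, 0.0]], [[0.0, 1.0]]]      # A_1, A_2 with shape 1x2

# Target: the mean of the full updates.
exact = mean_matrices([matmul(b, a) for b, a in zip(client_b, client_a)])

# Naive: average the factors separately, then multiply.
naive = matmul(mean_matrices(client_b), mean_matrices(client_a))

print("mean of products:", exact)
print("product of means:", naive)
```

In this toy case the mean of the products is a rank-2 matrix, while the product of the averaged factors has rank 1 and spreads mass onto off-diagonal entries that no client produced.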

Enhancing Network Management Using Code Generated by Large Language Models

Aug 11, 2023

Sathiya Kumaran Mani, Yajie Zhou, Kevin Hsieh, Santiago Segarra, Ranveer Chandra, Srikanth Kandula

Abstract:Analyzing network topologies and communication graphs plays a crucial role in contemporary network management. However, the absence of a cohesive approach leads to a challenging learning curve, heightened errors, and inefficiencies. In this paper, we introduce a novel approach to facilitate a natural-language-based network management experience, utilizing large language models (LLMs) to generate task-specific code from natural language queries. This method tackles the challenges of explainability, scalability, and privacy by allowing network operators to inspect the generated code, eliminating the need to share network data with LLMs, and concentrating on application-specific requests combined with general program synthesis techniques. We design and evaluate a prototype system using benchmark applications, showcasing high accuracy, cost-effectiveness, and the potential for further enhancements using complementary program synthesis techniques.
