Abstract:Table Question Answering (TableQA) enables natural language interaction with structured tabular data. However, existing large language model (LLM) approaches face critical limitations: context length constraints that restrict data handling capabilities, hallucination issues that compromise answer reliability, and single-agent architectures that struggle with complex reasoning scenarios involving semantic relationships and multi-hop logic. This paper introduces DataFactory, a multi-agent framework that addresses these limitations through specialized team coordination and automated knowledge transformation. The framework comprises a Data Leader employing the ReAct paradigm for reasoning orchestration, together with dedicated Database and Knowledge Graph teams, enabling the systematic decomposition of complex queries into structured and relational reasoning tasks. We formalize automated data-to-knowledge graph transformation via the mapping function T:D x S x R -> G, and implement natural language-based consultation that - unlike fixed workflow multi-agent systems - enables flexible inter-agent deliberation and adaptive planning to improve coordination robustness. We also apply context engineering strategies that integrate historical patterns and domain knowledge to reduce hallucinations and improve query accuracy. Across TabFact, WikiTableQuestions, and FeTaQA, using eight LLMs from five providers, results show consistent gains. Our approach improves accuracy by 20.2% (TabFact) and 23.9% (WikiTQ) over baselines, with significant effects (Cohen's d > 1). Team coordination also outperforms single-team variants (+5.5% TabFact, +14.4% WikiTQ, +17.1% FeTaQA ROUGE-2). The framework offers design guidelines for multi-agent collaboration and a practical platform for enterprise data analysis through integrated structured querying and graph-based knowledge representation.




Abstract:As autonomous driving systems (ADS) advance towards higher levels of autonomy, orchestrating their safety verification becomes increasingly intricate. This paper unveils ScenarioFuzz, a pioneering scenario-based fuzz testing methodology. Designed like a choreographer who understands the past performances, it uncovers vulnerabilities in ADS without the crutch of predefined scenarios. Leveraging map road networks, such as OPENDRIVE, we extract essential data to form a foundational scenario seed corpus. This corpus, enriched with pertinent information, provides the necessary boundaries for fuzz testing in the absence of starting scenarios. Our approach integrates specialized mutators and mutation techniques, combined with a graph neural network model, to predict and filter out high-risk scenario seeds, optimizing the fuzzing process using historical test data. Compared to other methods, our approach reduces the time cost by an average of 60.3%, while the number of error scenarios discovered per unit of time increases by 103%. Furthermore, we propose a self-supervised collision trajectory clustering method, which aids in identifying and summarizing 54 high-risk scenario categories prone to inducing ADS faults. Our experiments have successfully uncovered 58 bugs across six tested systems, emphasizing the critical safety concerns of ADS.