Ruoxi Sun

Review, Refine, Repeat: Understanding Iterative Decoding of AI Agents with Dynamic Evaluation and Selection

Apr 02, 2025

Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL

Apr 01, 2025

Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies

Feb 04, 2025

SETS: Leveraging Self-Verification and Self-Correction for Improved Test-Time Scaling

Jan 31, 2025

Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments

Jan 18, 2025

Data-Centric Improvements for Enhancing Multi-Modal Understanding in Spoken Conversation Modeling

Dec 20, 2024

Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows

Nov 12, 2024

AI-Compass: A Comprehensive and Effective Multi-module Testing Tool for AI Systems

Nov 09, 2024

Edge Unlearning is Not "on Edge"! An Adaptive Exact Unlearning System on Resource-Constrained Devices

Oct 15, 2024

Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models

Oct 09, 2024