Jitao Sang

Evaluate-as-Action: Self-Evaluated Process Rewards for Retrieval-Augmented Agents

Mar 12, 2026

HulluEdit: Single-Pass Evidence-Consistent Subspace Editing for Mitigating Hallucinations in Large Vision-Language Models

Feb 26, 2026

GUITester: Enabling GUI Agents for Exploratory Defect Discovery

Jan 08, 2026

TiMem: Temporal-Hierarchical Memory Consolidation for Long-Horizon Conversational Agents

Jan 06, 2026

Reasoning Shapes Alignment: Investigating Cultural Alignment in Large Reasoning Models with Cultural Norms

Nov 17, 2025

Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks

Oct 14, 2025

ReInAgent: A Context-Aware GUI Agent Enabling Human-in-the-Loop Mobile Task Navigation

Oct 09, 2025

Efficient Video-to-Audio Generation via Multiple Foundation Models Mapper

Sep 05, 2025

NAP-Tuning: Neural Augmented Prompt Tuning for Adversarially Robust Vision-Language Models

Jun 15, 2025

Mobile-Agent-V: A Video-Guided Approach for Effortless and Efficient Operational Knowledge Injection in Mobile Automation

May 21, 2025