Yuhang Guo

HomeBench: Evaluating LLMs in Smart Homes with Valid and Invalid Instructions Across Single and Multiple Devices

May 26, 2025
Via arXiv

DocMEdit: Towards Document-Level Model Editing

May 26, 2025
Via arXiv

TransBench: Breaking Barriers for Transferable Graphical User Interface Agents in Dynamic Digital Environments

May 23, 2025
Via arXiv

Exploring In-Image Machine Translation with Real-World Background

May 21, 2025
Via arXiv

ToolSpectrum: Towards Personalized Tool Utilization for Large Language Models

May 19, 2025
Via arXiv

ReFF: Reinforcing Format Faithfulness in Language Models across Varied Tasks

Dec 12, 2024
Via arXiv

Optimizing Chain-of-Thought Reasoning: Tackling Arranging Bottleneck via Plan Augmentation

Oct 22, 2024
Via arXiv

FAME: Towards Factual Multi-Task Model Editing

Oct 07, 2024
Via arXiv

Deterministic Reversible Data Augmentation for Neural Machine Translation

Jun 04, 2024
Via arXiv

Medical Dialogue: A Survey of Categories, Methods, Evaluation and Challenges

May 17, 2024
Via arXiv