Shengyu Zhang

InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners

Apr 19, 2025

Disentangled Knowledge Tracing for Alleviating Cognitive Bias

Mar 04, 2025

AEIA-MN: Evaluating the Robustness of Multimodal LLM-Powered Mobile Agents Against Active Environmental Injection Attacks

Feb 18, 2025

InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning

Feb 17, 2025

Optimize Incompatible Parameters through Compatibility-aware Knowledge Integration

Jan 10, 2025

Collaboration of Large Language Models and Small Recommendation Models for Device-Cloud Recommendation

Jan 10, 2025

Cascaded Self-Evaluation Augmented Training for Efficient Multimodal Large Language Models

Jan 10, 2025

InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection

Jan 08, 2025

Forward Once for All: Structural Parameterized Adaptation for Efficient Cloud-coordinated On-device Recommendation

Jan 06, 2025

InfiFusion: A Unified Framework for Enhanced Cross-Model Reasoning via LLM Fusion

Jan 06, 2025