Yutong Dai

WALT: Web Agents that Learn Tools

Oct 01, 2025
Via arXiv

GUI-KV: Efficient GUI Agents via KV Cache with Spatio-Temporal Awareness

Oct 01, 2025
Via arXiv

SCUBA: Salesforce Computer Use Benchmark

Sep 30, 2025
Via arXiv

CoAct-1: Computer-using Agents with Coding as Actions

Aug 05, 2025
Via arXiv

Collaborative Learning of On-Device Small Model and Cloud-Based Large Model: Advances and Future Directions

Apr 17, 2025
Via arXiv

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Aug 16, 2024
Via arXiv

Variational Bayes for Federated Continual Learning

May 23, 2024
Via arXiv

Unleashing the Power of Multi-Task Learning: A Comprehensive Survey Spanning Traditional, Deep, and Pretrained Foundation Model Eras

Apr 29, 2024
Via arXiv

Toward Robust Imperceptible Perturbation against Unauthorized Text-to-image Diffusion-based Synthesis

Nov 22, 2023
Via arXiv

Joint Demosaicing and Denoising with Double Deep Image Priors

Sep 18, 2023
Via arXiv