Heng Wang

OpenCUA: Open Foundations for Computer-Use Agents

Aug 12, 2025
Via arXiv

DesignLab: Designing Slides Through Iterative Detection and Correction

Jul 23, 2025
Via arXiv

Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model

Jun 10, 2025
Via arXiv

Unveiling the Hidden: Movie Genre and User Bias in Spoiler Detection

Apr 28, 2025
Via arXiv

Kimi-VL Technical Report

Apr 10, 2025
Via arXiv

Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources

Apr 02, 2025
Via arXiv

ROS-SAM: High-Quality Interactive Segmentation for Remote Sensing Moving Object

Mar 15, 2025
Via arXiv

BannerAgency: Advertising Banner Design with Multimodal LLM Agents

Mar 14, 2025
Via arXiv

Reward Shaping to Mitigate Reward Hacking in RLHF

Feb 26, 2025
Via arXiv

Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

Feb 18, 2025
Via arXiv