Junyu Gao

Do MLLMs Really See It: Reinforcing Visual Attention in Multimodal LLMs

Feb 09, 2026
Via arXiv

One-Shot Crowd Counting With Density Guidance For Scene Adaptation

Feb 08, 2026
Via arXiv

FusAD: Time-Frequency Fusion with Adaptive Denoising for General Time Series Analysis

Dec 16, 2025
Via arXiv

Exploring the Underwater World Segmentation without Extra Training

Nov 11, 2025
Via arXiv

Secure Tug-of-War (SecTOW): Iterative Defense-Attack Training with Reinforcement Learning for Multimodal Model Security

Jul 29, 2025
Via arXiv

WebUIBench: A Comprehensive Benchmark for Evaluating Multimodal Large Language Models in WebUI-to-Code

Jun 09, 2025
Via arXiv

LLMs Caught in the Crossfire: Malware Requests and Jailbreak Challenges

Jun 09, 2025
Via arXiv

Scale Efficient Training for Large Datasets

Mar 17, 2025
Via arXiv

From Captions to Rewards (CAREVL): Leveraging Large Language Model Experts for Enhanced Reward Modeling in Large Vision-Language Models

Mar 08, 2025
Via arXiv

A Benchmark for Multi-Lingual Vision-Language Learning in Remote Sensing Image Captioning

Mar 06, 2025
Via arXiv