Wentao Bao

BLENDER: Blended Text Embeddings and Diffusion Residuals for Intra-Class Image Synthesis in Deep Metric Learning

Jan 28, 2026
Via arXiv

Uni-Hand: Universal Hand Motion Forecasting in Egocentric Views

Nov 18, 2025
Via arXiv

Novel Diffusion Models for Multimodal 3D Hand Trajectory Prediction

Apr 10, 2025
Via arXiv

Window Token Concatenation for Efficient Visual Large Language Models

Apr 05, 2025
Via arXiv

Visual Large Language Models for Generalized and Specialized Applications

Jan 06, 2025
Via arXiv

Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection

Nov 17, 2024
Via arXiv

Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment

Sep 22, 2024
Via arXiv

MADiff: Motion-Aware Mamba Diffusion Models for Hand Trajectory Prediction on Egocentric Videos

Sep 04, 2024
Via arXiv

Facial Affective Behavior Analysis with Instruction Tuning

Apr 07, 2024
Via arXiv

Latent Space Energy-based Model for Fine-grained Open Set Recognition

Sep 19, 2023
Via arXiv