
Hu Xu

Jack

VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice

Jan 08, 2026

In Pursuit of Pixel Supervision for Visual Pre-training

Dec 17, 2025

MetaCLIP 2: A Worldwide Scaling Recipe

Jul 29, 2025

GM-LDM: Latent Diffusion Model for Brain Biomarker Identification through Functional Data-Driven Gray Matter Synthesis

Jun 15, 2025

Perception Encoder: The best visual embeddings are not at the output of the network

Apr 17, 2025

SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models

Feb 13, 2025

General Information Metrics for Improving AI Model Training Efficiency

Jan 02, 2025

DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment

Dec 20, 2024

Altogether: Image Captioning via Re-aligning Alt-text

Oct 22, 2024

LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding

Oct 22, 2024