Xiaoyu Yue

MSEarth: A Benchmark for Multimodal Scientific Comprehension of Earth Science

May 27, 2025

EarthSE: A Benchmark Evaluating Earth Scientific Exploration Capability for Large Language Models

May 22, 2025

Diffusion Models Need Visual Priors for Image Generation

Oct 11, 2024

OV-PARTS: Towards Open-Vocabulary Part Segmentation

Oct 08, 2023

Understanding Masked Autoencoders From a Local Contrastive Perspective

Oct 03, 2023

In Defense of Clip-based Video Relation Detection

Jul 18, 2023

Rethinking the Two-Stage Framework for Grounded Situation Recognition

Dec 10, 2021

MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding

Aug 14, 2021

Vision Transformer with Progressive Sampling

Aug 03, 2021

Spatial Dual-Modality Graph Reasoning for Key Information Extraction

Mar 26, 2021