Xiaohua Zhai

Zero-shot Autonomous Microscopy for Scalable and Intelligent Characterization of 2D Materials

Apr 14, 2025

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Feb 20, 2025

Harnessing Language's Fractal Geometry with Recursive Inference Scaling

Feb 11, 2025

Scaling Pre-training to One Hundred Billion Data for Vision Language Models

Feb 11, 2025

PaliGemma 2: A Family of Versatile VLMs for Transfer

Dec 04, 2024

PaliGemma: A versatile 3B VLM for transfer

Jul 10, 2024

Toward a Diffusion-Based Generalist for Dense Vision Tasks

Jun 29, 2024

No Filter: Cultural and Socioeconomic Diversity in Contrastive Vision-Language Models

May 22, 2024

LocCa: Visual Pretraining with Location-aware Captioners

Mar 28, 2024

CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?

Mar 07, 2024