Xiang Bai

Huazhong University of Science and Technology

MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks

Jun 07, 2024

Puzzle Pieces Picker: Deciphering Ancient Chinese Characters with Radical Reconstruction

Jun 05, 2024

Deciphering Oracle Bone Language with Diffusion Models

Jun 02, 2024

Dataset and Benchmark for Urdu Natural Scenes Text Detection, Recognition and Visual Question Answering

May 21, 2024

MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering

May 20, 2024

The First Swahili Language Scene Text Detection and Recognition Dataset

May 19, 2024

Exploring the Capabilities of Large Multimodal Models on Dense Text

May 09, 2024

VimTS: A Unified Video and Image Text Spotter for Enhancing the Cross-domain Generalization

Apr 30, 2024

TextSquare: Scaling up Text-Centric Visual Instruction Tuning

Apr 19, 2024

Bridging the Gap Between End-to-End and Two-Step Text Spotting

Apr 06, 2024