
Biao Yang

AFIDAF: Alternating Fourier and Image Domain Adaptive Filters as an Efficient Alternative to Attention in ViTs

Jul 16, 2024
Via arXiv

Exploring the Capabilities of Large Multimodal Models on Dense Text

May 09, 2024
Via arXiv

TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document

Mar 15, 2024
Via arXiv

Sequential Visual and Semantic Consistency for Semi-supervised Text Recognition

Feb 24, 2024
Via arXiv

Class-Aware Mask-Guided Feature Refinement for Scene Text Recognition

Feb 21, 2024
Via arXiv

Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Nov 24, 2023
Via arXiv

Looking and Listening: Audio Guided Text Recognition

Jun 06, 2023
Via arXiv

Feature Affinity Assisted Knowledge Distillation and Quantization of Deep Neural Networks on Label-Free Data

Feb 10, 2023
Via arXiv

Searching Intrinsic Dimensions of Vision Transformers

Apr 16, 2022
Via arXiv

NTIRE 2020 Challenge on Real-World Image Super-Resolution: Methods and Results

May 05, 2020
Via arXiv