Xinghua Jiang

HRVDA: High-Resolution Visual Document Assistant

Apr 10, 2024
Via arXiv

Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models

Feb 29, 2024
Via arXiv

AudioFormer: Audio Transformer learns audio feature representations from discrete acoustic codes

Aug 25, 2023
Via arXiv

OS-MSL: One Stage Multimodal Sequential Link Framework for Scene Segmentation and Classification

Jul 04, 2022
Via arXiv

The Devil is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-Training

Apr 18, 2022
Via arXiv

NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition

Nov 25, 2021
Via arXiv