Bin Bi

Claire

PAFT: A Parallel Training Paradigm for Effective LLM Fine-Tuning

Jun 25, 2024
Via arXiv

BUS: Efficient and Effective Vision-language Pre-training with Bottom-Up Patch Summarization

Jul 17, 2023
Via arXiv

mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video

Feb 01, 2023
Via arXiv

mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections

May 25, 2022
Via arXiv

Achieving Human Parity on Visual Question Answering

Nov 19, 2021
Via arXiv

Grid-VLP: Revisiting Grid Features for Vision-Language Pre-training

Aug 21, 2021
Via arXiv

E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual Learning

Jun 04, 2021
Via arXiv

StructuralLM: Structural Pre-training for Form Understanding

May 24, 2021
Via arXiv

SemVLP: Vision-Language Pre-training by Aligning Semantics at Multiple Levels

Mar 14, 2021
Via arXiv

Latent Template Induction with Gumbel-CRFs

Nov 29, 2020
Via arXiv