Yufan Zhou

LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding

Nov 02, 2024

TextLap: Customizing Language Models for Text-to-Layout Planning

Oct 09, 2024

Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models

Oct 04, 2024

MMR: Evaluating Reading Ability of Large Multimodal Models

Aug 26, 2024

LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models

Jul 27, 2024

Subspace Constrained Variational Bayesian Inference for Structured Compressive Sensing with a Dynamic Grid

Jul 24, 2024

ARTIST: Improving the Generation of Text-rich Images by Disentanglement

Jun 17, 2024

Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation

Jun 13, 2024

TRINS: Towards Multimodal Language Models that Can Read

Jun 10, 2024

Navigating the Dual Facets: A Comprehensive Evaluation of Sequential Memory Editing in Large Language Models

Feb 16, 2024