Jianwei Yang

Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion

Dec 05, 2024
Via arXiv

TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models

Oct 15, 2024
Via arXiv

Latent Action Pretraining from Videos

Oct 15, 2024
Via arXiv

Towards Flexible Visual Relationship Segmentation

Aug 15, 2024
Via arXiv

OmniParser for Pure Vision Based GUI Agent

Aug 01, 2024
Via arXiv

V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results

Jun 17, 2024
Via arXiv

DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs

Jun 06, 2024
Via arXiv

Matryoshka Multimodal Models

May 27, 2024
Via arXiv

BiomedParse: a biomedical foundation model for image parsing of everything everywhere all at once

May 21, 2024
Via arXiv

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

Apr 25, 2024
Via arXiv