Jingkang Yang

Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey

Jul 31, 2024

LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models

Jul 17, 2024

Long Context Transfer from Language to Vision

Jun 24, 2024

4D Panoptic Scene Graph Generation

May 16, 2024

WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning

May 06, 2024

Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models

Mar 29, 2024

Towards Language-Driven Video Inpainting via Multimodal Large Language Models

Jan 18, 2024

Panoptic Video Scene Graph Generation

Nov 28, 2023

OtterHD: A High-Resolution Multi-modality Model

Nov 07, 2023

Large Language Models are Visual Reasoning Coordinators

Oct 23, 2023