Xudong Lin

Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

Jun 19, 2024
Via arXiv

Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies

Jun 16, 2024
Via arXiv

BLINK: Multimodal Large Language Models Can See but Not Perceive

Apr 18, 2024
Via arXiv

SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos

Mar 03, 2024
Via arXiv

Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning

Jan 18, 2024
Via arXiv

InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models

Dec 04, 2023
Via arXiv

Video Summarization: Towards Entity-Aware Captions

Dec 01, 2023
Via arXiv

Non-Sequential Graph Script Induction via Multimedia Grounding

May 27, 2023
Via arXiv

Language Models are Causal Knowledge Extractors for Zero-shot Video Question Answering

Apr 07, 2023
Via arXiv

Supervised Masked Knowledge Distillation for Few-Shot Transformers

Mar 29, 2023
Via arXiv