Dohwan Ko

Large Language Models are Temporal and Causal Reasoners for Video Question Answering

Nov 06, 2023
Via arXiv

Open-vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models

Aug 18, 2023
Via arXiv

MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models

Mar 23, 2023
Via arXiv

Video-Text Representation Learning via Differentiable Weak Temporal Alignment

Mar 31, 2022
Via arXiv