Sicong Leng

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Jun 11, 2024
Via arXiv

Uncovering What, Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly

Apr 30, 2024
Via arXiv

Constrained Layout Generation with Factor Graphs

Mar 30, 2024
Via arXiv

Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding

Nov 28, 2023
Via arXiv

Tell2Design: A Dataset for Language-Guided Floor Plan Generation

Nov 27, 2023
Via arXiv

Speaker-Oriented Latent Structures for Dialogue-Based Relation Extraction

Sep 11, 2021
Via arXiv

Interventional Video Grounding with Dual Contrastive Learning

Jul 07, 2021
Via arXiv