Sicong Leng

AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention

Jun 18, 2024

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Jun 11, 2024

Uncovering What, Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly

Apr 30, 2024

Constrained Layout Generation with Factor Graphs

Mar 30, 2024

Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding

Nov 28, 2023

Tell2Design: A Dataset for Language-Guided Floor Plan Generation

Nov 27, 2023

Speaker-Oriented Latent Structures for Dialogue-Based Relation Extraction

Sep 11, 2021

Interventional Video Grounding with Dual Contrastive Learning

Jul 07, 2021