Longyin Wen

AIPO: Improving Training Objective for Iterative Preference Optimization

Sep 13, 2024
Via arXiv

Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model

Jun 15, 2024
Via arXiv

CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts

May 09, 2024
Via arXiv

Edit3K: Universal Representation Learning for Video Editing Components

Mar 24, 2024
Via arXiv

Accurate and Fast Compressed Video Captioning

Sep 22, 2023
Via arXiv

Exploring the Role of Audio in Video Captioning

Jun 21, 2023
Via arXiv

Text with Knowledge Graph Augmented Transformer for Video Captioning

Mar 25, 2023
Via arXiv

DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training

Mar 06, 2023
Via arXiv

Dual-Stream Transformer for Generic Event Boundary Captioning

Jul 07, 2022
Via arXiv

SC-Transformer++: Structured Context Transformer for Generic Event Boundary Detection

Jun 25, 2022
Via arXiv