Ser-Nam Lim

Facebook Research, New York, NY, USA

AC-Foley: Reference-Audio-Guided Video-to-Audio Synthesis with Acoustic Transfer

Mar 16, 2026
Via arXiv

Zero-shot Synthetic Video Realism Enhancement via Structure-aware Denoising

Nov 18, 2025
Via arXiv

Think Then Embed: Generative Context Improves Multimodal Embedding

Oct 06, 2025
Via arXiv

Delta Activations: A Representation for Finetuned Large Language Models

Sep 04, 2025
Via arXiv

When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding

Jun 05, 2025
Via arXiv

Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models

Apr 04, 2025
Via arXiv

VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention

Mar 20, 2025
Via arXiv

Temporal Regularization Makes Your Video Generator Stronger

Mar 19, 2025
Via arXiv

Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View

Mar 16, 2025
Via arXiv

VideoMerge: Towards Training-free Long Video Generation

Mar 13, 2025
Via arXiv