Huiyu Wang

Zebrafish Counting Using Event Stream Data

Apr 18, 2025

PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

Apr 17, 2025

BIMBA: Selective-Scan Compression for Long-Range Video Question Answering

Mar 13, 2025

TimeRefine: Temporal Grounding with Time Refining Video LLM

Dec 12, 2024

MusicFlow: Cascaded Flow Matching for Text Guided Music Generation

Oct 27, 2024

Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos

Sep 30, 2024

VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs

Sep 30, 2024

Unlocking Exocentric Video-Language Data for Egocentric Video Representation Learning

Aug 07, 2024

Rethinking Video-Text Understanding: Retrieval from Counterfactually Augmented Data

Jul 18, 2024

Finding Dino: A plug-and-play framework for unsupervised detection of out-of-distribution objects using prototypes

Apr 11, 2024