Zhaoyang Liu

MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

Jul 30, 2024
Via arXiv

VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks

Jun 12, 2024
Via arXiv

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling

Jun 06, 2024
Via arXiv

LLMs Meet Multimodal Generation and Editing: A Survey

May 29, 2024
Via arXiv

Paths of A Million People: Extracting Life Trajectories from Wikipedia

May 25, 2024
Via arXiv

Linear Gaussian Bounding Box Representation and Ring-Shaped Rotated Convolution for Oriented Object Detection

Nov 14, 2023
Via arXiv

ControlLLM: Augment Language Models with Tools by Searching on Graphs

Oct 30, 2023
Via arXiv

Data-Juicer: A One-Stop Data Processing System for Large Language Models

Sep 05, 2023
Via arXiv

InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language

May 11, 2023
Via arXiv

VLG: General Video Recognition with Web Textual Knowledge

Dec 03, 2022
Via arXiv