Wei Li

Tsinghua University, Beijing, China

Investigating Public Fine-Tuning Datasets: A Complex Review of Current Practices from a Construction Perspective

Jul 11, 2024

Generalizable Implicit Motion Modeling for Video Frame Interpolation

Jul 11, 2024

LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models

Jul 10, 2024

Music Era Recognition Using Supervised Contrastive Learning and Artist Information

Jul 07, 2024

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Jul 03, 2024

WildAvatar: Web-scale In-the-wild Video Dataset for 3D Avatar Creation

Jul 02, 2024

Comprehensive Generative Replay for Task-Incremental Segmentation with Concurrent Appearance and Semantic Forgetting

Jun 28, 2024

A Comprehensive Solution to Connect Speech Encoder and Large Language Model for ASR

Jun 25, 2024

GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization

Jun 24, 2024

video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models

Jun 22, 2024