Junke Wang

Perception Encoder: The best visual embeddings are not at the output of the network

Apr 17, 2025
Via arXiv

SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL

Apr 15, 2025
Via arXiv

Pix2Cap-COCO: Advancing Visual Comprehension via Pixel-Level Captioning

Jan 23, 2025
Via arXiv

OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation

Jun 13, 2024
Via arXiv

OmniVid: A Generative Framework for Universal Video Understanding

Mar 26, 2024
Via arXiv

MouSi: Poly-Visual-Expert Vision-Language Models

Jan 30, 2024
Via arXiv

To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning

Nov 29, 2023
Via arXiv

ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System

Apr 29, 2023
Via arXiv

OmniTracker: Unifying Object Tracking by Tracking-with-Detection

Mar 21, 2023
Via arXiv

Look Before You Match: Instance Understanding Matters in Video Object Segmentation

Dec 13, 2022
Via arXiv