Yanjie Wang

Advancing Sequential Numerical Prediction in Autoregressive Models

May 19, 2025
Via arXiv

Vision as LoRA

Mar 26, 2025
Via arXiv

EVE: Towards End-to-End Video Subtitle Extraction with Vision-Language Models

Mar 06, 2025
Via arXiv

Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM

Dec 12, 2024
Via arXiv

Perceptual-Distortion Balanced Image Super-Resolution is a Multi-Objective Optimization Problem

Sep 05, 2024
Via arXiv

A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding

Jul 02, 2024
Via arXiv

MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering

May 20, 2024
Via arXiv

Elysium: Exploring Object-level Perception in Videos via MLLM

Mar 29, 2024
Via arXiv

PaDeLLM-NER: Parallel Decoding in Large Language Models for Named Entity Recognition

Feb 15, 2024
Via arXiv

GloTSFormer: Global Video Text Spotting Transformer

Jan 08, 2024
Via arXiv