Weihong Lin

DiaDem: Advancing Dialogue Descriptions in Audiovisual Video Captioning for Multimodal Large Language Models

Jan 27, 2026

Kling-Omni Technical Report

Dec 18, 2025

Evaluation is All You Need: Strategic Overclaiming of LLM Reasoning Capabilities Through Evaluation Design

Jun 05, 2025

Mavors: Multi-granularity Video Representation for Multimodal Large Language Model

Apr 14, 2025

TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation

Mar 06, 2025

HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models

Feb 28, 2025

UniVIE: A Unified Label Space Approach to Visual Information Extraction from Form-like Documents

Jan 17, 2024

A Question-Answering Approach to Key Value Pair Extraction from Form-like Document Images

Apr 17, 2023

Robust Table Structure Recognition with Dynamic Queries Enhanced Detection Transformer

Mar 21, 2023

Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning

Oct 03, 2022