Qin Jin

Renmin University of China

ECR-Chain: Advancing Generative Language Models to Better Emotion-Cause Reasoners through Reasoning Chains

May 17, 2024

TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning

Apr 25, 2024

Think-Program-reCtify: 3D Situated Reasoning with Large Language Models

Apr 23, 2024

Movie101v2: Improved Movie Narration Benchmark

Apr 20, 2024

mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding

Mar 19, 2024

POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-View World

Mar 09, 2024

SPAFormer: Sequential 3D Part Assembly with Transformers

Mar 09, 2024

Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective

Feb 22, 2024

Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and KiSing-v2

Jan 31, 2024

UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model

Oct 08, 2023