Qin Jin

mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding

Mar 19, 2024
Anwen Hu, Haiyang Xu, Jiabo Ye, Ming Yan, Liang Zhang, Bo Zhang, Chen Li, Ji Zhang, Qin Jin, Fei Huang, Jingren Zhou

Via arXiv

SPAFormer: Sequential 3D Part Assembly with Transformers

Mar 09, 2024
Boshen Xu, Sipeng Zheng, Qin Jin

Via arXiv

POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-View World

Mar 09, 2024
Boshen Xu, Sipeng Zheng, Qin Jin

Via arXiv

Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective

Feb 22, 2024
Zihao Yue, Liang Zhang, Qin Jin

Via arXiv

Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and KiSing-v2

Jan 31, 2024
Jiatong Shi, Yueqian Lin, Xinyi Bai, Keyi Zhang, Yuning Wu, Yuxun Tang, Yifeng Yu, Qin Jin, Shinji Watanabe

Via arXiv

UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model

Oct 08, 2023
Jiabo Ye, Anwen Hu, Haiyang Xu, Qinghao Ye, Ming Yan, Guohai Xu, Chenliang Li, Junfeng Tian, Qi Qian, Ji Zhang, Qin Jin, Liang He, Xin Alex Lin, Fei Huang

Via arXiv

Explore and Tell: Embodied Visual Captioning in 3D Environments

Aug 21, 2023
Anwen Hu, Shizhe Chen, Liang Zhang, Qin Jin

Via arXiv

A Systematic Exploration of Joint-training for Singing Voice Synthesis

Aug 05, 2023
Yuning Wu, Yifeng Yu, Jiatong Shi, Tao Qian, Qin Jin

Via arXiv

Visual Captioning at Will: Describing Images and Videos Guided by a Few Stylized Sentences

Jul 31, 2023
Dingyi Yang, Hongyu Chen, Xinglin Hou, Tiezheng Ge, Yuning Jiang, Qin Jin

Via arXiv

No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention and Zoom-in Boundary Detection

Jul 20, 2023
Qi Zhang, Sipeng Zheng, Qin Jin

Via arXiv