"Text": models, code, and papers

A Benchmark for Chinese-English Scene Text Image Super-resolution

Aug 07, 2023
Jianqi Ma, Zhetong Liang, Wangmeng Xiang, Xi Yang, Lei Zhang

Multilingual Natural Language Processing Model for Radiology Reports -- The Summary is all you need!

Oct 06, 2023
Mariana Lindo, Ana Sofia Santos, André Ferreira, Jianning Li, Gijs Luijten, Gustavo Correia, Moon Kim, Jens Kleesiek, Jan Egger, Victor Alves

A Large-scale Dataset for Audio-Language Representation Learning

Oct 03, 2023
Luoyi Sun, Xuenan Xu, Mengyue Wu, Weidi Xie

From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models

Oct 18, 2023
Dongsheng Jiang, Yuchen Liu, Songlin Liu, Xiaopeng Zhang, Jin Li, Hongkai Xiong, Qi Tian

JsonTuning: Towards Generalizable, Robust, and Controllable Instruction Tuning

Oct 04, 2023
Chang Gao, Wenxuan Zhang, Guizhen Chen, Wai Lam

Can Language Models Employ the Socratic Method? Experiments with Code Debugging

Oct 04, 2023
Erfan Al-Hossami, Razvan Bunescu, Justin Smith, Ryan Teehan

MMHQA-ICL: Multimodal In-context Learning for Hybrid Question Answering over Text, Tables and Images

Sep 09, 2023
Weihao Liu, Fangyu Lei, Tongxu Luo, Jiahe Lei, Shizhu He, Jun Zhao, Kang Liu

BATINet: Background-Aware Text to Image Synthesis and Manipulation Network

Aug 11, 2023
Ryugo Morita, Zhiqiang Zhang, Jinjia Zhou

Fast-DetectGPT: Efficient Zero-Shot Detection of Machine-Generated Text via Conditional Probability Curvature

Oct 08, 2023
Guangsheng Bao, Yanbin Zhao, Zhiyang Teng, Linyi Yang, Yue Zhang

InstructDET: Diversifying Referring Object Detection with Generalized Instructions

Oct 17, 2023
Ronghao Dang, Jiangyan Feng, Haodong Zhang, Chongjian Ge, Lin Song, Lijun Gong, Chengju Liu, Qijun Chen, Feng Zhu, Rui Zhao, Yibing Song
