"Text": models, code, and papers

IPAD: Iterative, Parallel, and Diffusion-based Network for Scene Text Recognition

Dec 19, 2023
Xiaomeng Yang, Zhi Qiao, Yu Zhou, Weiping Wang

Via arXiv

CLAPP: Contrastive Language-Audio Pre-training in Passive Underwater Vessel Classification

Jan 15, 2024
Zeyu Li, Jingsheng Gao, Tong Yu, Suncheng Xiang, Jiacheng Ruan, Ting Liu, Yuzhuo Fu

Via arXiv

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

Jan 15, 2024
Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, Bin Li, Ping Luo, Tong Lu, Yu Qiao, Jifeng Dai

Via arXiv

Bridging Research and Readers: A Multi-Modal Automated Academic Papers Interpretation System

Jan 17, 2024
Feng Jiang, Kuang Wang, Haizhou Li

Via arXiv

LLM-as-a-Coauthor: The Challenges of Detecting LLM-Human Mixcase

Jan 11, 2024
Chujie Gao, Dongping Chen, Qihui Zhang, Yue Huang, Yao Wan, Lichao Sun

Via arXiv

Foundations of Vector Retrieval

Jan 17, 2024
Sebastian Bruch

Via arXiv

From Text to Pixels: A Context-Aware Semantic Synergy Solution for Infrared and Visible Image Fusion

Dec 31, 2023
Xingyuan Li, Yang Zou, Jinyuan Liu, Zhiying Jiang, Long Ma, Xin Fan, Risheng Liu

Via arXiv

PlasmoData.jl -- A Julia Framework for Modeling and Analyzing Complex Data as Graphs

Jan 21, 2024
David L Cole, Victor M Zavala

Via arXiv

Cross-Level Multi-Instance Distillation for Self-Supervised Fine-Grained Visual Categorization

Jan 16, 2024
Qi Bi, Wei Ji, Jingjun Yi, Haolan Zhan, Gui-Song Xia

Via arXiv

MMToM-QA: Multimodal Theory of Mind Question Answering

Jan 16, 2024
Chuanyang Jin, Yutong Wu, Jing Cao, Jiannan Xiang, Yen-Ling Kuo, Zhiting Hu, Tomer Ullman, Antonio Torralba, Joshua B. Tenenbaum, Tianmin Shu

Via arXiv