"Text": models, code, and papers

Scalable Multi-Robot Collaboration with Large Language Models: Centralized or Decentralized Systems?

Sep 27, 2023
Yongchao Chen, Jacob Arkin, Yang Zhang, Nicholas Roy, Chuchu Fan

Via arXiv

Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech

Jul 31, 2023
Guangyan Zhang, Thomas Merritt, Manuel Sam Ribeiro, Biel Tura-Vecino, Kayoko Yanagisawa, Kamil Pokora, Abdelhamid Ezzerg, Sebastian Cygert, Ammar Abbas, Piotr Bilinski, Roberto Barra-Chicote, Daniel Korzekwa, Jaime Lorenzo-Trueba

The Cambridge Law Corpus: A Corpus for Legal AI Research

Sep 21, 2023
Andreas Östling, Holli Sargeant, Huiyuan Xie, Ludwig Bull, Alexander Terenin, Leif Jonsson, Måns Magnusson, Felix Steffek

AMuRD: Annotated Multilingual Receipts Dataset for Cross-lingual Key Information Extraction and Classification

Sep 18, 2023
Abdelrahman Abdallah, Mahmoud Abdalla, Mohamed Elkasaby, Yasser Elbendary, Adam Jatowt

Unsupervised Open-Vocabulary Object Localization in Videos

Sep 18, 2023
Ke Fan, Zechen Bai, Tianjun Xiao, Dominik Zietlow, Max Horn, Zixu Zhao, Carl-Johann Simon-Gabriel, Mike Zheng Shou, Francesco Locatello, Bernt Schiele, Thomas Brox, Zheng Zhang, Yanwei Fu, Tong He

SlideSpeech: A Large-Scale Slide-Enriched Audio-Visual Corpus

Sep 12, 2023
Haoxu Wang, Fan Yu, Xian Shi, Yuezhang Wang, Shiliang Zhang, Ming Li

Does the "most sinfully decadent cake ever" taste good? Answering Yes/No Questions from Figurative Contexts

Sep 24, 2023
Geetanjali Rakshit, Jeffrey Flanigan

Unified Language-Vision Pretraining with Dynamic Discrete Visual Tokenization

Sep 09, 2023
Yang Jin, Kun Xu, Kun Xu, Liwei Chen, Chao Liao, Jianchao Tan, Bin Chen, Chenyi Lei, An Liu, Chengru Song, Xiaoqiang Lei, Yadong Mu, Di Zhang, Wenwu Ou, Kun Gai

Graph Representation Learning Towards Patents Network Analysis

Sep 25, 2023
Mohammad Heydari, Babak Teimourpour

Delving into Multimodal Prompting for Fine-grained Visual Classification

Sep 16, 2023
Xin Jiang, Hao Tang, Junyao Gao, Xiaoyu Du, Shengfeng He, Zechao Li
