"Text": models, code, and papers

Polyp-SAM++: Can A Text Guided SAM Perform Better for Polyp Segmentation?

Aug 12, 2023
Risab Biswas

Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering

Sep 29, 2023
Weizhe Lin, Jinghong Chen, Jingbiao Mei, Alexandru Coca, Bill Byrne

Masked-Attention Diffusion Guidance for Spatially Controlling Text-to-Image Generation

Aug 11, 2023
Yuki Endo

TeCH: Text-guided Reconstruction of Lifelike Clothed Humans

Aug 19, 2023
Yangyi Huang, Hongwei Yi, Yuliang Xiu, Tingting Liao, Jiaxiang Tang, Deng Cai, Justus Thies

MASON-NLP at eRisk 2023: Deep Learning-Based Detection of Depression Symptoms from Social Media Texts

Oct 17, 2023
Fardin Ahsan Sakib, Ahnaf Atef Choudhury, Ozlem Uzuner

A Discourse-level Multi-scale Prosodic Model for Fine-grained Emotion Analysis

Sep 21, 2023
Xianhao Wei, Jia Jia, Xiang Li, Zhiyong Wu, Ziyi Wang

MultiWay-Adapater: Adapting large-scale multi-modal models for scalable image-text retrieval

Sep 04, 2023
Zijun Long, George Killick, Richard McCreadie, Gerardo Aragon Camarasa

Jointly Training Large Autoregressive Multimodal Models

Sep 28, 2023
Emanuele Aiello, Lili Yu, Yixin Nie, Armen Aghajanyan, Barlas Oguz

HiFi-123: Towards High-fidelity One Image to 3D Content Generation

Oct 10, 2023
Wangbo Yu, Li Yuan, Yan-Pei Cao, Xiangjun Gao, Xiaoyu Li, Long Quan, Ying Shan, Yonghong Tian

Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition

Oct 10, 2023
Srijith Radhakrishnan, Chao-Han Huck Yang, Sumeer Ahmad Khan, Rohit Kumar, Narsis A. Kiani, David Gomez-Cabrero, Jesper N. Tegner
