Yixin Nie

Scene-LLM: Extending Language Model for 3D Visual Understanding and Reasoning

Mar 22, 2024

The Role of Chain-of-Thought in Complex Vision-Language Reasoning Task

Nov 15, 2023

Jointly Training Large Autoregressive Multimodal Models

Sep 28, 2023

Llama 2: Open Foundation and Fine-Tuned Chat Models

Jul 19, 2023

Text-guided 3D Human Generation from 2D Collections

May 23, 2023

TTIDA: Controllable Generative Data Augmentation via Text-to-Text and Text-to-Image Models

Apr 18, 2023

3DGen: Triplane Latent Diffusion for Textured Mesh Generation

Mar 09, 2023

CLIP-Layout: Style-Consistent Indoor Scene Synthesis with Semantic Furniture Embedding

Mar 07, 2023

TVLT: Textless Vision-Language Transformer

Sep 28, 2022

MLP Architectures for Vision-and-Language Modeling: An Empirical Study

Dec 08, 2021