Zhishuai Zhang

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Mar 08, 2024

Gemini: A Family of Highly Capable Multimodal Models

Dec 19, 2023

De-Diffusion Makes Text a Strong Cross-Modal Interface

Nov 01, 2023

AudioPaLM: A Large Language Model That Can Speak and Listen

Jun 22, 2023

Pedestrian Crossing Action Recognition and Trajectory Prediction with 3D Human Keypoints

Jun 01, 2023

Noise2Music: Text-conditioned Music Generation with Diffusion Models

Feb 08, 2023

CGPart: A Part Segmentation Dataset Based on 3D Computer Graphics Models

Mar 25, 2021

Unsupervised Part Discovery via Feature Alignment

Dec 01, 2020

STINet: Spatio-Temporal-Interactive Network for Pedestrian Detection and Trajectory Prediction

May 08, 2020

Localizing Occluders with Compositional Convolutional Networks

Nov 18, 2019