Baining Guo

Incorporating Pre-trained Diffusion Models in Solving the Schrödinger Bridge Problem

Aug 25, 2025
Via arXiv

Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis

Jul 31, 2025
Via arXiv

Phi-Ground Tech Report: Advancing Perception in GUI Grounding

Jul 31, 2025
Via arXiv

ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation

Feb 25, 2025
Via arXiv

Diffusion Models without Classifier-free Guidance

Feb 17, 2025
Via arXiv

Optimizing Large Language Model Training Using FP4 Quantization

Jan 28, 2025
Via arXiv

MageBench: Bridging Large Multimodal Models to Agents

Dec 05, 2024
Via arXiv

UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping

Dec 03, 2024
Via arXiv

CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation

Nov 29, 2024
Via arXiv

RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models

Jul 11, 2024
Via arXiv