Minheng Ni

MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM

May 30, 2025

Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning

May 26, 2025

Measurement of LLM's Philosophies of Human Nature

Apr 03, 2025

Don't Let Your Robot be Harmful: Responsible Robotic Manipulation

Nov 27, 2024

Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning

Oct 04, 2024

AutoDirector: Online Auto-scheduling Agents for Multi-sensory Composition

Aug 21, 2024

Responsible Visual Editing

Apr 08, 2024

Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models

Sep 01, 2023

ORES: Open-vocabulary Responsible Visual Synthesis

Aug 26, 2023

NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation

Mar 22, 2023