Picture for Minheng Ni

Minheng Ni

MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM

Add code
May 30, 2025
Viaarxiv icon

Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning

Add code
May 26, 2025
Viaarxiv icon

Measurement of LLM's Philosophies of Human Nature

Add code
Apr 03, 2025
Viaarxiv icon

Don't Let Your Robot be Harmful: Responsible Robotic Manipulation

Add code
Nov 27, 2024
Viaarxiv icon

Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning

Add code
Oct 04, 2024
Figure 1 for Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning
Figure 2 for Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning
Figure 3 for Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning
Figure 4 for Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning
Viaarxiv icon

AutoDirector: Online Auto-scheduling Agents for Multi-sensory Composition

Add code
Aug 21, 2024
Figure 1 for AutoDirector: Online Auto-scheduling Agents for Multi-sensory Composition
Figure 2 for AutoDirector: Online Auto-scheduling Agents for Multi-sensory Composition
Figure 3 for AutoDirector: Online Auto-scheduling Agents for Multi-sensory Composition
Figure 4 for AutoDirector: Online Auto-scheduling Agents for Multi-sensory Composition
Viaarxiv icon

Responsible Visual Editing

Add code
Apr 08, 2024
Viaarxiv icon

Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models

Add code
Sep 01, 2023
Viaarxiv icon

ORES: Open-vocabulary Responsible Visual Synthesis

Add code
Aug 26, 2023
Viaarxiv icon

NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation

Add code
Mar 22, 2023
Viaarxiv icon