Pheng-Ann Heng

Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark

Oct 30, 2025
Via arXiv

SceneDecorator: Towards Scene-Oriented Story Generation with Scene Planning and Scene Consistency

Oct 27, 2025
Via arXiv

Lost in Tokenization: Context as the Key to Unlocking Biomolecular Understanding in Scientific LLMs

Oct 27, 2025
Via arXiv

Omni-Captioner: Data Pipeline, Models, and Benchmark for Omni Detailed Perception

Oct 14, 2025
Via arXiv

REAR: Rethinking Visual Autoregressive Models via Generator-Tokenizer Consistency Regularization

Oct 06, 2025
Via arXiv

DisCo-Layout: Disentangling and Coordinating Semantic and Physical Refinement in a Multi-Agent Framework for 3D Indoor Layout Synthesis

Oct 02, 2025
Via arXiv

From Supervision to Exploration: What Does Protein Language Model Learn During Reinforcement Learning?

Oct 02, 2025
Via arXiv

MEJO: MLLM-Engaged Surgical Triplet Recognition via Inter- and Intra-Task Joint Optimization

Sep 16, 2025
Via arXiv

From Learning to Unlearning: Biomedical Security Protection in Multimodal Large Language Models

Aug 06, 2025
Via arXiv

ClipGS: Clippable Gaussian Splatting for Interactive Cinematic Visualization of Volumetric Medical Data

Jul 09, 2025
Via arXiv