Robinson Piramuthu

MDSEval: A Meta-Evaluation Benchmark for Multimodal Dialogue Summarization

Oct 02, 2025

VaPR -- Vision-language Preference alignment for Reasoning

Oct 02, 2025

Towards Internet-Scale Training For Agents

Feb 10, 2025

T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design

Oct 08, 2024

S-EQA: Tackling Situational Queries in Embodied Question Answering

May 08, 2024

FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation

May 08, 2024

"Don't forget to put the milk back!" Dataset for Enabling Embodied Agents to Detect Anomalous Situations

Apr 12, 2024

E-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer

Nov 28, 2023

Characterizing Video Question Answering with Sparsified Inputs

Nov 27, 2023

Decision Making for Human-in-the-loop Robotic Agents via Uncertainty-Aware Reinforcement Learning

Mar 14, 2023