Chanakya Ekbote

Interleaved Head Attention

Feb 24, 2026

OmniSapiens: A Foundation Model for Social Behavior Processing via Heterogeneity-Aware Relative Policy Optimization

Feb 11, 2026

MURPHY: Multi-Turn GRPO for Self Correcting Code Generation

Nov 11, 2025

What One Cannot, Two Can: Two-Layer Transformers Provably Represent Induction Heads on Any-Order Markov Chains

Aug 10, 2025

PuzzleWorld: A Benchmark for Multimodal, Open-Ended Reasoning in Puzzlehunts

Jun 06, 2025

TAMP: Token-Adaptive Layerwise Pruning in Multimodal Large Language Models

Apr 15, 2025

Understanding the Emergence of Multimodal Representation Alignment

Feb 22, 2025

Local to Global: Learning Dynamics and Effect of Initialization for Transformers

Jun 05, 2024

FiGURe: Simple and Efficient Unsupervised Node Representations with Filter Augmentations

Oct 04, 2023

Consistent Training via Energy-Based GFlowNets for Modeling Discrete Joint Distributions

Nov 02, 2022