Mayank Mishra

Description and Discussion on DCASE 2026 Challenge Task 4: Spatial Semantic Segmentation of Sound Scenes

Apr 01, 2026

M$^2$RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling

Mar 15, 2026

Distilling to Hybrid Attention Models via KL-Guided Layer Selection

Dec 23, 2025

SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations

Dec 16, 2025

Description and Discussion on DCASE 2025 Challenge Task 4: Spatial Semantic Segmentation of Sound Scenes

Jun 12, 2025

FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference

May 28, 2025

PaTH Attention: Position Encoding via Accumulating Householder Transformations

May 22, 2025

Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping

Jan 11, 2025

Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models

Sep 07, 2024

Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler

Aug 23, 2024