Oops! No exact matches were found based on your query. Here are some results similar to "Mha Fixer":


TemporalDoRA: Temporal PEFT for Robust Surgical Video Question Answering

Mar 10, 2026
Via arXiv

SORT: A Systematically Optimized Ranking Transformer for Industrial-scale Recommenders

Mar 04, 2026
Via arXiv

Whisper-MLA: Reducing GPU Memory Consumption of ASR Models based on MHA2MLA Conversion

Feb 28, 2026
Via arXiv

Interleaved Head Attention

Feb 24, 2026
Via arXiv

A BERTology View of LLM Orchestrations: Token- and Layer-Selective Probes for Efficient Single-Pass Classification

Jan 19, 2026
Via arXiv

Guardians of the Hair: Rescuing Soft Boundaries in Depth, Stereo, and Novel Views

Jan 06, 2026
Via arXiv

Mixture of Attention Schemes (MoAS): Learning to Route Between MHA, GQA, and MQA

Dec 16, 2025
Via arXiv

BLASST: Dynamic BLocked Attention Sparsity via Softmax Thresholding

Dec 12, 2025
Via arXiv

Focusing on Language: Revealing and Exploiting Language Attention Heads in Multilingual Large Language Models

Nov 10, 2025
Via arXiv

The Strong Lottery Ticket Hypothesis for Multi-Head Attention Mechanisms

Nov 06, 2025
Via arXiv