Fréchet Audio Distance (FAD) is the de facto standard for evaluating text-to-audio generation, yet its scores depend on the underlying encoder's embedding space. An encoder's training task dictates which acoustic features are preserved or discarded, so FAD inherits systematic task-induced biases. We decompose evaluation into Recall, Precision, and Alignment (the latter split into semantic and structural dimensions), using log-scale normalization for fair cross-encoder comparison. Controlled experiments on six encoders across two datasets reveal a four-axis trade-off: reconstruction-based AudioMAE leads in precision sensitivity; ASR-trained Whisper dominates structural detection but is blind to signal degradation; classification-trained VGGish maximizes semantic detection but penalizes legitimate intra-class variation. Since no single encoder is a universal evaluator, future metrics must shift toward evaluation-native encoders intrinsically aligned with human perception.
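For context, FAD fits a Gaussian to the reference embeddings and another to the generated embeddings, then reports the Fréchet distance between the two fits; the encoder dependence discussed above enters entirely through the embeddings passed in. A minimal NumPy sketch of the standard formula (function and variable names are illustrative, not from any specific FAD toolkit):

```python
import numpy as np

def frechet_audio_distance(emb_ref: np.ndarray, emb_gen: np.ndarray) -> float:
    """Fréchet distance between Gaussian fits of two embedding sets.

    emb_ref, emb_gen: arrays of shape (num_clips, embedding_dim), e.g.
    one encoder embedding per audio clip.
    Formula: ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^{1/2}).
    """
    mu_r, mu_g = emb_ref.mean(axis=0), emb_gen.mean(axis=0)
    sigma_r = np.cov(emb_ref, rowvar=False)
    sigma_g = np.cov(emb_gen, rowvar=False)
    diff = mu_r - mu_g
    # Tr((S_r S_g)^{1/2}) equals the sum of square roots of the eigenvalues
    # of S_r @ S_g, which are real and nonnegative for PSD covariances;
    # clip tiny negative values caused by floating-point error.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * tr_sqrt)
```

A sanity check of the two limiting cases: identical embedding sets give a distance of zero, and a pure mean shift with unchanged covariance contributes exactly the squared shift norm.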