Abstract:Diffusion models have emerged as state-of-the-art generative methods for image synthesis, yet their potential as general-purpose feature encoders remains underexplored. Trained for denoising and generation without labels, they can be interpreted as self-supervised learners that capture both low- and high-level structure. We show that a frozen diffusion backbone enables strong fine-grained recognition by probing intermediate denoising features across layers and timesteps and training a linear classifier for each pair. We evaluate this in a real-world plankton-monitoring setting with practical impact, using controlled and comparable training setups against established supervised and self-supervised baselines. Frozen diffusion features are competitive with supervised baselines and outperform other self-supervised methods in both balanced and naturally long-tailed settings. Out-of-distribution evaluations on temporally and geographically shifted plankton datasets further show that frozen diffusion features maintain strong accuracy and Macro F1 under substantial distribution shift.




Abstract:This paper focuses on estimating probability distributions over the set of 3D rotations ($SO(3)$) using deep neural networks. Learning to regress models to the set of rotations is inherently difficult due to differences in topology between $\mathbb{R}^N$ and $SO(3)$. We overcome this issue by using a neural network to output the parameters for a matrix Fisher distribution since these parameters are homeomorphic to $\mathbb{R}^9$. By using a negative log likelihood loss for this distribution we get a loss which is convex with respect to the network outputs. By optimizing this loss we improve state-of-the-art on several challenging applicable datasets, namely Pascal3D+, ModelNet10-$SO(3)$ and UPNA head pose.