Title: From Theory to Throughput: CUDA-Optimized APML for Large-Batch 3D Learning

Dec 17, 2025

Sasan Sharifipour, Constantino Álvarez Casado, Manuel Lage Cañellas, Miguel Bordallo López

Abstract: Loss functions are fundamental to learning accurate 3D point cloud models, yet common choices trade geometric fidelity for computational cost. Chamfer Distance is efficient but permits many-to-one correspondences, while Earth Mover Distance better reflects one-to-one transport at high computational cost. APML approximates transport with differentiable Sinkhorn iterations and an analytically derived temperature, but its dense formulation scales quadratically in memory. We present CUDA-APML, a sparse GPU implementation that thresholds negligible assignments and runs adaptive softmax, bidirectional symmetrization, and Sinkhorn normalization directly in COO form. This yields near-linear memory scaling and preserves gradients on the stored support, while pairwise distance evaluation remains quadratic in the current implementation. On ShapeNet and MM-Fi, CUDA-APML matches dense APML within a small tolerance while reducing peak GPU memory by 99.9%. Code available at: https://github.com/Multimodal-Sensing-Lab/apml

* 5 pages, 2 figures, 2 tables, 5 formulas, 34 references, journal paper
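
The idea of thresholding negligible assignments and then running Sinkhorn normalization on COO triplets can be illustrated with a minimal NumPy sketch. This is not the authors' CUDA code. The function name `sparse_sinkhorn_coo` and the parameters `temperature`, `threshold`, `n_iters`, and `eps` are hypothetical. The sketch assumes a fixed temperature rather than the analytically derived one, skips bidirectional symmetrization, and builds the dense cost matrix first, so it shows only the sparse normalization step and not the memory savings.

```python
import numpy as np

def sparse_sinkhorn_coo(x, y, temperature=0.05, threshold=1e-6,
                        n_iters=20, eps=1e-12):
    """Hypothetical helper: soft assignment kept as COO triplets."""
    # Pairwise squared distances (quadratic in the number of points).
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    n, m = cost.shape

    # Row-wise softmax over negative cost; the temperature is fixed here
    # (an assumption, not the paper's analytically derived value).
    logits = -cost / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)

    # Drop negligible assignments and keep the rest as (row, col, value).
    rows, cols = np.nonzero(p > threshold)
    vals = p[rows, cols]

    # Sinkhorn normalization restricted to the stored support:
    # alternately rescale so row sums and column sums approach 1.
    for _ in range(n_iters):
        row_sum = np.bincount(rows, weights=vals, minlength=n)
        vals = vals / (row_sum[rows] + eps)
        col_sum = np.bincount(cols, weights=vals, minlength=m)
        vals = vals / (col_sum[cols] + eps)

    return rows, cols, vals

# Usage on two random point clouds of 64 points each.
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 3))
y = rng.normal(size=(64, 3))
rows, cols, vals = sparse_sinkhorn_coo(x, y)
```

Each iteration touches only the stored entries, so its cost grows with the number of retained assignments rather than with the full number of point pairs.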
