Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Antonio Agudo

SoccerNet-v3D: Leveraging Sports Broadcast Replays for 3D Scene Understanding

Apr 14, 2025

Marc Gutiérrez-Pérez, Antonio Agudo

Abstract:Sports video analysis is a key domain in computer vision, enabling detailed spatial understanding through multi-view correspondences. In this work, we introduce SoccerNet-v3D and ISSIA-3D, two enhanced and scalable datasets designed for 3D scene understanding in soccer broadcast analysis. These datasets extend SoccerNet-v3 and ISSIA by incorporating field-line-based camera calibration and multi-view synchronization, enabling 3D object localization through triangulation. We propose a monocular 3D ball localization task built upon the triangulation of ground-truth 2D ball annotations, along with several calibration and reprojection metrics to assess annotation quality on demand. Additionally, we present a single-image 3D ball localization method as a baseline, leveraging camera calibration and ball size priors to estimate the ball's position from a monocular viewpoint. To further refine 2D annotations, we introduce a bounding box optimization technique that ensures alignment with the 3D scene representation. Our proposed datasets establish new benchmarks for 3D soccer scene understanding, enhancing both spatial and temporal analysis in sports analytics. Finally, we provide code to facilitate access to our annotations and the generation pipelines for the datasets.

Via

Access Paper or Ask Questions

Unified Uncertainty-Aware Diffusion for Multi-Agent Trajectory Modeling

Mar 24, 2025

Guillem Capellera, Antonio Rubio, Luis Ferraz, Antonio Agudo

Abstract:Multi-agent trajectory modeling has primarily focused on forecasting future states, often overlooking broader tasks like trajectory completion, which are crucial for real-world applications such as correcting tracking data. Existing methods also generally predict agents' states without offering any state-wise measure of uncertainty. Moreover, popular multi-modal sampling methods lack any error probability estimates for each generated scene under the same prior observations, making it difficult to rank the predictions during inference time. We introduce U2Diff, a \textbf{unified} diffusion model designed to handle trajectory completion while providing state-wise \textbf{uncertainty} estimates jointly. This uncertainty estimation is achieved by augmenting the simple denoising loss with the negative log-likelihood of the predicted noise and propagating latent space uncertainty to the real state space. Additionally, we incorporate a Rank Neural Network in post-processing to enable \textbf{error probability} estimation for each generated mode, demonstrating a strong correlation with the error relative to ground truth. Our method outperforms the state-of-the-art solutions in trajectory completion and forecasting across four challenging sports datasets (NBA, Basketball-U, Football-U, Soccer-U), highlighting the effectiveness of uncertainty and error probability estimation. Video at https://youtu.be/ngw4D4eJToE

* Accepted to CVPR 2025 conference

Via

Access Paper or Ask Questions

Dual-Space Augmented Intrinsic-LoRA for Wind Turbine Segmentation

Dec 30, 2024

Shubh Singhal, Raül Pérez-Gonzalo, Andreas Espersen, Antonio Agudo

Figure 1 for Dual-Space Augmented Intrinsic-LoRA for Wind Turbine Segmentation

Figure 2 for Dual-Space Augmented Intrinsic-LoRA for Wind Turbine Segmentation

Figure 3 for Dual-Space Augmented Intrinsic-LoRA for Wind Turbine Segmentation

Figure 4 for Dual-Space Augmented Intrinsic-LoRA for Wind Turbine Segmentation

Abstract:Accurate segmentation of wind turbine blade (WTB) images is critical for effective assessments, as it directly influences the performance of automated damage detection systems. Despite advancements in large universal vision models, these models often underperform in domain-specific tasks like WTB segmentation. To address this, we extend Intrinsic LoRA for image segmentation, and propose a novel dual-space augmentation strategy that integrates both image-level and latent-space augmentations. The image-space augmentation is achieved through linear interpolation between image pairs, while the latent-space augmentation is accomplished by introducing a noise-based latent probabilistic model. Our approach significantly boosts segmentation accuracy, surpassing current state-of-the-art methods in WTB image segmentation.

* Authors Shubh Singhal and Ra\"ul P\'erez-Gonzalo contributed equally to this work. Accepted to ICASSP 2025

Via

Access Paper or Ask Questions

4DPV: 4D Pet from Videos by Coarse-to-Fine Non-Rigid Radiance Fields

Nov 15, 2024

Sergio M. de Paco, Antonio Agudo

Figure 1 for 4DPV: 4D Pet from Videos by Coarse-to-Fine Non-Rigid Radiance Fields

Figure 2 for 4DPV: 4D Pet from Videos by Coarse-to-Fine Non-Rigid Radiance Fields

Figure 3 for 4DPV: 4D Pet from Videos by Coarse-to-Fine Non-Rigid Radiance Fields

Figure 4 for 4DPV: 4D Pet from Videos by Coarse-to-Fine Non-Rigid Radiance Fields

Abstract:We present a coarse-to-fine neural deformation model to simultaneously recover the camera pose and the 4D reconstruction of an unknown object from multiple RGB sequences in the wild. To that end, our approach does not consider any pre-built 3D template nor 3D training data as well as controlled illumination conditions, and can sort out the problem in a self-supervised manner. Our model exploits canonical and image-variant spaces where both coarse and fine components are considered. We introduce a neural local quadratic model with spatio-temporal consistency to encode fine details that is combined with canonical embeddings in order to establish correspondences across sequences. We thoroughly validate the method on challenging scenarios with complex and real-world deformations, providing both quantitative and qualitative evaluations, an ablation study and a comparison with respect to competing approaches. Our project is available at https://github.com/smontode24/4DPV.

* 17th Asian Conference on Computer Vision (ACCV 2024)

Via

Access Paper or Ask Questions

TranSPORTmer: A Holistic Approach to Trajectory Understanding in Multi-Agent Sports

Oct 23, 2024

Guillem Capellera, Luis Ferraz, Antonio Rubio, Antonio Agudo, Francesc Moreno-Noguer

Abstract:Understanding trajectories in multi-agent scenarios requires addressing various tasks, including predicting future movements, imputing missing observations, inferring the status of unseen agents, and classifying different global states. Traditional data-driven approaches often handle these tasks separately with specialized models. We introduce TranSPORTmer, a unified transformer-based framework capable of addressing all these tasks, showcasing its application to the intricate dynamics of multi-agent sports scenarios like soccer and basketball. Using Set Attention Blocks, TranSPORTmer effectively captures temporal dynamics and social interactions in an equivariant manner. The model's tasks are guided by an input mask that conceals missing or yet-to-be-predicted observations. Additionally, we introduce a CLS extra agent to classify states along soccer trajectories, including passes, possessions, uncontrolled states, and out-of-play intervals, contributing to an enhancement in modeling trajectories. Evaluations on soccer and basketball datasets show that TranSPORTmer outperforms state-of-the-art task-specific models in player forecasting, player forecasting-imputation, ball inference, and ball imputation. https://youtu.be/8VtSRm8oGoE

* Accepted to ACCV 2024

Via

Access Paper or Ask Questions

FootBots: A Transformer-based Architecture for Motion Prediction in Soccer

Jun 28, 2024

Guillem Capellera, Luis Ferraz, Antonio Rubio, Antonio Agudo, Francesc Moreno-Noguer

Abstract:Motion prediction in soccer involves capturing complex dynamics from player and ball interactions. We present FootBots, an encoder-decoder transformer-based architecture addressing motion prediction and conditioned motion prediction through equivariance properties. FootBots captures temporal and social dynamics using set attention blocks and multi-attention block decoder. Our evaluation utilizes two datasets: a real soccer dataset and a tailored synthetic one. Insights from the synthetic dataset highlight the effectiveness of FootBots' social attention mechanism and the significance of conditioned motion prediction. Empirical results on real soccer data demonstrate that FootBots outperforms baselines in motion prediction and excels in conditioned tasks, such as predicting the players based on the ball position, predicting the offensive (defensive) team based on the ball and the defensive (offensive) team, and predicting the ball position based on all players. Our evaluation connects quantitative and qualitative findings. https://youtu.be/9kaEkfzG3L8

* Published as a conference paper at IEEE ICIP 2024

Via

Access Paper or Ask Questions

Generalized Nested Latent Variable Models for Lossy Coding applied to Wind Turbine Scenarios

Jun 10, 2024

Raül Pérez-Gonzalo, Andreas Espersen, Antonio Agudo

Figure 1 for Generalized Nested Latent Variable Models for Lossy Coding applied to Wind Turbine Scenarios

Figure 2 for Generalized Nested Latent Variable Models for Lossy Coding applied to Wind Turbine Scenarios

Figure 3 for Generalized Nested Latent Variable Models for Lossy Coding applied to Wind Turbine Scenarios

Figure 4 for Generalized Nested Latent Variable Models for Lossy Coding applied to Wind Turbine Scenarios

Abstract:Rate-distortion optimization through neural networks has accomplished competitive results in compression efficiency and image quality. This learning-based approach seeks to minimize the compromise between compression rate and reconstructed image quality by automatically extracting and retaining crucial information, while discarding less critical details. A successful technique consists in introducing a deep hyperprior that operates within a 2-level nested latent variable model, enhancing compression by capturing complex data dependencies. This paper extends this concept by designing a generalized L-level nested generative model with a Markov chain structure. We demonstrate as L increases that a trainable prior is detrimental and explore a common dimensionality along the distinct latent variables to boost compression performance. As this structured framework can represent autoregressive coders, we outperform the hyperprior model and achieve state-of-the-art performance while reducing substantially the computational cost. Our experimental evaluation is performed on wind turbine scenarios to study its application on visual inspections

* Accepted to ICIP 2024

Via

Access Paper or Ask Questions

No Bells, Just Whistles: Sports Field Registration by Leveraging Geometric Properties

Apr 12, 2024

Marc Gutiérrez-Pérez, Antonio Agudo

Abstract:Broadcast sports field registration is traditionally addressed as a homography estimation task, mapping the visible image area to a planar field model, predominantly focusing on the main camera shot. Addressing the shortcomings of previous approaches, we propose a novel calibration pipeline enabling camera calibration using a 3D soccer field model and extending the process to assess the multiple-view nature of broadcast videos. Our approach begins with a keypoint generation pipeline derived from SoccerNet dataset annotations, leveraging the geometric properties of the court. Subsequently, we execute classical camera calibration through DLT algorithm in a minimalist fashion, without further refinement. Through extensive experimentation on real-world soccer broadcast datasets such as SoccerNet-Calibration, WorldCup 2014 and TS- WorldCup, our method demonstrates superior performance in both multiple- and single-view 3D camera calibration while maintaining competitive results in homography estimation compared to state-of-the-art techniques.

* Accepted in CVPRW 2024

Via

Access Paper or Ask Questions

Morphological Symmetries in Robotics

Feb 23, 2024

Daniel Ordoñez-Apraez, Giulio Turrisi, Vladimir Kostic, Mario Martin, Antonio Agudo, Francesc Moreno-Noguer, Massimiliano Pontil, Claudio Semini, Carlos Mastalli

Abstract:We present a comprehensive framework for studying and leveraging morphological symmetries in robotic systems. These are intrinsic properties of the robot's morphology, frequently observed in animal biology and robotics, which stem from the replication of kinematic structures and the symmetrical distribution of mass. We illustrate how these symmetries extend to the robot's state space and both proprioceptive and exteroceptive sensor measurements, resulting in the equivariance of the robot's equations of motion and optimal control policies. Thus, we recognize morphological symmetries as a relevant and previously unexplored physics-informed geometric prior, with significant implications for both data-driven and analytical methods used in modeling, control, estimation and design in robotics. For data-driven methods, we demonstrate that morphological symmetries can enhance the sample efficiency and generalization of machine learning models through data augmentation, or by applying equivariant/invariant constraints on the model's architecture. In the context of analytical methods, we employ abstract harmonic analysis to decompose the robot's dynamics into a superposition of lower-dimensional, independent dynamics. We substantiate our claims with both synthetic and real-world experiments conducted on bipedal and quadrupedal robots. Lastly, we introduce the repository MorphoSymm to facilitate the practical use of the theory and applications outlined in this work.

* 18 pages, 11 figures

Via

Access Paper or Ask Questions

VQ-HPS: Human Pose and Shape Estimation in a Vector-Quantized Latent Space

Dec 13, 2023

Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda, Antonio Agudo, Francesc Moreno-Noguer

Figure 1 for VQ-HPS: Human Pose and Shape Estimation in a Vector-Quantized Latent Space

Figure 2 for VQ-HPS: Human Pose and Shape Estimation in a Vector-Quantized Latent Space

Figure 3 for VQ-HPS: Human Pose and Shape Estimation in a Vector-Quantized Latent Space

Figure 4 for VQ-HPS: Human Pose and Shape Estimation in a Vector-Quantized Latent Space

Abstract:Human Pose and Shape Estimation (HPSE) from RGB images can be broadly categorized into two main groups: parametric and non-parametric approaches. Parametric techniques leverage a low-dimensional statistical body model for realistic results, whereas recent non-parametric methods achieve higher precision by directly regressing the 3D coordinates of the human body. Despite their strengths, both approaches face limitations: the parameters of statistical body models pose challenges as regression targets, and predicting 3D coordinates introduces computational complexities and issues related to smoothness. In this work, we take a novel approach to address the HPSE problem. We introduce a unique method involving a low-dimensional discrete latent representation of the human mesh, framing HPSE as a classification task. Instead of predicting body model parameters or 3D vertex coordinates, our focus is on forecasting the proposed discrete latent representation, which can be decoded into a registered human mesh. This innovative paradigm offers two key advantages: firstly, predicting a low-dimensional discrete representation confines our predictions to the space of anthropomorphic poses and shapes; secondly, by framing the problem as a classification task, we can harness the discriminative power inherent in neural networks. Our proposed model, VQ-HPS, a transformer-based architecture, forecasts the discrete latent representation of the mesh, trained through minimizing a cross-entropy loss. Our results demonstrate that VQ-HPS outperforms the current state-of-the-art non-parametric approaches while yielding results as realistic as those produced by parametric methods. This highlights the significant potential of the classification approach for HPSE.

Via

Access Paper or Ask Questions