Alejandro Monroy Muñoz

Detecting Object Tracking Failure via Sequential Hypothesis Testing

Feb 13, 2026

Alejandro Monroy Muñoz, Rajeev Verma, Alexander Timans

Abstract: Real-time online object tracking in videos constitutes a core task in computer vision, with wide-ranging applications including video surveillance, motion capture, and robotics. Deployed tracking systems usually lack formal safety assurances to convey when tracking is reliable and when it may fail, at best relying on heuristic measures of model confidence to raise alerts. To obtain such assurances we propose interpreting object tracking as a sequential hypothesis test, wherein evidence for or against tracking failures is gradually accumulated over time. Leveraging recent advancements in the field, our sequential test (formalized as an e-process) quickly identifies when tracking failures set in whilst provably containing false alerts at a desired rate, thus limiting potentially costly re-calibration or intervention steps. The approach is computationally lightweight, requires no extra training or fine-tuning, and is in principle model-agnostic. We propose both supervised and unsupervised variants by leveraging either ground-truth or solely internal tracking information, and demonstrate their effectiveness for two established tracking models across four video benchmarks. As such, sequential testing can offer a statistically grounded and efficient mechanism to incorporate safety assurances into real-time tracking systems.

* Accepted at the 6th WACV workshop "Real World Surveillance: Applications and Challenges"
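The abstract does not spell out the test statistic. The following is therefore only a generic illustration of the idea: a betting-style e-process over a per-frame binary failure indicator. One example of such an indicator in a supervised setting is an IoU below some threshold; both the indicator and the threshold are assumptions here. Under the assumed null hypothesis, each frame fails with conditional probability at most `p0`. Each multiplicative factor then has expectation at most 1, so by Ville's inequality, raising an alarm once the running product reaches `1/alpha` keeps the chance of any false alert over the whole video at most `alpha`. The function name and its parameters (`p0`, `lam`, `alpha`) are hypothetical and need not match the authors' construction.

```python
from typing import Iterable, Optional


def sequential_failure_test(
    failures: Iterable[int],
    p0: float = 0.1,
    lam: float = 2.0,
    alpha: float = 0.05,
) -> Optional[int]:
    """Return the first frame index at which H0 is rejected, or None.

    Hypothetical H0: given the past, each frame's failure indicator
    X_t in {0, 1} has mean at most p0 (tracking is "healthy").
    """
    if not 0.0 <= lam <= 1.0 / p0:
        # Keeps every factor 1 + lam * (x - p0) non-negative for x in {0, 1}.
        raise ValueError("lam must lie in [0, 1/p0]")
    wealth = 1.0  # e-process starts at K_0 = 1
    threshold = 1.0 / alpha  # Ville: P(sup_t K_t >= 1/alpha) <= alpha under H0
    for t, x in enumerate(failures):
        # Bet that X_t exceeds p0; under H0 this factor has expectation <= 1.
        wealth *= 1.0 + lam * (x - p0)
        if wealth >= threshold:
            return t  # enough evidence accumulated: raise a tracking-failure alert
    return None
```

For example, with the defaults a run of consecutive failures triggers an alert on the third frame (index 2), while a long run of successes never does. A fixed bet `lam` keeps the sketch simple. Adaptive betting strategies are a common refinement, provided each bet uses only information available before the frame it is applied to.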

DuoDiff: Accelerating Diffusion Models with a Dual-Backbone Approach

Oct 12, 2024

Daniel Gallo Fernández, Răzvan-Andrei Matişan, Alejandro Monroy Muñoz, Ana-Maria Vasilcoiu, Janusz Partyka, Tin Hadži Veljković, Metod Jazbec

Abstract: Diffusion models have achieved unprecedented performance in image generation, yet they suffer from slow inference due to their iterative sampling process. To address this, early-exiting has recently been proposed, where the depth of the denoising network is made adaptive based on the (estimated) difficulty of each sampling step. Here, we discover an interesting "phase transition" in the sampling process of current adaptive diffusion models: the denoising network consistently exits early during the initial sampling steps, until it suddenly switches to utilizing the full network. Based on this, we propose accelerating generation by employing a shallower denoising network in the initial sampling steps and a deeper network in the later steps. We demonstrate empirically that our dual-backbone approach, DuoDiff, outperforms existing early-exit diffusion methods in both inference speed and generation quality. Importantly, DuoDiff is easy to implement and complementary to existing approaches for accelerating diffusion.

* Accepted to NeurIPS, see https://openreview.net/forum?id=G7E4tNmmHD
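The abstract describes the core routing rule: use a shallower denoiser for the initial sampling steps, then switch to the deeper one. The sketch below shows that rule. It assumes the switch point is a fixed count of initial steps, `shallow_steps`, which is a hypothetical parameter; in practice it would be chosen from where the observed phase transition occurs. The scalar "denoisers" are toy stand-ins for real networks and update rules, which are not modeled here.

```python
from typing import Callable, List, Tuple

# A denoiser maps (current sample, timestep) -> updated sample.
# Toy float samples stand in for image tensors in this sketch.
Denoiser = Callable[[float, int], float]


def dual_backbone_sample(
    x_T: float,
    num_steps: int,
    shallow_steps: int,
    shallow: Denoiser,
    deep: Denoiser,
) -> Tuple[float, List[Tuple[int, str]]]:
    """Reverse-diffusion loop that routes the first `shallow_steps` steps
    to a shallow backbone and all remaining steps to a deep backbone."""
    if not 0 <= shallow_steps <= num_steps:
        raise ValueError("shallow_steps must lie in [0, num_steps]")
    x = x_T
    trace: List[Tuple[int, str]] = []  # (timestep, backbone used), for inspection
    for i in range(num_steps):
        t = num_steps - 1 - i  # sampling runs from the noisiest timestep down to 0
        if i < shallow_steps:
            x = shallow(x, t)  # initial, high-noise steps: cheaper network
            trace.append((t, "shallow"))
        else:
            x = deep(x, t)  # later steps: full-depth network
            trace.append((t, "deep"))
    return x, trace
```

For example, with `num_steps=4` and `shallow_steps=2`, timesteps 3 and 2 go to the shallow network and timesteps 1 and 0 go to the deep one. Routing depends only on the step index, so any two denoisers that share the same input and output format can be plugged in.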

Reproducibility Study of "ITI-GEN: Inclusive Text-to-Image Generation"

Jul 29, 2024

Daniel Gallo Fernández, Răzvan-Andrei Matisan, Alejandro Monroy Muñoz, Janusz Partyka

Abstract: Text-to-image generative models often present issues regarding fairness with respect to certain sensitive attributes, such as gender or skin tone. This study aims to reproduce the results presented in "ITI-GEN: Inclusive Text-to-Image Generation" by Zhang et al. (2023a), which introduces a model to improve inclusiveness in these kinds of models. We show that most of the claims made by the authors about ITI-GEN hold: it improves the diversity and quality of generated images, it is scalable to different domains, it has plug-and-play capabilities, and it is computationally efficient. However, ITI-GEN sometimes uses undesired attributes as proxy features and it is unable to disentangle some pairs of (correlated) attributes such as gender and baldness. In addition, when the number of considered attributes increases, the training time grows exponentially and ITI-GEN struggles to generate inclusive images for all elements in the joint distribution. To solve these issues, we propose using Hard Prompt Search with negative prompting, a method that does not require training and that handles negation better than vanilla Hard Prompt Search. Nonetheless, Hard Prompt Search (with or without negative prompting) cannot be used for continuous attributes that are hard to express in natural language, an area where ITI-GEN excels as it is guided by images during training. Finally, we propose combining ITI-GEN and Hard Prompt Search with negative prompting.

* Accepted to TMLR, see https://openreview.net/forum?id=d3Vj360Wi2
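A prompt-based approach such as Hard Prompt Search with negative prompting can be pictured as one positive prompt per element of the joint attribute distribution, paired with a negative prompt naming the categories to avoid. The sketch below builds such pairs under clearly hypothetical assumptions. The base prompt, attribute names, category phrases and comma-joined template are illustrative, not the study's actual prompts. The resulting strings would still have to be passed to a text-to-image model that accepts a negative prompt. The sketch also makes the scaling issue concrete: the number of joint categories is the product of the per-attribute category counts, so k binary attributes already yield 2^k prompt pairs.

```python
from itertools import product
from typing import Dict, List, Tuple


def joint_prompt_pairs(
    base_prompt: str, attributes: Dict[str, List[str]]
) -> List[Tuple[str, str]]:
    """Build one hypothetical (prompt, negative_prompt) pair per joint category.

    `attributes` maps each attribute name to its list of category phrases.
    """
    names = list(attributes)
    pairs: List[Tuple[str, str]] = []
    # Cartesian product over attributes: one entry per element of the joint distribution.
    for combo in product(*(attributes[name] for name in names)):
        prompt = ", ".join([base_prompt, *combo])
        # Negative prompt lists every category that was NOT chosen for its attribute.
        excluded = [
            phrase
            for name, chosen in zip(names, combo)
            for phrase in attributes[name]
            if phrase != chosen
        ]
        pairs.append((prompt, ", ".join(excluded)))
    return pairs
```

For instance, with two binary attributes such as gender and baldness, the first pair asks for "male, bald" while negatively prompting "female, with hair". Only the number of pairs grows with more attributes; the construction per pair stays the same.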
