Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Topic:Object Detection

What is Object Detection? Object detection is a computer vision task in which the goal is to detect and locate objects of interest in an image or video. The task involves identifying the position and boundaries of objects in an image, and classifying the objects into different categories. It forms a crucial part of vision recognition, alongside image classification and retrieval.

Synthetic Data for Robust Runway Detection

Oct 23, 2025

Estelle Chigot, Dennis G. Wilson, Meriem Ghrib, Fabrice Jimenez, Thomas Oberlin

Abstract:Deep vision models are now mature enough to be integrated in industrial and possibly critical applications such as autonomous navigation. Yet, data collection and labeling to train such models requires too much efforts and costs for a single company or product. This drawback is more significant in critical applications, where training data must include all possible conditions including rare scenarios. In this perspective, generating synthetic images is an appealing solution, since it allows a cheap yet reliable covering of all the conditions and environments, if the impact of the synthetic-to-real distribution shift is mitigated. In this article, we consider the case of runway detection that is a critical part in autonomous landing systems developed by aircraft manufacturers. We propose an image generation approach based on a commercial flight simulator that complements a few annotated real images. By controlling the image generation and the integration of real and synthetic data, we show that standard object detection models can achieve accurate prediction. We also evaluate their robustness with respect to adverse conditions, in our case nighttime images, that were not represented in the real data, and show the interest of using a customized domain adaptation strategy.

* Computer Analysis of Images and Patterns. CAIP 2025. Lecture Notes in Computer Science, vol 15621. Springer, Cham

Via

Access Paper or Ask Questions

Semantic Surgery: Zero-Shot Concept Erasure in Diffusion Models

Oct 26, 2025

Lexiang Xiong, Chengyu Liu, Jingwen Ye, Yan Liu, Yuecong Xu

Abstract:Concept erasure in text-to-image diffusion models is crucial for mitigating harmful content, yet existing methods often compromise generative quality. We introduce Semantic Surgery, a novel training-free, zero-shot framework for concept erasure that operates directly on text embeddings before the diffusion process. It dynamically estimates the presence of target concepts in a prompt and performs a calibrated vector subtraction to neutralize their influence at the source, enhancing both erasure completeness and locality. The framework includes a Co-Occurrence Encoding module for robust multi-concept erasure and a visual feedback loop to address latent concept persistence. As a training-free method, Semantic Surgery adapts dynamically to each prompt, ensuring precise interventions. Extensive experiments on object, explicit content, artistic style, and multi-celebrity erasure tasks show our method significantly outperforms state-of-the-art approaches. We achieve superior completeness and robustness while preserving locality and image quality (e.g., 93.58 H-score in object erasure, reducing explicit content to just 1 instance, and 8.09 H_a in style erasure with no quality degradation). This robustness also allows our framework to function as a built-in threat detection system, offering a practical solution for safer text-to-image generation.

* Accepted to the 39th Conference on Neural Information Processing Systems (NeurIPS 2025). Code is available at https://github.com/Lexiang-Xiong/Semantic-Surgery

Via

Access Paper or Ask Questions

Radar-Camera Fused Multi-Object Tracking: Online Calibration and Common Feature

Oct 23, 2025

Lei Cheng, Siyang Cao

Abstract:This paper presents a Multi-Object Tracking (MOT) framework that fuses radar and camera data to enhance tracking efficiency while minimizing manual interventions. Contrary to many studies that underutilize radar and assign it a supplementary role--despite its capability to provide accurate range/depth information of targets in a world 3D coordinate system--our approach positions radar in a crucial role. Meanwhile, this paper utilizes common features to enable online calibration to autonomously associate detections from radar and camera. The main contributions of this work include: (1) the development of a radar-camera fusion MOT framework that exploits online radar-camera calibration to simplify the integration of detection results from these two sensors, (2) the utilization of common features between radar and camera data to accurately derive real-world positions of detected objects, and (3) the adoption of feature matching and category-consistency checking to surpass the limitations of mere position matching in enhancing sensor association accuracy. To the best of our knowledge, we are the first to investigate the integration of radar-camera common features and their use in online calibration for achieving MOT. The efficacy of our framework is demonstrated by its ability to streamline the radar-camera mapping process and improve tracking precision, as evidenced by real-world experiments conducted in both controlled environments and actual traffic scenarios. Code is available at https://github.com/radar-lab/Radar_Camera_MOT

* accepted to IEEE Transactions on Intelligent Transportation Systems (T-ITS)

Via

Access Paper or Ask Questions

Probe-based Fine-tuning for Reducing Toxicity

Oct 24, 2025

Jan Wehner, Mario Fritz

Abstract:Probes trained on model activations can detect undesirable behaviors like deception or biases that are difficult to identify from outputs alone. This makes them useful detectors to identify misbehavior. Furthermore, they are also valuable training signals, since they not only reward outputs, but also good internal processes for arriving at that output. However, training against interpretability tools raises a fundamental concern: when a monitor becomes a training target, it may cease to be reliable (Goodhart's Law). We propose two methods for training against probes based on Supervised Fine-tuning and Direct Preference Optimization. We conduct an initial exploration of these methods in a testbed for reducing toxicity and evaluate the amount by which probe accuracy drops when training against them. To retain the accuracy of probe-detectors after training, we attempt (1) to train against an ensemble of probes, (2) retain held-out probes that aren't used for training, and (3) retrain new probes after training. First, probe-based preference optimization unexpectedly preserves probe detectability better than classifier-based methods, suggesting the preference learning objective incentivizes maintaining rather than obfuscating relevant representations. Second, probe diversity provides minimal practical benefit - simply retraining probes after optimization recovers high detection accuracy. Our findings suggest probe-based training can be viable for certain alignment methods, though probe ensembles are largely unnecessary when retraining is feasible.

Via

Access Paper or Ask Questions

Drone Carry-on Weight and Wind Flow Assessment via Micro-Doppler Analysis

Oct 26, 2025

Dmytro Vovchuk, Oleg Torgovitsky, Mykola Khobzei, Vladyslav Tkach, Sergey Geyman, Anton Kharchevskii, Andrey Sheleg, Toms Salgals, Vjaceslavs Bobrovs, Shai Gizach(+4 more)

Abstract:Remote monitoring of drones has become a global objective due to emerging applications in national security and managing aerial delivery traffic. Despite their relatively small size, drones can carry significant payloads, which require monitoring, especially in cases of unauthorized transportation of dangerous goods. A drone's flight dynamics heavily depend on outdoor wind conditions and the carry-on weight, which affect the tilt angle of a drone's body and the rotation velocity of the blades. A surveillance radar can capture both effects, provided a sufficient signal-to-noise ratio for the received echoes and an adjusted postprocessing detection algorithm. Here, we conduct a systematic study to demonstrate that micro-Doppler analysis enables the disentanglement of the impacts of wind and weight on a hovering drone. The physics behind the effect is related to the flight controller, as the way the drone counteracts weight and wind differs. When the payload is balanced, it imposes an additional load symmetrically on all four rotors, causing them to rotate faster, thereby generating a blade-related micro-Doppler shift at a higher frequency. However, the impact of the wind is different. The wind attempts to displace the drone, and to counteract this, the drone tilts to the side. As a result, the forward and rear rotors rotate at different velocities to maintain the tilt angle of the drone body relative to the airflow direction. This causes the splitting in the micro-Doppler spectra. By performing a set of experiments in a controlled environment, specifically, an anechoic chamber for electromagnetic isolation and a wind tunnel for imposing deterministic wind conditions, we demonstrate that both wind and payload details can be extracted using a simple deterministic algorithm based on branching in the micro-Doppler spectra.

Via

Access Paper or Ask Questions

Kinematic Analysis and Integration of Vision Algorithms for a Mobile Manipulator Employed Inside a Self-Driving Laboratory

Oct 21, 2025

Shifa Sulaiman, Tobias Busk Jensen, Stefan Hein Bengtson, Simon Bøgh

Abstract:Recent advances in robotics and autonomous systems have broadened the use of robots in laboratory settings, including automated synthesis, scalable reaction workflows, and collaborative tasks in self-driving laboratories (SDLs). This paper presents a comprehensive development of a mobile manipulator designed to assist human operators in such autonomous lab environments. Kinematic modeling of the manipulator is carried out based on the Denavit Hartenberg (DH) convention and inverse kinematics solution is determined to enable precise and adaptive manipulation capabilities. A key focus of this research is enhancing the manipulator ability to reliably grasp textured objects as a critical component of autonomous handling tasks. Advanced vision-based algorithms are implemented to perform real-time object detection and pose estimation, guiding the manipulator in dynamic grasping and following tasks. In this work, we integrate a vision method that combines feature-based detection with homography-driven pose estimation, leveraging depth information to represent an object pose as a $2$D planar projection within $3$D space. This adaptive capability enables the system to accommodate variations in object orientation and supports robust autonomous manipulation across diverse environments. By enabling autonomous experimentation and human-robot collaboration, this work contributes to the scalability and reproducibility of next-generation chemical laboratories

* International Journal of Intelligent Robotics and Applications 2025
* International Journal of Intelligent Robotics and Applications 2025

Via

Access Paper or Ask Questions

Hybrid Genetic Algorithm for Optimal User Order Routing: Multi-Objective Solver Optimization in CoW Protocol Batch Auctions

Oct 24, 2025

Mitchell Marfinetz

Abstract:CoW Protocol batch auctions aggregate user intents and rely on solvers to find optimal execution paths that maximize user surplus across heterogeneous automated market makers (AMMs) under stringent auction deadlines. Deterministic single-objective heuristics that optimize only expected output frequently fail to exploit split-flow opportunities across multiple parallel paths and to internalize gas, slippage, and execution risk constraints in a unified search. We apply evolutionary multi-objective optimization to this blockchain routing problem, proposing a hybrid genetic algorithm (GA) architecture for real-time solver optimization that combines a production-grade, multi-objective NSGA-II engine with adaptive instance profiling and deterministic baselines. Our core engine encodes variable-length path sets with continuous split ratios and evolves candidate route-and-volume allocations under a Pareto objective vector F = (user surplus, -gas, -slippage, -risk), enabling principled trade-offs and anytime operation within the auction deadline. An adaptive controller selects between GA and a deterministic dual-decomposition optimizer with Bellman-Ford based negative-cycle detection, with a guarantee to never underperform the baseline. The open-source system integrates six protection layers and passes 8/8 tests, validating safety and correctness. In a 14-stratum benchmark (30 seeds each), the hybrid approach yields absolute user-surplus gains of approximately 0.40-9.82 ETH on small-to-medium orders, while large high-fragmentation orders are unprofitable across gas regimes. Convergence occurs in about 0.5 s median (soft capped at 1.0 s) within a 2-second limit. We are not aware of an openly documented multi-objective GA with end-to-end safety for real-time DEX routing.

Via

Access Paper or Ask Questions

REVE: A Foundation Model for EEG -- Adapting to Any Setup with Large-Scale Pretraining on 25,000 Subjects

Oct 24, 2025

Yassine El Ouahidi, Jonathan Lys, Philipp Thölke, Nicolas Farrugia, Bastien Pasdeloup, Vincent Gripon, Karim Jerbi, Giulia Lioi

Abstract:Foundation models have transformed AI by reducing reliance on task-specific data through large-scale pretraining. While successful in language and vision, their adoption in EEG has lagged due to the heterogeneity of public datasets, which are collected under varying protocols, devices, and electrode configurations. Existing EEG foundation models struggle to generalize across these variations, often restricting pretraining to a single setup, resulting in suboptimal performance, in particular under linear probing. We present REVE (Representation for EEG with Versatile Embeddings), a pretrained model explicitly designed to generalize across diverse EEG signals. REVE introduces a novel 4D positional encoding scheme that enables it to process signals of arbitrary length and electrode arrangement. Using a masked autoencoding objective, we pretrain REVE on over 60,000 hours of EEG data from 92 datasets spanning 25,000 subjects, representing the largest EEG pretraining effort to date. REVE achieves state-of-the-art results on 10 downstream EEG tasks, including motor imagery classification, seizure detection, sleep staging, cognitive load estimation, and emotion recognition. With little to no fine-tuning, it demonstrates strong generalization, and nuanced spatio-temporal modeling. We release code, pretrained weights, and tutorials to support standardized EEG research and accelerate progress in clinical neuroscience.

* Code available at: https://brain-bzh.github.io/reve/

Via

Access Paper or Ask Questions

Mixing Configurations for Downstream Prediction

Oct 22, 2025

Juntang Wang, Hao Wu, Runkun Guo, Yihan Wang, Dongmian Zou, Shixin Xu

Abstract:Humans possess an innate ability to group objects by similarity, a cognitive mechanism that clustering algorithms aim to emulate. Recent advances in community detection have enabled the discovery of configurations -- valid hierarchical clusterings across multiple resolution scales -- without requiring labeled data. In this paper, we formally characterize these configurations and identify similar emergent structures in register tokens within Vision Transformers. Unlike register tokens, configurations exhibit lower redundancy and eliminate the need for ad hoc selection. They can be learned through unsupervised or self-supervised methods, yet their selection or composition remains specific to the downstream task and input. Building on these insights, we introduce GraMixC, a plug-and-play module that extracts configurations, aligns them using our Reverse Merge/Split (RMS) technique, and fuses them via attention heads before forwarding them to any downstream predictor. On the DSN1 16S rRNA cultivation-media prediction task, GraMixC improves the R2 score from 0.6 to 0.9 across multiple methods, setting a new state of the art. We further validate GraMixC on standard tabular benchmarks, where it consistently outperforms single-resolution and static-feature baselines.

* 16 pages,13 figures, conference paper. Equal contribution: Juntang Wang and Hao Wu

Via

Access Paper or Ask Questions

Automated Detection of Visual Attribute Reliance with a Self-Reflective Agent

Oct 24, 2025

Christy Li, Josep Lopez Camuñas, Jake Thomas Touchet, Jacob Andreas, Agata Lapedriza, Antonio Torralba, Tamar Rott Shaham

Abstract:When a vision model performs image recognition, which visual attributes drive its predictions? Detecting unintended reliance on specific visual features is critical for ensuring model robustness, preventing overfitting, and avoiding spurious correlations. We introduce an automated framework for detecting such dependencies in trained vision models. At the core of our method is a self-reflective agent that systematically generates and tests hypotheses about visual attributes that a model may rely on. This process is iterative: the agent refines its hypotheses based on experimental outcomes and uses a self-evaluation protocol to assess whether its findings accurately explain model behavior. When inconsistencies arise, the agent self-reflects over its findings and triggers a new cycle of experimentation. We evaluate our approach on a novel benchmark of 130 models designed to exhibit diverse visual attribute dependencies across 18 categories. Our results show that the agent's performance consistently improves with self-reflection, with a significant performance increase over non-reflective baselines. We further demonstrate that the agent identifies real-world visual attribute dependencies in state-of-the-art models, including CLIP's vision encoder and the YOLOv8 object detector.

* 32 pages, 10 figures, Neurips 2025

Via

Access Paper or Ask Questions

Topic:Object Detection

Papers and Code