What is Object Detection? Object detection is a computer vision task whose goal is to locate objects of interest in an image or video: identifying the position and boundaries of each object and classifying it into a category. It is a core part of visual recognition, alongside image classification and retrieval.
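To make the task concrete, here is a minimal sketch that runs a pretrained detector (torchvision's Faster R-CNN, one of many options) on an image and prints each detected object's category, confidence, and bounding box; the image path is a placeholder.

```python
# Minimal object-detection sketch using torchvision's pretrained Faster R-CNN.
# Illustrative only; any detector with box/label/score outputs works the same way.
import torch
from torchvision.io import read_image
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn, FasterRCNN_ResNet50_FPN_Weights,
)

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn(weights=weights).eval()
preprocess = weights.transforms()

img = read_image("street.jpg")           # CHW uint8 tensor; path is a placeholder
with torch.no_grad():
    preds = model([preprocess(img)])[0]  # dict with 'boxes', 'labels', 'scores'

for box, label, score in zip(preds["boxes"], preds["labels"], preds["scores"]):
    if score > 0.5:                      # confidence threshold
        name = weights.meta["categories"][label.item()]
        print(f"{name}: {score:.2f} at {box.tolist()}")
```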
Papers and Code
Aug 24, 2025
Abstract: Weakly-Supervised Change Detection (WSCD) aims to distinguish specific object changes (e.g., objects appearing or disappearing) from background variations (e.g., environmental changes due to light, weather, or seasonal shifts) in paired satellite images, relying only on paired image (i.e., image-level) classification labels. This technique significantly reduces the need for the dense annotations required in fully-supervised change detection. However, as image-level supervision only indicates whether objects have changed in a scene, WSCD methods often misclassify background variations as object changes, especially in complex remote-sensing scenarios. In this work, we propose an Adversarial Class Prompting (AdvCP) method to address this co-occurring noise problem, comprising two phases: a) Adversarial Prompt Mining: After each training iteration, we introduce adversarial prompting perturbations, using incorrect one-hot image-level labels to activate erroneous feature mappings. This process reveals co-occurring adversarial samples under weak supervision, namely background variation features that are likely to be misclassified as object changes. b) Adversarial Sample Rectification: We integrate these adversarially prompt-activated pixel samples into training by constructing an online global prototype. This prototype is built from an exponentially weighted moving average of the current batch and all historical training data. Our AdvCP can be seamlessly integrated into current WSCD methods without additional inference cost. Experiments on ConvNet, Transformer, and Segment Anything Model (SAM)-based baselines demonstrate significant performance enhancements. Furthermore, we demonstrate the generalizability of AdvCP to other multi-class weakly-supervised dense prediction scenarios. Code is available at https://github.com/zhenghuizhao/AdvCP
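As a rough illustration of the rectification phase, the sketch below maintains an online global prototype as an exponentially weighted moving average of batch features, as the abstract describes; the class structure, momentum value, and similarity readout are assumptions, not the authors' code.

```python
# Hedged sketch of an exponentially weighted moving-average (EMA) global prototype.
import torch

class OnlineGlobalPrototype:
    def __init__(self, feat_dim: int, momentum: float = 0.99):  # momentum assumed
        self.momentum = momentum
        self.prototype = torch.zeros(feat_dim)
        self.initialized = False

    @torch.no_grad()
    def update(self, batch_feats: torch.Tensor) -> None:
        """batch_feats: (N, feat_dim) features of adversarially prompt-activated pixels."""
        batch_mean = batch_feats.mean(dim=0)
        if not self.initialized:
            self.prototype = batch_mean.clone()
            self.initialized = True
        else:
            # EMA: keep most of the historical statistics, blend in the current batch.
            self.prototype.mul_(self.momentum).add_(batch_mean, alpha=1 - self.momentum)

    def similarity(self, feats: torch.Tensor) -> torch.Tensor:
        """Cosine similarity of pixel features to the global prototype."""
        return torch.nn.functional.cosine_similarity(
            feats, self.prototype.unsqueeze(0), dim=1
        )
```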

Aug 29, 2025
Abstract: Visuotactile sensors provide high-resolution tactile information but are incapable of perceiving the material features of objects. We present UltraTac, an integrated sensor that combines visuotactile imaging with ultrasound sensing through a coaxial optoacoustic architecture. The design shares structural components and achieves consistent sensing regions for both modalities. Additionally, we incorporate acoustic matching into the traditional visuotactile sensor structure, enabling integration of the ultrasound sensing modality without compromising visuotactile performance. Through tactile feedback, we dynamically adjust the operating state of the ultrasound module to achieve flexible functional coordination. Systematic experiments demonstrate three key capabilities: proximity sensing in the 3-8 cm range ($R^2=0.90$), material classification (average accuracy: 99.20%), and texture-material dual-mode object recognition achieving 92.11% accuracy on a 15-class task. Finally, we integrate the sensor into a robotic manipulation system to concurrently detect container surface patterns and internal content, which verifies its potential for advanced human-machine interaction and precise robotic manipulation.
* Accepted to IROS 2025
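A speculative sketch of what texture-material dual-mode recognition could look like as late fusion: concatenate a visuotactile (texture) embedding with an ultrasound (material) embedding and classify into the 15 classes mentioned in the abstract. The actual UltraTac pipeline may differ substantially.

```python
# Speculative late-fusion classifier for texture-material dual-mode recognition.
# Embedding dimensions and head architecture are assumptions, not from the paper.
import torch
import torch.nn as nn

class DualModeClassifier(nn.Module):
    def __init__(self, tactile_dim=128, ultrasound_dim=32, num_classes=15):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(tactile_dim + ultrasound_dim, 64),
            nn.ReLU(),
            nn.Linear(64, num_classes),  # 15 texture-material classes, per the abstract
        )

    def forward(self, tactile_feat, ultrasound_feat):
        # Concatenate the two modality embeddings, then classify jointly.
        fused = torch.cat([tactile_feat, ultrasound_feat], dim=-1)
        return self.head(fused)
```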

Aug 27, 2025
Abstract: Mitotic figure (MF) detection in histopathology images is challenging due to large variations in slide scanners, staining protocols, tissue types, and the presence of artifacts. This paper presents a collection of training techniques - a bag of tricks - that enable robust, real-time MF detection across diverse domains. We build on the efficient RTMDet single-stage object detector to achieve high inference speed suitable for clinical deployment. Our method addresses scanner variability and tumor heterogeneity via extensive multi-domain training data, balanced sampling, and careful augmentation. Additionally, we employ targeted hard-negative mining on necrotic and debris tissue to reduce false positives. In a grouped 5-fold cross-validation across multiple MF datasets, our model achieves an F1 score between 0.78 and 0.84. On the preliminary test set of the MItosis DOmain Generalization (MIDOG) 2025 challenge, our single-stage RTMDet-S based approach reaches an F1 of 0.81, outperforming larger models and demonstrating adaptability to new, unfamiliar domains. The proposed solution offers a practical trade-off between accuracy and speed, making it attractive for real-world clinical adoption.
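The hard-negative mining step could look roughly like the sketch below: run the detector over patches known to contain no mitotic figures (necrotic and debris tissue), keep confident detections as false positives, and feed them back as explicit negatives. All function names here are placeholders, not the authors' code.

```python
# Illustrative hard-negative mining loop; `model.predict` and `patch.crop`
# are hypothetical APIs standing in for whatever inference interface is used.
def mine_hard_negatives(model, negative_patches, score_thresh=0.3):
    hard_negatives = []
    for patch in negative_patches:          # patches known to contain no mitotic figures
        detections = model.predict(patch)   # hypothetical inference call
        for det in detections:
            if det.score >= score_thresh:   # confident detection on a negative patch
                hard_negatives.append(patch.crop(det.box))  # -> false positive
    return hard_negatives

# The mined crops can then be oversampled as background examples in the next epoch.
```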

Aug 26, 2025
Abstract: Microsurgical anastomosis demands exceptional dexterity and visuospatial skills, underscoring the importance of comprehensive training and precise outcome assessment. Currently, methods such as the outcome-oriented anastomosis lapse index are used to evaluate this procedure. However, they often rely on subjective judgment, which can introduce biases that affect the reliability and efficiency of the assessment of competence. Leveraging three datasets from hospitals with participants at various levels, we introduce a quantitative framework that uses image-processing techniques for objective assessment of microsurgical anastomoses. The approach uses geometric modeling of errors along with a detection and scoring mechanism, enhancing the efficiency and reliability of microsurgical proficiency assessment and advancing training protocols. The results show that the geometric metrics effectively replicate expert raters' scoring for the errors considered in this work.
* 7 pages, 7 figures, accepted at EMBC2025
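As a hypothetical example of one geometric metric such a framework might compute, the snippet below scores the uniformity of stitch spacing from detected stitch coordinates; the paper's actual error taxonomy and scoring are richer than this.

```python
# Hypothetical geometric metric: uniformity of stitch spacing along a vessel,
# computed from detected stitch centers. Not the paper's actual scoring rule.
import numpy as np

def stitch_spacing_score(stitch_xy: np.ndarray) -> float:
    """stitch_xy: (N, 2) stitch centers ordered along the anastomosis line.
    Returns the coefficient of variation of inter-stitch distances
    (0 = perfectly even spacing; larger = more irregular)."""
    gaps = np.linalg.norm(np.diff(stitch_xy, axis=0), axis=1)
    return float(gaps.std() / gaps.mean())
```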

Sep 04, 2025
Abstract: This paper introduces an unobtrusive in-situ measurement method to detect user behavior changes during arbitrary exposures in XR systems. Here, such behavior changes are typically associated with the Proteus effect or bodily affordances elicited by different avatars that the users embody in XR. We present a biometric user model based on deep metric similarity learning, which uses high-dimensional embeddings as reference vectors to identify behavior changes of individual users. We evaluate our model against two alternative approaches: a (non-learned) motion analysis based on central tendencies of movement patterns and subjective post-exposure embodiment questionnaires frequently used in various XR exposures. In a within-subject study, participants performed a fruit collection task while embodying avatars of different body heights (short, actual-height, and tall). Subjective assessments confirmed the effective manipulation of perceived body schema, while the (non-learned) objective analyses of head and hand movements revealed significant differences across conditions. Our similarity learning model trained on the motion data successfully identified the elicited behavior change for various query and reference data pairings of the avatar conditions. The approach has several advantages in comparison to existing methods: 1) In-situ measurement without additional user input, 2) generalizable and scalable motion analysis for various use cases, 3) user-specific analysis on the individual level, and 4) with a trained model, users can be added and evaluated in real time to study how avatar changes affect behavior.
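The comparison step of such a similarity-learning model might look like the following sketch: embed a motion window and flag a behavior change when its similarity to the user's baseline reference embeddings drops below a threshold. The embedding network and threshold are assumptions.

```python
# Sketch of the query-vs-reference comparison in a deep-metric-similarity model.
# `embed_net` and the 0.7 threshold are placeholders, not from the paper.
import torch
import torch.nn.functional as F

def behavior_changed(embed_net, query_motion, reference_embeddings, threshold=0.7):
    """reference_embeddings: (K, D) embeddings from the user's baseline condition."""
    with torch.no_grad():
        q = embed_net(query_motion)  # (D,) embedding of the query motion window
        sims = F.cosine_similarity(q.unsqueeze(0), reference_embeddings, dim=1)
    # Low similarity to the user's own baseline suggests a behavior change,
    # e.g., one elicited by embodying a shorter or taller avatar.
    return sims.mean().item() < threshold
```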

Aug 27, 2025
Abstract: Fine-tuning LLMs on narrowly harmful datasets can lead to behavior that is broadly misaligned with respect to human values. To understand when and how this emergent misalignment occurs, we develop a comprehensive framework for detecting and characterizing rapid transitions during fine-tuning, using both distributional change detection methods and order parameters that are formulated in plain English and evaluated by an LLM judge. Using an objective statistical dissimilarity measure, we quantify how the phase transition that occurs during fine-tuning affects multiple aspects of the model. In particular, we assess what percentage of the total distributional change in model outputs is captured by different aspects, such as alignment or verbosity, providing a decomposition of the overall transition. We also find that the actual behavioral transition occurs later in training than indicated by the peak in the gradient norm alone. Our framework enables the automated discovery and quantification of language-based order parameters, which we demonstrate on examples ranging from knowledge questions to politics and ethics.
* 11+25 pages, 4+11 figures
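One concrete choice of objective statistical dissimilarity measure (not necessarily the paper's) is the Jensen-Shannon divergence between distributions of LLM-judge labels at two checkpoints, sketched below; a sharp spike between adjacent checkpoints would mark the behavioral transition.

```python
# Sketch of a distributional change-detection measure across fine-tuning
# checkpoints: Jensen-Shannon divergence between judge-label distributions.
import numpy as np
from scipy.spatial.distance import jensenshannon

def checkpoint_dissimilarity(labels_a, labels_b, categories):
    """labels_a/labels_b: LLM-judge labels for the same prompts at two checkpoints."""
    p = np.array([labels_a.count(c) for c in categories], dtype=float)
    q = np.array([labels_b.count(c) for c in categories], dtype=float)
    p, q = p / p.sum(), q / q.sum()
    return jensenshannon(p, q) ** 2  # squared JS distance = JS divergence

# Plotting this measure over adjacent checkpoint pairs reveals when the
# behavioral transition happens, which may lag the peak in gradient norm.
```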

Aug 27, 2025
Abstract: We present a novel framework for estimating accident-prone regions in everyday indoor scenes, aimed at improving real-time risk awareness in service robots operating in human-centric environments. As robots become integrated into daily life, particularly in homes, the ability to anticipate and respond to environmental hazards is crucial for ensuring user safety, trust, and effective human-robot interaction. Our approach models object-level risk and context through a semantic graph-based propagation algorithm. Each object is represented as a node with an associated risk score, and risk propagates asymmetrically from high-risk to low-risk objects based on spatial proximity and accident relationships. This enables the robot to infer potential hazards even when they are not explicitly visible or labeled. Designed for interpretability and lightweight onboard deployment, our method is validated on a dataset with human-annotated risk regions, achieving a binary risk detection accuracy of 75%. The system demonstrates strong alignment with human perception, particularly in scenes involving sharp or unstable objects. These results underline the potential of context-aware risk reasoning to enhance robotic scene understanding and proactive safety behaviors in shared human-robot spaces. This framework could serve as a foundation for future systems that make context-driven safety decisions, provide real-time alerts, or autonomously assist users in avoiding or mitigating hazards within home environments.
* 8 pages, Accepted for IEEE RO-MAN 2025 Conference
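A hedged sketch of the asymmetric propagation idea, following the abstract's description: risk flows only from higher-risk to lower-risk neighbors, attenuated by spatial distance. The update rule, decay factor, and iteration count are assumptions.

```python
# Hedged sketch of asymmetric risk propagation over a semantic scene graph.
# Assumes `distance` holds a symmetric entry for every neighbor pair.
def propagate_risk(risk, neighbors, distance, decay=0.5, iters=3):
    """risk: {obj: score in [0, 1]}; neighbors: {obj: [nearby objs]};
    distance: {(a, b): metres between a and b}."""
    for _ in range(iters):
        updated = dict(risk)
        for obj, score in risk.items():
            for nb in neighbors[obj]:
                if risk[nb] > score:  # asymmetric: only higher-risk nodes push risk
                    influence = risk[nb] * decay / (1.0 + distance[(obj, nb)])
                    updated[obj] = max(updated[obj], min(1.0, score + influence))
        risk = updated
    return risk

# E.g., a knife (risk 0.9) lying near a low chair raises the chair's effective
# risk, even though the chair alone would be labeled low-risk.
```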

Aug 25, 2025
Abstract: Ensuring safety in autonomous driving requires a seamless integration of perception and decision making under uncertain conditions. Although computer vision (CV) models such as YOLO achieve high accuracy in detecting traffic signs and obstacles, their performance degrades in drift scenarios caused by weather variations or unseen objects. This work presents a simulated autonomous driving system that combines a context-aware CV model with adaptive control using the ADORE framework. The CARLA simulator was integrated with ADORE via the ROS bridge, allowing real-time communication between perception, decision, and control modules. A simulated test case was designed in both clear and drift weather conditions to demonstrate the robust detection performance of the perception model, while ADORE successfully adapted vehicle behavior to speed limits and obstacles with low response latency. The findings highlight the potential of coupling deep learning-based perception with rule-based adaptive decision making to improve automotive safety-critical systems.
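The perception-to-decision handoff over the ROS bridge might be stubbed as in the minimal rospy sketch below; topic names and message types are placeholders, and ADORE's actual interfaces are not shown.

```python
# Minimal rospy stub of the perception-to-decision handoff over the ROS bridge.
# Topic names and the use of plain std_msgs are placeholder assumptions.
import rospy
from std_msgs.msg import String, Float32

def on_detection(msg: String):
    # e.g., msg.data == "speed_limit_30" or "obstacle" (assumed label format)
    if msg.data.startswith("speed_limit"):
        target_speed.publish(Float32(float(msg.data.split("_")[-1])))
    elif msg.data == "obstacle":
        target_speed.publish(Float32(0.0))  # stop for obstacles

rospy.init_node("adaptive_decision_stub")
target_speed = rospy.Publisher("/vehicle/target_speed", Float32, queue_size=1)
rospy.Subscriber("/perception/detections", String, on_detection)
rospy.spin()
```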

Aug 20, 2025
Abstract: Powered by advances in multiple remote sensing sensors, the production of high spatial resolution images provides great potential to achieve cost-efficient and high-accuracy agricultural inventory and analysis in an automated way. Many studies aiming to provide parcel-level agricultural inventories have produced methods for Agricultural Parcel and Boundary Delineation (APBD). This review covers APBD methods for detecting and delineating agricultural parcels and systematically surveys past and present APBD research applied to remote sensing images. With the goal of providing a clear knowledge map of existing APBD efforts, we conduct a comprehensive review of recent APBD papers and build a meta-data analysis covering the algorithm, study site, crop type, sensor type, evaluation method, etc. We categorize the methods into three classes: (1) traditional image processing methods (including pixel-based, edge-based and region-based); (2) traditional machine learning methods (such as random forest, decision tree); and (3) deep learning-based methods. With deep learning-oriented approaches constituting the majority, we further discuss deep learning-based methods such as semantic segmentation-based, object detection-based and Transformer-based methods. In addition, we discuss five APBD-related issues to further characterize the domain, such as multi-sensor data in APBD tasks, comparisons between single-task and multi-task learning, and comparisons among different algorithms and different APBD tasks. Finally, this review proposes some APBD-related applications and a few promising prospects and potential hot topics for future APBD research. We hope this review helps researchers involved in the APBD domain keep track of its development and trends.

Aug 26, 2025
Abstract: Robust robotic task execution hinges on the reliable detection of execution failures in order to trigger safe operation modes, recovery strategies, or task replanning. However, many failure detection methods struggle to provide meaningful performance when applied to a variety of real-world scenarios. In this paper, we propose a video-based failure detection approach that uses spatio-temporal knowledge in the form of the actions the robot performs and task-relevant objects within the field of view. Both pieces of information are available in most robotic scenarios and can thus be readily obtained. We demonstrate the effectiveness of our approach on three datasets that we amend, in part, with additional annotations of the aforementioned task-relevant knowledge. In light of the results, we also propose a data augmentation method that improves performance by applying variable frame rates to different parts of the video. We observe an improvement from 77.9 to 80.0 in F1 score on the ARMBench dataset without additional computational expense and an additional increase to 81.4 with test-time augmentation. The results emphasize the importance of spatio-temporal information during failure detection and suggest further investigation of suitable heuristics in future implementations. Code and annotations are available.
* Accepted at ECMR 2025
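The variable-frame-rate augmentation could be sketched as follows: split a clip at a random point and sample the two parts at different rates while preserving temporal order. The split range, output length, and rate ratio are assumptions, not the authors' settings.

```python
# Sketch of variable-frame-rate augmentation: different parts of the clip are
# sampled at different temporal resolutions, keeping frames in temporal order.
import random

def variable_frame_rate_sample(frames, n_out=16):
    """frames: list of video frames in temporal order. Splits the clip at a
    random point and samples the two parts at different rates."""
    split = random.randint(len(frames) // 4, 3 * len(frames) // 4)
    first, second = frames[:split], frames[split:]
    # Randomly give one part the higher sampling rate (more of the n_out frames).
    k_first = (3 * n_out) // 4 if random.random() < 0.5 else n_out // 4
    k_second = n_out - k_first
    pick = lambda seq, k: [seq[int(i * len(seq) / k)] for i in range(k)]
    return pick(first, k_first) + pick(second, k_second)
```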
