Didier Stricker

BRep Boundary and Junction Detection for CAD Reverse Engineering

Sep 21, 2024

Sk Aziz Ali, Mohammad Sadil Khan, Didier Stricker

Abstract: In machining, 3D reverse engineering of a mechanical system is an integral, highly important, and yet time-consuming step for obtaining parametric CAD models from 3D scans. Deep learning-based Scan-to-CAD modeling can therefore offer designers enormous editability to quickly modify a CAD model, since it can parse all of the model's structural compositions and design steps. In this paper, we propose BRepDetNet, a supervised boundary representation (BRep) detection network for 3D scans from the CC3D and ABC datasets. We have carefully annotated the 50K and 45K scans of the two datasets with appropriate topological relations (e.g., next, mate, previous) between the geometrical primitives (i.e., boundaries, junctions, loops, faces) of their BRep data structures. The proposed solution decomposes the Scan-to-CAD problem into Scan-to-BRep, a step towards feature-based modeling that can leverage existing BRep-to-CAD modeling methods. Our proposed Scan-to-BRep neural network learns to detect BRep boundaries and junctions by minimizing a focal loss and a non-maximal suppression (NMS) loss during training. Experimental results show that our BRepDetNet with NMS-Loss achieves impressive results.

* 6 pages, 5 figures
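
The focal loss mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the standard binary form, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), applied to per-point boundary probabilities. The function name and the defaults `alpha=0.25` and `gamma=2.0` are illustrative assumptions; the abstract does not state which variant or hyperparameters BRepDetNet uses, and the NMS loss is not sketched here.

```python
import math


def binary_focal_loss(probs, labels, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean binary focal loss over per-point probabilities.

    probs  -- predicted probability that each point is a boundary point
    labels -- ground-truth labels (1 = boundary, 0 = non-boundary)
    alpha, gamma -- assumed defaults, not values taken from the paper
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)          # clamp to avoid log(0)
        p_t = p if y == 1 else 1.0 - p           # probability of the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t)^gamma down-weights points that are already well classified
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)
```

With `gamma=0.0` the modulating factor disappears and the expression reduces to an alpha-weighted cross-entropy, which is a convenient sanity check. The down-weighting of easy points is what makes this loss a common choice when positive points (here, boundaries and junctions) are far rarer than negatives.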

ShapeAug++: More Realistic Shape Augmentation for Event Data

Sep 17, 2024

Katharina Bendig, René Schuster, Didier Stricker

Abstract: The novel Dynamic Vision Sensors (DVSs) have recently gained a great deal of attention because they are superior to RGB cameras in terms of latency, dynamic range, and energy consumption. This is of particular interest for autonomous applications, since event cameras are able to alleviate motion blur and allow for night vision. One challenge in real-world autonomous settings is occlusion, where foreground objects obstruct the view of traffic participants in the background. The ShapeAug method addresses this problem by using simulated events resulting from objects moving on linear paths for event data augmentation. However, the shapes and movements lack complexity, so the simulation fails to resemble the behavior of objects in the real world. Therefore, in this paper, we propose ShapeAug++, an extended version of ShapeAug that involves randomly generated polygons as well as curved movements. We show the superiority of our method on multiple DVS classification datasets, improving the top-1 accuracy by up to 3.7% compared to ShapeAug.

* accepted in Lecture Notes in Computer Science (LNCS)
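
The two ingredients named in the abstract above, randomly generated polygons and curved movements, can be sketched geometrically as follows. This is not the authors' implementation: the function names, the jittered-angle sampling, and the quadratic Bézier path are all assumptions made for illustration, and the conversion of the moving shape into simulated events is not shown.

```python
import math
import random


def random_polygon(n_vertices, center, max_radius, rng):
    """Sample polygon vertices around `center` (hypothetical helper).

    Angles are jittered around an even spacing so the vertices stay
    ordered around the center; radii are drawn at random.
    """
    cx, cy = center
    vertices = []
    for i in range(n_vertices):
        angle = 2.0 * math.pi * (i + rng.uniform(0.0, 0.5)) / n_vertices
        radius = rng.uniform(0.3 * max_radius, max_radius)
        vertices.append((cx + radius * math.cos(angle),
                         cy + radius * math.sin(angle)))
    return vertices


def quadratic_bezier(p0, p1, p2, t):
    """Point on the quadratic Bezier curve with control points p0, p1, p2."""
    u = 1.0 - t
    return (u * u * p0[0] + 2.0 * u * t * p1[0] + t * t * p2[0],
            u * u * p0[1] + 2.0 * u * t * p1[1] + t * t * p2[1])


def moving_polygon_frames(polygon, start, control, end, n_steps):
    """Translate `polygon` along a curved path; one vertex list per step.

    Requires n_steps >= 2. The curve is an assumed choice of "curved
    movement"; a linear path is the special case where `control` lies
    on the segment from `start` to `end`.
    """
    frames = []
    for k in range(n_steps):
        t = k / (n_steps - 1)
        ox, oy = quadratic_bezier(start, control, end, t)
        dx, dy = ox - start[0], oy - start[1]    # offset from the first frame
        frames.append([(x + dx, y + dy) for x, y in polygon])
    return frames
```

Passing a seeded `random.Random` instance as `rng` makes the sampled shapes reproducible, which helps when comparing augmentation settings.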

GenFormer -- Generated Images are All You Need to Improve Robustness of Transformers on Small Datasets

Aug 27, 2024

Sven Oehri, Nikolas Ebert, Ahmed Abdullah, Didier Stricker, Oliver Wasenmüller

Abstract: Recent studies showcase the competitive accuracy of Vision Transformers (ViTs) in relation to Convolutional Neural Networks (CNNs), along with their remarkable robustness. However, ViTs demand a large amount of data to achieve adequate performance, which makes their application to small datasets challenging and leaves them behind CNNs. To overcome this, we propose GenFormer, a data augmentation strategy utilizing generated images, thereby improving transformer accuracy and robustness on small-scale image classification tasks. In our comprehensive evaluation, we propose Tiny ImageNetV2, -R, and -A as new test set variants of Tiny ImageNet by transferring established ImageNet generalization and robustness benchmarks to the small-scale data domain. Similarly, we introduce MedMNIST-C and EuroSAT-C as corrupted test set variants of established fine-grained datasets in the medical and aerial domains. Through a series of experiments conducted on small datasets from various domains, including Tiny ImageNet, CIFAR, EuroSAT, and MedMNIST, we demonstrate the synergistic power of our method, in particular when combined with common train and test time augmentations, knowledge distillation, and architectural design choices. Additionally, we show the effectiveness of our approach under challenging conditions with limited training data, with significant improvements in both accuracy and robustness, bridging the gap between CNNs and ViTs in the small-scale dataset domain.

* This paper has been accepted at International Conference on Pattern Recognition (ICPR), 2024

G3FA: Geometry-guided GAN for Face Animation

Aug 23, 2024

Alireza Javanmardi, Alain Pagani, Didier Stricker

Abstract: Animating human face images aims to synthesize a desired source identity in a natural-looking way that mimics a driving video's facial movements. In this context, Generative Adversarial Networks have demonstrated remarkable potential in real-time face reenactment using a single source image, yet are constrained by limited geometry consistency compared to graphics-based approaches. In this paper, we introduce Geometry-guided GAN for Face Animation (G3FA) to tackle this limitation. Our novel approach empowers the face animation model to incorporate 3D information using only 2D images, improving the image generation capabilities of the talking head synthesis model. We integrate inverse rendering techniques to extract 3D facial geometry properties, improving the feedback loop to the generator through a weighted average ensemble of discriminators. In our face reenactment model, we leverage 2D motion warping to capture motion dynamics, along with orthogonal ray sampling and volume rendering techniques to produce the final visual output. To evaluate the performance of G3FA, we conducted comprehensive experiments using various evaluation protocols on the VoxCeleb2 and TalkingHead benchmarks, demonstrating the effectiveness of our proposed framework compared to state-of-the-art real-time face animation methods.

* BMVC 2024, Accepted

CLEO: Continual Learning of Evolving Ontologies

Jul 11, 2024

Shishir Muralidhara, Saqib Bukhari, Georg Schneider, Didier Stricker, René Schuster

Abstract: Continual learning (CL) addresses the problem of catastrophic forgetting in neural networks, which occurs when a trained model overwrites previously learned information after being presented with a new task. CL aims to instill the lifelong learning characteristic of humans in intelligent systems, making them capable of learning continuously while retaining what was already learned. Current CL problems involve either learning new domains (domain-incremental) or new and previously unseen classes (class-incremental). However, general learning processes are not limited to learning new information; they also involve refining existing information. In this paper, we define CLEO - Continual Learning of Evolving Ontologies, as a new incremental learning setting under CL to tackle evolving classes. CLEO is motivated by the need for intelligent systems to adapt to real-world ontologies that change over time, such as those in autonomous driving. We use Cityscapes, PASCAL VOC, and Mapillary Vistas to define the task settings and demonstrate the applicability of CLEO. We highlight the shortcomings of existing class-incremental learning (CIL) methods in adapting to CLEO and propose a baseline solution, called Modelling Ontologies (MoOn). CLEO is a promising new approach to CL that addresses the challenge of evolving ontologies in real-world applications. MoOn surpasses previous CL approaches in the context of CLEO.

* Accepted to ECCV 2024

EgoFlowNet: Non-Rigid Scene Flow from Point Clouds with Ego-Motion Support

Jul 03, 2024

Ramy Battrawy, René Schuster, Didier Stricker

Abstract: Recent weakly-supervised methods for scene flow estimation from LiDAR point clouds are limited to explicit reasoning at the object level. These methods perform multiple iterative optimizations for each rigid object, which makes them sensitive to the robustness of the clustering. In this paper, we propose EgoFlowNet, a point-level scene flow estimation network trained in a weakly-supervised manner and without object-based abstraction. Our approach predicts a binary segmentation mask that implicitly drives two parallel branches for ego-motion and scene flow. Unlike previous methods, we provide both branches with all input points and carefully integrate the binary mask into the feature extraction and losses. We also use a shared cost volume with local refinement that is updated at multiple scales without explicit clustering or rigidity assumptions. On realistic KITTI scenes, we show that EgoFlowNet performs better than state-of-the-art methods in the presence of ground surface points.

* This paper is published in BMVC 2023 (pp. 441-443)

Shape2.5D: A Dataset of Texture-less Surfaces for Depth and Normals Estimation

Jun 22, 2024

Muhammad Saif Ullah Khan, Muhammad Zeshan Afzal, Didier Stricker

Abstract: Reconstructing texture-less surfaces poses unique challenges in computer vision, primarily due to the lack of specialized datasets that cater to the nuanced needs of depth and normals estimation in the absence of textural information. We introduce "Shape2.5D," a novel, large-scale dataset designed to address this gap. Comprising 364k frames spanning 2635 3D models and 48 unique objects, our dataset provides depth and surface normal maps for texture-less object reconstruction. The proposed dataset includes synthetic images rendered with 3D modeling software to simulate various lighting conditions and viewing angles. It also includes a real-world subset comprising 4672 frames captured with a depth camera. Our comprehensive benchmarks, performed using a modified encoder-decoder network, showcase the dataset's capability to support the development of algorithms that robustly estimate depth and normals from RGB images. Our open-source data generation pipeline allows the dataset to be extended and adapted for future research. The dataset is publicly available at https://github.com/saifkhichi96/Shape25D.

* This dataset paper was originally written in 2022

Enhanced Bank Check Security: Introducing a Novel Dataset and Transformer-Based Approach for Detection and Verification

Jun 20, 2024

Muhammad Saif Ullah Khan, Tahira Shehzadi, Rabeya Noor, Didier Stricker, Muhammad Zeshan Afzal

Abstract:Automated signature verification on bank checks is critical for fraud prevention and ensuring transaction authenticity. This task is challenging due to the coexistence of signatures with other textual and graphical elements on real-world documents. Verification systems must first detect the signature and then validate its authenticity, a dual challenge often overlooked by current datasets and methodologies focusing only on verification. To address this gap, we introduce a novel dataset specifically designed for signature verification on bank checks. This dataset includes a variety of signature styles embedded within typical check elements, providing a realistic testing ground for advanced detection methods. Moreover, we propose a novel approach for writer-independent signature verification using an object detection network. Our detection-based verification method treats genuine and forged signatures as distinct classes within an object detection framework, effectively handling both detection and verification. We employ a DINO-based network augmented with a dilation module to detect and verify signatures on check images simultaneously. Our approach achieves an AP of 99.2 for genuine and 99.4 for forged signatures, a significant improvement over the DINO baseline, which scored 93.1 and 89.3 for genuine and forged signatures, respectively. This improvement highlights our dilation module's effectiveness in reducing both false positives and negatives. Our results demonstrate substantial advancements in detection-based signature verification technology, offering enhanced security and efficiency in financial document processing.

* Accepted for publication in 16th IAPR International Workshop on Document Analysis Systems 2024
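
The idea of treating genuine and forged signatures as distinct detector classes, as described in the abstract above, can be sketched as a small post-processing step. The `Detection` record, the `verify_signature` helper, the 0.5 score threshold, and the rule of keeping the highest-scoring detection are all assumptions for illustration; the abstract does not specify how detector outputs are turned into a final decision.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Detection:
    """One detector output on a check image (hypothetical record)."""
    label: str                               # "genuine" or "forged"
    score: float                             # detector confidence in [0, 1]
    box: Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)


def verify_signature(detections: List[Detection],
                     score_threshold: float = 0.5) -> Optional[str]:
    """Return the class of the most confident detection, or None.

    Detection and verification collapse into one step: the predicted
    class of the detected box is itself the verification result.
    """
    candidates = [d for d in detections if d.score >= score_threshold]
    if not candidates:
        return None                          # no signature found on the check
    return max(candidates, key=lambda d: d.score).label
```

A `None` result separates "no signature detected" from "forged signature detected", which a two-stage verification-only pipeline would not report.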

Situational Instructions Database: Task Guidance in Dynamic Environments

Jun 19, 2024

Muhammad Saif Ullah Khan, Sankalp Sinha, Didier Stricker, Muhammad Zeshan Afzal

Abstract: The Situational Instructions Database (SID) addresses the need for enhanced situational awareness in artificial intelligence (AI) systems operating in dynamic environments. By integrating detailed scene graphs with dynamically generated, task-specific instructions, SID provides a novel dataset that allows AI systems to perform complex, real-world tasks with improved context sensitivity and operational accuracy. This dataset leverages advanced generative models to simulate a variety of realistic scenarios based on the 3D Semantic Scene Graphs (3DSSG) dataset, enriching it with scenario-specific information that details environmental interactions and tasks. SID facilitates the development of AI applications that can adapt to new and evolving conditions without extensive retraining, supporting research in autonomous technology and AI-driven decision-making processes. This dataset is instrumental in developing robust, context-aware AI agents capable of effectively navigating and responding to unpredictable settings. Available for research and development, SID serves as a critical resource for advancing the capabilities of intelligent systems in complex environments. Dataset available at https://github.com/mindgarage/situational-instructions-database.

* 9 pages, 6 figures

UnSupDLA: Towards Unsupervised Document Layout Analysis

Jun 10, 2024

Talha Uddin Sheikh, Tahira Shehzadi, Khurram Azeem Hashmi, Didier Stricker, Muhammad Zeshan Afzal

Abstract: Document layout analysis is a key area in document research, involving techniques like text mining and visual analysis. Despite the various methods developed to tackle layout analysis, a critical but frequently overlooked problem is the scarcity of labeled data needed for analyses. With the rise of internet use, an overwhelming number of documents are now available online, making the process of accurately labeling them for research purposes increasingly challenging and labor-intensive. Moreover, the diversity of documents online presents a unique set of challenges in maintaining the quality and consistency of these labels, further complicating document layout analysis in the digital era. To address this, we employ a vision-based approach to document layout analysis that is designed to train a network without labels. Instead, we focus on pre-training, initially generating simple object masks from the unlabeled document images. These masks are then used to train a detector, enhancing object detection and segmentation performance. The model's effectiveness is further amplified through several unsupervised training iterations, continuously refining its performance. This approach significantly advances document layout analysis, particularly in precision and efficiency, without labels.

* ICDAR 2024 - Workshop
