Abstract:Many existing methods for 3D cuboid annotation of vehicles rely on expensive and carefully calibrated camera-LiDAR or stereo setups, limiting their accessibility for large-scale data collection. We introduce ToosiCubix, a simple yet powerful approach for annotating ground-truth cuboids using only monocular images and intrinsic camera parameters. Our method requires only about 10 user clicks per vehicle, making it highly practical for adding 3D annotations to existing datasets originally collected without specialized equipment. By annotating specific features (e.g., wheels, car badge, symmetries) across different vehicle parts, we accurately estimate each vehicle's position, orientation, and dimensions up to a scale ambiguity (8 DoF). The geometric constraints are formulated as an optimization problem, which we solve using a coordinate descent strategy, alternating between Perspective-n-Points (PnP) and least-squares subproblems. To handle common ambiguities such as scale and unobserved dimensions, we incorporate probabilistic size priors, enabling 9 DoF cuboid placements. We validate our annotations against the KITTI and Cityscapes3D datasets, demonstrating that our method offers a cost-effective and scalable solution for high-quality 3D cuboid annotation.
Abstract:In this paper, we introduce a novel framework for memory-efficient and privacy-preserving continual learning in 3D object classification. Unlike conventional memory-based approaches in continual learning that require storing numerous exemplars, our method constructs a compact shape model for each class, retaining only the mean shape along with a few key modes of variation. This strategy not only enables the generation of diverse training samples while drastically reducing memory usage but also enhances privacy by eliminating the need to store original data. To further improve model robustness against input variations, an issue common in 3D domains due to the absence of strong backbones and limited training data, we incorporate Gradient Mode Regularization. This technique enhances model stability and broadens classification margins, resulting in accuracy improvements. We validate our approach through extensive experiments on the ModelNet40, ShapeNet, and ScanNet datasets, where we achieve state-of-the-art performance. Notably, our method consumes only 15% of the memory required by competing methods on the ModelNet40 and ShapeNet, while achieving comparable performance on the challenging ScanNet dataset with just 8.5% of the memory. These results underscore the scalability, effectiveness, and privacy-preserving strengths of our framework for 3D object classification.
Abstract:We introduce a novel framework for Continual Learning in 3D object classification (CL3D). Our approach is based on the selection of prototypes from each class using spectral clustering. For non-Euclidean data such as point clouds, spectral clustering can be employed as long as one can define a distance measure between pairs of samples. Choosing the appropriate distance measure enables us to leverage 3D geometric characteristics to identify representative prototypes for each class. We explore the effectiveness of clustering in the input space (3D points), local feature space (1024-dimensional points), and global feature space. We conduct experiments on the ModelNet40, ShapeNet, and ScanNet datasets, achieving state-of-the-art accuracy exclusively through the use of input space features. By leveraging the combined input, local, and global features, we have improved the state-of-the-art on ModelNet and ShapeNet, utilizing nearly half the memory used by competing approaches. For the challenging ScanNet dataset, our method enhances accuracy by 4.1% while consuming just 28% of the memory used by our competitors, demonstrating the scalability of our approach.