Abdulmotaleb El Saddik

3D Open-vocabulary Segmentation with Foundation Models

May 24, 2023
Kunhao Liu, Fangneng Zhan, Jiahui Zhang, Muyu Xu, Yingchen Yu, Abdulmotaleb El Saddik, Christian Theobalt, Eric Xing, Shijian Lu

Open-vocabulary segmentation of 3D scenes is a fundamental function of human perception and thus a crucial objective in computer vision research. However, this task is heavily impeded by the lack of large-scale and diverse 3D open-vocabulary segmentation datasets for training robust and generalizable models. Distilling knowledge from pre-trained 2D open-vocabulary segmentation models helps, but it compromises the open-vocabulary capability significantly, as the 2D models are mostly fine-tuned on closed-vocabulary datasets. We tackle the challenges in 3D open-vocabulary segmentation by exploiting the open-vocabulary multimodal knowledge and object reasoning capability of the pre-trained foundation models CLIP and DINO, without necessitating any fine-tuning. Specifically, we distill open-vocabulary visual and textual knowledge from CLIP into a neural radiance field (NeRF), which effectively lifts 2D features into view-consistent 3D segmentation. Furthermore, we introduce the Relevancy-Distribution Alignment loss and the Feature-Distribution Alignment loss to respectively mitigate the ambiguities of CLIP features and distill precise object boundaries from DINO features, eliminating the need for segmentation annotations during training. Extensive experiments show that our method even outperforms fully supervised models trained with segmentation annotations, suggesting that 3D open-vocabulary segmentation can be effectively learned from 2D images and text-image pairs.
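As an illustration of the distillation idea described above (a minimal sketch under my own assumptions, not the released 3D-OVS code), the snippet below computes a per-pixel relevancy map between NeRF-rendered features and CLIP text embeddings, plus a simple KL-based distribution-alignment loss; the tensor shapes, temperature, and exact loss form are assumptions.

```python
# Minimal sketch (not the authors' implementation): per-pixel relevancy between
# rendered features and CLIP text embeddings, plus a KL-based alignment loss.
import torch
import torch.nn.functional as F

def relevancy_map(pixel_feats, text_feats, temperature=0.07):
    """pixel_feats: (N, D) features rendered from the NeRF; text_feats: (C, D) class prompts."""
    pixel_feats = F.normalize(pixel_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    logits = pixel_feats @ text_feats.t() / temperature   # (N, C) scaled cosine similarities
    return logits.softmax(dim=-1)                         # per-pixel class relevancies

def alignment_loss(student_rel, teacher_rel, eps=1e-8):
    """Pull the 3D-rendered relevancy distribution toward the 2D teacher's."""
    return F.kl_div((student_rel + eps).log(), teacher_rel, reduction="batchmean")
```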

* code is available at https://github.com/Kunhao-Liu/3D-OVS 

Improving Stain Invariance of CNNs for Segmentation by Fusing Channel Attention and Domain-Adversarial Training

Apr 22, 2023
Kudaibergen Abutalip, Numan Saeed, Mustaqeem Khan, Abdulmotaleb El Saddik

Variability in staining protocols, such as different slide preparation techniques, chemicals, and scanner configurations, can result in a diverse set of whole slide images (WSIs). This distribution shift can negatively impact the performance of deep learning models on unseen samples, presenting a significant challenge for developing new computational pathology applications. In this study, we propose a method for improving the generalizability of convolutional neural networks (CNNs) to stain changes in a single-source setting for semantic segmentation. Recent studies indicate that style features mainly exist as covariances in earlier network layers. Building on these findings, we design a channel attention mechanism that detects stain-specific features and modify a previously proposed stain-invariant training scheme. We reweight the outputs of earlier layers and pass them to the stain-adversarial training branch. We evaluate our method on multi-center, multi-stain datasets and demonstrate its effectiveness through interpretability analysis. Our approach achieves substantial improvements over baselines and competitive performance compared to other methods, as measured by various evaluation metrics. We also show that combining our method with stain augmentation leads to mutually beneficial results and outperforms other techniques. Overall, our study makes significant contributions to the field of computational pathology.
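To make the two ingredients above concrete, here is a minimal sketch (my own illustration with assumed shapes and hyperparameters, not the paper's code) of an SE-style channel attention block and a gradient-reversal function of the kind used in domain-adversarial training.

```python
# Minimal sketch (assumptions, not the paper's code): channel attention that can
# reweight stain-sensitive channels, and a gradient-reversal op for the
# stain-adversarial branch.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                        # x: (B, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))          # squeeze -> per-channel weights (B, C)
        return x * w[:, :, None, None]           # reweight channels before the adversarial branch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lambd * grad, None           # flip gradients flowing back from the domain head
```

A stain classifier fed with `GradReverse.apply(attended_features, 1.0)` would then learn to predict the stain, while its reversed gradients push the shared backbone toward stain-invariant features.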

3D Semantic Segmentation in the Wild: Learning Generalized Models for Adverse-Condition Point Clouds

Apr 03, 2023
Aoran Xiao, Jiaxing Huang, Weihao Xuan, Ruijie Ren, Kangcheng Liu, Dayan Guan, Abdulmotaleb El Saddik, Shijian Lu, Eric Xing

Robust point cloud parsing under all-weather conditions is crucial to level-5 autonomy in autonomous driving. However, how to learn a universal 3D semantic segmentation (3DSS) model is largely neglected, as most existing benchmarks are dominated by point clouds captured under normal weather. We introduce SemanticSTF, an adverse-weather point cloud dataset that provides dense point-level annotations and allows studying 3DSS under various adverse weather conditions. We study all-weather 3DSS modeling under two setups: 1) domain adaptive 3DSS that adapts from normal-weather data to adverse-weather data; 2) domain generalizable 3DSS that learns all-weather 3DSS models from normal-weather data. Our studies reveal the challenges existing 3DSS methods face on adverse-weather data, showing the great value of SemanticSTF in steering future work along this meaningful research direction. In addition, we design a domain randomization technique that alternately randomizes the geometry styles of point clouds and aggregates their embeddings, ultimately leading to a generalizable model that effectively improves 3DSS under various adverse weather conditions. The SemanticSTF dataset and related code are available at https://github.com/xiaoaoran/SemanticSTF.
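As a toy illustration of the geometry-style randomization mentioned above (a sketch with invented jitter, scale, and drop parameters, not the released augmentation), the snippet below perturbs a point cloud in a few simple ways to produce randomized "styles" whose embeddings could then be aggregated.

```python
# Minimal sketch (my own illustration, not the released code): randomizing the
# geometry "style" of a LiDAR point cloud with scale/jitter/drop augmentations.
import numpy as np

def randomize_geometry(points, rng, jitter=0.02, scale_range=(0.95, 1.05), drop_ratio=0.1):
    """points: (N, 3) xyz coordinates. Returns a randomly perturbed copy."""
    pts = points * rng.uniform(*scale_range)               # global scale change
    pts = pts + rng.normal(0.0, jitter, size=pts.shape)    # per-point jitter (noisy returns)
    keep = rng.random(len(pts)) > drop_ratio               # random point dropping (sparsity)
    return pts[keep]

rng = np.random.default_rng(0)
cloud = rng.uniform(-10, 10, size=(2048, 3))
views = [randomize_geometry(cloud, rng) for _ in range(2)]  # two randomized "styles" to embed and aggregate
```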

* CVPR 2023 

StyleRF: Zero-shot 3D Style Transfer of Neural Radiance Fields

Mar 24, 2023
Kunhao Liu, Fangneng Zhan, Yiwen Chen, Jiahui Zhang, Yingchen Yu, Abdulmotaleb El Saddik, Shijian Lu, Eric Xing

3D style transfer aims to render stylized novel views of a 3D scene with multi-view consistency. However, most existing work suffers from a three-way dilemma among accurate geometry reconstruction, high-quality stylization, and generalizability to arbitrary new styles. We propose StyleRF (Style Radiance Fields), an innovative 3D style transfer technique that resolves the three-way dilemma by performing style transformation within the feature space of a radiance field. StyleRF employs an explicit grid of high-level features to represent 3D scenes, from which high-fidelity geometry can be reliably restored via volume rendering. In addition, it transforms the grid features according to the reference style, which directly leads to high-quality zero-shot style transfer. StyleRF consists of two innovative designs. The first is sampling-invariant content transformation, which makes the transformation invariant to the holistic statistics of the sampled 3D points and accordingly ensures multi-view consistency. The second is deferred style transformation of 2D feature maps, which is equivalent to transforming the 3D points but greatly reduces the memory footprint without degrading multi-view consistency. Extensive experiments show that StyleRF achieves superior 3D stylization quality with precise geometry reconstruction, and it can generalize to various new styles in a zero-shot manner.
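To illustrate the kind of feature-space style transformation the abstract refers to, here is an AdaIN-style stand-in (an assumption for illustration, not StyleRF's actual transformation) that matches the channel-wise statistics of a rendered 2D feature map to those of a style feature map.

```python
# Minimal sketch (an AdaIN-style stand-in, not StyleRF's transformation): match
# the channel-wise statistics of a rendered content feature map to a style's.
import torch

def channel_stats(feat, eps=1e-5):
    """feat: (B, C, H, W) -> per-channel mean and std, each shaped (B, C, 1, 1)."""
    b, c = feat.shape[:2]
    flat = feat.reshape(b, c, -1)
    return flat.mean(-1)[..., None, None], flat.std(-1)[..., None, None] + eps

def adain(content_feat, style_feat):
    """Normalize content features, then re-scale/shift with style statistics."""
    c_mean, c_std = channel_stats(content_feat)
    s_mean, s_std = channel_stats(style_feat)
    return (content_feat - c_mean) / c_std * s_std + s_mean

stylized = adain(torch.randn(1, 256, 64, 64), torch.randn(1, 256, 32, 32))
```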

* Accepted to CVPR 2023. Project website: https://kunhao-liu.github.io/StyleRF/ 

Development of an automatic 3D human head scanning-printing system

Dec 26, 2022
Longyu Zhang, Bote Han, Haiwei Dong, Abdulmotaleb El Saddik

Three-dimensional (3D) technologies have developed rapidly in recent years and have influenced industrial, medical, cultural, and many other fields. In this paper, we introduce an automatic 3D human head scanning-printing system, which provides a complete pipeline to scan, reconstruct, select, and finally print physical 3D human heads. To enhance the accuracy of our system, we developed a consumer-grade composite sensor (including a gyroscope, an accelerometer, a digital compass, and a Kinect v2 depth sensor) as our sensing device. This sensing device is mounted on a robot, which automatically rotates around the human subject at an approximately 1-meter radius to capture full-view information. The data streams are further processed and fused into a 3D model of the subject using a tablet located on the robot. In addition, an automatic selection method, based on our specific system configuration, is proposed to select the head portion. We evaluated the accuracy of the proposed system by comparing our generated 3D head models, from both a standard human head model and real human subjects, with models reconstructed by the FastSCAN and Cyberware commercial laser scanning systems, computing and visualizing Hausdorff distances. Computational cost is also reported to further assess the proposed system.
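The evaluation above relies on Hausdorff distances between reconstructed and reference models; below is a minimal sketch (toy data, not the paper's evaluation code) of the symmetric Hausdorff distance between two point sets using SciPy.

```python
# Minimal sketch (illustration only): symmetric Hausdorff distance between two
# point clouds, as used for the accuracy evaluation described in the abstract.
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def hausdorff(a, b):
    """a: (N, 3), b: (M, 3) point sets; returns the symmetric Hausdorff distance."""
    return max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])

rng = np.random.default_rng(0)
scan = rng.normal(size=(1000, 3))                     # stand-in for a reconstructed head model
reference = scan + rng.normal(0, 0.01, scan.shape)    # stand-in for a laser-scanned reference
print(f"Hausdorff distance: {hausdorff(scan, reference):.4f}")
```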

* Multimedia Tools and Applications, vol. 76, no. 3, pp. 4381-4403, 2017  

A Combined Approach Toward Consistent Reconstructions of Indoor Spaces Based on 6D RGB-D Odometry and KinectFusion

Dec 25, 2022
Nadia Figueroa, Haiwei Dong, Abdulmotaleb El Saddik

We propose a 6D RGB-D odometry approach that finds the relative camera pose between consecutive RGB-D frames by keypoint extraction and feature matching on both the RGB and depth image planes. Furthermore, we feed the estimated pose to the highly accurate KinectFusion algorithm, which uses fast ICP (Iterative Closest Point) to fine-tune the frame-to-frame relative pose and fuse the depth data into a global implicit surface. We evaluate our method on the publicly available RGB-D SLAM benchmark dataset by Sturm et al. The experimental results show that our proposed reconstruction method, based solely on visual odometry and KinectFusion, outperforms the state-of-the-art RGB-D SLAM system in accuracy. Moreover, our algorithm outputs a ready-to-use polygon mesh (highly suitable for creating 3D virtual worlds) without any postprocessing steps.
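As a sketch of how a relative pose can be recovered from matched keypoints with depth (a generic Kabsch/SVD estimate used here for illustration, not necessarily the authors' exact formulation), the snippet below computes the rigid transform between two sets of matched 3D points; such an estimate could then seed KinectFusion's ICP refinement.

```python
# Minimal sketch (assumption, not the authors' pipeline): rigid 6-DoF pose from
# matched 3D keypoints of consecutive frames via the Kabsch/SVD method.
import numpy as np

def rigid_transform(src, dst):
    """src, dst: (N, 3) matched 3D keypoints. Returns rotation R and translation t."""
    src_c, dst_c = src.mean(0), dst.mean(0)
    H = (src - src_c).T @ (dst - dst_c)        # cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                   # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t
```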

* ACM Trans. Intell. Syst., vol. 6, no. 2, pp. 14:1-10, 2015  

Development of a Self-Calibrated Motion Capture System by Nonlinear Trilateration of Multiple Kinects v2

Dec 25, 2022
Bowen Yang, Haiwei Dong, Abdulmotaleb El Saddik

In this paper, a Kinect-based distributed and real-time motion capture system is developed. A trigonometric method is applied to calculate the relative positions of the Kinect v2 sensors with a calibration wand and to register the sensors' positions automatically. By combining results from multiple sensors with a nonlinear least squares method, the accuracy of the motion capture is optimized. Moreover, to exclude inaccurate results from sensors, a computational-geometry-based occlusion approach is applied to detect occluded joint data. The synchronization approach is based on the NTP protocol, which dynamically synchronizes the clocks of the server and clients, ensuring that the proposed system operates in real time. Experiments for validating the proposed system are conducted from the perspectives of calibration, occlusion, accuracy, and efficiency. Furthermore, to demonstrate the practical performance of our system, it is compared with previously developed motion capture systems (the linear trilateration approach and the geometric trilateration approach) against the benchmark OptiTrack system, showing that the accuracy of our proposed system is 38.3% and 24.1% better than the two aforementioned trilateration systems, respectively.
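To make the nonlinear least squares trilateration idea concrete, here is a toy sketch (invented sensor positions and noise levels, not the paper's calibration code) that recovers a marker position from noisy range measurements.

```python
# Minimal sketch (toy example, not the paper's implementation): trilateration of
# a marker from noisy distance measurements via nonlinear least squares.
import numpy as np
from scipy.optimize import least_squares

sensors = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 2.0]])
true_point = np.array([1.0, 1.5, 0.5])
rng = np.random.default_rng(0)
distances = np.linalg.norm(sensors - true_point, axis=1) + rng.normal(0, 0.01, len(sensors))

def residuals(p):
    return np.linalg.norm(sensors - p, axis=1) - distances   # predicted minus measured ranges

estimate = least_squares(residuals, x0=np.zeros(3)).x
print(estimate)   # close to true_point
```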

* IEEE Sensors Journal, vol. 17, no. 8, pp. 2481-2491, 2017  

EVM-CNN: Real-Time Contactless Heart Rate Estimation from Facial Video

Dec 25, 2022
Ying Qiu, Yang Liu, Juan Arteaga-Falconi, Haiwei Dong, Abdulmotaleb El Saddik

With the increase in health consciousness, noninvasive body monitoring has aroused interest among researchers. Heart rate (HR) is one of the most important pieces of physiological information, and in recent years researchers have estimated it remotely from facial videos. Although progress has been made over the past few years, some limitations remain, such as processing time that increases with accuracy and the lack of comprehensive and challenging datasets for use and comparison. Recently, it was shown that HR information can be extracted from facial videos by spatial decomposition and temporal filtering. Inspired by this, a new framework is introduced in this paper to remotely estimate the HR under realistic conditions by combining spatial and temporal filtering with a convolutional neural network. Our proposed approach shows better performance than the benchmark on the MMSE-HR dataset in terms of both average HR estimation and short-time HR estimation. High consistency in short-time HR estimation is observed between our method and the ground truth.
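As an illustration of the temporal-filtering step mentioned above (a generic sketch with made-up signal parameters, not the EVM-CNN pipeline), the snippet band-pass filters a facial color trace to the typical heart-rate band and reads off the dominant frequency.

```python
# Minimal sketch (illustration, not the paper's pipeline): band-pass a per-frame
# facial color trace to the heart-rate band (~0.7-4 Hz) and estimate bpm.
import numpy as np
from scipy.signal import butter, filtfilt

def heart_rate_bandpass(signal, fps, low=0.7, high=4.0, order=4):
    """signal: 1D mean color trace from a face ROI; fps: video frame rate."""
    b, a = butter(order, [low / (fps / 2), high / (fps / 2)], btype="band")
    return filtfilt(b, a, signal)

fps = 30
t = np.arange(0, 10, 1 / fps)
rng = np.random.default_rng(0)
trace = 0.02 * np.sin(2 * np.pi * 1.2 * t) + rng.normal(0, 0.05, t.size)  # ~72 bpm pulse + noise
filtered = heart_rate_bandpass(trace, fps)
freqs = np.fft.rfftfreq(t.size, 1 / fps)
bpm = 60 * freqs[np.argmax(np.abs(np.fft.rfft(filtered)))]
print(f"Estimated heart rate: {bpm:.1f} bpm")
```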

* IEEE Transactions on Multimedia, vol. 21, no. 7, pp. 1778-1787, 2019  

Learning to Estimate 3D Human Pose from Point Cloud

Dec 25, 2022
Yufan Zhou, Haiwei Dong, Abdulmotaleb El Saddik

3D pose estimation is a challenging problem in computer vision. Most existing neural-network-based approaches process color or depth images with convolutional neural networks (CNNs). In this paper, we study the task of 3D human pose estimation from depth images. Different from existing CNN-based human pose estimation methods, we propose a deep human pose network for 3D pose estimation that takes point cloud data as input to model the surface of complex human structures. We first convert the 2D depth images into 3D point clouds and directly predict the 3D joint positions. Our experiments on two public datasets show that our approach achieves higher accuracy than previous state-of-the-art methods. The reported results on both the ITOP and EVAL datasets demonstrate the effectiveness of our method on the targeted tasks.
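The conversion from depth images to point clouds mentioned above is a standard pinhole back-projection; here is a minimal sketch of that step (the intrinsics fx, fy, cx, cy are illustrative values, not the paper's calibration).

```python
# Minimal sketch (generic conversion, not the authors' network): back-project a
# depth image into a 3D point cloud with assumed pinhole intrinsics.
import numpy as np

def depth_to_point_cloud(depth, fx=570.0, fy=570.0, cx=320.0, cy=240.0):
    """depth: (H, W) metric depth map. Returns (H*W, 3) xyz points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

rng = np.random.default_rng(0)
cloud = depth_to_point_cloud(rng.uniform(0.5, 3.0, size=(480, 640)))
print(cloud.shape)   # (307200, 3), ready to feed a point-based pose network
```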

* IEEE Sensors Journal, vol. 20, no. 20, pp. 12334-12342, 2020  