Mingtao Feng

Sketch and Text Guided Diffusion Model for Colored Point Cloud Generation

Aug 05, 2023
Zijie Wu, Yaonan Wang, Mingtao Feng, He Xie, Ajmal Mian

Diffusion probabilistic models have achieved remarkable success in text-guided image generation. However, generating 3D shapes remains challenging due to the lack of sufficient data pairing 3D models with descriptions. Moreover, text-based descriptions of 3D shapes are inherently ambiguous and lack detail. In this paper, we propose a sketch- and text-guided probabilistic diffusion model for colored point cloud generation that conditions the denoising process jointly on a hand-drawn sketch of the object and its textual description. We incrementally diffuse the point coordinates and color values in a joint diffusion process until they reach a Gaussian distribution. Colored point cloud generation thus amounts to learning the reverse diffusion process, conditioned on the sketch and text, to iteratively recover the desired shape and color. Specifically, to learn an effective sketch-text embedding, our model adaptively aggregates the joint embedding of the text prompt and the sketch using a capsule attention network. Our model uses staged diffusion to first generate the shape and then assign colors to its parts conditioned on the appearance prompt, while preserving the precise shape from the first stage. This gives our model the flexibility to extend to multiple tasks, such as appearance re-editing and part segmentation. Experimental results demonstrate that our model outperforms recent state-of-the-art methods in point cloud generation.
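
As a rough illustration of the joint coordinate-and-color diffusion described above, the minimal PyTorch sketch below diffuses 6-D points (xyz + rgb) toward a Gaussian and trains a toy conditional denoiser with the standard noise-prediction objective. The JointDenoiser architecture, the linear noise schedule, and the 512-D fused sketch-text embedding are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of joint coordinate-and-color diffusion (assumed details).
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)              # linear noise schedule (assumed)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)     # cumulative \bar{alpha}_t

class JointDenoiser(nn.Module):
    """Predicts the noise added to 6-D (xyz + rgb) point features,
    conditioned on a fused sketch-text embedding (hypothetical module)."""
    def __init__(self, cond_dim=512, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(6 + cond_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 6),
        )

    def forward(self, x_t, t, cond):
        # x_t: (B, N, 6) noisy points, t: (B,) timesteps, cond: (B, cond_dim)
        B, N, _ = x_t.shape
        t_feat = (t.float() / T).view(B, 1, 1).expand(B, N, 1)
        c_feat = cond.unsqueeze(1).expand(B, N, -1)
        return self.net(torch.cat([x_t, c_feat, t_feat], dim=-1))

def training_step(model, x0, cond):
    """One denoising step: diffuse x0 to x_t, predict the added noise."""
    B = x0.shape[0]
    t = torch.randint(0, T, (B,))
    noise = torch.randn_like(x0)
    a = alphas_bar[t].view(B, 1, 1)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise   # q(x_t | x_0)
    return nn.functional.mse_loss(model(x_t, t, cond), noise)

# Toy usage: 2 clouds of 1024 colored points, random sketch-text embeddings.
model = JointDenoiser()
loss = training_step(model, torch.randn(2, 1024, 6), torch.randn(2, 512))
```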

* Accepted by ICCV 2023 

DANet: Density Adaptive Convolutional Network with Interactive Attention for 3D Point Clouds

Mar 08, 2023
Yong He, Hongshan Yu, Zhengeng Yang, Wei Sun, Mingtao Feng, Ajmal Mian

Local features and contextual dependencies are crucial for 3D point cloud analysis. Many works have been devoted to designing better local convolutional kernels that exploit contextual dependencies. However, current point convolutions lack robustness to varying point cloud density. Moreover, contextual modeling is dominated by non-local or self-attention models, which are computationally expensive. To solve these problems, we propose a density-adaptive convolution, coined DAConv. The key idea is to adaptively learn the convolutional weights from geometric connections derived from point density and position. To extract precise contextual dependencies with fewer computations, we propose an interactive attention module (IAM) that embeds spatial information into channel attention along different spatial directions. DAConv and IAM are integrated in a hierarchical network architecture to achieve local density- and contextual direction-aware learning for point cloud analysis. Experiments show that DAConv is significantly more robust to point density than existing methods. Extensive comparisons on challenging 3D point cloud datasets show that our network achieves state-of-the-art classification accuracy of 93.6% on ModelNet40, competitive semantic segmentation performance of 68.71% mIoU on S3DIS, and part segmentation performance of 86.7% mIoU on ShapeNet.
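
The sketch below illustrates the general idea of a density-adaptive point convolution: per-neighbor weights are generated from the relative position and an estimated local density before aggregation. The k-NN size, the weight MLP, and the density proxy (inverse mean neighbor distance) are assumptions for illustration, not the exact DAConv design.

```python
# Minimal density-adaptive point convolution in the spirit of DAConv (assumed design).
import torch
import torch.nn as nn

class DensityAdaptiveConv(nn.Module):
    def __init__(self, in_ch, out_ch, k=16):
        super().__init__()
        self.k = k
        # maps (relative xyz, density) -> a per-neighbor weight over input channels
        self.weight_mlp = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, in_ch))
        self.proj = nn.Linear(in_ch, out_ch)

    def forward(self, xyz, feats):
        # xyz: (B, N, 3) coordinates, feats: (B, N, C) point features
        B, N, C = feats.shape
        dist = torch.cdist(xyz, xyz)                                  # (B, N, N)
        knn_d, knn_i = dist.topk(self.k, dim=-1, largest=False)
        # local density proxy: inverse mean distance to the k nearest neighbors
        density = 1.0 / (knn_d.mean(dim=-1, keepdim=True) + 1e-6)     # (B, N, 1)
        idx = knn_i.reshape(B, -1)
        nb_xyz = torch.gather(xyz, 1, idx.unsqueeze(-1).expand(-1, -1, 3)).view(B, N, self.k, 3)
        nb_feat = torch.gather(feats, 1, idx.unsqueeze(-1).expand(-1, -1, C)).view(B, N, self.k, C)
        rel = nb_xyz - xyz.unsqueeze(2)                               # relative positions
        geo = torch.cat([rel, density.unsqueeze(2).expand(-1, -1, self.k, -1)], dim=-1)
        w = self.weight_mlp(geo)                                      # adaptive weights (B, N, k, C)
        return self.proj((w * nb_feat).sum(dim=2))                    # weighted aggregation

# Toy usage
layer = DensityAdaptiveConv(in_ch=32, out_ch=64)
out = layer(torch.rand(2, 512, 3), torch.rand(2, 512, 32))            # -> (2, 512, 64)
```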

* 9 

Learning from Pixel-Level Noisy Label : A New Perspective for Light Field Saliency Detection

Apr 28, 2022
Mingtao Feng, Kendong Liu, Liang Zhang, Hongshan Yu, Yaonan Wang, Ajmal Mian

Saliency detection with light field images is becoming attractive given the abundant cues available; however, this comes at the expense of large-scale pixel-level annotated data, which is expensive to generate. In this paper, we propose to learn light field saliency from pixel-level noisy labels obtained from unsupervised, hand-crafted feature-based saliency methods. Given this goal, a natural question is: can we efficiently incorporate the relationships among light field cues while identifying clean labels in a unified framework? We address this question by formulating the learning as a joint optimization of an intra-light-field feature fusion stream and an inter-scene correlation stream to generate the predictions. Specifically, we first introduce a pixel forgetting guided fusion module to mutually enhance the light field features and exploit pixel consistency across iterations to identify noisy pixels. Next, we introduce a cross-scene noise penalty loss to better reflect the latent structure of the training data and make the learning invariant to noise. Extensive experiments on multiple benchmark datasets demonstrate the superiority of our framework, showing that it learns saliency prediction comparable to state-of-the-art fully supervised light field saliency methods. Our code is available at https://github.com/OLobbCode/NoiseLF.
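
A minimal sketch of the pixel-consistency idea is given below: pixels whose binarized predictions keep flipping across training iterations are treated as likely noisy and down-weighted in the saliency loss. The flip counter and the weighting formula are illustrative assumptions, not the paper's pixel forgetting guided fusion module or cross-scene penalty loss.

```python
# Down-weighting inconsistent (likely noisy) pixels during training (assumed scheme).
import torch
import torch.nn.functional as F

class PixelForgettingTracker:
    def __init__(self, num_pixels):
        self.prev = torch.zeros(num_pixels)     # previous binarized prediction
        self.flips = torch.zeros(num_pixels)    # how often each pixel flipped

    def update(self, probs):
        pred = (probs.detach() > 0.5).float()
        self.flips += (pred != self.prev).float()
        self.prev = pred

    def weights(self, epoch):
        # fewer flips -> more trusted; epoch + 1 keeps the weight in (0, 1]
        return 1.0 / (1.0 + self.flips / (epoch + 1))

def noisy_label_loss(logits, noisy_labels, tracker, epoch):
    probs = torch.sigmoid(logits).flatten()
    tracker.update(probs)
    w = tracker.weights(epoch)
    per_pixel = F.binary_cross_entropy(probs, noisy_labels.flatten(), reduction="none")
    return (w * per_pixel).mean()

# Toy usage: one 64x64 saliency map supervised by a noisy pseudo-label.
tracker = PixelForgettingTracker(64 * 64)
logits = torch.randn(64, 64, requires_grad=True)
labels = (torch.rand(64, 64) > 0.5).float()
loss = noisy_label_loss(logits, labels, tracker, epoch=0)
loss.backward()
```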

* Accepted by CVPR 2022 

Scene Graph Generation: A Comprehensive Survey

Jan 03, 2022
Guangming Zhu, Liang Zhang, Youliang Jiang, Yixuan Dang, Haoran Hou, Peiyi Shen, Mingtao Feng, Xia Zhao, Qiguang Miao, Syed Afaq Ali Shah, Mohammed Bennamoun

Deep learning techniques have led to remarkable breakthroughs in generic object detection and have spawned many scene-understanding tasks in recent years. The scene graph has been a focus of research because of its powerful semantic representation and its applications to scene understanding. Scene Graph Generation (SGG) refers to the task of automatically mapping an image into a semantic structural scene graph, which requires correctly labeling detected objects and their relationships. Although this is a challenging task, the community has proposed many SGG approaches and achieved good results. In this paper, we provide a comprehensive survey of recent achievements in this field brought about by deep learning techniques. We review 138 representative works that cover different input modalities, and systematically summarize existing methods of image-based SGG from the perspective of feature extraction and fusion. We attempt to connect and systematize existing visual relationship detection methods, and to summarize and interpret the mechanisms and strategies of SGG in a comprehensive way. Finally, we conclude this survey with an in-depth discussion of current open problems and future research directions. This survey will help readers develop a better understanding of the current research status and ideas.

* Submitted to TPAMI 

A Systematic Collection of Medical Image Datasets for Deep Learning

Jun 24, 2021
Johann Li, Guangming Zhu, Cong Hua, Mingtao Feng, Basheer Bennamoun, Ping Li, Xiaoyuan Lu, Juan Song, Peiyi Shen, Xu Xu, Lin Mei, Liang Zhang, Syed Afaq Ali Shah, Mohammed Bennamoun

The astounding success of artificial intelligence (AI) in healthcare and other fields proves that AI can achieve human-like performance. However, success always comes with challenges. Deep learning algorithms are data-dependent and require large datasets for training. The lack of data in the medical imaging field creates a bottleneck for the application of deep learning to medical image analysis. Medical image acquisition, annotation, and analysis are costly, and their usage is constrained by ethical restrictions. They also require many resources, such as human expertise and funding. This makes it difficult for non-medical researchers to access useful, large medical datasets. Thus, this paper provides, as comprehensively as possible, a collection of medical image datasets with their associated challenges for deep learning research. We have collected information on around three hundred datasets and challenges, mainly reported between 2013 and 2020, and categorized them into four categories: head & neck, chest & abdomen, pathology & blood, and "others". Our paper has three purposes: 1) to provide an up-to-date and complete list that can be used as a universal reference for easily finding datasets for clinical image analysis; 2) to guide researchers on the methodology for testing and evaluating their methods' performance and robustness on relevant datasets; and 3) to provide a "route" to relevant algorithms for different medical topics, along with challenge leaderboards.

* This paper has been submitted to one journal 

Free-form Description Guided 3D Visual Graph Network for Object Grounding in Point Cloud

Mar 30, 2021
Mingtao Feng, Zhen Li, Qi Li, Liang Zhang, XiangDong Zhang, Guangming Zhu, Hui Zhang, Yaonan Wang, Ajmal Mian

3D object grounding aims to locate the most relevant target object in a raw point cloud scene based on a free-form language description. Understanding complex and diverse descriptions, and lifting them directly to a point cloud, is a new and challenging topic due to the irregular and sparse nature of point clouds. There are three main challenges in 3D object grounding: finding the main focus in a complex and diverse description, understanding the point cloud scene, and locating the target object. In this paper, we address all three challenges. Firstly, we propose a language scene graph module to capture rich structure and long-distance phrase correlations. Secondly, we introduce a multi-level 3D proposal relation graph module to extract object-object and object-scene co-occurrence relationships and strengthen the visual features of the initial proposals. Lastly, we develop a description guided 3D visual graph module that encodes the global contexts of phrases and proposals via a node matching strategy. Extensive experiments on challenging benchmark datasets (ScanRefer and Nr3D) show that our algorithm outperforms the existing state-of-the-art methods. Our code is available at https://github.com/PNXD/FFL-3DOG.
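
The sketch below illustrates, in simplified form, the node matching idea: phrase node embeddings attend over 3D proposal embeddings, and the highest-scoring proposal is taken as the grounded object. The embedding dimension, attention form, and scoring head are assumptions, not the paper's description guided 3D visual graph module.

```python
# Simplified phrase-to-proposal node matching (assumed module).
import torch
import torch.nn as nn

class NodeMatcher(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.q = nn.Linear(dim, dim)    # phrase nodes -> queries
        self.k = nn.Linear(dim, dim)    # proposal nodes -> keys
        self.score = nn.Linear(dim, 1)  # per-proposal grounding score

    def forward(self, phrase_nodes, proposal_nodes):
        # phrase_nodes: (P, D), proposal_nodes: (M, D)
        scale = phrase_nodes.shape[-1] ** 0.5
        attn = torch.softmax(self.q(phrase_nodes) @ self.k(proposal_nodes).T / scale, dim=-1)  # (P, M)
        # fuse phrase context into each proposal, weighted by how much phrases attend to it
        context = attn.T @ phrase_nodes                                # (M, D)
        scores = self.score(proposal_nodes + context).squeeze(-1)      # (M,)
        return scores.argmax(), scores

# Toy usage: 8 phrase nodes, 32 candidate proposals.
matcher = NodeMatcher()
target_idx, scores = matcher(torch.randn(8, 256), torch.randn(32, 256))
```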

Minimum Potential Energy of Point Cloud for Robust Global Registration

Jun 12, 2020
Zijie Wu, Yaonan Wang, Qing Zhu, Jianxu Mao, Haotian Wu, Mingtao Feng, Ajmal Mian

In this paper, we propose a novel minimum gravitational potential energy (MPE)-based algorithm for global point set registration. Feature descriptor extraction has been the standard approach for aligning point sets over the past few decades. However, such alignment can fail when the point sets suffer from raw data problems such as Gaussian or uniform noise. Unlike most existing point set registration methods, which extract descriptors to find correspondences between point sets, our proposed MPE alignment method can handle large-scale raw data offsets without relying on traditional descriptor extraction, for both local and global registration. We decompose the solution into a globally optimal convex approximation and a fast descent to a local minimum. The approximation step consists of two parts: first, using the constructed force traction operator, we compute the position of the potential energy minimum; second, to locate the MPE point, we propose a new strategy that employs two flags to monitor the status of the registration procedure. For the fast descent to the minimum, we employ the iterative closest point algorithm, which then reaches the global minimum. We demonstrate the performance of the proposed algorithm on synthetic as well as real data. The proposed method outperforms other global methods in terms of efficiency, accuracy, and noise resistance.
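
A toy sketch of the gravitational-potential idea follows: the target cloud acts as a set of attracting masses, candidate poses of the source cloud are scored by their total inverse-distance potential energy, and the best coarse pose would then seed a local refinement such as ICP. The yaw-only grid search is an illustrative assumption, not the paper's optimisation scheme.

```python
# Coarse registration by minimising a gravitational-style potential (assumed setup).
import math
import torch

def potential_energy(src, tgt, eps=1e-3):
    # pairwise inverse-distance potential between the two clouds
    r = torch.cdist(src, tgt)
    return -(1.0 / (r + eps)).sum()

def coarse_align(src, tgt, steps=72):
    """Grid-search a rotation about the z-axis that minimises the potential."""
    best_E, best_R = float("inf"), torch.eye(3)
    for i in range(steps):
        theta = 2 * math.pi * i / steps
        c, s = math.cos(theta), math.sin(theta)
        R = torch.tensor([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
        E = potential_energy(src @ R.T, tgt).item()
        if E < best_E:
            best_E, best_R = E, R
    return best_R    # hand this coarse pose to ICP for the fast descent step

# Toy usage: align a yaw-rotated copy of a cloud back to the original.
tgt = torch.rand(200, 3)
ang = math.radians(40.0)
R_true = torch.tensor([[math.cos(ang), -math.sin(ang), 0.0],
                       [math.sin(ang),  math.cos(ang), 0.0],
                       [0.0, 0.0, 1.0]])
R_est = coarse_align(tgt @ R_true.T, tgt)   # approximately the inverse yaw
```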

Relation Graph Network for 3D Object Detection in Point Clouds

Nov 30, 2019
Mingtao Feng, Syed Zulqarnain Gilani, Yaonan Wang, Liang Zhang, Ajmal Mian

Convolutional Neural Networks (CNNs) have emerged as a powerful strategy for most object detection tasks on 2D images. However, their power has not been fully realised for detecting 3D objects directly in point clouds, without converting them to regular grids. Existing state-of-the-art 3D object detection methods aim to recognize 3D objects individually, without exploiting their relationships during learning or inference. In this paper, we first propose a strategy that associates the predictions of direction vectors and pseudo geometric centers, leading to a win-win solution for regressing 3D bounding box candidates. Secondly, we propose point attention pooling to extract uniform appearance features for each 3D object proposal, benefiting from the learned direction features, semantic features, and spatial coordinates of the object points. Finally, the appearance features are used together with position features to build 3D object-object relationship graphs over all proposals to model their co-existence. We explore the effect of relation graphs on enhancing the proposals' appearance features under supervised and unsupervised settings. The proposed relation graph network consists of a 3D object proposal generation module and a 3D relation module, making it an end-to-end trainable network for detecting 3D objects in point clouds. Experiments on challenging 3D point cloud benchmarks (the SUN RGB-D and ScanNet datasets) show that our algorithm performs better than existing state-of-the-art methods.
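
The sketch below gives a simplified flavour of a proposal-to-proposal relation module: each proposal's appearance feature is refined by attending to all other proposals, with the attention biased by relative box-centre positions. The feature dimensions and the position-bias MLP are assumptions, not the paper's 3D relation module.

```python
# Simplified proposal relation graph: appearance + position driven attention (assumed design).
import torch
import torch.nn as nn

class ProposalRelationGraph(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.q = nn.Linear(feat_dim, feat_dim)
        self.k = nn.Linear(feat_dim, feat_dim)
        self.v = nn.Linear(feat_dim, feat_dim)
        self.pos = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, feats, centers):
        # feats: (M, D) proposal appearance features, centers: (M, 3) box centres
        rel_pos = centers.unsqueeze(1) - centers.unsqueeze(0)      # (M, M, 3)
        pos_bias = self.pos(rel_pos).squeeze(-1)                   # (M, M) position term
        attn = (self.q(feats) @ self.k(feats).T) / feats.shape[-1] ** 0.5
        attn = torch.softmax(attn + pos_bias, dim=-1)              # relation weights
        return feats + attn @ self.v(feats)                        # enhanced appearance features

# Toy usage: 64 proposals with 128-D appearance features.
graph = ProposalRelationGraph()
enhanced = graph(torch.randn(64, 128), torch.rand(64, 3))
```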

* Manuscript 

Point Attention Network for Semantic Segmentation of 3D Point Clouds

Sep 27, 2019
Mingtao Feng, Liang Zhang, Xuefei Lin, Syed Zulqarnain Gilani, Ajmal Mian

Convolutional Neural Networks (CNNs) have performed extremely well on data represented by regularly arranged grids such as images. However, directly applying classic convolution kernels or parameter sharing mechanisms to sparse 3D point clouds is inefficient due to their irregular and unordered nature. We propose a point attention network that learns rich local shape features and their contextual correlations for 3D point cloud semantic segmentation. Since the geometric distribution of neighboring points is invariant to point ordering, we propose a Local Attention-Edge Convolution (LAE-Conv) that constructs a local graph from neighborhood points searched in multiple directions. We assign attention coefficients to each edge and then aggregate the point features as a weighted sum over the neighbors. The learned LAE-Conv layer features are then passed to a point-wise spatial attention module that generates an interdependency matrix over all points regardless of their distances, capturing long-range spatial context that contributes to more precise semantic information. The proposed point attention network consists of an encoder and a decoder which, together with the LAE-Conv layers and the point-wise spatial attention modules, form an end-to-end trainable network for predicting dense labels for 3D point cloud segmentation. Experiments on challenging 3D point cloud benchmarks show that our algorithm performs on par with or better than existing state-of-the-art methods.
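
A minimal sketch of edge-attention aggregation in the spirit of LAE-Conv is shown below: a local k-NN graph is built, an attention coefficient is learned per edge from the neighbor feature and relative position, and each point aggregates its neighbors as a weighted sum. The value of k and the MLPs are illustrative assumptions rather than the paper's exact layer.

```python
# Edge-attention aggregation over a local k-NN graph (assumed variant of LAE-Conv).
import torch
import torch.nn as nn

class EdgeAttentionConv(nn.Module):
    def __init__(self, in_ch, out_ch, k=16):
        super().__init__()
        self.k = k
        self.edge = nn.Linear(in_ch + 3, out_ch)   # edge feature from (neighbor feat, rel pos)
        self.attn = nn.Linear(in_ch + 3, 1)        # scalar attention coefficient per edge

    def forward(self, xyz, feats):
        # xyz: (B, N, 3) coordinates, feats: (B, N, C) point features
        B, N, C = feats.shape
        knn_i = torch.cdist(xyz, xyz).topk(self.k, dim=-1, largest=False).indices  # (B, N, k)
        idx = knn_i.reshape(B, -1)
        nb_xyz = torch.gather(xyz, 1, idx.unsqueeze(-1).expand(-1, -1, 3)).view(B, N, self.k, 3)
        nb_feat = torch.gather(feats, 1, idx.unsqueeze(-1).expand(-1, -1, C)).view(B, N, self.k, C)
        edge_in = torch.cat([nb_feat, nb_xyz - xyz.unsqueeze(2)], dim=-1)           # (B, N, k, C+3)
        alpha = torch.softmax(self.attn(edge_in), dim=2)                            # edge weights
        return (alpha * self.edge(edge_in)).sum(dim=2)                              # (B, N, out_ch)

# Toy usage
layer = EdgeAttentionConv(in_ch=32, out_ch=64)
out = layer(torch.rand(2, 1024, 3), torch.rand(2, 1024, 32))   # -> (2, 1024, 64)
```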

* Submitted to a journal 