Sungmin Woo


Leveraging Spatio-Temporal Dependency for Skeleton-Based Action Recognition

Dec 09, 2022
Jungho Lee, Minhyeok Lee, Suhwan Cho, Sungmin Woo, Sangyoun Lee

Skeleton-based action recognition has attracted considerable attention because the skeleton provides a compact representation of the human body. Many recent methods have achieved remarkable performance using graph convolutional networks (GCNs) and convolutional neural networks (CNNs), which extract spatial and temporal features, respectively. Although spatial and temporal dependencies in the human skeleton have been explored separately, their joint spatio-temporal dependency is rarely considered. In this paper, we propose the Inter-Frame Curve Network (IFC-Net) to effectively leverage the spatio-temporal dependency of the human skeleton. Our proposed network consists of two novel elements: 1) the Inter-Frame Curve (IFC) module and 2) Dilated Graph Convolution (D-GC). The IFC module increases the spatio-temporal receptive field by identifying meaningful node connections between every pair of adjacent frames and generating spatio-temporal curves based on the identified connections. D-GC gives the network a large spatial receptive field that focuses specifically on the spatial domain. Its kernels are computed from the given adjacency matrices of the graph and reflect a large receptive field in a manner similar to dilated CNNs. Our IFC-Net combines these two modules and achieves state-of-the-art performance on three skeleton-based action recognition benchmarks: NTU-RGB+D 60, NTU-RGB+D 120, and Northwestern-UCLA.
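To make the dilated graph convolution idea concrete, here is a minimal sketch that assumes the dilated kernel is simply a power of the normalized skeleton adjacency matrix, so each node aggregates d-hop neighbors in analogy to dilated CNNs; the paper's exact kernel construction may differ, and names such as dilated_gcn_layer are illustrative only.

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetrically normalize an adjacency matrix with self-loops."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def dilated_gcn_layer(X, A, W, dilation=2):
    """One graph convolution whose kernel is the `dilation`-th power of the
    normalized adjacency, so each node aggregates `dilation`-hop neighbors.
    X: (N, C_in) node features, A: (N, N) skeleton adjacency, W: (C_in, C_out)."""
    A_norm = normalize_adjacency(A)
    A_dilated = np.linalg.matrix_power(A_norm, dilation)
    return np.maximum(A_dilated @ X @ W, 0.0)  # ReLU

# Toy 5-joint "skeleton" chain: 0-1-2-3-4
A = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    A[i, j] = A[j, i] = 1.0
X = np.random.randn(5, 3)   # 3-D joint coordinates as input features
W = np.random.randn(3, 8)   # learnable projection to 8 channels
print(dilated_gcn_layer(X, A, W, dilation=2).shape)  # (5, 8)
```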

* 12 pages, 5 figures 

CKConv: Learning Feature Voxelization for Point Cloud Analysis

Jul 27, 2021
Sungmin Woo, Dogyoon Lee, Junhyeop Lee, Sangwon Hwang, Woojin Kim, Sangyoun Lee

Despite the remarkable success of deep learning, the optimal convolution operation for point clouds remains unsettled because of their irregular data structure. In this paper, we present Cubic Kernel Convolution (CKConv), which learns to voxelize the features of local points by exploiting both continuous and discrete convolutions. Our continuous convolution uniquely employs a 3D cubic form of kernel weight representation that splits a feature into voxels in embedding space. By consecutively applying discrete 3D convolutions to the voxelized features in a spatial manner, the preceding continuous convolution is forced to learn spatial feature mapping, i.e., feature voxelization. In this way, geometric information is captured in finer detail by encoding it with subdivided features, and our 3D convolutions on these fixed structured data do not suffer from discretization artifacts thanks to voxelization in embedding space. Furthermore, we propose a spatial attention module, Local Set Attention (LSA), to provide comprehensive structure awareness within the local point set and hence produce representative features. By learning feature voxelization with LSA, CKConv can extract enriched features for effective point cloud analysis. We show that CKConv applies broadly to point cloud processing tasks, including object classification, object part segmentation, and scene semantic segmentation, with state-of-the-art results.
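The voxelize-then-convolve idea can be sketched as follows, under strong simplifications that are not taken from the paper: the continuous kernel is reduced to a single linear map from relative coordinates to a soft assignment over a 3x3x3 grid, and the discrete stage is a single 'valid' 3D convolution collapsing that grid; cubic_kernel_conv and the weight shapes here are illustrative only.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cubic_kernel_conv(rel_xyz, feats, W_cont, W_disc):
    """Voxelize the features of one local point set, then apply a discrete
    3x3x3 convolution on the resulting grid.
    rel_xyz: (K, 3) neighbor coordinates relative to the center point
    feats:   (K, C) neighbor features
    W_cont:  (3, 27) continuous kernel: coordinates -> soft voxel assignment
    W_disc:  (27, C, C_out) discrete 3D kernel, flattened over its 27 cells"""
    assign = softmax(rel_xyz @ W_cont, axis=-1)     # (K, 27) continuous part
    voxels = assign.T @ feats                       # (27, C) voxelized features
    out = np.einsum('vc,vco->o', voxels, W_disc)    # one 'valid' 3x3x3 conv
    return np.maximum(out, 0.0)

K, C, C_out = 16, 8, 32
rel_xyz = np.random.randn(K, 3) * 0.1
feats = np.random.randn(K, C)
W_cont = np.random.randn(3, 27)
W_disc = np.random.randn(27, C, C_out)
print(cubic_kernel_conv(rel_xyz, feats, W_cont, W_disc).shape)  # (32,)
```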

Regularization Strategy for Point Cloud via Rigidly Mixed Sample

Feb 03, 2021
Dogyoon Lee, Jaeha Lee, Junhyeop Lee, Hyeongmin Lee, Minhyeok Lee, Sungmin Woo, Sangyoun Lee

Data augmentation is an effective regularization strategy for alleviating overfitting, an inherent drawback of deep neural networks. However, data augmentation is rarely considered for point cloud processing, even though many studies have proposed augmentation methods for image data. Regularization is in fact essential for point clouds, where small datasets make a lack of generality especially likely. This paper proposes Rigid Subset Mix (RSMix), a novel data augmentation method for point clouds that generates a virtual mixed sample by replacing part of one sample with a shape-preserved subset of another. RSMix preserves the structural information of each point cloud sample by extracting subsets without deformation using a neighboring function. The neighboring function was carefully designed around the unique properties of point clouds: their unordered structure and lack of a grid. Experiments verify that RSMix successfully regularizes deep neural networks, with remarkable improvement for shape classification. We also analyze various combinations of data augmentations, including RSMix, under single- and multi-view evaluations, based on extensive ablation studies.
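A rough sketch of the mixing step, assuming the neighboring function is a plain k-nearest-neighbor query and the mixed label weight is simply the fraction of replaced points; rigid_subset_mix and these choices are illustrative, not the paper's exact formulation.

```python
import numpy as np

def knn_indices(points, query, k):
    """Indices of the k points nearest to `query` (Euclidean distance)."""
    d = np.linalg.norm(points - query, axis=1)
    return np.argsort(d)[:k]

def rigid_subset_mix(pc_a, pc_b, k):
    """Replace the k points of sample A around a random query with the k points
    of sample B around another random query, kept rigid (no deformation).
    Returns the mixed cloud and the mixing ratio used for label interpolation."""
    q_a = pc_a[np.random.randint(len(pc_a))]
    q_b = pc_b[np.random.randint(len(pc_b))]
    idx_a = knn_indices(pc_a, q_a, k)        # region of A to drop
    idx_b = knn_indices(pc_b, q_b, k)        # rigid subset of B to insert
    subset_b = pc_b[idx_b] - q_b + q_a       # translate B's subset onto A's region
    keep_a = np.delete(pc_a, idx_a, axis=0)
    mixed = np.concatenate([keep_a, subset_b], axis=0)
    lam = k / len(pc_a)                      # label weight for B's class
    return mixed, lam

pc_a = np.random.randn(1024, 3)
pc_b = np.random.randn(1024, 3)
mixed, lam = rigid_subset_mix(pc_a, pc_b, k=256)
print(mixed.shape, lam)                      # (1024, 3) 0.25
```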

* 10 pages, 5 figures, 7 tables 

PMVOS: Pixel-Level Matching-Based Video Object Segmentation

Sep 18, 2020
Suhwan Cho, Heansung Lee, Sungmin Woo, Sungjun Jang, Sangyoun Lee

Semi-supervised video object segmentation (VOS) aims to segment arbitrary target objects in a video when the ground truth segmentation mask of the initial frame is provided. Because this initial mask is the only prior knowledge about the target object, feature matching, which compares template features representing the target object with input features, is an essential step. Recently, pixel-level matching (PM), which matches every pixel in the template features against every pixel in the input features, has been widely used for feature matching because of its high performance. However, despite its effectiveness, the information used to build the template features has been limited to the initial and previous frames. We address this issue with a novel method, PM-based video object segmentation (PMVOS), that constructs strong template features containing the information of all past frames. Furthermore, we apply self-attention to the similarity maps generated from PM to capture global dependencies. On the DAVIS 2016 validation set, we achieve new state-of-the-art performance among real-time methods (> 30 fps), with a J&F score of 85.6%. Performance on the DAVIS 2017 and YouTube-VOS validation sets is also impressive, with J&F scores of 74.0% and 68.2%, respectively.
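The pixel-level matching step itself reduces to a dense cosine-similarity computation; the sketch below shows only that step (the aggregation of template features over all past frames and the self-attention on the similarity maps are omitted), and pixel_level_matching is an illustrative name.

```python
import numpy as np

def pixel_level_matching(template, query):
    """Cosine similarity between every template pixel and every query pixel.
    template: (C, Nt) features collected from target pixels of past frames
    query:    (C, H, W) features of the current frame
    Returns similarity maps of shape (Nt, H, W)."""
    C, H, W = query.shape
    q = query.reshape(C, H * W)
    t = template / (np.linalg.norm(template, axis=0, keepdims=True) + 1e-8)
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-8)
    sim = t.T @ q                      # (Nt, H*W) cosine similarities
    return sim.reshape(-1, H, W)

template = np.random.randn(64, 200)    # 200 template pixels, 64 channels
query = np.random.randn(64, 30, 54)    # current-frame feature map
print(pixel_level_matching(template, query).shape)  # (200, 30, 54)
```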

* Code: https://github.com/suhwan-cho/PMVOS 

False Positive Removal for 3D Vehicle Detection with Penetrated Point Classifier

May 28, 2020
Sungmin Woo, Sangwon Hwang, Woojin Kim, Junhyeop Lee, Dogyoon Lee, Sangyoun Lee

Recently, researchers have been leveraging LiDAR point clouds for higher accuracy in 3D vehicle detection. Most state-of-the-art methods are deep learning based, but they are easily affected by the number of points generated on an object. This vulnerability leads to numerous false positive boxes at high recall positions, where objects are occasionally predicted from only a few points. To address the issue, we introduce the Penetrated Point Classifier (PPC), based on the underlying property of LiDAR that points cannot be generated behind vehicles. It determines whether a point exists behind the vehicle of the predicted box, and if one does, the box is rejected as a false positive. Our straightforward yet unprecedented approach is evaluated on the KITTI dataset and improves the performance of PointRCNN, one of the state-of-the-art methods. The experimental results show that precision at the highest recall position increases dramatically, by 15.46 and 14.63 percentage points on the moderate and hard difficulties of the car class, respectively.
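A coarse geometric sketch of the underlying test, assuming the sensor sits at the origin and a point counts as "penetrated" when it shares roughly the same viewing ray as the predicted box but lies farther away; the actual PPC is a learned classifier, and has_penetrated_points with its thresholds is purely illustrative.

```python
import numpy as np

def has_penetrated_points(points, box_center, box_radius, angle_tol=0.02):
    """Return True if any LiDAR return lies behind the predicted box along
    (roughly) the same viewing ray from the sensor at the origin.
    points:     (N, 3) LiDAR points
    box_center: (3,)   center of the predicted vehicle box
    box_radius: float  coarse radius of the box (half its diagonal)"""
    rng_box = np.linalg.norm(box_center[:2])
    azim_box = np.arctan2(box_center[1], box_center[0])
    rng = np.linalg.norm(points[:, :2], axis=1)
    azim = np.arctan2(points[:, 1], points[:, 0])
    same_ray = np.abs(azim - azim_box) < angle_tol   # same bearing as the box
    behind = rng > rng_box + box_radius              # farther than the box
    return bool(np.any(same_ray & behind))

# A real vehicle blocks the beam, so returns behind its box should not exist;
# if they do, the detection is rejected as a false positive.
points = np.random.uniform(-40, 40, size=(5000, 3))
box_center = np.array([10.0, 2.0, -1.0])
print("reject as false positive:",
      has_penetrated_points(points, box_center, box_radius=2.5))
```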

* Accepted by ICIP 2020 