Hyeonseop Song

PoseSyn: Synthesizing Diverse 3D Pose Data from In-the-Wild 2D Data

Mar 17, 2025

ChangHee Yang, Hyeonseop Song, Seokhun Choi, Seungwoo Lee, Jaechul Kim, Hoseok Do

Abstract: Despite considerable efforts to enhance the generalization of 3D pose estimators without costly 3D annotations, existing data augmentation methods struggle in real-world scenarios with diverse human appearances and complex poses. We propose PoseSyn, a novel data synthesis framework that transforms abundant in-the-wild 2D pose datasets into diverse 3D pose-image pairs. PoseSyn comprises two key components: the Error Extraction Module (EEM), which identifies challenging poses from the 2D pose datasets, and the Motion Synthesis Module (MSM), which synthesizes motion sequences around the challenging poses. Then, by generating realistic 3D training data via a human animation model aligned with challenging poses and appearances, PoseSyn boosts the accuracy of various 3D pose estimators by up to 14% across real-world benchmarks including various backgrounds and occlusions, challenging poses, and multi-view scenarios. Extensive experiments further confirm that PoseSyn is a scalable and effective approach for improving generalization without relying on expensive 3D annotations, regardless of the pose estimator's model size or design.

* The first three authors contributed equally to this work
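
The abstract above says that the EEM identifies challenging poses in 2D pose datasets but does not say how. The following is a minimal sketch of one plausible criterion, assuming that "challenging" means a large mean distance between an estimator's 2D keypoints and the annotated 2D keypoints. The function names, the dictionary keys (`id`, `pred_2d`, `gt_2d`), and the top-k rule are all hypothetical and are not taken from the paper.

```python
from math import hypot


def mean_keypoint_error(pred, gt):
    """Mean Euclidean distance between predicted and annotated 2D keypoints."""
    return sum(hypot(px - gx, py - gy)
               for (px, py), (gx, gy) in zip(pred, gt)) / len(gt)


def select_challenging(samples, top_k):
    """Return the ids of the top_k samples with the largest 2D error.

    Each sample is a dict with hypothetical keys:
      "id"      -> any sortable identifier
      "pred_2d" -> list of (x, y) keypoints from a pose estimator
      "gt_2d"   -> list of (x, y) annotated keypoints
    """
    scored = [(mean_keypoint_error(s["pred_2d"], s["gt_2d"]), s["id"])
              for s in samples]
    scored.sort(reverse=True)  # largest error first
    return [sample_id for _, sample_id in scored[:top_k]]


samples = [
    {"id": "a", "pred_2d": [(0, 0), (1, 1)], "gt_2d": [(0, 0), (1, 1)]},
    {"id": "b", "pred_2d": [(3, 4), (0, 0)], "gt_2d": [(0, 0), (0, 0)]},
    {"id": "c", "pred_2d": [(0, 1), (0, 0)], "gt_2d": [(0, 0), (0, 0)]},
]
hardest = select_challenging(samples, top_k=2)
```

In this toy input, sample `b` has the largest error and `a` has none, so `a` is the one left out of the selection.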

Click-Gaussian: Interactive Segmentation to Any 3D Gaussians

Jul 16, 2024

Seokhun Choi, Hyeonseop Song, Jaechul Kim, Taehyeong Kim, Hoseok Do

Abstract: Interactive segmentation of 3D Gaussians opens a great opportunity for real-time manipulation of 3D scenes thanks to the real-time rendering capability of 3D Gaussian Splatting. However, current methods suffer from time-consuming post-processing to deal with noisy segmentation output. They also struggle to provide detailed segmentation, which is important for fine-grained manipulation of 3D scenes. In this study, we propose Click-Gaussian, which learns distinguishable feature fields of two-level granularity, facilitating segmentation without time-consuming post-processing. We delve into challenges stemming from inconsistently learned feature fields resulting from 2D segmentation obtained independently from a 3D scene. 3D segmentation accuracy deteriorates when 2D segmentation results across the views, the primary cues for 3D segmentation, are in conflict. To overcome these issues, we propose Global Feature-guided Learning (GFL). GFL constructs clusters of global feature candidates from noisy 2D segments across the views, which smooths out noise when training the features of 3D Gaussians. Our method runs in 10 ms per click, 15 to 130 times as fast as the previous methods, while also significantly improving segmentation accuracy. Our project page is available at https://seokhunchoi.github.io/Click-Gaussian

* Accepted to ECCV 2024. The first two authors contributed equally to this work
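
To illustrate how distinguishable per-Gaussian features could support click-based segmentation without post-processing, here is a minimal sketch that selects every Gaussian whose feature vector is close to the clicked one. It assumes cosine similarity and a fixed threshold; both choices, along with the function names, are illustrative assumptions and not the method described in the paper.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_by_click(features, clicked_index, threshold=0.9):
    """Return indices of Gaussians whose feature is similar to the clicked one.

    `features` is a list of per-Gaussian feature vectors; `threshold` is a
    hypothetical similarity cutoff that would control segment granularity.
    """
    query = features[clicked_index]
    return [i for i, f in enumerate(features) if cosine(query, f) >= threshold]


features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
segment = select_by_click(features, clicked_index=0)
```

A single pass over the features like this needs no iterative refinement, which is the property the abstract emphasizes; a coarser or finer feature field would simply be passed in as a different `features` list.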

Blending-NeRF: Text-Driven Localized Editing in Neural Radiance Fields

Sep 11, 2023

Hyeonseop Song, Seokhun Choi, Hoseok Do, Chul Lee, Taehyeong Kim

Abstract: Text-driven localized editing of 3D objects is particularly difficult because locally mixing the original 3D object with the intended new object and style effects, without distorting the object's form, is not a straightforward process. To address this issue, we propose a novel NeRF-based model, Blending-NeRF, which consists of two NeRF networks: a pretrained NeRF and an editable NeRF. Additionally, we introduce new blending operations that allow Blending-NeRF to properly edit target regions that are localized by text. By using a pretrained vision-language aligned model, CLIP, we guide Blending-NeRF to add new objects with varying colors and densities, modify textures, and remove parts of the original object. Our extensive experiments demonstrate that Blending-NeRF produces naturally and locally edited 3D objects from various text prompts. Our project page is available at https://seokhunchoi.github.io/Blending-NeRF/

* Accepted to ICCV 2023. The first two authors contributed equally to this work
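
The abstract above does not spell out the blending operations. As a rough illustration of what combining two radiance fields at one sample point can look like, the sketch below uses a common composition rule: densities add, and colors are averaged with density weights. This rule and the name `blend_sample` are assumptions for illustration and should not be read as the paper's actual operations.

```python
def blend_sample(sigma_orig, rgb_orig, sigma_edit, rgb_edit):
    """Blend one sample point from an original field and an editable field.

    Assumed rule (not from the paper): the blended density is the sum of the
    two densities, and the blended color is the density-weighted average.
    Returns (sigma, (r, g, b)).
    """
    sigma = sigma_orig + sigma_edit
    if sigma <= 0.0:
        # Empty space: no density, so the color is irrelevant.
        return 0.0, (0.0, 0.0, 0.0)
    weight_edit = sigma_edit / sigma
    rgb = tuple((1.0 - weight_edit) * c_orig + weight_edit * c_edit
                for c_orig, c_edit in zip(rgb_orig, rgb_edit))
    return sigma, rgb


# A red original sample blended with a denser blue edit sample.
sigma, rgb = blend_sample(1.0, (1.0, 0.0, 0.0), 3.0, (0.0, 0.0, 1.0))
```

With a zero edit density the original sample passes through unchanged, which matches the intuition that regions outside the text-localized target should stay untouched.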

Cross-Modal Alignment Learning of Vision-Language Conceptual Systems

Jul 31, 2022

Taehyeong Kim, Hyeonseop Song, Byoung-Tak Zhang

Abstract: Human infants learn the names of objects and develop their own conceptual systems without explicit supervision. In this study, we propose methods for learning aligned vision-language conceptual systems inspired by infants' word learning mechanisms. The proposed model learns the associations of visual objects and words online and gradually constructs cross-modal relational graph networks. We also propose an aligned cross-modal representation learning method that learns semantic representations of visual objects and words in a self-supervised manner based on the cross-modal relational graph networks. It allows entities of different modalities with conceptually the same meaning to have similar semantic representation vectors. We quantitatively and qualitatively evaluate our method, including object-to-word mapping and zero-shot learning tasks, showing that the proposed model significantly outperforms the baselines and that each conceptual system is topologically aligned.

* 19 pages, 4 figures
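
As a toy illustration of learning object-word associations online, the sketch below counts co-occurrences between visual object labels and words one scene at a time, then maps an object to its most frequently co-occurring word. This simple counting scheme and the class name `CrossModalAssociations` are assumptions for illustration; the paper's relational graph networks and representation learning are not reproduced here.

```python
from collections import defaultdict


class CrossModalAssociations:
    """Online co-occurrence counts between visual objects and words."""

    def __init__(self):
        # counts[object][word] -> number of scenes where both appeared
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, objects, words):
        """Update the counts from one scene (objects seen, words heard)."""
        for obj in objects:
            for word in words:
                self.counts[obj][word] += 1

    def best_word(self, obj):
        """Return the word most often seen with `obj`, or None if unseen."""
        words = self.counts.get(obj)
        if not words:
            return None
        return max(words, key=words.get)


model = CrossModalAssociations()
model.observe(["ball"], ["ball", "red"])
model.observe(["ball", "cup"], ["ball", "cup"])
model.observe(["cup"], ["cup"])
```

Each call to `observe` processes a single scene, so the associations sharpen gradually as more scenes arrive, without any explicit object-word labels.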
