Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Woontack Woo

Fast Texture Transfer for XR Avatars via Barycentric UV Conversion

Aug 27, 2025

Hail Song, Seokhwan Yang, Woontack Woo

Abstract:We present a fast and efficient method for transferring facial textures onto SMPL-X-based full-body avatars. Unlike conventional affine-transform methods that are slow and prone to visual artifacts, our method utilizes a barycentric UV conversion technique. Our approach precomputes the entire UV mapping into a single transformation matrix, enabling texture transfer in a single operation. This results in a speedup of over 7000x compared to the baseline, while also significantly improving the final texture quality by eliminating boundary artifacts. Through quantitative and qualitative evaluations, we demonstrate that our method offers a practical solution for personalization in immersive XR applications. The code is available online.

Via

Access Paper or Ask Questions

LLM Meets Scene Graph: Can Large Language Models Understand and Generate Scene Graphs? A Benchmark and Empirical Study

May 26, 2025

Dongil Yang, Minjin Kim, Sunghwan Kim, Beong-woo Kwak, Minjun Park, Jinseok Hong, Woontack Woo, Jinyoung Yeo

Figure 1 for LLM Meets Scene Graph: Can Large Language Models Understand and Generate Scene Graphs? A Benchmark and Empirical Study

Figure 2 for LLM Meets Scene Graph: Can Large Language Models Understand and Generate Scene Graphs? A Benchmark and Empirical Study

Figure 3 for LLM Meets Scene Graph: Can Large Language Models Understand and Generate Scene Graphs? A Benchmark and Empirical Study

Figure 4 for LLM Meets Scene Graph: Can Large Language Models Understand and Generate Scene Graphs? A Benchmark and Empirical Study

Abstract:The remarkable reasoning and generalization capabilities of Large Language Models (LLMs) have paved the way for their expanding applications in embodied AI, robotics, and other real-world tasks. To effectively support these applications, grounding in spatial and temporal understanding in multimodal environments is essential. To this end, recent works have leveraged scene graphs, a structured representation that encodes entities, attributes, and their relationships in a scene. However, a comprehensive evaluation of LLMs' ability to utilize scene graphs remains limited. In this work, we introduce Text-Scene Graph (TSG) Bench, a benchmark designed to systematically assess LLMs' ability to (1) understand scene graphs and (2) generate them from textual narratives. With TSG Bench we evaluate 11 LLMs and reveal that, while models perform well on scene graph understanding, they struggle with scene graph generation, particularly for complex narratives. Our analysis indicates that these models fail to effectively decompose discrete scenes from a complex narrative, leading to a bottleneck when generating scene graphs. These findings underscore the need for improved methodologies in scene graph generation and provide valuable insights for future research. The demonstration of our benchmark is available at https://tsg-bench.netlify.app. Additionally, our code and evaluation data are publicly available at https://anonymous.4open.science/r/TSG-Bench.

* ACL 2025

Via

Access Paper or Ask Questions

S3D: Sketch-Driven 3D Model Generation

May 07, 2025

Hail Song, Wonsik Shin, Naeun Lee, Soomin Chung, Nojun Kwak, Woontack Woo

Abstract:Generating high-quality 3D models from 2D sketches is a challenging task due to the inherent ambiguity and sparsity of sketch data. In this paper, we present S3D, a novel framework that converts simple hand-drawn sketches into detailed 3D models. Our method utilizes a U-Net-based encoder-decoder architecture to convert sketches into face segmentation masks, which are then used to generate a 3D representation that can be rendered from novel views. To ensure robust consistency between the sketch domain and the 3D output, we introduce a novel style-alignment loss that aligns the U-Net bottleneck features with the initial encoder outputs of the 3D generation module, significantly enhancing reconstruction fidelity. To further enhance the network's robustness, we apply augmentation techniques to the sketch dataset. This streamlined framework demonstrates the effectiveness of S3D in generating high-quality 3D models from sketch inputs. The source code for this project is publicly available at https://github.com/hailsong/S3D.

* Accepted as a short paper to the GMCV Workshop at CVPR'25

Via

Access Paper or Ask Questions

Dense Hand-Object(HO) GraspNet with Full Grasping Taxonomy and Dynamics

Sep 06, 2024

Woojin Cho, Jihyun Lee, Minjae Yi, Minje Kim, Taeyun Woo, Donghwan Kim, Taewook Ha, Hyokeun Lee, Je-Hwan Ryu, Woontack Woo(+1 more)

Figure 1 for Dense Hand-Object(HO) GraspNet with Full Grasping Taxonomy and Dynamics

Figure 2 for Dense Hand-Object(HO) GraspNet with Full Grasping Taxonomy and Dynamics

Figure 3 for Dense Hand-Object(HO) GraspNet with Full Grasping Taxonomy and Dynamics

Figure 4 for Dense Hand-Object(HO) GraspNet with Full Grasping Taxonomy and Dynamics

Abstract:Existing datasets for 3D hand-object interaction are limited either in the data cardinality, data variations in interaction scenarios, or the quality of annotations. In this work, we present a comprehensive new training dataset for hand-object interaction called HOGraspNet. It is the only real dataset that captures full grasp taxonomies, providing grasp annotation and wide intraclass variations. Using grasp taxonomies as atomic actions, their space and time combinatorial can represent complex hand activities around objects. We select 22 rigid objects from the YCB dataset and 8 other compound objects using shape and size taxonomies, ensuring coverage of all hand grasp configurations. The dataset includes diverse hand shapes from 99 participants aged 10 to 74, continuous video frames, and a 1.5M RGB-Depth of sparse frames with annotations. It offers labels for 3D hand and object meshes, 3D keypoints, contact maps, and \emph{grasp labels}. Accurate hand and object 3D meshes are obtained by fitting the hand parametric model (MANO) and the hand implicit function (HALO) to multi-view RGBD frames, with the MoCap system only for objects. Note that HALO fitting does not require any parameter tuning, enabling scalability to the dataset's size with comparable accuracy to MANO. We evaluate HOGraspNet on relevant tasks: grasp classification and 3D hand pose estimation. The result shows performance variations based on grasp type and object class, indicating the potential importance of the interaction space captured by our dataset. The provided data aims at learning universal shape priors or foundation models for 3D hand-object interaction. Our dataset and code are available at https://hograspnet2024.github.io/.

* 14 pages except for references. It will be published at European Conference on Computer Vision(ECCV) 2024

Via

Access Paper or Ask Questions

eCAR: edge-assisted Collaborative Augmented Reality Framework

May 11, 2024

Jinwoo Jeon, Woontack Woo

Figure 1 for eCAR: edge-assisted Collaborative Augmented Reality Framework

Figure 2 for eCAR: edge-assisted Collaborative Augmented Reality Framework

Figure 3 for eCAR: edge-assisted Collaborative Augmented Reality Framework

Figure 4 for eCAR: edge-assisted Collaborative Augmented Reality Framework

Abstract:We propose a novel edge-assisted multi-user collaborative augmented reality framework in a large indoor environment. In Collaborative Augmented Reality, data communication that synchronizes virtual objects has large network traffic and high network latency. Due to drift, CAR applications without continuous data communication for coordinate system alignment have virtual object inconsistency. In addition, synchronization messages for online virtual object updates have high latency as the number of collaborative devices increases. To solve this problem, we implement the CAR framework, called eCAR, which utilizes edge computing to continuously match the device's coordinate system with less network traffic. Furthermore, we extend the co-visibility graph of the edge server to maintain virtual object spatial-temporal consistency in neighboring devices by synchronizing a local graph. We evaluate the system quantitatively and qualitatively in the public dataset and a physical indoor environment. eCAR communicates data for coordinate system alignment between the edge server and devices with less network traffic and latency. In addition, collaborative augmented reality synchronization algorithms quickly and accurately host and resolve virtual objects. The proposed system continuously aligns coordinate systems to multiple devices in a large indoor environment and shares augmented reality content. Through our system, users interact with virtual objects and share augmented reality experiences with neighboring users.

Via

Access Paper or Ask Questions

Seg&Struct: The Interplay Between Part Segmentation and Structure Inference for 3D Shape Parsing

Nov 01, 2022

Jeonghyun Kim, Kaichun Mo, Minhyuk Sung, Woontack Woo

Abstract:We propose Seg&Struct, a supervised learning framework leveraging the interplay between part segmentation and structure inference and demonstrating their synergy in an integrated framework. Both part segmentation and structure inference have been extensively studied in the recent deep learning literature, while the supervisions used for each task have not been fully exploited to assist the other task. Namely, structure inference has been typically conducted with an autoencoder that does not leverage the point-to-part associations. Also, segmentation has been mostly performed without structural priors that tell the plausibility of the output segments. We present how these two tasks can be best combined while fully utilizing supervision to improve performance. Our framework first decomposes a raw input shape into part segments using an off-the-shelf algorithm, whose outputs are then mapped to nodes in a part hierarchy, establishing point-to-part associations. Following this, ours predicts the structural information, e.g., part bounding boxes and part relationships. Lastly, the segmentation is rectified by examining the confusion of part boundaries using the structure-based part features. Our experimental results based on the StructureNet and PartNet demonstrate that the interplay between the two tasks results in remarkable improvements in both tasks: 27.91% in structure inference and 0.5% in segmentation.

* WACV 2023 (Algorithm Track)

Via

Access Paper or Ask Questions