Moon-Ryul Jung

Appearance-Invariant Detection of Suggestive Motion via Laban Movement Descriptors on SMPL Skeletons

May 23, 2026

Jaehoon Ahn, Jeonghan Kong, Moon-Ryul Jung

Abstract: Content moderation in online multiplayer 3D virtual environments has recently been delegated to automated, AI-based pipelines. However, the field has focused mainly on detecting illicit content in images, video, and audio, leaving blind spots in detection techniques for suggestive motion. We present a motion-only classification pipeline that detects suggestive and explicit movement from SMPL skeleton trajectories using Laban Movement Analysis (LMA) descriptors. On 20,514 motion fragments (17+ hours) spanning four ordinal tiers -- everyday, artistic, suggestive, explicit -- logistic regression over 110 LMA features achieves 57.3% four-way accuracy (2.3x chance), 72.1% three-way, and 78.7% binary SFW/NSFW. Classification errors concentrate on adjacent tiers rather than non-adjacent ones, which is consistent with the ordinal structure of the taxonomy. Moreover, different movement qualities dominate at each level of the taxonomy -- no single feature drives the classification, suggesting that the four-tier structure reflects genuinely distinct motion regimes.

* 2 pages, 2 figures. Accepted as a Poster at SIGGRAPH 2026
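
As a rough illustration of the two summary statistics in the abstract above (accuracy relative to chance, and the share of errors that fall on adjacent tiers), here is a minimal Python sketch. The confusion matrix counts are hypothetical and chosen only for illustration; they are not the paper's results, and the helper names `accuracy` and `adjacent_error_share` are assumptions introduced here.

```python
# Ordinal tiers named in the abstract, in order.
TIERS = ["everyday", "artistic", "suggestive", "explicit"]

# HYPOTHETICAL confusion matrix (rows = true tier, columns = predicted tier).
# These counts are made up for illustration and are not from the paper.
CM = [
    [60, 25, 10, 5],
    [20, 55, 20, 5],
    [8, 22, 50, 20],
    [4, 8, 24, 64],
]


def accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def adjacent_error_share(cm):
    """Fraction of all misclassifications that land exactly one tier away."""
    errors = 0
    adjacent = 0
    for i, row in enumerate(cm):
        for j, count in enumerate(row):
            if i != j:
                errors += count
                if abs(i - j) == 1:
                    adjacent += count
    return adjacent / errors if errors else 0.0


# Chance level for a balanced four-way task is 1/4, so the reported
# 57.3% accuracy corresponds to 0.573 / 0.25, roughly 2.3x chance.
CHANCE_RATIO = 0.573 / (1 / len(TIERS))

if __name__ == "__main__":
    print(f"accuracy: {accuracy(CM):.4f}")
    print(f"adjacent share of errors: {adjacent_error_share(CM):.3f}")
    print(f"reported accuracy vs chance: {CHANCE_RATIO:.2f}x")
```

A high adjacent share means the classifier mostly confuses neighbouring tiers (for example suggestive with explicit) rather than distant ones (everyday with explicit), which is the pattern the abstract describes.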

The SMC Blind Spot: A Failure Mode Analysis of State-of-the-Art Beat Tracking

May 12, 2026

Jaehoon Ahn, Tae Gum Hwang, Moon-Ryul Jung

Abstract: Over the past two decades, the task of musical beat tracking has transitioned from heuristic onset detection algorithms to highly capable deep neural networks (DNNs). Although DNN-based beat tracking models achieve near-perfect performance on mainstream, percussive datasets, the SMC dataset has stubbornly yielded low F-measure scores. By testing how well state-of-the-art models detect beats on individual tracks in the SMC dataset, we identify three distinct failure modes: octave errors, continuity errors, and complete tracking failure where all metrics fall below 0.3. We reveal that state-of-the-art models tend to generate "confident-but-wrong" activations. Furthermore, we show that the standard DBN's default minimum tempo of 55 BPM prevents it from inferring the correct tempo for 21% of SMC tracks, forcing double-tempo predictions on slow music. By exposing such fundamental oversights, we provide concrete directions for improving beat and downbeat detection, specifically emphasizing training data diversification and multi-hypothesis tempo estimation.

* 6 pages, 3 figures. Technical report on beat tracking failure modes; prepared for ISMIR 2026
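
To make the tempo-floor effect described in the abstract above concrete, the following Python sketch folds a true tempo by factors of two until it fits an allowed range. This is a deliberate simplification, not the DBN itself: a real DBN decodes a state sequence from activations, whereas this only shows which octave-related tempo remains reachable once the true tempo is excluded. The 55 BPM floor comes from the abstract; the 215 BPM ceiling, the function names, and the example tempo list are assumptions for illustration.

```python
def nearest_representable_tempo(true_bpm, min_bpm=55.0, max_bpm=215.0):
    """Fold a tempo by octaves (x2 or /2) until it lies in [min_bpm, max_bpm].

    min_bpm=55 is the default floor cited in the abstract; max_bpm=215 is an
    ASSUMED ceiling used only for this illustration.
    """
    if true_bpm <= 0:
        raise ValueError("tempo must be positive")
    if max_bpm < 2 * min_bpm:
        raise ValueError("range must span at least one octave")
    bpm = float(true_bpm)
    while bpm < min_bpm:   # too slow: only a doubled tempo is reachable
        bpm *= 2
    while bpm > max_bpm:   # too fast: only a halved tempo is reachable
        bpm /= 2
    return bpm


def share_below_floor(tempos, min_bpm=55.0):
    """Fraction of tracks whose annotated tempo is under the tempo floor."""
    if not tempos:
        return 0.0
    return sum(1 for t in tempos if t < min_bpm) / len(tempos)


if __name__ == "__main__":
    # HYPOTHETICAL track tempos in BPM, not taken from the SMC dataset.
    tempos = [40.0, 50.0, 60.0, 90.0, 120.0]
    for t in tempos:
        print(t, "->", nearest_representable_tempo(t))
    print("share below floor:", share_below_floor(tempos))
```

A 40 BPM track can only be reported as 80 BPM under a 55 BPM floor, which is the double-tempo (octave) error the abstract refers to.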

Beat Detection as Object Detection

Oct 16, 2025

Jaehoon Ahn, Moon-Ryul Jung

Abstract: Recent beat and downbeat tracking models (e.g., RNNs, TCNs, Transformers) output frame-level activations. We propose reframing this task as object detection, where beats and downbeats are modeled as temporal "objects." Adapting the FCOS detector from computer vision to 1D audio, we replace its original backbone with WaveBeat's temporal feature extractor and add a Feature Pyramid Network to capture multi-scale temporal patterns. The model predicts overlapping beat/downbeat intervals with confidence scores, followed by non-maximum suppression (NMS) to select final predictions. This NMS step serves a similar role to DBNs in traditional trackers, but is simpler and less heuristic. Evaluated on standard music datasets, our approach achieves competitive results, showing that object detection techniques can effectively model musical beats with minimal adaptation.

* 11 pages, 4 figures, 5 tables
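
The NMS step mentioned in the abstract above can be sketched in one dimension as follows. This is a generic greedy NMS over time intervals, written under assumptions: the `(start, end, score)` tuple layout, the 0.5 IoU threshold, and the example detections are all hypothetical, and the paper's actual post-processing may differ.

```python
def iou_1d(a, b):
    """Intersection over union of two time intervals given as (start, end, ...)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def nms_1d(detections, iou_threshold=0.5):
    """Greedy non-maximum suppression over 1D intervals.

    detections: list of (start_sec, end_sec, score) tuples (ASSUMED layout).
    Keeps the highest-scoring interval, drops any remaining interval that
    overlaps a kept one by more than iou_threshold, and repeats.
    Returns the kept intervals sorted by start time.
    """
    kept = []
    for det in sorted(detections, key=lambda d: d[2], reverse=True):
        if all(iou_1d(det, k) <= iou_threshold for k in kept):
            kept.append(det)
    return sorted(kept, key=lambda d: d[0])


if __name__ == "__main__":
    # HYPOTHETICAL overlapping predictions around two beat intervals.
    dets = [
        (0.00, 0.50, 0.9),
        (0.05, 0.55, 0.6),  # near-duplicate of the first interval
        (0.50, 1.00, 0.8),
        (0.48, 0.98, 0.4),  # near-duplicate of the third interval
    ]
    for start, end, score in nms_1d(dets):
        print(f"{start:.2f}-{end:.2f} s  score={score}")
```

Unlike a DBN, this step has no tempo model or transition probabilities; it only removes lower-confidence duplicates, which is why the abstract describes it as simpler and less heuristic.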

ConTEXTure: Consistent Multiview Images to Texture

Jul 15, 2024

Jaehoon Ahn, Sumin Cho, Harim Jung, Kibeom Hong, Seonghoon Ban, Moon-Ryul Jung

Abstract: We introduce ConTEXTure, a generative network designed to create a texture map/atlas for a given 3D mesh using images from multiple viewpoints. The process begins with generating a front-view image from a text prompt, such as 'Napoleon, front view', describing the 3D mesh. Additional images from different viewpoints are derived from this front-view image and camera poses relative to it. ConTEXTure builds upon the TEXTure network, which uses text prompts for six viewpoints (e.g., 'Napoleon, front view', 'Napoleon, left view', etc.). However, TEXTure often generates images for non-front viewpoints that do not accurately represent those viewpoints. To address this issue, we employ Zero123++, which generates multiple view-consistent images for the six specified viewpoints simultaneously, conditioned on the initial front-view image and the depth maps of the mesh for the six viewpoints. By utilizing these view-consistent images, ConTEXTure learns the texture atlas from all viewpoint images concurrently, unlike previous methods that do so sequentially. This approach ensures that the rendered images from various viewpoints, including back, side, bottom, and top, are free from viewpoint irregularities.

* 11 pages, 7 figures
