Sungil Kang

Speech Tokenizer is Key to Consistent Representation

Jul 09, 2025

Wonjin Jung, Sungil Kang, Dong-Yeon Cho

Abstract: Speech tokenization is crucial in digital speech processing, converting continuous speech signals into discrete units for various computational tasks. This paper introduces a novel speech tokenizer with broad applicability across downstream tasks. While recent advances in residual vector quantization (RVQ) have incorporated semantic elements, they often neglect critical acoustic features. We propose an advanced approach that simultaneously encodes both linguistic and acoustic information, preserving prosodic and emotional content. Our method significantly enhances speech representation fidelity across diverse applications. Empirical evaluations demonstrate its effectiveness in speech coding, voice conversion, emotion recognition, and multimodal language modeling, without requiring additional training. This versatility underscores its potential as a key tool for advancing AI-driven speech processing.
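The abstract refers to residual vector quantization (RVQ) without describing it. As background only, here is a minimal sketch of generic RVQ encoding and decoding in plain Python. It is not the authors' tokenizer: the function names, the toy codebooks, and the squared Euclidean distance are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Codebook = Sequence[Sequence[float]]


def nearest_index(x: Sequence[float], codebook: Codebook) -> int:
    """Return the index of the codeword closest to x (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(x, codebook[i])),
    )


def rvq_encode(x: Sequence[float], codebooks: Sequence[Codebook]) -> Tuple[List[int], Vector]:
    """Quantize x stage by stage; each stage quantizes the residual left by the previous one."""
    residual = list(x)
    indices: List[int] = []
    for codebook in codebooks:
        i = nearest_index(residual, codebook)
        indices.append(i)
        # Subtract the chosen codeword so the next stage sees only the remaining error.
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices, residual


def rvq_decode(indices: Sequence[int], codebooks: Sequence[Codebook]) -> Vector:
    """Reconstruct the vector as the sum of the selected codewords."""
    out = [0.0] * len(codebooks[0][0])
    for i, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[i])]
    return out


# Toy two-stage example (hypothetical codebooks, not learned from speech).
toy_codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.25, -0.25]],
]
toy_indices, toy_residual = rvq_encode([1.25, 0.75], toy_codebooks)
toy_reconstruction = rvq_decode(toy_indices, toy_codebooks)
```

The sequence of indices, one per stage, is the discrete token representation. In a real tokenizer the codebooks are learned and the input is an encoder feature vector per frame rather than a hand-picked point.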

Associative Partial Domain Adaptation

Aug 07, 2020

Youngeun Kim, Sungeun Hong, Seunghan Yang, Sungil Kang, Yunho Jeon, Jiwon Kim

Abstract: Partial Domain Adaptation (PDA) addresses a practical scenario in which the target domain contains only a subset of the classes in the source domain. While PDA should take both class-level and sample-level information into account to mitigate negative transfer, current approaches mostly rely on only one of them. In this paper, we propose a novel approach to fully exploit multi-level associations that can arise in PDA. Our Associative Partial Domain Adaptation (APDA) utilizes intra-domain association to actively filter out non-trivial anomaly samples in each source-private class that sample-level weighting cannot handle. Additionally, our method considers inter-domain association to encourage positive transfer by mapping between nearby target samples and source samples with high label commonness. For this, we exploit feature propagation in a proposed label space consisting of source ground-truth labels and target probabilistic labels. We further propose a geometric guidance loss based on the label commonness of each source class to encourage positive transfer. Our APDA consistently achieves state-of-the-art performance across public datasets.

* 8 pages, 8 figures, 3 tables

Key Instance Selection for Unsupervised Video Object Segmentation

Jul 26, 2019

Donghyeon Cho, Sungeun Hong, Sungil Kang, Jiwon Kim

Abstract: This paper proposes key instance selection based on video saliency, covering objectness and dynamics, for unsupervised video object segmentation (UVOS). Our method takes frames sequentially and extracts object proposals with corresponding masks for each frame. We link objects according to their similarity until the M-th frame and then assign them unique IDs (i.e., instances). The similarity measure takes into account multiple properties such as the ReID descriptor, expected trajectory, and semantic co-segmentation result. After the M-th frame, we select K IDs based on video saliency and frequency of appearance; then only these key IDs are tracked through the remaining frames. Thanks to these technical contributions, our results ranked third on the leaderboard of the UVOS DAVIS challenge.

* Ranked 3rd in 'Unsupervised DAVIS Challenge' (CVPR 2019)
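The selection step described in the abstract (pick K IDs after the M-th frame using saliency and frequency of appearance) could be sketched roughly as follows. The abstract does not say how the two cues are combined, so the score used here (mean saliency multiplied by the fraction of the first M frames in which the ID appears) is an assumption, as are the `Instance` class and the `select_key_instances` function name.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Instance:
    """A linked object ID with the saliency observed in each frame where it appears."""
    instance_id: int
    saliency_by_frame: Dict[int, float] = field(default_factory=dict)


def select_key_instances(instances: List[Instance], m: int, k: int) -> List[int]:
    """Return the IDs of the k highest-scoring instances over frames 0..m-1.

    Assumed score: mean saliency * appearance frequency. The paper's actual
    weighting is not given in the abstract.
    """
    def score(inst: Instance) -> float:
        frames = [f for f in inst.saliency_by_frame if 0 <= f < m]
        if not frames:
            return 0.0
        mean_saliency = sum(inst.saliency_by_frame[f] for f in frames) / len(frames)
        frequency = len(frames) / m
        return mean_saliency * frequency

    ranked = sorted(instances, key=score, reverse=True)
    return [inst.instance_id for inst in ranked[:k]]


# Hypothetical saliency values for four candidate IDs over M = 4 frames.
candidates = [
    Instance(0, {0: 0.9, 1: 0.9, 2: 0.9, 3: 0.9}),  # salient, always visible
    Instance(1, {0: 1.0}),                          # salient, seen once
    Instance(2, {0: 0.2, 1: 0.2, 2: 0.2, 3: 0.2}),  # always visible, not salient
    Instance(3, {0: 0.8, 1: 0.8, 2: 0.8}),          # salient, mostly visible
]
key_ids = select_key_instances(candidates, m=4, k=2)
```

Only the returned key IDs would then be tracked through the remaining frames; the linking of proposals into IDs and the saliency estimation itself are outside this sketch.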
