Eun Som Jeon

Ground Reaction Force Estimation via Time-aware Knowledge Distillation

Jun 12, 2025

Eun Som Jeon, Sinjini Mitra, Jisoo Lee, Omik M. Save, Ankita Shukla, Hyunglae Lee, Pavan Turaga

Abstract: Human gait analysis with wearable sensors has been widely used in various applications, such as daily life healthcare, rehabilitation, physical therapy, and clinical diagnostics and monitoring. In particular, ground reaction force (GRF) provides critical information about how the body interacts with the ground during locomotion. Although instrumented treadmills have been widely used as the gold standard for measuring GRF during walking, their lack of portability and high cost make them impractical for many applications. As an alternative, low-cost, portable, wearable insole sensors have been utilized to measure GRF; however, these sensors are susceptible to noise and disturbance and are less accurate than treadmill measurements. To address these challenges, we propose a Time-aware Knowledge Distillation framework for GRF estimation from insole sensor data. This framework leverages similarity and temporal features within a mini-batch during the knowledge distillation process, effectively capturing the complementary relationships between features and the sequential properties of the target and input data. The performance of the lightweight models distilled through this framework was evaluated by comparing GRF estimations from insole sensor data against measurements from an instrumented treadmill. Empirical results demonstrated that Time-aware Knowledge Distillation outperforms current baselines in GRF estimation from wearable sensor data.

* IEEE Internet of Things Journal, 2025
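
The abstract says the framework uses similarity features within a mini-batch but does not give the loss. As a hedged illustration only, the sketch below implements a generic batch-similarity distillation loss, a swapped-in technique in the style of similarity-preserving KD. The student is penalized when its sample-to-sample similarity matrix differs from the teacher's. The function names are hypothetical, and the paper's time-aware (temporal) term is not reproduced.

```python
import math
from typing import List

Matrix = List[List[float]]  # one row of features per sample in the mini-batch


def gram(features: Matrix) -> Matrix:
    """b x b matrix of dot products between every pair of samples in the batch."""
    return [[sum(a * c for a, c in zip(u, v)) for v in features] for u in features]


def row_normalize(m: Matrix) -> Matrix:
    """L2-normalize each row so only the pattern of similarities matters, not its scale."""
    out = []
    for row in m:
        norm = math.sqrt(sum(x * x for x in row)) or 1.0  # avoid dividing by zero
        out.append([x / norm for x in row])
    return out


def similarity_kd_loss(teacher_feats: Matrix, student_feats: Matrix) -> float:
    """Hypothetical helper: mean squared difference of normalized b x b similarity matrices.

    Teacher and student may have different feature widths, because only the
    b x b similarity matrices are compared.
    """
    b = len(teacher_feats)
    if b != len(student_feats):
        raise ValueError("teacher and student must see the same mini-batch")
    g_t = row_normalize(gram(teacher_feats))
    g_s = row_normalize(gram(student_feats))
    return sum((t - s) ** 2 for rt, rs in zip(g_t, g_s) for t, s in zip(rt, rs)) / (b * b)


# Example: a 3-D teacher embedding and a 2-D student embedding of the same two windows.
teacher = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
student = [[1.0, 0.0], [1.0, 0.0]]
print(similarity_kd_loss(teacher, student))
```

In training, a term like this would be added with some weight to the regression loss on the GRF targets. That weighting is an assumption and is not stated in the abstract.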

Intra-class Patch Swap for Self-Distillation

May 20, 2025

Hongjun Choi, Eun Som Jeon, Ankita Shukla, Pavan Turaga

Abstract: Knowledge distillation (KD) is a valuable technique for compressing large deep learning models into smaller, edge-suitable networks. However, conventional KD frameworks rely on pre-trained high-capacity teacher networks, which introduce significant challenges such as increased memory/storage requirements, additional training costs, and ambiguity in selecting an appropriate teacher for a given student model. Although teacher-free distillation (self-distillation) has emerged as a promising alternative, many existing approaches still rely on architectural modifications or complex training procedures, which limit their generality and efficiency. To address these limitations, we propose a novel framework based on teacher-free distillation that operates using a single student network without any auxiliary components, architectural modifications, or additional learnable parameters. Our approach is built on a simple yet highly effective augmentation, called intra-class patch swap augmentation. This augmentation simulates a teacher-student dynamic within a single model by generating pairs of intra-class samples with varying confidence levels, and then applying instance-to-instance distillation to align their predictive distributions. Our method is conceptually simple, model-agnostic, and easy to implement, requiring only a single augmentation function. Extensive experiments across image classification, semantic segmentation, and object detection show that our method consistently outperforms both existing self-distillation baselines and conventional teacher-based KD approaches. These results suggest that the success of self-distillation could hinge on the design of the augmentation itself. Our code is available at https://github.com/hchoi71/Intra-class-Patch-Swap.

* Accepted for publication in Neurocomputing
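
The abstract describes the augmentation only at a high level. The sketch below is one hedged reading of it under stated assumptions. A square patch at a random location is exchanged between two images of the same class. The sample whose prediction is more confident on the true label then plays the teacher in a KL-divergence loss. Patch size, patch placement and the confidence rule are assumptions, and both helper names are hypothetical. The authors' actual implementation is in the repository linked in the abstract.

```python
import math
import random
from typing import List, Sequence, Tuple

Image = List[List[float]]  # single-channel H x W image


def intra_class_patch_swap(img_a: Image, img_b: Image, patch: int,
                           rng: random.Random) -> Tuple[Image, Image]:
    """Hypothetical helper: exchange one random patch x patch square between two same-class images."""
    h, w = len(img_a), len(img_a[0])
    top = rng.randrange(0, h - patch + 1)
    left = rng.randrange(0, w - patch + 1)
    out_a = [row[:] for row in img_a]  # copies, so the inputs are left untouched
    out_b = [row[:] for row in img_b]
    for i in range(top, top + patch):
        for j in range(left, left + patch):
            out_a[i][j], out_b[i][j] = img_b[i][j], img_a[i][j]
    return out_a, out_b


def kl_div(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def instance_distillation_loss(probs_a: Sequence[float], probs_b: Sequence[float],
                               label: int) -> float:
    """Hypothetical rule: the more confident prediction on the true label acts as the teacher.

    In a real training loop the teacher side would be detached (no gradient).
    """
    if probs_a[label] >= probs_b[label]:
        teacher, student = probs_a, probs_b
    else:
        teacher, student = probs_b, probs_a
    return kl_div(teacher, student)
```

The caller is responsible for pairing images that share a label, because the swap itself does not check it.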

Role of Mixup in Topological Persistence Based Knowledge Distillation for Wearable Sensor Data

Feb 02, 2025

Eun Som Jeon, Hongjun Choi, Matthew P. Buman, Pavan Turaga

Abstract: The analysis of wearable sensor data has enabled many successes in several applications. To represent high-sampling-rate time series with sufficient detail, topological data analysis (TDA) has been considered, and TDA has been found to complement other time-series features. Nonetheless, because extracting topological features through TDA is time-consuming and computationally demanding, it is difficult to deploy topological knowledge in various applications. To tackle this problem, knowledge distillation (KD) can be adopted: a model compression and transfer learning technique that produces a smaller model by transferring knowledge from a larger network. By leveraging multiple teachers in KD, both time-series and topological features can be transferred, and finally, a superior student using only time-series data is distilled. On the other hand, mixup has been widely used as a robust data augmentation technique to enhance model performance during training. Mixup and KD employ similar learning strategies. In KD, the student model learns from the smoothed distribution generated by the teacher model, while mixup creates smoothed labels by blending two labels. Hence, this shared smoothness is the link connecting the two methods. In this paper, we analyze the role of mixup in KD with time series as well as topological persistence, employing multiple teachers. We present a comprehensive analysis of various methods in KD and mixup on wearable sensor data.

* IEEE Sensors Journal (2024)
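
Mixup, as summarized above, blends two inputs and their labels with the same coefficient, which is what produces the smoothed label. The minimal sketch below applies it to a pair of sensor windows flattened to lists. Sampling the coefficient from Beta(alpha, alpha) is standard mixup. The value alpha = 0.2 and the helper names are illustrative assumptions, not settings from the paper.

```python
import random
from typing import List, Optional, Sequence, Tuple


def one_hot(label: int, num_classes: int) -> List[float]:
    return [1.0 if k == label else 0.0 for k in range(num_classes)]


def mixup(x1: Sequence[float], y1: int, x2: Sequence[float], y2: int,
          num_classes: int, alpha: float = 0.2,
          rng: Optional[random.Random] = None) -> Tuple[List[float], List[float], float]:
    """Blend two samples and their one-hot labels with lam ~ Beta(alpha, alpha)."""
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b
         for a, b in zip(one_hot(y1, num_classes), one_hot(y2, num_classes))]
    return x, y, lam


x, y, lam = mixup([0.1, 0.5, 0.9], 0, [0.3, 0.2, 0.4], 2, num_classes=3,
                  rng=random.Random(0))
print(lam, y)  # y is a soft label whose entries sum to 1 (up to rounding)
```

In the KD setting, the student would receive the mixed window x. The teacher's output on the same x would then serve as a second, already smoothed target.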

Deep Geometric Moments Promote Shape Consistency in Text-to-3D Generation

Aug 12, 2024

Utkarsh Nath, Rajeev Goel, Eun Som Jeon, Changhoon Kim, Kyle Min, Yezhou Yang, Yingzhen Yang, Pavan Turaga

Abstract: To address the data scarcity associated with 3D assets, 2D-lifting techniques such as Score Distillation Sampling (SDS) have become a widely adopted practice in text-to-3D generation pipelines. However, the diffusion models used in these techniques are prone to viewpoint bias and thus lead to geometric inconsistencies such as the Janus problem. To counter this, we introduce MT3D, a text-to-3D generative model that leverages a high-fidelity 3D object to overcome viewpoint bias and explicitly infuse geometric understanding into the generation pipeline. Firstly, we employ depth maps derived from a high-quality 3D model as control signals to guarantee that the generated 2D images preserve the fundamental shape and structure, thereby reducing the inherent viewpoint bias. Next, we utilize deep geometric moments to ensure geometric consistency in the 3D representation explicitly. By incorporating geometric details from a 3D asset, MT3D enables the creation of diverse and geometrically consistent objects, thereby improving the quality and usability of our 3D representations.

* 9 pages, 8 figures
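
The abstract relies on deep geometric moments without defining them. For orientation only, the sketch below computes the classical geometric (image) moments that the name refers to, on a 2-D map such as a silhouette or depth image. The learned, deep variant used in MT3D is not reproduced, and the helper names are hypothetical.

```python
from typing import Sequence, Tuple

Grid = Sequence[Sequence[float]]  # 2-D map, e.g. a silhouette or depth image


def raw_moment(img: Grid, p: int, q: int) -> float:
    """m_pq = sum over pixels of x^p * y^q * I(y, x), with x the column and y the row index."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img) for x, v in enumerate(row))


def centroid(img: Grid) -> Tuple[float, float]:
    """Intensity-weighted centre (m10 / m00, m01 / m00)."""
    m00 = raw_moment(img, 0, 0)
    return raw_moment(img, 1, 0) / m00, raw_moment(img, 0, 1) / m00


def central_moment(img: Grid, p: int, q: int) -> float:
    """mu_pq: moments taken about the centroid, hence unchanged by translation."""
    cx, cy = centroid(img)
    return sum(((x - cx) ** p) * ((y - cy) ** q) * v
               for y, row in enumerate(img) for x, v in enumerate(row))


square = [[0.0, 0.0, 0.0, 0.0],
          [0.0, 1.0, 1.0, 0.0],
          [0.0, 1.0, 1.0, 0.0],
          [0.0, 0.0, 0.0, 0.0]]
print(raw_moment(square, 0, 0), centroid(square), central_moment(square, 2, 0))
# prints: 4.0 (1.5, 1.5) 1.0
```

Central moments do not change when the shape is translated, which is one reason moments are used as shape descriptors.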

Topological Persistence Guided Knowledge Distillation for Wearable Sensor Data

Jul 07, 2024

Eun Som Jeon, Hongjun Choi, Ankita Shukla, Yuan Wang, Hyunglae Lee, Matthew P. Buman, Pavan Turaga

Abstract: Deep learning methods have achieved a lot of success in various applications involving converting wearable sensor data to actionable health insights. A common application area is activity recognition, where deep learning methods still suffer from limitations such as sensitivity to signal quality, sensor characteristic variations, and variability between subjects. To mitigate these issues, robust features obtained by topological data analysis (TDA) have been suggested as a potential solution. However, there are two significant obstacles to using topological features in deep learning: (1) the large computational load of extracting topological features using TDA, and (2) the different signal representations obtained from deep learning and TDA, which make fusion difficult. In this paper, to integrate the strengths of topological methods into deep learning for time-series data, we propose to use two teacher networks, one trained on the raw time-series data and another trained on persistence images generated by TDA methods. The distilled student model uses only the raw time-series data at test time. This approach addresses both issues. The use of KD with multiple teachers exploits complementary information and results in a compact model with strong supervisory features and an integrated, richer representation. To assimilate desirable information from different modalities, we design new constraints, including orthogonality imposed on feature correlation maps, which improves feature expressiveness and allows the student to learn easily from the teacher. We also apply an annealing strategy in KD for fast saturation and better accommodation of different features, while reducing the knowledge gap between the teachers and the student. Finally, a robust student model is distilled, which uses only the time-series data as input while implicitly preserving topological features.

* Engineering Applications of Artificial Intelligence, 130, 107719 (2024)
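
Persistence images are a vectorization of persistence diagrams. As a hedged sketch of where such diagrams come from for a single sensor channel, the function below computes the 0-dimensional persistence pairs of the sublevel-set filtration of a 1-D signal. It uses a union-find and the elder rule. The abstract does not state which filtration or homology dimensions the paper uses, so this is an illustrative choice. The function name is hypothetical, and the conversion of the diagram into an image is omitted.

```python
import math
from typing import List, Tuple


def sublevel_persistence_0d(signal: List[float]) -> List[Tuple[float, float]]:
    """Hypothetical helper: (birth, death) pairs of 0-dim sublevel-set persistence of a 1-D signal.

    Samples are added in increasing order of value. Each new sample either starts
    a component (a local minimum) or joins its already-added neighbours. When two
    components meet, the younger one (higher birth value) dies at the current value.
    """
    n = len(signal)
    parent = list(range(n))
    birth = list(signal)  # birth value of the component rooted at each index
    added = [False] * n
    pairs: List[Tuple[float, float]] = []

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in sorted(range(n), key=lambda k: signal[k]):
        added[i] = True
        for j in (i - 1, i + 1):
            if 0 <= j < n and added[j]:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: keep the older component, record the death of the younger one.
                young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                if birth[young] < signal[i]:  # skip zero-length pairs
                    pairs.append((birth[young], signal[i]))
                parent[young] = old
    if n:
        pairs.append((min(signal), math.inf))  # the global minimum never dies
    return pairs


print(sublevel_persistence_0d([0.0, 3.0, 1.0, 4.0, 2.0]))
# [(1.0, 3.0), (2.0, 4.0), (0.0, inf)]
```

Each finite pair records a local minimum (birth) and the value at which its basin merges into an older one (death).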

Leveraging Topological Guidance for Improved Knowledge Distillation

Jul 07, 2024

Eun Som Jeon, Rahul Khurana, Aishani Pathak, Pavan Turaga

Abstract: Deep learning has shown its efficacy in extracting useful features to solve various computer vision tasks. However, when the structure of the data is complex and noisy, capturing effective information to improve performance is very difficult. To this end, topological data analysis (TDA) has been utilized to derive useful representations that can contribute to improving performance and robustness against perturbations. Despite its effectiveness, the requirements for large computational resources and significant time consumption in extracting topological features through TDA are critical problems when implementing it on small devices. To address this issue, we propose a framework called Topological Guidance-based Knowledge Distillation (TGD), which uses topological features in knowledge distillation (KD) for image classification tasks. We utilize KD to train a superior lightweight model and provide topological features with multiple teachers simultaneously. We introduce a mechanism for integrating features from different teachers and reducing the knowledge gap between teachers and the student, which aids in improving performance. We demonstrate the effectiveness of our approach through diverse empirical evaluations.

* ICML 2024 Workshop on Geometry-grounded Representation Learning and Generative Modeling
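
The abstract does not specify how features from the different teachers are integrated. As a hedged illustration of the simplest multi-teacher setup only, the sketch below averages the temperature-softened predictions of several teachers with fixed weights. It then measures the KL divergence to the softened student prediction, scaled by T^2 as in standard Hinton-style distillation. The weights, T = 4 and the helper names are assumptions. TGD's integration mechanism and its topological teacher are not reproduced.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() from overflowing."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_div(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def multi_teacher_kd_loss(student_logits: Sequence[float],
                          teacher_logits: Sequence[Sequence[float]],
                          weights: Sequence[float],
                          temperature: float = 4.0) -> float:
    """Hypothetical helper: T^2 * KL(weighted mix of softened teachers || softened student)."""
    if len(weights) != len(teacher_logits) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("need one weight per teacher, summing to 1")
    teacher_probs = [softmax(t, temperature) for t in teacher_logits]
    mixed = [sum(w * p[k] for w, p in zip(weights, teacher_probs))
             for k in range(len(student_logits))]
    student_probs = softmax(student_logits, temperature)
    return temperature ** 2 * kl_div(mixed, student_probs)


# Example: one teacher on raw images, one on persistence images (illustrative logits).
student = [2.0, 1.0, 0.1]
image_teacher = [3.0, 0.5, 0.2]
topology_teacher = [2.5, 1.5, 0.1]
print(multi_teacher_kd_loss(student, [image_teacher, topology_teacher], [0.5, 0.5]))
```

The T^2 factor keeps the gradient magnitude of the softened term roughly comparable across temperatures, so the weight balancing it against the hard-label loss does not need retuning whenever T changes.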

Learning Decomposable and Debiased Representations via Attribute-Centric Information Bottlenecks

Mar 21, 2024

Jinyung Hong, Eun Som Jeon, Changhoon Kim, Keun Hee Park, Utkarsh Nath, Yezhou Yang, Pavan Turaga, Theodore P. Pavlic

Abstract: Biased attributes, spuriously correlated with target labels in a dataset, can lead neural networks to learn improper shortcuts for classification and limit their capabilities for out-of-distribution (OOD) generalization. Although many debiasing approaches have been proposed to ensure correct predictions from biased datasets, few studies have considered learning latent embeddings consisting of intrinsic and biased attributes that contribute to improved performance and explain how the model pays attention to attributes. In this paper, we propose a novel debiasing framework, Debiasing Global Workspace, introducing attention-based information bottlenecks for learning compositional representations of attributes without defining specific bias types. Based on our observation that learning shape-centric representations helps achieve robust performance on OOD datasets, we adopt those abilities to learn robust and generalizable representations of decomposable latent embeddings corresponding to intrinsic and biasing attributes. We conduct comprehensive evaluations on biased datasets, along with both quantitative and qualitative analyses, to showcase our approach's efficacy in attribute-centric representation learning and its ability to differentiate between intrinsic and bias-related features.

* 24 pages, 16 figures, 3 tables

Leveraging Angular Distributions for Improved Knowledge Distillation

Feb 27, 2023

Eun Som Jeon, Hongjun Choi, Ankita Shukla, Pavan Turaga

Abstract: Knowledge distillation, as a broad class of methods, has led to the development of lightweight and memory-efficient models, using a pre-trained model with a large capacity (teacher network) to train a smaller model (student network). Recently, additional variations of knowledge distillation that use activation maps of intermediate layers as the source of knowledge have been studied. Generally, in computer vision applications, the feature activation learned by a higher-capacity model is seen to contain richer knowledge, highlighting complete objects while focusing less on the background. Based on this observation, we leverage the dual ability of the teacher to accurately distinguish between positive (relevant to the target object) and negative (irrelevant) areas. We propose a new loss function for distillation, called angular margin-based distillation (AMD) loss. AMD loss uses the angular distance between positive and negative features by projecting them onto a hypersphere, motivated by the near-angular distributions seen in many feature extractors. Then, we create a more attentive feature that is angularly distributed on the hypersphere by introducing an angular margin to the positive feature. Transferring such knowledge from the teacher network enables the student model to harness the teacher's higher discrimination between positive and negative features, thus distilling superior student models. The proposed method is evaluated for various student-teacher network pairs on four public datasets. Furthermore, we show that the proposed method offers advantages when combined with other learning techniques, such as using fine-grained features, augmentation, and other distillation methods.

* Neurocomputing, Volume 518, 21 January 2023, Pages 466-481
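
AMD's exact construction of positive and negative features is given in the paper, not here. The sketch below only illustrates the two geometric ingredients named in the abstract. It assumes the margin is additive on the angle, as in ArcFace-style losses. Vectors are projected onto the unit hypersphere and the angle between them is measured. A margin m is then added to that angle, so the resulting cosine is smaller than the plain cosine. Function names and the margin value are hypothetical.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("a zero vector has no direction")
    return [x / norm for x in v]


def angle_between(u: Sequence[float], v: Sequence[float]) -> float:
    """Angle (radians) between u and v after both are projected onto the hypersphere."""
    cos = sum(a * b for a, b in zip(l2_normalize(u), l2_normalize(v)))
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp guards against rounding just past +/-1


def margin_cosine(u: Sequence[float], v: Sequence[float], margin: float) -> float:
    """Additive angular margin: cos(theta + m), with theta + m capped at pi."""
    theta = angle_between(u, v)
    return math.cos(min(theta + margin, math.pi))


pos, ref = [0.9, 0.3, 0.1], [0.8, 0.4, 0.2]
print(math.cos(angle_between(pos, ref)), margin_cosine(pos, ref, 0.2))
```

Because cosine decreases on [0, pi], the margined similarity is lower than the plain one. A target built from it therefore asks for tighter angular agreement than the plain cosine would.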

Understanding the Role of Mixup in Knowledge Distillation: An Empirical Study

Nov 09, 2022

Hongjun Choi, Eun Som Jeon, Ankita Shukla, Pavan Turaga

Abstract: Mixup is a popular data augmentation technique that creates new samples by linear interpolation between two given data samples, to improve both the generalization and robustness of the trained model. Knowledge distillation (KD), on the other hand, is widely used for model compression and transfer learning, and involves using a larger network's implicit knowledge to guide the learning of a smaller network. At first glance, these two techniques seem very different; however, we found that "smoothness" is the connecting link between the two and is also a crucial attribute in understanding KD's interplay with mixup. Although many mixup variants and distillation methods have been proposed, much remains to be understood regarding the role of mixup in knowledge distillation. In this paper, we present a detailed empirical study on various important dimensions of compatibility between mixup and knowledge distillation. We also scrutinize the behavior of networks trained with mixup in light of knowledge distillation through extensive analysis, visualizations, and comprehensive experiments on image classification. Finally, based on our findings, we suggest improved strategies to guide the student network to enhance its effectiveness. Additionally, the findings of this study provide insightful suggestions to researchers and practitioners who commonly use techniques from KD. Our code is available at https://github.com/hchoi71/MIX-KD.

* To be presented at WACV 2023
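
The "smoothness" link described above can be made concrete with entropy. Raising the distillation temperature T flattens the teacher's softmax output, and a mixup label is flatter than either one-hot label it blends. The sketch below checks both effects. Its helper names and logits are illustrative assumptions, and it is not code from the MIX-KD repository.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; larger temperatures give flatter distributions."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats; a flatter (smoother) distribution has higher entropy."""
    return -sum(x * math.log(x) for x in p if x > 0)


def mixup_label(y1: int, y2: int, num_classes: int, lam: float) -> List[float]:
    """Soft label lam * onehot(y1) + (1 - lam) * onehot(y2)."""
    return [lam * (1.0 if k == y1 else 0.0) + (1.0 - lam) * (1.0 if k == y2 else 0.0)
            for k in range(num_classes)]


logits = [4.0, 1.0, 0.0]  # illustrative teacher logits
for t in (1.0, 4.0):
    print("T =", t, "entropy =", round(entropy(softmax(logits, t)), 3))
print("mixup label entropy =", round(entropy(mixup_label(0, 2, 3, 0.7)), 3))
```

A one-hot label has zero entropy, so any mixup of two different classes, like any temperature above 1 on non-uniform logits, moves the target toward a smoother distribution.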

Role of Data Augmentation Strategies in Knowledge Distillation for Wearable Sensor Data

Jan 01, 2022

Eun Som Jeon, Anirudh Som, Ankita Shukla, Kristina Hasanaj, Matthew P. Buman, Pavan Turaga

Abstract: Deep neural networks are parametrized by thousands or millions of parameters, and have shown tremendous success in many classification problems. However, the large number of parameters makes it difficult to integrate these models into edge devices such as smartphones and wearable devices. To address this problem, knowledge distillation (KD) has been widely employed, which uses a pre-trained high-capacity network to train a much smaller network suitable for edge devices. In this paper, for the first time, we study the applicability and challenges of using KD for time-series data for wearable devices. Successful application of KD requires specific choices of data augmentation methods during training. However, it is not yet known whether there exists a coherent strategy for choosing an augmentation approach during KD. In this paper, we report the results of a detailed study that compares and contrasts various common choices and some hybrid data augmentation strategies in KD-based human activity analysis. Research in this area is often limited because there are not many comprehensive databases from wearable devices available in the public domain. Our study considers databases ranging from small-scale publicly available ones to one derived from a large-scale interventional study into human activity and sedentary behavior. We find that the choice of data augmentation techniques during KD has a variable impact on end performance, and that the optimal network choice as well as the data augmentation strategies are specific to the dataset at hand. However, we also conclude with a general set of recommendations that can provide a strong baseline performance across databases.
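
The abstract does not list the augmentations that were compared. As an illustrative sketch only, the functions below implement three augmentations commonly used for wearable time series: jitter, scaling and segment permutation. A compose helper chains them into hybrid strategies. Which augmentations and parameters the study used is not stated here, so the sigma values and segment count are assumptions.

```python
import math
import random
from typing import Callable, List, Sequence

Signal = List[float]
Augmentation = Callable[[Signal, random.Random], Signal]


def jitter(x: Sequence[float], sigma: float, rng: random.Random) -> Signal:
    """Add independent Gaussian noise to every sample (simulates sensor noise)."""
    return [v + rng.gauss(0.0, sigma) for v in x]


def scaling(x: Sequence[float], sigma: float, rng: random.Random) -> Signal:
    """Multiply the whole window by one random factor drawn around 1 (simulates gain changes)."""
    factor = rng.gauss(1.0, sigma)
    return [v * factor for v in x]


def permutation(x: Sequence[float], n_segments: int, rng: random.Random) -> Signal:
    """Cut the window into roughly equal segments and shuffle their order."""
    if n_segments < 1:
        raise ValueError("n_segments must be at least 1")
    size = max(1, math.ceil(len(x) / n_segments))
    segments = [list(x[i:i + size]) for i in range(0, len(x), size)]
    rng.shuffle(segments)
    return [v for seg in segments for v in seg]


def compose(*augs: Augmentation) -> Augmentation:
    """Chain augmentations into one hybrid augmentation, applied left to right."""
    def apply(x: Signal, rng: random.Random) -> Signal:
        for aug in augs:
            x = aug(x, rng)
        return x
    return apply


rng = random.Random(0)
window = [math.sin(0.3 * t) for t in range(12)]  # stand-in for one accelerometer axis
hybrid = compose(lambda s, r: jitter(s, 0.05, r), lambda s, r: scaling(s, 0.1, r))
print(hybrid(window, rng))
```

During KD, the same augmented window would be fed to both teacher and student, so that the student matches the teacher's response to the perturbed input.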
