Seungjae Jung
Pivotal Role of Language Modeling in Recommender Systems: Enriching Task-specific and Task-agnostic Representation Learning

Dec 13, 2022
Kyuyong Shin, Hanock Kwak, Wonjae Kim, Jisu Jeong, Seungjae Jung, Kyung-Min Kim, Jung-Woo Ha, Sang-Woo Lee

Recent studies have proposed unified user modeling frameworks that leverage user behavior data from various applications. Many of them benefit from utilizing users' behavior sequences as plain texts, representing rich information in any domain or system without losing generality. Hence, a question arises: Can language modeling on a user history corpus help improve recommender systems? While its versatile usability has been widely investigated in many domains, its application to recommender systems remains underexplored. We show that language modeling applied directly to task-specific user histories achieves excellent results on diverse recommendation tasks. Also, leveraging additional task-agnostic user histories delivers significant performance benefits. We further demonstrate that our approach can provide promising transfer learning capabilities for a broad spectrum of real-world recommender systems, even on unseen domains and services.
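The core idea above, treating a user's behavior sequence as plain text and fitting a language model over it, can be illustrated with a toy bigram model (a minimal sketch with made-up histories and add-one smoothing; the paper itself uses far richer neural language models):

```python
from collections import defaultdict
import math

# Hypothetical user histories serialized as plain text, as described above.
histories = [
    "view:shoes click:shoes buy:shoes view:socks",
    "view:laptop view:mouse click:mouse buy:mouse",
]

# Count bigram transitions between behavior tokens.
counts = defaultdict(lambda: defaultdict(int))
for h in histories:
    toks = ["<s>"] + h.split() + ["</s>"]
    for a, b in zip(toks, toks[1:]):
        counts[a][b] += 1

def next_token_prob(prev, tok, alpha=1.0):
    # Add-one (Laplace) smoothed transition probability.
    vocab = {t for row in counts.values() for t in row} | set(counts)
    total = sum(counts[prev].values()) + alpha * len(vocab)
    return (counts[prev][tok] + alpha) / total

def log_likelihood(history):
    # Language-modeling objective: log-probability of a user history.
    toks = ["<s>"] + history.split() + ["</s>"]
    return sum(math.log(next_token_prob(a, b)) for a, b in zip(toks, toks[1:]))
```

A transition observed in the corpus (e.g. a view followed by a click on the same item) scores higher than an unseen one, which is the signal a recommender can exploit.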

* 14 pages, 5 figures, 9 tables 

Hazard Gradient Penalty for Survival Analysis

May 27, 2022
Seungjae Jung, Kyung-Min Kim

Survival analysis appears in various fields such as medicine, economics, engineering, and business. Recent studies showed that the Ordinary Differential Equation (ODE) modeling framework unifies many existing survival models while remaining flexible and widely applicable. However, naively applying the ODE framework to survival analysis problems may model a fiercely changing density function, which can worsen the model's performance. Although L1 or L2 regularizers can be applied to the ODE model, their effect on the ODE modeling framework is barely known. In this paper, we propose the hazard gradient penalty (HGP) to enhance the performance of a survival analysis model. Our method imposes constraints on local data points by regularizing the gradient of the hazard function with respect to the data point. Our method applies to any survival analysis model, including the ODE modeling framework, and is easy to implement. We theoretically show that our method is related to minimizing the KL divergence between the density function at a data point and that of its neighborhood points. Experimental results on three public benchmarks show that our approach outperforms other regularization methods.
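The penalty described above, the squared gradient of the hazard with respect to the input, can be sketched on a toy hazard model (an assumed log-linear hazard with a finite-difference gradient; the paper's models and hyperparameters differ):

```python
import math

def hazard(x, t, w=0.8, b=0.1):
    # Toy log-linear hazard in a scalar covariate x at time t (assumption,
    # not the paper's model): h(x, t) = exp(w*x + b) * t.
    return math.exp(w * x + b) * t

def hgp(x, t, eps=1e-5):
    # Hazard gradient penalty at one data point: squared norm of
    # d hazard / d x, here via a central finite difference.
    g = (hazard(x + eps, t) - hazard(x - eps, t)) / (2 * eps)
    return g ** 2

def penalized_loss(base_loss, xs, ts, lam=0.1):
    # Add the average penalty over the batch to any base survival loss.
    return base_loss + lam * sum(hgp(x, t) for x, t in zip(xs, ts)) / len(xs)
```

In an autodiff framework the finite difference would be replaced by an exact gradient of the model's hazard, but the structure of the regularizer is the same.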

* 9 pages, 2 figures 

Global-Local Item Embedding for Temporal Set Prediction

Sep 05, 2021
Seungjae Jung, Young-Jin Park, Jisu Jeong, Kyung-Min Kim, Hiun Kim, Minkyu Kim, Hanock Kwak

Temporal set prediction is becoming increasingly important as many companies employ recommender systems in their online businesses, e.g., for personalized purchase prediction of shopping baskets. While most previous techniques have focused on leveraging a user's own history, combining it with other users' histories remains untapped potential. This paper proposes Global-Local Item Embedding (GLOIE), which learns to exploit the temporal properties of sets both across all users and within each user; we call these two temporal patterns global and local information, respectively. GLOIE uses a Variational Autoencoder (VAE) and a dynamic graph-based model to capture global and local information, and then applies attention to integrate the resulting item embeddings. Additionally, we propose a Tweedie output distribution for the VAE decoder, as it easily models zero-inflated and long-tailed distributions, which fit several real-world data distributions better than Gaussian or multinomial counterparts. When evaluated on three public benchmarks, our algorithm consistently outperforms previous state-of-the-art methods on most ranking metrics.
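The Tweedie decoder choice can be made concrete with the standard Tweedie negative log-likelihood for the compound-Poisson range 1 < p < 2, which handles exact zeros and a heavy right tail (a generic sketch of the distribution family, not GLOIE's implementation):

```python
def tweedie_nll(y, mu, p=1.5):
    # Tweedie negative log-likelihood up to terms constant in mu,
    # valid for 1 < p < 2 (compound Poisson-gamma): supports y == 0
    # exactly while still modeling a long-tailed positive part.
    # Its minimum over mu is attained at mu == y.
    return -y * mu ** (1 - p) / (1 - p) + mu ** (2 - p) / (2 - p)
```

A decoder trained with this loss predicts a mean mu per item; the loss penalizes both over- and under-prediction, with the index p interpolating between Poisson (p=1) and gamma (p=2) behavior.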

* 8 pages, 3 figures. To appear in RecSys 2021 LBR 

One4all User Representation for Recommender Systems in E-commerce

May 24, 2021
Kyuyong Shin, Hanock Kwak, Kyung-Min Kim, Minkyu Kim, Young-Jin Park, Jisu Jeong, Seungjae Jung

General-purpose representation learning through large-scale pre-training has shown promising results in various machine learning fields. In the e-commerce domain, the goal of a general-purpose, i.e., one-for-all, representation is efficient application to extensive downstream tasks such as user profiling, targeting, and recommendation. In this paper, we systematically compare the generalizability of two learning strategies: transfer learning through the proposed model, ShopperBERT, versus learning from scratch. ShopperBERT learns nine pretext tasks with 79.2M parameters from 0.8B user behaviors collected over two years to produce user embeddings. As a result, MLPs that employ our embeddings outperform more complex models trained from scratch on five out of six tasks. Specifically, the pre-trained embeddings are superior to task-specific supervised features and to strong baselines that learn an auxiliary dataset for the cold-start problem. We also show the computational efficiency of the pre-trained features and visualize their embeddings.
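The transfer-learning pattern above, a frozen pre-trained user embedding fed to a lightweight task-specific MLP head, can be sketched as follows (all names, shapes, and the embedding lookup are hypothetical stand-ins, not ShopperBERT itself):

```python
import math
import random

random.seed(0)
EMB_DIM, HIDDEN = 8, 4

def pretrained_embedding(user_id):
    # Stand-in for a frozen pre-trained embedding table: deterministic
    # pseudo-random vector per user (hypothetical, for illustration only).
    rng = random.Random(user_id)
    return [rng.gauss(0, 1) for _ in range(EMB_DIM)]

# Small task-specific head; only these weights would be trained downstream.
W1 = [[random.gauss(0, 0.1) for _ in range(EMB_DIM)] for _ in range(HIDDEN)]
w2 = [random.gauss(0, 0.1) for _ in range(HIDDEN)]

def mlp_head(emb):
    # One ReLU hidden layer, then a sigmoid output, e.g. a propensity score.
    h = [max(0.0, sum(w * e for w, e in zip(row, emb))) for row in W1]
    logit = sum(w * x for w, x in zip(w2, h))
    return 1 / (1 + math.exp(-logit))
```

The point of the comparison in the abstract is that a head this simple, on top of good pre-trained features, can beat far more complex models trained end to end from scratch.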


A Worrying Analysis of Probabilistic Time-series Models for Sales Forecasting

Nov 21, 2020
Seungjae Jung, Kyung-Min Kim, Hanock Kwak, Young-Jin Park

Probabilistic time-series models have become popular in the forecasting field as they help make optimal decisions under uncertainty. Despite the growing interest, a lack of thorough analysis makes it hard to choose which model is worth applying to a given task. In this paper, we analyze the performance of three prominent probabilistic time-series models for sales forecasting. To remove the role of random chance in each architecture's performance, we adopt two experimental principles: 1) a large-scale dataset with various cross-validation sets, and 2) standardized training and hyperparameter selection. The experimental results show that a simple Multi-layer Perceptron and Linear Regression outperform the probabilistic models on RMSE without any feature engineering. Overall, the probabilistic models fail to achieve better performance on point-estimation metrics, such as RMSE and MAPE, than comparably simple baselines. We analyze and discuss the performances of the probabilistic time-series models.
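The two point-estimation metrics used in the comparison above are standard; for reference, here is how they are usually defined (a straightforward sketch, independent of the paper's evaluation code):

```python
import math

def rmse(y_true, y_pred):
    # Root mean squared error: penalizes large absolute deviations.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    # Mean absolute percentage error (in percent): scale-free, but
    # undefined when a true value is zero.
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true) * 100
```

For point forecasts, a probabilistic model is typically scored on the mean (for RMSE) or median (for MAPE) of its predictive distribution, which is what makes the head-to-head comparison with deterministic baselines possible.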

* NeurIPS 2020 workshop (I Can't Believe It's Not Better, ICBINB@NeurIPS 2020). All authors contributed equally to this research 

Encoder-Powered Generative Adversarial Networks

Jun 03, 2019
Jiseob Kim, Seungjae Jung, Hyundo Lee, Byoung-Tak Zhang

We present an encoder-powered generative adversarial network (EncGAN) that is able to learn both the multi-manifold structure and the abstract features of data. Unlike conventional decoder-based GANs, EncGAN uses an encoder to model the manifold structure and inverts the encoder to generate data. This unique scheme enables the proposed model to exclude discrete features from the smooth structure modeling and to learn multi-manifold data without being hindered by the disconnections. Also, as EncGAN requires a single latent space to carry the information for all the manifolds, it builds abstract features shared among the manifolds in the latent space. For efficient computation, we formulate EncGAN using a simple regularizer and mathematically prove its validity. We also experimentally demonstrate that EncGAN successfully learns the multi-manifold structure and the abstract features of the MNIST, 3D-chair, and UT-Zap50k datasets. Our analysis shows that the learned abstract features are disentangled and enable good style transfer even when the source data is outside the trained distribution.
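Generating by inverting an encoder, the central move above, can be illustrated generically: given a latent code z, search for an input x whose encoding matches z. This sketch uses a toy linear encoder and plain gradient descent (an assumed illustration of encoder inversion in general, not EncGAN's actual formulation or regularizer):

```python
def enc(x):
    # Toy invertible "encoder" from R^2 to R^2 (assumption for illustration).
    return [x[0] + x[1], x[0] - x[1]]

def invert(z, steps=200, lr=0.1):
    # Minimize 0.5 * ||enc(x) - z||^2 over x by gradient descent,
    # using the encoder's (known, constant) Jacobian J = [[1, 1], [1, -1]].
    x = [0.0, 0.0]
    for _ in range(steps):
        e = enc(x)
        d0, d1 = e[0] - z[0], e[1] - z[1]
        x[0] -= lr * (d0 + d1)   # first row of J^T (enc(x) - z)
        x[1] -= lr * (d0 - d1)   # second row of J^T (enc(x) - z)
    return x
```

With a neural encoder the gradient would come from backpropagation rather than a hand-written Jacobian, but the generation-as-inversion structure is the same.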
