
Dong Yao


Denoising Multi-modal Sequential Recommenders with Contrastive Learning

May 03, 2023
Dong Yao, Shengyu Zhang, Zhou Zhao, Jieming Zhu, Wenqiao Zhang, Rui Zhang, Xiaofei He, Fei Wu


There is rapidly growing research interest in engaging users with multi-modal data for accurate user modeling in recommender systems. Existing multimedia recommenders have achieved substantial improvements by incorporating various modalities and devising delicate modules. However, when users decide to interact with items, most do not fully read the content of all modalities. We refer to modalities that directly cause users' behaviors as points-of-interest, which are important aspects for capturing users' interests. In contrast, modalities that do not cause users' behaviors are potential noise and might mislead the learning of a recommendation model. Not surprisingly, little research in the literature has been devoted to denoising such potential noise, owing to the inaccessibility of users' explicit feedback on their points-of-interest. To bridge the gap, we propose a weakly-supervised framework based on contrastive learning for denoising multi-modal recommenders (dubbed Demure). In a weakly-supervised manner, Demure circumvents the requirement of users' explicit feedback and identifies the noise by analyzing the modalities of all interacted items from a given user.
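A minimal PyTorch sketch of this weak-supervision idea, under our own assumptions: modality weights are inferred from how consistently each modality agrees with the user's aggregate interest across interacted items, and these weights then serve as soft targets in a contrastive objective. Names such as `modality_weights` and the exact weighting scheme are illustrative, not Demure's actual implementation.

```python
import torch
import torch.nn.functional as F

def modality_weights(item_modal_emb):
    # item_modal_emb: (seq_len, n_modalities, dim) embeddings of one
    # user's interacted items, one vector per modality.
    # Modalities that consistently agree with the user's aggregate
    # interest are treated as points-of-interest; inconsistent ones
    # are downweighted as potential noise.
    user_center = item_modal_emb.mean(dim=(0, 1))                    # (dim,)
    sim = F.cosine_similarity(item_modal_emb, user_center, dim=-1)   # (seq_len, n_mod)
    return torch.softmax(sim.mean(dim=0), dim=-1)                    # (n_mod,)

def contrastive_denoise_loss(user_repr, item_modal_emb, tau=0.1):
    # Pull the user representation toward highly weighted modality
    # prototypes; the inferred weights act as weak (soft) supervision.
    w = modality_weights(item_modal_emb)                             # (n_mod,)
    modal_proto = item_modal_emb.mean(dim=0)                         # (n_mod, dim)
    logits = F.cosine_similarity(modal_proto, user_repr, dim=-1) / tau
    return F.cross_entropy(logits.unsqueeze(0), w.unsqueeze(0))

emb = torch.randn(12, 3, 64)        # 12 interacted items, 3 modalities
user = emb.mean(dim=(0, 1))         # toy user representation
print(contrastive_denoise_loss(user, emb))
```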


CCL4Rec: Contrast over Contrastive Learning for Micro-video Recommendation

Aug 17, 2022
Shengyu Zhang, Bofang Li, Dong Yao, Fuli Feng, Jieming Zhu, Wenyan Fan, Zhou Zhao, Xiaofei He, Tat-seng Chua, Fei Wu


Micro-video recommender systems suffer from the ubiquitous noise in users' behaviors, which can render the learned user representation indiscriminating and lead to trivial recommendations (e.g., popular items) or even weird ones far beyond users' interests. Contrastive learning is an emergent technique for learning discriminating representations with random data augmentations. However, because it neglects the noise in user behaviors and treats all augmented samples equally, the existing contrastive learning framework is insufficient for learning discriminating user representations in recommendation. To bridge this research gap, we propose the Contrast over Contrastive Learning framework for training recommender models, named CCL4Rec, which models the nuances of different augmented views by further contrasting augmented positives/negatives with adaptive pulling/pushing strengths, i.e., the contrast over (vanilla) contrastive learning. To accommodate these contrasts, we devise hardness-aware augmentations that track the importance of the behaviors being replaced in the query user and the relatedness of the substitutes, and thus determine the quality of the augmented positives/negatives. The hardness-aware augmentation also permits controllable contrastive learning, leading to performance gains and robust training. In this way, CCL4Rec captures the nuances of historical behaviors for a given user, which explicitly shields the learned user representation from the effects of noisy behaviors. We conduct extensive experiments on two micro-video recommendation benchmarks, which demonstrate that CCL4Rec with far fewer model parameters achieves performance comparable to existing state-of-the-art methods, while improving the training/inference speed by several orders of magnitude.
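To make the "contrast over contrastive learning" notion concrete, here is a hedged sketch of a weighted InfoNCE in which each augmented positive/negative is pulled or pushed with its own strength. How CCL4Rec derives those strengths from hardness-aware augmentation is not reproduced here; `w_pos`/`w_neg` are simply given as inputs.

```python
import torch
import torch.nn.functional as F

def weighted_infonce(query, positives, negatives, w_pos, w_neg, tau=0.2):
    # query: (dim,); positives: (P, dim); negatives: (N, dim).
    # w_pos / w_neg: per-view pulling/pushing strengths in [0, 1].
    q = F.normalize(query, dim=-1)
    pos = F.normalize(positives, dim=-1)
    neg = F.normalize(negatives, dim=-1)
    pos_logits = (pos @ q) / tau                  # (P,)
    neg_logits = (neg @ q) / tau                  # (N,)
    # Each view contributes proportionally to its strength, unlike
    # vanilla InfoNCE, which treats all views equally.
    num = (w_pos * pos_logits.exp()).sum()
    den = num + (w_neg * neg_logits.exp()).sum()
    return -torch.log(num / den)

q = torch.randn(64)
loss = weighted_infonce(q, torch.randn(2, 64), torch.randn(8, 64),
                        w_pos=torch.tensor([1.0, 0.5]),
                        w_neg=torch.rand(8))
print(loss)
```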

* 11 pages, 4 figures 

Re4: Learning to Re-contrast, Re-attend, Re-construct for Multi-interest Recommendation

Aug 17, 2022
Shengyu Zhang, Lingxiao Yang, Dong Yao, Yujie Lu, Fuli Feng, Zhou Zhao, Tat-seng Chua, Fei Wu


Effectively representing users lies at the core of modern recommender systems. Since users' interests naturally exhibit multiple aspects, it is of increasing interest to develop multi-interest frameworks for recommendation, rather than representing each user with an overall embedding. Despite their effectiveness, existing methods solely exploit the encoder (the forward flow) to represent multiple aspects of interests. However, without explicit regularization, the interest embeddings may neither be distinct from each other nor semantically reflect representative historical items. Towards this end, we propose the Re4 framework, which leverages the backward flow to reexamine each interest embedding. Specifically, Re4 encapsulates three backward flows: 1) Re-contrast, which drives each interest embedding to be distinct from other interests using contrastive learning; 2) Re-attend, which ensures that the interest-item correlation estimation in the forward flow is consistent with the criterion used in the final recommendation; and 3) Re-construct, which ensures that each interest embedding can semantically reflect the information of the representative items related to the corresponding interest. We demonstrate the novel forward-backward multi-interest paradigm on ComiRec, and perform extensive experiments on three real-world datasets. Empirical studies validate that Re4 helps to learn distinct and effective multi-interest representations.
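A simplified, assumption-laden sketch of the three backward flows follows; the shapes, the MSE-based consistency terms, and the reconstruction target are our illustrative choices rather than Re4's exact formulation.

```python
import torch
import torch.nn.functional as F

def re_contrast(interests, tau=0.1):
    # Drive each interest embedding away from the others: InfoNCE in
    # which every interest is its own only positive.
    z = F.normalize(interests, dim=-1)                 # (K, dim)
    logits = z @ z.t() / tau                           # (K, K)
    return F.cross_entropy(logits, torch.arange(z.size(0)))

def re_attend(forward_attn, interests, items):
    # Keep the forward-flow attention consistent with the dot-product
    # criterion used at recommendation time.
    scores = torch.softmax(interests @ items.t(), dim=-1)   # (K, L)
    return F.mse_loss(forward_attn, scores)

def re_construct(interests, items, attn):
    # Each interest should reconstruct its representative items; here,
    # an attention-weighted reconstruction error.
    recon = attn @ items                               # (K, dim)
    return F.mse_loss(recon, interests)

K, L, d = 4, 20, 64
interests, items = torch.randn(K, d), torch.randn(L, d)
attn = torch.softmax(torch.randn(K, L), dim=-1)        # stand-in forward attention
loss = re_contrast(interests) + re_attend(attn, interests, items) \
       + re_construct(interests, items, attn)
print(loss)
```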

* 11 pages, 4 figures, accepted by WWW 2022 

Contrastive Learning with Positive-Negative Frame Mask for Music Representation

Apr 03, 2022
Dong Yao, Zhou Zhao, Shengyu Zhang, Jieming Zhu, Yudong Zhu, Rui Zhang, Xiuqiang He


Self-supervised learning, especially contrastive learning, has made outstanding contributions to the development of many deep learning research fields. Recently, researchers in the acoustic signal processing field noticed its success and leveraged contrastive learning for better music representation. Typically, existing approaches maximize the similarity between two distorted audio segments sampled from the same piece of music; in other words, they ensure semantic agreement at the music level. However, such coarse-grained methods neglect inessential or noisy elements at the frame level, which may hinder the model from learning effective representations of music. Towards this end, this paper proposes a novel Positive-nEgative frame mask for Music Representation based on the contrastive learning framework, abbreviated as PEMR. Concretely, PEMR incorporates a Positive-Negative Mask Generation module, which leverages transformer blocks to generate frame masks on the Log-Mel spectrogram. We can generate self-augmented negative and positive samples by masking important components or inessential components, respectively. We devise a novel contrastive learning objective to accommodate both self-augmented positives and negatives sampled from the same music. We conduct experiments on four public datasets. The experimental results on two music-related downstream tasks, music classification and cover song identification, demonstrate the generalization ability and transferability of the music representations learned by PEMR.
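An illustrative sketch of positive/negative frame masking on a log-Mel spectrogram, assuming a small transformer scores per-frame importance; the mask ratio, zero-filling, and network sizes are assumptions, not PEMR's exact design.

```python
import torch
import torch.nn as nn

class FrameMasker(nn.Module):
    def __init__(self, n_mels=128, dim=128, mask_ratio=0.25):
        super().__init__()
        self.proj = nn.Linear(n_mels, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.score = nn.Linear(dim, 1)
        self.mask_ratio = mask_ratio

    def forward(self, spec):
        # spec: (batch, frames, n_mels) log-Mel spectrogram.
        h = self.encoder(self.proj(spec))
        imp = self.score(h).squeeze(-1)            # (batch, frames) importance
        k = int(spec.size(1) * self.mask_ratio)
        top = imp.topk(k, dim=1).indices           # most important frames
        bot = (-imp).topk(k, dim=1).indices        # least important frames
        neg, pos = spec.clone(), spec.clone()
        # Masking important frames destroys semantics (negative sample);
        # masking inessential frames preserves them (positive sample).
        neg.scatter_(1, top.unsqueeze(-1).expand(-1, -1, spec.size(-1)), 0.0)
        pos.scatter_(1, bot.unsqueeze(-1).expand(-1, -1, spec.size(-1)), 0.0)
        return pos, neg

masker = FrameMasker()
pos, neg = masker(torch.randn(2, 200, 128))        # 2 clips, 200 frames
print(pos.shape, neg.shape)
```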

* Accepted by WWW 2022 

CauseRec: Counterfactual User Sequence Synthesis for Sequential Recommendation

Sep 11, 2021
Shengyu Zhang, Dong Yao, Zhou Zhao, Tat-seng Chua, Fei Wu


Learning user representations based on historical behaviors lies at the core of modern recommender systems. Recent advances in sequential recommenders have convincingly demonstrated high capability in extracting effective user representations from given behavior sequences. Despite significant progress, we argue that solely modeling the observational behavior sequences may end up with a brittle and unstable system, due to the noisy and sparse nature of logged user interactions. In this paper, we propose to learn accurate and robust user representations, which are required to be less sensitive to (attacks on) noisy behaviors and to rely more on the indispensable ones, by modeling the counterfactual data distribution. Specifically, given an observed behavior sequence, the proposed CauseRec framework identifies dispensable and indispensable concepts at both the fine-grained item level and the abstract interest level. CauseRec conditionally samples user concept sequences from the counterfactual data distributions by replacing dispensable and indispensable concepts within the original concept sequence. With user representations obtained from the synthesized user sequences, CauseRec performs contrastive user representation learning by contrasting the counterfactual with the observational. We conduct extensive experiments on real-world public recommendation benchmarks and justify the effectiveness of CauseRec with multi-aspect model analyses. The results demonstrate that the proposed CauseRec outperforms state-of-the-art sequential recommenders by learning accurate and robust user representations.
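A toy sketch of counterfactual sequence synthesis under stated assumptions: items are scored for indispensability by similarity to the user's aggregate representation, and counterfactual sequences are formed by swapping in random substitutes. CauseRec's concept-level machinery is richer than this stand-in.

```python
import torch
import torch.nn.functional as F

def synthesize(seq_emb, user_repr, item_pool, replace_dispensable=True, k=3):
    # seq_emb: (L, dim) behavior sequence; item_pool: (M, dim) substitutes.
    # Items least similar to the aggregate user interest are treated as
    # dispensable (noisy); the most similar ones as indispensable.
    scores = F.cosine_similarity(seq_emb, user_repr, dim=-1)   # (L,)
    idx = scores.topk(k, largest=not replace_dispensable).indices
    cf = seq_emb.clone()
    cf[idx] = item_pool[torch.randint(item_pool.size(0), (k,))]
    return cf

L, M, d = 30, 500, 64
seq, pool = torch.randn(L, d), torch.randn(M, d)
user = seq.mean(dim=0)
# Replacing dispensable items yields a counterfactual positive (user
# identity preserved); replacing indispensable ones yields a negative.
cf_pos = synthesize(seq, user, pool, replace_dispensable=True)
cf_neg = synthesize(seq, user, pool, replace_dispensable=False)
print(cf_pos.shape, cf_neg.shape)
```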

* Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021) 

Modeling High-order Interactions across Multi-interests for Micro-video Recommendation

Apr 01, 2021
Dong Yao, Shengyu Zhang, Zhou Zhao, Wenyan Fan, Jieming Zhu, Xiuqiang He, Fei Wu


Personalized recommendation systems have become pervasive across video platforms. Many effective methods have been proposed, but most of them fail to capture users' multi-level interests and the dependencies among their viewed micro-videos. To address these problems, we propose a Self-over-Co Attention module to enhance users' interest representations. In particular, we first use co-attention to model correlation patterns across different levels, and then use self-attention to model correlation patterns within a specific level. Experimental results on filtered public datasets verify that the proposed module is effective.
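A rough sketch of such a Self-over-Co attention block, assuming standard multi-head attention layers: co-attention first models correlations across two interest levels, then self-attention models correlations within the refined level. Layer sizes and the two-level setup are illustrative.

```python
import torch
import torch.nn as nn

class SelfOverCoAttention(nn.Module):
    def __init__(self, dim=64, nhead=4):
        super().__init__()
        self.co = nn.MultiheadAttention(dim, nhead, batch_first=True)
        self.self_attn = nn.MultiheadAttention(dim, nhead, batch_first=True)

    def forward(self, level_a, level_b):
        # Cross-level: level-A features attend to level B (co-attention).
        a_co, _ = self.co(level_a, level_b, level_b)
        # Within-level: the refined features attend to themselves.
        a_out, _ = self.self_attn(a_co, a_co, a_co)
        return a_out

block = SelfOverCoAttention()
item_level = torch.randn(2, 10, 64)     # e.g., item-level interests
interest_level = torch.randn(2, 5, 64)  # e.g., higher-level interests
print(block(item_level, interest_level).shape)  # (2, 10, 64)
```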

* accepted to AAAI 2021 