Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Aixin Sun

Recommender for Its Purpose: Repeat and Exploration in Food Delivery Recommendations

Feb 22, 2024

Jiayu Li, Aixin Sun, Weizhi Ma, Peijie Sun, Min Zhang

Abstract:Recommender systems have been widely used for various scenarios, such as e-commerce, news, and music, providing online contents to help and enrich users' daily life. Different scenarios hold distinct and unique characteristics, calling for domain-specific investigations and corresponding designed recommender systems. Therefore, in this paper, we focus on food delivery recommendations to unveil unique features in this domain, where users order food online and enjoy their meals shortly after delivery. We first conduct an in-depth analysis on food delivery datasets. The analysis shows that repeat orders are prevalent for both users and stores, and situations' differently influence repeat and exploration consumption in the food delivery recommender systems. Moreover, we revisit the ability of existing situation-aware methods for repeat and exploration recommendations respectively, and find them unable to effectively solve both tasks simultaneously. Based on the analysis and experiments, we have designed two separate recommendation models -- ReRec for repeat orders and ExpRec for exploration orders; both are simple in their design and computation. We conduct experiments on three real-world food delivery datasets, and our proposed models outperform various types of baselines on repeat, exploration, and combined recommendation tasks. This paper emphasizes the importance of dedicated analyses and methods for domain-specific characteristics for the recommender system studies.

* 11 pages, 5 figures

Via

Access Paper or Ask Questions

SciAgent: Tool-augmented Language Models for Scientific Reasoning

Feb 21, 2024

Yubo Ma, Zhibin Gou, Junheng Hao, Ruochen Xu, Shuohang Wang, Liangming Pan, Yujiu Yang, Yixin Cao, Aixin Sun, Hany Awadalla(+1 more)

Figure 1 for SciAgent: Tool-augmented Language Models for Scientific Reasoning

Figure 2 for SciAgent: Tool-augmented Language Models for Scientific Reasoning

Figure 3 for SciAgent: Tool-augmented Language Models for Scientific Reasoning

Figure 4 for SciAgent: Tool-augmented Language Models for Scientific Reasoning

Abstract:Scientific reasoning poses an excessive challenge for even the most advanced Large Language Models (LLMs). To make this task more practical and solvable for LLMs, we introduce a new task setting named tool-augmented scientific reasoning. This setting supplements LLMs with scalable toolsets, and shifts the focus from pursuing an omniscient problem solver to a proficient tool-user. To facilitate the research of such setting, we construct a tool-augmented training corpus named MathFunc which encompasses over 30,000 samples and roughly 6,000 tools. Building on MathFunc, we develop SciAgent to retrieve, understand and, if necessary, use tools for scientific problem solving. Additionally, we craft a benchmark, SciToolBench, spanning five scientific domains to evaluate LLMs' abilities with tool assistance. Extensive experiments on SciToolBench confirm the effectiveness of SciAgent. Notably, SciAgent-Mistral-7B surpasses other LLMs with the same size by more than 13% in absolute accuracy. Furthermore, SciAgent-DeepMath-7B shows much superior performance than ChatGPT.

Via

Access Paper or Ask Questions

Parameter-Efficient Conversational Recommender System as a Language Processing Task

Feb 03, 2024

Mathieu Ravaut, Hao Zhang, Lu Xu, Aixin Sun, Yong Liu

Abstract:Conversational recommender systems (CRS) aim to recommend relevant items to users by eliciting user preference through natural language conversation. Prior work often utilizes external knowledge graphs for items' semantic information, a language model for dialogue generation, and a recommendation module for ranking relevant items. This combination of multiple components suffers from a cumbersome training process, and leads to semantic misalignment issues between dialogue generation and item recommendation. In this paper, we represent items in natural language and formulate CRS as a natural language processing task. Accordingly, we leverage the power of pre-trained language models to encode items, understand user intent via conversation, perform item recommendation through semantic matching, and generate dialogues. As a unified model, our PECRS (Parameter-Efficient CRS), can be optimized in a single stage, without relying on non-textual metadata such as a knowledge graph. Experiments on two benchmark CRS datasets, ReDial and INSPIRED, demonstrate the effectiveness of PECRS on recommendation and conversation. Our code is available at: https://github.com/Ravoxsg/efficient_unified_crs.

* 9 pages, 4 figures, 8 tables, EACL 2024 conference

Via

Access Paper or Ask Questions

Multi-scale 2D Temporal Map Diffusion Models for Natural Language Video Localization

Jan 16, 2024

Chongzhi Zhang, Mingyuan Zhang, Zhiyang Teng, Jiayi Li, Xizhou Zhu, Lewei Lu, Ziwei Liu, Aixin Sun

Abstract:Natural Language Video Localization (NLVL), grounding phrases from natural language descriptions to corresponding video segments, is a complex yet critical task in video understanding. Despite ongoing advancements, many existing solutions lack the capability to globally capture temporal dynamics of the video data. In this study, we present a novel approach to NLVL that aims to address this issue. Our method involves the direct generation of a global 2D temporal map via a conditional denoising diffusion process, based on the input video and language query. The main challenges are the inherent sparsity and discontinuity of a 2D temporal map in devising the diffusion decoder. To address these challenges, we introduce a multi-scale technique and develop an innovative diffusion decoder. Our approach effectively encapsulates the interaction between the query and video data across various time scales. Experiments on the Charades and DiDeMo datasets underscore the potency of our design.

Via

Access Paper or Ask Questions

On Position Bias in Summarization with Large Language Models

Oct 16, 2023

Mathieu Ravaut, Shafiq Joty, Aixin Sun, Nancy F. Chen

Figure 1 for On Position Bias in Summarization with Large Language Models

Figure 2 for On Position Bias in Summarization with Large Language Models

Figure 3 for On Position Bias in Summarization with Large Language Models

Figure 4 for On Position Bias in Summarization with Large Language Models

Abstract:Large language models (LLMs) excel in zero-shot abstractive summarization tasks, delivering fluent and pertinent summaries. Recent advancements have extended their capabilities to handle long-input contexts, surpassing token limits of 32k or more. However, in the realm of multi-document question answering, language models exhibit uneven utilization of their input context. They tend to favor the initial and final segments, resulting in a U-shaped performance pattern concerning where the answer is located within the input. This bias raises concerns, particularly in summarization tasks where crucial content may be dispersed throughout the source document(s). This paper presents a comprehensive investigation encompassing 10 datasets, 4 LLMs, and 5 evaluation metrics to analyze how these models leverage their input for abstractive summarization. Our findings reveal a pronounced bias towards the introductory content (and to a lesser extent, the final content), posing challenges for LLM performance across a range of diverse summarization benchmarks.

* work in progress

Via

Access Paper or Ask Questions

Towards Verifiable Generation: A Benchmark for Knowledge-aware Language Model Attribution

Oct 09, 2023

Xinze Li, Yixin Cao2, Liangming Pan, Yubo Ma, Aixin Sun

Abstract:Although achieving great success, Large Language Models (LLMs) usually suffer from unreliable hallucinations. In this paper, we define a new task of Knowledge-aware Language Model Attribution (KaLMA) that improves upon three core concerns on conventional attributed LMs. First, we extend attribution source from unstructured texts to Knowledge Graph (KG), whose rich structures benefit both the attribution performance and working scenarios. Second, we propose a new ``Conscious Incompetence" setting considering the incomplete knowledge repository, where the model identifies the need for supporting knowledge beyond the provided KG. Third, we propose a comprehensive automatic evaluation metric encompassing text quality, citation quality, and text citation alignment. To implement the above innovations, we build a dataset in biography domain BioKaLMA via a well-designed evolutionary question generation strategy, to control the question complexity and necessary knowledge to the answer. For evaluation, we develop a baseline solution and demonstrate the room for improvement in LLMs' citation generation, emphasizing the importance of incorporating the "Conscious Incompetence" setting, and the critical role of retrieval accuracy.

* 8 pages

Via

Access Paper or Ask Questions

An Image Dataset for Benchmarking Recommender Systems with Raw Pixels

Sep 17, 2023

Yu Cheng, Yunzhu Pan, Jiaqi Zhang, Yongxin Ni, Aixin Sun, Fajie Yuan

Figure 1 for An Image Dataset for Benchmarking Recommender Systems with Raw Pixels

Figure 2 for An Image Dataset for Benchmarking Recommender Systems with Raw Pixels

Figure 3 for An Image Dataset for Benchmarking Recommender Systems with Raw Pixels

Figure 4 for An Image Dataset for Benchmarking Recommender Systems with Raw Pixels

Abstract:Recommender systems (RS) have achieved significant success by leveraging explicit identification (ID) features. However, the full potential of content features, especially the pure image pixel features, remains relatively unexplored. The limited availability of large, diverse, and content-driven image recommendation datasets has hindered the use of raw images as item representations. In this regard, we present PixelRec, a massive image-centric recommendation dataset that includes approximately 200 million user-image interactions, 30 million users, and 400,000 high-quality cover images. By providing direct access to raw image pixels, PixelRec enables recommendation models to learn item representation directly from them. To demonstrate its utility, we begin by presenting the results of several classical pure ID-based baseline models, termed IDNet, trained on PixelRec. Then, to show the effectiveness of the dataset's image features, we substitute the itemID embeddings (from IDNet) with a powerful vision encoder that represents items using their raw image pixels. This new model is dubbed PixelNet.Our findings indicate that even in standard, non-cold start recommendation settings where IDNet is recognized as highly effective, PixelNet can already perform equally well or even better than IDNet. Moreover, PixelNet has several other notable advantages over IDNet, such as being more effective in cold-start and cross-domain recommendation scenarios. These results underscore the importance of visual features in PixelRec. We believe that PixelRec can serve as a critical resource and testing ground for research on recommendation models that emphasize image pixel content. The dataset, code, and leaderboard will be available at https://github.com/westlake-repl/PixelRec.

Via

Access Paper or Ask Questions

FLM-101B: An Open LLM and How to Train It with $100K Budget

Sep 17, 2023

Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Xuying Meng, Siqi Fan, Peng Han, Jing Li, Li Du, Bowen Qin(+3 more)

Abstract:Large language models (LLMs) have achieved remarkable success in NLP and multimodal tasks, among others. Despite these successes, two main challenges remain in developing LLMs: (i) high computational cost, and (ii) fair and objective evaluations. In this paper, we report a solution to significantly reduce LLM training cost through a growth strategy. We demonstrate that a 101B-parameter LLM with 0.31T tokens can be trained with a budget of 100K US dollars. Inspired by IQ tests, we also consolidate an additional range of evaluations on top of existing evaluations that focus on knowledge-oriented abilities. These IQ evaluations include symbolic mapping, rule understanding, pattern mining, and anti-interference. Such evaluations minimize the potential impact of memorization. Experimental results show that our model, named FLM-101B, trained with a budget of 100K US dollars, achieves performance comparable to powerful and well-known models, e.g., GPT-3 and GLM-130B, especially on the additional range of IQ evaluations. The checkpoint of FLM-101B is released at https://huggingface.co/CofeAI/FLM-101B.

Via

Access Paper or Ask Questions

Our Model Achieves Excellent Performance on MovieLens: What Does it Mean?

Jul 19, 2023

Yu-chen Fan, Yitong Ji, Jie Zhang, Aixin Sun

Figure 1 for Our Model Achieves Excellent Performance on MovieLens: What Does it Mean?

Figure 2 for Our Model Achieves Excellent Performance on MovieLens: What Does it Mean?

Figure 3 for Our Model Achieves Excellent Performance on MovieLens: What Does it Mean?

Figure 4 for Our Model Achieves Excellent Performance on MovieLens: What Does it Mean?

Abstract:A typical benchmark dataset for recommender system (RecSys) evaluation consists of user-item interactions generated on a platform within a time period. The interaction generation mechanism partially explains why a user interacts with (e.g.,like, purchase, rate) an item, and the context of when a particular interaction happened. In this study, we conduct a meticulous analysis on the MovieLens dataset and explain the potential impact on using the dataset for evaluating recommendation algorithms. We make a few main findings from our analysis. First, there are significant differences in user interactions at the different stages when a user interacts with the MovieLens platform. The early interactions largely define the user portrait which affect the subsequent interactions. Second, user interactions are highly affected by the candidate movies that are recommended by the platform's internal recommendation algorithm(s). Removal of interactions that happen nearer to the last few interactions of a user leads to increasing difficulty in learning user preference, thus deteriorating recommendation accuracy. Third, changing the order of user interactions makes it more difficult for sequential algorithms to capture the progressive interaction process. Based on these findings, we further discuss the discrepancy between the interaction generation mechanism that is employed by the MovieLens system and that of typical real world recommendation scenarios. In summary, models that achieve excellent recommendation accuracy on the MovieLens dataset may not demonstrate superior performance in practice for at least two kinds of differences: (i) the differences in the contexts of user-item interaction generation, and (ii) the differences in user knowledge about the item collections.

Via

Access Paper or Ask Questions

Dipping PLMs Sauce: Bridging Structure and Text for Effective Knowledge Graph Completion via Conditional Soft Prompting

Jul 04, 2023

Chen Chen, Yufei Wang, Aixin Sun, Bing Li, Kwok-Yan Lam

Abstract:Knowledge Graph Completion (KGC) often requires both KG structural and textual information to be effective. Pre-trained Language Models (PLMs) have been used to learn the textual information, usually under the fine-tune paradigm for the KGC task. However, the fine-tuned PLMs often overwhelmingly focus on the textual information and overlook structural knowledge. To tackle this issue, this paper proposes CSProm-KG (Conditional Soft Prompts for KGC) which maintains a balance between structural information and textual knowledge. CSProm-KG only tunes the parameters of Conditional Soft Prompts that are generated by the entities and relations representations. We verify the effectiveness of CSProm-KG on three popular static KGC benchmarks WN18RR, FB15K-237 and Wikidata5M, and two temporal KGC benchmarks ICEWS14 and ICEWS05-15. CSProm-KG outperforms competitive baseline models and sets new state-of-the-art on these benchmarks. We conduct further analysis to show (i) the effectiveness of our proposed components, (ii) the efficiency of CSProm-KG, and (iii) the flexibility of CSProm-KG.

* Accepted by ACL 2023 Findings, Long Paper

Via

Access Paper or Ask Questions