Recommendation is the task of providing personalized suggestions to users based on their preferences and behavior.
Encyclopedic knowledge platforms are key gateways through which users explore information online. The recent release of Grokipedia, a fully AI-generated encyclopedia, introduces a new alternative to traditional, well-established platforms like Wikipedia. In this context, search engine mechanisms play an important role in guiding users exploratory paths, yet their behavior across different encyclopedic systems remains underexplored. In this work, we address this gap by providing the first comparative analysis of search engine in Wikipedia and Grokipedia. Using nearly 10,000 neutral English words and their substrings as queries, we collect over 70,000 search engine results and examine their semantic alignment, overlap, and topical structure. We find that both platforms frequently generate results that are weakly related to the original query and, in many cases, surface unexpected content starting from innocuous queries. Despite these shared properties, the two systems often produce substantially different recommendation sets for the same query. Through topical annotation and trajectory analysis, we further identify systematic differences in how content categories are surfaced and how search engine results evolve over multiple stages of exploration. Overall, our findings show that unexpected search engine outcomes are a common feature of both the platforms, even though they exhibit discrepancies in terms of topical distribution and query suggestions.
Contrastive learning has demonstrated promising potential in recommender systems. Existing methods typically construct sparser views by randomly perturbing the original interaction graph, as they have no idea about the authentic user preferences. Owing to the sparse nature of recommendation data, this paradigm can only capture insufficient semantic information. To address the issue, we propose InfoDCL, a novel diffusion-based contrastive learning framework for recommendation. Rather than injecting randomly sampled Gaussian noise, we employ a single-step diffusion process that integrates noise with auxiliary semantic information to generate signals and feed them to the standard diffusion process to generate authentic user preferences as contrastive views. Besides, based on a comprehensive analysis of the mutual influence between generation and preference learning in InfoDCL, we build a collaborative training objective strategy to transform the interference between them into mutual collaboration. Additionally, we employ multiple GCN layers only during inference stage to incorporate higher-order co-occurrence information while maintaining training efficiency. Extensive experiments on five real-world datasets demonstrate that InfoDCL significantly outperforms state-of-the-art methods. Our InfoDCL offers an effective solution for enhancing recommendation performance and suggests a novel paradigm for applying diffusion method in contrastive learning frameworks.
The Science Consultant Agent is a web-based Artificial Intelligence (AI) tool that helps practitioners select and implement the most effective modeling strategy for AI-based solutions. It operates through four core components: Questionnaire, Smart Fill, Research-Guided Recommendation, and Prototype Builder. By combining structured questionnaires, literature-backed solution recommendations, and prototype generation, the Science Consultant Agent accelerates development for everyone from Product Managers and Software Developers to Researchers. The full pipeline is illustrated in Figure 1.




Language models (LMs) are becoming increasingly dependent on external tools. LM-based agentic frameworks frequently interact with their environment via such tools to search files, run code, call APIs, etc. Further, modern reasoning-based LMs use tools such as web search and Python code execution to enhance their reasoning capabilities. While tools greatly improve the capabilities of LMs, they also introduce performance bottlenecks during the inference process. In this paper, we introduce novel systems optimizations to address such performance bottlenecks by speculating tool calls and forcing sequences to remain resident in the inference engine to minimize overheads. Our optimizations lead to throughput improvements of several hundred tokens per second when hosting inference for LM agents. We provide a theoretical analysis of our algorithms to provide insights into speculation configurations that will yield the best performance. Further, we recommend a new "tool cache" API endpoint to enable LM providers to easily adopt these optimizations.




The rapid development of generative artificial intelligence (AI) and large language models (LLMs), and the availability of services that make them accessible, have led the general public to begin incorporating them into everyday life. The extended reality (XR) community has also sought to integrate LLMs, particularly in the form of conversational agents, to enhance user experience and task efficiency. When interacting with such conversational agents, users may easily disclose sensitive information due to the naturalistic flow of the conversations, and combining such conversational data with fine-grained sensor data may lead to novel privacy issues. To address these issues, a user-centric understanding of technology acceptance and concerns is essential. Therefore, to this end, we conducted a large-scale crowdsourcing study with 1036 participants, examining user decision-making processes regarding LLM-powered conversational agents in XR, across factors of XR setting type, speech interaction type, and data processing location. We found that while users generally accept these technologies, they express concerns related to security, privacy, social implications, and trust. Our results suggest that familiarity plays a crucial role, as daily generative AI use is associated with greater acceptance. In contrast, previous ownership of XR devices is linked to less acceptance, possibly due to existing familiarity with the settings. We also found that men report higher acceptance with fewer concerns than women. Regarding data type sensitivity, location data elicited the most significant concern, while body temperature and virtual object states were considered least sensitive. Overall, our study highlights the importance of practitioners effectively communicating their measures to users, who may remain distrustful. We conclude with implications and recommendations for LLM-powered XR.




Recent advances in explainable recommendations have explored the integration of language models to analyze natural language rationales for user-item interactions. Despite their potential, existing methods often rely on ID-based representations that obscure semantic meaning and impose structural constraints on language models, thereby limiting their applicability in open-ended scenarios. These challenges are intensified by the complex nature of real-world interactions, where diverse user intents are entangled and collaborative signals rarely align with linguistic semantics. To overcome these limitations, we propose BEAT, a unified and transferable framework that tokenizes user and item behaviors into discrete, interpretable sequences. We construct a behavior vocabulary via a vector-quantized autoencoding process that disentangles macro-level interests and micro-level intentions from graph-based representations. We then introduce multi-level semantic supervision to bridge the gap between behavioral signals and language space. A semantic alignment regularization mechanism is designed to embed behavior tokens directly into the input space of frozen language models. Experiments on three public datasets show that BEAT improves zero-shot recommendation performance while generating coherent and informative explanations. Further analysis demonstrates that our behavior tokens capture fine-grained semantics and offer a plug-and-play interface for integrating complex behavior patterns into large language models.




Every day, a significant number of users visit the internet for different needs. The owners of a website generate profits from the user interaction with the contents or items of the website. A robust recommendation system can increase user interaction with a website by recommending items according to the user's unique preferences. BERT and CNN-integrated neural collaborative filtering (NCF) have been proposed for the recommendation system in this experiment. The proposed model takes inputs from the user and item profile and finds the user's interest. This model can handle numeric, categorical, and image data to extract the latent features from the inputs. The model is trained and validated on a small sample of the MovieLens dataset for 25 epochs. The same dataset has been used to train and validate a simple NCF and a BERT-based NCF model and compared with the proposed model. The proposed model outperformed those two baseline models. The obtained result for the proposed model is 0.72 recall and 0.486 Hit Ratio @ 10 for 799 users on the MovieLens dataset. This experiment concludes that considering both categorical and image data can improve the performance of a recommendation system.
Recommendation plays a key role in e-commerce, enhancing user experience and boosting commercial success. Existing works mainly focus on recommending a set of items, but online e-commerce platforms have recently begun to pay attention to exploring users' potential interests at the category level. Category-level recommendation allows e-commerce platforms to promote users' engagements by expanding their interests to different types of items. In addition, it complements item-level recommendations when the latter becomes extremely challenging for users with little-known information and past interactions. Furthermore, it facilitates item-level recommendations in existing works. The predicted category, which is called intention in those works, aids the exploration of item-level preference. However, such category-level preference prediction has mostly been accomplished through applying item-level models. Some key differences between item-level recommendations and category-level recommendations are ignored in such a simplistic adaptation. In this paper, we propose a cascading category recommender (CCRec) model with a variational autoencoder (VAE) to encode item-level information to perform category-level recommendations. Experiments show the advantages of this model over methods designed for item-level recommendations.
Deep Learning Recommendation Models (DLRMs) underpin personalized services but face a critical freshness-accuracy tradeoff due to massive parameter synchronization overheads. Production DLRMs deploy decoupled training/inference clusters, where synchronizing petabyte-scale embedding tables (EMTs) causes multi-minute staleness, degrading recommendation quality and revenue. We observe that (1) inference nodes exhibit sustained CPU underutilization (peak <= 20%), and (2) EMT gradients possess intrinsic low-rank structure, enabling compact update representation. We present LiveUpdate, a system that eliminates inter-cluster synchronization by colocating Low-Rank Adaptation (LoRA) trainers within inference nodes. LiveUpdate addresses two core challenges: (1) dynamic rank adaptation via singular value monitoring to constrain memory overhead (<2% of EMTs), and (2) NUMA-aware resource scheduling with hardware-enforced QoS to eliminate update inference contention (P99 latency impact <20ms). Evaluations show LiveUpdate reduces update costs by 2x versus delta-update baselines while achieving higher accuracy within 1-hour windows. By transforming idle inference resources into freshness engines, LiveUpdate delivers online model updates while outperforming state-of-the-art delta-update methods by 0.04% to 0.24% in accuracy.
Recently, transformer-based generative recommendation has garnered significant attention for user behavior modeling. However, it often requires discretizing items into multi-code representations (e.g., typically four code tokens or more), which sharply increases the length of the original item sequence. This expansion poses challenges to transformer-based models for modeling user behavior sequences with inherent noises, since they tend to overallocate attention to irrelevant or noisy context. To mitigate this issue, we propose FAIR, the first generative recommendation framework with focused attention, which enhances attention scores to relevant context while suppressing those to irrelevant ones. Specifically, we propose (1) a focused attention mechanism integrated into the standard Transformer, which learns two separate sets of Q and K attention weights and computes their difference as the final attention scores to eliminate attention noise while focusing on relevant contexts; (2) a noise-robustness objective, which encourages the model to maintain stable attention patterns under stochastic perturbations, preventing undesirable shifts toward irrelevant context due to noise; and (3) a mutual information maximization objective, which guides the model to identify contexts that are most informative for next-item prediction. We validate the effectiveness of FAIR on four public benchmarks, demonstrating its superior performance compared to existing methods.