Yubin Zhang

IDProxy: Cold-Start CTR Prediction for Ads and Recommendation at Xiaohongshu with Multimodal LLMs

Mar 02, 2026

Yubin Zhang, Haiming Xu, Guillaume Salha-Galvan, Ruiyan Han, Feiyang Xiao, Yanhua Huang, Li Lin, Yang Luo, Yao Hu

Abstract: Click-through rate (CTR) models in advertising and recommendation systems rely heavily on item ID embeddings, which struggle in item cold-start settings. We present IDProxy, a solution that leverages multimodal large language models (MLLMs) to generate proxy embeddings from rich content signals, enabling effective CTR prediction for new items without usage data. These proxies are explicitly aligned with the existing ID embedding space and are optimized end-to-end under CTR objectives together with the ranking model, allowing seamless integration into existing large-scale ranking pipelines. Offline experiments and online A/B tests demonstrate the effectiveness of IDProxy, which has been successfully deployed in both Content Feed and Display Ads features of Xiaohongshu's Explore Feed, serving hundreds of millions of users daily.

Non-Invasive Diagnosis for Clubroot Using Terahertz Time-Domain Spectroscopy and Physics-Constrained Neural Networks

Jan 19, 2026

Pengfei Zhu, Jiaxu Wu, Alyson Deslongchamps, Yubin Zhang, Xavier Maldague

Abstract: Clubroot, a major soilborne disease affecting canola and other cruciferous crops, is characterized by the development of large galls on the roots of susceptible hosts. In this study, we present the first application of terahertz time-domain spectroscopy (THz-TDS) as a non-invasive diagnostic tool in plant pathology. Compared with conventional molecular, spectroscopic, and immunoassay-based methods, THz-TDS offers distinct advantages, including non-contact, non-destructive, and preparation-free measurement, enabling rapid in situ screening of plant and soil samples. Our results demonstrate that THz-TDS can differentiate between healthy and clubroot-infected tissues by detecting both structural and biochemical alterations. Specifically, infected roots exhibit a blue shift in the refractive index in the low-frequency THz range, along with distinct peaks, indicative of disruptions in water transport and altered metabolic activity in both roots and leaves. Interestingly, the characteristic root swelling observed in infected plants reflects internal tissue disorganization rather than an actual increase in water content. Furthermore, a physics-constrained neural network is proposed to extract the main feature in THz-TDS. A comprehensive evaluation, including time-domain signals, amplitude and phase images, refractive index and absorption coefficient maps, and principal component analysis, provides enhanced contrast and spatial resolution compared to raw time-domain or frequency signals. These findings suggest that THz-TDS holds significant potential for early, non-destructive detection of plant diseases and may serve as a valuable tool to limit their spread in agricultural systems.

A Metric for MLLM Alignment in Large-scale Recommendation

Aug 07, 2025

Yubin Zhang, Yanhua Huang, Haiming Xu, Mingliang Qi, Chang Wang, Jiarui Jin, Xiangyuan Ren, Xiaodan Wang, Ruiwen Xu

Abstract: Multimodal recommendation has emerged as a critical technique in modern recommender systems, leveraging content representations from advanced multimodal large language models (MLLMs). To ensure these representations are well-adapted, alignment with the recommender system is essential. However, evaluating the alignment of MLLMs for recommendation presents significant challenges due to three key issues: (1) static benchmarks are inaccurate because of the dynamism in real-world applications, (2) evaluations with online systems, while accurate, are prohibitively expensive at scale, and (3) conventional metrics fail to provide actionable insights when learned representations underperform. To address these challenges, we propose the Leakage Impact Score (LIS), a novel metric for multimodal recommendation. Rather than directly assessing MLLMs, LIS efficiently measures the upper bound of preference data. We also share practical insights on deploying MLLMs with LIS in real-world scenarios. Online A/B tests on both Content Feed and Display Ads of Xiaohongshu's Explore Feed production demonstrate the effectiveness of our proposed method, showing significant improvements in user time spent and advertiser value.

* Pre-print. Under review

Deep Speech Synthesis from MRI-Based Articulatory Representations

Jul 05, 2023

Peter Wu, Tingle Li, Yijing Lu, Yubin Zhang, Jiachen Lian, Alan W Black, Louis Goldstein, Shinji Watanabe, Gopala K. Anumanchipalli

Abstract: In this paper, we study articulatory synthesis, a speech synthesis method using human vocal tract information that offers a way to develop efficient, generalizable and interpretable synthesizers. While recent advances have enabled intelligible articulatory synthesis using electromagnetic articulography (EMA), these methods lack critical articulatory information like excitation and nasality, limiting generalization capabilities. To bridge this gap, we propose an alternative MRI-based feature set that covers a much more extensive articulatory space than EMA. We also introduce normalization and denoising procedures to enhance the generalizability of deep learning methods trained on MRI data. Moreover, we propose an MRI-to-speech model that improves both computational efficiency and speech fidelity. Finally, through a series of ablations, we show that the proposed MRI representation is more comprehensive than EMA and identify the most suitable MRI feature subset for articulatory synthesis.
