Nanshan Jia

Controllable Coupled Image Generation via Diffusion Models

Jun 07, 2025

Chenfei Yuan, Nanshan Jia, Hangqi Li, Peter W. Glynn, Zeyu Zheng

Abstract: We provide an attention-level control method for the task of coupled image generation, where "coupled" means that multiple simultaneously generated images are expected to have the same or very similar backgrounds. While the backgrounds are coupled, the centered objects in the generated images are still expected to retain the flexibility afforded by different text prompts. The proposed method disentangles the background and entity components in the model's cross-attention modules and attaches a sequence of time-varying weight control parameters that depend on the sampling time step. We optimize this sequence of weight control parameters with a combined objective that assesses how coupled the backgrounds are as well as text-to-image alignment and overall visual quality. Empirical results demonstrate that our method outperforms existing approaches across these criteria.

Improving LLM Interpretability and Performance via Guided Embedding Refinement for Sequential Recommendation

Apr 15, 2025

Nanshan Jia, Chenfei Yuan, Yuhang Wu, Zeyu Zheng

Abstract: The fast development of Large Language Models (LLMs) offers growing opportunities to further improve sequential recommendation systems. Yet for some practitioners, integrating LLMs into their existing base recommendation systems raises questions about model interpretability, transparency and related safety. To partly alleviate the challenges raised by these questions, we propose guided embedding refinement, a method that uses LLMs in a guided and interpretable way to enhance the embeddings associated with the base recommendation system. Instead of directly using LLMs as the backbone of sequential recommendation systems, we utilize them as auxiliary tools to emulate the sales logic of recommendation and generate guided embeddings that capture domain-relevant semantic information on interpretable attributes. Benefiting from the strong generalization capabilities of the guided embedding, we construct a refined embedding from the guided embedding and a reduced-dimension version of the base embedding. We then integrate the refined embedding into the recommendation module for training and inference. A range of numerical experiments demonstrates that the guided embedding is adaptable to various existing base embedding models and generalizes well across different recommendation tasks. The numerical results show that the refined embedding not only improves recommendation performance, achieving approximately $10\%$ to $50\%$ gains in Mean Reciprocal Rank (MRR), Recall rate, and Normalized Discounted Cumulative Gain (NDCG), but also enhances interpretability, as evidenced by case studies.
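For readers unfamiliar with the three metrics named in the abstract, the following is a minimal sketch of how MRR, Recall@k, and NDCG@k are commonly computed when each test case has a single ground-truth next item. The function names, the cutoff `k`, and the single-target setup are assumptions made for illustration; the abstract does not state the paper's exact evaluation protocol.

```python
import math


def rank_of_target(ranked_items, target):
    """Return the 1-based rank of target in ranked_items, or None if absent."""
    for position, item in enumerate(ranked_items, start=1):
        if item == target:
            return position
    return None


def reciprocal_rank(ranked_items, target):
    """1 / rank of the target, or 0 if the target was not ranked at all."""
    rank = rank_of_target(ranked_items, target)
    return 0.0 if rank is None else 1.0 / rank


def recall_at_k(ranked_items, target, k):
    """1 if the target appears in the top k items, else 0 (single-target case)."""
    return 0.0 if rank_of_target(ranked_items[:k], target) is None else 1.0


def ndcg_at_k(ranked_items, target, k):
    """NDCG@k with one relevant item: the ideal DCG is 1, so NDCG = 1 / log2(rank + 1)."""
    rank = rank_of_target(ranked_items[:k], target)
    return 0.0 if rank is None else 1.0 / math.log2(rank + 1)


def mean(values):
    """Average a list of per-user scores (for example, to obtain MRR)."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Hypothetical example: two users, each with a ranked list and one true next item.
cases = [(["a", "b", "c"], "b"), (["x", "y", "z"], "x")]
mrr = mean(reciprocal_rank(ranked, target) for ranked, target in cases)
```

Averaging `recall_at_k` or `ndcg_at_k` over the same cases gives the corresponding aggregate scores.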

Structured Diffusion Models with Mixture of Gaussians as Prior Distribution

Oct 24, 2024

Nanshan Jia, Tingyu Zhu, Haoyu Liu, Zeyu Zheng

Abstract: We propose a class of structured diffusion models in which the prior distribution is chosen as a mixture of Gaussians rather than a standard Gaussian distribution. The specific mixed Gaussian distribution used as the prior can be chosen to incorporate certain structured information of the data. We develop a simple-to-implement training procedure that smoothly accommodates the use of a mixed Gaussian as the prior. Theory is provided to quantify the benefits of our proposed models compared to classical diffusion models. Numerical experiments with synthetic, image and operational data are conducted to show the comparative advantages of our model. Our method is shown to be robust to mis-specifications and is particularly suited to situations where training resources are limited or faster training in real time is desired.
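To make the contrast between the two priors concrete, here is a minimal sketch of drawing a starting point from a mixture of Gaussians versus a standard Gaussian. The function names, the isotropic per-component standard deviation, and the example centers are assumptions chosen for illustration; the abstract does not specify how the paper parameterizes or selects the mixture, and this sketch does not reproduce its training procedure.

```python
import random


def sample_standard_prior(dim, rng):
    """Draw one point from a standard Gaussian N(0, I) in `dim` dimensions."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def sample_mixture_prior(weights, means, stds, rng):
    """Draw one point from a mixture of isotropic Gaussians.

    weights: mixing proportions, one per component (assumed non-negative).
    means:   list of component centers, each a list of coordinates.
    stds:    one standard deviation per component (isotropic assumption).
    """
    # Pick a component index in proportion to its mixing weight.
    k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    # Sample each coordinate independently around the chosen center.
    return [rng.gauss(m, stds[k]) for m in means[k]]


# Hypothetical two-cluster prior in 2D; the centers might come from a
# clustering of the training data, which is an assumption here.
rng = random.Random(0)
centers = [[-2.0, 0.0], [2.0, 0.0]]
start = sample_mixture_prior([0.5, 0.5], centers, [1.0, 1.0], rng)
baseline = sample_standard_prior(2, rng)
```

A mixture with a single component centered at the origin with unit standard deviation reduces to the standard Gaussian prior.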
