
Sumin Shim

Anchoring and Rescaling Attention for Semantically Coherent Inbetweening

Mar 18, 2026

Tae Eun Choi, Sumin Shim, Junhyeok Kim, Seong Jae Hwang

Abstract: Generative inbetweening (GI) seeks to synthesize realistic intermediate frames between the first and last keyframes, going beyond mere interpolation. As sequences become sparser and motions larger, previous GI models produce inconsistent frames with unstable pacing and semantic misalignment. Since GI involves fixed endpoints but numerous plausible paths between them, the task requires additional guidance from the keyframes and text to specify the intended path. We therefore provide semantic and temporal guidance from the keyframes and text to each intermediate frame through Keyframe-anchored Attention Bias. We also better enforce frame consistency with Rescaled Temporal RoPE, which allows self-attention to attend to keyframes more faithfully. We further introduce TGI-Bench, the first benchmark specifically designed for text-conditioned GI evaluation, which enables challenge-targeted analysis of GI models. Without additional training, our method achieves state-of-the-art frame consistency, semantic fidelity, and pace stability for both short and long sequences across diverse challenges.

* Accepted to CVPR 2026; code is released at https://github.com/teunchoi/TGI
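The abstract does not give the exact formulation of Keyframe-anchored Attention Bias or Rescaled Temporal RoPE. As a generic illustration only, and not the authors' implementation (the repository above holds that), the sketch below combines two standard ingredients the names suggest: rotary position embedding (RoPE) applied to frame indices multiplied by a rescaling factor, and an additive bias on the attention logits of keyframe columns. The function names and the parameters `scale`, `key_bias`, and `keyframes` are hypothetical.

```python
import math


def rope_rotate(vec, pos, base=10000.0):
    """Rotate consecutive pairs of `vec` by angles proportional to `pos` (standard 1-D RoPE)."""
    d = len(vec)
    out = list(vec)
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # frequency decays across dimension pairs
        c, s = math.cos(theta), math.sin(theta)
        out[i] = vec[i] * c - vec[i + 1] * s
        out[i + 1] = vec[i] * s + vec[i + 1] * c
    return out


def temporal_attention(q, k, v, scale=1.0, key_bias=0.0, keyframes=(0,)):
    """Single-head attention across T frames (hypothetical illustration, not the paper's method).

    q, k, v: lists of T per-frame vectors; q and k must have even length d.
    scale: hypothetical factor applied to frame indices before RoPE.
    key_bias: hypothetical additive logit bias for keyframe columns.
    keyframes: indices of the anchor frames (e.g. first and last).
    Returns (outputs, attention_weights).
    """
    T, d = len(q), len(q[0])
    qr = [rope_rotate(q[t], t * scale) for t in range(T)]
    kr = [rope_rotate(k[t], t * scale) for t in range(T)]
    outputs, weights = [], []
    for i in range(T):
        logits = []
        for j in range(T):
            logit = sum(a * b for a, b in zip(qr[i], kr[j])) / math.sqrt(d)
            if j in keyframes:
                logit += key_bias  # assumed form of the keyframe anchor: a constant logit offset
            logits.append(logit)
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(x - m) for x in logits]
        z = sum(exps)
        w = [e / z for e in exps]
        weights.append(w)
        outputs.append([sum(w[j] * v[j][c] for j in range(T)) for c in range(len(v[0]))])
    return outputs, weights


# Usage: 5 frames with the first and last frames treated as keyframes.
frames = [[1.0, 0.0, 0.5, 0.5] for _ in range(5)]
values = [[float(t)] for t in range(5)]
out, attn = temporal_attention(frames, frames, values, scale=0.5, key_bias=2.0, keyframes=(0, 4))
```

Setting `scale` below 1 compresses the relative frame distances that RoPE encodes, and a positive `key_bias` shifts softmax mass toward the keyframe columns. How the paper actually chooses or schedules such quantities is not stated in the abstract.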


G-FOCUS: Towards a Robust Method for Assessing UI Design Persuasiveness

May 08, 2025

Jaehyun Jeon, Janghan Yoon, Minsoo Kim, Sumin Shim, Yejin Choi, Hanbin Kim, Youngjae Yu


Abstract: Evaluating user interface (UI) design effectiveness extends beyond aesthetics to influencing user behavior, a principle central to Design Persuasiveness. A/B testing is the predominant method for determining which UI variations drive higher user engagement, but it is costly and time-consuming. While recent Vision-Language Models (VLMs) can perform automated UI analysis, current approaches focus on isolated design attributes rather than comparative persuasiveness, the key factor in optimizing user interactions. To address this, we introduce WiserUI-Bench, a benchmark for the Pairwise UI Design Persuasiveness Assessment task, featuring 300 real-world UI image pairs labeled with A/B test results and expert rationales. Additionally, we propose G-FOCUS, a novel inference-time reasoning strategy that enhances VLM-based persuasiveness assessment by reducing position bias and improving evaluation accuracy. Experimental results show that G-FOCUS surpasses existing inference strategies in consistency and accuracy for pairwise UI evaluation. By promoting VLM-driven evaluation of UI persuasiveness, our work offers an approach that complements A/B testing, advancing scalable UI preference modeling and design optimization. Code and data will be released publicly.

* 31 pages, 17 figures
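The abstract says G-FOCUS reduces position bias in pairwise assessment but does not describe how. For context only, a common and much simpler way to expose position bias in a pairwise judge is to ask the same question in both presentation orders and keep a verdict only when the two answers agree; this order-swap check is not G-FOCUS. The `judge` callable is a hypothetical stand-in for a VLM query, and `debiased_pairwise` and `consistency_rate` are illustrative names.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical judge interface (not a real API): given two UI inputs, return
# "first" or "second" for whichever it considers more persuasive.
Judge = Callable[[str, str], str]


def debiased_pairwise(judge: Judge, ui_a: str, ui_b: str) -> Optional[str]:
    """Query the judge in both orders; return "A" or "B" if the verdicts agree, else None."""
    forward = judge(ui_a, ui_b)   # A presented first
    backward = judge(ui_b, ui_a)  # B presented first
    pick_fwd = "A" if forward == "first" else "B"
    pick_bwd = "B" if backward == "first" else "A"  # undo the swap
    return pick_fwd if pick_fwd == pick_bwd else None  # None signals order-dependent output


def consistency_rate(judge: Judge, pairs: List[Tuple[str, str]]) -> float:
    """Fraction of pairs on which the judge's verdict survives swapping the order."""
    results = [debiased_pairwise(judge, a, b) for a, b in pairs]
    return sum(r is not None for r in results) / len(results)
```

A judge that always prefers whichever option is shown first scores a consistency rate of 0, which is exactly the failure mode a position-bias mitigation is meant to remove.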


Towards Visual Text Design Transfer Across Languages

Oct 24, 2024

Yejin Choi, Jiwan Chung, Sumin Shim, Giyeong Oh, Youngjae Yu


Abstract: Visual text design plays a critical role in conveying themes, emotions, and atmospheres in multimodal formats such as film posters and album covers. Translating these visual and textual elements across languages extends the concept of translation beyond mere text, requiring the adaptation of aesthetic and stylistic features. To address this, we introduce a novel task, Multimodal Style Translation, together with MuST-Bench, a benchmark designed to evaluate the ability of visual text generation models to perform translation across different writing systems while preserving design intent. Our initial experiments on MuST-Bench reveal that existing visual text generation models struggle with the proposed task because textual descriptions are inadequate for conveying visual design. In response, we introduce SIGIL, a framework for multimodal style translation that eliminates the need for style descriptions. SIGIL enhances image generation models through three innovations: glyph latent for multilingual settings, pretrained VAEs for stable style guidance, and an OCR model with reinforcement learning feedback for optimizing readable character generation. SIGIL outperforms existing baselines, achieving superior style consistency and legibility while maintaining visual fidelity, which sets it apart from traditional description-based approaches. We release MuST-Bench publicly for broader use and exploration at https://huggingface.co/datasets/yejinc/MuST-Bench.
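The abstract mentions an OCR model providing reinforcement learning feedback for readable character generation but does not specify the reward. One plausible form, assumed here purely for illustration and not taken from SIGIL, is a character-level score of 1 minus the normalized edit distance between the OCR output and the target text. `ocr_reward` is a hypothetical name, and the recognized string would come from whatever OCR model is used; working on Python strings keeps the score defined for any writing system.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def ocr_reward(recognized: str, target: str) -> float:
    """Hypothetical legibility reward in [0, 1]: 1.0 when OCR reads the target exactly."""
    if not recognized and not target:
        return 1.0
    return 1.0 - levenshtein(recognized, target) / max(len(recognized), len(target))
```

Normalizing by the longer string keeps the reward in [0, 1] regardless of text length, so rewards from short and long translated titles stay comparable.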
