Xiaofan Lin

Conditional Panoramic Image Generation via Masked Autoregressive Modeling

May 22, 2025

Chaoyang Wang, Xiangtai Li, Lu Qi, Xiaofan Lin, Jinbin Bai, Qianyu Zhou, Yunhai Tong

Abstract: Recent progress in panoramic image generation has underscored two critical limitations in existing approaches. First, most methods are built upon diffusion models, which are inherently ill-suited for equirectangular projection (ERP) panoramas due to the violation of the identically and independently distributed (i.i.d.) Gaussian noise assumption caused by their spherical mapping. Second, these methods often treat text-conditioned generation (text-to-panorama) and image-conditioned generation (panorama outpainting) as separate tasks, relying on distinct architectures and task-specific data. In this work, we propose a unified framework, Panoramic AutoRegressive model (PAR), which leverages masked autoregressive modeling to address these challenges. PAR avoids the i.i.d. assumption constraint and integrates text and image conditioning into a cohesive architecture, enabling seamless generation across tasks. To address the inherent discontinuity in existing generative models, we introduce circular padding to enhance spatial coherence and propose a consistency alignment strategy to improve generation quality. Extensive experiments demonstrate competitive performance in text-to-image generation and panorama outpainting tasks while showcasing promising scalability and generalization capabilities.
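
The circular padding mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say where PAR applies the padding (pixels, latents, or inside convolution layers), so the function name `circular_pad_width` and the choice to operate on a plain 2D grid are assumptions made for illustration only. The idea shown is that the left and right edges of an ERP panorama are neighbors on the sphere, so each row is padded by wrapping around horizontally rather than with zeros.

```python
def circular_pad_width(image, pad):
    """Pad each row of a 2D grid by wrapping columns around horizontally.

    `image` is a list of equal-length rows. This is a hypothetical helper,
    not taken from the PAR code base.
    """
    if pad < 0:
        raise ValueError("pad must be non-negative")
    if pad == 0:
        return [list(row) for row in image]
    width = len(image[0])
    if pad > width:
        raise ValueError("pad must not exceed the image width")
    # The last `pad` columns go on the left, the first `pad` on the right,
    # so the seam between the two edges becomes continuous.
    return [list(row[-pad:]) + list(row) + list(row[:pad]) for row in image]


panorama = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
]
padded = circular_pad_width(panorama, 1)
```

With this padding, a filter centered on the first column sees the last column as its left neighbor, which is what removes the visible seam at the panorama boundary.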

GENIUS: A Generative Framework for Universal Multimodal Search

Mar 25, 2025

Sungyeon Kim, Xinliang Zhu, Xiaofan Lin, Muhammet Bastan, Douglas Gray, Suha Kwak

Abstract: Generative retrieval is an emerging approach in information retrieval that generates identifiers (IDs) of target data based on a query, providing an efficient alternative to traditional embedding-based retrieval methods. However, existing models are task-specific and fall short of embedding-based retrieval in performance. This paper proposes GENIUS, a universal generative retrieval framework supporting diverse tasks across multiple modalities and domains. At its core, GENIUS introduces modality-decoupled semantic quantization, transforming multimodal data into discrete IDs encoding both modality and semantics. Moreover, to enhance generalization, we propose a query augmentation that interpolates between a query and its target, allowing GENIUS to adapt to varied query forms. Evaluated on the M-BEIR benchmark, it surpasses prior generative methods by a clear margin. Unlike embedding-based retrieval, GENIUS consistently maintains high retrieval speed across database sizes, with competitive performance across multiple benchmarks. With additional re-ranking, GENIUS often achieves results close to those of embedding-based methods while preserving efficiency.

* Accepted to CVPR 2025
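
The query augmentation described in the GENIUS abstract, interpolating between a query and its target, can be sketched as a linear mix of two embedding vectors. The abstract does not give the exact formula or how the mixing weight is chosen, so the function name `interpolate_query`, the linear form, and the parameter `lam` are assumptions for illustration.

```python
def interpolate_query(query_emb, target_emb, lam):
    """Return (1 - lam) * query + lam * target, element by element.

    Hypothetical helper: `lam` = 0 keeps the original query, `lam` = 1
    returns the target embedding.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(query_emb) != len(target_emb):
        raise ValueError("embeddings must have the same dimension")
    return [(1.0 - lam) * q + lam * t for q, t in zip(query_emb, target_emb)]


query = [0.0, 2.0]
target = [1.0, 0.0]
augmented = interpolate_query(query, target, 0.5)
```

In training, one would typically draw `lam` at random per example so the model sees queries at varying distances from their targets; that sampling choice is likewise an assumption here.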

ProcSim: Proxy-based Confidence for Robust Similarity Learning

Nov 01, 2023

Oriol Barbany, Xiaofan Lin, Muhammet Bastan, Arnab Dhua

Abstract: Deep Metric Learning (DML) methods aim at learning an embedding space in which distances are closely related to the inherent semantic similarity of the inputs. Previous studies have shown that popular benchmark datasets often contain numerous wrong labels, and DML methods are susceptible to them. Intending to study the effect of realistic noise, we create an ontology of the classes in a dataset and use it to simulate semantically coherent labeling mistakes. To train robust DML models, we propose ProcSim, a simple framework that assigns a confidence score to each sample using the normalized distance to its class representative. The experimental results show that the proposed method achieves state-of-the-art performance on the DML benchmark datasets injected with uniform and the proposed semantically coherent noise.

* Accepted to the algorithms track of WACV 2024
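
The per-sample confidence described in the ProcSim abstract, based on the normalized distance to a class representative, can be sketched as follows. The abstract does not specify how the representative is obtained or how the distance is normalized, so this sketch makes two labeled assumptions: the representative is the class mean, and distances are divided by the largest distance in the batch. The function names are hypothetical.

```python
import math


def class_representatives(embeddings, labels):
    """Assumption: use the per-class mean embedding as the representative."""
    sums, counts = {}, {}
    for emb, lab in zip(embeddings, labels):
        acc = sums.setdefault(lab, [0.0] * len(emb))
        for i, value in enumerate(emb):
            acc[i] += value
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def confidence_scores(embeddings, labels):
    """Confidence = 1 - (distance to representative / max distance in batch)."""
    reps = class_representatives(embeddings, labels)
    dists = [math.dist(emb, reps[lab]) for emb, lab in zip(embeddings, labels)]
    max_dist = max(dists)
    if max_dist == 0:
        return [1.0] * len(dists)
    return [1.0 - d / max_dist for d in dists]


# Three nearby samples and one far-away sample that shares their label.
embeddings = [[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [5.0, 5.0]]
labels = ["cat", "cat", "cat", "cat"]
scores = confidence_scores(embeddings, labels)
```

The far-away sample, which behaves like a likely mislabeled example, receives the lowest confidence, so a loss weighted by these scores would down-weight it.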

Towards Accurate Binary Convolutional Neural Network

Nov 30, 2017

Xiaofan Lin, Cong Zhao, Wei Pan

Abstract: We introduce a novel scheme to train binary convolutional neural networks (CNNs) -- CNNs with weights and activations constrained to {-1,+1} at run-time. It has been known that using binary weights and activations drastically reduces memory size and accesses, and can replace arithmetic operations with more efficient bitwise operations, leading to much faster test-time inference and lower power consumption. However, previous works on binarizing CNNs usually result in severe prediction accuracy degradation. In this paper, we address this issue with two major innovations: (1) approximating full-precision weights with a linear combination of multiple binary weight bases; (2) employing multiple binary activations to alleviate information loss. The implementation of the resulting binary CNN, denoted as ABC-Net, is shown to achieve performance much closer to its full-precision counterpart, and even to reach comparable prediction accuracy on ImageNet and forest trail datasets, given adequate binary weight bases and activations.

* NIPS 2017
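
The first innovation in the ABC-Net abstract, approximating full-precision weights with a linear combination of binary bases, can be illustrated with a small sketch. The abstract does not describe how the bases and coefficients are chosen, so this example substitutes a simple greedy residual scheme: each base is the sign of the current residual, and its coefficient is the mean absolute residual. This is a stand-in for illustration, not the paper's procedure, and the function names are hypothetical.

```python
def binary_bases(weights, num_bases):
    """Greedily fit weights ~ sum_i alpha_i * B_i with B_i in {-1, +1}."""
    residual = list(weights)
    alphas, bases = [], []
    for _ in range(num_bases):
        # Binary base: sign of what is still unexplained.
        base = [1 if r >= 0 else -1 for r in residual]
        # Least-squares scale for a sign base is the mean absolute residual.
        alpha = sum(abs(r) for r in residual) / len(residual)
        alphas.append(alpha)
        bases.append(base)
        residual = [r - alpha * b for r, b in zip(residual, base)]
    return alphas, bases


def reconstruct(alphas, bases):
    """Evaluate the linear combination of binary bases."""
    size = len(bases[0])
    return [sum(a * b[i] for a, b in zip(alphas, bases)) for i in range(size)]


def squared_error(xs, ys):
    return sum((x - y) ** 2 for x, y in zip(xs, ys))


weights = [0.9, -0.4, 0.1, -0.7]
err_one = squared_error(weights, reconstruct(*binary_bases(weights, 1)))
err_three = squared_error(weights, reconstruct(*binary_bases(weights, 3)))
```

Adding bases reduces the approximation error in this example, which matches the abstract's point that accuracy approaches the full-precision model given enough binary weight bases.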
