Katharina Schmid

GaussianGPT: Towards Autoregressive 3D Gaussian Scene Generation

Mar 27, 2026

Nicolas von Lützow, Barbara Rössle, Katharina Schmid, Matthias Nießner

Abstract: Most recent advances in 3D generative modeling rely on diffusion or flow-matching formulations. We instead explore a fully autoregressive alternative and introduce GaussianGPT, a transformer-based model that directly generates 3D Gaussians via next-token prediction, thus facilitating full 3D scene generation. We first compress Gaussian primitives into a discrete latent grid using a sparse 3D convolutional autoencoder with vector quantization. The resulting tokens are serialized and modeled using a causal transformer with 3D rotary positional embedding, enabling sequential generation of spatial structure and appearance. Unlike diffusion-based methods that refine scenes holistically, our formulation constructs scenes step-by-step, naturally supporting completion, outpainting, controllable sampling via temperature, and flexible generation horizons. This formulation leverages the compositional inductive biases and scalability of autoregressive modeling while operating on explicit representations compatible with modern neural rendering pipelines, positioning autoregressive transformers as a complementary paradigm for controllable and context-aware 3D generation.
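
The first two steps in this abstract, vector-quantizing a sparse latent grid and then serializing the resulting tokens, can be illustrated with a toy sketch. It assumes the encoder output is already available as a dict from integer voxel coordinates to small feature vectors, uses a nearest-neighbour codebook lookup, and orders voxels by a plain x-y-z raster sort. `quantize`, `serialize`, the codebook, and the ordering are illustrative choices, not the paper's encoder, codebook size, or serialization scheme.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[int, int, int]  # integer (x, y, z) voxel index (illustrative layout)


def quantize(vec: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook entry closest to `vec` (squared L2 distance)."""
    best_idx, best_dist = -1, math.inf
    for idx, code in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vec, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def serialize(
    latents: Dict[Coord, Sequence[float]],
    codebook: Sequence[Sequence[float]],
) -> List[Tuple[Coord, int]]:
    """Quantize every occupied voxel and list (coord, token id) pairs in x-y-z raster order.

    Only occupied voxels appear, so the sequence length follows the scene's sparsity.
    The raster order is an assumption made for this sketch.
    """
    return [(coord, quantize(latents[coord], codebook)) for coord in sorted(latents)]


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
    latents = {(1, 0, 0): [0.9, 1.1], (0, 0, 1): [0.1, -0.1], (0, 1, 0): [0.2, 0.8]}
    print(serialize(latents, codebook))
    # -> [((0, 0, 1), 0), ((0, 1, 0), 2), ((1, 0, 0), 1)]
```

Sorting the coordinate tuples gives a deterministic order, which is what a causal model needs; how GaussianGPT actually interleaves position and appearance tokens is not stated in the abstract.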

* Project page: https://nicolasvonluetzow.github.io/GaussianGPT/ - Project video: https://youtu.be/zVnMHkFzHDg
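
Temperature-controlled sampling, completion, and flexible generation horizons all follow from the next-token formulation described above. The sketch below assumes a hypothetical `next_logits` callable standing in for the causal transformer. It divides logits by a temperature before the softmax (lower values sharpen the distribution, higher values flatten it), and `generate` extends any given prefix, so a partial sequence can be completed while `steps` sets the horizon. Everything specific to GaussianGPT, such as its 3D rotary positional embedding and token vocabulary, is omitted.

```python
import math
import random
from typing import Callable, List, Optional, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits: Sequence[float], temperature: float, rng: random.Random) -> int:
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate(
    prefix: Sequence[int],
    next_logits: Callable[[List[int]], Sequence[float]],
    steps: int,
    temperature: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Append `steps` sampled tokens to `prefix`, one at a time.

    A non-empty prefix plays the role of a partial scene to be completed.
    """
    rng = rng or random.Random()
    tokens = list(prefix)
    for _ in range(steps):
        tokens.append(sample_next_token(next_logits(tokens), temperature, rng))
    return tokens


def toy_model(tokens: List[int]) -> List[float]:
    """Hypothetical stand-in for the transformer: always favours token 1."""
    return [0.0, 3.0, 1.0]


if __name__ == "__main__":
    print(generate([0], toy_model, steps=4, temperature=0.05, rng=random.Random(0)))
    # -> [0, 1, 1, 1, 1]
```

At a temperature of 0.05 the toy distribution is almost one-hot, so sampling behaves like greedy decoding; raising the temperature lets lower-scoring tokens through more often.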

3D Scene Diffusion Guidance using Scene Graphs

Aug 08, 2023

Mohammad Naanaa, Katharina Schmid, Yinyu Nie

Abstract: Guided synthesis of high-quality 3D scenes is a challenging task. Diffusion models have shown promise in generating diverse data, including 3D scenes. However, current methods rely directly on text embeddings for controlling the generation, limiting the incorporation of complex spatial relationships between objects. We propose a novel approach for 3D scene diffusion guidance using scene graphs. To leverage the relative spatial information the scene graphs provide, we make use of relational graph convolutional blocks within our denoising network. We show that our approach significantly improves the alignment between scene description and generated scene.

* 5 figures
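
The abstract names relational graph convolutional blocks as the mechanism that injects scene-graph relations into the denoiser. As a rough illustration, the sketch below implements one layer of the standard relational GCN (R-GCN) update, h_i' = ReLU(W_0 h_i + sum over relations r of the mean of W_r h_j over neighbours j connected to i by r), in plain Python. The node features, relation names such as "left_of", and all weights are hypothetical, and how the paper wires such blocks into its denoising network is not described here.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]
Edge = Tuple[int, str, int]  # (subject node, relation name, object node)


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Dense matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def rgcn_layer(
    h: List[Vector],
    edges: Sequence[Edge],
    w_self: Matrix,
    w_rel: Dict[str, Matrix],
) -> List[Vector]:
    """One R-GCN layer: self-connection plus a per-relation mean of neighbour messages, then ReLU.

    Messages flow from subject to object; a full model would typically also add
    inverse relations so information travels both ways.
    """
    out = [matvec(w_self, hi) for hi in h]
    # Group senders by (receiver, relation) so each relation is normalised separately.
    incoming: Dict[Tuple[int, str], List[int]] = {}
    for src, rel, dst in edges:
        incoming.setdefault((dst, rel), []).append(src)
    for (dst, rel), senders in incoming.items():
        norm = len(senders)
        for src in senders:
            msg = matvec(w_rel[rel], h[src])
            out[dst] = [o + m / norm for o, m in zip(out[dst], msg)]
    return [[max(0.0, v) for v in row] for row in out]


if __name__ == "__main__":
    # Hypothetical scene: 0 = chair, 1 = table, 2 = lamp, each with a 2-d feature.
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    graph = [(0, "left_of", 1), (2, "on", 1)]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    weights = {"left_of": [[0.0, 1.0], [1.0, 0.0]], "on": [[2.0, 0.0], [0.0, 2.0]]}
    print(rgcn_layer(feats, graph, identity, weights))
    # -> [[1.0, 0.0], [2.0, 4.0], [1.0, 1.0]]
```

Because each relation has its own weight matrix, "chair left of table" and "lamp on table" send differently transformed messages to the table node, which is the property that lets relative spatial information shape the features.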

hmBERT: Historical Multilingual Language Models for Named Entity Recognition

May 31, 2022

Stefan Schweter, Luisa März, Katharina Schmid, Erion Çano

Abstract: Compared to standard Named Entity Recognition (NER), identifying persons, locations, and organizations in historical texts poses a big challenge. To obtain machine-readable corpora, the historical text is usually scanned and optical character recognition (OCR) needs to be performed. As a result, the historical corpora contain errors. Also, entities such as locations or organizations can change over time, which poses another challenge. Overall, historical texts come with several peculiarities that differ greatly from modern texts, and large labeled corpora for training a neural tagger are scarcely available for this domain. In this work, we tackle NER for historical German, English, French, Swedish, and Finnish by training large historical language models. We circumvent the need for labeled data by using unlabeled data for pretraining a language model. We propose hmBERT, a historical multilingual BERT-based language model, and publicly release it in different sizes. Furthermore, we evaluate the capability of hmBERT by solving downstream NER as part of this year's HIPE-2022 shared task and provide detailed analysis and insights. For the Multilingual Classical Commentary coarse-grained NER challenge, our tagger HISTeria outperforms the other teams' models for two out of three languages.

* Submitted HIPE-2022 working note paper for CLEF 2022 (Conference and Labs of the Evaluation Forum)
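
hmBERT avoids the need for labeled data by pretraining on unlabeled historical text. Assuming it follows the standard BERT masked-language-modeling objective, the sketch below shows the usual BERT masking recipe: select about 15% of non-special positions, replace 80% of those with [MASK], 10% with a random token, and leave 10% unchanged. The exact masking settings used for hmBERT are not given in the abstract, and the token ids in the example are hypothetical. Unselected positions get the label -100, following the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
import random
from typing import AbstractSet, List, Optional, Sequence, Tuple

IGNORE_INDEX = -100  # label for positions the loss should skip (PyTorch's default ignore_index)


def mask_tokens(
    token_ids: Sequence[int],
    mask_id: int,
    vocab_size: int,
    special_ids: AbstractSet[int] = frozenset(),
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """BERT-style masking (assumed recipe, not confirmed for hmBERT).

    Each non-special position is selected with probability `mask_prob`; a selected
    position becomes [MASK] 80% of the time, a random token 10%, and stays unchanged 10%.
    Labels hold the original id at selected positions and IGNORE_INDEX elsewhere.
    """
    rng = rng or random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(inputs)
    for i, tok in enumerate(token_ids):
        if tok in special_ids or rng.random() >= mask_prob:
            continue
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)
        # otherwise (remaining 10%) the original token stays in place
    return inputs, labels


if __name__ == "__main__":
    # Hypothetical ids: 101/102 as [CLS]/[SEP], 103 as [MASK], vocabulary of 32000.
    ids = [101, 2054, 2003, 1996, 3007, 102]
    masked, targets = mask_tokens(
        ids, mask_id=103, vocab_size=32000, special_ids={101, 102}, rng=random.Random(42)
    )
    print(masked)
    print(targets)
```

Keeping some selected tokens unchanged or randomized means the model cannot rely on seeing [MASK] at every position it is scored on, which in the original BERT recipe reduces the mismatch between pretraining and fine-tuning inputs.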
