We introduce Pixels2Play-0.1 (P2P0.1), a foundation model that learns to play a wide range of 3D video games with recognizable human-like behavior. Motivated by emerging consumer and developer use cases - AI teammates, controllable NPCs, personalized live-streamers, assistive testers - we argue that an agent must rely on the same pixel stream available to players and generalize to new titles with minimal game-specific engineering. P2P0.1 is trained end-to-end with behavior cloning: labeled demonstrations collected from instrumented human game-play are complemented by unlabeled public videos, to which we impute actions via an inverse-dynamics model. A decoder-only transformer with auto-regressive action output handles the large action space while remaining latency-friendly on a single consumer GPU. We report qualitative results showing competent play across simple Roblox and classic MS-DOS titles, ablations on unlabeled data, and outline the scaling and evaluation steps required to reach expert-level, text-conditioned control.
Reservoir computing is a form of machine learning particularly suited for time series analysis, including forecasting predictions. We take an implementation of \emph{quantum} reservoir computing that was initially designed to generate variants of musical scores and adapt it to create levels of Super Mario Bros. Motivated by our analysis of these levels, we develop a new Roblox \textit{obby} where the courses can be generated in real time on superconducting qubit hardware, and investigate some of the constraints placed by such real-time generation.
Foundation models trained on vast amounts of data have demonstrated remarkable reasoning and generation capabilities in the domains of text, images, audio and video. Our goal at Roblox is to build such a foundation model for 3D intelligence, a model that can support developers in producing all aspects of a Roblox experience, from generating 3D objects and scenes to rigging characters for animation to producing programmatic scripts describing object behaviors. We discuss three key design requirements for such a 3D foundation model and then present our first step towards building such a model. We expect that 3D geometric shapes will be a core data type and describe our solution for 3D shape tokenizer. We show how our tokenization scheme can be used in applications for text-to-shape generation, shape-to-text generation and text-to-scene generation. We demonstrate how these applications can collaborate with existing large language models (LLMs) to perform scene analysis and reasoning. We conclude with a discussion outlining our path to building a fully unified foundation model for 3D intelligence.
Lumen Social Robot is a humanoid robot development with the purpose that it could be a good friend to all people. In this year, the Lumen Social Robot is being developed into a guide in the exhibition and in the seminar of the Final Exam of undergraduate and graduate students in Electrical Engineering ITB, named Electrical Engineering Days 2015. In order to be the guide in that occasion, Lumen is supported by several things. They are Nao robot components, servers, and multiple processor systems. The image processing system is a processing application system that allows Lumen to recognize and determine an object from the image taken from the camera eye. The image processing system is provided with four modules. They are face detection module to detect a person's face, face recognition module to recognize a person's face, face tracking module to follow a person's face, and human detection module to detect humans based on the upper parts of person's body. Face detection module and human detection module are implemented by using the library harcascade.xml on EMGU CV. Face recognition module is implemented by adding the database for the face that has been detected and store it in that database. Face tracking module is implemented by using the Smooth Gaussian filter to the image. ----- Lumen Sosial Robot merupakan sebuah pengembangan robot humanoid agar dapat menjadi teman bagi banyak orang. Sistem pengolahan citra merupakan sistem aplikasi pengolah yang bertujuan Lumen dapat mengenali dan mengetahui suatu objek pada citra yang diambil dari camera mata Lumen. System pengolahan citra dilengkapi dengan empat buah modul, yaitu modul face detection untuk mendeteksi wajah seseorang, modul face recognition untuk mengenali wajah orang tersebut, modul face tracking untuk mengikuti wajah seseorang, dan modul human detection untuk mendeteksi manusia berdasarkan bagian tubuh atas orang