Abstract:Foundation models have had a big impact in recent years and billions of dollars are being invested in them in the current AI boom. The more popular ones, such as Chat-GPT, are trained on large amounts of data from the Internet, and then reinforcement learning, RAG, prompt engineering and cognitive modelling are used to fine-tune and augment their behavior. This technology has been used to create models of individual people, such as Caryn Marjorie. However, these chatbots are not based on people's actual emotional and physiological responses to their environment, so they are, at best, surface-level approximations to the characters they are imitating. This paper describes how a new type of foundation model - a first-person foundation model - could be created from recordings of what a person sees and hears as well as their emotional and physiological reactions to these stimuli. A first-person foundation model would map environmental stimuli to a person's emotional and physiological states, and map a person's emotional and physiological states to their behavior. First-person foundation models have many exciting applications, including a new type of recommendation engine, personal assistants, generative adversarial networks, dating and recruitment. To obtain training data for a first-person foundation model, we have developed a recording rig that captures what the wearer is seeing and hearing as well as their emotional and physiological states. This novel source of data could help to address the shortage of new data for building the next generation of foundation models.
Abstract:Foundation models have had a big impact in recent years and billions of dollars are being invested in them in the current AI boom. The more popular ones, such as Chat-GPT, are trained on large amounts of Internet data. However, it is becoming apparent that this data is likely to be exhausted soon, and technology companies are looking for new sources of data to train the next generation of foundation models. Reinforcement learning, RAG, prompt engineering and cognitive modelling are often used to fine-tune and augment the behaviour of foundation models. These techniques have been used to replicate people, such as Caryn Marjorie. These chatbots are not based on people's actual emotional and physiological responses to their environment, so they are, at best, a surface-level approximation to the characters they are imitating. To address these issues, we have developed a recording rig that captures what the wearer is seeing and hearing as well as their skin conductance (GSR), facial expression and brain state (14 channel EEG). AI algorithms are used to process this data into a rich picture of the environment and internal states of the subject. Foundation models trained on this data could replicate human behaviour much more accurately than the personality models that have been developed so far. This type of model has many potential applications, including recommendation, personal assistance, GAN systems, dating and recruitment. This paper gives some background to this work and describes the recording rig and preliminary tests of its functionality. It then suggests how a new type of foundation model could be created from the data captured by the rig and outlines some applications. Data gathering and model training are expensive, so we are currently working on the launch of a start-up that could raise funds for the next stage of the project.