The neural implementation of operant conditioning with few trials is unclear. We propose a Hippocampus-Inspired Cognitive Architecture (HICA) as a neural mechanism for operant conditioning. HICA explains a learning mechanism in which agents can learn a new behavior policy in a few trials, as mammals do in operant conditioning experiments. HICA is composed of two different types of modules. One is a universal learning module type that represents a cortical column in the neocortex gray matter. The working principle is modeled as Modulated Heterarchical Prediction Memory (mHPM). In mHPM, each module learns to predict a succeeding input vector given the sequence of the input vectors from lower layers and the context vectors from higher layers. The prediction is fed into the lower layers as a context signal (top-down feedback signaling), and into the higher layers as an input signal (bottom-up feedforward signaling). Rewards modulate the learning rate in those modules to memorize meaningful sequences effectively. In mHPM, each module updates in a local and distributed way compared to conventional end-to-end learning with backpropagation of the single objective loss. This local structure enables the heterarchical network of modules. The second type is an innate, special-purpose module representing various organs of the brain's subcortical system. Modules modeling organs such as the amygdala, hippocampus, and reward center are pre-programmed to enable instinctive behaviors. The hippocampus plays the role of the simulator. It is an autoregressive prediction model of the top-most level signal with a loop structure of memory, while cortical columns are lower layers that provide detailed information to the simulation. The simulation becomes the basis for learning with few trials and the deliberate planning required for operant conditioning.
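The local, reward-modulated update described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `PredictionModule`, the linear predictor, the delta-rule update, and the rule `lr = base_lr * (1 + reward)` are all assumptions chosen for brevity, and the module conditions only on the current input rather than on a full input sequence.

```python
import random


class PredictionModule:
    """Hypothetical single module: a linear next-vector predictor (assumption)."""

    def __init__(self, input_dim, context_dim, base_lr=0.05, seed=0):
        rng = random.Random(seed)
        self.base_lr = base_lr
        n_features = input_dim + context_dim + 1  # +1 for a bias term
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(n_features)]
            for _ in range(input_dim)
        ]

    def effective_lr(self, reward):
        # Assumed modulation rule: positive reward scales the learning rate up.
        return self.base_lr * (1.0 + max(reward, 0.0))

    def predict(self, x, context):
        # Bottom-up input and top-down context are concatenated as features.
        features = list(x) + list(context) + [1.0]
        return [sum(w * f for w, f in zip(row, features)) for row in self.weights]

    def update(self, x, context, next_x, reward=0.0):
        # Local delta-rule update: uses only this module's own prediction error,
        # with no gradient passed in from other modules.
        features = list(x) + list(context) + [1.0]
        errors = [t - p for t, p in zip(next_x, self.predict(x, context))]
        lr = self.effective_lr(reward)
        for row, err in zip(self.weights, errors):
            for j, f in enumerate(features):
                row[j] += lr * err * f
        return sum(e * e for e in errors) / len(errors)


# Toy input stream: three one-hot vectors repeating in a fixed cycle.
SEQUENCE = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
NO_CONTEXT = [0.0]  # no top-down signal in this toy run


def train(module, steps, reward):
    """Run the cycle for `steps` updates; return the mean error of the last 3."""
    recent = []
    for t in range(steps):
        x = SEQUENCE[t % 3]
        next_x = SEQUENCE[(t + 1) % 3]
        recent.append(module.update(x, NO_CONTEXT, next_x, reward))
    return sum(recent[-3:]) / 3


rewarded = PredictionModule(3, 1)
neutral = PredictionModule(3, 1)
rewarded_error = train(rewarded, 300, reward=1.0)
neutral_error = train(neutral, 300, reward=0.0)
```

Both modules start from identical weights and see the same sequence; the only difference is the reward signal, so comparing `rewarded_error` with `neutral_error` isolates the effect of the assumed modulation rule. In a heterarchical network, the output of `predict` would serve as context for lower modules and as input for higher ones.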
In this paper, we present our research on programming human-level artificial intelligence (HLAI), including 1) a definition of HLAI, 2) an environment to develop and test HLAI, and 3) a cognitive architecture for HLAI. The term AI is used in a broad sense, and HLAI is not clearly defined. We claim that the essence of human-level intelligence is the capability to learn from others' experiences via language. The key is that an event described in language has the same effect on the update of the behavior policy as if the agent had experienced it firsthand. To develop and test models with such a capability, we are developing a simulated environment called SEDRo. It contains a 3D home in which a mother character takes care of the baby (the learning agent) and teaches language. The environment provides experiences comparable to those of a human baby from birth to one year. Finally, we propose a cognitive architecture for HLAI called Modulated Heterarchical Prediction Memory (mHPM). mHPM has three components: a universal module that learns to predict the next vector given a sequence of vector signals, a heterarchical network of those modules, and a reward-based modulation of learning. mHPM models the workings of the neocortex, but innate auxiliary units such as the hippocampus, reward system, instincts, and amygdala play critical roles, too.
Task-specific AI agents are showing remarkable performance across different domains. But modeling generalized AI agents with human-like intelligence will require more than current datasets or purely reward-based environments, which do not include the experiences that an infant gathers throughout its initial stages. In this paper, we present the Simulated Environment for Developmental Robotics (SEDRo). It provides a baby agent with a simulation of the environments that a human baby experiences from the fetal stage to 12 months after birth. SEDRo also includes a mother character to provide social interaction with the agent. To evaluate different developmental milestones of the agent, SEDRo incorporates some experiments from developmental psychology.
Despite recent advances in many application-specific domains, we do not know how to build a human-level artificial intelligence (HLAI). We conjecture that learning from others' experience through language is the essential characteristic that differentiates human intelligence from the rest. Humans can update the action-value function from a verbal description alone, as if they had experienced the sequences of states, actions, and corresponding rewards firsthand. In this paper, we present our ongoing effort to build an environment to facilitate research on models of this capability. In this environment, there are no explicit definitions of tasks, and no rewards are given for accomplishing tasks. Rather, the models undergo the experiences of a human infant from the fetal stage to 12 months. The agent should learn to speak its first words as a human child does. We expect the environment to contribute to research on HLAI.
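The idea that a verbal description can update the action-value function in the same way as firsthand experience can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name a learning rule, so standard one-step tabular Q-learning is used as a stand-in, and `parse_description` is a hypothetical toy parser for a fixed four-word template, not a model of language understanding. The state and action names are invented for the example.

```python
from collections import defaultdict

ALPHA = 0.5   # learning rate (assumed value)
GAMMA = 0.9   # discount factor (assumed value)
ACTIONS = ("touch", "avoid")  # hypothetical action set


def q_update(q, state, action, reward, next_state):
    """One-step Q-learning update on a tabular action-value function."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    target = reward + GAMMA * best_next
    q[(state, action)] += ALPHA * (target - q[(state, action)])


def parse_description(sentence):
    """Toy stand-in for language understanding.

    Expects the fixed template 'STATE ACTION REWARD NEXT_STATE'.
    """
    state, action, reward, next_state = sentence.split()
    return state, action, float(reward), next_state


# Agent A experiences the transition firsthand.
q_experienced = defaultdict(float)
q_update(q_experienced, "stove", "touch", -1.0, "hurt")

# Agent B only hears a description of the same event.
q_told = defaultdict(float)
q_update(q_told, *parse_description("stove touch -1.0 hurt"))
```

After these two updates, `q_told` and `q_experienced` hold the same value for `("stove", "touch")`, which is the property the conjecture asks of a human-level learner. The hard part, mapping free-form language onto states, actions, and rewards, is exactly what the toy parser leaves out.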
Even with impressive advances in application-specific models, we still lack knowledge about how to build a model that can learn in a human-like way and do multiple tasks. To learn in a human-like way, a model needs diverse experience comparable to that of humans. In this paper, we introduce our ongoing effort to build a simulated environment for developmental robotics (SEDRo). SEDRo provides diverse human experiences ranging from those of a fetus to those of a 12-month-old. A series of simulated tests based on developmental psychology will be used to evaluate the progress of a learning model. We anticipate that SEDRo will lower the cost of entry and facilitate research in the developmental robotics community.
As the current trend in artificial intelligence shifts towards self-supervised learning, conventional norms such as highly curated domain-specific data, application-specific learning models, and extrinsic reward-based learning policies might not provide suitable ground for such developments. In this paper, we introduce SEDRo, a Simulated Environment for Developmental Robotics, which allows a learning agent to have experiences similar to those a human infant goes through from the fetal stage up to 12 months. A series of simulated tests based on developmental psychology will be used to evaluate the progress of a learning model.