Abstract: In this study, we propose a technique to improve the accuracy and reduce the size of convolutional neural networks (CNNs) running on edge devices for real-world robot vision applications. CNNs running on edge devices must have a small architecture, and CNNs for robot vision applications involving on-site object recognition must learn efficiently to identify specific visual targets from data obtained under a limited variation of conditions. The visual nervous system (VNS) is a good example of a system that meets these requirements because it learns from few visual experiences. We therefore used a Gabor filter, a model of the feature extractor of the VNS, as a preprocessor for CNNs and investigated the accuracy of CNNs trained with small amounts of data. To evaluate how well CNNs trained on image data acquired under a limited variation of conditions generalize to data acquired under other conditions, we created a dataset of images acquired from different camera positions and investigated the accuracy of CNNs that were trained using images acquired at a certain distance. We trained multiple CNN architectures with and without Gabor-filter preprocessing and compared the results. The results showed that preprocessing with Gabor filters improves the generalization performance of CNNs and contributes to reducing their size.
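The Gabor preprocessing described above can be sketched as a small bank of oriented kernels applied to a grayscale image before it reaches the CNN. This is a minimal illustration, not the configuration used in the study: the kernel size, wavelength, sigma, aspect ratio `gamma`, and the number of orientations are all assumed values, and the function names are hypothetical. It uses only the Python standard library so that it runs as written.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5, psi=0.0):
    """Real part of a Gabor function sampled on a size x size grid (size odd).

    All parameter values used in this sketch are illustrative assumptions.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier oscillates along angle theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def filter_valid(image, kernel):
    """Cross-correlate a 2D list image with a square kernel ('valid' region only)."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - k + 1):
        out_row = []
        for j in range(w - k + 1):
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    acc += image[i + u][j + v] * kernel[u][v]
            out_row.append(acc)
        out.append(out_row)
    return out


def gabor_bank(image, orientations=4, size=7, wavelength=4.0, sigma=2.0):
    """Return one filtered map per orientation; these maps would feed the CNN."""
    thetas = [n * math.pi / orientations for n in range(orientations)]
    return [filter_valid(image, gabor_kernel(size, wavelength, t, sigma))
            for t in thetas]


# Usage: vertical stripes with a 4-pixel period on a 10 x 10 image.
stripes = [[1.0 if j % 4 < 2 else 0.0 for j in range(10)] for _ in range(10)]
maps = gabor_bank(stripes)
```

With these assumed settings, the map for `theta = 0` responds much more strongly to the vertical stripes than the map for `theta = pi/2`, which is the kind of orientation-selective feature the CNN receives in place of raw pixels.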
Abstract: Color is an important source of information for visual functions such as object recognition, but it is strongly affected by the color of the illumination. The ability to perceive the color of a visual target independently of the illumination color is called color constancy (CC), and it is an important feature for vision systems that use color information. In this study, we investigated how the light intensity encoding function affects the CC performance of the center/surround (C/S) retinex model, a well-known model inspired by the CC of the visual nervous system. We compared two encoding functions: the logarithmic function used in the original C/S retinex model and the Naka-Rushton (N-R) function, a model of the retinal photoreceptor response. Color-variable LEDs were used to illuminate visual targets with various lighting colors, and the color information computed by each model was used to evaluate how well the colors of visual targets illuminated with different lighting colors could be discriminated. Color information was represented using the HSV color space and a color plane based on the classical opponent color theory. The results showed that the combination of the N-R function and the double opponent color plane representation provided superior discrimination performance.
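The two intensity encodings compared above can be written as short functions. The sketch below is illustrative only: the half-saturation constant, the exponent, and the small offset `eps` are assumed values, and `cs_response` is a hypothetical helper that shows one simple way an encoder could be plugged into a center-minus-surround computation. It is not a description of how the study combined the N-R function with the C/S retinex model.

```python
import math


def log_encode(intensity, eps=1e-6):
    """Logarithmic encoding, as in the original C/S retinex formulation.

    eps is an assumed small offset that avoids log(0).
    """
    return math.log(intensity + eps)


def naka_rushton(intensity, half_saturation=0.5, exponent=1.0):
    """Naka-Rushton response R = I^n / (I^n + s^n), bounded in [0, 1).

    half_saturation (s) and exponent (n) are illustrative assumptions.
    """
    i_n = intensity ** exponent
    return i_n / (i_n + half_saturation ** exponent)


def cs_response(center, surround, encode):
    """Hypothetical helper: encoded center minus encoded surround."""
    return encode(center) - encode(surround)


# Usage: the same surface and surround under dim and 10x brighter illumination.
dim = cs_response(2.0, 1.0, log_encode)
bright = cs_response(20.0, 10.0, log_encode)
```

With the logarithmic encoding, `dim` and `bright` are nearly equal because a common illumination factor cancels in the difference of logs. The N-R function instead compresses intensity into a bounded range and saturates at high intensities, so the two encoders treat the same illumination change differently.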
Abstract: Spiking neural networks (SNNs) employing unsupervised learning methods inspired by neural plasticity are expected to be a new framework for artificial intelligence. In this study, we investigated the effect of multiple types of neural plasticity, such as spike-timing-dependent plasticity (STDP) and synaptic scaling, on learning in a winner-take-all (WTA) network composed of spiking neurons. We implemented a WTA network with multiple types of neural plasticity in Python. The MNIST and Fashion-MNIST datasets were used for training and testing. We varied the number of neurons, the time constant of STDP, and the normalization method used in synaptic scaling, and compared the resulting classification accuracy. The results demonstrated that synaptic scaling based on the L2 norm was the most effective in improving classification performance. With L2-norm-based synaptic scaling and 400 neurons in each of the excitatory and inhibitory layers, the network achieved classification accuracies of 88.84 % on the MNIST dataset and 68.01 % on the Fashion-MNIST dataset after one epoch of training.
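The two plasticity mechanisms named above can be illustrated with a short sketch. It assumes a standard pair-based STDP rule with exponential time windows and a per-neuron rescaling of incoming weights to a fixed L2 norm. The amplitudes, the time constant, the target norm, and the function names are all hypothetical and are not taken from the study's implementation.

```python
import math


def stdp_dw(dt, tau=20.0, a_plus=0.01, a_minus=0.012):
    """Pair-based STDP weight change for dt = t_post - t_pre (ms).

    tau, a_plus, and a_minus are illustrative assumptions.
    """
    if dt > 0:
        # Pre before post: potentiation that decays with the spike interval.
        return a_plus * math.exp(-dt / tau)
    if dt < 0:
        # Post before pre: depression that decays with the spike interval.
        return -a_minus * math.exp(dt / tau)
    return 0.0


def scale_l2(weights, target_norm=1.0):
    """Rescale one neuron's incoming weights so their L2 norm equals target_norm."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm == 0.0:
        return list(weights)  # nothing to scale
    return [w * target_norm / norm for w in weights]


def scale_all(weight_rows, target_norm=1.0):
    """Apply L2 synaptic scaling to each neuron (one row of incoming weights each)."""
    return [scale_l2(row, target_norm) for row in weight_rows]


# Usage: two neurons with three incoming synapses each.
rows = [[3.0, 4.0, 0.0], [0.2, 0.2, 0.1]]
scaled = scale_all(rows)
```

After scaling, every row has the same L2 norm, so no single neuron can dominate the competition simply by accumulating larger weights. In `stdp_dw`, a larger `tau` widens the window of spike intervals that still produce a noticeable weight change.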