This paper extends the reinforcement learning ideas into the multi-agents system, which is far more complicated than the previously studied single-agent system. We studied two different multi-agents systems. One is the fully-connected neural network consists of multiple single neurons. Another one is the simplified mechanical arm system which is controlled by multiple neurons. We suppose that each neuron is like an agent and it can do Gibbs sampling of the posterior probability of stimulus features. The policy is optimized in a way that the cumulative global rewards are maximized. The algorithm for the second system is based on the same idea but we incorporate the physics model into the constraints. The simulation results show that for the first system our algorithm converges well. For the second system it does not converge well in a reasonable simulation time length. In summary, we took the initial endeavor to study the reinforcement learning for multi-agents system. The computational complexity is always an issue and significant amount of works have to be done in order to better understand the problem.