Traditional action recognition methods do not focus on a specific operator, so their results are easily disturbed when other actions appear in the video. This paper proposes a network based on a mixed-convolution ResNet and an RPN. The rMC is evaluated on the UCF-101 data set and compared with R3D, reaching an accuracy of 71.07%. The action recognition network is also tested on our own gesture and body posture data sets for a specific target, where it performs well and runs at 200 FPS. Finally, adding a regression block further improves the model's performance, which indicates the potential of this approach.