This paper is a brief report on our submission to the VIPriors Action Recognition Challenge. Action recognition has attracted the attention of many researchers because of its wide range of applications, but it remains a challenging task. In this paper, we study previous methods and propose our own. Our method primarily improves on the SlowFast network and fuses it with TSM to raise performance further. We also use a fast but effective way to extract motion features from videos: residual frames as input. With SlowFast, residual frames yield better motion features, and the residual-frame input path is an excellent supplement to existing RGB-frame input models. Combining 3D convolution (SlowFast) with 2D convolution (TSM) gives better performance still. All models in our experiments were trained from scratch on UCF101.
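As an illustration of the residual-frame input mentioned above, the following is a minimal sketch that assumes residual frames are plain differences between consecutive RGB frames. The function name `residual_frames` and the `(T, H, W, C)` clip layout are assumptions made for this example and are not taken from our implementation.

```python
import numpy as np


def residual_frames(clip: np.ndarray) -> np.ndarray:
    """Return differences between consecutive frames of a video clip.

    Assumed layout (hypothetical, for illustration): clip has shape
    (T, H, W, C) with T >= 2 frames. The result has shape (T - 1, H, W, C).
    """
    if clip.ndim != 4 or clip.shape[0] < 2:
        raise ValueError("expected a clip of shape (T, H, W, C) with T >= 2")
    # Cast to float first: subtracting uint8 frames directly would wrap around
    # instead of producing negative values.
    frames = clip.astype(np.float32)
    # Static regions cancel to zero, so mostly motion information remains.
    return frames[1:] - frames[:-1]


# Usage: a 4-frame clip in which only the third frame differs from the rest.
clip = np.zeros((4, 2, 2, 3), dtype=np.uint8)
clip[2] = 10
res = residual_frames(clip)
```

The output has one frame fewer than the input and contains signed values, so any normalization applied before feeding it to a network has to account for negative entries.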