Forgery operations on video contents are nowadays within the reach of anyone, thanks to the availability of powerful and user-friendly editing software. Integrity verification and authentication of videos represent a major interest in both journalism (e.g., fake news debunking) and legal environments dealing with digital evidence (e.g., a court of law). While several strategies and different forensics traces have been proposed in recent years, latest solutions aim at increasing the accuracy by combining multiple detectors and features. This paper presents a video forgery localization framework that verifies the self-consistency of coding traces between and within video frames, by fusing the information derived from a set of independent feature descriptors. The feature extraction step is carried out by means of an explainable convolutional neural network architecture, specifically designed to look for and classify coding artifacts. The overall framework was validated in two typical forgery scenarios: temporal and spatial splicing. Experimental results show an improvement to the state-of-the-art on temporal splicing localization and also promising performance in the newly tackled case of spatial splicing, on both synthetic and real-world videos.