Human activity recognition (HAR) is fundamental to human-robot collaboration (HRC), enabling robots to respond and dynamically adapt to human intentions. This paper introduces a HAR system that combines a modular data glove equipped with inertial measurement units (IMUs) and a vision-based tactile sensor to capture hand activities during contact with a robot. We evaluated our activity recognition approach under different conditions: offline classification of segmented sequences, real-time classification under static conditions, and a realistic HRC scenario. The experimental results show high accuracy across all tasks, suggesting that a variety of collaborative settings could benefit from this multi-modal approach.