Recognizing human activities from multi-channel time series data collected by wearable sensors is becoming increasingly practical. In real-world conditions, however, coherent activities and body movements can occur at the same time, such as moving the head while walking or sitting. This new problem, which we call "Coherent Human Activity Recognition (Co-HAR)", is more complicated than ordinary multi-class classification because the signals of different movements are mixed and interfere with one another. We further treat Co-HAR as a dense labeling problem that assigns a label to the sample at every time step, providing high-fidelity, duration-varying support to applications. In this paper, we develop a novel condition-aware deep architecture, "Conditional-UNet", that enables dense labeling for the Co-HAR problem. We also contribute a first-of-its-kind Co-HAR dataset for head movement recognition under walking or sitting conditions to support future research. Experiments on head gesture recognition show that our model achieves an overall gain of 2%-3% in F1 score over existing state-of-the-art deep methods and, more importantly, systematic and comprehensive improvements on real head gesture classes.