Fast and robust hand segmentation and tracking is an essential basis for gesture recognition and thus an important component for contact-less human-computer interaction (HCI). Hand gesture recognition based on 2D video data has been intensively investigated. However, in practical scenarios purely intensity based approaches suffer from uncontrollable environmental conditions like cluttered background colors. In this paper we present a real-time hand segmentation and tracking algorithm using Time-of-Flight (ToF) range cameras and intensity data. The intensity and range information is fused into one pixel value, representing its combined intensity-depth homogeneity. The scene is hierarchically clustered using a GPU based parallel merging algorithm, allowing a robust identification of both hands even for inhomogeneous backgrounds. After the detection, both hands are tracked on the CPU. Our tracking algorithm can cope with the situation that one hand is temporarily covered by the other hand.