Object detection and data association are critical components in multi-object tracking (MOT) systems. Despite the fact that these two components are highly dependent on each other, one popular trend in MOT is to perform detection and data association as separate modules, processed in a cascaded order. Due to this cascaded process, the resulting MOT system can only perform forward inference and cannot back-propagate error through the entire pipeline and correct them. This leads to sub-optimal performance over the total pipeline. To address this issue, recent work jointly optimizes detection and data association and forms an integrated MOT approach, which has been shown to improve performance in both detection and tracking. In this work, we propose a new approach for joint MOT based on Graph Neural Networks (GNNs). The key idea of our approach is that GNNs can explicitly model complex interactions between multiple objects in both the spatial and temporal domains, which is essential for learning discriminative features for detection and data association. We also leverage the fact that motion features are useful for MOT when used together with appearance features. So our proposed joint MOT approach also incorporates appearance and motion features within our graph-based feature learning framework, leading to better feature learning for MOT. Through extensive experiments on the MOT challenge dataset, we show that our proposed method achieves state-of-the-art performance on both object detection and MOT.