Graph Convolution Networks (GCN) are widely used in learning graph representations due to their effectiveness and efficiency. However, they suffer from the notorious over-smoothing problem, in which the learned representations of densely connected nodes converge to similar vectors when many (>3) graph convolutional layers are stacked. In this paper, we argue that the re-normalization trick used in GCN leads to overly homogeneous information propagation, which is the source of over-smoothing. To address this problem, we propose Graph Highway Networks (GHNet), which utilize gating units to automatically balance the trade-off between homogeneity and heterogeneity in the GCN learning process. The gating units serve as direct highways that maintain heterogeneous information from the node itself after feature propagation. This design enables GHNet to achieve much larger receptive fields per node without over-smoothing and thus to access more of the graph connectivity information. Experimental results on benchmark datasets demonstrate the superior performance of GHNet over GCN and related models.
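To make the gating idea concrete, the following is a minimal NumPy sketch of one highway-style gated graph layer. It is an illustration under stated assumptions, not the exact GHNet formulation: the abstract does not specify the gate's form, so the sigmoid gate, the function names (`renormalized_adjacency`, `gated_graph_layer`), and the parameters (`w`, `w_gate`, `b_gate`) are all hypothetical. The re-normalized adjacency follows the standard GCN definition, with self-loops added before symmetric degree normalization.

```python
import numpy as np


def renormalized_adjacency(adj):
    """Standard GCN re-normalization: D~^{-1/2} (A + I) D~^{-1/2}."""
    a_tilde = adj + np.eye(adj.shape[0])   # add self-loops
    deg = a_tilde.sum(axis=1)              # degrees including self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gated_graph_layer(a_hat, x, w, w_gate, b_gate):
    """One gated layer (assumed form, for illustration only).

    The gate mixes neighborhood-propagated features (homogeneous part)
    with the node's own transformed features (heterogeneous part).
    """
    self_info = x @ w                      # the node's own features
    propagated = a_hat @ self_info         # features averaged over neighbors
    gate = sigmoid(x @ w_gate + b_gate)    # per-node, per-feature, in (0, 1)
    return gate * propagated + (1.0 - gate) * self_info
```

In this sketch, a gate value near 1 reduces the layer to plain GCN-style propagation, while a value near 0 passes the node's own features through unchanged, which is the highway behavior the abstract describes.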