



Abstract: Current graph representation learning techniques use Graph Neural Networks (GNNs) to extract features from dataset embeddings. In this work, we examine the quality of these embeddings and assess how changing them affects the accuracy of GNNs. We explore different embedding extraction techniques for both images and texts, and find that the choice of embedding biases the performance of different GNN architectures; the embedding therefore influences the selection of a GNN regardless of the underlying dataset. In addition, only some GNN models improve upon the accuracy of models trained from scratch or fine-tuned on the underlying data without utilising the graph connections. As an alternative, we propose Graph-connected Network (GraNet) layers, which better leverage existing unconnected models within a GNN. Existing language and vision models are thereby improved by allowing neighbourhood aggregation. This lets the model reuse pre-trained weights where available, and we demonstrate that this approach improves accuracy compared to traditional GNNs: on Flickr v2, GraNet beats GATv2 and GraphSAGE by 7.7% and 1.7% respectively.
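To make the idea concrete, the following is a minimal sketch, in PyTorch, of what a graph-connected layer of this kind could look like: a frozen pre-trained encoder produces per-node features, and the layer augments them with mean neighbourhood aggregation. The class name GraNetLayer, the linear mixing rule, and the stand-in encoder are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GraNetLayer(nn.Module):
    """Augments per-node features from a pre-trained model with the
    mean of each node's neighbours, then mixes the two views."""
    def __init__(self, dim: int):
        super().__init__()
        self.mix = nn.Linear(2 * dim, dim)  # combines self + neighbourhood features

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # adj: dense [N, N] adjacency matrix (1 where an edge exists).
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)  # avoid divide-by-zero
        neigh = adj @ h / deg                            # mean over neighbour features
        return torch.relu(self.mix(torch.cat([h, neigh], dim=-1)))

# Usage: embed nodes with any frozen pre-trained model, then aggregate.
encoder = nn.Linear(128, 64)          # stand-in for a frozen vision/language encoder
for p in encoder.parameters():
    p.requires_grad_(False)           # keep the pre-trained weights fixed

x = torch.randn(5, 128)               # raw per-node inputs (e.g. image features)
adj = (torch.rand(5, 5) > 0.5).float()
adj.fill_diagonal_(0)

layer = GraNetLayer(64)
out = layer(encoder(x), adj)          # [5, 64] graph-aware node representations
```

The design point the abstract emphasises is that the pre-trained encoder stays usable as-is; only the lightweight aggregation layer is trained on the graph.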