Scene Graph Generation with External Knowledge and Image Reconstruction


Scene graph generation has received growing attention with the advancement of image understanding tasks such as object detection and attribute and relationship prediction. However, existing datasets are biased in terms of object and relationship labels, or often come with noisy and missing annotations, which makes developing a reliable scene graph prediction model very challenging. In this paper, we propose a novel scene graph generation algorithm that uses external knowledge and an image reconstruction loss to overcome these dataset issues. In particular, we extract commonsense knowledge from an external knowledge base to refine object and phrase features, improving generalizability in scene graph generation. To address the bias caused by noisy object annotations, we introduce an auxiliary image reconstruction path to regularize the scene graph generation network. Extensive experiments show that our framework generates better scene graphs, achieving state-of-the-art performance on two benchmark datasets: Visual Relationship Detection and Visual Genome.
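The two ideas in the abstract, refining a feature with retrieved knowledge and regularizing training with an auxiliary reconstruction loss, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how knowledge is retrieved, fused, or how the reconstruction network works. Here we assume, purely for illustration, a dot-product attention over knowledge embeddings with a residual update, and a mean squared error reconstruction term added to the scene graph loss with a hypothetical weight `recon_weight`. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def refine_with_knowledge(feature: Sequence[float],
                          knowledge: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical refinement: attend over knowledge embeddings using
    dot-product scores, then add the attended summary to the feature."""
    scores = [sum(f * k for f, k in zip(feature, kvec)) for kvec in knowledge]
    weights = softmax(scores)
    summary = [sum(w * kvec[d] for w, kvec in zip(weights, knowledge))
               for d in range(len(feature))]
    # Residual update: original feature plus knowledge summary.
    return [f + s for f, s in zip(feature, summary)]


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between a reconstruction and its target."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def joint_loss(scene_graph_loss: float,
               reconstructed: Sequence[float],
               image: Sequence[float],
               recon_weight: float = 0.1) -> float:
    """Assumed objective: scene graph loss plus a weighted auxiliary
    image reconstruction loss acting as a regularizer."""
    return scene_graph_loss + recon_weight * mse(reconstructed, image)


if __name__ == "__main__":
    refined = refine_with_knowledge([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
    print(refined)
    print(joint_loss(2.0, [0.5, 0.5], [1.0, 0.0], recon_weight=0.1))
```

In this toy setup, the knowledge vector most aligned with the feature receives the larger attention weight, and the reconstruction term only nudges the total loss through `recon_weight`, which is the sense in which an auxiliary path acts as a regularizer rather than a primary objective.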

In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).