Bridging images and natural language with deep learning

Abstract

As humans, we easily use our vision and language capabilities to accomplish a wide variety of tasks that combine the image and text modalities. These tasks are hard for machines, however, because they require a model to understand both images and language, and especially how the two relate to each other. In recent years, considerable progress has been made in applying deep learning to computer vision and natural language processing, but connecting images with natural language remains challenging because the two modalities differ in structure and characteristics. In this thesis, I seek to bridge images and natural language with deep learning.

Type
Publication
Ph.D. Thesis