Resilient Load Restoration in Microgrids Considering Mobile Energy Storage Fleets: A Deep Reinforcement Learning Approach
Mobile energy storage systems (MESSs) provide mobility and flexibility to enhance distribution system resilience. The paper proposes a …
Bridging images and natural language with deep learning
We, as humans, can easily use our vision and language capabilities to accomplish a wide variety of tasks that combine the image and the …
Watch It Twice: Video Captioning with a Refocused Video Encoder
With the rapid growth of video data and the increasing demands of various applications such as intelligent video search and assistance …
Xiangxi Shi, Jianfei Cai, Shafiq Joty, Jiuxiang Gu
Video Captioning with Boundary-aware Hierarchical Language Decoding and Joint Video Prediction
The explosion of video data on the internet requires effective and efficient technology to generate captions automatically for people …
Xiangxi Shi, Jianfei Cai, Jiuxiang Gu, Shafiq Joty
Unpaired Image Captioning via Scene Graph Alignments
Most of the existing deep learning based image captioning methods are fully-supervised models, which require large-scale paired …
Jiuxiang Gu*, Shafiq Joty, Jianfei Cai, Handong Zhao, Xu Yang, Gang Wang.
Scene Graph Generation with External Knowledge and Image Reconstruction
Scene graph generation has received growing attention with the advancements in image understanding tasks such as object detection, attributes …
Jiuxiang Gu*, Handong Zhao, Zhe Lin, Sheng Li, Jianfei Cai, Mingyang Ling.
Unpaired Image Captioning by Language Pivoting
Image captioning is a multimodal task involving computer vision and natural language processing, where the goal is to learn a mapping …
Jiuxiang Gu*, Shafiq Joty, Jianfei Cai, Gang Wang.
Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models
Textual-visual cross-modal retrieval has been a hot research topic in both computer vision and natural language processing communities. …
Jiuxiang Gu*, Jianfei Cai, Shafiq Joty, Li Niu, Gang Wang.
Stack-Captioning: Coarse-to-Fine Learning for Image Captioning
Existing image captioning approaches typically train a one-stage sentence decoder, which has difficulty generating rich fine-grained …
An Empirical Study of Language CNN for Image Captioning
Language Models based on recurrent neural networks have dominated recent image caption generation tasks. In this paper, we introduce a …
Jiuxiang Gu*, Gang Wang, Jianfei Cai, Tsuhan Chen.
Recent Advances in Convolutional Neural Networks
In this paper, we provide a broad survey of the recent advances in convolutional neural networks. We detail the improvements of CNN …