Jiuxiang Gu
Publications
Open World Entity Segmentation
We introduce a new image segmentation task, called Entity Segmentation (ES), which aims to segment all visual entities (objects and …
Lu Qi, Jason Kuen, Yi Wang, Jiuxiang Gu, Hengshuang Zhao, Zhe Lin, Philip Torr, Jiaya Jia
Code
TPAMI
DocEdit: Language-guided Document Editing
Professional document editing tools require a certain level of expertise to perform complex edit operations. To make editing tools …
Puneet Mathur, Rajiv Jain, Jiuxiang Gu, Franck Dernoncourt, Dinesh Manocha, Vlad Morariu
AAAI 2023
User-Entity Differential Privacy in Learning Natural Language Models
In this paper, we introduce a novel concept of user-entity differential privacy (UeDP) to provide formal privacy protection …
Phung Lai, NhatHai Phan, Tong Sun, Rajiv Jain, Franck Dernoncourt, Jiuxiang Gu, Nikolaos Barmpalios
BigData 2022
LayerDoc: Layer-wise Extraction of Spatial Hierarchical Structure in Visually-Rich Documents
Digital documents often contain images and scanned text. Parsing such visually-rich documents is a core task for automating document …
Puneet Mathur, Rajiv Jain, Ashutosh Mehra, Jiuxiang Gu, Franck Dernoncourt, Anandhavelu N, Quan Tran, Verena Kaynig-Fittkau, Ani Nenkova, Dinesh Manocha, Vlad Morariu
PDF
WACV 2023
MGDoc: Pre-training with Multi-granular Hierarchy for Document Image Understanding
Document images are a ubiquitous source of data where the text is organized in a complex hierarchical structure ranging from fine …
Zilong Wang, Jiuxiang Gu, Chris Tensmeyer, Nikolaos Barmpalios, Ani Nenkova, Tong Sun, Jingbo Shang, Vlad I. Morariu
PDF
EMNLP 2022
Delving into OOD Detection with Vision-Language Representations
Recognizing out-of-distribution (OOD) samples is critical for machine learning systems deployed in the open world. The vast majority of …
Yifei Ming, Ziyang Cai, Jiuxiang Gu, Yiyou Sun, Wei Li, Yixuan Li
NeurIPS 2022
DocLayoutTTS: Dataset and Baselines for Layout-informed Document-level Neural Speech Synthesis
We propose a new task of synthesizing speech directly from semi-structured documents where the extracted text tokens from OCR systems …
Puneet Mathur, Franck Dernoncourt, Quan Hung Tran, Jiuxiang Gu, Ani Nenkova, Vlad Morariu, Rajiv Jain, Dinesh Manocha
Interspeech 2022
DocTime: A Document-level Temporal Dependency Graph Parser
We introduce DocTime, a novel temporal dependency graph (TDG) parser that takes as input a text document and produces a temporal …
Puneet Mathur, Vlad I Morariu, Verena Kaynig-Fittkau, Jiuxiang Gu, Franck Dernoncourt, Quan Hung Tran, Ani Nenkova, Dinesh Manocha, Rajiv Jain
PDF
NAACL 2022
Learning Adaptive Axis Attentions in Fine-tuning: Beyond Fixed Sparse Attention Patterns
This work presents one of the first comprehensive studies on different sparse attention patterns in Transformer models. We first …
Zihan Wang, Jiuxiang Gu, Jason Kuen, Handong Zhao, Vlad I Morariu, Ruiyi Zhang, Ani Nenkova, Tong Sun, Jingbo Shang
ACL 2022
FedKC: Federated Knowledge Composition for Multilingual Natural Language Understanding
Multilingual natural language understanding, which aims to comprehend multilingual documents, is an important task. Existing efforts …
Haoyu Wang, Handong Zhao, Yaqing Wang, Tong Yu, Jiuxiang Gu, Jing Gao
WWW 2022
Unsupervised Cross-lingual Image Captioning
Most recent image captioning works are conducted in English as the majority of image-caption datasets are in English. However, there …
Jiahui Gao, Yi Zhou, Philip L. H. Yu, Shafiq Joty, Jiuxiang Gu
PDF
AAAI 2022
Interactive Image Generation with Natural-Language Feedback
Using natural-language feedback to guide image generation and manipulation can greatly lower the required efforts and skills. This …
Yufan Zhou, Ruiyi Zhang, Jiuxiang Gu, Chris Tensmeyer, Tong Yu, Changyou Chen, Jinhui Xu, Tong Sun
AAAI 2022
Unified Pretraining Framework for Document Understanding
Document intelligence automates the extraction of information from documents and supports many business applications. Recent …
Jiuxiang Gu, Jason Kuen, Vlad I Morariu, Handong Zhao, Rajiv Jain, Nikolaos Barmpalios, Ani Nenkova, Tong Sun
NeurIPS 2021
Towards Interpreting and Mitigating Shortcut Learning Behavior of NLU models
Recent studies indicate that NLU models are prone to rely on shortcut features for prediction. As a result, these models could …
Mengnan Du, Varun Manjunatha, Rajiv Jain, Ruchi Deshpande, Franck Dernoncourt, Jiuxiang Gu, Tong Sun, Xia Hu
NAACL 2021
SelfDoc: Self-Supervised Document Representation Learning
We propose SelfDoc, a task-agnostic pre-training framework for document image analysis. Because documents are multimodal displays and …
Peizhao Li, Jiuxiang Gu, Jason Kuen, Vlad Morariu, Handong Zhao, Rajiv Jain, Varun Manjunatha, Hongfu Liu
CVPR 2021
Multi-Scale Aligned Distillation for Low-Resolution Detection
In instance-level detection tasks (e.g., object detection), reducing input resolution is an easy option to improve runtime efficiency. …
Lu Qi, Jason Kuen, Jiuxiang Gu, Zhe Lin, Yi Wang, Yukang Chen, Yanwei Li, Jiaya Jia
CVPR 2021
Exploiting Semantic Embedding and Visual Feature for Facial Action Unit Detection
Recent studies on detecting facial action units (AUs) have utilized auxiliary information (e.g., facial landmarks, relationships among AUs …
Huiyuan Yang, Lijun Yin, Yi Zhou, Jiuxiang Gu
CVPR 2021
Self-Supervised Relationship Probing
Structured representations of images according to visual relationships are beneficial for many vision and vision-language applications. …
Jiuxiang Gu, Jason Kuen, Shafiq Joty, Jianfei Cai, Vlad Morariu, Handong Zhao, Tong Sun
NeurIPS 2020
Video Captioning with Boundary-aware Hierarchical Language Decoding and Joint Video Prediction
The explosion of video data on the internet requires effective and efficient technology to generate captions automatically for people …
Xiangxi Shi, Jianfei Cai, Jiuxiang Gu, Shafiq Joty
Neurocomputing
Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning
The prevalent approach to image captioning is an encoder-decoder framework, where the combination of convolutional neural networks …
Xiangxi Shi, Xu Yang, Jiuxiang Gu, Shafiq Joty, Jianfei Cai
ECCV 2020
Resilient Load Restoration in Microgrids Considering Mobile Energy Storage Fleets: A Deep Reinforcement Learning Approach
Mobile energy storage systems (MESSs) provide mobility and flexibility to enhance distribution system resilience. The paper proposes a …
Shuhan Yao, Jiuxiang Gu, Peng Wang, Tianyang Zhao, Huajun Zhang, Xiaochuan Liu
Preprint
IEEE PES GM 2020
Bridging images and natural language with deep learning
We, as humans, can easily use our vision and language capabilities to accomplish a wide variety of tasks that combine the image and the …
Jiuxiang Gu
PDF
Watch It Twice: Video Captioning with a Refocused Video Encoder
With the rapid growth of video data and the increasing demands of various applications such as intelligent video search and assistance …
Xiangxi Shi, Jianfei Cai, Shafiq Joty, Jiuxiang Gu
ACMMM 2019
Unpaired Image Captioning via Scene Graph Alignments
Most of the existing deep learning based image captioning methods are fully-supervised models, which require large-scale paired …
Jiuxiang Gu, Shafiq Joty, Jianfei Cai, Handong Zhao, Xu Yang, Gang Wang
PDF
Code
ICCV 2019
Scene Graph Generation with External Knowledge and Image Reconstruction
Scene graph generation has received growing attention with advances in image understanding tasks such as object detection, attributes …
Jiuxiang Gu, Handong Zhao, Zhe Lin, Sheng Li, Jianfei Cai, Mingyang Ling
CVPR 2019
Unpaired Image Captioning by Language Pivoting
Image captioning is a multimodal task involving computer vision and natural language processing, where the goal is to learn a mapping …
Jiuxiang Gu, Shafiq Joty, Jianfei Cai, Gang Wang
PDF
Code
ECCV 2018
Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models
Textual-visual cross-modal retrieval has been a hot research topic in both computer vision and natural language processing communities. …
Jiuxiang Gu, Jianfei Cai, Shafiq Joty, Li Niu, Gang Wang
PDF
CVPR 2018
Stack-Captioning: Coarse-to-Fine Learning for Image Captioning
Existing image captioning approaches typically train a one-stage sentence decoder, which makes it difficult to generate rich fine-grained …
Jiuxiang Gu, Jianfei Cai, Gang Wang, Tsuhan Chen
PDF
Code
AAAI 2018
An Empirical Study of Language CNN for Image Captioning
Language Models based on recurrent neural networks have dominated recent image caption generation tasks. In this paper, we introduce a …
Jiuxiang Gu, Gang Wang, Jianfei Cai, Tsuhan Chen
PDF
Code
ICCV 2017
Recent Advances in Convolutional Neural Networks
In this paper, we provide a broad survey of the recent advances in convolutional neural networks. We detail the improvements of CNN …
Jiuxiang Gu, Zhenhua Wang, Jason Kuen, Lianyang Ma, Amir Shahroudy, Bing Shuai, Ting Liu, Xingxing Wang, Li Wang, Gang Wang, Jianfei Cai, Tsuhan Chen
PDF
Pattern Recognition