CVPR

Exploiting Semantic Embedding and Visual Feature for Facial Action Unit Detection

Recent studies on detecting facial action units (AUs) have utilized auxiliary information (e.g., facial landmarks, relationships among AUs and expressions, web facial images, etc.) to improve AU detection performance. As of now, no semantic …
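As a rough illustration of how auxiliary information such as facial landmarks is commonly folded into AU detection (a generic multi-task sketch, not this paper's method), the PyTorch snippet below attaches an auxiliary landmark-regression head to a shared backbone; the backbone layers, head sizes, and loss weight are all assumptions:

    import torch
    import torch.nn as nn

    class AUDetectorWithLandmarks(nn.Module):
        """Shared backbone with two heads: multi-label AU logits and an
        auxiliary 2D landmark regression head used only as extra supervision."""
        def __init__(self, num_aus=12, num_landmarks=68, feat_dim=256):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, feat_dim), nn.ReLU(),
            )
            self.au_head = nn.Linear(feat_dim, num_aus)                   # AU presence logits
            self.landmark_head = nn.Linear(feat_dim, num_landmarks * 2)   # (x, y) per landmark

        def forward(self, images):
            feats = self.backbone(images)
            return self.au_head(feats), self.landmark_head(feats)

    model = AUDetectorWithLandmarks()
    images = torch.randn(4, 3, 128, 128)
    au_labels = torch.randint(0, 2, (4, 12)).float()
    landmarks = torch.randn(4, 68 * 2)
    au_logits, lm_pred = model(images)
    # Joint objective: AU detection loss plus a weighted auxiliary landmark term
    # (the 0.1 weight is an illustrative assumption).
    loss = nn.functional.binary_cross_entropy_with_logits(au_logits, au_labels) \
           + 0.1 * nn.functional.mse_loss(lm_pred, landmarks)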

Multi-Scale Aligned Distillation for Low-Resolution Detection

In instance-level detection tasks (e.g., object detection), reducing input resolution is an easy option to improve runtime efficiency. However, this option severely hurts the detection performance. This paper focuses on boosting the performance of a …
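As a rough, generic sketch of cross-resolution knowledge distillation (not necessarily the aligned, multi-scale scheme this paper proposes), the snippet below runs a frozen teacher on full-resolution inputs and a student on downsampled inputs, then matches their feature maps with an L2 loss; the tiny encoders and all sizes are illustrative assumptions:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def make_encoder(out_dim=128):
        # Tiny stand-in for a detector backbone (assumption).
        return nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, out_dim, 3, stride=2, padding=1), nn.ReLU(),
        )

    teacher, student = make_encoder(), make_encoder()
    teacher.eval()  # teacher is assumed pretrained and kept frozen during distillation

    images = torch.randn(2, 3, 512, 512)                           # full-resolution inputs
    low_res = F.interpolate(images, scale_factor=0.5,
                            mode="bilinear", align_corners=False)  # cheaper student input

    with torch.no_grad():
        t_feat = teacher(images)
    s_feat = student(low_res)

    # Upsample the student feature map to the teacher's spatial size before matching.
    s_feat = F.interpolate(s_feat, size=t_feat.shape[-2:],
                           mode="bilinear", align_corners=False)
    distill_loss = F.mse_loss(s_feat, t_feat)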

SelfDoc: Self-Supervised Document Representation Learning

We propose SelfDoc, a task-agnostic pre-training framework for document image analysis. Because documents are multimodal displays and are intended for sequential reading, our framework involves positional, textual, and visual information for every …
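To make the "positional, textual, and visual information" idea concrete, here is a minimal fusion sketch (an assumption-laden illustration, not the SelfDoc architecture): each document component carries a bounding box, a text feature, and a visual feature, which are projected to a common width, summed, and fed to a transformer encoder:

    import torch
    import torch.nn as nn

    class ComponentFusion(nn.Module):
        """Fuse per-component positional (bounding box), textual, and visual
        features into one token sequence for a transformer encoder."""
        def __init__(self, text_dim=300, vis_dim=512, hidden=256):
            super().__init__()
            self.pos_proj = nn.Linear(4, hidden)          # (x0, y0, x1, y1) box coordinates
            self.text_proj = nn.Linear(text_dim, hidden)
            self.vis_proj = nn.Linear(vis_dim, hidden)
            layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)

        def forward(self, boxes, text_feats, vis_feats):
            tokens = (self.pos_proj(boxes)
                      + self.text_proj(text_feats)
                      + self.vis_proj(vis_feats))
            return self.encoder(tokens)

    # One document with 10 detected components (all tensors are dummy data).
    boxes = torch.rand(1, 10, 4)
    text_feats = torch.randn(1, 10, 300)
    vis_feats = torch.randn(1, 10, 512)
    out = ComponentFusion()(boxes, text_feats, vis_feats)    # shape (1, 10, 256)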

Scene Graph Generation with External Knowledge and Image Reconstruction

Scene graph generation has received growing attention with the advancement of image understanding tasks such as object detection, attribute and relationship prediction, etc. However, existing datasets are biased in terms of object and relationship labels, …
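For readers unfamiliar with the output format, a scene graph is typically a set of detected objects (with labels, boxes, and optional attributes) plus subject-predicate-object relationship triples. A minimal, illustrative Python representation follows; the class names and example labels are made up:

    from dataclasses import dataclass, field

    @dataclass
    class SceneGraphObject:
        label: str                      # e.g., "person"
        box: tuple                      # (x0, y0, x1, y1) in image coordinates
        attributes: list = field(default_factory=list)

    @dataclass
    class Relationship:
        subject: int                    # index into the object list
        predicate: str                  # e.g., "riding"
        obj: int                        # index into the object list

    objects = [
        SceneGraphObject("person", (30, 40, 120, 260), ["standing"]),
        SceneGraphObject("horse", (100, 80, 320, 300), ["brown"]),
    ]
    relationships = [Relationship(subject=0, predicate="riding", obj=1)]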

Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models

Textual-visual cross-modal retrieval has been a hot research topic in both the computer vision and natural language processing communities. Learning appropriate representations for multi-modal data is crucial for cross-modal retrieval performance. …
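A common baseline for learning such representations is a joint embedding space trained with a hinge-based triplet ranking loss; the sketch below shows that generic formulation (an assumption for illustration, not this paper's generative approach), with projection dimensions and the margin chosen arbitrarily:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class JointEmbedding(nn.Module):
        """Project image and text features into a shared space so that matching
        pairs score higher (by cosine similarity) than mismatched pairs."""
        def __init__(self, img_dim=2048, txt_dim=300, embed_dim=256):
            super().__init__()
            self.img_proj = nn.Linear(img_dim, embed_dim)
            self.txt_proj = nn.Linear(txt_dim, embed_dim)

        def forward(self, img_feats, txt_feats):
            img = F.normalize(self.img_proj(img_feats), dim=-1)
            txt = F.normalize(self.txt_proj(txt_feats), dim=-1)
            return img @ txt.t()            # pairwise cosine similarity matrix

    def triplet_ranking_loss(sim, margin=0.2):
        # Hinge loss over the similarity matrix: diagonal entries are matching
        # pairs, off-diagonal entries act as negatives in both directions.
        pos = sim.diag().unsqueeze(1)
        cost_txt = (margin + sim - pos).clamp(min=0)          # image -> wrong caption
        cost_img = (margin + sim - pos.t()).clamp(min=0)      # caption -> wrong image
        mask = torch.eye(sim.size(0), dtype=torch.bool)
        return cost_txt.masked_fill(mask, 0).mean() + cost_img.masked_fill(mask, 0).mean()

    model = JointEmbedding()
    sim = model(torch.randn(8, 2048), torch.randn(8, 300))
    loss = triplet_ranking_loss(sim)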