Self-Supervised Learning

Learning Adaptive Axis Attentions in Fine-tuning: Beyond Fixed Sparse Attention Patterns

This work presents one of the first comprehensive studies on different sparse attention patterns in Transformer models. We first discuss why pre-training is essential for models with sparse attention patterns and point out that the efficient fine-tuning …

Interactive Image Generation with Natural-Language Feedback

Using natural-language feedback to guide image generation and manipulation can greatly reduce the effort and skill required. This topic has received increased attention in recent years through refinement of Generative Adversarial Networks (GANs); …

Unified Pretraining Framework for Document Understanding

Document intelligence automates the extraction of information from documents and supports many business applications. Recent self-supervised learning methods on large-scale unlabeled document datasets have opened up promising directions towards …

SelfDoc: Self-Supervised Document Representation Learning

We propose SelfDoc, a task-agnostic pre-training framework for document image analysis. Because documents are multimodal displays and are intended for sequential reading, our framework involves positional, textual, and visual information for every …

Self-Supervised Relationship Probing

Structured representations of images according to visual relationships are beneficial for many vision and vision-language applications. However, current human-annotated visual relationship datasets suffer from the long-tailed predicate distribution …