Pretraining

MGDoc: Pre-training with Multi-granular Hierarchy for Document Image Understanding

Document images are a ubiquitous source of data where the text is organized in a complex hierarchical structure ranging from fine granularity (e.g., words), medium granularity (e.g., regions such as paragraphs or figures), to coarse granularity …

Unified Pretraining Framework for Document Understanding

Document intelligence automates the extraction of information from documents and supports many business applications. Recent self-supervised learning methods on large-scale unlabeled document datasets have opened up promising directions towards …

Self-Supervised Relationship Probing

Structured representations of images according to visual relationships are beneficial for many vision and vision-language applications. However, current human-annotated visual relationship datasets suffer from the long-tailed predicate distribution …