Attention

Learning Adaptive Axis Attentions in Fine-tuning: Beyond Fixed Sparse Attention Patterns

This work presents one of the first comprehensive studies of different sparse attention patterns in Transformer models. We first discuss the importance of pre-training for models with sparse attention patterns and point out that the efficient fine-tuning …