Tensor Attention Training: Provably Efficient Learning of Higher-order Transformers (arXiv 2024)

Publication
arXiv