Efficient Computation

Numerical Pruning for Efficient Autoregressive Models (Proceedings of the AAAI Conference on Artificial Intelligence, AAAI 2025)

Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers (arXiv 2024)