Deep Learning

ImageFolder: Autoregressive Image Generation with Folded Tokens (ICLR 2025)

LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers (Proceedings of the AAAI Conference on Artificial Intelligence (AAAI) 2025)

Numerical Pruning for Efficient Autoregressive Models (Proceedings of the AAAI Conference on Artificial Intelligence (AAAI) 2025)

ARTIST: Improving the Generation of Text-rich Images by Disentanglement (arXiv 2024)

Conv-basis: A new paradigm for efficient attention inference and gradient computation in transformers (arXiv 2024)

Customization assistant for text-to-image generation (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2024)

DocSynthv2: A Practical Autoregressive Modeling for Document Generation (arXiv 2024)

LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models (arXiv 2024)

LRM: Large reconstruction model for single image to 3D (The Twelfth International Conference on Learning Representations (ICLR) 2024)

TRINS: Towards Multimodal Language Models that Can Read (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2024)