Deep Learning

ImageFolder: Autoregressive Image Generation with Folded Tokens (ICLR 2025)

ARTIST: Improving the Generation of Text-rich Images by Disentanglement (arXiv 2024)

Conv-basis: A new paradigm for efficient attention inference and gradient computation in transformers (arXiv 2024)

Customization assistant for text-to-image generation (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2024)

DocSynthv2: A Practical Autoregressive Modeling for Document Generation (arXiv 2024)

LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models (arXiv 2024)

LRM: Large reconstruction model for single image to 3D (ICLR 2024)

TRINS: Towards Multimodal Language Models that Can Read (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2024)

Tensor attention training: Provably efficient learning of higher-order transformers (arXiv 2024)

Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation (arXiv 2024)