Jiuxiang Gu
Deep Learning
ImageFolder: Autoregressive Image Generation with Folded Tokens (ICLR 2025)
ARTIST: Improving the Generation of Text-rich Images by Disentanglement (arXiv 2024)
Conv-basis: A new paradigm for efficient attention inference and gradient computation in transformers (arXiv 2024)
Customization assistant for text-to-image generation (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2024)
DocSynthv2: A Practical Autoregressive Modeling for Document Generation (arXiv 2024)
LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models (arXiv 2024)
LRM: Large reconstruction model for single image to 3D (The Twelfth International Conference on Learning Representations (ICLR) 2024)
TRINS: Towards Multimodal Language Models that Can Read (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2024)
Tensor attention training: Provably efficient learning of higher-order transformers (arXiv 2024)
Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation (arXiv 2024)