Jiuxiang Gu
Computer Vision
ImageFolder: Autoregressive Image Generation with Folded Tokens (International Conference on Learning Representations (ICLR) 2025)
ADOPD: A Large-Scale Document Page Decomposition Dataset (The Twelfth International Conference on Learning Representations (ICLR) 2024)
ARTIST: Improving the Generation of Text-rich Images by Disentanglement (arXiv 2024)
Customization assistant for text-to-image generation (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2024)
LRM: Large Reconstruction Model for Single Image to 3D (The Twelfth International Conference on Learning Representations (ICLR) 2024)
AIMS: All-inclusive multi-level segmentation for anything (Advances in Neural Information Processing Systems (NeurIPS) 2023)
LayerDoc: Layer-wise extraction of spatial hierarchical structure in visually-rich documents (Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV) 2023)
LLaVAR: Enhanced visual instruction tuning for text-rich image understanding (arXiv 2023)
Delving into out-of-distribution detection with vision-language representations (Advances in Neural Information Processing Systems (NeurIPS) 2022)
Learning the Visualness of Text Using Large Vision-Language Models (Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP) 2022)