Computer Vision

ImageFolder: Autoregressive Image Generation with Folded Tokens (ICLR 2025)

ADOPD: A Large-Scale Document Page Decomposition Dataset (The Twelfth International Conference on Learning Representations (ICLR) 2024)

ARTIST: Improving the Generation of Text-rich Images by Disentanglement (arXiv 2024)

Customization assistant for text-to-image generation (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2024)

LRM: Large reconstruction model for single image to 3D (The Twelfth International Conference on Learning Representations (ICLR) 2024)

AIMS: All-inclusive multi-level segmentation for anything (Advances in Neural Information Processing Systems (NeurIPS) 2023)

LayerDoc: layer-wise extraction of spatial hierarchical structure in visually-rich documents (Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV) 2023)

LLaVAR: Enhanced visual instruction tuning for text-rich image understanding (arXiv 2023)

Delving into out-of-distribution detection with vision-language representations (Advances in Neural Information Processing Systems (NeurIPS) 2022)

Learning the Visualness of Text Using Large Vision-Language Models (Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP) 2022)