Jiuxiang Gu
Multimodal Learning
Commit: Coordinated instruction tuning for multimodal large language models (arXiv 2024)
TRINS: Towards Multimodal Language Models that Can Read (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2024)
Learning the Visualness of Text Using Large Vision-Language Models (Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) 2022)
Towards language-free training for text-to-image generation (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2022)
Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2018)
Unpaired image captioning by language pivoting (Proceedings of the European Conference on Computer Vision 2018)