TRINS: Towards Multimodal Language Models that Can Read (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2024)

Publication: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition