LLaVAR: Enhanced visual instruction tuning for text-rich image understanding (arXiv 2023)

Publication
arXiv