Deep Learning

Tensor attention training: Provably efficient learning of higher-order transformers (arXiv 2024)

Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation (arXiv 2024)

Toward Infinite-Long Prefix in Transformer (arXiv 2024)

AIMS: All-inclusive multi-level segmentation for anything (Advances in Neural Information Processing Systems (NeurIPS) 2023)

LLaVAR: Enhanced visual instruction tuning for text-rich image understanding (arXiv 2023)

Learning adaptive axis attentions in fine-tuning: Beyond fixed sparse attention patterns (Findings of the Association for Computational Linguistics (ACL) 2022)

Learning the Visualness of Text Using Large Vision-Language Models (Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP) 2023)

MGDoc: Pre-training with multi-granular hierarchy for document image understanding (Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP) 2022)

Towards language-free training for text-to-image generation (Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022)

Exploiting semantic embedding and visual feature for facial action unit detection (Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021)