Publications

Publications by category in reverse chronological order. Generated by jekyll-scholar.

2025

  1. Efficient vision-language retrieval using structural pruning
    Handong Zhao, Yue Bai, Zhe Lin, and 4 more authors
    2025
    US Patent App. 18/347,877
  2. Generating temporal dependency graphs
    Puneet Mathur, Vlad Morariu, Verena Kaynig-Fittkau, and 7 more authors
    2025
    US Patent App. 18/493,465
  3. ARTIST: Improving the generation of text-rich images by disentanglement
    Jianyi Zhang, Yufan Zhou, Jiuxiang Gu, and 5 more authors
    In WACV, 2025
  4. Differential privacy mechanisms in neural tangent kernel regression
    Jiuxiang Gu, Yingyu Liang, Zhizhou Sha, and 2 more authors
    In WACV, 2025
  5. A multi-LLM debiasing framework
    Deonna M Owens, Ryan A Rossi, Sungchul Kim, and 7 more authors
    Submitted to ACL, 2025
  6. ImageFolder: Autoregressive image generation with folded tokens
    Xiang Li, Kai Qiu, Hao Chen, and 4 more authors
    In ICLR, 2025
  7. LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding
    Jian Chen, Ruiyi Zhang, Yufan Zhou, and 6 more authors
    In ICLR, 2025
  8. Personalization of large language models: A survey
    Zhehao Zhang, Ryan A Rossi, Branislav Kveton, and 8 more authors
    TMLR, 2025
  9. Numerical pruning for efficient autoregressive models
    Xuan Shen, Zhao Song, Yufa Zhou, and 12 more authors
    In AAAI, 2025
  10. LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers
    Xuan Shen, Zhao Song, Yufa Zhou, and 12 more authors
    In AAAI, 2025
  11. MegaSynth: Scaling Up 3D Scene Reconstruction with Synthesized Data
    Hanwen Jiang, Zexiang Xu, Desai Xie, and 8 more authors
    In CVPR, 2025
  12. Efficient Reasoning with Hidden Thinking
    Xuan Shen, Yizhou Wang, Xiangxi Shi, and 3 more authors
    arXiv preprint arXiv:2501.19201, 2025
  13. LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding
    Jian Chen, Ruiyi Zhang, Yufan Zhou, and 6 more authors
    In ICLR, 2025
  14. From Selection to Generation: A Survey of LLM-based Active Learning
    Yu Xia, Subhojyoti Mukherjee, Zhouhang Xie, and 8 more authors
    In ACL, 2025
  15. Efficient Reasoning with Hidden Thinking
    Xuan Shen, Yizhou Wang, Xiangxi Shi, and 3 more authors
    Submitted to NeurIPS, 2025
  16. FastCar: Cache Attentive Replay for Fast Auto-Regressive Video Generation on the Edge
    Xuan Shen, Weize Ma, Yufa Zhou, and 11 more authors
    Submitted to NeurIPS, 2025
  17. ADOPT: A Multimodal Framework for Document Understanding and Generation
    Jiuxiang Gu, Jing Shi, Wanrong Zhu, and 6 more authors
    Submitted to NeurIPS, 2025
  18. DraftAttention: Fast Video Diffusion via Low-Resolution Attention Guidance
    Xuan Shen, Chenxia Han, Yufa Zhou, and 7 more authors
    Submitted to NeurIPS, 2025
  19. MENTOR: Efficient Multimodal-Conditioned Tuning for Autoregressive Vision Generation Models
    Haozhe Zhao, Zefan Cai, Shuzheng Si, and 4 more authors
    Submitted to NeurIPS, 2025
  20. From Preferences to Prejudice: The Role of Alignment Tuning in Shaping Social Bias in Video Diffusion Models
    Zefan Cai, Haoyi Qiu, Haozhe Zhao, and 6 more authors
    Submitted to NeurIPS, 2025
  21. R-KV: Redundancy-aware KV Cache Compression for Reasoning Models
    Zefan Cai, Wen Xiao, Hanshi Sun, and 11 more authors
    Submitted to NeurIPS, 2025
  22. OIDA-QA: A Multimodal Benchmark for Analyzing the Opioid Industry Documents Archive
    Xuan Shen, Brian Wingenroth, Zichao Wang, and 12 more authors
    Submitted to NeurIPS, 2025
  23. ADOPD-Instruct: A Large-Scale Multimodal Dataset for Document Editing
    Wanrong Zhu, Xiangxi Shi, Yufan Zhou, and 3 more authors
    Submitted to NeurIPS, 2025
  24. ARTIST: Improving the Generation of Text-rich Images with Disentangled Diffusion Models
    Jianyi Zhang, Yufan Zhou, Jiuxiang Gu, and 5 more authors
    In WACV, 2025
  25. METAL: A multi-agent framework for chart generation with test-time scaling
    Bingxuan Li, Yiwei Wang, Jiuxiang Gu, and 2 more authors
    In ACL, 2025
  26. Utilizing a generative neural network to interactively create and modify digital images based on natural language feedback
    Ruiyi Zhang, Yufan Zhou, Christopher Tensmeyer, and 3 more authors
    2025
    US Patent App. 18/952,023
  27. Robust Latent Matters: Boosting Image Generation with Sampling Error Synthesis
    Kai Qiu, Xiang Li, Jason Kuen, and 7 more authors
    Submitted to ICCV, 2025
  28. Generating 3D models from a single image
    Hao Tan, Yicong Hong, Kai Zhang, and 8 more authors
    2025
    US Patent App. 18/460,747
  29. Position-based text-to-speech model
    Puneet Mathur, Franck Dernoncourt, Quan Hung Tran, and 5 more authors
    2025
    US Patent App. 18/528,116
  30. QuartDepth: Post-Training Quantization for Real-Time Depth Estimation on the Edge
    Xuan Shen, Weize Ma, Jing Liu, and 8 more authors
    In CVPR, 2025
  31. Generating an improved named entity recognition model using noisy data with a self-cleaning discriminator model
    Ruiyi Zhang, Zhendong Chu, Vlad Morariu, and 4 more authors
    2025
    US Patent App. 18/472,746
  32. Towards Visual Text Grounding of Multimodal Large Language Model
    Ming Li, Ruiyi Zhang, Jian Chen, and 6 more authors
    2025

2024

  1. Self-supervised document representation learning
    Jiuxiang Gu, Vlad Morariu, Varun Manjunatha, and 5 more authors
    2024
    US Patent 11,886,815
  2. Multi-scale distillation for low-resolution detection
    Jason Kuen, Jiuxiang Gu, and Zhe Lin
    2024
    US Patent 12,136,185
  3. Utilizing a generative neural network to interactively create and modify digital images based on natural language feedback
    Ruiyi Zhang, Yufan Zhou, Christopher Tensmeyer, and 3 more authors
    2024
    US Patent 12,148,119
  4. Improving a Named Entity Recognizer Trained on Noisy Data with a Few Clean Instances
    Zhendong Chu, Ruiyi Zhang, Tong Yu, and 4 more authors
    2024
  5. LRM: Large reconstruction model for single image to 3D
    Yicong Hong, Kai Zhang, Jiuxiang Gu, and 7 more authors
    2024
    Oral
  6. Customization assistant for text-to-image generation
    Yufan Zhou, Ruiyi Zhang, Jiuxiang Gu, and 1 more author
    In CVPR, 2024
  7. Selective reflection-tuning: Student-selected data recycling for LLM instruction-tuning
    Ming Li, Lichang Chen, Jiuhai Chen, and 3 more authors
    In ACL, 2024
  8. ADoPD: A large-scale document page decomposition dataset
    Jiuxiang Gu, Xiangxi Shi, Jason Kuen, and 5 more authors
    In ICLR, 2024
  9. SOHES: Self-supervised open-world hierarchical entity segmentation
    Shengcao Cao, Jiuxiang Gu, Jason Kuen, and 7 more authors
    In ICLR, 2024
  10. Image and semantic based table recognition
    Jiuxiang Gu, Vlad Morariu, Tong Sun, and 2 more authors
    2024
    US Patent App. 17/947,737
  11. Label induction
    Rajiv Bhawanji Jain, Michelle Yuan, Vlad Ion Morariu, and 8 more authors
    2024
    US Patent App. 18/048,900
  12. Training language models and preserving privacy
    Franck Dernoncourt, Tong Sun, Thi Kim Phung Lai, and 3 more authors
    2024
    US Patent App. 18/173,199
  13. Exploring the frontiers of softmax: Provable optimization, applications in diffusion model, and beyond
    Jiuxiang Gu, Chenyang Li, Yingyu Liang, and 2 more authors
    arXiv preprint arXiv:2405.03251, 2024
  14. DocScript: Document-Level Script Event Prediction
    Puneet Mathur, Vlad I Morariu, Aparna Garimella, and 6 more authors
    In COLING, 2024
  15. Extracting document hierarchy using a multimodal, layer-wise link prediction neural network
    Vlad Morariu, Puneet Mathur, Rajiv Jain, and 8 more authors
    2024
    US Patent App. 18/055,752
  16. Language-guided document editing
    Vlad Ion Morariu, Puneet Mathur, Rajiv Bhawanji Jain, and 2 more authors
    2024
    US Patent 11,995,394
  17. TRINS: Towards multimodal language models that can read
    Ruiyi Zhang, Yanzhe Zhang, Jian Chen, and 4 more authors
    In CVPR, 2024
  18. Toffee: Efficient million-scale dataset construction for subject-driven text-to-image generation
    Yufan Zhou, Ruiyi Zhang, Kaizhi Zheng, and 5 more authors
    arXiv preprint arXiv:2406.09305, 2024
  19. DocSynthv2: A Practical Autoregressive Modeling for Document Generation
    Sanket Biswas, Rajiv Jain, Vlad I Morariu, and 5 more authors
    In CVPR, 2024
  20. Self-Cleaning: Improving a Named Entity Recognizer Trained on Noisy Data with a Few Clean Instances
    Zhendong Chu, Ruiyi Zhang, Tong Yu, and 4 more authors
    In NAACL, 2024
  21. Category-aware active domain adaptation
    Wenxiao Xiao, Jiuxiang Gu, and Hongfu Liu
    In ICML, 2024
  22. LLaVA-Read: Enhancing reading ability of multimodal language models
    Ruiyi Zhang, Yufan Zhou, Jian Chen, and 3 more authors
    arXiv preprint arXiv:2407.19185, 2024
  23. CoMMIT: Coordinated instruction tuning for multimodal large language models
    Junda Wu, Xintong Li, Tong Yu, and 6 more authors
    arXiv preprint arXiv:2407.20454, 2024
  24. MMR: Evaluating reading ability of large multimodal models
    Jian Chen, Ruiyi Zhang, Yufan Zhou, and 3 more authors
    arXiv preprint arXiv:2408.14594, 2024
  25. TextLap: Customizing Language Models for Text-to-Layout Planning
    Jian Chen, Ruiyi Zhang, Yufan Zhou, and 4 more authors
    In EMNLP, 2024
  26. Advancing Vision-Language Models with Adapter Ensemble Strategies
    Yue Bai, Handong Zhao, Zhe Lin, and 5 more authors
    In EMNLP, 2024
  27. Text-to-image system and method
    Ruiyi Zhang, Yufan Zhou, Tong Yu, and 4 more authors
    2024
    US Patent App. 18/318,921
  28. Identifying visual text using vision-language models
    Jiuxiang Gu, Ryan Rossi, Gaurav Verma, and 2 more authors
    Dec 2024
    US Patent App. 18/339,883
  29. Efficient augmentation for multimodal machine learning
    Handong Zhao, Yue Bai, Zhe Lin, and 4 more authors
    Dec 2024
    US Patent App. 18/328,950
  30. VipAct: Visual-perception enhancement via specialized VLM agent collaboration and tool-use
    Zhehao Zhang, Ryan Rossi, Tong Yu, and 7 more authors
    arXiv preprint arXiv:2410.16400, Dec 2024
  31. A survey of small language models
    Chien Van Nguyen, Xuan Shen, Ryan Aponte, and 8 more authors
    arXiv preprint arXiv:2410.20011, Dec 2024
  32. XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation
    Xiang Li, Kai Qiu, Hao Chen, and 5 more authors
    arXiv preprint arXiv:2412.01762, Dec 2024
  33. Personalized Multimodal Large Language Models: A Survey
    Junda Wu, Hanjia Lyu, Yu Xia, and 8 more authors
    arXiv preprint arXiv:2412.02142, Dec 2024
  34. SUGAR: Subject-Driven Video Customization in a Zero-Shot Manner
    Yufan Zhou, Ruiyi Zhang, Jiuxiang Gu, and 3 more authors
    arXiv preprint arXiv:2412.10533, Dec 2024

2023

  1. Knowledge distillation for neural networks using multiple augmentation strategies
    Jason Wen Yong Kuen, Zhe Lin, and Jiuxiang Gu
    Dec 2023
    US Patent 11,610,393
  2. High-Quality Entity Segmentation
    Lu Qi, Jason Kuen, Tiancheng Shen, and 5 more authors
    In ICCV, Dec 2023
  3. LayerDoc: layer-wise extraction of spatial hierarchical structure in visually-rich documents
    Puneet Mathur, Rajiv Jain, Ashutosh Mehra, and 8 more authors
    In WACV, Dec 2023
  4. A critical analysis of out-of-distribution detection for document understanding
    Jiuxiang Gu, Yifei Ming, Yi Zhou, and 8 more authors
    In ACL, Dec 2023
  5. Learning the visualness of text using large vision-language models
    Gaurav Verma, Ryan A Rossi, Christopher Tensmeyer, and 2 more authors
    In ACL, Dec 2023
  6. Preserving user-entity differential privacy in natural language modeling
    Thi Kim Phung Lai, Tong Sun, Rajiv Jain, and 3 more authors
    Dec 2023
    US Patent 11,816,243
  7. Enhanced document visual question answering system via hierarchical attention
    Shijie Geng, Christopher Tensmeyer, Curtis Michael Wigington, and 1 more author
    Dec 2023
    US Patent App. 17/528,972
  8. AIMS: all-inclusive multi-level segmentation for anything
    Lu Qi, Jason Kuen, Weidong Guo, and 5 more authors
    In NeurIPS, Dec 2023
  9. DocEdit: language-guided document editing
    Puneet Mathur, Rajiv Jain, Jiuxiang Gu, and 3 more authors
    In AAAI, Dec 2023
  10. LLaVAR: Enhanced visual instruction tuning for text-rich image understanding
    Yanzhe Zhang, Ruiyi Zhang, Jiuxiang Gu, and 4 more authors
    In NeurIPS Workshop, Dec 2023
  11. Facilitating identification of fillable regions in a form
    Ashutosh Mehra, Christopher Alan Tensmeyer, Vlad Ion Morariu, and 1 more author
    Jul 2023
    US Patent App. 17/577,605
  12. Open vocabulary instance segmentation
    Jason Wen Yong Kuen, Dat Ba Huynh, Zhe Lin, and 1 more author
    Jul 2023
    US Patent App. 17/650,437
  13. Adaptive sparse attention pattern
    Jiuxiang Gu, Zihan Wang, Jason Wen Yong Kuen, and 5 more authors
    Jul 2023
    US Patent App. 17/740,497
  14. Systems and methods for product retrieval
    Handong Zhao, Haoyu Ma, Zhe Lin, and 5 more authors
    Jul 2023
    US Patent App. 17/664,079
  15. A critical analysis of document out-of-distribution detection
    Jiuxiang Gu, Yifei Ming, Yi Zhou, and 8 more authors
    In EMNLP, Jul 2023
  16. Multimodal extraction across multiple granularities
    Vlad Ion Morariu, Tong Sun, Nikolaos Barmpalios, and 4 more authors
    Jul 2023
    US Patent App. 17/746,779
  17. Open vocabulary instance segmentation with noise estimation and robust student
    Jason Wen Yong Kuen, Dat Ba Huynh, Zhe Lin, and 1 more author
    Jul 2023
    US Patent App. 17/806,097
  18. Reflection-tuning: Data recycling improves LLM instruction-tuning
    Ming Li, Lichang Chen, Jiuhai Chen, and 4 more authors
    In NeurIPS Workshop, Jul 2023

2022

  1. Generating scene graphs from digital images using external knowledge and image reconstruction
    Handong Zhao, Zhe Lin, Sheng Li, and 2 more authors
    Jul 2022
    US Patent 11,373,390
  2. UNISON: Unpaired cross-lingual image captioning
    Jiahui Gao, Yi Zhou, Philip LH Yu, and 2 more authors
    In AAAI, Jul 2022
  3. Open world entity segmentation
    Lu Qi, Jason Kuen, Yi Wang, and 5 more authors
    T-PAMI, Jul 2022
  4. Open-vocabulary instance segmentation via robust cross-modal pseudo-labeling
    Dat Huynh, Jason Kuen, Zhe Lin, and 2 more authors
    In CVPR, Jul 2022
  5. Towards language-free training for text-to-image generation
    Yufan Zhou, Ruiyi Zhang, Changyou Chen, and 6 more authors
    In CVPR, Jul 2022
  6. CA-SSL: Class-agnostic semi-supervised learning for detection and segmentation
    Lu Qi, Jason Kuen, Zhe Lin, and 7 more authors
    In ECCV, Jul 2022
  7. User-entity differential privacy in learning natural language models
    Phung Lai, NhatHai Phan, Tong Sun, and 4 more authors
    In Big Data, Jul 2022
  8. Bit-aware randomized response for local differential privacy in federated learning
    Phung Lai, Hai Phan, Li Xiong, and 7 more authors
    Jul 2022
  9. TiGAN: Text-based interactive image generation and manipulation
    Yufan Zhou, Ruiyi Zhang, Jiuxiang Gu, and 5 more authors
    In AAAI, Jul 2022
  10. FedKC: Federated knowledge composition for multilingual natural language understanding
    Haoyu Wang, Handong Zhao, Yaqing Wang, and 3 more authors
    In ACM Web Conference, Jul 2022
  11. Learning adaptive axis attentions in fine-tuning: Beyond fixed sparse attention patterns
    Zihan Wang, Jiuxiang Gu, Jason Kuen, and 6 more authors
    In ACL, Jul 2022
  12. Self-supervised visual-relationship probing
    Jiuxiang Gu, Vlad Ion Morariu, Tong Sun, and 2 more authors
    Jul 2022
    US Patent App. 17/093,185
  13. EI-CLIP: Entity-aware interventional contrastive learning for e-commerce cross-modal retrieval
    Haoyu Ma, Handong Zhao, Zhe Lin, and 6 more authors
    In CVPR, Jul 2022
  14. DocTime: A document-level temporal dependency graph parser
    Puneet Mathur, Vlad Morariu, Verena Kaynig-Fittkau, and 6 more authors
    In NAACL, Jul 2022
  15. Meta spatio-temporal debiasing for video scene graph generation
    Li Xu, Haoxuan Qu, Jason Kuen, and 2 more authors
    In ECCV, Jul 2022
  16. DocLayoutTTS: Dataset and Baselines for Layout-informed Document-level Neural Speech Synthesis
    Puneet Mathur, Franck Dernoncourt, Quan Hung Tran, and 5 more authors
    In INTERSPEECH, Jul 2022
  17. Generating scene graphs from digital images using external knowledge and image reconstruction
    Handong Zhao, Zhe Lin, Sheng Li, and 2 more authors
    Jul 2022
    US Patent App. 17/805,289
  18. Improving the reliability for confidence estimation
    Haoxuan Qu, Yanchao Li, Lin Geng Foo, and 3 more authors
    In ECCV, Jul 2022
  19. Delving into out-of-distribution detection with vision-language representations
    Yifei Ming, Ziyang Cai, Jiuxiang Gu, and 3 more authors
    In NeurIPS, Jul 2022
  20. MGDoc: Pre-training with multi-granular hierarchy for document image understanding
    Zilong Wang, Jiuxiang Gu, Chris Tensmeyer, and 5 more authors
    In EMNLP, Jul 2022

2021

  1. Towards interpreting and mitigating shortcut learning behavior of NLU models
    Mengnan Du, Varun Manjunatha, Rajiv Jain, and 5 more authors
    In NAACL, Jul 2021
  2. Multi-scale aligned distillation for low-resolution detection
    Lu Qi, Jason Kuen, Jiuxiang Gu, and 5 more authors
    In CVPR, Jul 2021
  3. SelfDoc: Self-supervised document representation learning
    Peizhao Li, Jiuxiang Gu, Jason Kuen, and 5 more authors
    In CVPR, Jul 2021
  4. Exploiting semantic embedding and visual feature for facial action unit detection
    Huiyuan Yang, Lijun Yin, Yi Zhou, and 1 more author
    In CVPR, Jul 2021
  5. UniDoc: Unified pretraining framework for document understanding
    Jiuxiang Gu, Jason Kuen, Vlad I Morariu, and 5 more authors
    In NeurIPS, Jul 2021

2020

  1. Resilient load restoration in microgrids considering mobile energy storage fleets: A deep reinforcement learning approach
    Shuhan Yao, Jiuxiang Gu, Huajun Zhang, and 3 more authors
    In PESGM, Jul 2020
    Best Paper
  2. Finding it at another side: A viewpoint-adapted matching encoder for change captioning
    Xiangxi Shi, Xu Yang, Jiuxiang Gu, and 2 more authors
    In ECCV, Jul 2020
  3. Self-supervised relationship probing
    Jiuxiang Gu, Jason Kuen, Shafiq Joty, and 4 more authors
    Jul 2020

2019

  1. Unpaired Image Captioning via Scene Graph Alignments
    Jiuxiang Gu, Shafiq Joty, Jianfei Cai, and 3 more authors
    In ICCV, Jul 2019
  2. Scene graph generation with external knowledge and image reconstruction
    Jiuxiang Gu, Handong Zhao, Zhe Lin, and 3 more authors
    In CVPR, Jul 2019
  3. Watch It Twice: Video Captioning with a Refocused Video Encoder
    Xiangxi Shi, Jianfei Cai, Shafiq Joty, and 1 more author
    In ACM MM, Jul 2019

2018

  1. Stack-Captioning: Coarse-to-Fine Learning for Image Captioning
    Jiuxiang Gu, Jianfei Cai, Gang Wang, and 1 more author
    In AAAI, Jul 2018
    Oral
  2. Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models
    Jiuxiang Gu, Jianfei Cai, Shafiq Joty, and 2 more authors
    In CVPR, Jul 2018
    Spotlight
  3. Recent advances in convolutional neural networks
    Jiuxiang Gu, Zhenhua Wang, Jason Kuen, and 8 more authors
    Pattern Recognition, Jul 2018
  4. Video Captioning with Boundary-aware Hierarchical Language Decoding and Joint Video Prediction
    Xiangxi Shi, Jianfei Cai, Jiuxiang Gu, and 1 more author
    Neurocomputing, Jul 2018
  5. Unpaired image captioning by language pivoting
    Jiuxiang Gu, Shafiq Joty, Jianfei Cai, and 1 more author
    In ECCV, Jul 2018
  6. NTU ROSE Lab at TRECVID 2018: Ad-hoc Video Search and Video to Text
    Muhammet Bastan, Xiangxi Shi, Jiuxiang Gu, and 4 more authors
    In TRECVID, Jul 2018

2017

  1. An empirical study of language CNN for image captioning
    Jiuxiang Gu, Gang Wang, Jianfei Cai, and 1 more author
    In ICCV, Jul 2017

2014

  1. HJ-1C real-time image processing technology based on GPU
    Jiuxiang Gu, Renzhong Yang, Lu Shi, and 1 more author
    Journal of University of Chinese Academy of Sciences, Jul 2014

2013

  1. Research of RS Decoding Technology Based on GPU
    Jiuxiang Gu, Ren-zhong Yang, and Hong-wei Wei
    Microelectronics & Computer, Jul 2013