Jiuxiang Gu
Latest
A Multi-LLM Debiasing Framework (arXiv 2024)
ADOPD: A Large-Scale Document Page Decomposition Dataset (The Twelfth International Conference on Learning Representations 2024)
ARTIST: Improving the Generation of Text-rich Images by Disentanglement (arXiv 2024)
Category-Aware Active Domain Adaptation (Forty-first International Conference on Machine Learning 2024)
Commit: Coordinated instruction tuning for multimodal large language models (arXiv 2024)
Conv-basis: A new paradigm for efficient attention inference and gradient computation in transformers (arXiv 2024)
Customization assistant for text-to-image generation (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2024)
DocScript: Document-level Script Event Prediction (Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING) 2024)
DocSynthv2: A Practical Autoregressive Modeling for Document Generation (arXiv 2024)
Exploring the frontiers of softmax: Provable optimization, applications in diffusion model, and beyond (arXiv 2024)
Fast John Ellipsoid Computation with Differential Privacy Optimization (arXiv 2024)
Fourier circuits in neural networks: Unlocking the potential of large language models in mathematical reasoning and modular arithmetic (arXiv 2024)
ImageFolder: Autoregressive Image Generation with Folded Tokens (arXiv 2024)
LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models (arXiv 2024)
LRM: Large reconstruction model for single image to 3D (The Twelfth International Conference on Learning Representations (ICLR) 2024)
MMR: Evaluating Reading Ability of Large Multimodal Models (arXiv 2024)
SOHES: Self-supervised Open-world Hierarchical Entity Segmentation (The Twelfth International Conference on Learning Representations 2024)
Selective reflection-tuning: Student-selected data recycling for LLM instruction-tuning (Findings of the Association for Computational Linguistics (ACL) 2024)
Self-Cleaning: Improving a Named Entity Recognizer Trained on Noisy Data with a Few Clean Instances (Findings of the Association for Computational Linguistics (NAACL) 2024)
TRINS: Towards Multimodal Language Models that Can Read (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2024)
Tensor attention training: Provably efficient learning of higher-order transformers (arXiv 2024)
Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation (arXiv 2024)
Toward Infinite-Long Prefix in Transformer (arXiv 2024)
Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective (arXiv 2024)
A Critical Analysis of Document Out-of-Distribution Detection (Findings of the Association for Computational Linguistics (EMNLP) 2023)
AIMS: All-inclusive multi-level segmentation for anything (Advances in Neural Information Processing Systems (NeurIPS) 2023)
DocEdit: language-guided document editing (Proceedings of the AAAI Conference on Artificial Intelligence (AAAI) 2023)
LayerDoc: layer-wise extraction of spatial hierarchical structure in visually-rich documents (Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV) 2023)
LLaVAR: Enhanced visual instruction tuning for text-rich image understanding (arXiv 2023)
Reflection-tuning: Data recycling improves LLM instruction-tuning (Workshop on Instruction Tuning and Instruction Following at NeurIPS 2023)
Bit-aware randomized response for local differential privacy in federated learning (2022)
CA-SSL: Class-agnostic semi-supervised learning for detection and segmentation (European Conference on Computer Vision (ECCV) 2022)
Delving into out-of-distribution detection with vision-language representations (Advances in Neural Information Processing Systems (NeurIPS) 2022)
DocLayoutTTS: Dataset and Baselines for Layout-informed Document-level Neural Speech Synthesis (INTERSPEECH 2022)
DocTime: A document-level temporal dependency graph parser (Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) 2022)
EI-CLIP: Entity-aware interventional contrastive learning for e-commerce cross-modal retrieval (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2022)
FedKC: Federated knowledge composition for multilingual natural language understanding (Proceedings of the ACM Web Conference (ACM Web) 2022)
Improving the reliability for confidence estimation (European Conference on Computer Vision (ECCV) 2022)
Learning adaptive axis attentions in fine-tuning: Beyond fixed sparse attention patterns (Findings of the Association for Computational Linguistics (ACL) 2022)
Learning the Visualness of Text Using Large Vision-Language Models (Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP) 2022)
MGDoc: Pre-training with multi-granular hierarchy for document image understanding (Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP) 2022)
Meta spatio-temporal debiasing for video scene graph generation (European Conference on Computer Vision (ECCV) 2022)
Open world entity segmentation (IEEE Transactions on Pattern Analysis and Machine Intelligence 2022)
Open-vocabulary instance segmentation via robust cross-modal pseudo-labeling (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2022)
TiGAN: Text-based interactive image generation and manipulation (Proceedings of the AAAI Conference on Artificial Intelligence (AAAI) 2022)
Towards language-free training for text-to-image generation (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2022)
UNISON: Unpaired cross-lingual image captioning (Proceedings of the AAAI Conference on Artificial Intelligence (AAAI) 2022)
User-Entity Differential Privacy in Learning Natural Language Models (IEEE International Conference on Big Data (Big Data) 2022)
Exploiting semantic embedding and visual feature for facial action unit detection (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2021)
Multi-scale aligned distillation for low-resolution detection (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2021)
SelfDoc: Self-supervised document representation learning (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2021)
Towards interpreting and mitigating shortcut learning behavior of NLU models (Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) 2021)
UniDoc: Unified pretraining framework for document understanding (Advances in Neural Information Processing Systems (NeurIPS) 2021)
Finding it at another side: A viewpoint-adapted matching encoder for change captioning (Proceedings of the European Conference on Computer Vision (ECCV) 2020)
Resilient load restoration in microgrids considering mobile energy storage fleets: A deep reinforcement learning approach (2020 IEEE Power & Energy Society General Meeting (PESGM) 2020)
Self-supervised relationship probing (Advances in Neural Information Processing Systems (NeurIPS) 2020)
Scene graph generation with external knowledge and image reconstruction (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2019)
Unpaired Image Captioning via Scene Graph Alignments (Proceedings of the IEEE International Conference on Computer Vision (ICCV) 2019)
Watch It Twice: Video Captioning with a Refocused Video Encoder (Proceedings of the ACM International Conference on Multimedia (MM) 2019)
Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018)
NTU ROSE Lab at TRECVID 2018: Ad-hoc Video Search and Video to Text (TRECVID 2018)
Recent advances in convolutional neural networks (Pattern Recognition 2018)
Stack-Captioning: Coarse-to-Fine Learning for Image Captioning (Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI) 2018)
Unpaired image captioning by language pivoting (Proceedings of the European Conference on Computer Vision (ECCV) 2018)
Video Captioning with Boundary-aware Hierarchical Language Decoding and Joint Video Prediction (Neurocomputing 2018)
An empirical study of language CNN for image captioning (Proceedings of the IEEE International Conference on Computer Vision (ICCV) 2017)