Jiuxiang Gu

Latest

Differential Privacy Mechanisms in Neural Tangent Kernel Regression (WACV 2025)
ImageFolder: Autoregressive Image Generation with Folded Tokens (ICLR 2025)
A Multi-LLM Debiasing Framework (arXiv 2024)
ADOPD: A Large-Scale Document Page Decomposition Dataset (The Twelfth International Conference on Learning Representations 2024)
ARTIST: Improving the Generation of Text-rich Images by Disentanglement (arXiv 2024)
Category-Aware Active Domain Adaptation (Forty-first International Conference on Machine Learning 2024)
Commit: Coordinated instruction tuning for multimodal large language models (arXiv 2024)
Conv-basis: A new paradigm for efficient attention inference and gradient computation in transformers (arXiv 2024)
Customization assistant for text-to-image generation (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2024)
DocScript: Document-level Script Event Prediction (Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING) 2024)
DocSynthv2: A Practical Autoregressive Modeling for Document Generation (arXiv 2024)
Exploring the frontiers of softmax: Provable optimization, applications in diffusion model, and beyond (arXiv 2024)
Fast John Ellipsoid Computation with Differential Privacy Optimization (arXiv 2024)
Fourier circuits in neural networks: Unlocking the potential of large language models in mathematical reasoning and modular arithmetic (arXiv 2024)
LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models (arXiv 2024)
LRM: Large reconstruction model for single image to 3D (The Twelfth International Conference on Learning Representations (ICLR) 2024)
MMR: Evaluating Reading Ability of Large Multimodal Models (arXiv 2024)
SOHES: Self-supervised Open-world Hierarchical Entity Segmentation (The Twelfth International Conference on Learning Representations 2024)
Selective reflection-tuning: Student-selected data recycling for LLM instruction-tuning (Findings of the Association for Computational Linguistics (ACL) 2024)
Self-Cleaning: Improving a Named Entity Recognizer Trained on Noisy Data with a Few Clean Instances (Findings of the Association for Computational Linguistics (NAACL) 2024)
TRINS: Towards Multimodal Language Models that Can Read (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2024)
Tensor attention training: Provably efficient learning of higher-order transformers (arXiv 2024)
Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation (arXiv 2024)
Toward Infinite-Long Prefix in Transformer (arXiv 2024)
Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective (arXiv 2024)
A Critical Analysis of Document Out-of-Distribution Detection (Findings of the Association for Computational Linguistics (EMNLP) 2023)
AIMS: All-inclusive multi-level segmentation for anything (Advances in Neural Information Processing Systems (NeurIPS) 2023)
DocEdit: language-guided document editing (Proceedings of the AAAI Conference on Artificial Intelligence (AAAI) 2023)
LayerDoc: layer-wise extraction of spatial hierarchical structure in visually-rich documents (Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV) 2023)
LLaVAR: Enhanced visual instruction tuning for text-rich image understanding (arXiv 2023)
Reflection-tuning: Data recycling improves LLM instruction-tuning (Workshop on Instruction Tuning and Instruction Following at NeurIPS 2023)
Bit-aware randomized response for local differential privacy in federated learning (2022)
CA-SSL: Class-agnostic semi-supervised learning for detection and segmentation (European Conference on Computer Vision (ECCV) 2022)
Delving into out-of-distribution detection with vision-language representations (Advances in Neural Information Processing Systems (NeurIPS) 2022)
DocLayoutTTS: Dataset and Baselines for Layout-informed Document-level Neural Speech Synthesis (INTERSPEECH 2022)
DocTime: A document-level temporal dependency graph parser (Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) 2022)
EI-CLIP: Entity-aware interventional contrastive learning for e-commerce cross-modal retrieval (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2022)
FedKC: Federated knowledge composition for multilingual natural language understanding (Proceedings of the ACM Web Conference (ACM Web) 2022)
Improving the reliability for confidence estimation (European Conference on Computer Vision (ECCV) 2022)
Learning adaptive axis attentions in fine-tuning: Beyond fixed sparse attention patterns (Findings of the Association for Computational Linguistics (ACL) 2022)
Learning the Visualness of Text Using Large Vision-Language Models (Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP) 2022)
MGDoc: Pre-training with multi-granular hierarchy for document image understanding (Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP) 2022)
Meta spatio-temporal debiasing for video scene graph generation (European Conference on Computer Vision (ECCV) 2022)
Open world entity segmentation (IEEE Transactions on Pattern Analysis and Machine Intelligence 2022)
Open-vocabulary instance segmentation via robust cross-modal pseudo-labeling (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2022)
TiGAN: Text-based interactive image generation and manipulation (Proceedings of the AAAI Conference on Artificial Intelligence (AAAI) 2022)
Towards language-free training for text-to-image generation (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2022)
UNISON: Unpaired cross-lingual image captioning (Proceedings of the AAAI Conference on Artificial Intelligence (AAAI) 2022)
User-Entity Differential Privacy in Learning Natural Language Models (IEEE International Conference on Big Data (Big Data) 2022)
Exploiting semantic embedding and visual feature for facial action unit detection (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2021)
Multi-scale aligned distillation for low-resolution detection (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2021)
SelfDoc: Self-supervised document representation learning (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2021)
Towards interpreting and mitigating shortcut learning behavior of NLU models (Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL) 2021)
UniDoc: Unified pretraining framework for document understanding (Advances in Neural Information Processing Systems (NeurIPS) 2021)
Finding it at another side: A viewpoint-adapted matching encoder for change captioning (Proceedings of the European Conference on Computer Vision 2020)
Resilient load restoration in microgrids considering mobile energy storage fleets: A deep reinforcement learning approach (2020 IEEE Power & Energy Society General Meeting (PESGM) 2020)
Self-supervised relationship probing (Advances in Neural Information Processing Systems (NeurIPS) 2020)
Scene graph generation with external knowledge and image reconstruction (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2019)
Unpaired Image Captioning via Scene Graph Alignments (Proceedings of the IEEE International Conference on Computer Vision 2019)
Watch It Twice: Video Captioning with a Refocused Video Encoder (Proceedings of the ACM International Conference on Multimedia (MM) 2019)
Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2018)
NTU ROSE Lab at TRECVID 2018: Ad-hoc Video Search and Video to Text (TRECVID 2018)
Recent advances in convolutional neural networks (Pattern Recognition 2018)
Stack-Captioning: Coarse-to-Fine Learning for Image Captioning (Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence 2018)
Unpaired image captioning by language pivoting (Proceedings of the European Conference on Computer Vision 2018)
Video Captioning with Boundary-aware Hierarchical Language Decoding and Joint Video Prediction (Neurocomputing 2018)
An empirical study of language CNN for image captioning (Proceedings of the IEEE International Conference on Computer Vision 2017)